How I'd Think About This Problem
Streaming chat looks like one feature but is really three independent systems pretending to cooperate: a producer emitting tokens over time, a reducer aggregating those tokens into message state, and a viewport that renders the message and reacts to scroll position. Most candidates conflate the three — they write a single useEffect that fetches, renders, and scrolls — and then everything breaks the moment the user does something unexpected like scrolling up while a reply is streaming, or hitting Stop, or backgrounding the tab.
Before I write a line, I draw three boxes on the whiteboard: Source → State → View. The source emits tokens. The state aggregates them into a message. The view renders and reacts to scroll. Each arrow is a contract — and most bugs in chat UIs come from violating those contracts. Once you internalize this separation, the rest of the question becomes mechanical.
The Cancellation Pattern — Why Refs Beat State
// ❌ The trap that catches juniors every single time
const [cancelled, setCancelled] = useState(false);

const stream = async () => {
  for await (const chunk of source) {
    if (cancelled) return; // STALE: this closure sees the value from the render that created it
    setMessage(m => m + chunk);
  }
};
// ✅ What you actually want
const cancelRef = useRef(false);

const stream = async () => {
  for await (const chunk of source) {
    if (cancelRef.current) return; // reads the latest value, always
    setMessage(m => m + chunk);
  }
};
A closure in JavaScript captures the variable bindings that were in scope when it was created, and in React every render creates fresh bindings. The cancelled your async loop reads is the snapshot from the render that started the loop; clicking Stop re-renders with a new cancelled, but the loop is still closed over the old one, so by the time your tenth setTimeout fires it sees whatever the value was 300ms ago, not what state holds now. Refs sidestep this because ref.current is a property read on a stable object, evaluated fresh on every access. The same lesson applies to intervals, websocket handlers, drag listeners, and any other long-lived async work.
The mental shortcut to internalize: "If a value is read inside an async loop, it must live in a ref or be passed as an argument." State is for rendering. Refs are for control flow. Mixing them is the #1 source of "works in dev, breaks in prod" bugs in React apps.
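For completeness, the source in the snippets above can be any async iterable. Here's a minimal setTimeout-driven mock (the one the next section refers to), with an invented token list and delay:

// Mock token source: an async generator that drips canned tokens on a timer.
async function* mockSource() {
  const tokens = ['Streaming ', 'chat ', 'is ', 'three ', 'systems.'];
  for (const t of tokens) {
    await new Promise(resolve => setTimeout(resolve, 40)); // simulated network gap
    yield t;
  }
}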
Real Streaming, Not Mock Streaming
The mock above uses setTimeout to simulate token arrival. In production you read Server-Sent Events from a fetch ReadableStream. The shape every senior frontend engineer should be able to write from memory:
const streamChat = async () => {
  const ctrl = new AbortController();
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
    signal: ctrl.signal,
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  try {
    while (true) {
      if (cancelRef.current) { ctrl.abort(); break; }
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split('\n\n');
      buffer = events.pop(); // last is incomplete — save for next iteration
      for (const evt of events) {
        const data = evt.replace(/^data: /, '');
        if (data === '[DONE]') return;
        const parsed = JSON.parse(data);
        setMessage(m => m + (parsed.delta || ''));
      }
    }
  } catch (err) {
    if (err.name !== 'AbortError') throw err; // an abort mid-read rejects; swallow it
  }
};
Two non-obvious things matter here. First, TextDecoder with { stream: true } handles UTF-8 boundaries correctly — without it, multibyte characters can split across chunks and you render garbage on emoji or non-Latin text. Second, the buffer/split('\n\n') dance is essential because chunks can split in the middle of an SSE frame; you have to keep the partial trailing frame for the next iteration. Skipping this gives you intermittent JSON.parse errors that only happen under network load.
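You can watch the decoder do this in isolation; a quick illustration (the emoji is arbitrary), splitting a 4-byte character across two decode calls:

const enc = new TextEncoder();
const dec = new TextDecoder();
const bytes = enc.encode('🙂'); // four UTF-8 bytes
dec.decode(bytes.slice(0, 2), { stream: true }); // '' (partial sequence buffered)
dec.decode(bytes.slice(2), { stream: true });    // '🙂'
// Without { stream: true }, the first call emits replacement characters (�) instead.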
Auto-Scroll: The Hardest Part of the Whole Problem
Naive auto-scroll on every message looks fine for 30 seconds and then becomes the most-hated feature in your product. The user scrolls up to copy a code snippet, a token arrives, you yank them back to the bottom, they swear at you. Every chat UI lives or dies on getting this right.
The pattern that actually works is pinned-to-bottom: track whether the user is currently within ~60px of the bottom. If they are, auto-scroll on new content. If they're not, leave them alone — but show a "Jump to latest" pill so they can opt back in.
const onScroll = () => {
  const el = scrollRef.current;
  const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
  setPinned(distanceFromBottom < 60);
};

useEffect(() => {
  if (pinned) scrollRef.current?.scrollTo({ top: scrollRef.current.scrollHeight });
}, [messages, pinned]);
The 60px threshold is not magic — it's a buffer for two reasons. iOS momentum scrolling continues for ~200ms after touch-up, so strict equality (scrollTop === scrollHeight - clientHeight) flickers between pinned/unpinned. And subpixel rounding on retina displays means "exactly at the bottom" is rarely literally equal. Pick a number bigger than your largest line-height and you're done.
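The pill itself is tiny. A sketch, assuming the pinned state and scrollRef from the snippet above (the class name is invented):

{!pinned && (
  <button
    className="jump-to-latest" // hypothetical class; style as a floating pill
    onClick={() => scrollRef.current?.scrollTo({
      top: scrollRef.current.scrollHeight,
      behavior: 'smooth',
    })}
  >
    Jump to latest ↓
  </button>
)}

Clicking it scrolls to the bottom, which drops distanceFromBottom back under the threshold, so onScroll re-pins automatically with no extra state.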
⚠ Common Pitfall: Calling scrollTo from inside the streaming reducer
If you call scrollTo directly inside your token-aggregation callback (not in an effect), you fight the user's scroll input every token — each chunk cancels their drag. Always do scroll work inside useEffect with messages as a dep. The effect runs after React commits the DOM, when the layout is consistent.
⚠ Common Pitfall: Allowing the user to send while streaming
If the input isn't disabled during streaming, fast users will hit Enter again, your component will fire a second fetch, and now you have two streams writing into possibly-the-same message state. Disable the textarea (or queue the next message) while streaming === true. This is also why we tie Send/Stop to the same button slot — it's stateful, not parallel.
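A sketch of that stateful slot, assuming a streaming boolean plus hypothetical handleSend / handleStop handlers and a draft state for the input:

<textarea
  value={draft}
  onChange={e => setDraft(e.target.value)}
  disabled={streaming} // or leave enabled and queue the draft until the stream ends
/>
{/* One slot, two meanings: the same button sends or stops depending on stream state. */}
<button onClick={streaming ? handleStop : handleSend}>
  {streaming ? 'Stop' : 'Send'}
</button>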
Backpressure: When Tokens Outpace React
Real LLM APIs deliver chunks well below 60fps, so a setState-per-token approach is fine for most cases. But if you connect to a model running on dedicated hardware (Groq, custom inference), or render multiple concurrent conversations on one screen, you can hit a regime where token arrival outpaces React's commit. Symptoms: dropped frames, jittery scroll, the typing indicator stuttering.
The fix is to batch token writes onto animation frames:
const pending = useRef('');
const rafId = useRef(null);

const onToken = (chunk) => {
  pending.current += chunk;
  if (rafId.current) return; // a flush is already scheduled for this frame
  rafId.current = requestAnimationFrame(() => {
    // Snapshot before calling setMessage: React runs the updater function
    // later, during render, by which point pending.current is already reset.
    const batch = pending.current;
    pending.current = '';
    rafId.current = null;
    setMessage(m => m + batch);
  });
};

// Drop any scheduled flush on unmount.
useEffect(() => () => cancelAnimationFrame(rafId.current), []);
This caps state updates at one per frame regardless of how fast tokens arrive. React 18's automatic batching already coalesces updates within a single tick; the rAF gate extends that across the whole stream, so token bursts stop competing with scroll input for main-thread time. Don't add this from day one. Measure first. Premature optimization is a senior-interview red flag — the right move is to mention you'd profile under load and apply this if the trace shows long tasks.
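If you want that trace without opening DevTools, the Long Tasks API (Chromium-only at the time of writing) can flag main-thread stalls; the 50ms cutoff is the API's own definition of a long task, and the logging here is illustrative:

// Warn whenever the main thread blocks long enough to register a 'longtask' entry (>50ms).
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.warn(`long task during stream: ${Math.round(entry.duration)}ms`);
  }
});
observer.observe({ entryTypes: ['longtask'] });
// Call observer.disconnect() when the stream ends.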
The Things That Will Bite You In Production
- Network blip mid-stream — connection drops, you have half a message. Your UI should mark it as incomplete (italic, a small "interrupted" badge) rather than silently leaving partial text. Optionally offer "Retry from here" with the partial as context.
- Tab visibility — when the tab is backgrounded, browsers throttle setTimeout and stop firing requestAnimationFrame entirely, so your stream may stall and resume (if you adopted the rAF batching above, its flush also pauses while hidden). Listen for visibilitychange and show a subtle indicator that the response is paused.
- aria-live for accessibility — without aria-live="polite" on the assistant message container, screen readers won't announce streamed content. Set it on the bot message wrapper specifically (not the whole list, otherwise every user message also re-announces).
- Memory growth — long conversations hold the entire transcript in state. At ~50KB per message and 200 messages you're at 10MB of React-tracked strings, plus your reconciler walking that array on every keystroke. Virtualize the message list (react-virtuoso) once you cross ~50 turns.
- Race on rapid sends — user sends, hits Stop, sends again before the first stream's cleanup runs. Track the in-flight stream's id; only apply chunks if their id matches the latest. Otherwise late chunks from the cancelled stream will append to the new message. A minimal guard is sketched below.
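For that last pitfall, a sketch of the id guard, assuming all sends go through one send function; getSource is a hypothetical stand-in for whatever async iterable your SSE reader exposes:

const streamId = useRef(0);

const send = async (prompt) => {
  const id = ++streamId.current; // this call owns the stream until a newer send bumps the counter
  for await (const chunk of getSource(prompt)) {
    if (id !== streamId.current) return; // a newer send (or Stop) took over; drop late chunks
    setMessage(m => m + chunk);
  }
};
// Stop doesn't need its own flag: bumping streamId.current orphans the in-flight loop.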
ℹ Interview Tip — The Senior-vs-Junior Tell
The single biggest senior signal on this question is whether you bring up scroll discipline before the interviewer asks. Junior candidates ship a working stream and look surprised when the interviewer says "now scroll up while it's responding." Senior candidates pre-empt: "I want to make sure I respect the user's scroll position — let me add pinned-to-bottom logic up front." That one sentence reframes the rest of the conversation from "can you build it" to "have you shipped this before."