How I'd Think About This Problem
Streaming chat looks like one feature but is really three independent systems pretending to cooperate: a producer emitting tokens over time, a reducer aggregating those tokens into message state, and a viewport that renders the message and reacts to scroll position. Most candidates conflate the three — they write a single useEffect that fetches, renders, and scrolls — and then everything breaks the moment the user does something unexpected like scrolling up while a reply is streaming, or hitting Stop, or backgrounding the tab.
Before I write a line, I draw three boxes on the whiteboard: Source → State → View. The source emits tokens. The state aggregates them into a message. The view renders and reacts to scroll. Each arrow is a contract — and most bugs in chat UIs come from violating those contracts. Once you internalize this separation, the rest of the question becomes mechanical.
The Cancellation Pattern — Why Refs Beat State
// ❌ The trap that catches juniors every single time
const [cancelled, setCancelled] = useState(false);

const stream = async () => {
  for await (const chunk of source) {
    if (cancelled) return; // STALE: this closure sees the value from the render that created it
    setMessage(m => m + chunk);
  }
};
// ✅ What you actually want
const cancelRef = useRef(false);

const stream = async () => {
  for await (const chunk of source) {
    if (cancelRef.current) return; // reads the latest value, always
    setMessage(m => m + chunk);
  }
};
A closure in JavaScript captures the variable bindings that were in scope when it was created, and in React every render creates fresh bindings. The cancelled your async loop reads is the snapshot from the render that started the loop; clicking Stop re-renders with a new cancelled, but the loop is still closed over the old one, so by the time your tenth setTimeout fires it sees whatever the value was 300ms ago, not what state holds now. Refs sidestep this because ref.current is a property read on a stable object, evaluated fresh on every access. The same lesson applies to intervals, websocket handlers, drag listeners, and any other long-lived async work.
The mental shortcut to internalize: "If a value is read inside an async loop, it must live in a ref or be passed as an argument." State is for rendering. Refs are for control flow. Mixing them is the #1 source of "works in dev, breaks in prod" bugs in React apps.
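For completeness, the source in the snippets above can be any async iterable. Here's a minimal setTimeout-driven mock (the one the next section refers to), with an invented token list and delay:

// Mock token source: an async generator that drips canned tokens on a timer.
async function* mockSource() {
  const tokens = ['Streaming ', 'chat ', 'is ', 'three ', 'systems.'];
  for (const t of tokens) {
    await new Promise(resolve => setTimeout(resolve, 40)); // simulated network gap
    yield t;
  }
}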
Real Streaming, Not Mock Streaming
The mock above uses setTimeout to simulate token arrival. In production you read Server-Sent Events from a fetch ReadableStream. The shape every senior frontend engineer should be able to write from memory:
const streamChat = async () => {
  const ctrl = new AbortController();
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
    signal: ctrl.signal,
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  try {
    while (true) {
      if (cancelRef.current) { ctrl.abort(); break; }
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split('\n\n');
      buffer = events.pop(); // last is incomplete — save for next iteration
      for (const evt of events) {
        const data = evt.replace(/^data: /, '');
        if (data === '[DONE]') return;
        const parsed = JSON.parse(data);
        setMessage(m => m + (parsed.delta || ''));
      }
    }
  } catch (err) {
    if (err.name !== 'AbortError') throw err; // an abort mid-read rejects; swallow it
  }
};
Two non-obvious things matter here. First, TextDecoder with { stream: true } handles UTF-8 boundaries correctly — without it, multibyte characters can split across chunks and you render garbage on emoji or non-Latin text. Second, the buffer/split('\n\n') dance is essential because chunks can split in the middle of an SSE frame; you have to keep the partial trailing frame for the next iteration. Skipping this gives you intermittent JSON.parse errors that only happen under network load.
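You can watch the decoder do this in isolation; a quick illustration (the emoji is arbitrary), splitting a 4-byte character across two decode calls:

const enc = new TextEncoder();
const dec = new TextDecoder();
const bytes = enc.encode('🙂'); // four UTF-8 bytes
dec.decode(bytes.slice(0, 2), { stream: true }); // '' (partial sequence buffered)
dec.decode(bytes.slice(2), { stream: true });    // '🙂'
// Without { stream: true }, the first call emits replacement characters (�) instead.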
Auto-Scroll: The Hardest Part of the Whole Problem
Naive auto-scroll on every message looks fine for 30 seconds and then becomes the most-hated feature in your product. The user scrolls up to copy a code snippet, a token arrives, you yank them back to the bottom, they swear at you. Every chat UI lives or dies on getting this right.
The pattern that actually works is pinned-to-bottom: track whether the user is currently within ~60px of the bottom. If they are, auto-scroll on new content. If they're not, leave them alone — but show a "Jump to latest" pill so they can opt back in.
const onScroll = () => {
  const el = scrollRef.current;
  const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
  setPinned(distanceFromBottom < 60);
};

useEffect(() => {
  if (pinned) scrollRef.current?.scrollTo({ top: scrollRef.current.scrollHeight });
}, [messages, pinned]);
The 60px threshold is not magic — it's a buffer for two reasons. iOS momentum scrolling continues for ~200ms after touch-up, so strict equality (scrollTop === scrollHeight - clientHeight) flickers between pinned/unpinned. And subpixel rounding on retina displays means "exactly at the bottom" is rarely literally equal. Pick a number bigger than your largest line-height and you're done.
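The pill itself is tiny. A sketch, assuming the pinned state and scrollRef from the snippet above (the class name is invented):

{!pinned && (
  <button
    className="jump-to-latest" // hypothetical class; style as a floating pill
    onClick={() => scrollRef.current?.scrollTo({
      top: scrollRef.current.scrollHeight,
      behavior: 'smooth',
    })}
  >
    Jump to latest ↓
  </button>
)}

Clicking it scrolls to the bottom, which drops distanceFromBottom back under the threshold, so onScroll re-pins automatically with no extra state.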
⚠ Common Pitfall: Calling scrollTo from inside the streaming reducer
If you call scrollTo directly inside your token-aggregation callback (not in an effect), you fight the user's scroll input every token — each chunk cancels their drag. Always do scroll work inside useEffect with messages as a dep. The effect runs after React commits the DOM, when the layout is consistent.
⚠ Common Pitfall: Allowing the user to send while streaming
If the input isn't disabled during streaming, fast users will hit Enter again, your component will fire a second fetch, and now you have two streams writing into possibly-the-same message state. Disable the textarea (or queue the next message) while streaming === true. This is also why we tie Send/Stop to the same button slot — it's stateful, not parallel.
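A sketch of that stateful slot, assuming a streaming boolean plus hypothetical handleSend / handleStop handlers and a draft state for the input:

<textarea
  value={draft}
  onChange={e => setDraft(e.target.value)}
  disabled={streaming} // or leave enabled and queue the draft until the stream ends
/>
{/* One slot, two meanings: the same button sends or stops depending on stream state. */}
<button onClick={streaming ? handleStop : handleSend}>
  {streaming ? 'Stop' : 'Send'}
</button>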
Backpressure: When Tokens Outpace React
Real LLM APIs deliver chunks well below 60fps, so a setState-per-token approach is fine for most cases. But if you connect to a model running on dedicated hardware (Groq, custom inference), or render multiple concurrent conversations on one screen, you can hit a regime where token arrival outpaces React's commit. Symptoms: dropped frames, jittery scroll, the typing indicator stuttering.
The fix is to batch token writes onto animation frames:
const pending = useRef('');
const rafId = useRef(null);

const onToken = (chunk) => {
  pending.current += chunk;
  if (rafId.current) return; // a flush is already scheduled for this frame
  rafId.current = requestAnimationFrame(() => {
    // Snapshot before calling setMessage: React runs the updater function
    // later, during render, by which point pending.current is already reset.
    const batch = pending.current;
    pending.current = '';
    rafId.current = null;
    setMessage(m => m + batch);
  });
};

// Drop any scheduled flush on unmount.
useEffect(() => () => cancelAnimationFrame(rafId.current), []);
This caps state updates at one per frame regardless of how fast tokens arrive. React 18's automatic batching already coalesces updates within a single tick; the rAF gate extends that across the whole stream, so token bursts stop competing with scroll input for main-thread time. Don't add this from day one. Measure first. Premature optimization is a senior-interview red flag — the right move is to mention you'd profile under load and apply this if the trace shows long tasks.
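If you want that trace without opening DevTools, the Long Tasks API (Chromium-only at the time of writing) can flag main-thread stalls; the 50ms cutoff is the API's own definition of a long task, and the logging here is illustrative:

// Warn whenever the main thread blocks long enough to register a 'longtask' entry (>50ms).
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.warn(`long task during stream: ${Math.round(entry.duration)}ms`);
  }
});
observer.observe({ entryTypes: ['longtask'] });
// Call observer.disconnect() when the stream ends.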
The Things That Will Bite You In Production
- Network blip mid-stream — connection drops, you have half a message. Your UI should mark it as incomplete (italic, a small "interrupted" badge) rather than silently leaving partial text. Optionally offer "Retry from here" with the partial as context.
- Tab visibility — when the tab is backgrounded, browsers throttle setTimeout and stop firing requestAnimationFrame entirely, so your stream may stall and resume (if you adopted the rAF batching above, its flush also pauses while hidden). Listen for visibilitychange and show a subtle indicator that the response is paused.
- aria-live for accessibility — without aria-live="polite" on the assistant message container, screen readers won't announce streamed content. Set it on the bot message wrapper specifically (not the whole list, otherwise every user message also re-announces).
- Memory growth — long conversations hold the entire transcript in state. At ~50KB per message and 200 messages you're at 10MB of React-tracked strings, plus your reconciler walking that array on every keystroke. Virtualize the message list (react-virtuoso) once you cross ~50 turns.
- Race on rapid sends — user sends, hits Stop, sends again before the first stream's cleanup runs. Track the in-flight stream's id; only apply chunks if their id matches the latest. Otherwise late chunks from the cancelled stream will append to the new message. A minimal guard is sketched below.
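For that last pitfall, a sketch of the id guard, assuming all sends go through one send function; getSource is a hypothetical stand-in for whatever async iterable your SSE reader exposes:

const streamId = useRef(0);

const send = async (prompt) => {
  const id = ++streamId.current; // this call owns the stream until a newer send bumps the counter
  for await (const chunk of getSource(prompt)) {
    if (id !== streamId.current) return; // a newer send (or Stop) took over; drop late chunks
    setMessage(m => m + chunk);
  }
};
// Stop doesn't need its own flag: bumping streamId.current orphans the in-flight loop.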
ℹ Interview Tip — The Senior-vs-Junior Tell
The single biggest senior signal on this question is whether you bring up scroll discipline before the interviewer asks. Junior candidates ship a working stream and look surprised when the interviewer says "now scroll up while it's responding." Senior candidates pre-empt: "I want to make sure I respect the user's scroll position — let me add pinned-to-bottom logic up front." That one sentence reframes the rest of the conversation from "can you build it" to "have you shipped this before."