Let me walk you through this the way I'd explain it to a teammate before an interview, because most people get this wrong by treating "streaming" as one thing. It isn't. There are at least four protocols that all let the server push data to the browser, and each one was built for a different problem. Picking the right one is a senior-level decision — and the AI era has made this question hot again because every chat product has to choose.
The four contenders
1. Chunked Transfer Encoding (HTTP/1.1) — the original. It's not really a separate protocol; it's a feature of HTTP that lets the server send a response in pieces without declaring the total Content-Length upfront. The body is split into chunks, each prefixed with its size in hex. When the server sends a zero-length chunk, the response is over. This is the primitive that HTTP-based streaming (including SSE over HTTP/1.1) is built on; WebSockets replace it with their own framing.
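To make that concrete, here's roughly what a chunked response looks like on the wire (the \r\n pairs are literal CRLF bytes):

```http
HTTP/1.1 200 OK
Transfer-Encoding: chunked

5\r\n
Hello\r\n
7\r\n
, world\r\n
0\r\n
\r\n
```

Each chunk is "size in hex, CRLF, payload, CRLF"; the 0-size chunk is the terminator.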
2. Server-Sent Events (SSE) — HTTP plus a convention. The server sets Content-Type: text/event-stream, then writes lines like data: hello\n\n. The browser's EventSource API parses these and fires events. It's one-directional (server → client), it auto-reconnects on network failures, and it works over plain HTTP/1.1 or HTTP/2.
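On the client, all of that collapses into a few lines. A minimal sketch (the /api/stream endpoint is hypothetical):

```js
// EventSource handles the wire format for you: it parses "data:" lines,
// fires message events, and reconnects automatically when the connection drops.
const es = new EventSource('/api/stream');
es.onmessage = (e) => console.log(e.data); // one event per "data: ..." block
es.onerror = () => { /* fires on a drop; the browser retries on its own */ };
```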
3. WebSockets — a full protocol upgrade. The connection starts as HTTP, then both sides switch to a binary frame protocol over the same TCP socket. After that, either side can send messages anytime. Bidirectional, low-overhead, but you've left HTTP behind — load balancers, CDNs, and proxies treat it differently.
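And for comparison, the browser side of a WebSocket, again with a hypothetical endpoint:

```js
// After the HTTP upgrade handshake, this is a raw two-way message pipe.
const ws = new WebSocket('wss://example.com/socket');
ws.onopen = () => ws.send('hello');        // the client can send anytime...
ws.onmessage = (e) => console.log(e.data); // ...and so can the server
```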
4. HTTP/2 Server Push — let me get this out of the way: it's effectively dead. Chrome removed it in 2022. It was designed to let servers push resources (CSS, JS) ahead of the browser asking, not to stream application data. If anyone brings this up in an interview, the right answer is "deprecated for the use case you'd actually want it for — most people use 103 Early Hints now." Don't reach for it.
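If the follow-up is "so what replaced it?": 103 Early Hints is just an interim response sent before the final one, hinting at resources the browser can start fetching early. Roughly:

```http
HTTP/1.1 103 Early Hints
Link: </styles.css>; rel=preload; as=style

HTTP/1.1 200 OK
Content-Type: text/html
```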
So in practice the choice is between SSE, WebSockets, and chunked transfer.
The intuition: ChatGPT chose SSE — why?
When OpenAI shipped ChatGPT, they used SSE. Anthropic's Claude, the same. So did GitHub Copilot Chat, Perplexity, and basically every major chat product. Why not WebSockets, which sounds more "real-time"?
Three reasons, all of which an interviewer wants to hear you reason through:
Reason 1: The data flow is fundamentally one-directional. When you type a prompt, you send one HTTP request. The server then streams back tokens. The user doesn't interrupt with mid-stream messages — at most they hit Stop, which can be a separate DELETE request or just an AbortController on the client. WebSockets give you bidirectional capability you don't actually use.
Reason 2: SSE is just HTTP. It runs through every CDN, every load balancer, every reverse proxy without special configuration. It works with HTTP/2 multiplexing. It survives corporate firewalls that block WebSockets. It uses the same auth cookies, the same CORS policy, the same caching headers. WebSockets opt out of all of this.
Reason 3: SSE has built-in reconnection. EventSource automatically retries on connection drops, and if the server tags each event with an id: field, the browser sends it back in a Last-Event-ID request header on reconnect, so the stream can resume where it left off. With WebSockets you write reconnection logic yourself, including the retry backoff and the resumption protocol.
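Here's a minimal Node sketch of that resume mechanism. The handler and token payloads are hypothetical, but the id: / Last-Event-ID dance is the built-in protocol:

```js
const http = require('node:http');

// Hypothetical server: tag every event with an id. On reconnect the browser
// resends the last id it saw in the Last-Event-ID header, so we resume there.
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/event-stream' });
  let id = Number(req.headers['last-event-id'] ?? 0);
  const timer = setInterval(() => {
    id += 1;
    res.write(`id: ${id}\ndata: token ${id}\n\n`);
  }, 100);
  req.on('close', () => clearInterval(timer));
}).listen(3000);
```

Kill the connection mid-stream and EventSource picks up at id + 1 with no client code at all.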
The mental model: SSE is HTTP that doesn't end. That's it. If your data flow is "client asks once, server streams back," SSE is almost always the right answer.
When WebSockets actually win
WebSockets aren't a worse SSE — they're a different tool. Reach for them when:
- The user genuinely sends data while the server is sending data. Collaborative editing (Figma, Notion), multiplayer games, voice/video signaling, live trading dashboards where you push order updates to the server while it pushes price ticks down.
- You need binary frames. SSE is text-only (UTF-8). If you're sending audio chunks for voice mode, or compact binary protocol buffers, you'd reach for WebSockets.
- You're inside an existing real-time stack. If you're already running Phoenix Channels, ActionCable, or Socket.IO, the cost of adding "stream LLM responses" to that pipe is near zero.
OpenAI's Realtime API for voice uses WebSockets. The text Chat Completions API uses SSE. That's not arbitrary — voice is bidirectional and binary, text is one-shot and text-only.
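A sketch of why voice pushes you off SSE: binary frames flowing both directions on one socket. The endpoint and the playAudio helper are hypothetical:

```js
const ws = new WebSocket('wss://example.com/voice'); // hypothetical endpoint
ws.binaryType = 'arraybuffer';
ws.onopen = () => {
  const pcm = new Int16Array(960); // e.g. 20ms of 48kHz mono audio
  ws.send(pcm.buffer);             // binary frame upstream; SSE can't do this
};
ws.onmessage = (e) => {
  // Binary frames downstream, concurrently with the uploads above.
  playAudio(new Int16Array(e.data)); // playAudio is a hypothetical helper
};
```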
Where chunked transfer fits in
Chunked transfer is the underlying transport. Both SSE and the lower-level "stream a fetch response" pattern use it under the hood (over HTTP/1.1; HTTP/2 gets the same effect from DATA frames). When you write:
```js
const res = await fetch('/api/generate');
const reader = res.body.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // value is a Uint8Array chunk
}
```
…you're consuming chunked transfer encoding directly. No SSE protocol layer, no data: prefixing, just raw bytes as they arrive. Vercel's AI SDK, the Anthropic and OpenAI SDKs, and most modern frameworks default to this pattern because it's the most flexible — you decide the wire format.
The trade-off: you lose SSE's auto-reconnection and the browser's built-in event parser. You write your own parser (NDJSON is the most common — newline-delimited JSON), and you handle reconnection yourself.
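A minimal NDJSON consumer, building on the fetch loop above (the endpoint and handleEvent are hypothetical):

```js
const res = await fetch('/api/generate');
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split('\n');
  buf = lines.pop(); // a chunk can end mid-line; keep the partial tail
  for (const line of lines) {
    if (line) handleEvent(JSON.parse(line)); // handleEvent: hypothetical
  }
}
```

The buffering line is the whole point: chunk boundaries don't respect your message boundaries, so you must reassemble lines yourself.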
The decision matrix interviewers want
| Use case | Pick | Why |
|---|---|---|
| LLM chat (text) | SSE or fetch+ReadableStream | One-way, text-only, HTTP-friendly |
| LLM chat (voice / multimodal) | WebSockets | Bidirectional, binary frames |
| Real-time collab (Figma-style) | WebSockets | Bidirectional, low-latency |
| Server-to-client notifications | SSE | One-way, auto-reconnect for free |
| File upload progress | Neither — use fetch + ReadableStream on the request body | The streaming is upload-side |
| Push CSS/JS resources before request | 103 Early Hints | HTTP/2 Push is dead |
What interviewers actually probe
When you've laid this out, expect three follow-ups:
- "How do you stop a stream from the client?" — AbortController.abort() for fetch, EventSource.close() for SSE, socket.close() for WebSockets. There's a separate question dedicated to AbortController in this track because it composes nontrivially with the body stream; a minimal sketch follows this list.
- "What about head-of-line blocking?" — over HTTP/1.1, a stuck stream blocks other requests on the same connection. Over HTTP/2, multiplexing fixes this for SSE — multiple SSE connections share one TCP connection without blocking each other. WebSockets don't multiplex (bootstrapping them over HTTP/2 via RFC 8441 exists but is rarely deployed), so each one typically needs its own TCP connection.
- "Why don't you just poll?" — you should be able to articulate that polling at 100ms means ten requests per second per client versus a single long-lived connection, with worse latency (you only see new data on the next poll boundary). The trade-off is that polling is dead simple to implement and debug. For a status indicator that updates every few seconds, polling is fine. For LLM tokens at 30/sec, it's a non-starter.
The senior-level move is treating the protocol choice as a consequence of the data-flow shape, not an independent decision. If you can articulate "the data is one-way, so I want the simplest one-way primitive," the rest follows.