System Design

Design ChatGPT (Streaming Chat with Branching & Share-link)


Asked In

OpenAI, Anthropic, Perplexity, Cursor, GitHub, Notion, Linear, Vercel

Key Challenges

  • Token-by-token streaming that feels instant — the user must see characters appearing as the model generates them
  • A conversation tree, not a flat list — every regenerate creates a sibling branch under the same parent message
  • A Stop button that actually stops the model server, not just hides the UI
  • Share-links that capture a frozen snapshot of one branch, viewable by anyone without the full chat history
  • Resilience over flaky mobile networks — partial responses must be recoverable when the connection drops

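The streaming and Stop-button challenges come down to threading one AbortSignal from the click handler all the way into the stream reader. A minimal sketch, assuming the server's response body is readable as a stream of UTF-8 text chunks (the function name `consumeStream` and the `onToken` callback are illustrative, not a real API):

```typescript
// Consume a streaming response body token-by-token, with cancellation.
// Cancelling the reader closes the underlying connection, which is what
// lets the server notice the client is gone and stop generating.
async function consumeStream(
  body: ReadableStream<Uint8Array>,
  onToken: (chunk: string) => void,
  signal: AbortSignal
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  // Stop click -> abort() -> cancel the reader -> tear down the connection.
  signal.addEventListener("abort", () => void reader.cancel(), { once: true });
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onToken(text); // hand each chunk to the UI as it arrives
  }
  return full;
}
```

In a real client the `body` would come from `fetch(url, { signal })`, so the same signal cancels both the request and the read loop.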
Key Takeaways

  • ChatGPT is a tree of messages, not a list. A regenerate creates a sibling under the same parent — exactly like a git branch.
  • SSE is the right transport for one-way LLM streaming. WebSockets give you bidirectional capability you don't need for text chat.
  • AbortController must thread through fetch → response.body → reader.read so a Stop click actually tears down the TCP connection. The server must detect close to stop generating tokens you'll never receive.
  • Render every 16ms (one frame) instead of every token. Token batching trades a tiny bit of perceived speed for huge layout/Markdown re-parse savings.
  • A share-link is a snapshot of one path through the tree, frozen at a point in time. Different problem from /chats/:id — read-only, cacheable, no auth required.
  • TTFT (time-to-first-token) is the metric users feel. Mask it with an immediate typing indicator, then optimize for inter-token latency.
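The tree-of-messages and share-link takeaways can be sketched together: regenerate attaches a new node under the *same parent* as the old reply, and a share-link is just the root-to-leaf path for one chosen leaf. This is a hypothetical in-memory model (the `ConversationTree` class and its field names are assumptions, not ChatGPT's actual schema):

```typescript
// Each message knows its parent and children; siblings are alternative
// replies to the same prompt, exactly like branches off one git commit.
interface Message {
  id: string;
  parentId: string | null;
  role: "user" | "assistant";
  content: string;
  childIds: string[];
}

class ConversationTree {
  private nodes = new Map<string, Message>();
  private nextId = 0;

  add(parentId: string | null, role: Message["role"], content: string): string {
    const id = `m${this.nextId++}`;
    this.nodes.set(id, { id, parentId, role, content, childIds: [] });
    if (parentId !== null) this.nodes.get(parentId)!.childIds.push(id);
    return id;
  }

  // Regenerate = new sibling: attach under the SAME parent as the old reply.
  regenerate(messageId: string, newContent: string): string {
    const old = this.nodes.get(messageId)!;
    return this.add(old.parentId, old.role, newContent);
  }

  siblings(messageId: string): string[] {
    const parentId = this.nodes.get(messageId)!.parentId;
    return parentId !== null ? this.nodes.get(parentId)!.childIds : [messageId];
  }

  // A share-link snapshot is one path through the tree: walk parent
  // pointers from the chosen leaf up to the root, then reverse.
  pathTo(leafId: string): Message[] {
    const path: Message[] = [];
    for (let cur: string | null = leafId; cur !== null; cur = this.nodes.get(cur)!.parentId) {
      path.unshift(this.nodes.get(cur)!);
    }
    return path;
  }
}
```

Persisting `pathTo(leafId)` as an immutable record gives the frozen, read-only, cacheable snapshot the share-link needs; later edits to the live tree never touch it.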
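The render-every-16ms takeaway amounts to buffering tokens and flushing at most once per frame. A minimal sketch, assuming a `TokenBatcher` helper (an invented name) where the scheduler is injectable so a frame can be driven by `requestAnimationFrame` in a browser or a timer elsewhere:

```typescript
// Buffer incoming tokens and re-render at most once per scheduled "frame",
// so expensive Markdown re-parse and layout run per frame, not per token.
class TokenBatcher {
  private buffer = "";
  private scheduled = false;
  private rendered = "";
  flushCount = 0; // how many real renders happened

  constructor(
    private render: (fullText: string) => void,
    // Default: flush roughly once per 16ms frame; injectable for testing.
    private schedule: (cb: () => void) => void = (cb) => setTimeout(cb, 16)
  ) {}

  push(token: string): void {
    this.buffer += token;
    if (!this.scheduled) {
      this.scheduled = true; // coalesce all tokens until the next frame
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.scheduled = false;
    this.rendered += this.buffer;
    this.buffer = "";
    this.flushCount++;
    this.render(this.rendered);
  }
}
```

With this shape, a burst of dozens of tokens inside one frame costs a single render, which is the layout/Markdown saving the takeaway describes.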