System Design

Design ChatGPT (Streaming Chat with Branching & Share-link)


Asked In

OpenAI, Anthropic, Perplexity, Cursor, GitHub, Notion, Linear, Vercel

Key Challenges

  • Token-by-token streaming that feels instant — the user must see characters appearing as the model generates them
  • A conversation tree, not a flat list — every regenerate creates a sibling branch under the same parent message
  • A Stop button that actually stops the model server, not just hides the UI
  • Share-links that capture a frozen snapshot of one branch, viewable by anyone without the full chat history
  • Resilience over flaky mobile networks — partial responses must be recoverable when the connection drops

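The streaming and Stop-button challenges come down to threading one AbortSignal from the click handler all the way into the stream reader. A minimal sketch, assuming the server's response body is readable as a stream of UTF-8 text chunks (the function name `consumeStream` and the `onToken` callback are illustrative, not a real API):

```typescript
// Consume a streaming response body token-by-token, with cancellation.
// Cancelling the reader closes the underlying connection, which is what
// lets the server notice the client is gone and stop generating.
async function consumeStream(
  body: ReadableStream<Uint8Array>,
  onToken: (chunk: string) => void,
  signal: AbortSignal
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  // Stop click -> abort() -> cancel the reader -> tear down the connection.
  signal.addEventListener("abort", () => void reader.cancel(), { once: true });
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onToken(text); // hand each chunk to the UI as it arrives
  }
  return full;
}
```

In a real client the `body` would come from `fetch(url, { signal })`, so the same signal cancels both the request and the read loop.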
Key Takeaways

  • ChatGPT is a tree of messages, not a list. A regenerate creates a sibling under the same parent — exactly like a git branch.
  • SSE is the right transport for one-way LLM streaming. WebSockets give you bidirectional capability you don't need for text chat.
  • AbortController must thread through fetch → response.body → reader.read so a Stop click actually tears down the TCP connection. The server must detect close to stop generating tokens you'll never receive.
  • Render every 16ms (one frame) instead of every token. Token batching trades a tiny bit of perceived speed for huge layout/Markdown re-parse savings.
  • A share-link is a snapshot of one path through the tree, frozen at a point in time. Different problem from /chats/:id — read-only, cacheable, no auth required.
  • TTFT (time-to-first-token) is the metric users feel. Mask it with an immediate typing indicator, then optimize for inter-token latency.
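The tree-of-messages and share-link takeaways can be sketched together: regenerate attaches a new node under the *same parent* as the old reply, and a share-link is just the root-to-leaf path for one chosen leaf. This is a hypothetical in-memory model (the `ConversationTree` class and its field names are assumptions, not ChatGPT's actual schema):

```typescript
// Each message knows its parent and children; siblings are alternative
// replies to the same prompt, exactly like branches off one git commit.
interface Message {
  id: string;
  parentId: string | null;
  role: "user" | "assistant";
  content: string;
  childIds: string[];
}

class ConversationTree {
  private nodes = new Map<string, Message>();
  private nextId = 0;

  add(parentId: string | null, role: Message["role"], content: string): string {
    const id = `m${this.nextId++}`;
    this.nodes.set(id, { id, parentId, role, content, childIds: [] });
    if (parentId !== null) this.nodes.get(parentId)!.childIds.push(id);
    return id;
  }

  // Regenerate = new sibling: attach under the SAME parent as the old reply.
  regenerate(messageId: string, newContent: string): string {
    const old = this.nodes.get(messageId)!;
    return this.add(old.parentId, old.role, newContent);
  }

  siblings(messageId: string): string[] {
    const parentId = this.nodes.get(messageId)!.parentId;
    return parentId !== null ? this.nodes.get(parentId)!.childIds : [messageId];
  }

  // A share-link snapshot is one path through the tree: walk parent
  // pointers from the chosen leaf up to the root, then reverse.
  pathTo(leafId: string): Message[] {
    const path: Message[] = [];
    for (let cur: string | null = leafId; cur !== null; cur = this.nodes.get(cur)!.parentId) {
      path.unshift(this.nodes.get(cur)!);
    }
    return path;
  }
}
```

Persisting `pathTo(leafId)` as an immutable record gives the frozen, read-only, cacheable snapshot the share-link needs; later edits to the live tree never touch it.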
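The render-every-16ms takeaway amounts to buffering tokens and flushing at most once per frame. A minimal sketch, assuming a `TokenBatcher` helper (an invented name) where the scheduler is injectable so a frame can be driven by `requestAnimationFrame` in a browser or a timer elsewhere:

```typescript
// Buffer incoming tokens and re-render at most once per scheduled "frame",
// so expensive Markdown re-parse and layout run per frame, not per token.
class TokenBatcher {
  private buffer = "";
  private scheduled = false;
  private rendered = "";
  flushCount = 0; // how many real renders happened

  constructor(
    private render: (fullText: string) => void,
    // Default: flush roughly once per 16ms frame; injectable for testing.
    private schedule: (cb: () => void) => void = (cb) => setTimeout(cb, 16)
  ) {}

  push(token: string): void {
    this.buffer += token;
    if (!this.scheduled) {
      this.scheduled = true; // coalesce all tokens until the next frame
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.scheduled = false;
    this.rendered += this.buffer;
    this.buffer = "";
    this.flushCount++;
    this.render(this.rendered);
  }
}
```

With this shape, a burst of dozens of tokens inside one frame costs a single render, which is the layout/Markdown saving the takeaway describes.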