Design ChatGPT (Streaming Chat with Branching & Share-link)
Interview answer templates and key talking points.
Asked In
OpenAI, Anthropic, Perplexity, Cursor, GitHub, Notion, Linear, Vercel
Key Challenges
- Token-by-token streaming that feels instant: the user must see characters appear as the model generates them
- A conversation tree, not a flat list: every regenerate creates a sibling branch under the same parent message
- A Stop button that actually stops the model server, not one that merely hides the UI
- Share-links that capture a frozen snapshot of one branch, viewable by anyone without the full chat history
- Resilience over flaky mobile networks: partial responses must be recoverable when the connection drops
Key Takeaways
- ✓ ChatGPT is a tree of messages, not a list. A regenerate creates a sibling under the same parent, exactly like a git branch (see the tree sketch below).
- ✓ SSE is the right transport for one-way LLM streaming; WebSockets give you bidirectional capability you don't need for text chat (see the server sketch below).
- ✓ AbortController must thread through fetch → response.body → reader.read so a Stop click actually tears down the TCP connection, and the server must detect the close to stop generating tokens you'll never receive (see the client sketch below).
- ✓ Render every 16ms (one frame) instead of every token. Token batching trades a tiny bit of perceived speed for huge layout/Markdown re-parse savings (see the batching sketch below).
- ✓ A share-link is a snapshot of one path through the tree, frozen at a point in time. It is a different problem from /chats/:id: read-only, cacheable, no auth required (see the snapshot sketch below).
- ✓ TTFT (time-to-first-token) is the metric users feel. Mask it with an immediate typing indicator, then optimize for inter-token latency (see the instrumentation sketch below).
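
A minimal sketch of the tree takeaway, assuming nothing about ChatGPT's real schema: the field names (`children`, `activeChild`) and helper functions are illustrative. The key ideas are that regenerating appends a sibling under the same parent, and the rendered conversation is just one root-to-leaf path.

```ts
interface Message {
  id: string;
  parentId: string | null;   // null for the root message
  role: "user" | "assistant";
  content: string;
  children: string[];        // sibling branches: regenerated answers, edited prompts
  activeChild: number;       // index of the branch currently displayed
}

// Regenerate = append a sibling under the same parent, like branching
// from the same git commit, then point the parent at the new branch.
function addBranch(store: Map<string, Message>, parentId: string, msg: Message): void {
  const parent = store.get(parentId);
  if (!parent) throw new Error(`unknown parent ${parentId}`);
  store.set(msg.id, { ...msg, parentId, children: [], activeChild: 0 });
  parent.children.push(msg.id);
  parent.activeChild = parent.children.length - 1; // show the newest sibling
}

// What the user sees is one root-to-leaf path, following each message's
// activeChild pointer; the "< 2/3 >" arrows in the UI just move that pointer.
function activePath(store: Map<string, Message>, rootId: string): Message[] {
  const path: Message[] = [];
  let cur = store.get(rootId);
  while (cur) {
    path.push(cur);
    const nextId = cur.children[cur.activeChild];
    cur = nextId !== undefined ? store.get(nextId) : undefined;
  }
  return path;
}
```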
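On the wire, SSE is just a long-lived `text/event-stream` response made of `data:` frames. A Node sketch under stated assumptions: `generateTokens` is a hypothetical LLM-client iterator, and the `[DONE]` sentinel follows OpenAI's convention. Note the `close` handler, which is the server half of the Stop button.

```ts
import http from "node:http";

// Hypothetical LLM client: yields tokens as the model produces them.
declare function generateTokens(prompt: string): AsyncIterable<string>;

http.createServer(async (_req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream", // SSE, not chunked JSON
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
  });

  // The Stop button (or a dropped mobile connection) closes the socket;
  // detect it so you stop generating tokens nobody will receive.
  let closed = false;
  res.on("close", () => { closed = true; });

  for await (const token of generateTokens("<prompt elided>")) {
    if (closed) break; // exiting the loop cancels the upstream iterator
    res.write(`data: ${JSON.stringify({ token })}\n\n`); // one SSE frame
  }
  if (!closed) res.end("data: [DONE]\n\n"); // OpenAI-style end sentinel
}).listen(3000);
```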
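On the client, one AbortController has to reach both the fetch and the reader loop. A sketch with the `/api/chat` endpoint name and frame format carried over from the server sketch above; production code would also need to buffer frames that split across network chunks.

```ts
const controller = new AbortController();
document.querySelector("#stop")?.addEventListener("click", () => controller.abort());

async function streamChat(prompt: string, onToken: (t: string) => void): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal: controller.signal,       // abort cancels the request and its body stream
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  try {
    for (;;) {
      const { done, value } = await reader.read(); // rejects with AbortError on Stop
      if (done) break;
      // Naive framing: assumes each chunk holds whole "data: ...\n\n" frames.
      for (const line of decoder.decode(value, { stream: true }).split("\n")) {
        if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
        onToken(JSON.parse(line.slice("data: ".length)).token);
      }
    }
  } catch (err) {
    if ((err as Error).name !== "AbortError") throw err; // Stop is expected, not an error
  }
}
```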
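Token batching can be as small as a buffer flushed once per animation frame; requestAnimationFrame fires roughly every 16ms on a 60Hz display. A sketch, with the render callback left to the caller:

```ts
// Collect tokens as they arrive, but touch the DOM at most once per frame.
class TokenBatcher {
  private pending = "";
  private full = "";
  private scheduled = false;

  constructor(private render: (markdown: string) => void) {}

  push(token: string): void {
    this.pending += token;
    if (this.scheduled) return;      // a flush is already queued for this frame
    this.scheduled = true;
    requestAnimationFrame(() => {
      this.scheduled = false;
      this.full += this.pending;
      this.pending = "";
      this.render(this.full);        // one Markdown re-parse per frame, not per token
    });
  }
}
```

Usage would look like `new TokenBatcher(md => { el.innerHTML = renderMarkdown(md); })` with whatever Markdown renderer the app already uses.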
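For the share-link, a sketch of freezing the active path into an immutable, unguessably keyed record. The `Snapshot` shape and in-memory Map are stand-ins for a real store behind a CDN-cacheable route.

```ts
import { randomUUID } from "node:crypto";

interface Snapshot {
  id: string;
  createdAt: number;
  messages: { role: string; content: string }[];
}

const snapshots = new Map<string, Snapshot>(); // stand-in for durable storage

function createShareLink(path: { role: string; content: string }[]): string {
  const snap: Snapshot = {
    id: randomUUID(),           // unguessable key doubles as the only "auth"
    createdAt: Date.now(),
    // Deep-copy content: later edits or branches never leak into the link.
    messages: path.map(({ role, content }) => ({ role, content })),
  };
  snapshots.set(snap.id, snap);
  return `/share/${snap.id}`;   // read-only, no session required, cache freely
}
```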
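Finally, a small instrumentation sketch for the latency takeaway: flip the typing indicator on before the request even leaves, then log TTFT on the first token and inter-token gaps after that. `ui` is a hypothetical handle to the chat view.

```ts
function instrumentLatency(ui: { setTyping(on: boolean): void }): (token: string) => void {
  ui.setTyping(true);                // mask TTFT from the very first millisecond
  const start = performance.now();
  let last = start;
  let first = true;
  return (_token) => {
    const now = performance.now();
    if (first) {
      first = false;
      ui.setTyping(false);           // swap the indicator for real text
      console.log(`TTFT: ${(now - start).toFixed(0)} ms`); // the number users feel
    } else {
      console.log(`inter-token: ${(now - last).toFixed(1)} ms`);
    }
    last = now;
  };
}
```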