Frontend AI Guide

Shipping AI Features That Survive Real Users

Advanced · AI · 9 min read

Loading states, error boundaries, retry semantics, observability. The stuff that separates a demo from a product.

The gap between an AI demo and an AI product is mostly invisible work. The demo handles the happy path beautifully. The product handles the timeouts, the rate limits, the malformed responses, the user who closed their laptop mid-stream, the regex that suddenly fails because the model emitted markdown fences, the cold-start latency that makes the first request feel broken, the third-party API that returns 503s on Wednesday afternoons.

None of this is glamorous. All of it is what users actually experience. Let me walk you through the patterns I reach for whenever a feature has to survive contact with real traffic.

01. Skeleton states, not spinners

Spinners are a placeholder for "I have no idea what's happening." Skeletons are a promise: "here is roughly the shape of the content that's about to appear." The two feel fundamentally different to users. Spinners feel like waiting; skeletons feel like loading.

For AI features, skeletons matter even more because waits are long and highly variable (anywhere from 200ms to 8 seconds). Three guidelines:

  • Match the shape — if the answer will be three paragraphs, render three paragraph-shaped placeholder bars. Don't show a giant grey rectangle.
  • Animate gently — a slow shimmer is fine. A spinning circle inside a skeleton is over-egging it.
  • Don't show skeletons under 100ms — flash-of-skeleton is more jarring than a brief blank state. Use a setTimeout to delay the skeleton's appearance (see the sketch after this list).
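
A minimal sketch of that delay as a React hook. The hook name is illustrative, not from any library; the 100ms threshold is the one suggested above.

import { useEffect, useState } from "react";

// Show a skeleton only if loading persists past delayMs, so fast
// responses render with no flash-of-skeleton.
function useDelayedSkeleton(isLoading, delayMs = 100) {
  const [showSkeleton, setShowSkeleton] = useState(false);

  useEffect(() => {
    if (!isLoading) {
      setShowSkeleton(false);
      return;
    }
    const timer = setTimeout(() => setShowSkeleton(true), delayMs);
    return () => clearTimeout(timer); // cancel if loading finishes first
  }, [isLoading, delayMs]);

  return showSkeleton;
}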

02. Error states that don't lose user work

The cardinal sin of error handling in AI features: rolling back the user's typed message when the server returns 500. They typed it. They sent it. They watched it appear. Then it disappears. They have to retype. They quietly leave.

The pattern that respects user work:

  • Keep the user's message visible.
  • Mark the assistant placeholder as errored — distinct color, small icon.
  • Add a "Retry" affordance directly under the failed message.
  • If you know the failure was transient (rate limit, 503), surface that in plain language: "Server's overloaded. Try again in a sec."

The user's typed message is sacred. Treat it that way.
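
A minimal sketch of that pattern as a React component, assuming each message carries a status field and the parent supplies a retry handler (the names here are illustrative):

// The user's message renders elsewhere and is never touched.
// Only the assistant's placeholder changes on failure.
function AssistantMessage({ message, onRetry }) {
  if (message.status === "error") {
    return (
      <div className="message message-error">
        <span>{message.errorText || "Something went wrong."}</span>
        <button onClick={() => onRetry(message.id)}>Retry</button>
      </div>
    );
  }
  return <div className="message">{message.content}</div>;
}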

03. Retries with backoff (and a circuit breaker)

Transient failures — rate limits, 503s, network blips — are common in AI APIs. Don't make the user retry by hand. Implement automatic retries with exponential backoff: first retry after 500ms, second after 1s, third after 2s. Cap it at four total attempts.

// Retry transient failures with exponential backoff: 500ms, 1s, 2s.
async function withRetry(fn, attempts = 4) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      // Give up after the last attempt, or immediately on permanent errors.
      if (i === attempts - 1 || !isTransient(err)) throw err;
      await new Promise((r) => setTimeout(r, 500 * Math.pow(2, i)));
    }
  }
}

// One reasonable classification: rate limits, server errors, and fetch
// network failures (which reject with a TypeError) are worth retrying.
function isTransient(err) {
  return err.status === 429 || err.status >= 500 || err.name === "TypeError";
}

Two subtler problems. First: if your service has been down for 30 seconds and every user is retrying, you'll dogpile it the moment it comes back. Add jitter to your backoff (a random 0–500ms added to each delay) so retries spread out instead of all hitting at once. Second: retrying forever against a dead service just burns requests. A circuit breaker, borrowed from backend reliability practice, stops calling the API entirely after several consecutive failures, then lets traffic through again after a cooldown.
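
A sketch of both, layered on withRetry above. The failure threshold and cooldown are illustrative numbers, not a standard:

// Jitter: a random 0–500ms on top of each backoff delay so clients
// don't all retry in lockstep. (Swap this in for the fixed delay
// inside withRetry.)
const backoffMs = (i) => 500 * Math.pow(2, i) + Math.random() * 500;

// Circuit breaker: after 5 consecutive failures, fail fast for 30s
// instead of sending more requests at a service that's already down.
let failures = 0;
let openUntil = 0;

async function withBreaker(fn) {
  if (Date.now() < openUntil) throw new Error("circuit open");
  try {
    const result = await fn();
    failures = 0; // any success closes the circuit
    return result;
  } catch (err) {
    if (++failures >= 5) openUntil = Date.now() + 30_000;
    throw err;
  }
}

Wire them together as withBreaker(() => withRetry(callModel)): each guarded call gets its full retry budget, and repeated whole-cycle failures trip the breaker.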

04. Observability — log what users feel, not just what servers do

Backend logs tell you when a request failed. They don't tell you when a request succeeded but felt slow, when streaming stalled but didn't error, when a user clicked Stop after 4 seconds and didn't try again. Frontend observability captures the user-perceived experience.

Bare minimum to instrument:

  • Time to first token — the most important AI metric. If this regresses, users notice.
  • Time to last token — total response time.
  • Stop clicks per session — high stop rate often means slow responses or wrong answers.
  • Retries per request — climbing retries means your provider is degrading.
  • Error rate by error type — distinguish rate limits from real errors from network blips.

Send these to Sentry, Datadog, PostHog, or whatever you use. Build a dashboard. Look at it weekly. The first time a regression shows up in this dashboard before users start tweeting, you'll know it was worth the work.
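
A minimal sketch of capturing the first two metrics around a streamed fetch response, assuming a generic track(name, value) helper wired to whichever tool you use:

async function streamWithMetrics(response, onToken) {
  const start = performance.now();
  let sawFirstToken = false;

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (!sawFirstToken) {
      sawFirstToken = true;
      track("ai.ttft_ms", performance.now() - start); // time to first token
    }
    onToken(decoder.decode(value, { stream: true }));
  }
  track("ai.ttlt_ms", performance.now() - start); // time to last token
}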

05. The cost dimension — your users will eventually ask

AI features cost money per request. Eventually, your finance team or your product manager will ask "how much are we spending on this?" Eventually, your users will ask "how much have I used this month?" Build the rails for both questions while it's cheap.

  • Surface token usage in cost dashboards. The API returns it; pipe it through (see the sketch after this list).
  • If you have usage caps, show progress toward them ("18 / 50 chats today") before the user hits the limit.
  • Cache aggressively. Prompt caching, response caching, semantic caching. Every cache hit is free.
  • If you let users pick a model, surface the cost difference. Sonnet vs Opus is a 5x cost gap; users will gladly pick Sonnet for routine queries if you tell them.
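
A sketch of that accounting. The usage field names follow the Anthropic API (OpenAI uses prompt_tokens/completion_tokens), and the prices are illustrative, so look up your provider's current rates.

// Accumulate per-request usage into running totals for the dashboard.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 }; // example rates only

function recordUsage(usage, totals) {
  totals.inputTokens += usage.input_tokens;
  totals.outputTokens += usage.output_tokens;
  totals.costUsd +=
    (usage.input_tokens / 1e6) * PRICE_PER_MTOK.input +
    (usage.output_tokens / 1e6) * PRICE_PER_MTOK.output;
  return totals;
}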
ℹ Interview signal

Bringing up cost-awareness unprompted is rare and immediately marks you as someone who's shipped AI features in production. Most candidates think only about latency and quality. Cost is the third dimension.

Key Takeaways

  • 01. Skeletons that match shape beat spinners. Delay skeleton appearance by ~100ms to avoid flash-of-skeleton.
  • 02. Never roll back the user's typed message on error. Mark the assistant response as errored and offer retry inline.
  • 03. Exponential backoff with jitter handles transient failures without dogpiling on recovery; a circuit breaker stops retries against a dead service.
  • 04. Frontend observability — TTFT, TTLT, stop rate, retry rate, error class — captures user experience that backend logs miss.
  • 05. Cost is the third dimension after latency and quality. Surface usage and cache aggressively before someone asks.