
[02] · Settled (TripFix) · Apr–May 2026

Cursor Cloud Agent v1: conversation timeline rebuild

A flight recorder for cloud agents. It stitches prompts, thinking, and tool calls into a single replayable timeline, so any agent run can be audited in under a minute. Design call: optimise for the operator first, not the model.

Cursor Cloud Agent API · SSE streaming · Livewire · PostgreSQL

The problem

Cloud agents are powerful and opaque. When one of ours got a customer's case wrong, ops humans had to dig through a JSON blob to figure out what happened. That's the kind of debt that quietly kills agentic systems.

[01]

Design goal

An agent run should be auditable in under sixty seconds.

The v0 view was a flat log of API events: helpful for engineers, useless for ops. I rebuilt it around a single question: “Can a non-technical operator see, on one screen, why the agent did what it did?”

  • Prompts and thinking shown inline, not buried in JSON
  • Tool calls expanded as cards with inputs, outputs, latency
  • One reply per agent “turn”, threaded chronologically
  • Failures highlighted with the exact line that triggered them
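To make the four requirements above concrete, here is a minimal sketch of a view model that satisfies them. The names (ToolCall, Turn, render, error_line) are hypothetical and not taken from the actual build, and the plain-text renderer only stands in for the real Livewire UI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    # Hypothetical card model: one tool invocation with its inputs, output and latency.
    name: str
    inputs: dict
    output: str
    latency_ms: int
    error_line: Optional[str] = None  # the exact line that triggered a failure, if any


@dataclass
class Turn:
    # One agent "turn": the prompt, the model's thinking, and the tool calls it made.
    index: int
    prompt: str
    thinking: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def render(turns: list[Turn]) -> str:
    """Render turns chronologically as one plain-text screen (a stand-in for the real UI)."""
    lines = []
    for turn in sorted(turns, key=lambda t: t.index):
        lines.append(f"Turn {turn.index}")
        lines.append(f"  Prompt:   {turn.prompt}")      # shown inline, not buried in JSON
        lines.append(f"  Thinking: {turn.thinking}")
        for call in turn.tool_calls:
            status = "FAILED" if call.error_line else "ok"
            lines.append(f"  [{call.name}] {status} in {call.latency_ms} ms")
            lines.append(f"    in:  {call.inputs}")
            lines.append(f"    out: {call.output}")
            if call.error_line:
                lines.append(f"    >> {call.error_line}")  # highlight the offending line
    return "\n".join(lines)
```

Sorting by turn index inside render means the caller can hand over turns in any order and still get one chronological thread.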

[02]

Stream-stitching

Three live SSE streams, one coherent narrative.

Cursor’s cloud agent exposes three separate streams: prompt deltas, thinking deltas, and tool-call events. They arrive interleaved, out of order, and at very different rates.

I designed the timeline as an append-only event log that reconciles streams by causal links, not wall-clock time. Every event carries its parent turn, so reorderings (or replays) produce the same coherent view.
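A minimal sketch of that reconciliation, assuming each event carries a turn id, a causal link to the previous turn, its stream name, and a per-stream sequence number. These field names and the single linear chain of turns are assumptions for illustration, not the production schema. The derived timeline depends only on the set of events, never on arrival order or timestamps.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; note there is no wall-clock timestamp.
    turn_id: str
    parent_turn_id: Optional[str]  # causal link to the previous turn (None for the first)
    stream: str                    # "prompt", "thinking" or "tool"
    seq: int                       # position within this (turn, stream)
    payload: str


STREAM_ORDER = {"prompt": 0, "thinking": 1, "tool": 2}


class TimelineLog:
    """Append-only log; the timeline is derived from it, never edited in place."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)  # arrival order is recorded but not trusted

    def timeline(self) -> list[tuple[str, dict[str, list[str]]]]:
        # Index events by turn and remember each turn's causal parent.
        by_turn: dict[str, list[Event]] = defaultdict(list)
        parent_of: dict[str, Optional[str]] = {}
        for e in self._events:
            by_turn[e.turn_id].append(e)
            parent_of[e.turn_id] = e.parent_turn_id

        # Walk the parent -> child chain from the root turn (assumes one child per turn).
        child_of = {parent: turn for turn, parent in parent_of.items()}
        ordered, turn = [], child_of.get(None)
        while turn is not None:
            ordered.append(turn)
            turn = child_of.get(turn)

        result = []
        for turn_id in ordered:
            streams: dict[str, list[str]] = {s: [] for s in STREAM_ORDER}
            # Drop exact duplicates (e.g. from a replay), then order by stream and seq.
            for e in sorted(set(by_turn[turn_id]), key=lambda e: (STREAM_ORDER[e.stream], e.seq)):
                streams[e.stream].append(e.payload)
            result.append((turn_id, streams))
        return result
```

Because the view is a pure function of the logged events, feeding the same events in reversed order, or with a replayed duplicate, yields an identical timeline.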

[03]

Knock-on benefit

The eval harness rides on top of it.

Once every run is a reconciled timeline, evaluation gets easy. The judge harness reads the same event log a human would, scores the agent’s reasoning chain, and writes its verdict back into the same timeline.

Operators can now scroll to “why did this run fail the citation check?” and see the exact span the judge flagged. Debugging and grading became the same workflow.
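The shape of “judge reads the log, writes its verdict back” can be sketched as follows. This is a toy under stated assumptions: the rule-based citation_judge, the "Answer:" and "[source" markers, and the refers_to field are all hypothetical stand-ins; the real harness scores the whole reasoning chain. The point is that a verdict is just another appended event that points at the exact span it flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape shared by the agent streams and the judge.
    turn_id: str
    stream: str   # "prompt", "thinking", "tool", or "judge"
    seq: int
    payload: str
    refers_to: Optional[tuple[str, str, int]] = None  # (turn_id, stream, seq) a verdict flags


def citation_judge(events: list[Event]) -> list[Event]:
    """Toy judge: flag any 'Answer:' line in the thinking stream with no [source ...] tag."""
    verdicts = []
    for e in events:
        if e.stream == "thinking" and e.payload.startswith("Answer:") and "[source" not in e.payload:
            verdicts.append(Event(
                turn_id=e.turn_id,
                stream="judge",
                seq=len(verdicts),  # judge events numbered in emission order
                payload="FAIL citation check: claim has no source",
                refers_to=(e.turn_id, e.stream, e.seq),
            ))
    return verdicts


def run_eval(log: list[Event]) -> list[Event]:
    # Verdicts are appended to the same log the operator reads; nothing is rewritten.
    return log + citation_judge(log)


def flagged_spans(log: list[Event]) -> list[tuple[str, str]]:
    # What the operator sees: each verdict next to the exact span it points at.
    index = {(e.turn_id, e.stream, e.seq): e for e in log}
    return [(v.payload, index[v.refers_to].payload) for v in log if v.stream == "judge"]
```

Keeping verdicts in the same append-only log is what lets debugging and grading share one view.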

The takeaway

If your AI org can't show you, in 60 seconds, why the agent did what it did, you don't have observability. You have a JSON log.
