Product · June 28, 2026 · 6 min read

Why we don't stream the agent's reasoning live.

Every CX-AI vendor now ships a 'thinking…' panel that narrates the agent's reasoning token by token. We don't, on purpose. A live reasoning stream is a UX pattern borrowed from slow agents: by the time you finish reading the first line, our agent has already acted and moved to the next turn. The reasoning still gets recorded. It just lands where it is useful, on the durable audit row, not in a panel that vanishes when you close the tab.

Vorel Engineering · Engineering · Last updated May 24, 2026

Open any CX-AI demo in 2026 and you will see the same flourish. The agent receives a question and a little panel lights up: 'Looking up the customer's account… Checking the calendar… The customer asked about Thursday, so I should confirm availability… Drafting a reply…' The reasoning scrolls past, token by token, like a confession. It looks like transparency. Buyers love it in the demo. We don't ship it, and the reason is not that we are hiding anything.

The reason is speed. A live reasoning stream is a UX pattern that was designed for slow agents, and it stops making sense the moment the agent gets fast. We think the pattern is going to age badly, the way a spinning hourglass aged badly, and we would rather not build a product around a waiting-room animation.

A reasoning panel is a progress bar wearing a lab coat

Ask what the panel is actually for. It is not for you to read every word; nobody reads the chain of thought in real time, and the agent does not wait for you to finish before it acts. The panel is there to occupy the four to eight seconds the agent spends thinking, so the screen does not sit blank while a frontier model grinds through a long reply. It is a progress bar. It is just dressed up as the agent's inner life, because 'thinking…' feels more trustworthy than a spinner. The longer the turn, the more the vendor needs that panel. The dependency runs exactly the wrong way: the slower the agent, the better the reasoning theater looks.

A live reasoning panel is a progress bar wearing a lab coat. The slower the agent, the better the theater looks.

Our turns finish before you can read the stream

We have spent most of the last two months driving turn latency down, because in this category latency is the product. Our turns are fast: the median lands near a second, and even the turns that have to reason and call a tool come back inside a couple of seconds. That number is the whole point of this post, because it changes what a live reasoning stream would feel like. Here is a typical turn on a wall clock:

turn begins
  read your message              ~0 ms
  decide + start tool call     ~400 ms    <- a "thinking..." line would appear here
  tool round-trip              ~300 ms
  compose the reply            ~400 ms
  reply rendered (total)      ~1100 ms    <- you are still on line one of the stream
next turn begins

By the time your eyes finish 'Looking up the customer's account,' the agent has looked it up, decided what to do, and written the answer, and the next turn is already rendering. The stream you are reading is a recording of a decision that is already finished. It is not a window into what the agent is doing; it is a slow-motion replay of what it just did, shown to you as if it were live. For a slow agent that gap is hidden, because the reply has not landed yet. For a fast agent the gap is the joke. We are not going to add latency on purpose so the confession has time to land.
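The arithmetic behind that claim can be sketched in a few lines. The stage durations are the approximate figures from the wall clock above. The reading speed of 240 words per minute is an assumption chosen for illustration, not a measurement of our users, and the function names are hypothetical.

```python
# Approximate stage durations from the wall-clock sketch, in milliseconds.
STAGES = [
    ("read your message", 0),
    ("decide + start tool call", 400),
    ("tool round-trip", 300),
    ("compose the reply", 400),
]

# Assumption for illustration: silent reading at 240 words per minute.
ASSUMED_WORDS_PER_MINUTE = 240


def turn_total_ms(stages):
    """Sum the per-stage durations to get the time until the reply renders."""
    return sum(duration for _, duration in stages)


def reading_time_ms(text, words_per_minute=ASSUMED_WORDS_PER_MINUTE):
    """Estimate how long it takes to read `text` at the assumed speed."""
    words = len(text.split())
    return words * 60_000 / words_per_minute


first_line = "Looking up the customer's account"
total = turn_total_ms(STAGES)
needed = reading_time_ms(first_line)

print(f"turn completes in ~{total} ms")                 # turn completes in ~1100 ms
print(f"reading the first line takes ~{needed:.0f} ms")  # reading the first line takes ~1250 ms
```

Under that assumed reading speed, the five-word first line takes longer to read than the whole turn takes to finish. A slower reader or a longer line only widens the gap.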

By the time you've read 'looking up the account,' the agent has looked it up, acted, and started the next turn.

In voice there is no panel at all

On a phone call the question answers itself. There is no screen, so streaming the reasoning means the agent talks through its inner monologue: 'Okay, you asked about Thursday, let me think, I should probably check the calendar…' That is not transparency on a call. That is filler, and it is the exact failure mode we wrote about when we built context-aware fillers: the agent should cover a real pause with one short honest line while it genuinely looks something up, not narrate a stream of fake deliberation to fill the air. A caller does not want to hear the agent think. They want the answer, and they want it to be right.

The streamed reasoning is also not the audit you think it is

There is a subtler problem, and it matters most to the buyers who care about the panel for the right reasons. The text that scrolls in a live reasoning stream is a narration the model produces, not a verified account of why the action happened. It reads like the cause, but it is generated alongside the action, and it can be confident and wrong at the same time. Treating that stream as the audit trail means trusting a sentence the model wrote about itself, displayed once, and then gone when the tab closes. An auditor cannot do anything with a panel that disappeared. A compliance team cannot query a feeling of having watched the agent think.

Reasoning belongs on the audit row, not in a panel

So we do record the reasoning. We just put it where it is useful. Every action the agent takes writes a row into the operator's own system of record: the tool call in the action field, the reasoning in the explanation field, the human operator's name in the actor field, timestamped, replayable. That row outlives the session. It is queryable by the buyer's auditor without a vendor support ticket, it sits in the CRM the team already uses, and it does not vanish when someone navigates away. This is the operator-led, CRM-native architecture we have written about elsewhere, and the reasoning trace is one of the things it carries. Persisting the trace on the record is strictly more useful than streaming it past you once, because you can read it at your own pace, after the fact, when you actually have a reason to.
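As a rough illustration of the row's shape, and not our actual schema: the field names `action`, `explanation`, `actor`, and `timestamp` mirror the description above, while the `AuditRow` class, the `make_audit_row` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a written row is not edited in place
class AuditRow:
    action: str          # the tool call the agent made
    explanation: str     # the reasoning recorded for that action
    actor: str           # the human operator's name
    timestamp: datetime  # when the action happened (UTC)


def make_audit_row(action: str, explanation: str, actor: str,
                   now: Optional[datetime] = None) -> AuditRow:
    """Build one row per action; `now` is injectable so rows can be replayed in tests."""
    return AuditRow(
        action=action,
        explanation=explanation,
        actor=actor,
        timestamp=now or datetime.now(timezone.utc),
    )


# Sample values are made up for illustration.
row = make_audit_row(
    action="calendar.check_availability",
    explanation="Customer asked about Thursday; confirm open slots before replying.",
    actor="Sample Operator",
    now=datetime(2026, 6, 28, 12, 0, tzinfo=timezone.utc),
)
record = asdict(row)  # plain dict, ready to hand to whatever writes to the system of record
```

The point of the sketch is that the explanation travels with the action, the actor, and the time as one unit, so reading the reasoning later never depends on a session still being open.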

Trust is a record you can audit later, not a panel that is already wrong by the time you read it.

What we show instead

We are not arguing for a black box. We show the things that are true and durable: the action the agent took, the tool it called, the record it wrote, and the result, surfaced the moment the turn completes. On the live demo console you can watch tool calls and CRM writes land in real time, because those are events that actually happened, not a guess about what the model was feeling. The difference is between a feed of completed facts and a teleprompter of in-progress narration. One you can audit. The other you can only watch.

If you want to evaluate a vendor on this, the test takes a minute. Ask to see the reasoning for an action that happened ten minutes ago, in the buyer's own system of record, with the operator's name attached. A vendor whose reasoning lives in a live panel will pull up the dashboard and scroll; a vendor whose reasoning lives on the audit row will hand you the row. Streaming the chain of thought is a great demo and a poor record. We optimized for the record, and we made the agent fast enough that the demo did not need the panel.
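The ten-minutes-ago test amounts to a time-window query against durable rows. Here is a minimal sketch, using an in-memory SQLite table purely as a stand-in for the buyer's system of record. The table name, column names, helper functions, and sample data are all assumptions for illustration, not a real CRM API.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_store():
    """Create a stand-in store; a real deployment would query the CRM instead."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_rows (action TEXT, explanation TEXT, actor TEXT, ts TEXT)"
    )
    return conn


def record_action(conn, action, explanation, actor, ts):
    """Persist one row per action, with the timestamp as an ISO 8601 UTC string."""
    conn.execute(
        "INSERT INTO audit_rows VALUES (?, ?, ?, ?)",
        (action, explanation, actor, ts.isoformat()),
    )


def rows_between(conn, start, end):
    """Return rows whose timestamp falls in [start, end), oldest first."""
    cur = conn.execute(
        "SELECT action, explanation, actor, ts FROM audit_rows "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start.isoformat(), end.isoformat()),
    )
    return cur.fetchall()


# Fixed clock and made-up rows so the example is deterministic.
now = datetime(2026, 6, 28, 12, 0, tzinfo=timezone.utc)
conn = open_store()
record_action(conn, "calendar.check_availability",
              "Customer asked about Thursday; confirm open slots before replying.",
              "Sample Operator", now - timedelta(minutes=10))
record_action(conn, "crm.update_contact",
              "Caller gave a new callback number.",
              "Sample Operator", now - timedelta(hours=2))

# "Show me the reasoning for the action from about ten minutes ago."
found = rows_between(conn, now - timedelta(minutes=15), now - timedelta(minutes=5))
for action, explanation, actor, ts in found:
    print(ts, actor, action, explanation)
```

Nothing in the query depends on a tab having stayed open: the answer is whatever rows exist in the window, with the operator's name attached.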
