EngineeringJune 29, 20265 min read

A filler phrase is a promise. Most voice agents break it.

'Let me check that for you' means work is happening. When a voice agent says it reflexively, before every turn, callers learn within three exchanges that the agent's words don't mean anything. The rules we ship for when an agent is allowed to speak a filler, and when silence is the honest answer.

VVorel EngineeringEngineeringLast updated July 5, 2026

Call any voice agent demo in 2026 and count how many turns begin with 'Great question!' or 'Let me look into that for you.' On most of them the answer is: all of the turns, including the ones where the agent is about to say 'we close at 6 PM,' a fact it did not need to look up, from a lookup it did not perform.

This is latency masking, and everyone in the industry knows it. The model takes one to three seconds to think, dead air feels broken on a phone call, so vendors paper over the gap with a canned acknowledgment fired the instant the caller stops talking. As an engineering trick it works: perceived latency drops, callers stop saying 'hello? are you there?' As a conversation design, it fails in a way that is harder to see on a dashboard and impossible to miss on a call.

A filler is a speech act

When a human agent says 'let me check that for you,' they are not making noise. They are reporting a fact: I am now going to the system to look something up. The sentence carries information. The caller relaxes because they know where they are in the interaction: request received, work in progress, answer coming.

When a voice agent says the same sentence before every single turn, including turns where nothing is being checked, the sentence stops carrying information. Callers are not stupid. By the third exchange they have learned that 'let me check that' means nothing, which means they can no longer tell when the agent actually is checking something. You have spent your one reliable signal for 'work in progress' on turns that had no work in them, and now the turns that do have work in them feel exactly as arbitrary as the ones that don't.

A filler phrase spent on a turn with no work in it devalues every filler on the turns that have work. Callers learn the words are noise within three exchanges.

There is a second-order cost too. Reflexive fillers make the agent sound like an agent. The 'Absolutely, I'd be happy to help with that!' opener is now so strongly associated with AI phone systems that it functions as a disclosure. Some callers hang up at that point. The ones who stay have recalibrated their expectations downward, and every subsequent stumble confirms the read.

The rules we ship

We rebuilt our filler behavior around one principle: the agent may only claim to be doing work when work is actually happening. In practice that decomposes into five rules.

One: a filler fires only when a tool call is in flight. The pipeline knows, at the moment the model commits to a lookup, that the caller is about to experience a pause. That is the moment a filler is honest, and it is the only moment. A turn that answers from context gets no filler; the answer itself arrives fast enough to be its own acknowledgment.

Two: the filler is specific to the work. 'Let me check Thursday afternoon for you' beats 'one moment please' because it proves the agent heard the request. Specificity is also self-enforcing: you can only generate a specific filler from the actual tool arguments, which means the filler cannot fire on a turn where there is nothing to be specific about. A generic filler can lie. A grounded one can't.

Three: phrasing varies. Humans do not say the identical sentence four times in one call. The second lookup gets 'checking that now,' the third gets a simple 'one second.' Repetition is one of the strongest robot tells on a phone call, and it is one of the cheapest to fix.

Four: fillers never stack. If the lookup runs long, the agent does not fill the silence with a second reassurance, because two fillers in a row reads as stalling, which is exactly what it is. Past a threshold the honest move is to name the delay once ('this is taking a little longer than usual') and then be quiet until there is an answer.

Five: on turns where the model can begin answering quickly, silence wins. A 700-millisecond pause before a direct, correct answer feels like a person thinking. A zero-millisecond 'Great question!' followed by the same answer feels like a machine covering. We stopped treating sub-second pauses as a problem to be masked and started treating them as normal conversational rhythm, which they are.

Measure it like a latency metric

The reason reflexive fillers survive in production is that nobody measures them. Latency has a dashboard; filler behavior has a vibe. So we log filler rate per turn type the same way we log time-to-first-word. On turns with a tool call, a filler is usually correct, and the interesting metric is whether it was specific. On turns without a tool call, the target filler rate is zero, and every filler above zero is a regression with a diff attached.

This also changes how you evaluate the tradeoff against raw speed. A team that has driven easy-turn latency under a second discovers that the filler on those turns was never buying anything: the answer was going to arrive before the caller felt the gap. Fillers are load-bearing only on the slow turns, which is convenient, because the slow turns are exactly the ones where a filler is honest.

What to listen for on a demo call

If you are evaluating voice vendors, this is one of the fastest quality probes available and it requires no dashboard access. Ask the agent something it must look up, then something it must not (its own opening hours, right after it told you). Count the fillers. An agent that says 'let me check that' both times is telling you the words are decoration. An agent that checks the first time, answers plainly the second time, and names what it is checking, is telling you someone built the conversation and not just the pipeline.

Fillers are a small surface. That is exactly why they are diagnostic: a vendor who sweats the honesty of a three-word acknowledgment has, in our experience, also sweated the parts you cannot hear.

Read the full guide