Engine & Streaming
Overview
The Flow Builder engine is a single-loop state machine that processes one user message at a time. Understanding how it works under the hood helps you build faster, more efficient flows — especially for voice, where every millisecond of latency matters.
The Processing Loop
When a user sends a message (or a call connects), the engine runs a single pass:
- Receive message — The user's input is added to the conversation history.
- Process current node — The engine executes the node the conversation is currently on.
- Chain through silent nodes — If the result is a non-conversation node (Variable, Logic, Extraction, Function, Request), the engine processes it and follows the output edge automatically.
- Stop at Conversation or End — The loop stops when it reaches a Conversation node (which generates a response) or an End node (which terminates the session).
Key insight: Only Conversation nodes produce user-facing responses. All other nodes execute silently. The engine can chain through multiple Variable → Logic → Request → Extraction nodes in a single turn, completely transparently to the user.
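The chaining behavior above can be sketched in a few lines. This is an illustrative model, not the engine's actual API — the node dicts, the `next` pointer, and the type names are assumptions made for the example.

```python
# Silent node types execute without producing a user-facing response.
SILENT_TYPES = {"variable", "logic", "extraction", "function", "request"}

def run_turn(nodes, start_id):
    """Chain through silent nodes until a Conversation or End node is reached."""
    current = nodes[start_id]
    visited = [start_id]
    while current["type"] in SILENT_TYPES:
        # Follow the output edge automatically; the user sees none of this.
        next_id = current["next"]
        visited.append(next_id)
        current = nodes[next_id]
    return current, visited

nodes = {
    "set_vars": {"type": "variable", "next": "branch"},
    "branch":   {"type": "logic", "next": "greet"},
    "greet":    {"type": "conversation", "next": None},
}
stop_node, path = run_turn(nodes, "set_vars")
# The loop walks Variable → Logic silently and stops at the Conversation node.
```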
Processing Order by Node Type
| Node Type | What Happens | Blocks Execution? |
|---|---|---|
| Conversation | LLM call with system prompt + conversation history. Generates response and evaluates transitions. | Yes — waits for LLM |
| Tool/Function | Executes HTTP tool call, stores result in variables | Yes — waits for API response |
| Extraction | LLM extracts structured data into variables | Yes — waits for LLM |
| Variable | Sets variable values synchronously | No — instant |
| Logic | Evaluates branch conditions, picks a path | No — instant |
| Request | Makes HTTP request, stores response | Yes (if await: true) |
| End | Fires webhook if configured, returns end message | No |
Sync vs Async Paths
Every node with an async handle (bottom, yellow) can fork execution:
- Main path (output/transition edges) — Synchronous. The user waits for the entire chain to complete before hearing a response.
- Async path (async edge) — Parallel. Nodes connected via async handles run in the background. The conversation continues without waiting.
When to Use Async
| Scenario | Path | Why |
|---|---|---|
| Look up appointment availability | Sync | Next response depends on the result |
| Log call data to CRM | Async | User doesn't need to wait |
| Extract caller info for analytics | Async | Data isn't needed immediately |
| Check account balance before responding | Sync | Response content depends on the value |
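The sync/async split maps naturally onto task-based concurrency. The sketch below uses Python's asyncio to illustrate the idea; the helper names (`log_to_crm`, `check_availability`) are hypothetical, not part of Flow Builder.

```python
import asyncio

async def log_to_crm(data):
    # Async-path work: the user never waits for this (hypothetical helper).
    await asyncio.sleep(0.05)
    return "logged"

async def check_availability():
    # Sync-path work: the next response depends on this result.
    await asyncio.sleep(0.01)
    return ["10:00", "11:30"]

async def handle_turn():
    # Async edge: fire the background task; the turn continues immediately.
    background = asyncio.create_task(log_to_crm({"caller": "alice"}))
    # Main path: block until the result is available, then respond.
    slots = await check_availability()
    response = f"I have {len(slots)} openings."
    await background  # only to finish cleanly here; a real engine wouldn't block the turn on this
    return response

result = asyncio.run(handle_turn())
```

The key point: the response is generated as soon as the sync-path work finishes, regardless of how long the background task takes.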
Conversation History
The engine maintains a shared conversation history across all nodes in the flow. Every Conversation node reads from and writes to the same history.
- Cap: 40 messages maximum. Older messages are trimmed when the cap is reached.
- Shared: When a transition moves to a new Conversation node, that node sees the full history — it's a continuous conversation, not isolated segments.
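The trimming behavior can be modeled as a simple capped append. The function below is a sketch of the described policy (drop oldest messages once the 40-message cap is exceeded), not the engine's real implementation.

```python
HISTORY_CAP = 40

def append_message(history, message, cap=HISTORY_CAP):
    """Append a message, trimming the oldest entries once the cap is exceeded."""
    history.append(message)
    if len(history) > cap:
        # Drop from the front: oldest messages go first.
        del history[: len(history) - cap]
    return history

history = []
for i in range(45):
    append_message(history, {"role": "user", "content": f"msg {i}"})
# After 45 appends, only the most recent 40 messages remain.
```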
Streaming
Flow Builder uses Server-Sent Events (SSE) to stream responses in real-time. This is critical for voice — the text-to-speech engine starts speaking as soon as the first tokens arrive.
SSE Event Types
| Event | Description |
|---|---|
| token | Individual token chunks as the LLM generates a response |
| filler | Speak-during-execution text from Tool/Function nodes |
| clear | Discard partial response — signals a transition is happening. The TTS engine stops speaking the old response and prepares for the new one. |
| tool_calls | Tool/function call detected by the LLM (transition or function execution) |
| done | Turn complete. Contains final state: current node, variables, etc. |
| error | Error occurred during processing |
The Clear Event
The clear event is key to smooth voice transitions. When the LLM starts generating a response but then decides to transition (via tool call), the engine sends a clear event. This tells the voice platform to stop speaking the partial response and wait for the new node's output. Without this, callers would hear a cut-off sentence followed by the new response.
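A client consuming the stream needs to honor the clear event by discarding its partial buffer. The event-dict shape below is an assumption for illustration; only the event type names come from the table above.

```python
def assemble_response(events):
    """Fold a stream of SSE events into the final spoken text.

    A `clear` event discards the partial response accumulated so far,
    so the caller never hears a cut-off sentence before a transition.
    """
    buffer = []
    final_state = None
    for event in events:
        if event["type"] == "token":
            buffer.append(event["data"])
        elif event["type"] == "clear":
            buffer.clear()  # stop speaking the partial response
        elif event["type"] == "done":
            final_state = event["data"]
    return "".join(buffer), final_state

events = [
    {"type": "token", "data": "Let me ch"},
    {"type": "clear", "data": None},            # transition fired mid-response
    {"type": "token", "data": "Your balance "},
    {"type": "token", "data": "is $42."},
    {"type": "done", "data": {"node": "balance"}},
]
text, state = assemble_response(events)
```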
LLM Resolution Order
When a node needs to make an LLM call, the engine determines which model to use by checking (in order):
- Node override — If the specific node has an LLM model configured
- Flow default — The model set at the flow level
- Legacy field — Backward-compatible model field
- Environment variables — Server-level defaults
- Fallback — gpt-4o-mini
Supported providers: OpenAI, Anthropic, and OpenRouter (all accessed via the OpenAI SDK format).
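The resolution order amounts to "first configured value wins." A minimal sketch, with the parameter names chosen for this example rather than taken from the engine:

```python
def resolve_model(node=None, flow=None, legacy=None, env=None):
    """Return the first configured model in priority order,
    falling back to gpt-4o-mini when nothing is set."""
    for candidate in (node, flow, legacy, env):
        if candidate:
            return candidate
    return "gpt-4o-mini"

# Node override beats everything; flow default beats legacy and env.
resolve_model(node="gpt-4o", flow="claude-3-5-sonnet")  # node wins
```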
Safety Limits
| Limit | Value | Purpose |
|---|---|---|
| Max transitions per turn | 10 | Prevents infinite loops between nodes |
| Max tool call rounds | 5 | Limits recursive tool execution |
| Conversation history cap | 40 messages | Controls context window size and cost |
| Request timeout | 10 seconds | Prevents hung API calls from blocking the flow |
Important: If your flow hits the 10-transition limit in a single turn, it means there's likely a loop in your node connections. Check for Logic nodes or transitions that cycle back without a Conversation node in between.
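The transition cap can be pictured as a bounded walk over output edges. This sketch (hypothetical node shape, same as earlier examples) shows how a two-node Logic cycle trips the limit while a terminating chain does not:

```python
MAX_TRANSITIONS = 10

def follow_transitions(nodes, start_id, max_transitions=MAX_TRANSITIONS):
    """Follow output edges until a Conversation node (or dead end),
    raising once the per-turn transition cap is hit."""
    current = start_id
    for _ in range(max_transitions):
        nxt = nodes[current].get("next")
        if nxt is None or nodes[nxt]["type"] == "conversation":
            return nxt
        current = nxt
    raise RuntimeError("Max transitions per turn exceeded — check for a loop")

# Two Logic nodes cycling back and forth trip the limit:
loop = {
    "a": {"type": "logic", "next": "b"},
    "b": {"type": "logic", "next": "a"},
}
```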
Cost Estimation
Each turn typically involves:
- 1 LLM call for the Conversation node (prompt + history + response)
- 0-1 LLM calls for Extraction nodes (if in the sync path)
- 0-1 LLM calls for speak-during-execution filler (prompt mode only)
Because transitions happen via tool calls within the Conversation node's LLM call (not as separate calls), Flow Builder is significantly more token-efficient than architectures that require separate LLM calls for routing decisions.
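The tally above can be expressed as a tiny helper — a back-of-the-envelope sketch of the call counts listed, not a billing tool:

```python
def llm_calls_per_turn(sync_extraction=False, prompt_mode_filler=False):
    """Count LLM calls in a typical turn: 1 for the Conversation node,
    plus optional extraction and filler calls. Transitions ride along
    as tool calls inside the Conversation call, so they add nothing."""
    return 1 + int(sync_extraction) + int(prompt_mode_filler)

# A plain conversational turn costs 1 call; a turn with a sync
# Extraction node and prompt-mode filler costs 3.
```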