Engine & Streaming
Overview
The Flow Builder engine is a single-loop state machine that processes one user message at a time. Understanding how it works under the hood helps you build faster, more efficient flows — especially for voice, where every millisecond of latency matters.
The Processing Loop
When a user sends a message (or a call connects), the engine runs a single pass:
- Receive message — The user's input is added to the conversation history.
- Process current node — The engine executes the node the conversation is currently on.
- Chain through silent nodes — If the result is a non-conversation node (Variable, Logic, Extraction, Function, Request), the engine processes it and follows the output edge automatically.
- Stop at Conversation or End — The loop stops when it reaches a Conversation node (which generates a response) or an End node (which terminates the session).
Key insight: Only Conversation nodes produce user-facing responses. All other nodes execute silently. The engine can chain through multiple Variable → Logic → Request → Extraction nodes in a single turn, completely transparently to the user.
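The chaining behavior above can be sketched in a few lines. This is an illustrative model, not the engine's actual API — the node dicts, the `next` pointer, and the type names are assumptions made for the example.

```python
# Silent node types execute without producing a user-facing response.
SILENT_TYPES = {"variable", "logic", "extraction", "function", "request"}

def run_turn(nodes, start_id):
    """Chain through silent nodes until a Conversation or End node is reached."""
    current = nodes[start_id]
    visited = [start_id]
    while current["type"] in SILENT_TYPES:
        # Follow the output edge automatically; the user sees none of this.
        next_id = current["next"]
        visited.append(next_id)
        current = nodes[next_id]
    return current, visited

nodes = {
    "set_vars": {"type": "variable", "next": "branch"},
    "branch":   {"type": "logic", "next": "greet"},
    "greet":    {"type": "conversation", "next": None},
}
stop_node, path = run_turn(nodes, "set_vars")
# The loop walks Variable → Logic silently and stops at the Conversation node.
```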
Processing Order by Node Type
| Node Type | What Happens | Blocks Execution? |
|---|---|---|
| Conversation | LLM call with system prompt + conversation history. Generates response and evaluates transitions. | Yes — waits for LLM |
| Tool/Function | Executes HTTP tool call, stores result in variables | Yes — waits for API response |
| Extraction | LLM extracts structured data into variables | Yes — waits for LLM |
| Variable | Sets variable values synchronously | No — instant |
| Logic | Evaluates branch conditions, picks a path | No — instant |
| Request | Makes HTTP request, stores response | Yes (if await: true) |
| End | Fires webhook if configured, returns end message | No |
Sync vs Async Paths
Every node with an async handle (bottom, yellow) can fork execution:
- Main path (output/transition edges) — Synchronous. The user waits for the entire chain to complete before hearing a response.
- Async path (async edge) — Parallel. Nodes connected via async handles run in the background. The conversation continues without waiting.
When to Use Async
| Scenario | Path | Why |
|---|---|---|
| Look up appointment availability | Sync | Next response depends on the result |
| Log call data to CRM | Async | User doesn't need to wait |
| Extract caller info for analytics | Async | Data isn't needed immediately |
| Check account balance before responding | Sync | Response content depends on the value |
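The sync/async split maps naturally onto task-based concurrency. The sketch below uses Python's asyncio to illustrate the idea; the helper names (`log_to_crm`, `check_availability`) are hypothetical, not part of Flow Builder.

```python
import asyncio

async def log_to_crm(data):
    # Async-path work: the user never waits for this (hypothetical helper).
    await asyncio.sleep(0.05)
    return "logged"

async def check_availability():
    # Sync-path work: the next response depends on this result.
    await asyncio.sleep(0.01)
    return ["10:00", "11:30"]

async def handle_turn():
    # Async edge: fire the background task; the turn continues immediately.
    background = asyncio.create_task(log_to_crm({"caller": "alice"}))
    # Main path: block until the result is available, then respond.
    slots = await check_availability()
    response = f"I have {len(slots)} openings."
    await background  # only to finish cleanly here; a real engine wouldn't block the turn on this
    return response

result = asyncio.run(handle_turn())
```

The key point: the response is generated as soon as the sync-path work finishes, regardless of how long the background task takes.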
Conversation History
The engine maintains a shared conversation history across all nodes in the flow. Every Conversation node reads from and writes to the same history.
- Cap: 40 messages maximum. Older messages are trimmed when the cap is reached.
- Shared: When a transition moves to a new Conversation node, that node sees the full history — it's a continuous conversation, not isolated segments.
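The trimming behavior can be modeled as a simple capped append. The function below is a sketch of the described policy (drop oldest messages once the 40-message cap is exceeded), not the engine's real implementation.

```python
HISTORY_CAP = 40

def append_message(history, message, cap=HISTORY_CAP):
    """Append a message, trimming the oldest entries once the cap is exceeded."""
    history.append(message)
    if len(history) > cap:
        # Drop from the front: oldest messages go first.
        del history[: len(history) - cap]
    return history

history = []
for i in range(45):
    append_message(history, {"role": "user", "content": f"msg {i}"})
# After 45 appends, only the most recent 40 messages remain.
```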
Streaming
Flow Builder uses Server-Sent Events (SSE) to stream responses in real-time. This is critical for voice — the text-to-speech engine starts speaking as soon as the first tokens arrive.
SSE Event Types
| Event | Description |
|---|---|
| token | Individual token chunks as the LLM generates a response |
| filler | Speak-during-execution text from Tool/Function nodes |
| clear | Discard partial response — signals a transition is happening. The TTS engine stops speaking the old response and prepares for the new one. |
| tool_calls | Tool/function call detected by the LLM (transition or function execution) |
| done | Turn complete. Contains final state: current node, variables, etc. |
| error | Error occurred during processing |
The Clear Event
The clear event is key to smooth voice transitions. When the LLM starts generating a response but then decides to transition (via tool call), the engine sends a clear event. This tells the voice platform to stop speaking the partial response and wait for the new node's output. Without this, callers would hear a cut-off sentence followed by the new response.
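A client consuming the stream needs to honor the clear event by discarding its partial buffer. The event-dict shape below is an assumption for illustration; only the event type names come from the table above.

```python
def assemble_response(events):
    """Fold a stream of SSE events into the final spoken text.

    A `clear` event discards the partial response accumulated so far,
    so the caller never hears a cut-off sentence before a transition.
    """
    buffer = []
    final_state = None
    for event in events:
        if event["type"] == "token":
            buffer.append(event["data"])
        elif event["type"] == "clear":
            buffer.clear()  # stop speaking the partial response
        elif event["type"] == "done":
            final_state = event["data"]
    return "".join(buffer), final_state

events = [
    {"type": "token", "data": "Let me ch"},
    {"type": "clear", "data": None},            # transition fired mid-response
    {"type": "token", "data": "Your balance "},
    {"type": "token", "data": "is $42."},
    {"type": "done", "data": {"node": "balance"}},
]
text, state = assemble_response(events)
```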
LLM Resolution Order
When a node needs to make an LLM call, the engine determines which model to use by checking (in order):
- Node override — If the specific node has an LLM model configured
- Flow default — The model set at the flow level
- Legacy field — Backward-compatible model field
- Environment variables — Server-level defaults
- Fallback — gpt-4o-mini
Supported providers: OpenAI, Anthropic, and OpenRouter (all accessed via the OpenAI SDK format).
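The resolution order amounts to "first configured value wins." A minimal sketch, with the parameter names chosen for this example rather than taken from the engine:

```python
def resolve_model(node=None, flow=None, legacy=None, env=None):
    """Return the first configured model in priority order,
    falling back to gpt-4o-mini when nothing is set."""
    for candidate in (node, flow, legacy, env):
        if candidate:
            return candidate
    return "gpt-4o-mini"

# Node override beats everything; flow default beats legacy and env.
resolve_model(node="gpt-4o", flow="claude-3-5-sonnet")  # node wins
```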
Safety Limits
| Limit | Value | Purpose |
|---|---|---|
| Max transitions per turn | 10 | Prevents infinite loops between nodes |
| Max tool call rounds | 5 | Limits recursive tool execution |
| Conversation history cap | 40 messages | Controls context window size and cost |
| Request timeout | 10 seconds | Prevents hung API calls from blocking the flow |
Important: If your flow hits the 10-transition limit in a single turn, it means there's likely a loop in your node connections. Check for Logic nodes or transitions that cycle back without a Conversation node in between.
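The transition cap can be pictured as a bounded walk over output edges. This sketch (hypothetical node shape, same as earlier examples) shows how a two-node Logic cycle trips the limit while a terminating chain does not:

```python
MAX_TRANSITIONS = 10

def follow_transitions(nodes, start_id, max_transitions=MAX_TRANSITIONS):
    """Follow output edges until a Conversation node (or dead end),
    raising once the per-turn transition cap is hit."""
    current = start_id
    for _ in range(max_transitions):
        nxt = nodes[current].get("next")
        if nxt is None or nodes[nxt]["type"] == "conversation":
            return nxt
        current = nxt
    raise RuntimeError("Max transitions per turn exceeded — check for a loop")

# Two Logic nodes cycling back and forth trip the limit:
loop = {
    "a": {"type": "logic", "next": "b"},
    "b": {"type": "logic", "next": "a"},
}
```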
Cost Estimation
Each turn typically involves:
- 1 LLM call for the Conversation node (prompt + history + response)
- 0-1 LLM calls for Extraction nodes (if in the sync path)
- 0-1 LLM calls for speak-during-execution filler (prompt mode only)
Because transitions happen via tool calls within the Conversation node's LLM call (not as separate calls), Flow Builder is significantly more token-efficient than architectures that require separate LLM calls for routing decisions.
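The tally above can be expressed as a tiny helper — a back-of-the-envelope sketch of the call counts listed, not a billing tool:

```python
def llm_calls_per_turn(sync_extraction=False, prompt_mode_filler=False):
    """Count LLM calls in a typical turn: 1 for the Conversation node,
    plus optional extraction and filler calls. Transitions ride along
    as tool calls inside the Conversation call, so they add nothing."""
    return 1 + int(sync_extraction) + int(prompt_mode_filler)

# A plain conversational turn costs 1 call; a turn with a sync
# Extraction node and prompt-mode filler costs 3.
```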