Engine & Streaming

Overview

The Flow Builder engine is a single-loop state machine that processes one user message at a time. Understanding how it works under the hood helps you build faster, more efficient flows — especially for voice, where every millisecond of latency matters.

The Processing Loop

When a user sends a message (or a call connects), the engine runs a single pass:

  1. Receive message — The user's input is added to the conversation history.
  2. Process current node — The engine executes the node the conversation is currently on.
  3. Chain through silent nodes — If the result is a non-conversation node (Variable, Logic, Extraction, Function, Request), the engine processes it and follows the output edge automatically.
  4. Stop at Conversation or End — The loop stops when it reaches a Conversation node (which generates a response) or an End node (which terminates the session).

Key insight: Only Conversation nodes produce user-facing responses. All other nodes execute silently. The engine can chain through multiple Variable → Logic → Request → Extraction nodes in a single turn, completely transparently to the user.
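The loop above can be sketched in a few lines of Python. The dict-based flow, the node fields (`run`, `respond`, `next`), and the handlers are illustrative stand-ins, not the actual Flow Builder API:

```python
# Minimal, illustrative sketch of the single-pass processing loop.

def run_turn(flow, node_id, history, user_message, variables):
    """Process one user message, chaining through silent nodes."""
    history.append({"role": "user", "content": user_message})   # 1. receive
    while True:
        node = flow[node_id]
        if node["type"] == "conversation":                      # 4a. stop: generate reply
            reply = node["respond"](history, variables)
            history.append({"role": "assistant", "content": reply})
            return node_id, reply
        if node["type"] == "end":                               # 4b. stop: terminate
            return node_id, node.get("end_message", "")
        node["run"](variables)                                  # 2. execute silently
        node_id = node["next"]                                  # 3. follow the output edge

# A Variable node chains silently into a Conversation node:
flow = {
    "set_name": {"type": "variable",
                 "run": lambda v: v.update(name="Ada"),
                 "next": "greet"},
    "greet": {"type": "conversation",
              "respond": lambda h, v: f"Hello, {v['name']}!"},
}
history, variables = [], {}
node_id, reply = run_turn(flow, "set_name", history, "Hi", variables)
# reply == "Hello, Ada!" -- the Variable node ran invisibly first
```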

Processing Order by Node Type

Node Type     | What Happens                                                                                     | Blocks Execution?
--------------|--------------------------------------------------------------------------------------------------|------------------------------
Conversation  | LLM call with system prompt + conversation history; generates response and evaluates transitions | Yes — waits for LLM
Tool/Function | Executes HTTP tool call, stores result in variables                                              | Yes — waits for API response
Extraction    | LLM extracts structured data into variables                                                      | Yes — waits for LLM
Variable      | Sets variable values synchronously                                                               | No — instant
Logic         | Evaluates branch conditions, picks a path                                                        | No — instant
Request       | Makes HTTP request, stores response                                                              | Yes (if await: true)
End           | Fires webhook if configured, returns end message                                                 | No

Sync vs Async Paths

Every node with an async handle (bottom, yellow) can fork execution:

  • Main path (output/transition edges) — Synchronous. The user waits for the entire chain to complete before hearing a response.
  • Async path (async edge) — Parallel. Nodes connected via async handles run in the background. The conversation continues without waiting.

When to Use Async

Scenario                                | Path  | Why
----------------------------------------|-------|---------------------------------------
Look up appointment availability        | Sync  | Next response depends on the result
Log call data to CRM                    | Async | User doesn't need to wait
Extract caller info for analytics      | Async | Data isn't needed immediately
Check account balance before responding | Sync  | Response content depends on the value
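The fork can be pictured with an ordinary background thread. The tool functions (`check_availability`, `log_to_crm`) and the `slots` variable are hypothetical examples, not real Flow Builder tools:

```python
# Illustrative sketch of sync vs. async edges using a background thread.
import threading

results = {}

def check_availability(variables):
    # Sync path: the next response depends on this result, so we wait for it.
    variables["slots"] = ["10:00", "14:30"]

def log_to_crm(variables):
    # Async path: fire-and-forget; the user never waits on this.
    results["crm_logged"] = True

variables = {}
check_availability(variables)                       # main path: blocks the turn
t = threading.Thread(target=log_to_crm, args=(variables,))
t.start()                                           # async path: runs in background
reply = f"I have openings at {variables['slots'][0]} and {variables['slots'][1]}."
t.join()  # only for this demo; the engine would not wait here
```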

Conversation History

The engine maintains a shared conversation history across all nodes in the flow. Every Conversation node reads from and writes to the same history.

  • Cap: 40 messages maximum. Older messages are trimmed when the cap is reached.
  • Shared: When a transition moves to a new Conversation node, that node sees the full history — it's a continuous conversation, not isolated segments.
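The trimming behavior can be sketched as follows; the helper name `append_with_cap` is invented for illustration, and only the 40-message cap comes from the docs:

```python
# Sketch of the history cap: when the history grows past the cap,
# the oldest messages are dropped first.
HISTORY_CAP = 40

def append_with_cap(history, message, cap=HISTORY_CAP):
    history.append(message)
    if len(history) > cap:
        del history[: len(history) - cap]   # trim the oldest messages

history = []
for i in range(45):
    append_with_cap(history, {"role": "user", "content": f"msg {i}"})
# len(history) == 40; the oldest surviving message is "msg 5"
```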

Streaming

Flow Builder uses Server-Sent Events (SSE) to stream responses in real time. This is critical for voice — the text-to-speech engine starts speaking as soon as the first tokens arrive.

SSE Event Types

Event      | Description
-----------|---------------------------------------------------------------------------------------------------------------------------------
token      | Individual token chunks as the LLM generates a response
filler     | Speak-during-execution text from Tool/Function nodes
clear      | Discard the partial response — signals a transition is happening; the TTS engine stops speaking the old response and prepares for the new one
tool_calls | Tool/function call detected by the LLM (transition or function execution)
done       | Turn complete; contains final state: current node, variables, etc.
error      | Error occurred during processing
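A client consuming this stream typically dispatches on the event type. The sketch below uses a hard-coded list in place of a real SSE connection, and the handler names (`speak`, `stop_speaking`) are illustrative:

```python
# Sketch of a client-side consumer for the event types listed above.

def consume(events, speak, stop_speaking):
    buffer = []
    for event in events:
        kind, data = event["event"], event.get("data")
        if kind == "token":
            buffer.append(data)
            speak(data)                   # TTS starts on the first token
        elif kind == "filler":
            speak(data)                   # speak-during-execution text
        elif kind == "clear":
            buffer.clear()
            stop_speaking()               # discard the partial response
        elif kind == "done":
            return "".join(buffer), data  # final state: node, variables, ...
        elif kind == "error":
            raise RuntimeError(data)

spoken = []
events = [
    {"event": "token", "data": "Sure, "},
    {"event": "clear"},                            # the LLM decided to transition
    {"event": "token", "data": "You're booked!"},
    {"event": "done", "data": {"node": "confirm"}},
]
text, state = consume(events, spoken.append, spoken.clear)
# text == "You're booked!" -- the pre-transition fragment was discarded
```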

The Clear Event

The clear event is key to smooth voice transitions. When the LLM starts generating a response but then decides to transition (via tool call), the engine sends a clear event. This tells the voice platform to stop speaking the partial response and wait for the new node's output. Without this, callers would hear a cut-off sentence followed by the new response.
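On the engine side, this amounts to tracking whether any tokens were streamed before the tool call arrived. The sketch below uses an invented chunk format and is not the actual implementation:

```python
# Engine-side sketch: if tokens were already streamed when a transition
# tool call appears, emit "clear" before the tool_calls event.

def relay_llm_stream(chunks, emit):
    streamed = False
    for chunk in chunks:
        if chunk["type"] == "token":
            streamed = True
            emit({"event": "token", "data": chunk["text"]})
        elif chunk["type"] == "tool_call":
            if streamed:
                emit({"event": "clear"})   # discard the partially spoken reply
            emit({"event": "tool_calls", "data": chunk["name"]})

events = []
relay_llm_stream(
    [{"type": "token", "text": "Let me"},
     {"type": "tool_call", "name": "transition_to_booking"}],
    events.append,
)
# events: token, then clear, then tool_calls
```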

LLM Resolution Order

When a node needs to make an LLM call, the engine determines which model to use by checking (in order):

  1. Node override — If the specific node has an LLM model configured
  2. Flow default — The model set at the flow level
  3. Legacy field — Backward-compatible model field
  4. Environment variables — Server-level defaults
  5. Fallback — gpt-4o-mini

Supported providers: OpenAI, Anthropic, and OpenRouter (all accessed via the OpenAI SDK format).
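The resolution chain reads naturally as a series of `or` fallbacks. The field and environment-variable names below are assumptions for illustration; the real config keys may differ:

```python
# Sketch of the five-step model resolution, with hypothetical key names.
import os

def resolve_model(node, flow):
    return (
        node.get("llm_model")                     # 1. node override
        or flow.get("default_model")              # 2. flow default
        or flow.get("model")                      # 3. legacy field
        or os.environ.get("DEFAULT_LLM_MODEL")    # 4. environment variable
        or "gpt-4o-mini"                          # 5. fallback
    )

resolve_model({"llm_model": "claude-3-5-sonnet"}, {"default_model": "gpt-4o"})
# -> "claude-3-5-sonnet"; with nothing set anywhere, gpt-4o-mini wins
```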

Safety Limits

Limit                    | Value       | Purpose
-------------------------|-------------|-----------------------------------------------
Max transitions per turn | 10          | Prevents infinite loops between nodes
Max tool call rounds     | 5           | Limits recursive tool execution
Conversation history cap | 40 messages | Controls context window size and cost
Request timeout          | 10 seconds  | Prevents hung API calls from blocking the flow

Important: If your flow hits the 10-transition limit in a single turn, it means there's likely a loop in your node connections. Check for Logic nodes or transitions that cycle back without a Conversation node in between.
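A guard for the 10-transition limit might look like the following sketch (the function and error message are illustrative):

```python
# Sketch of the per-turn transition guard. A cycle of silent nodes with
# no Conversation node in between trips the limit.
MAX_TRANSITIONS_PER_TURN = 10

def follow_transitions(next_node, start):
    node, hops = start, 0
    while (nxt := next_node(node)) is not None:
        hops += 1
        if hops > MAX_TRANSITIONS_PER_TURN:
            raise RuntimeError("transition limit hit: likely a node cycle "
                               "without a Conversation node in between")
        node = nxt
    return node

follow_transitions({"a": "b"}.get, "a")   # linear chain: returns "b"
```

A two-node cycle such as `{"a": "b", "b": "a"}` would raise after the eleventh hop instead of looping forever.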

Cost Estimation

Each turn typically involves:

  • 1 LLM call for the Conversation node (prompt + history + response)
  • 0-1 LLM calls for Extraction nodes (if in the sync path)
  • 0-1 LLM calls for speak-during-execution filler (prompt mode only)

Because transitions happen via tool calls within the Conversation node's LLM call (not as separate calls), Flow Builder is significantly more token-efficient than architectures that require separate LLM calls for routing decisions.
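As a back-of-the-envelope helper (invented for illustration), the per-turn LLM-call count from the list above is:

```python
# Count LLM calls for one turn, per the breakdown above.
def llm_calls_per_turn(sync_extractions=0, prompt_mode_filler=False):
    calls = 1                    # the Conversation node's single LLM call;
                                 # transitions ride along as tool calls
    calls += sync_extractions    # one call per Extraction node in the sync path
    if prompt_mode_filler:
        calls += 1               # speak-during-execution filler (prompt mode)
    return calls

llm_calls_per_turn()                                              # -> 1
llm_calls_per_turn(sync_extractions=1, prompt_mode_filler=True)   # -> 3
```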

