Connecting Voice (Telnyx)
Overview
Flow Builder supports voice deployment through Telnyx. You can connect a Telnyx phone number to your flow so that inbound callers interact with your AI agent over a phone call. The integration uses an OpenAI-compatible realtime API format, making it straightforward to configure.
How It Works
When a call comes in on your Telnyx number, Telnyx connects to Flow Builder's voice endpoint. The flow engine processes the caller's speech in real time, runs through your nodes (transitions, extractions, API calls), and streams the agent's responses back as audio via text-to-speech.
The architecture:
- Telnyx receives the inbound call and forwards it to your flow's voice endpoint.
- Flow Builder's engine processes each turn: speech-to-text, node execution, LLM call, response generation.
- Responses are streamed back as Server-Sent Events (SSE) for low-latency text-to-speech playback.
- The conversation continues until an End node is reached or the caller hangs up.
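The SSE stream in the third step can be consumed with a small parser. This is an illustrative sketch: the `data:` framing is standard SSE, but the JSON event shape (`type`/`text` fields) is an assumption, not Flow Builder's documented schema.

```python
import json

def parse_sse(stream_text):
    """Parse raw SSE text into a list of JSON event payloads.

    Assumes each event is a `data:` line carrying JSON, with a
    `[DONE]` sentinel at the end (OpenAI-style). Event field names
    are illustrative only.
    """
    events = []
    for line in stream_text.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events
```

In practice you would read these events off the HTTP response incrementally and hand each text delta to the TTS engine as it arrives.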
Setting Up the Connection
- Create a Telnyx account and purchase a phone number at telnyx.com.
- In your Telnyx portal, configure the phone number's webhook URL to point to your flow's voice endpoint.
- Use the exact URL shown in your flow's deploy settings in Flow Builder; the endpoint follows an OpenAI-compatible sub-path format.
- Set the webhook method to POST.
- Save the configuration in Telnyx.
Once configured, calls to your Telnyx number are routed directly to your flow.
OpenAI-Compatible Sub-Paths
Flow Builder exposes voice endpoints that follow the OpenAI realtime API format. This compatibility means any platform that supports OpenAI-style voice integrations can connect to your flow with minimal configuration changes. Telnyx's media streaming works natively with this format.
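To make the sub-path idea concrete, here is a sketch of how such a URL might be assembled. The path segments shown are assumptions for illustration; always copy the exact URL from your flow's deploy settings rather than constructing it by hand.

```python
def voice_endpoint(base_url, flow_id):
    """Build a hypothetical OpenAI-compatible voice endpoint URL.

    The `/v1/flows/<id>/realtime` sub-path is illustrative only;
    the real path comes from the flow's deploy settings.
    """
    return f"{base_url.rstrip('/')}/v1/flows/{flow_id}/realtime"
```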
Eager Streaming
Voice flows use eager streaming to minimize latency. As soon as the LLM begins generating tokens, they are sent to the text-to-speech engine immediately. The caller starts hearing the response before the full message is generated.
This is critical for natural-sounding phone conversations. Without eager streaming, callers would experience noticeable pauses between their question and the agent's response. With it, the agent begins speaking within hundreds of milliseconds of the LLM starting its output.
The clear SSE event handles the case where the agent has started speaking but the flow transitions to a different node mid-response: the TTS engine drops the partial response and begins the new one seamlessly.
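The interaction between eager token forwarding and the clear event can be sketched as follows. This is a toy model, assuming `delta`/`clear`/`done` event names, which stand in for whatever Flow Builder actually emits:

```python
def drive_tts(events):
    """Eagerly forward LLM tokens toward TTS; a 'clear' event cuts
    off the in-progress utterance so the next node's response can
    start cleanly. Returns the utterances that fully play out.
    """
    utterances, current = [], []
    for ev in events:
        if ev["type"] == "delta":
            current.append(ev["text"])  # spoken as soon as it arrives
        elif ev["type"] == "clear":
            current = []                # stop the partial playback
        elif ev["type"] == "done":
            utterances.append("".join(current))
            current = []
    return utterances
```

The key point is that tokens are forwarded the moment they arrive; the clear event is the escape hatch when the flow changes direction mid-utterance.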
Session Handling
Each phone call creates a unique session. The session persists for the duration of the call and includes:
- Conversation history — all turns between the caller and agent, capped at 40 messages
- Flow variables — data extracted or set during the call
- Current node state — which node the conversation is on
If you have a variables webhook configured, it fires at the start of the call before the first node executes. Use this to pre-load caller data from your CRM based on the caller's phone number.
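A variables-webhook handler for the caller-lookup pattern above might look like this. The payload field name (`caller_phone`) and the returned variable names are assumptions for illustration, not Flow Builder's documented schema:

```python
def variables_webhook(payload, crm):
    """Hypothetical variables-webhook handler: pre-load flow variables
    from a CRM, keyed by the caller's phone number.

    `payload` is the webhook request body (field names assumed);
    `crm` stands in for your customer data store.
    """
    phone = payload.get("caller_phone")
    record = crm.get(phone, {})
    return {
        "customer_name": record.get("name", ""),
        "is_known_customer": phone in crm,
    }
```

Because the webhook fires before the first node executes, these variables are already available when the greeting plays.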
Voice-Specific Considerations
| Consideration | Recommendation |
|---|---|
| Model selection | Use fast models (gpt-4o-mini, claude-haiku) for low latency |
| Prompt length | Keep prompts concise to reduce LLM processing time |
| Static messages | Use static mode for greetings and disclaimers to skip the LLM call entirely |
| Tool calls | Enable speak-during-execution filler to avoid dead air |
| Interruptions | Use blockInterruptions on nodes with critical information |
Troubleshooting
| Issue | Solution |
|---|---|
| Calls connect but no audio | Verify the webhook URL in Telnyx matches your flow's voice endpoint |
| Long pauses between turns | Switch to a faster LLM model and shorten prompts |
| Agent does not respond | Confirm your API key is configured in flow settings |
| Call drops immediately | Check Telnyx webhook logs for connection errors |
Important: Test your flow thoroughly using the built-in test drawer before connecting it to a live phone number. Voice debugging is harder than chat debugging.