Connecting Voice (Telnyx)

Overview

Flow Builder supports voice deployment through Telnyx. You can connect a Telnyx phone number to your flow so that inbound callers interact with your AI agent over a phone call. The integration uses an OpenAI-compatible realtime API format, making it straightforward to configure.

How It Works

When a call comes in on your Telnyx number, Telnyx connects to Flow Builder's voice endpoint. The flow engine processes the caller's speech in real time, runs through your nodes (transitions, extractions, API calls), and streams the agent's responses back as audio via text-to-speech.

The architecture:

  1. Telnyx receives the inbound call and forwards it to your flow's voice endpoint.
  2. Flow Builder's engine processes each turn: speech-to-text, node execution, LLM call, response generation.
  3. Responses are streamed back as Server-Sent Events (SSE) for low-latency text-to-speech playback.
  4. The conversation continues until an End node is reached or the caller hangs up.
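The per-turn loop above can be sketched as follows. This is a minimal illustration, not Flow Builder's implementation: speech_to_text and llm_stream are hypothetical stubs standing in for real transcription and LLM streaming, and the SSE event names are illustrative.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: a real deployment transcribes caller audio (step 2); here we
    # pretend the payload is already text so the shape of the loop is visible.
    return audio.decode()

def llm_stream(prompt: str):
    # Stub: "stream" the prompt back word by word, standing in for
    # token-by-token LLM output.
    for word in prompt.split():
        yield word

def handle_turn(audio: bytes) -> list[str]:
    """One turn: transcribe, generate, and emit SSE-style events (step 3)."""
    text = speech_to_text(audio)
    events = [f"event: delta\ndata: {tok}\n\n" for tok in llm_stream(text)]
    events.append("event: done\ndata: \n\n")  # turn complete (step 4 checks for End)
    return events
```

Each "delta" event carries one chunk of the response so text-to-speech playback can begin before generation finishes.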

Setting Up the Connection

  1. Create a Telnyx account and purchase a phone number at telnyx.com.
  2. In your Telnyx portal, configure the phone number's webhook URL to point to your flow's voice endpoint.
  3. The endpoint follows an OpenAI-compatible sub-path format. Your flow's deploy settings in the Flow Builder provide the exact URL to use.
  4. Set the webhook method to POST.
  5. Save the configuration in Telnyx.
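If you configure the webhook programmatically rather than through the portal, the payload for step 2 might look like the sketch below. The field names are assumptions modeled on Telnyx-style webhook settings (verify them against Telnyx's API reference), and the voice endpoint URL must be copied from your flow's deploy settings.

```python
import json

def webhook_config(voice_endpoint: str) -> str:
    """Build a hypothetical Telnyx-style webhook configuration payload."""
    payload = {
        "webhook_event_url": voice_endpoint,  # your flow's voice endpoint (step 2)
        "webhook_api_version": "2",           # assumed: Telnyx v2 webhook payloads
    }
    return json.dumps(payload)
```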

Once configured, calls to your Telnyx number are routed directly to your flow.

OpenAI-Compatible Sub-Paths

Flow Builder exposes voice endpoints that follow the OpenAI realtime API format. This compatibility means any platform that supports OpenAI-style voice integrations can connect to your flow with minimal configuration changes. Telnyx's media streaming works natively with this format.

Eager Streaming

Voice flows use eager streaming to minimize latency. As soon as the LLM begins generating tokens, they are sent to the text-to-speech engine immediately. The caller starts hearing the response before the full message is generated.

This is critical for natural-sounding phone conversations. Without eager streaming, callers would experience noticeable pauses between their question and the agent's response. With it, the agent begins speaking within hundreds of milliseconds of the LLM starting its output.

A clear SSE event handles the case where the agent has already begun speaking but the flow transitions to a different node mid-response. The TTS engine stops the partial response and begins the new one seamlessly.
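The interplay between eager streaming and the clear event can be sketched as below. This models only the consumer-side logic; the event names ("delta", "clear", "done") are illustrative, not Flow Builder's documented wire format.

```python
def drive_tts(events: list[tuple[str, str]]) -> list[str]:
    """Return the utterances the caller actually hears.

    Tokens are forwarded as they arrive (eager streaming); a "clear"
    event cancels the in-progress utterance before the new node's
    response begins.
    """
    utterances: list[str] = []
    current: list[str] = []
    for event, data in events:
        if event == "delta":
            current.append(data)   # forwarded to TTS immediately
        elif event == "clear":
            current = []           # node transition: discard partial output
        elif event == "done":
            utterances.append(" ".join(current))
            current = []
    return utterances
```

For example, a stream of deltas interrupted by a clear yields only the post-clear response.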

Session Handling

Each phone call creates a unique session. The session persists for the duration of the call and includes:

  • Conversation history — all turns between the caller and agent, capped at 40 messages
  • Flow variables — data extracted or set during the call
  • Current node state — which node the conversation is on
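The 40-message cap on conversation history behaves like a sliding window. A minimal sketch, assuming the cap simply drops the oldest turns (the helper name is hypothetical):

```python
MAX_HISTORY = 40  # cap stated above: history holds at most 40 messages

def append_turn(history: list[dict], role: str, text: str) -> list[dict]:
    """Append a turn and keep only the most recent MAX_HISTORY messages."""
    history = history + [{"role": role, "content": text}]
    return history[-MAX_HISTORY:]
```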

If you have a variables webhook configured, it fires at the start of the call before the first node executes. Use this to pre-load caller data from your CRM based on the caller's phone number.
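A variables-webhook handler for this pattern might look like the following sketch. The payload field name ("from_number"), the returned variable names, and the in-memory CRM lookup are all assumptions for illustration, not Flow Builder's documented contract.

```python
# Stand-in for a real CRM keyed by caller phone number.
CRM = {"+15551234567": {"name": "Ada", "tier": "gold"}}

def variables_webhook(payload: dict) -> dict:
    """Pre-load caller data before the first node executes."""
    caller = payload.get("from_number", "")
    record = CRM.get(caller, {})
    # Returned keys become flow variables available to every node.
    return {
        "caller_name": record.get("name", ""),
        "plan": record.get("tier", ""),
    }
```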

Voice-Specific Considerations

  • Model selection — use fast models (gpt-4o-mini, claude-haiku) for low latency
  • Prompt length — keep prompts concise to reduce LLM processing time
  • Static messages — use static mode for greetings and disclaimers to skip the LLM call entirely
  • Tool calls — enable speak-during-execution filler to avoid dead air
  • Interruptions — use blockInterruptions on nodes with critical information

Troubleshooting

  • Calls connect but no audio — verify the webhook URL in Telnyx matches your flow's voice endpoint
  • Long pauses between turns — switch to a faster LLM model and shorten prompts
  • Agent does not respond — confirm your API key is configured in flow settings
  • Call drops immediately — check Telnyx webhook logs for connection errors

Important: Test your flow thoroughly using the built-in test drawer before connecting it to a live phone number. Voice debugging is harder than chat debugging.

