This post is a collaboration between AWS and Pipecat.
Deploying intelligent voice agents that maintain natural, human-like conversations requires streaming to users where they are, across web, mobile, and phone channels, even under heavy traffic and unreliable network conditions. Even small delays can break the conversational flow, causing users to perceive the agent as unresponsive or unreliable. For use cases such as customer support, virtual assistants, and outbound campaigns, a natural flow is critical for user experience. In this series of posts, you'll learn how streaming architectures help address these challenges using Pipecat voice agents on Amazon Bedrock AgentCore Runtime.
In Part 1, you'll learn how to deploy Pipecat voice agents on AgentCore Runtime using different network transport approaches including WebSockets, WebRTC, and telephony integration, with practical deployment guidance and code samples.
Benefits of AgentCore Runtime for voice agents
Deploying real-time voice agents is challenging: you need low-latency streaming, strict isolation for security, and the ability to scale dynamically with unpredictable conversation volume. Without an appropriately designed architecture, you can experience audio jitter, scalability constraints, inflated costs due to over-provisioning, and increased complexity. For a deeper dive into voice agent architectures, including cascaded (STT → LLM → TTS) and speech-to-speech approaches, refer to our earlier post on building real-time voice assistants with Amazon Nova Sonic compared to cascaded architectures.
Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless environment for scaling dynamic AI agents. Each conversation session runs in an isolated microVM for security. It auto-scales for traffic spikes and handles continuous sessions of up to 8 hours, making it well suited for long, multi-turn voice interactions. It charges only for resources actively used, helping to minimize costs associated with idle infrastructure.
Pipecat, an agentic framework for building real-time voice AI pipelines, runs on AgentCore Runtime with minimal setup. Package your Pipecat voice pipeline as a container and deploy it directly to AgentCore Runtime. The runtime supports bidirectional streaming for real-time audio, and built-in observability to trace agent reasoning and tool calls.
AgentCore Runtime requires ARM64 (Graviton) containers, so make sure your Docker images are built for the linux/arm64 platform.
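As an illustrative sketch (the image name, registry, and tag below are placeholders, not values from the code samples), you can target linux/arm64 with Docker Buildx:

```shell
# Build and push a linux/arm64 image for AgentCore Runtime.
# Replace the account ID, region, and repository name with your own.
docker buildx build \
  --platform linux/arm64 \
  -t 123456789012.dkr.ecr.us-east-1.amazonaws.com/pipecat-agent:latest \
  --push .
```

Building on an x86 workstation requires QEMU emulation (slower) or a Graviton-based build host such as AWS CodeBuild on ARM.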
Streaming architectures for voice agents on AgentCore Runtime
This post assumes familiarity with common voice agent architectures: specifically the cascaded model approach, where you connect speech-to-text (STT) and text-to-speech (TTS) models in a pipeline, and the speech-to-speech model approach, like Amazon Nova Sonic. If you are new to these concepts, start with our earlier blog posts on the two foundational approaches, cascaded and speech-to-speech, before continuing.
When building voice agents, latency is a critical consideration, determining how natural and reliable a voice conversation feels. Conversations require near-instant responses, often under one second end-to-end, to maintain a fluid, human-like rhythm.
To achieve low latency, you must consider bidirectional streaming on multiple paths, including:
- Client to Agent: Your voice agents will run on devices and applications, from web browsers and mobile apps to edge hardware, each with unique network conditions.
- Agent to Model: Your voice agents rely on bidirectional streaming to interact with speech models. Most speech models expose real-time WebSocket APIs, which your agent runtime or orchestration framework can consume for audio input and text or speech output. Model selection plays a key role in achieving natural responsiveness. Select models like Amazon Nova Sonic (or Amazon Nova Lite in a cascaded pipeline approach) that are optimized for latency and provide a fast Time-to-First-Token (TTFT).
- Telephony: For traditional inbound or outbound calls handled through contact centers or telephony systems, your voice agent must also integrate with a telephony provider. This is typically achieved through a handoff and/or Session Initiation Protocol (SIP) transfer, where the live audio stream is transferred from the telephony system to your agent runtime for processing.
In Part 1 of this series, we'll focus on the Client to Agent connection and how to minimize the first-hop network latency from your edge device to your voice agent, and explore additional considerations relating to the other components of voice agent architecture.
To illustrate these concepts, we'll explore four network transport approaches with considerations for:
- How users interface with your voice agents (web/mobile applications or phone calls)
- Performance consistency and resilience across variable network conditions
- Ease of implementation
| Approach | Description | Performance consistency | Ease of implementation | Suitable for |
| --- | --- | --- | --- | --- |
| WebSockets | Web and mobile applications connect directly to your voice agents via WebSockets. | Good | Simple | Prototyping and lightweight use cases. |
| WebRTC (TURN-assisted) | Web and mobile applications connect directly to your voice agents via WebRTC. | Excellent | Medium | Production use cases, with latency derived from the direct connection of the client to the runtime environment, relayed via Traversal Using Relays around NAT (TURN) servers. |
| WebRTC (managed) | Web and mobile applications connect to your voice agents through sophisticated, globally distributed infrastructure via WebRTC. | Excellent (global distribution) | Simple | Production use cases, with latency optimization offloaded to specialized providers with globally distributed networks and media relays. Offers additional capabilities such as observability and multi-participant calls. |
| Telephony | Voice agents are accessed through traditional phone calls. | Excellent | Medium | Contact center and telephony use cases. Latency may depend on the telephony provider. |
Example approach: Using WebSockets bidirectional streaming
You can start with WebSockets as the simplest approach: it is natively supported by most clients and by AgentCore Runtime. Deploy Pipecat voice agents on AgentCore Runtime using persistent, bidirectional WebSocket connections for audio streaming between client devices and your agent logic.
The connection follows a straightforward three-step flow:
- Client requests a WebSocket endpoint: The client first sends a POST request to an intermediary server (/server) to obtain a secure WebSocket connection endpoint.
- Intermediary server handles AWS authentication: The intermediary server in the Pipecat pre-built frontend uses the AWS SDK to generate an AWS SigV4 pre-signed URL with embedded credentials as query parameters. For example: ?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=
- Client establishes a direct connection: Using the authenticated pre-signed URL, the client connects directly to the agent on AgentCore Runtime and streams bidirectional audio, bypassing the intermediary server for subsequent communications.
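The pre-signing step can be sketched with a minimal, standard-library-only SigV4 query presigner. In practice the intermediary server uses the AWS SDK; the service name, host, and /ws path below are illustrative assumptions, not values from the sample:

```python
import hashlib
import hmac
import urllib.parse
from datetime import datetime, timezone

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_ws_url(host: str, path: str, region: str, service: str,
                   access_key: str, secret_key: str, expires: int = 300) -> str:
    """Build a SigV4 pre-signed wss:// URL using query-string authentication."""
    now = datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/{service}/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items()))
    payload_hash = hashlib.sha256(b"").hexdigest()  # empty body for the upgrade request
    canonical = "\n".join(["GET", path, query, f"host:{host}\n", "host", payload_hash])
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                                hashlib.sha256(canonical.encode()).hexdigest()])
    key = _sign(_sign(_sign(_sign(b"AWS4" + secret_key.encode(), datestamp),
                            region), service), "aws4_request")
    signature = hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{query}&X-Amz-Signature={signature}"
```

Because the signature is embedded in the query string, the browser never holds long-lived AWS credentials, only a short-lived URL.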
You use Pipecat's WebSocket transport to expose an endpoint on the /ws path as required by AgentCore Runtime. This architecture separates credential management from agent logic, enabling secure client access without exposing AWS credentials directly to browser applications.
To learn more, try the Pipecat on AgentCore code sample using the WebSockets transport.
Example approach: Using WebRTC bidirectional streaming with TURN support
While WebSockets works for simple deployments, WebRTC can offer improved performance. It's designed to deliver audio over a fast, lightweight network path that minimizes delay. It typically uses UDP for low latency and a smoother real-time experience, and provides improved resilience across variable network conditions. If UDP is not available, WebRTC automatically falls back to TCP, which is more reliable but can introduce slight delays: less ideal for voice, but helpful when connectivity is limited. This reliability comes from Interactive Connectivity Establishment (ICE), which negotiates direct peer-to-peer paths through NATs and firewalls, falling back to media relayed via Traversal Using Relays around NAT (TURN) servers when direct connections can't be made.
Pipecat supports SmallWebRTCTransport for direct peer-to-peer WebRTC connections between clients and agents on AgentCore Runtime. Compared to comprehensive WebRTC architectures requiring dedicated media servers (such as Selective Forwarding Units, or SFUs), this lightweight transport can run directly within AgentCore Runtime, removing the need for complex media server management.
In this scenario, the connection flow operates as follows:
- Signaling: The client sends a Session Description Protocol (SDP) offer to the intermediary server, which forwards it to the /invoke/ endpoint in AgentCore Runtime. The agent's @app.entrypoint handler processes the offer and returns an SDP answer containing media capabilities and network candidates.
- Connectivity establishment: To establish a direct connection, both the client and the agent use the Interactive Connectivity Establishment (ICE) protocol to discover the optimal network path, with AgentCore Runtime supporting Traversal Using Relays around NAT (TURN) relayed connections. The protocol attempts connectivity in this order:
- Direct connection: Connect peer-to-peer using local network addresses. This path is not supported on AgentCore Runtime because the runtime environment cannot be assigned a public IP address.
- Session Traversal Utilities for NAT (STUN) assisted connection: Use a STUN server to discover the public IP/port through Network Address Translation (NAT) and attempt direct connectivity. This path requires both inbound and outbound UDP traffic, which is not currently supported, because AWS NAT Gateways use symmetric NAT, which prevents STUN-based direct connectivity from succeeding.
- Traversal Using Relays around NAT (TURN) relayed connection: Route media through a TURN relay server. Configure TURN using managed services (such as Cloudflare or Twilio), Amazon Kinesis Video Streams (KVS), or self-hosted solutions (such as coturn in your VPC). This path is recommended on AgentCore Runtime configured with a VPC (see details below).
- Connection through VPC: Once connectivity is established, traffic routes from the client to the runtime environment via the VPC (more details in the following section).
To learn more, try the Pipecat on AgentCore code sample using the WebRTC transport.
Configuring AgentCore Runtime on a VPC for WebRTC connectivity
The code sample demonstrates a simple voice agent using WebRTC. First, you configure the ICE_SERVER_URLS environment variable in both: 1) the intermediary server in the Pipecat pre-built frontend (/server) and 2) the runtime environment (/agent). This enables bidirectional traffic between them.
Next, you deploy your agents to AgentCore Runtime with VPC networking configured to allow UDP transport to TURN servers. For security, you expose the runtime to a private VPC subnet, with a NAT Gateway in the public subnet to route internet access, as illustrated below.
With this approach, you can configure ICE servers for full WebRTC connectivity, with both STUN and UDP with TCP fallback. For example, you can configure Cloudflare managed TURN as follows:
# Configure agent/.env and server/.env
ICE_SERVER_URLS=stun:stun.cloudflare.com,turn:turn.cloudflare.com:53,turn:turn.cloudflare.com:3478,turn:turn.cloudflare.com:5349
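A small helper (hypothetical, not part of the sample) can turn this comma-separated variable into the ICE server list a WebRTC peer connection expects:

```python
import os

def parse_ice_server_urls(value: str) -> list[dict]:
    """Split a comma-separated ICE_SERVER_URLS value into WebRTC-style
    ICE server entries, e.g. [{"urls": "stun:stun.cloudflare.com"}, ...]."""
    return [{"urls": url.strip()} for url in value.split(",") if url.strip()]

# Read the variable set in agent/.env or server/.env, with a fallback default.
servers = parse_ice_server_urls(os.environ.get(
    "ICE_SERVER_URLS",
    "stun:stun.cloudflare.com,turn:turn.cloudflare.com:3478"))
```

TURN entries would additionally carry the username and credential issued by your TURN provider.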
Using AWS-native TURN with Amazon Kinesis Video Streams (KVS)
For a fully AWS-native alternative to managed TURN services, Amazon Kinesis Video Streams (KVS) provides TURN infrastructure without third-party dependencies. It issues temporary, auto-rotating TURN credentials via the GetIceServerConfig API. The flow works as follows:
- One-time setup: Create a KVS signaling channel. The channel is used only for TURN credential provisioning; your agent continues to use Pipecat's WebRTC transport for signaling and media.
- At connection time: Your agent calls GetSignalingChannelEndpoint to get the HTTPS endpoint, then calls GetIceServerConfig to retrieve temporary TURN credentials (URIs, username, password).
- Configure the peer connection: Pass the returned credentials to your RTCPeerConnection as ICE servers. TURN traffic flows through KVS-managed infrastructure.
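The steps above can be sketched with boto3 (a sketch under the stated assumptions; error handling, credential caching, and the channel ARN are left to you):

```python
def kvs_ice_servers(channel_arn: str, region: str) -> list[dict]:
    """Fetch temporary TURN credentials from KVS for a signaling channel."""
    import boto3  # imported here so the pure formatter below has no AWS dependency

    kvs = boto3.client("kinesisvideo", region_name=region)
    endpoint = kvs.get_signaling_channel_endpoint(
        ChannelARN=channel_arn,
        SingleMasterChannelEndpointConfiguration={
            "Protocols": ["HTTPS"], "Role": "VIEWER"},
    )["ResourceEndpointList"][0]["ResourceEndpoint"]
    signaling = boto3.client("kinesis-video-signaling",
                             region_name=region, endpoint_url=endpoint)
    return format_ice_servers(signaling.get_ice_server_config(ChannelARN=channel_arn))

def format_ice_servers(response: dict) -> list[dict]:
    """Convert a GetIceServerConfig response into RTCPeerConnection-style entries."""
    return [{"urls": s["Uris"], "username": s["Username"], "credential": s["Password"]}
            for s in response.get("IceServerList", [])]
```

Because the credentials carry a TTL, fetch them per session rather than caching them for the lifetime of the agent.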
Considerations when using KVS managed TURN

| Factor | KVS managed TURN | Third-party TURN |
| --- | --- | --- |
| AWS native | Yes (no external dependency) | No (requires an external account) |
| Credential management | Automatic rotation | Manual or provider-managed |
| Setup | Create signaling channel + API calls | Configure environment variables |
| Best for | AWS-centric deployments | Simplicity or existing provider relationships |
Additional considerations:
- Cost: Each active signaling channel costs $0.03/month. At low to moderate volume, this is negligible.
- Rate limit: GetIceServerConfig is limited to 5 transactions per second (TPS) per channel. For high-volume deployments exceeding 100,000 sessions per month, implement a channel pooling strategy where you distribute requests across multiple channels: channels_needed = ceil(peak_new_sessions_per_second / 5).
- No PrivateLink: The VPC still requires internet egress (via a NAT Gateway) to reach KVS TURN endpoints.
- Credential lifetime: KVS TURN credentials are temporary and auto-rotated, so you do not need to manage credential rotation.
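The channel-pooling rule of thumb above can be expressed directly (the helper name is illustrative):

```python
import math

def turn_channels_needed(peak_new_sessions_per_second: float,
                         tps_limit: int = 5) -> int:
    """Signaling channels needed to stay under the GetIceServerConfig
    rate limit of 5 TPS per channel (one call per new session assumed)."""
    return max(1, math.ceil(peak_new_sessions_per_second / tps_limit))

# e.g. a peak of 12 new sessions per second needs ceil(12 / 5) = 3 channels
```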
To learn more, try the code sample using KVS managed TURN.
Example approach: Using managed WebRTC on AWS Marketplace
While direct WebRTC offers control, managed WebRTC providers typically provide TURN servers and globally distributed SFUs to facilitate reliable connectivity and low-latency media routing. They also offer additional features such as built-in analytics and observability, and support for multi-participant rooms beyond 1:1 agent conversations. For production voice agents at scale, consider managed providers available on AWS Marketplace, such as Daily. Daily runs its globally distributed WebRTC infrastructure on AWS, offering multiple deployment models:
- Fully managed SaaS: You connect to Daily's hosted infrastructure via public API endpoints. This is ideal for rapid deployment and environments where operational simplicity is prioritized. In this scenario, your agent in AgentCore Runtime can simply connect to the managed WebRTC infrastructure via the public internet.
- Customer VPC deployment: You deploy Daily's media servers directly into your VPC for full network control and compliance with strict data residency requirements. In this scenario, you configure AgentCore Runtime for VPC as outlined above.
- SaaS with AWS PrivateLink: You connect to Daily's hosted infrastructure and configure AWS PrivateLink so that traffic flows through VPC endpoints directly to Daily's managed infrastructure without traversing the public internet, reducing latency while maintaining network isolation on the AWS backbone network. In this scenario, you configure AgentCore Runtime for VPC as outlined above.
To learn more, contact your AWS account team to explore Daily on AWS Marketplace, or try the code sample using the Daily transport and its DAILY_API_KEY with the fully managed SaaS option.
Example approach: Using a telephony provider
While WebRTC excels for web and mobile channels, telephony handoff enables traditional Public Switched Telephone Network (PSTN) integration for contact centers, IVR replacement, and outbound campaigns. For real-time conversation, your agent runtime must maintain a persistent, bidirectional audio stream with your speech models, business logic, and telephony provider. These providers offer managed voice services that handle the complexity of traditional telephony infrastructure through simple APIs. Depending on the capabilities of the telephony provider, you integrate with them using either the Session Initiation Protocol (SIP) or streaming WebSocket or WebRTC protocols. Pipecat transports and serializers provide connectors for the implementation.
To learn more, see the Pipecat Guide on Telephony and Building AI-Powered Voice Applications: Telephony Integration Guide.
Conclusion
AgentCore Runtime provides secure, serverless infrastructure to scale voice agents reliably. In this post, you learned why low latency is critical for natural conversations, and key considerations for different transport modes: WebSockets, TURN-assisted WebRTC, managed WebRTC, and telephony integrations, based on your latency, reliability, and usage requirements. When evaluating transport options, start simple with WebSockets for rapid prototyping, then consider WebRTC with AgentCore in VPC mode or managed providers for production deployments. If your voice agents are intended to handle telephony or contact center use cases, consider the available integrations with telephony providers for your implementation.
In Part 2 of this series, you'll explore additional considerations beyond network transport: covering streaming strategies across agent-to-model communication, tool execution, memory, and retrieval to achieve optimal end-to-end latency.
Get started with the Pipecat on AgentCore code samples and hands-on workshop below, and pick the transport layer that fits your use case:
For teams preferring more infrastructure control, the Guidance for Building Voice Agents on AWS on Amazon ECS is also available as a containerized deployment option.
Additional Resources
About the authors
Kwindla Hultman Kramer is the Co-founder and CEO at Daily, pioneering low-latency real-time voice, video, and multimodal AI infrastructure. A leading voice AI thought leader, he created the open-source Pipecat framework for production voice agents and shares insights at voice AI meetups and on his X account (@kwindla).
Paul Kompfner is a Member of Technical Staff at Daily, where he is on the team that maintains the Pipecat open source framework. He is an expert in streaming infrastructure and voice-based agentic systems. He regularly collaborates with AWS and the voice AI ecosystem to deliver first-class support for voice models and hosting platforms to enable scalable real-time voice AI on Pipecat.
Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and explores the wilderness with his family.
Lana Zhang is a Senior Solutions Architect on the AWS Worldwide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. She is dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases by adding business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, ecommerce, media, advertising, and marketing.
Sundar Raghavan is a Solutions Architect at AWS on the Agentic AI team. He shaped the developer experience for Amazon Bedrock AgentCore, contributing to the SDK, CLI, and starter toolkit, and now focuses on integrations with AI agent frameworks. Previously, Sundar worked as a Generative AI Specialist, helping customers design AI applications on Amazon Bedrock. In his free time, he loves exploring new places, sampling local eateries, and embracing the great outdoors.
Daniel Wirjo is a Solutions Architect at AWS, focused on AI and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

