What do I need to build an AI outbound calling system from scratch?

You need four components: a SIP trunk provider (Telnyx or Twilio) for telephony, an STT engine (Deepgram) to transcribe the callee's speech, an LLM for generating responses, and a TTS engine (Cartesia) to speak those responses. Pipecat orchestrates the pipeline. Dograh adds production tooling like agent management, visual workflow editing, and telephony integration on top.

How much does AI outbound calling cost per minute compared to human agents?

Hosted platforms like Vapi cost $0.07-0.25 per minute all-in. Self-hosted with Dograh costs roughly $0.01-0.02 per minute for STT, TTS, and telephony combined. Traditional human call centers cost $5-25 per interaction. At 1,000 minutes per day (365,000 minutes per year), those per-minute rates work out to roughly $3,650-$7,300 per year self-hosted versus $25,550-$91,250 on a hosted platform.

Vapi vs self-hosted for outbound AI calls - which is better?

Vapi is faster to start but locks you into per-minute pricing that adds up at scale. At 1,000 min/day, an all-in rate of $0.07-0.25/min comes to roughly $25,550-$91,250 per year, and the $0.05/min base platform fee alone is $18,250 of that. Self-hosted with Dograh eliminates the platform fee entirely. You also get full data sovereignty, no vendor lock-in, and the ability to modify agent behavior through a visual workflow builder without depending on a third-party API.

What is the biggest mistake teams make with AI outbound calling?

Choosing a hosted platform without doing the cost math at their target volume. A per-minute fee that seems small at 100 calls/day becomes a significant line item at 1,000+. The second biggest mistake is ignoring latency. If your voice pipeline takes more than 800ms to respond, conversion rates drop sharply because callees hang up or disengage.

How do I make sure my AI outbound calls comply with TCPA regulations?

TCPA requires prior express consent before making automated calls. Scrub your call lists against the National Do Not Call Registry. Log all consent records with timestamps. Self-hosting your voice agent helps with compliance because call recordings and consent data stay on your infrastructure rather than being stored by a third-party platform.

Making AI Outbound Calls Work: Production Guide 2026

Key Takeaways

Self-hosted outbound calling costs roughly 90% less per minute than hosted platforms at scale.
SIP trunking with open-source frameworks is the production-ready path.
Dograh adds deployment plumbing that Pipecat alone does not provide.

Most AI outbound calling guides tell you to sign up for a platform and start dialing. They skip the part where that platform charges $0.10-0.25 per minute and owns your entire call infrastructure. At Dograh, we build open-source voice agents for production use. This guide covers how to make AI outbound calls work without paying the vendor tax, with real numbers, real stack choices, and the self-hosted path that none of the top-ranking guides mention.

A production AI outbound calling system needs four layers: SIP trunking for telephony, an STT-LLM-TTS voice pipeline under 800ms latency, agent logic for qualification and scheduling, and CRM integration. Self-hosted with Dograh at $0.01-0.02 per minute, the total cost at 1,000 minutes per day runs roughly $3,650-$7,300 per year versus roughly $25,000-$91,000 on hosted platforms like Vapi or ElevenLabs.

Voice agent usage grew 9x in 2025, and four out of five companies plan to deploy voice AI for customer interactions this year. The market is moving fast. But most teams pick up hosted platforms without understanding what they actually cost at scale or what alternatives exist.

What a Production Outbound System Looks Like

A production AI outbound calling system has four layers that need to work together: telephony, voice pipeline, agent logic, and integration.

Telephony means SIP trunking. Your voice agent connects to the phone network through a SIP trunk provider like Telnyx or Twilio. Real phone numbers calling real phones. SIP trunking is the most reliable foundation for production outbound because it gives you direct carrier access without proprietary middleware or per-call platform surcharges. Your trunk provider handles number provisioning, call routing, and carrier compliance. Your application handles everything else.

The voice pipeline is the STT-LLM-TTS chain. Speech-to-text converts what the callee says, a language model decides how to respond, and text-to-speech turns that response into audio. The full loop needs to complete in under 800ms to sound natural. Anything slower and callees start disengaging or talking over the agent. STT is usually the latency bottleneck, not the LLM.
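One way to keep the 800ms target honest is to write the latency budget down and check it whenever a component changes. The sketch below is illustrative only: the per-stage numbers are hypothetical placeholders, not measurements, and the StageLatency and check_budget names are made up for this example.

```python
from dataclasses import dataclass

BUDGET_MS = 800  # end-to-end response target from the text


@dataclass(frozen=True)
class StageLatency:
    name: str
    ms: int  # measured or estimated latency for this stage


def check_budget(stages: list[StageLatency], budget_ms: int = BUDGET_MS) -> tuple[int, int]:
    """Return (total_ms, headroom_ms); negative headroom means over budget."""
    total = sum(stage.ms for stage in stages)
    return total, budget_ms - total


# Hypothetical per-stage numbers, for illustration only.
stages = [
    StageLatency("network + telephony", 100),
    StageLatency("stt (final transcript)", 300),
    StageLatency("llm (first token)", 250),
    StageLatency("tts (first audio byte)", 40),
]

total_ms, headroom_ms = check_budget(stages)
slowest = max(stages, key=lambda stage: stage.ms)
print(f"total={total_ms}ms headroom={headroom_ms}ms slowest={slowest.name}")
```

Replacing the placeholder values with numbers measured on real calls shows which stage to tune first.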

Agent logic determines what happens during the call: lead qualification, objection handling, appointment scheduling, CRM updates. This is where business value lives, and it is also where most hosted platforms lock you in. They own the agent runtime, so modifying behavior means working within their constraints.
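When you own the agent runtime, the core of this logic can be ordinary code that you version and test. The following is a minimal sketch of a qualification rule, assuming three hypothetical signals (budget, decision-maker status, purchase timeline) collected during the call. The field names and thresholds are invented for illustration and are not Dograh's data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    BOOK_MEETING = "book_meeting"
    NURTURE = "nurture"
    DISQUALIFY = "disqualify"


@dataclass
class LeadAnswers:
    has_budget: bool
    is_decision_maker: bool
    timeline_months: Optional[int]  # None if the callee would not say


def qualify(answers: LeadAnswers, max_timeline_months: int = 6) -> Outcome:
    """Map answers gathered on the call to the next action."""
    if not answers.has_budget:
        return Outcome.DISQUALIFY
    near_term = (
        answers.timeline_months is not None
        and answers.timeline_months <= max_timeline_months
    )
    if answers.is_decision_maker and near_term:
        return Outcome.BOOK_MEETING
    # Budget exists, but the timing or the contact is not right yet.
    return Outcome.NURTURE


print(qualify(LeadAnswers(has_budget=True, is_decision_maker=True, timeline_months=3)))
```

Keeping the rule as a pure function means a change to the qualification criteria can be unit-tested before it reaches a live campaign.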

Integration ties everything to your existing systems: CRM writes after each call, calendar bookings for scheduled meetings, webhook triggers for downstream workflows, and analytics to track what is working. Skip the integration layer and your agents generate conversations that go nowhere.
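As a rough illustration of the post-call write, the sketch below serializes one finished call into a JSON body that a CRM or webhook endpoint could accept. The event name, field names, and CallResult type are assumptions made for this example, not a documented Dograh or CRM schema, and the HTTP request itself is left out.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CallResult:
    call_id: str
    phone_number: str
    outcome: str                        # e.g. "book_meeting", "nurture", "disqualify"
    duration_seconds: int
    meeting_time: Optional[str] = None  # ISO 8601 string if a meeting was booked


def build_payload(result: CallResult, ended_at: datetime) -> str:
    """Serialize one finished call as a JSON body for a CRM or webhook endpoint."""
    body = {
        "event": "call.completed",  # hypothetical event name
        "ended_at": ended_at.isoformat(),
        "data": asdict(result),
    }
    return json.dumps(body, sort_keys=True)


result = CallResult("call-001", "+12025550123", "book_meeting", 142,
                    "2026-03-02T15:00:00+00:00")
payload = build_payload(result, datetime(2026, 2, 27, 18, 30, tzinfo=timezone.utc))
print(payload)
```

Building the payload separately from sending it makes the record easy to log and replay if the downstream system is unavailable.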

The Real Cost of Outbound Calling

Every competitor guide talks about efficiency gains but none show the bill. Here is what AI outbound calling actually costs across the main approaches.

Hosted platforms charge per minute, and the headline rates are misleading. Vapi's base rate of $0.05/min does not include STT, LLM, TTS, or telephony. Your actual cost lands between $0.07 and $0.25/min depending on which models you pick. At 1,000 minutes per day (365,000 minutes per year), that adds up to $25,550-$91,250 per year, and the $0.05/min base fee alone accounts for $18,250 of it. ElevenLabs Conversational AI charges $0.08-0.10/min on their Business and Pro tiers, working out to $29,200-$36,500/year at the same volume. Their LLM costs are currently absorbed but will eventually be passed through.
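The annual figures are the per-minute rate multiplied out over 1,000 minutes a day for 365 days. A small sketch of that arithmetic, using the rates quoted above, makes it easy to rerun at your own volume. The annual_cost helper is a name chosen for this example, and Decimal is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

MINUTES_PER_DAY = 1_000
DAYS_PER_YEAR = 365


def annual_cost(rate_per_minute: str, minutes_per_day: int = MINUTES_PER_DAY) -> Decimal:
    """Annual spend in dollars for a flat per-minute rate."""
    return Decimal(rate_per_minute) * minutes_per_day * DAYS_PER_YEAR


rates = [
    ("Vapi base fee only", "0.05"),
    ("Vapi all-in, low", "0.07"),
    ("Vapi all-in, high", "0.25"),
    ("ElevenLabs, low", "0.08"),
    ("ElevenLabs, high", "0.10"),
]
for label, rate in rates:
    print(f"{label}: ${annual_cost(rate):,.0f}/year")
```

Passing a different minutes_per_day value shows how quickly the per-minute fee scales with campaign volume.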

Self-hosted with Dograh means zero platform fees. You pay for your SIP trunk (around $0.005/min), your STT API (around $0.007/min), your TTS, and cloud infrastructure, for a combined $0.01-0.02/min. Total at 1,000 min/day: roughly $3,650-$7,300/year. AI calling systems reduce cost per interaction to $0.50-$5 compared to $5-$25 for traditional call centers, a 70-90% cost saving. Self-hosting drops it further. When calls cost $0.01-0.02/min instead of $0.10-0.25/min, you can run qualification calls, survey calls, and follow-up campaigns that would never make financial sense on a hosted platform. The economics of automation vs traditional systems tilt heavily toward self-hosted ownership.
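To see where a "roughly 90% cheaper" comparison comes from, the per-minute rates quoted in this section can be compared directly. This is a sketch of the arithmetic only. The saving_pct helper is a name invented for the example, and TTS and infrastructure are left out of the component sum because the text does not give a per-minute figure for them.

```python
from decimal import Decimal


def saving_pct(self_hosted_rate: str, hosted_rate: str) -> Decimal:
    """Percentage saved per minute by paying self_hosted_rate instead of hosted_rate."""
    return (1 - Decimal(self_hosted_rate) / Decimal(hosted_rate)) * 100


# Components the text prices per minute: SIP trunk + STT.
# TTS and cloud infrastructure come on top of this floor.
known_floor = Decimal("0.005") + Decimal("0.007")

print(f"priced components: ${known_floor}/min")
print(f"low end:  {saving_pct('0.01', '0.10'):.0f}% cheaper per minute")
print(f"high end: {saving_pct('0.02', '0.25'):.0f}% cheaper per minute")
```

The percentage depends on which hosted rate you compare against, so it is worth rerunning with the rate you are actually quoted.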

Picking Your Voice Pipeline Stack

The voice pipeline is where call quality lives or dies. Each component has a latency budget, and the wrong choice in any slot drags down the whole system.

For STT, Deepgram Nova-3 is the default pick for most production voice agents. It offers streaming transcription at $0.0077/min with accuracy that handles phone-quality audio well, and the Growth tier drops that to $0.0065/min. AssemblyAI works too, especially if you need built-in speaker diarization, but Deepgram's latency profile is better for real-time outbound where every millisecond counts.

For TTS, Cartesia Sonic 3 leads on latency at roughly 40ms time-to-first-byte and costs about one-fifth what ElevenLabs charges per character. When you are making hundreds or thousands of outbound calls per day, that pricing gap adds up fast. Deepgram Aura is worth considering if you want to consolidate STT and TTS on one platform, though Cartesia's voice quality edges it out for most use cases.

For the LLM, any model with function calling works. Smaller, faster models tend to perform better for outbound because the conversations are structured and predictable. You do not need GPT-4-class reasoning for a lead qualification call. Most production outbound agents run on models in the 7B-70B range or on hosted APIs like GPT-4o-mini or Claude Haiku.
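Function calling is what lets the model trigger actions such as booking a meeting mid-call. The sketch below shows the general shape: a tool described with a JSON Schema for its parameters, plus a dispatcher that validates the arguments the model returns (typically a JSON string) before running a handler. The book_meeting tool, its fields, and the dispatch function are hypothetical, and the exact envelope around the schema varies by LLM provider.

```python
import json

# Hypothetical tool description; parameters use JSON Schema.
TOOLS = {
    "book_meeting": {
        "description": "Book a follow-up meeting with the lead.",
        "parameters": {
            "type": "object",
            "properties": {
                "lead_id": {"type": "string"},
                "start_iso": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["lead_id", "start_iso"],
        },
    }
}


def book_meeting(lead_id: str, start_iso: str) -> dict:
    # Placeholder: a real handler would call your calendar system here.
    return {"status": "booked", "lead_id": lead_id, "start_iso": start_iso}


HANDLERS = {"book_meeting": book_meeting}


def dispatch(tool_name: str, arguments_json: str) -> dict:
    """Validate a model-issued tool call and run the matching handler."""
    if tool_name not in HANDLERS:
        return {"status": "error", "reason": f"unknown tool: {tool_name}"}
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return {"status": "error", "reason": "arguments are not valid JSON"}
    if not isinstance(args, dict):
        return {"status": "error", "reason": "arguments must be a JSON object"}
    required = TOOLS[tool_name]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return {"status": "error", "reason": f"missing: {', '.join(missing)}"}
    return HANDLERS[tool_name](**{key: args[key] for key in required})


print(dispatch("book_meeting", '{"lead_id": "L1", "start_iso": "2026-03-02T15:00:00Z"}'))
```

Returning a structured error instead of raising lets the agent tell the model what went wrong and ask the callee for the missing detail.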

Pipecat orchestrates the full pipeline as an open-source Python framework that streams audio through STT-LLM-TTS with sub-500ms latency. It handles turn detection, interruption handling, and service integration across 50+ providers.

AI-driven cold calling teams achieve 5-10% conversion rates compared to the 2.3% industry average, and personalized AI calls see a 36% higher meeting conversion rate. Pipeline quality directly drives these numbers. Slow responses, bad turn detection, or robotic TTS will tank your conversion rates regardless of how good your scripts are. For a deeper look at latency optimization, see our guide on speech latency in voice agents.
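To make turn detection concrete, here is a deliberately simplified sketch of silence-based endpointing: the turn is declared over once speech has been heard and is followed by a long enough run of silent audio frames. This is not Pipecat's implementation. The function name, the 20ms frame size, and the 600ms silence threshold are assumptions chosen for illustration.

```python
from typing import Optional


def find_turn_end(frames: list[bool], frame_ms: int = 20,
                  silence_ms: int = 600) -> Optional[int]:
    """Return the index of the frame at which the turn is declared finished.

    frames[i] is True when frame i contains speech (as a VAD would report).
    Returns None if no turn end is detected in the given frames.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return index
    return None


# 100ms of leading silence, 200ms of speech, then 600ms of silence.
frames = [False] * 5 + [True] * 10 + [False] * 30
print(find_turn_end(frames))
```

The threshold is the trade-off the text describes: too short and the agent cuts in on mid-sentence pauses, too long and every reply starts with an awkward gap.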

Why the Top Guides Miss the Point

I read through the top-ranking guides for AI outbound calling. The gaps are consistent. ElevenLabs published a strategy guide that reads like a product pitch for their own telephony, voices, and batch calling. No cost analysis anywhere. No acknowledgment that you are locked in at $0.08-0.10/min with no migration path.

RingCentral wrote 3,100 words about AI cold calling without a single architecture decision or cost figure. Telnyx published an industry overview with unsourced claims like "AI will be involved in 100% of customer interactions." None of them are actionable for teams trying to ship production systems.

The pattern across all of them: none mention open-source frameworks, none discuss self-hosting, and none show the actual stack or what it costs to run. Sales teams using AI report a 50% increase in qualified leads, but that outcome does not require a $30k/year platform subscription. It requires a solid voice pipeline, good prompts, and reliable telephony, all achievable with open-source tools.

From Prototype to Production with Dograh

Pipecat gives you the voice pipeline. But going from a working demo to a production outbound system means building agent management, telephony integration, call scheduling, monitoring, and a way for non-developers to edit agent behavior. That is months of engineering work that has nothing to do with voice AI and everything to do with operational plumbing.

Dograh handles that production layer. It is open-source (BSD-2 license), fully self-hostable, and built on top of Pipecat. Dograh gives you the deployment and management infrastructure that Pipecat does not provide, which is the gap between a working demo and running 1,000 calls a day.

The visual workflow builder lets your sales team design agent conversation flows with drag-and-drop. They can edit qualification questions, update objection scripts, or change scheduling logic without writing code. This matters when you are running outbound campaigns that need daily tweaking. Easy editing of live agents is one of the features teams ask for most, and hosted platforms make it unnecessarily hard by burying configuration behind API calls.

Built-in telephony integration supports Twilio, Vonage, Vobiz, and Cloudonix. Connect your SIP trunk, assign phone numbers, and launch outbound campaigns from the dashboard. Bring your own API keys for every component so you are never locked into a specific provider. AI voice agents now handle 50-60% of routine calls autonomously, lowering average cost per contact by 30-50%. With Dograh, that autonomous handling runs on your infrastructure, with your data, at your cost.

Getting Started and Staying Compliant

Install Dograh locally in under two minutes. The GitHub repo has a one-command setup. Connect a Telnyx or Twilio SIP trunk for outbound telephony, add your Deepgram and Cartesia API keys, and build your first outbound agent in the workflow builder. Test with a real call before scaling.

Before you scale up, handle compliance. TCPA requires prior express consent for automated calls to mobile numbers. Scrub your lists against the National Do Not Call Registry. Log consent records with timestamps. Different states have additional rules on top of federal TCPA requirements, so check your target states before launching campaigns. In February 2024 the FCC ruled that AI-generated voices count as "artificial" voices under the TCPA, making proper consent documentation more important than ever. Self-hosting gives you an advantage here because your call data, recordings, and consent records stay on your infrastructure rather than a third party's servers. For regulated industries like healthcare and finance, this is often a hard requirement.
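The scrub-and-log steps can be enforced in code before a number ever reaches the dialer. The sketch below assumes phone numbers are already normalized to one format and that you hold a local copy of the do-not-call numbers relevant to your list. The ConsentRecord fields and the screen function are hypothetical names for illustration, and the rules shown are a minimal subset, not a complete compliance check.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentRecord:
    phone_number: str
    consented_at: datetime  # timezone-aware timestamp of the express consent
    source: str             # where consent was captured, e.g. "web form"


def screen(leads: list[str], dnc_numbers: set[str],
           consents: list[ConsentRecord]) -> dict[str, str]:
    """Label each lead as 'ok', 'dnc', or 'no_consent' before dialing."""
    consented = {record.phone_number for record in consents}
    decisions = {}
    for number in leads:
        if number in dnc_numbers:
            decisions[number] = "dnc"          # do-not-call always wins
        elif number not in consented:
            decisions[number] = "no_consent"
        else:
            decisions[number] = "ok"
    return decisions


now = datetime(2026, 2, 27, 18, 30, tzinfo=timezone.utc)
consents = [
    ConsentRecord("+12025550101", now, "web form"),
    ConsentRecord("+12025550102", now, "web form"),
]
decisions = screen(["+12025550101", "+12025550102", "+12025550103"],
                   {"+12025550102"}, consents)
dialable = [number for number, status in decisions.items() if status == "ok"]
print(dialable)
```

Storing the decision for every number, not just the dialable ones, gives you an audit trail of why a lead was skipped.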

Gartner predicts 40% of enterprise apps will feature task-specific AI agents by 2026, up from less than 5% in 2025. Outbound calling is one of the clearest production use cases. The teams that own their stack now - telephony, pipeline, agent logic - will have a compounding cost and capability advantage as the market matures.

Glossary

SIP Trunking: A protocol for routing voice calls over IP networks. In outbound AI calling, SIP trunks connect your voice agent to the phone network without per-call platform fees.
Voice Pipeline: The sequential processing chain of STT (speech-to-text), LLM (language model), and TTS (text-to-speech) that gives a voice agent the ability to listen, think, and speak in real time.
Turn Detection: The mechanism that determines when a caller has finished speaking so the AI agent can respond. Poor turn detection causes agents to interrupt callers or leave awkward silences.
Per-Minute Pricing: The billing model used by hosted voice AI platforms where every second of call time incurs a charge. At scale, this model makes outbound calling prohibitively expensive compared to self-hosted alternatives.
