Asterisk AI PBX Guide 2026: Transform Your Phone System into a Conversational AI Agent

✓ Updated: March 2026  ·  By the AIO Orchestration team  ·  Reading time: ~8 min

The Dawn of the Conversational PBX: Asterisk in 2026

[Figure: AI orchestration platform flow diagram showing the Asterisk AI PBX architecture with LLM, STT, and TTS integration]

Welcome to 2026, where the humble phone call has evolved into a dynamic, intelligent conversation. The technology powering this revolution isn't a new, proprietary black box, but a seasoned veteran of the telephony world: Asterisk. For over two decades, Asterisk has been the open-source engine of communication, and today, it stands as the ultimate platform for building a true Asterisk AI PBX. This guide will show you how to transform your standard Asterisk server from a call router into a sophisticated, conversational AI agent capable of understanding, reasoning, and responding in real-time.

The concept of an Asterisk AI PBX is no longer a futuristic dream. It's a practical reality achieved by integrating modern artificial intelligence stacks directly into Asterisk's call flow. We're moving beyond rigid IVRs ("Press 1 for sales...") into a world of fluid, natural language interactions. Imagine a system that can book appointments, answer complex product questions, triage support tickets, and even detect caller sentiment—all without human intervention. This is the promise of Asterisk voice AI 2026, and this comprehensive guide is your blueprint.

Why Asterisk Remains the Bedrock of Telephony AI

In an era dominated by cloud-native, API-driven services, why does a mature platform like Asterisk continue to be the foundation for cutting-edge voice AI? The answer lies in its core design principles, which are more relevant today than ever before.

Expert Take: The future of telephony isn't about replacing Asterisk; it's about augmenting it. The core call control and media handling that Asterisk excels at are the perfect "plumbing" for the "intelligence" provided by modern LLMs and voice models. It's the ultimate symbiotic relationship.

Choosing Your AI Gateway: EAGI vs. AGI vs. ARI

To bridge the world of telephony (Asterisk) and the world of AI (your scripts and models), Asterisk provides several powerful interfaces. Choosing the right one is the most critical architectural decision you'll make for your Asterisk artificial intelligence project.

AGI: The Classic Request-Response

The Asterisk Gateway Interface (AGI) is the original and most straightforward method. When a call hits an AGI command in the dialplan, Asterisk launches your script (in Python, Perl, PHP, etc.) and communicates with it over standard input and standard output (stdin/stdout). It works on a simple command-response basis: your script receives call variables, you send commands back like `STREAM FILE` or `GET DATA`, and Asterisk executes them.
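A minimal sketch of that protocol (the helper names here are hypothetical; the `agi_*` variable block and the command/result framing are standard AGI):

```python
import sys

def parse_agi_env(lines):
    """Parse the 'agi_variable: value' header block Asterisk sends on stdin."""
    env = {}
    for line in lines:
        line = line.strip()
        if not line:  # a blank line terminates the header block
            break
        key, _, value = line.partition(": ")
        env[key] = value
    return env

def send_agi_command(command: str) -> None:
    """Write one AGI command; Asterisk replies with a 'result=' line on stdin."""
    sys.stdout.write(command + "\n")
    sys.stdout.flush()

# Typical start of an AGI script:
# env = parse_agi_env(sys.stdin)
# send_agi_command('STREAM FILE welcome ""')
```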

EAGI: The Real-Time Audio Stream Champion

The Enhanced Asterisk Gateway Interface (EAGI) is the game-changer for Asterisk voice AI 2026. It functions like AGI but with one monumental difference: in addition to stdin (file descriptor 0) and stdout (file descriptor 1), Asterisk passes the raw, inbound audio stream to your script on file descriptor 3.

This means your script can read the caller's voice in real-time, as they are speaking. You can pipe this 8kHz, 16-bit linear audio directly into a Speech-to-Text (STT) engine. This non-blocking, streaming approach is precisely what's needed to build a responsive conversational agent.

For 99% of conversational Asterisk LLM integration use cases, EAGI is the correct choice. It provides the most direct and performant path for the audio data that fuels your AI.
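A quick sanity check on the numbers involved: slin@8000 means 8000 samples per second at 2 bytes each, so a 20 ms read from file descriptor 3 is exactly 320 bytes. A tiny, illustrative helper for sizing those reads:

```python
SAMPLE_RATE = 8000    # Hz, slin@8000 as delivered on fd 3
BYTES_PER_SAMPLE = 2  # 16-bit signed linear PCM

def chunk_bytes(ms: int) -> int:
    """Bytes of slin@8000 audio covering `ms` milliseconds."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * ms // 1000

# 20 ms reads line up with typical RTP packetization intervals.
```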

ARI: The Application Controller

The Asterisk REST Interface (ARI) is a more modern, asynchronous interface that allows external applications to control and build communication logic. Instead of Asterisk calling a script, your application connects to Asterisk via a WebSocket and receives JSON events about calls, channels, and more. It can then send commands back via a REST API.
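To give a feel for the ARI model, here is a sketch of event dispatch. `StasisStart`, `StasisEnd`, and `ChannelDtmfReceived` are real ARI event types; the handler names are invented, and a real application would register callbacks rather than return strings:

```python
import json

def dispatch_ari_event(raw: str) -> str:
    """Map a raw ARI JSON event (as received over the WebSocket) to a handler name."""
    event = json.loads(raw)
    handlers = {
        "StasisStart": "on_call_entered_app",
        "StasisEnd": "on_call_left_app",
        "ChannelDtmfReceived": "on_dtmf",
    }
    return handlers.get(event.get("type"), "on_unhandled")
```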

Blueprint for an Asterisk AI PBX: The Core Architecture

Now, let's assemble the pieces. Building a functional Asterisk AI PBX involves a clear, linear flow of data from the phone network to your AI models and back again. This architecture is designed for low latency and leverages the power of EAGI.

  1. Call Arrival: A call comes into your system via a SIP trunk or a local PJSIP endpoint. The Asterisk core, specifically the `chan_pjsip` module, handles the signaling and establishes the media session.
  2. Dialplan Routing: Your `extensions.conf` dialplan receives the call. A specific extension is configured to answer the call and immediately execute an EAGI script. This is the handoff point from Asterisk's static logic to your dynamic AI logic.
    
    [from-outside-ai]
    exten => s,1,Answer()
    exten => s,n,Verbose(1, "Handing call off to AI Agent EAGI script...")
    exten => s,n,EAGI(agent.py)
    exten => s,n,Hangup()
        
  3. EAGI Audio Ingress: Your EAGI script (e.g., `agent.py`) is now running. It immediately begins reading the raw 16-bit signed-linear audio at an 8000Hz sample rate from file descriptor 3. This audio is piped directly to a Speech-to-Text (STT) process.
  4. Speech-to-Text (STT): The STT engine (e.g., Whisper.cpp) receives the audio stream and performs real-time transcription. As it recognizes words or phrases, it outputs the resulting text. This text is captured by your EAGI script.
  5. Large Language Model (LLM) Processing: The transcribed text is formatted into a prompt and sent via an API call to a Large Language Model (e.g., a local Llama 3 model served by Ollama). The LLM processes the input, accesses any necessary tools or data, and generates a text-based response.
  6. Text-to-Speech (TTS) Synthesis: The LLM's text response is sent to a Text-to-Speech (TTS) engine (e.g., Coqui XTTS). The TTS engine synthesizes the text into a natural-sounding audio waveform (e.g., a WAV file).
  7. Audio Playback: Your EAGI script uses an AGI command like `STREAM FILE` to send the generated audio file back to Asterisk. Asterisk plays this audio to the caller over the active phone channel. The loop then repeats, waiting for the caller's next utterance.

This entire loop—from the end of the caller's speech to the beginning of the AI's response—must happen in under a second to feel natural. This is where performance tuning becomes critical.
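Strung together, the pipeline amounts to one loop per caller utterance. A skeleton of that loop, where every callable is a hypothetical placeholder you wire to real STT/LLM/TTS tooling:

```python
def conversation_turn(read_utterance, stt, llm, tts, play):
    """One ear-to-mouth cycle: caller audio in, synthesized reply out.
    All five arguments are caller-supplied callables (illustrative interfaces)."""
    audio = read_utterance()   # raw slin@8000 bytes from fd 3
    if not audio:
        return None            # stream closed: the caller hung up
    text = stt(audio)          # Speech-to-Text
    reply = llm(text)          # LLM generates the response text
    wav_path = tts(reply)      # synthesize the reply to an audio file
    play(wav_path)             # STREAM FILE it back to the caller
    return reply
```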

Target latency budget: ~150 ms STT · ~300 ms LLM time-to-first-token (TTFT) · ~250 ms TTS · <800 ms total response latency.

Essential Asterisk Concepts for AI Integration

To successfully implement the architecture above, you need to be comfortable with a few core Asterisk configuration concepts. These are the levers you'll pull to connect everything together.

The Brains: extensions.conf Dialplan

The dialplan (`/etc/asterisk/extensions.conf`) is the heart of Asterisk, dictating how calls are handled. For an AI enhanced Asterisk system, its primary job is to route incoming calls to your EAGI script. You'll define a context that contains the extension for your AI agent.


[general]
; General settings

[from-pstn-trunk]
; This context handles calls from your main phone number
exten => _+1NXXNXXXXXX,1,NoOp(Call from ${CALLERID(num)} to ${EXTEN})
exten => _+1NXXNXXXXXX,n,Goto(ai-receptionist,s,1)

[ai-receptionist]
; The context for our AI agent
exten => s,1,Answer()
; Set some variables for the EAGI script
exten => s,n,Set(CHANNEL(language)=en)
; Execute the EAGI script. The script must be executable and in the agi-bin directory.
exten => s,n,EAGI(ai_receptionist.py,${UNIQUEID})
; If the script exits, hang up the call.
exten => s,n,Hangup()

exten => h,1,NoOp(Call hung up)

The Gateway: PJSIP Endpoints

PJSIP is the modern SIP channel driver in Asterisk. You'll use `pjsip.conf` to configure your SIP trunks (from providers like Twilio or Bandwidth) and any local SIP endpoints (like softphones for testing). A key setting is the `context`, which tells Asterisk where to send the call in the dialplan.


; /etc/asterisk/pjsip.conf

[my-sip-trunk]
type=endpoint
transport=transport-udp
context=from-pstn-trunk ; This is the critical link to the dialplan
disallow=all
allow=ulaw
aors=my-sip-trunk
...

The Lifeblood: Audio Formats and Codecs

Telephony audio is different from high-fidelity music. The standard codec is G.711 (either µ-law in North America/Japan or a-law elsewhere). This is an 8-bit companded format sampled at 8000Hz. When Asterisk passes this audio to your EAGI script via file descriptor 3, it helpfully converts it to a more usable format: 16-bit signed-linear PCM, 8000Hz, mono. This is often referred to as `slin@8000`.
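Asterisk performs this expansion for you before the audio reaches fd 3, but seeing the µ-law math makes "companded" concrete. An illustrative helper implementing the standard ITU-T G.711 µ-law expansion:

```python
def ulaw_to_linear(byte_val: int) -> int:
    """Expand one 8-bit mu-law sample to 16-bit signed linear (ITU-T G.711)."""
    u = ~byte_val & 0xFF            # mu-law bytes are stored bit-complemented
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude
```

The 8-bit format packs a ~14-bit dynamic range by spending more resolution on quiet samples, which is why it survives on telephone links.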

Crucial Tip: Your Speech-to-Text model must be able to process 8kHz audio. Many off-the-shelf models are trained on 16kHz audio and will perform poorly with telephone-quality sound. Models like Whisper have been trained on a wide variety of audio and handle 8kHz well, but you must configure your STT tool to expect it.
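If a model you want insists on 16 kHz input, a naive 2x linear-interpolation upsample is often acceptable for telephone-quality audio. A proper polyphase resampler (e.g., `scipy.signal.resample_poly`) will do better; this sketch is only illustrative:

```python
def upsample_2x(samples):
    """Double the sample rate of 16-bit PCM samples by linear interpolation."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        # Interpolate toward the next sample; repeat the last one at the end.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) // 2)
    return out
```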

The Magic Pipe: EAGI's File Descriptor 3

This concept cannot be overstated. In your EAGI script (e.g., in Python), you will open and read from file descriptor 3 to get the audio.


import sys
import os

# In Python, os.fdopen() can wrap a file descriptor in a file-like object.
# fd 0 is stdin, 1 is stdout, 2 is stderr, 3 is our audio!
audio_stream = os.fdopen(3, 'rb')

while True:
    # Read 320 bytes of audio data (20ms of 8kHz, 16-bit audio)
    audio_chunk = audio_stream.read(320)
    if not audio_chunk:
        break
    #...pipe this chunk to your STT process...
This provides a continuous, low-latency stream of the caller's voice, ready for AI processing.

Practical Integration: Building Your Agent with Ollama, Whisper, and XTTS

Let's get practical. The beauty of the Asterisk LLM integration is using powerful, locally-hosted open-source models to maintain privacy and control costs. Here’s a breakdown of the toolchain.

Step 1: Real-Time Transcription with Whisper

OpenAI's Whisper is the de facto standard for open-source STT. For real-time performance, we use a C++ port like `whisper.cpp`. Your EAGI script will spawn `whisper.cpp` as a subprocess, piping the audio from file descriptor 3 directly into its standard input.

Example Command (within your EAGI script):


import subprocess

# The EAGI script pipes audio from fd3 to this subprocess's stdin.
# Note: binary names and flags differ between whisper.cpp releases (real-time
# use is built around its separate 'stream' example); treat these options as
# illustrative rather than canonical.
stt_process = subprocess.Popen([
    "/path/to/whisper.cpp/main",
    "-m", "/path/to/ggml-base.en.bin",  # Path to the Whisper model
    "-t", "8",                          # Number of threads
    "--step", "4000",                   # Process audio in 4-second steps
    "--length", "8000",                 # Keep 8 seconds of audio context
    "-l", "en",                         # Language
    "-otxt",                            # Output as plain text
    "-",                                # Read audio from stdin
], stdin=audio_stream, stdout=subprocess.PIPE)
Your script then reads the transcribed text from `stt_process.stdout`.

Step 2: Intelligent Response with Local LLMs via Ollama

Ollama is a fantastic tool for serving and running large language models locally. Once installed, you can pull and run a model like Llama 3 or Mistral with a single command (`ollama run llama3`). Ollama exposes a simple REST API on `localhost:11434`.

From your EAGI script, once you have transcribed text from Whisper, you make a standard HTTP POST request to the Ollama API.

Example API Call (using Python's `requests` library):


import requests

def get_llm_response(text_from_whisper):
    prompt = f"The user said: '{text_from_whisper}'. You are a helpful AI assistant. Respond concisely."
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",
            "prompt": prompt,
            "stream": False  # Get the full response at once
        },
        timeout=30,  # Don't let a stalled model hang the call
    )
    return response.json()["response"]
This gives you the AI's generated text response, ready for synthesis.
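The call above deliberately sets "stream": False. Switching it to True makes Ollama return newline-delimited JSON fragments as tokens are generated, which is how you chase a low time-to-first-token. Joining those fragments is simple; a sketch against Ollama's documented streaming response shape:

```python
import json

def join_stream_chunks(ndjson_lines):
    """Accumulate the 'response' fragments from Ollama's streaming output."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

In a real agent you would forward each fragment to the TTS stage as it arrives rather than waiting for the joined string.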

Step 3: Human-like Speech with Coqui XTTS

Coqui's XTTS is a leading open-source, multilingual Text-to-Speech model that offers excellent voice quality and cloning capabilities. You can run it via its own API server. Your EAGI script takes the text from Ollama and sends it to the XTTS server to generate audio.

Example API Call:


# Assuming an XTTS API server is running on localhost:8020
tts_response = requests.post(
    "http://localhost:8020/tts",
    json={
        "text": text_from_llm,
        "speaker_wav": "/path/to/your_voice_sample.wav",  # For voice cloning
        "language": "en"
    }
)

# Save the returned audio content to a temporary file.
# Note: Asterisk's wav format expects 8kHz 16-bit mono PCM, while TTS models
# typically emit 24kHz audio; resample before playback or it will be rejected.
temp_wav_path = f"/tmp/{unique_call_id}.wav"
with open(temp_wav_path, 'wb') as f:
    f.write(tts_response.content)

# Now, tell Asterisk to play this file (STREAM FILE takes the path without extension)
sys.stdout.write(f'STREAM FILE {temp_wav_path.replace(".wav", "")} ""\n')
sys.stdout.flush()
Asterisk plays the generated `.wav` file, and the conversation loop is complete.

Tuning for Speed: Performance Optimization

Latency is the enemy of natural conversation. An Asterisk AI PBX must be tuned for speed. Your goal is to minimize the "ear-to-mouth" delay.
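A practical first step is simply measuring where the milliseconds go, stage by stage. A small, hypothetical instrumentation helper:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage: str):
    """Record the wall-clock duration of a pipeline stage, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

# Usage inside the agent loop:
with timed("stt"):
    pass  # ... run transcription ...
```

Logging `timings` at the end of each turn tells you immediately which stage is blowing the sub-second budget.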

Keeping Watch: Monitoring and Debugging Your AI Agent

When things go wrong, you need visibility. The Asterisk CLI is your best friend.

By combining Asterisk's native debugging with robust logging in your application script, you can quickly diagnose issues anywhere in the call flow.
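On the script side, one pitfall deserves special care: in an (E)AGI script, stdout is the command channel to Asterisk, so a stray `print()` for debugging will corrupt the protocol. Send diagnostics to a log file (or stderr) instead. A minimal, illustrative setup:

```python
import logging

def setup_agent_logging(call_id: str) -> logging.Logger:
    """Log to a per-call file; never to stdout (stdout is the AGI channel)."""
    logger = logging.getLogger(f"ai-agent.{call_id}")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(f"/tmp/ai-agent-{call_id}.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Passing `${UNIQUEID}` from the dialplan as the `call_id` gives you one log file per call, which pairs nicely with the Asterisk CLI output.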

Scaling Up: Building a Multi-Tenant Asterisk AI PBX

A single, powerful server can serve multiple businesses, each with its own unique AI persona, knowledge base, and phone number. This is the essence of a multi-tenant Asterisk AI PBX.

The key is to use Asterisk's contexts and a database to isolate tenants. Here's the strategy:

  1. Database Schema: Create a database (e.g., PostgreSQL) with tables for `tenants`, `phone_numbers`, and `ai_configurations`. The `phone_numbers` table maps an incoming DID to a `tenant_id`. The `ai_configurations` table stores the prompt, voice model, and other settings for each tenant.
  2. Dynamic Dialplan: Use database integration such as `res_odbc`/`func_odbc` (the snippet below uses the `app_mysql` add-on) to have Asterisk query the database when a call arrives. The dialplan looks up the incoming DID in the `phone_numbers` table to identify the tenant.
  3. Passing Tenant Info: The dialplan then passes the `tenant_id` as an argument to the EAGI script.
    
    exten => _+1NXXNXXXXXX,1,MYSQL(Connect connid localhost user pass asterisk)
    exten => _+1NXXNXXXXXX,n,MYSQL(Query resultid ${connid} SELECT tenant_id FROM phone_numbers WHERE did='${EXTEN}')
    exten => _+1NXXNXXXXXX,n,MYSQL(Fetch fetchid ${resultid} TENANT_ID)
    exten => _+1NXXNXXXXXX,n,MYSQL(Clear ${resultid})
    exten => _+1NXXNXXXXXX,n,MYSQL(Disconnect ${connid})
    exten => _+1NXXNXXXXXX,n,EAGI(agent.py,${TENANT_ID})
        
  4. Tenant-Specific Logic: The EAGI script uses the `tenant_id` to fetch the correct AI configuration from the database. It uses the right prompt, the right voice, and accesses the right knowledge base for that specific business.

This architecture allows you to scale your service efficiently, onboarding new clients simply by adding rows to a database, without ever touching your Asterisk configuration files.
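The per-tenant lookup in step 4 can be sketched with SQLite standing in for the real PostgreSQL schema. Table and column names mirror the ones above; the helper itself is hypothetical:

```python
import sqlite3

def get_tenant_config(db, tenant_id: int) -> dict:
    """Fetch the per-tenant AI settings the EAGI script needs for this call."""
    row = db.execute(
        "SELECT prompt, voice_model FROM ai_configurations WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no AI configuration for tenant {tenant_id}")
    return {"prompt": row[0], "voice_model": row[1]}
```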

Frequently Asked Questions

Can I use cloud-based AI services like OpenAI API or Google STT instead of local models?

Absolutely. The architecture remains the same. Instead of making an API call to `localhost`, your EAGI script would make a secure, authenticated API call to the cloud service's endpoint. The main trade-offs are cost and latency. Cloud services typically charge per API call or per minute of audio, which can become expensive at scale. They also introduce network latency, which can make the conversation feel less responsive compared to a well-tuned local setup.

What kind of hardware is required for a production Asterisk AI PBX?

For a production system handling multiple concurrent calls, you should invest in a dedicated server. A good starting point would be a modern multi-core CPU (e.g., AMD EPYC or Intel Xeon), 64GB+ of RAM, and at least one high-end NVIDIA GPU with 16GB or more of VRAM (e.g., an RTX 4080 or an A-series data center GPU). Fast NVMe storage is also crucial for quick loading of models and audio files. For a small-scale test, a desktop with a consumer GPU like an RTX 3060 can be sufficient.

Is EAGI always better than ARI for AI voice agents?

For the specific task of building a single-channel, real-time conversational agent, EAGI is almost always the more direct, lower-latency, and simpler solution. Its direct access to the audio stream is purpose-built for this use case. ARI becomes a better choice when you need to build a more complex application that manages the state of *many* calls simultaneously, like a dynamic conferencing system or a third-party CTI dashboard that needs to originate, bridge, and record calls on behalf of users.
