VoiceGateway
Self-hosted inference gateway for voice AI. Unified STT + LLM + TTS routing. Your API keys. Local models included. Agent-managed via MCP.
Docs · Quick Start · MCP Setup · Deploy
Why VoiceGateway
Every LLM gateway routes LLMs. None routes the full voice pipeline (STT, LLM, and TTS) through one unified interface with local model support, project-based budgeting, and agent-native management.
VoiceGateway is that missing layer. Drop-in LiveKit compatibility, bring your own API keys, optional local-only operation, and a first-class MCP server that lets Claude Code, Cursor, or Codex configure it for you.
| | LiteLLM | Cloudflare AI Gateway | LiveKit Inference (Cloud) | VoiceGateway |
|---|---|---|---|---|
| LLM routing | ✅ | ✅ | ✅ | ✅ |
| STT routing | ❌ | ❌ | ✅ | ✅ |
| TTS routing | ❌ | ❌ | ✅ | ✅ |
| Local models | Partial | ❌ | ❌ | ✅ |
| Self-hostable | ✅ | ❌ | ❌ | ✅ |
| Project-based budgets | ✅ | ❌ | ❌ | ✅ |
| Fallback chains | Limited | ✅ | Limited | ✅ |
| MCP server | ❌ | ❌ | ❌ | ✅ |
| LiveKit plugin native | ❌ | ❌ | ✅ | ✅ |
| License | MIT | Commercial | Commercial | MIT |
Deploy
One command deploys to Fly.io: public HTTPS URL, persistent storage, MCP-ready.
```shell
git clone https://github.com/mahimailabs/voicegateway
cd voicegateway/deploy/fly
./deploy.sh
```
You get a *.fly.dev URL, an MCP endpoint your coding agent can connect to, and encrypted API key storage. Fly uses pay-as-you-go pricing (~$1-3/month for light use; volumes billed even when suspended). Deployment guide →
Other options: Docker Compose locally, Hetzner/Oracle for cheap self-host, or any Docker host. See docs.voicegateway.dev/guide/installation.
Quick Start
Option 1: pip install (recommended for development)
```shell
pip install "voicegateway[cloud,dashboard,mcp]"
voicegw init       # creates voicegw.yaml
voicegw status     # verify providers
voicegw dashboard  # http://localhost:9090
```
Option 2: Docker (production-ready)
Pull the official image from Docker Hub — no build required:
```shell
docker run -p 8080:8080 \
  -v $(pwd)/voicegw-data:/data \
  -e OPENAI_API_KEY=sk-... \
  -e DEEPGRAM_API_KEY=dg_... \
  mahimairaja/voicegateway:latest
```
Multi-arch images for linux/amd64 and linux/arm64.
Docker Hub →
Option 3: Docker Compose (recommended for self-hosting)
```yaml
# docker-compose.yml
services:
  voicegateway:
    image: mahimairaja/voicegateway:latest
    ports: ["8080:8080"]
    volumes: ["./voicegw-data:/data"]
    env_file: .env
  dashboard:
    image: mahimairaja/voicegateway-dashboard:latest
    ports: ["9090:9090"]
    volumes: ["./voicegw-data:/data:ro"]
    depends_on: [voicegateway]
```

```shell
cp .env.example .env   # edit with your API keys
docker compose up -d
open http://localhost:9090
```
Your first agent
```python
from voicegateway import Gateway
from livekit.agents import AgentSession

gw = Gateway()
session = AgentSession(
    stt=gw.stt("deepgram/nova-3"),
    llm=gw.llm("openai/gpt-4o-mini"),
    tts=gw.tts("cartesia/sonic-3:voice_id"),
)
```
Full tutorial: docs.voicegateway.dev/guide/first-agent
Manage from your coding agent (MCP)
VoiceGateway ships a first-class Model Context Protocol server. Your Claude Code, Cursor, or Codex instance can configure providers, create projects, check costs, and tail logs, all through natural language.
Local (stdio)
```shell
pip install "voicegateway[mcp]"
claude mcp add voicegateway --command "voicegw mcp --transport stdio"
```
Remote (HTTP/SSE with bearer auth)
```shell
export VOICEGW_MCP_TOKEN=$(openssl rand -hex 32)
voicegw mcp --transport http --port 8090
```
Then in Claude Code:
```shell
claude mcp add voicegateway \
  --transport sse \
  --url https://your-host.fly.dev/mcp/sse \
  --header "Authorization: Bearer $VOICEGW_MCP_TOKEN"
```
What you can ask your agent
- "List all my providers"
- "Add Deepgram with API key dg_live_..."
- "Create a project for Tony's Pizza with a $5 daily budget using the premium stack"
- "Show me yesterday's costs for tonys-pizza"
- "What's our P95 TTFB this week?"
- "Delete the dev-testing project" (agent shows preview, asks for confirmation)
17 tools available
| Category | Tools |
|---|---|
| Observability | get_health, get_provider_status, get_costs, get_latency_stats, get_logs |
| Providers | list_providers, get_provider, test_provider, add_provider, delete_provider |
| Models | list_models, register_model, delete_model |
| Projects | list_projects, get_project, create_project, delete_project |
Destructive operations (delete_*) require explicit confirm=True: the agent receives a preview with impact details first and only deletes after you confirm. Full tool reference →
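The confirm-gated flow can be sketched as a two-call protocol: the first call returns a preview, and only a second call with confirm=True mutates state. This is a minimal illustration with hypothetical names, not VoiceGateway's actual MCP tool code:

```python
def delete_project(name: str, projects: dict, confirm: bool = False) -> dict:
    """Sketch of a confirm-gated destructive tool (hypothetical signature)."""
    if name not in projects:
        return {"error": f"unknown project: {name}"}
    if not confirm:
        # First call: return an impact preview instead of deleting.
        project = projects[name]
        return {
            "preview": True,
            "would_delete": name,
            "tags": project.get("tags", []),
            "hint": "call again with confirm=True to delete",
        }
    projects.pop(name)
    return {"deleted": name}

projects = {"dev-testing": {"tags": ["development"]}}
print(delete_project("dev-testing", projects))                # preview only
print(delete_project("dev-testing", projects, confirm=True))  # actually deletes
```

The two-call shape is what lets the agent show you the preview and wait for your confirmation before issuing the destructive call.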
Projects
Organize agents into projects for per-project budgets and cost tracking:
```yaml
# voicegw.yaml
projects:
  restaurant-agent:
    name: "Restaurant Receptionist"
    description: "AI receptionist for Tony's Pizza"
    default_stack: premium
    daily_budget: 5.00
    budget_action: warn  # warn | throttle | block
    tags: ["production", "client-ian"]
  dev-testing:
    name: "Development Testing"
    default_stack: local
    daily_budget: 0.00
    tags: ["development"]

stacks:
  premium:
    stt: deepgram/nova-3
    llm: openai/gpt-4o-mini
    tts: cartesia/sonic-3
  local:
    stt: local/whisper-large-v3
    llm: ollama/qwen2.5:3b
    tts: local/kokoro
```
Use in code:
```python
gw = Gateway()

# Tag requests with a project for cost attribution
stt = gw.stt("deepgram/nova-3", project="restaurant-agent")

# Or use a named stack (all three modalities at once)
stt, llm, tts = gw.stack("premium", project="restaurant-agent")

# Query project costs
gw.costs("today", project="restaurant-agent")
```
CLI:
```shell
voicegw projects                          # list all projects
voicegw project restaurant-agent          # project details
voicegw costs --project restaurant-agent  # project costs today
voicegw logs --project restaurant-agent   # recent requests
```
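The three budget_action modes can be illustrated with a simplified enforcement check. This is a sketch of the semantics implied by the config (warn logs and allows, throttle delays, block rejects), not VoiceGateway's actual budget enforcer:

```python
import time

def enforce_budget(spent: float, daily_budget: float, action: str) -> str:
    """Decide what to do with the next request given today's spend (sketch)."""
    if spent < daily_budget:
        return "allow"
    if action == "warn":
        print(f"budget warning: ${spent:.2f} of ${daily_budget:.2f} spent")
        return "allow"
    if action == "throttle":
        time.sleep(0.1)  # illustrative delay before allowing the request
        return "allow"
    return "reject"  # block

print(enforce_budget(4.20, 5.00, "block"))  # under budget, always allowed
print(enforce_budget(5.10, 5.00, "warn"))   # over budget, warns but allows
print(enforce_budget(5.10, 5.00, "block"))  # over budget, rejected
```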
Fallback Chains
Automatic failover across providers: stay running even when a cloud provider has an outage.
```yaml
# voicegw.yaml
fallbacks:
  stt: [deepgram/nova-3, groq/whisper-large-v3, local/whisper-large-v3]
  llm: [openai/gpt-4o-mini, groq/llama-3.3-70b, ollama/qwen2.5:7b]
  tts: [cartesia/sonic-3, elevenlabs/eleven_turbo_v2_5, local/kokoro]
```
```python
session = AgentSession(
    stt=gw.stt_with_fallback(),
    llm=gw.llm_with_fallback(),
    tts=gw.tts_with_fallback(),
)
```
If Deepgram returns 500s, requests automatically route to Groq. If both fail, local Whisper kicks in. Your agent never goes offline.
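The core of a fallback chain is simple: try each provider in order and return the first success. A minimal sketch (VoiceGateway's router additionally tracks provider health and latency; flaky_call is a stand-in for a real provider request):

```python
def with_fallback(chain, call):
    """Try each model in the chain; return (model, result) on first success."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except Exception as exc:
            errors.append((model, exc))  # remember the failure, try the next
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_call(model):
    # Simulate Deepgram returning 500s while other providers are healthy.
    if model == "deepgram/nova-3":
        raise ConnectionError("500 from provider")
    return f"transcript via {model}"

model, result = with_fallback(
    ["deepgram/nova-3", "groq/whisper-large-v3", "local/whisper-large-v3"],
    flaky_call,
)
print(model)  # groq/whisper-large-v3
```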
Supported Models
11 providers across cloud and local. Add more with one line in voicegw.yaml or let your coding agent do it via MCP.
STT
| Model ID | Provider | Type |
|---|---|---|
| deepgram/nova-3 | Deepgram | cloud |
| deepgram/nova-2-conversationalai | Deepgram | cloud |
| assemblyai/universal-2 | AssemblyAI | cloud |
| openai/whisper-1 | OpenAI | cloud |
| groq/whisper-large-v3 | Groq | cloud |
| local/whisper-large-v3 | faster-whisper | local |
| local/whisper-turbo | faster-whisper | local |
LLM
| Model ID | Provider | Type |
|---|---|---|
| openai/gpt-4.1 | OpenAI | cloud |
| openai/gpt-4o | OpenAI | cloud |
| openai/gpt-4o-mini | OpenAI | cloud |
| anthropic/claude-opus-4-7 | Anthropic | cloud |
| anthropic/claude-sonnet-4-6 | Anthropic | cloud |
| anthropic/claude-haiku-4-5 | Anthropic | cloud |
| groq/llama-3.3-70b-versatile | Groq | cloud |
| groq/llama-3.1-8b-instant | Groq | cloud |
| ollama/qwen2.5:7b | Ollama | local |
| ollama/qwen2.5:3b | Ollama | local |
| ollama/llama3.2:3b | Ollama | local |
TTS
| Model ID | Provider | Type |
|---|---|---|
| cartesia/sonic-3 | Cartesia | cloud |
| elevenlabs/eleven_turbo_v2_5 | ElevenLabs | cloud |
| elevenlabs/eleven_flash_v2_5 | ElevenLabs | cloud |
| deepgram/aura-2 | Deepgram | cloud |
| openai/tts-1-hd | OpenAI | cloud |
| local/kokoro | Kokoro ONNX | local |
| local/piper | Piper | local |
Full reference: docs.voicegateway.dev/configuration/providers
Architecture
```mermaid
flowchart TB
    A[LiveKit Agent] --> B[VoiceGateway]
    B --> C[Router]
    C --> D[Cloud Providers]
    C --> E[Local Providers]
    D --> D1[OpenAI · Deepgram · Anthropic · Cartesia · Groq · ElevenLabs · AssemblyAI]
    E --> E1[Ollama · Whisper · Kokoro · Piper]
    B --> F[Middleware Pipeline]
    F --> F1[Cost Tracker]
    F --> F2[Latency Monitor]
    F --> F3[Budget Enforcer]
    F --> F4[Fallback Router]
    F --> G[(SQLite · encrypted)]
    G --> H[Dashboard UI]
    G --> I[MCP Server]
    I --> J[Claude Code · Cursor · Codex]
```
Dashboard
A self-hosted web UI at http://localhost:9090 with:
- Overview - total requests, cost today, active models, project summary cards
- Settings - add/edit providers, register models, manage general config with Source badges (YAML vs Custom vs Env)
- Projects - full CRUD with budget gauges, cost charts, recent requests per project
- Costs - daily spend with per-provider/model/project breakdown
- Latency - P50/P95/P99 TTFB and total latency per model
- Logs - recent requests with filters for project, modality, status
API keys are encrypted with Fernet before storage. The sidebar project switcher filters every page.
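Fernet encryption at rest can be illustrated with the cryptography package. A minimal sketch of the pattern (key management and storage details here are assumptions, not VoiceGateway's actual code):

```python
from cryptography.fernet import Fernet

# Generate a symmetric key once and keep it outside the database;
# only the ciphertext would be written to SQLite.
key = Fernet.generate_key()
fernet = Fernet(key)

token = fernet.encrypt(b"sk-example-api-key")  # ciphertext stored on disk
plain = fernet.decrypt(token)                  # decrypted only when needed
print(plain == b"sk-example-api-key")          # True
```

Fernet tokens are authenticated as well as encrypted, so a tampered row fails to decrypt rather than yielding a corrupted key.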
HTTP API
```shell
voicegw serve --port 8080
```
| Endpoint | Purpose |
|---|---|
| GET /health | Health check |
| GET /v1/status | Provider health + model count |
| GET /v1/models | List registered models |
| GET /v1/providers + CRUD | Manage providers |
| GET /v1/projects + CRUD | Manage projects |
| GET /v1/costs?period=today&project=X | Cost summary |
| GET /v1/latency?period=week | Latency stats |
| GET /v1/logs?project=X&modality=stt | Request logs |
| GET /v1/audit-log | Config change history |
| GET /v1/metrics | Prometheus-format metrics |
Full reference: docs.voicegateway.dev/api/http-api
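The API is plain HTTP, so any client works. A standard-library sketch of querying the costs endpoint (the endpoint path is from the table above; the host, port, and response shape are assumptions):

```python
import json
import urllib.parse
import urllib.request

def costs_url(base: str, period: str, project: str) -> str:
    """Build a /v1/costs query URL with properly encoded parameters."""
    query = urllib.parse.urlencode({"period": period, "project": project})
    return f"{base}/v1/costs?{query}"

url = costs_url("http://localhost:8080", "today", "restaurant-agent")
print(url)  # http://localhost:8080/v1/costs?period=today&project=restaurant-agent

def get_costs(url: str) -> dict:
    # Requires a running gateway; not executed in this sketch.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```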
Installation
```shell
# Core engine
pip install voicegateway

# With web dashboard
pip install "voicegateway[dashboard]"

# With cloud providers (OpenAI, Deepgram, Anthropic, etc.)
pip install "voicegateway[cloud]"

# With local model runtimes (Whisper, Kokoro, Piper)
pip install "voicegateway[local]"

# With MCP server for agent management
pip install "voicegateway[mcp]"

# Everything
pip install "voicegateway[all,dashboard,mcp]"
```
Python 3.11+. MCP extra pulls in mcp>=1.2.0. Local extras pull larger ML runtimes.
Docker Compose
```yaml
services:
  voicegateway:
    image: mahimailabs/voicegateway:latest
    ports: ["8080:8080"]
    env_file: .env
    volumes:
      - ./voicegw.yaml:/app/voicegw.yaml:ro
      - voicegw_data:/data
  dashboard:
    image: mahimailabs/voicegateway-dashboard:latest
    ports: ["9090:9090"]
    depends_on: [voicegateway]
  # Optional: local LLM with Ollama
  ollama:
    image: ollama/ollama
    profiles: [local]
    ports: ["11434:11434"]

volumes:
  voicegw_data:
```

```shell
docker compose up -d                  # core + dashboard
docker compose --profile local up -d  # + Ollama for local LLMs
docker exec voicegateway-ollama ollama pull qwen2.5:3b
```
Contributing
We welcome provider additions, bug fixes, and documentation improvements.
```shell
git clone https://github.com/mahimailabs/voicegateway
cd voicegateway
pip install -e ".[all,dashboard,mcp,dev]"
pytest
```
Add a provider (10-step guide): docs.voicegateway.dev/contributing/adding-a-provider
Before submitting a PR, please read CONTRIBUTING.md and CODE_OF_CONDUCT.md.
License
MIT © Mahimai Labs
Built on the shoulders of giants: LiveKit Agents, FastAPI, Pydantic, cryptography, Model Context Protocol.