# VoiceGateway

Self-hosted inference gateway for voice AI. One config. Any provider. Local models included.

A drop-in routing layer that gives self-hosters the same `provider/model` developer experience as LiveKit Inference Cloud — but with your own API keys, local models (Ollama, Whisper, Kokoro, Piper), automatic fallback chains, project-based cost tracking, and a web dashboard.
## Quick Start (Docker Compose)

```bash
# 1. Clone
git clone https://github.com/mahimailabs/voicegateway.git
cd voicegateway

# 2. Configure
cp .env.example .env
# Edit .env with your API keys

# 3. Start everything
docker compose up -d

# 4. Open the dashboard
open http://localhost:9090

# 5. (Optional) Start with a local LLM
docker compose --profile local up -d
docker exec voicegateway-ollama ollama pull qwen2.5:3b
```
## Installation

Core engine (recommended):

```bash
pip install voicegateway
```

With web dashboard:

```bash
pip install "voicegateway[dashboard]"
```

With all cloud providers:

```bash
pip install "voicegateway[cloud]"
```

Everything:

```bash
pip install "voicegateway[all,dashboard]"
```
## Quick Start (pip install)

```bash
pip install "voicegateway[cloud,dashboard]"
voicegw init       # creates voicegw.yaml
# edit voicegw.yaml with your API keys
voicegw status     # check provider status
voicegw dashboard  # http://localhost:9090
```
Then in your agent:

```python
from voicegateway import Gateway
from livekit.agents import AgentSession, Agent

gw = Gateway()

session = AgentSession(
    stt=gw.stt("deepgram/nova-3"),
    llm=gw.llm("openai/gpt-4.1-mini"),
    tts=gw.tts("cartesia/sonic-3:voice_id"),
)
```
## Manage from your coding agent (MCP)

VoiceGateway ships a first-class Model Context Protocol (MCP) server, so your Claude Code, Cursor, or Codex instance can manage the gateway conversationally — list providers, add API keys, register models, create projects, inspect costs and latency, and tail logs.
Install:

```bash
pip install "voicegateway[mcp]"
```

Claude Code:

```bash
claude mcp add voicegateway --command "voicegw mcp --transport stdio"
```
Now in Claude Code you can say things like:
- "List all my providers"
- "Add Deepgram with API key dg_live_..."
- "Create a project for Tony's Pizza with a $5 daily budget using the premium stack"
- "Show me yesterday's costs for tonys-pizza"
- "What's our P95 TTFB this week?"
Remote / team deployment (HTTP/SSE):

```bash
export VOICEGW_MCP_TOKEN=$(openssl rand -hex 32)
voicegw mcp --transport http --port 8090
```

Then point your agent's MCP config at `http://your-host:8090/sse` with the token as a bearer header.
Available tools (17): `get_health`, `get_provider_status`, `get_costs`, `get_latency_stats`, `list_providers`, `get_provider`, `test_provider`, `add_provider`, `delete_provider`, `list_models`, `register_model`, `delete_model`, `list_projects`, `get_project`, `create_project`, `delete_project`, `get_logs`.
Destructive operations (`delete_*`) require an explicit `confirm=True` — the agent first receives a preview with impact details, shows it to you, and only deletes after you confirm.

Full tool reference: `docs/mcp.md`.
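The confirm-before-delete flow can be sketched as a two-phase tool call. This is an illustrative pattern only, not VoiceGateway's actual tool implementation; the field names below are hypothetical:

```python
def delete_project(project_id: str, confirm: bool = False) -> dict:
    """Two-phase destructive tool: the first call returns a preview,
    a second call with confirm=True performs the deletion.
    Illustrative sketch only, not VoiceGateway's internals."""
    if not confirm:
        # Phase 1: the agent shows this preview to the user.
        return {
            "action": "delete_project",
            "target": project_id,
            "impact": "removes the project, its budget, and its cost history",
            "confirm_required": True,
        }
    # Phase 2: only reached after the user explicitly confirms.
    return {"deleted": project_id}

preview = delete_project("tonys-pizza")
assert preview["confirm_required"] is True
assert delete_project("tonys-pizza", confirm=True) == {"deleted": "tonys-pizza"}
```

The key design point is that the destructive branch is unreachable from a single tool call, so the agent cannot skip the human confirmation step.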
## Architecture

```mermaid
flowchart TB
    A[LiveKit Agent] --> B[VoiceGateway]
    B --> C[Model Router]
    C --> D[Cloud Providers]
    C --> E[Local Providers]
    D --> D1[OpenAI]
    D --> D2[Deepgram]
    D --> D3[Cartesia]
    D --> D4[Anthropic]
    D --> D5[Groq]
    D --> D6[ElevenLabs]
    D --> D7[AssemblyAI]
    E --> E1[Ollama]
    E --> E2[Whisper local]
    E --> E3[Kokoro]
    E --> E4[Piper]
    B --> F[Middleware]
    F --> F1[Cost Tracking]
    F --> F2[Latency Monitor]
    F --> F3[Fallback Chains]
    F --> F4[Rate Limiting]
    F --> G[(SQLite)]
    G --> H[Dashboard]
    B --> I[Projects]
    I --> I1[Budget Tracking]
    I --> I2[Per-Project Costs]
    I --> I3[Project Dashboard]
```
## Projects

Organize agents into projects for per-project cost tracking and budgets:

```yaml
# voicegw.yaml
projects:
  restaurant-agent:
    name: "Restaurant Receptionist"
    description: "AI receptionist for Tony's Pizza"
    default_stack: premium
    daily_budget: 5.00
    tags: ["production", "client-ian"]
  dev-testing:
    name: "Development Testing"
    default_stack: local
    daily_budget: 0.00
    tags: ["development"]

stacks:
  premium:
    stt: deepgram/nova-3
    llm: openai/gpt-4.1-mini
    tts: cartesia/sonic-3
  local:
    stt: local/whisper-large-v3
    llm: ollama/qwen2.5:3b
    tts: local/kokoro
```
Use in code:

```python
gw = Gateway()

# Tag requests with a project
stt = gw.stt("deepgram/nova-3", project="restaurant-agent")

# Or use a named stack
stt, llm, tts = gw.stack("premium", project="restaurant-agent")

# Query project costs
gw.costs("today", project="restaurant-agent")
```
CLI:

```bash
voicegw projects                          # list all projects
voicegw project restaurant-agent          # project details
voicegw costs --project restaurant-agent  # project costs
voicegw logs --project restaurant-agent   # project logs
```
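Per-project cost tracking boils down to summing per-request costs by project tag and comparing against `daily_budget`. The sketch below is a minimal illustration under assumed log and field names (`project`, `cost_usd`), not VoiceGateway's actual accounting code:

```python
from collections import defaultdict

def daily_spend(requests: list[dict]) -> dict:
    """Sum per-request costs by project tag (hypothetical log shape)."""
    totals: dict = defaultdict(float)
    for r in requests:
        totals[r["project"]] += r["cost_usd"]
    return dict(totals)

# Three requests from today's log
logs = [
    {"project": "restaurant-agent", "cost_usd": 0.012},
    {"project": "restaurant-agent", "cost_usd": 0.034},
    {"project": "dev-testing", "cost_usd": 0.0},
]
spend = daily_spend(logs)
assert round(spend["restaurant-agent"], 3) == 0.046
assert spend["restaurant-agent"] < 5.00  # under restaurant-agent's daily_budget
```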
## Supported Models

### STT

| Model ID | Provider | Type |
|---|---|---|
| `deepgram/nova-3` | Deepgram | cloud |
| `deepgram/nova-2` | Deepgram | cloud |
| `assemblyai/universal-2` | AssemblyAI | cloud |
| `openai/whisper-1` | OpenAI | cloud |
| `groq/whisper-large-v3` | Groq | cloud |
| `local/whisper-large-v3` | faster-whisper | local |
| `local/whisper-turbo` | faster-whisper | local |
| `local/whisper-base` | faster-whisper | local |
### LLM

| Model ID | Provider | Type |
|---|---|---|
| `openai/gpt-4.1-mini` | OpenAI | cloud |
| `openai/gpt-4o` | OpenAI | cloud |
| `openai/gpt-4o-mini` | OpenAI | cloud |
| `anthropic/claude-3.5-sonnet` | Anthropic | cloud |
| `groq/llama-3.1-70b` | Groq | cloud |
| `groq/llama-3.1-8b` | Groq | cloud |
| `ollama/qwen2.5:3b` | Ollama | local |
| `ollama/qwen2.5:7b` | Ollama | local |
| `ollama/llama3.2:3b` | Ollama | local |
| `ollama/phi4-mini` | Ollama | local |
### TTS

| Model ID | Provider | Type |
|---|---|---|
| `cartesia/sonic-3` | Cartesia | cloud |
| `elevenlabs/eleven_turbo_v2_5` | ElevenLabs | cloud |
| `deepgram/aura-2` | Deepgram | cloud |
| `openai/tts-1` | OpenAI | cloud |
| `local/kokoro` | Kokoro ONNX | local |
| `local/piper` | Piper | local |
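All three tables share the same `provider/model[:variant]` string shape, where the variant carries a TTS voice ID or an Ollama size tag. Splitting such a string is straightforward; the helper below is a hypothetical sketch, not the library's actual parser:

```python
def parse_model_id(model_id: str) -> tuple:
    """Split 'provider/model[:variant]' into (provider, model, variant)."""
    provider, _, rest = model_id.partition("/")
    model, _, variant = rest.partition(":")
    return provider, model, variant or None

assert parse_model_id("deepgram/nova-3") == ("deepgram", "nova-3", None)
assert parse_model_id("ollama/qwen2.5:3b") == ("ollama", "qwen2.5", "3b")
assert parse_model_id("cartesia/sonic-3:voice_id") == ("cartesia", "sonic-3", "voice_id")
```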
## Fallback Chains

```yaml
fallbacks:
  stt: [deepgram/nova-3, groq/whisper-large-v3, local/whisper-large-v3]
  llm: [openai/gpt-4.1-mini, groq/llama-3.1-70b, ollama/qwen2.5:3b]
  tts: [cartesia/sonic-3, elevenlabs/eleven_turbo_v2_5, local/kokoro]
```

```python
session = AgentSession(
    stt=gw.stt_with_fallback(),
    llm=gw.llm_with_fallback(),
    tts=gw.tts_with_fallback(),
)
```
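The `*_with_fallback()` helpers try each entry of the configured chain in order until one succeeds. A minimal sketch of the idea (the class and error handling are illustrative, not VoiceGateway's implementation):

```python
class FallbackChain:
    """Call provider factories in order; return the first success."""

    def __init__(self, factories):
        self.factories = factories

    def __call__(self):
        last_exc = None
        for make in self.factories:
            try:
                return make()
            except Exception as exc:  # a real router would log and classify errors
                last_exc = exc
        raise RuntimeError("all providers in the chain failed") from last_exc

# Dummy factories standing in for deepgram -> groq in the STT chain above
def primary():
    raise ConnectionError("deepgram unreachable")

def secondary():
    return "groq/whisper-large-v3"

chain = FallbackChain([primary, secondary])
assert chain() == "groq/whisper-large-v3"
```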
## HTTP API (`voicegw serve`)

```bash
voicegw serve --port 8080
```

| Endpoint | Description |
|---|---|
| `GET /health` | Health check |
| `GET /v1/status` | Provider health |
| `GET /v1/models` | Available models |
| `GET /v1/costs?period=today&project=X` | Cost summary |
| `GET /v1/projects` | Project list with stats |
| `GET /v1/projects/:id` | Project details |
| `GET /v1/logs?project=X&modality=stt` | Request logs |
| `GET /v1/metrics` | Prometheus metrics |
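Calling these endpoints from Python is a plain HTTP GET. The sketch below only builds the request URLs (the base URL assumes `voicegw serve --port 8080` is running, and the helper name is hypothetical):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080"  # assumes `voicegw serve --port 8080`

def endpoint(path: str, **params) -> str:
    """Build a gateway URL, ready for e.g. urllib.request.urlopen()."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}{path}" + (f"?{query}" if query else "")

assert endpoint("/health") == "http://localhost:8080/health"
assert endpoint("/v1/costs", period="today", project="restaurant-agent") == (
    "http://localhost:8080/v1/costs?period=today&project=restaurant-agent"
)
```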
## Dashboard

`voicegw dashboard` starts a web UI on port 9090 with Neo-Brutalism styling:

- **Overview** — total requests, cost today, active models; project summary cards
- **Models** — every configured model with provider and status
- **Costs** — daily cost, per-provider/model/project breakdown
- **Latency** — TTFB/total per model, P50/P95/P99
- **Logs** — recent requests with modality and project filters

The sidebar includes a project switcher — selecting a project filters every page.
## Docker Compose

| Service | Port | Description |
|---|---|---|
| `voicegateway` | 8080 | HTTP API + model router |
| `dashboard` | 9090 | Web dashboard |
| `ollama` (optional) | 11434 | Local LLM (start with `--profile local`) |

```bash
docker compose up -d                  # API + dashboard
docker compose --profile local up -d  # + Ollama
```

Config: `./voicegw.yaml` mounted read-only. API keys in `.env`.
## Comparison with LiveKit Inference

| Feature | LiveKit Inference (Cloud) | VoiceGateway (self-host) |
|---|---|---|
| `provider/model` string interface | Yes | Yes |
| Cloud providers | Managed by LiveKit | Bring your own API keys |
| Local models (Ollama, Whisper, Kokoro) | No | Yes |
| Project-based organization | No | Yes |
| Cost tracking | Per-account | Per-request, per-project |
| Fallback chains | Limited | Fully configurable |
| Dashboard | LiveKit Cloud UI | Self-hosted |
| Docker Compose | N/A | One command |
| Works offline | No | Yes (with local models) |
| License | Commercial | MIT |
## Contributing

```bash
pip install -e ".[dev]"
pytest
```

To add a new provider, see `voicegateway/core/registry.py` and `CLAUDE.md`.

## License

MIT