
VoiceGateway

Self-hosted inference gateway for voice AI. One config. Any provider. Local models included.

A drop-in routing layer that gives self-hosters the same provider/model developer experience as LiveKit Inference Cloud — but with your API keys, local models (Ollama, Whisper, Kokoro, Piper), automatic fallback chains, project-based cost tracking, and a web dashboard.


Quick Start (Docker Compose)

# 1. Clone
git clone https://github.com/mahimailabs/voicegateway.git
cd voicegateway

# 2. Configure
cp .env.example .env
# Edit .env with your API keys

# 3. Start everything
docker compose up -d

# 4. Open the dashboard
open http://localhost:9090

# 5. (Optional) Start with local LLM
docker compose --profile local up -d
docker exec voicegateway-ollama ollama pull qwen2.5:3b

Installation

Core engine (recommended):

pip install voicegateway

With web dashboard:

pip install "voicegateway[dashboard]"

With all cloud providers:

pip install "voicegateway[cloud]"

Everything:

pip install "voicegateway[all,dashboard]"

Quick Start (pip install)

pip install "voicegateway[cloud,dashboard]"

voicegw init              # creates voicegw.yaml
# edit voicegw.yaml with your API keys
voicegw status            # check provider status
voicegw dashboard         # http://localhost:9090

Then in your agent:

from voicegateway import Gateway
from livekit.agents import AgentSession, Agent

gw = Gateway()

session = AgentSession(
    stt=gw.stt("deepgram/nova-3"),
    llm=gw.llm("openai/gpt-4.1-mini"),
    tts=gw.tts("cartesia/sonic-3:voice_id"),
)

Manage from your coding agent (MCP)

VoiceGateway ships a first-class Model Context Protocol (MCP) server. Your Claude Code, Cursor, or Codex instance can manage the gateway conversationally — list providers, add API keys, register models, create projects, inspect costs and latency, tail logs.

Install:

pip install "voicegateway[mcp]"

Claude Code:

claude mcp add voicegateway --command "voicegw mcp --transport stdio"

Now in Claude Code you can say things like:

  • "List all my providers"
  • "Add Deepgram with API key dg_live_..."
  • "Create a project for Tony's Pizza with a $5 daily budget using the premium stack"
  • "Show me yesterday's costs for tonys-pizza"
  • "What's our P95 TTFB this week?"

Remote / team deployment (HTTP/SSE):

export VOICEGW_MCP_TOKEN=$(openssl rand -hex 32)
voicegw mcp --transport http --port 8090

Then point your agent's MCP config at http://your-host:8090/sse, sending the token as an Authorization: Bearer header.
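The exact config shape depends on your MCP client; for clients that accept a JSON server list, it typically looks something like the following (the mcpServers key and field names are assumptions about the client's config format, not part of VoiceGateway):

```json
{
  "mcpServers": {
    "voicegateway": {
      "url": "http://your-host:8090/sse",
      "headers": {
        "Authorization": "Bearer ${VOICEGW_MCP_TOKEN}"
      }
    }
  }
}
```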

Available tools (17): get_health, get_provider_status, get_costs, get_latency_stats, list_providers, get_provider, test_provider, add_provider, delete_provider, list_models, register_model, delete_model, list_projects, get_project, create_project, delete_project, get_logs.

Destructive operations (delete_*) require an explicit confirm=True — the agent first receives a preview with impact details, shows it to you, and only deletes after you confirm.

Full tool reference in docs/mcp.md.


Architecture

flowchart TB
    A[LiveKit Agent] --> B[VoiceGateway]
    B --> C[Model Router]
    C --> D[Cloud Providers]
    C --> E[Local Providers]
    D --> D1[OpenAI]
    D --> D2[Deepgram]
    D --> D3[Cartesia]
    D --> D4[Anthropic]
    D --> D5[Groq]
    D --> D6[ElevenLabs]
    D --> D7[AssemblyAI]
    E --> E1[Ollama]
    E --> E2[Whisper local]
    E --> E3[Kokoro]
    E --> E4[Piper]
    B --> F[Middleware]
    F --> F1[Cost Tracking]
    F --> F2[Latency Monitor]
    F --> F3[Fallback Chains]
    F --> F4[Rate Limiting]
    F --> G[(SQLite)]
    G --> H[Dashboard]
    B --> I[Projects]
    I --> I1[Budget Tracking]
    I --> I2[Per-Project Costs]
    I --> I3[Project Dashboard]

Projects

Organize agents into projects for per-project cost tracking and budgets:

# voicegw.yaml
projects:
  restaurant-agent:
    name: "Restaurant Receptionist"
    description: "AI receptionist for Tony's Pizza"
    default_stack: premium
    daily_budget: 5.00
    tags: ["production", "client-ian"]

  dev-testing:
    name: "Development Testing"
    default_stack: local
    daily_budget: 0.00
    tags: ["development"]

stacks:
  premium:
    stt: deepgram/nova-3
    llm: openai/gpt-4.1-mini
    tts: cartesia/sonic-3
  local:
    stt: local/whisper-large-v3
    llm: ollama/qwen2.5:3b
    tts: local/kokoro

Use in code:

gw = Gateway()

# Tag requests with a project
stt = gw.stt("deepgram/nova-3", project="restaurant-agent")

# Or use a named stack
stt, llm, tts = gw.stack("premium", project="restaurant-agent")

# Query project costs
gw.costs("today", project="restaurant-agent")

CLI:

voicegw projects                          # list all projects
voicegw project restaurant-agent          # project details
voicegw costs --project restaurant-agent  # project costs
voicegw logs --project restaurant-agent   # project logs

Supported Models

STT

Model ID                 Provider        Type
deepgram/nova-3          Deepgram        cloud
deepgram/nova-2          Deepgram        cloud
assemblyai/universal-2   AssemblyAI      cloud
openai/whisper-1         OpenAI          cloud
groq/whisper-large-v3    Groq            cloud
local/whisper-large-v3   faster-whisper  local
local/whisper-turbo      faster-whisper  local
local/whisper-base       faster-whisper  local

LLM

Model ID                     Provider   Type
openai/gpt-4.1-mini          OpenAI     cloud
openai/gpt-4o                OpenAI     cloud
openai/gpt-4o-mini           OpenAI     cloud
anthropic/claude-3.5-sonnet  Anthropic  cloud
groq/llama-3.1-70b           Groq       cloud
groq/llama-3.1-8b            Groq       cloud
ollama/qwen2.5:3b            Ollama     local
ollama/qwen2.5:7b            Ollama     local
ollama/llama3.2:3b           Ollama     local
ollama/phi4-mini             Ollama     local

TTS

Model ID                      Provider     Type
cartesia/sonic-3              Cartesia     cloud
elevenlabs/eleven_turbo_v2_5  ElevenLabs   cloud
deepgram/aura-2               Deepgram     cloud
openai/tts-1                  OpenAI       cloud
local/kokoro                  Kokoro ONNX  local
local/piper                   Piper        local

Fallback Chains

Declare ordered fallback chains in voicegw.yaml; when a provider fails, the router moves on to the next entry in the chain:

# voicegw.yaml
fallbacks:
  stt: [deepgram/nova-3, groq/whisper-large-v3, local/whisper-large-v3]
  llm: [openai/gpt-4.1-mini, groq/llama-3.1-70b, ollama/qwen2.5:3b]
  tts: [cartesia/sonic-3, elevenlabs/eleven_turbo_v2_5, local/kokoro]

session = AgentSession(
    stt=gw.stt_with_fallback(),
    llm=gw.llm_with_fallback(),
    tts=gw.tts_with_fallback(),
)
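Conceptually, a fallback chain just walks the list in order until one provider responds. A minimal sketch of the idea (not VoiceGateway's actual router code; `connect` is a hypothetical client factory):

```python
def first_available(chain, connect):
    """Return the first client in `chain` that `connect` can create.

    `connect(model_id)` is assumed to build a provider client and raise
    on failure; the real router also tracks provider health over time.
    """
    errors = {}
    for model_id in chain:
        try:
            return connect(model_id)
        except Exception as exc:  # provider down, auth error, timeout, ...
            errors[model_id] = exc
    raise RuntimeError(f"all providers in chain failed: {list(errors)}")
```

For example, `first_available(["deepgram/nova-3", "local/whisper-large-v3"], connect)` falls back to the local Whisper model if Deepgram is unreachable.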

HTTP API (voicegw serve)

voicegw serve --port 8080

Endpoint                              Description
GET /health                           Health check
GET /v1/status                        Provider health
GET /v1/models                        Available models
GET /v1/costs?period=today&project=X  Cost summary
GET /v1/projects                      Project list with stats
GET /v1/projects/:id                  Project details
GET /v1/logs?project=X&modality=stt   Request logs
GET /v1/metrics                       Prometheus metrics
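For example, hitting the cost endpoint from Python with the standard library (URL shape taken from the table above; the JSON response schema is not documented here, so treat the parsed result as opaque):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def costs_url(base="http://localhost:8080", period="today", project=None):
    """Build the GET /v1/costs URL from the endpoint table above."""
    params = {"period": period}
    if project is not None:
        params["project"] = project
    return f"{base}/v1/costs?{urlencode(params)}"

def fetch_costs(**kwargs):
    """Fetch and parse the cost summary from a running gateway."""
    with urlopen(costs_url(**kwargs)) as resp:
        return json.load(resp)
```

Usage: `fetch_costs(period="today", project="restaurant-agent")` against a gateway started with `voicegw serve`.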

Dashboard

voicegw dashboard starts a web UI on port 9090 with Neo-Brutalism styling:

  • Overview — total requests, cost today, active models; project summary cards
  • Models — every configured model with provider and status
  • Costs — daily cost, per-provider/model/project breakdown
  • Latency — TTFB/total per model, P50/P95/P99
  • Logs — recent requests with modality and project filters

The sidebar includes a project switcher — selecting a project filters every page.


Docker Compose

Service            Port   Description
voicegateway       8080   HTTP API + model router
dashboard          9090   Web dashboard
ollama (optional)  11434  Local LLM (start with --profile local)

docker compose up -d                        # API + dashboard
docker compose --profile local up -d        # + Ollama

Config: ./voicegw.yaml mounted read-only. API keys in .env.
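The relevant wiring looks roughly like this; a sketch derived from the service table and the mount description above, not the shipped compose file (image names, container paths, and keys beyond the documented ports are assumptions):

```yaml
services:
  voicegateway:
    ports: ["8080:8080"]
    env_file: .env                            # API keys
    volumes:
      - ./voicegw.yaml:/app/voicegw.yaml:ro   # config, read-only
  dashboard:
    ports: ["9090:9090"]
  ollama:
    profiles: ["local"]                       # only with --profile local
    ports: ["11434:11434"]
```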


Comparison with LiveKit Inference

Feature                                  LiveKit Inference (Cloud)  VoiceGateway (self-host)
provider/model string interface          Yes                        Yes
Cloud providers                          Managed by LiveKit         Bring your own API keys
Local models (Ollama, Whisper, Kokoro)   No                         Yes
Project-based organization               No                         Yes
Cost tracking                            Per-account                Per-request, per-project
Fallback chains                          Limited                    Fully configurable
Dashboard                                LiveKit Cloud UI           Self-hosted
Docker Compose                           N/A                        One command
Works offline                            No                         Yes (with local models)
License                                  Commercial                 MIT

Contributing

pip install -e ".[dev]"
pytest

To add a new provider: see voicegateway/core/registry.py and CLAUDE.md.


License

MIT
