
VoiceGateway

Self-hosted inference gateway for voice AI. One config. Any provider. Local models included.

A drop-in routing layer that gives self-hosters the same provider/model developer experience as LiveKit Inference Cloud — but with your API keys, local models (Ollama, Whisper, Kokoro, Piper), automatic fallback chains, project-based cost tracking, and a web dashboard.


Quick Start (Docker Compose)

# 1. Clone
git clone https://github.com/mahimailabs/voicegateway.git
cd voicegateway

# 2. Configure
cp .env.example .env
# Edit .env with your API keys

# 3. Start everything
docker compose up -d

# 4. Open the dashboard
open http://localhost:9090

# 5. (Optional) Start with local LLM
docker compose --profile local up -d
docker exec voicegateway-ollama ollama pull qwen2.5:3b

Installation

Core engine (recommended):

pip install voicegateway

With web dashboard:

pip install "voicegateway[dashboard]"

With all cloud providers:

pip install "voicegateway[cloud]"

Everything:

pip install "voicegateway[all,dashboard]"

Quick Start (pip install)

pip install "voicegateway[cloud,dashboard]"

voicegw init              # creates voicegw.yaml
# edit voicegw.yaml with your API keys
voicegw status            # check provider status
voicegw dashboard         # http://localhost:9090

Then in your agent:

from voicegateway import Gateway
from livekit.agents import AgentSession, Agent

gw = Gateway()

session = AgentSession(
    stt=gw.stt("deepgram/nova-3"),
    llm=gw.llm("openai/gpt-4.1-mini"),
    tts=gw.tts("cartesia/sonic-3:voice_id"),
)

Manage from your coding agent (MCP)

VoiceGateway ships a first-class Model Context Protocol (MCP) server. Your Claude Code, Cursor, or Codex instance can manage the gateway conversationally — list providers, add API keys, register models, create projects, inspect costs and latency, tail logs.

Install:

pip install "voicegateway[mcp]"

Claude Code:

claude mcp add voicegateway --command "voicegw mcp --transport stdio"

Now in Claude Code you can say things like:

  • "List all my providers"
  • "Add Deepgram with API key dg_live_..."
  • "Create a project for Tony's Pizza with a $5 daily budget using the premium stack"
  • "Show me yesterday's costs for tonys-pizza"
  • "What's our P95 TTFB this week?"

Remote / team deployment (HTTP/SSE):

export VOICEGW_MCP_TOKEN=$(openssl rand -hex 32)
voicegw mcp --transport http --port 8090

Then point your agent's MCP config at http://your-host:8090/sse, sending the token as an Authorization: Bearer header.
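The exact config shape depends on your MCP client; for clients that accept a JSON server list, it typically looks something like the following (the mcpServers key and field names are assumptions about the client's config format, not part of VoiceGateway):

```json
{
  "mcpServers": {
    "voicegateway": {
      "url": "http://your-host:8090/sse",
      "headers": {
        "Authorization": "Bearer ${VOICEGW_MCP_TOKEN}"
      }
    }
  }
}
```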

Available tools (17): get_health, get_provider_status, get_costs, get_latency_stats, list_providers, get_provider, test_provider, add_provider, delete_provider, list_models, register_model, delete_model, list_projects, get_project, create_project, delete_project, get_logs.

Destructive operations (delete_*) require an explicit confirm=True — the agent first receives a preview with impact details, shows it to you, and only deletes after you confirm.

Full tool reference in docs/mcp.md.


Architecture

flowchart TB
    A[LiveKit Agent] --> B[VoiceGateway]
    B --> C[Model Router]
    C --> D[Cloud Providers]
    C --> E[Local Providers]
    D --> D1[OpenAI]
    D --> D2[Deepgram]
    D --> D3[Cartesia]
    D --> D4[Anthropic]
    D --> D5[Groq]
    D --> D6[ElevenLabs]
    D --> D7[AssemblyAI]
    E --> E1[Ollama]
    E --> E2[Whisper local]
    E --> E3[Kokoro]
    E --> E4[Piper]
    B --> F[Middleware]
    F --> F1[Cost Tracking]
    F --> F2[Latency Monitor]
    F --> F3[Fallback Chains]
    F --> F4[Rate Limiting]
    F --> G[(SQLite)]
    G --> H[Dashboard]
    B --> I[Projects]
    I --> I1[Budget Tracking]
    I --> I2[Per-Project Costs]
    I --> I3[Project Dashboard]

Projects

Organize agents into projects for per-project cost tracking and budgets:

# voicegw.yaml
projects:
  restaurant-agent:
    name: "Restaurant Receptionist"
    description: "AI receptionist for Tony's Pizza"
    default_stack: premium
    daily_budget: 5.00
    tags: ["production", "client-ian"]

  dev-testing:
    name: "Development Testing"
    default_stack: local
    daily_budget: 0.00
    tags: ["development"]

stacks:
  premium:
    stt: deepgram/nova-3
    llm: openai/gpt-4.1-mini
    tts: cartesia/sonic-3
  local:
    stt: local/whisper-large-v3
    llm: ollama/qwen2.5:3b
    tts: local/kokoro

Use in code:

gw = Gateway()

# Tag requests with a project
stt = gw.stt("deepgram/nova-3", project="restaurant-agent")

# Or use a named stack
stt, llm, tts = gw.stack("premium", project="restaurant-agent")

# Query project costs
gw.costs("today", project="restaurant-agent")

CLI:

voicegw projects                          # list all projects
voicegw project restaurant-agent          # project details
voicegw costs --project restaurant-agent  # project costs
voicegw logs --project restaurant-agent   # project logs

Supported Models

STT

Model ID                 Provider        Type
deepgram/nova-3          Deepgram        cloud
deepgram/nova-2          Deepgram        cloud
assemblyai/universal-2   AssemblyAI      cloud
openai/whisper-1         OpenAI          cloud
groq/whisper-large-v3    Groq            cloud
local/whisper-large-v3   faster-whisper  local
local/whisper-turbo      faster-whisper  local
local/whisper-base       faster-whisper  local

LLM

Model ID                     Provider   Type
openai/gpt-4.1-mini          OpenAI     cloud
openai/gpt-4o                OpenAI     cloud
openai/gpt-4o-mini           OpenAI     cloud
anthropic/claude-3.5-sonnet  Anthropic  cloud
groq/llama-3.1-70b           Groq       cloud
groq/llama-3.1-8b            Groq       cloud
ollama/qwen2.5:3b            Ollama     local
ollama/qwen2.5:7b            Ollama     local
ollama/llama3.2:3b           Ollama     local
ollama/phi4-mini             Ollama     local

TTS

Model ID                      Provider     Type
cartesia/sonic-3              Cartesia     cloud
elevenlabs/eleven_turbo_v2_5  ElevenLabs   cloud
deepgram/aura-2               Deepgram     cloud
openai/tts-1                  OpenAI       cloud
local/kokoro                  Kokoro ONNX  local
local/piper                   Piper        local

Fallback Chains

Declare ordered fallback chains in voicegw.yaml; when a provider fails, the router moves on to the next entry in the chain:

# voicegw.yaml
fallbacks:
  stt: [deepgram/nova-3, groq/whisper-large-v3, local/whisper-large-v3]
  llm: [openai/gpt-4.1-mini, groq/llama-3.1-70b, ollama/qwen2.5:3b]
  tts: [cartesia/sonic-3, elevenlabs/eleven_turbo_v2_5, local/kokoro]

session = AgentSession(
    stt=gw.stt_with_fallback(),
    llm=gw.llm_with_fallback(),
    tts=gw.tts_with_fallback(),
)
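Conceptually, a fallback chain just walks the list in order until one provider responds. A minimal sketch of the idea (not VoiceGateway's actual router code; `connect` is a hypothetical client factory):

```python
def first_available(chain, connect):
    """Return the first client in `chain` that `connect` can create.

    `connect(model_id)` is assumed to build a provider client and raise
    on failure; the real router also tracks provider health over time.
    """
    errors = {}
    for model_id in chain:
        try:
            return connect(model_id)
        except Exception as exc:  # provider down, auth error, timeout, ...
            errors[model_id] = exc
    raise RuntimeError(f"all providers in chain failed: {list(errors)}")
```

For example, `first_available(["deepgram/nova-3", "local/whisper-large-v3"], connect)` falls back to the local Whisper model if Deepgram is unreachable.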

HTTP API (voicegw serve)

voicegw serve --port 8080

Endpoint                              Description
GET /health                           Health check
GET /v1/status                        Provider health
GET /v1/models                        Available models
GET /v1/costs?period=today&project=X  Cost summary
GET /v1/projects                      Project list with stats
GET /v1/projects/:id                  Project details
GET /v1/logs?project=X&modality=stt   Request logs
GET /v1/metrics                       Prometheus metrics
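For example, hitting the cost endpoint from Python with the standard library (URL shape taken from the table above; the JSON response schema is not documented here, so treat the parsed result as opaque):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def costs_url(base="http://localhost:8080", period="today", project=None):
    """Build the GET /v1/costs URL from the endpoint table above."""
    params = {"period": period}
    if project is not None:
        params["project"] = project
    return f"{base}/v1/costs?{urlencode(params)}"

def fetch_costs(**kwargs):
    """Fetch and parse the cost summary from a running gateway."""
    with urlopen(costs_url(**kwargs)) as resp:
        return json.load(resp)
```

Usage: `fetch_costs(period="today", project="restaurant-agent")` against a gateway started with `voicegw serve`.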

Dashboard

voicegw dashboard starts a web UI on port 9090 with Neo-Brutalism styling:

  • Overview — total requests, cost today, active models; project summary cards
  • Models — every configured model with provider and status
  • Costs — daily cost, per-provider/model/project breakdown
  • Latency — TTFB/total per model, P50/P95/P99
  • Logs — recent requests with modality and project filters

The sidebar includes a project switcher — selecting a project filters every page.


Docker Compose

Service            Port   Description
voicegateway       8080   HTTP API + model router
dashboard          9090   Web dashboard
ollama (optional)  11434  Local LLM (start with --profile local)

docker compose up -d                        # API + dashboard
docker compose --profile local up -d        # + Ollama

Config: ./voicegw.yaml mounted read-only. API keys in .env.
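The relevant wiring looks roughly like this; a sketch derived from the service table and the mount description above, not the shipped compose file (image names, container paths, and keys beyond the documented ports are assumptions):

```yaml
services:
  voicegateway:
    ports: ["8080:8080"]
    env_file: .env                            # API keys
    volumes:
      - ./voicegw.yaml:/app/voicegw.yaml:ro   # config, read-only
  dashboard:
    ports: ["9090:9090"]
  ollama:
    profiles: ["local"]                       # only with --profile local
    ports: ["11434:11434"]
```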


Comparison with LiveKit Inference

Feature                                  LiveKit Inference (Cloud)  VoiceGateway (self-host)
provider/model string interface          Yes                        Yes
Cloud providers                          Managed by LiveKit         Bring your own API keys
Local models (Ollama, Whisper, Kokoro)   No                         Yes
Project-based organization               No                         Yes
Cost tracking                            Per-account                Per-request, per-project
Fallback chains                          Limited                    Fully configurable
Dashboard                                LiveKit Cloud UI           Self-hosted
Docker Compose                           N/A                        One command
Works offline                            No                         Yes (with local models)
License                                  Commercial                 MIT

Contributing

pip install -e ".[dev]"
pytest

To add a new provider: see voicegateway/core/registry.py and CLAUDE.md.


License

MIT
