
VoiceGateway

Self-hosted inference gateway for voice AI. One config. Any provider. Local models included.

A drop-in routing layer that gives self-hosters the same provider/model developer experience as LiveKit Inference Cloud — but with your API keys, local models (Ollama, Whisper, Kokoro, Piper), automatic fallback chains, project-based cost tracking, and a web dashboard.


Quick Start (Docker Compose)

# 1. Clone
git clone https://github.com/mahimairaja/livekit-inference-gateway.git
cd livekit-inference-gateway

# 2. Configure
cp .env.example .env
# Edit .env with your API keys

# 3. Start everything
docker compose up -d

# 4. Open the dashboard
open http://localhost:9090

# 5. (Optional) Start with local LLM
docker compose --profile local up -d
docker exec voicegateway-ollama ollama pull qwen2.5:3b

Quick Start (pip install)

pip install "voicegateway[cloud,local]"

voicegw init              # creates voicegw.yaml
# edit voicegw.yaml with your API keys
voicegw status            # check provider status
voicegw dashboard         # http://localhost:9090

Then in your agent:

from voicegateway import Gateway
from livekit.agents import AgentSession, Agent

gw = Gateway()

session = AgentSession(
    stt=gw.stt("deepgram/nova-3"),
    llm=gw.llm("openai/gpt-4.1-mini"),
    tts=gw.tts("cartesia/sonic-3:voice_id"),
)

Architecture

flowchart TB
    A[LiveKit Agent] --> B[VoiceGateway]
    B --> C[Model Router]
    C --> D[Cloud Providers]
    C --> E[Local Providers]
    D --> D1[OpenAI]
    D --> D2[Deepgram]
    D --> D3[Cartesia]
    D --> D4[Anthropic]
    D --> D5[Groq]
    D --> D6[ElevenLabs]
    D --> D7[AssemblyAI]
    E --> E1[Ollama]
    E --> E2[Whisper local]
    E --> E3[Kokoro]
    E --> E4[Piper]
    B --> F[Middleware]
    F --> F1[Cost Tracking]
    F --> F2[Latency Monitor]
    F --> F3[Fallback Chains]
    F --> F4[Rate Limiting]
    F --> G[(SQLite)]
    G --> H[Dashboard]
    B --> I[Projects]
    I --> I1[Budget Tracking]
    I --> I2[Per-Project Costs]
    I --> I3[Project Dashboard]
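The middleware layer persists per-request cost and latency records to SQLite, which the dashboard reads. A minimal sketch of that kind of logging, assuming an illustrative `usage` table (the schema and column names here are assumptions for the sketch, not VoiceGateway's actual schema):

```python
import sqlite3

# Illustrative schema: one row per gateway request.
# Column names are assumptions, not VoiceGateway's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE usage (
        project TEXT,
        modality TEXT,        -- stt | llm | tts
        model TEXT,
        cost_usd REAL,
        latency_ms REAL,
        ts TEXT DEFAULT CURRENT_TIMESTAMP
    )"""
)

def log_request(project, modality, model, cost_usd, latency_ms):
    """Record one request; the dashboard aggregates rows like these."""
    conn.execute(
        "INSERT INTO usage (project, modality, model, cost_usd, latency_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        (project, modality, model, cost_usd, latency_ms),
    )

def project_cost_today(project):
    """Sum today's spend for one project."""
    row = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM usage "
        "WHERE project = ? AND DATE(ts) = DATE('now')",
        (project,),
    ).fetchone()
    return row[0]

log_request("restaurant-agent", "llm", "openai/gpt-4.1-mini", 0.0023, 412.0)
log_request("restaurant-agent", "tts", "cartesia/sonic-3", 0.0011, 180.0)
print(round(project_cost_today("restaurant-agent"), 4))  # 0.0034
```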

Projects

Organize agents into projects for per-project cost tracking and budgets:

# voicegw.yaml
projects:
  restaurant-agent:
    name: "Restaurant Receptionist"
    description: "AI receptionist for Tony's Pizza"
    default_stack: premium
    daily_budget: 5.00
    tags: ["production", "client-ian"]

  dev-testing:
    name: "Development Testing"
    default_stack: local
    daily_budget: 0.00
    tags: ["development"]

stacks:
  premium:
    stt: deepgram/nova-3
    llm: openai/gpt-4.1-mini
    tts: cartesia/sonic-3
  local:
    stt: local/whisper-large-v3
    llm: ollama/qwen2.5:3b
    tts: local/kokoro

Use in code:

gw = Gateway()

# Tag requests with a project
stt = gw.stt("deepgram/nova-3", project="restaurant-agent")

# Or use a named stack
stt, llm, tts = gw.stack("premium", project="restaurant-agent")

# Query project costs
gw.costs("today", project="restaurant-agent")
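Budgets pair naturally with the cost query above. A sketch of a budget guard that switches a project to the free local stack once its daily spend reaches the configured limit (the helper below is hypothetical, not a VoiceGateway API):

```python
def pick_stack(spent_today: float, daily_budget: float) -> str:
    """Hypothetical guard: route to the free local stack once a
    project's daily cloud spend reaches its configured budget."""
    if daily_budget > 0 and spent_today >= daily_budget:
        return "local"
    return "premium"

print(pick_stack(spent_today=1.20, daily_budget=5.00))  # premium
print(pick_stack(spent_today=5.10, daily_budget=5.00))  # local
```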

CLI:

voicegw projects                          # list all projects
voicegw project restaurant-agent          # project details
voicegw costs --project restaurant-agent  # project costs
voicegw logs --project restaurant-agent   # project logs

Supported Models

STT

| Model ID | Provider | Type |
|---|---|---|
| deepgram/nova-3 | Deepgram | cloud |
| deepgram/nova-2 | Deepgram | cloud |
| assemblyai/universal-2 | AssemblyAI | cloud |
| openai/whisper-1 | OpenAI | cloud |
| groq/whisper-large-v3 | Groq | cloud |
| local/whisper-large-v3 | faster-whisper | local |
| local/whisper-turbo | faster-whisper | local |
| local/whisper-base | faster-whisper | local |

LLM

| Model ID | Provider | Type |
|---|---|---|
| openai/gpt-4.1-mini | OpenAI | cloud |
| openai/gpt-4o | OpenAI | cloud |
| openai/gpt-4o-mini | OpenAI | cloud |
| anthropic/claude-3.5-sonnet | Anthropic | cloud |
| groq/llama-3.1-70b | Groq | cloud |
| groq/llama-3.1-8b | Groq | cloud |
| ollama/qwen2.5:3b | Ollama | local |
| ollama/qwen2.5:7b | Ollama | local |
| ollama/llama3.2:3b | Ollama | local |
| ollama/phi4-mini | Ollama | local |

TTS

| Model ID | Provider | Type |
|---|---|---|
| cartesia/sonic-3 | Cartesia | cloud |
| elevenlabs/eleven_turbo_v2_5 | ElevenLabs | cloud |
| deepgram/aura-2 | Deepgram | cloud |
| openai/tts-1 | OpenAI | cloud |
| local/kokoro | Kokoro ONNX | local |
| local/piper | Piper | local |

Fallback Chains

# voicegw.yaml
fallbacks:
  stt: [deepgram/nova-3, groq/whisper-large-v3, local/whisper-large-v3]
  llm: [openai/gpt-4.1-mini, groq/llama-3.1-70b, ollama/qwen2.5:3b]
  tts: [cartesia/sonic-3, elevenlabs/eleven_turbo_v2_5, local/kokoro]

Use in code:

session = AgentSession(
    stt=gw.stt_with_fallback(),
    llm=gw.llm_with_fallback(),
    tts=gw.tts_with_fallback(),
)

HTTP API (voicegw serve)

voicegw serve --port 8080

| Endpoint | Description |
|---|---|
| GET /health | Health check |
| GET /v1/status | Provider health |
| GET /v1/models | Available models |
| GET /v1/costs?period=today&project=X | Cost summary |
| GET /v1/projects | Project list with stats |
| GET /v1/projects/:id | Project details |
| GET /v1/logs?project=X&modality=stt | Request logs |
| GET /v1/metrics | Prometheus metrics |
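The endpoints are plain HTTP, so any client works. A standard-library sketch (the fetch assumes a gateway started with `voicegw serve --port 8080` is running locally, so it is left commented out; the URL construction runs standalone):

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8080"  # assumes `voicegw serve --port 8080`

def costs_url(period: str, project: str) -> str:
    """Build the /v1/costs query URL with properly encoded parameters."""
    query = urllib.parse.urlencode({"period": period, "project": project})
    return f"{BASE}/v1/costs?{query}"

url = costs_url("today", "restaurant-agent")
print(url)  # http://localhost:8080/v1/costs?period=today&project=restaurant-agent

# With the gateway running, fetch the cost summary:
# with urllib.request.urlopen(url) as resp:
#     summary = json.load(resp)
```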

Dashboard

voicegw dashboard starts a web UI on port 9090 with Neo-Brutalism styling:

  • Overview — total requests, cost today, active models; project summary cards
  • Models — every configured model with provider and status
  • Costs — daily cost, per-provider/model/project breakdown
  • Latency — TTFB/total per model, P50/P95/P99
  • Logs — recent requests with modality and project filters

The sidebar includes a project switcher — selecting a project filters every page.
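The P50/P95/P99 figures on the Latency page are standard percentiles over logged request durations. For reference, the same numbers can be computed from raw latencies with the standard library (the sample data below is made up):

```python
import statistics

# Made-up per-request latencies in milliseconds for one model.
latencies_ms = [120, 135, 150, 160, 180, 210, 240, 300, 450, 900]

# quantiles(n=100) yields the 1st..99th percentiles; "inclusive"
# interpolates within the observed range.
q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p95, p99 = q[49], q[94], q[98]
print(f"P50={p50:.1f}ms P95={p95:.1f}ms P99={p99:.1f}ms")
# P50=195.0ms P95=697.5ms P99=859.5ms
```

Note how the long 900 ms tail barely moves P50 but dominates P99, which is why the dashboard reports all three.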


Docker Compose

| Service | Port | Description |
|---|---|---|
| voicegateway | 8080 | HTTP API + model router |
| dashboard | 9090 | Web dashboard |
| ollama (optional) | 11434 | Local LLM (start with --profile local) |

docker compose up -d                        # API + dashboard
docker compose --profile local up -d        # + Ollama

Config: ./voicegw.yaml mounted read-only. API keys in .env.


Comparison with LiveKit Inference

| Feature | LiveKit Inference (Cloud) | VoiceGateway (self-host) |
|---|---|---|
| provider/model string interface | Yes | Yes |
| Cloud providers | Managed by LiveKit | Bring your own API keys |
| Local models (Ollama, Whisper, Kokoro) | No | Yes |
| Project-based organization | No | Yes |
| Cost tracking | Per-account | Per-request, per-project |
| Fallback chains | Limited | Fully configurable |
| Dashboard | LiveKit Cloud UI | Self-hosted |
| Docker Compose | N/A | One command |
| Works offline | No | Yes (with local models) |
| License | Commercial | Apache 2.0 |

Contributing

pip install -e ".[dev]"
pytest

To add a new provider: see voicegateway/core/registry.py and CLAUDE.md.


License

Apache 2.0
