
Vocal

Ollama for Voice Models — Self-hosted Speech AI Platform


Vocal manages STT (Speech-to-Text) and TTS (Text-to-Speech) models the way Ollama manages LLMs. It provides an OpenAI-compatible REST API, a Python SDK, and a CLI — with model download, caching, and multi-backend support built in.


Quick Start

# Run without installing
uvx --from vocal-ai vocal serve

# Or install permanently
pip install vocal-ai
vocal serve

Interactive API docs are at http://localhost:8000/docs.

# Pull a model and transcribe
vocal models pull Systran/faster-whisper-tiny
vocal transcribe your_audio.wav

# Text-to-speech (built-in, no download)
vocal speak "Hello, world!"

# Real-time microphone transcription
vocal listen

# Full voice agent (STT → LLM → TTS)
vocal chat    # requires Ollama running locally

Optional backends (base install already includes torch + faster-whisper + transformers + silero-vad):

Extra        What you get                                      Install
kokoro       Kokoro-82M neural TTS, #1 on TTS Arena            pip install "vocal-ai[kokoro]"
piper        Piper offline TTS, fast, multilingual             pip install "vocal-ai[piper]"
qwen3-tts    Qwen3-TTS voice cloning (CUDA required)           pip install "vocal-ai[qwen3-tts]"
whisperx     WhisperX — word-level timestamps + diarization    pip install "vocal-ai[whisperx]"
nemo         NVIDIA NeMo STT (Parakeet-TDT, Canary-Qwen)       pip install "vocal-ai[nemo]"
chatterbox   Chatterbox voice cloning TTS                      pip install "vocal-ai[chatterbox]"

Missing a backend? The error message will tell you exactly which command to run.


Features

  • OpenAI-compatible API — /v1/audio/transcriptions, /v1/audio/speech, /v1/realtime
  • Ollama-style model management — pull, list, delete models from the CLI or API
  • Auto-generated SDK — typed Python client generated from the live OpenAPI spec
  • Streaming TTS — first audio bytes before full synthesis completes
  • WebSocket ASR — ~200 ms latency with server-side VAD
  • Voice agent — full STT → LLM → TTS loop, OpenAI Realtime protocol compatible
  • Voice selection — list and select voices per model
  • Voice cloning — clone a voice from a 3–30 s reference recording
  • Cross-platform — Windows, macOS, Linux (WSL supported)
  • GPU acceleration — automatic CUDA detection with VRAM optimization

Documentation

Getting Started Install, first transcription, platform notes
Available Models STT/TTS catalog, hardware guide
CLI Reference All commands with options
Configuration Environment variables, .env
Contributing Dev setup, PR workflow
Architecture Package structure, adapter pattern
Adding Models New STT/TTS backends
Testing Test tiers, CI, cross-platform
Release Process Version bump, PyPI publish

API Overview

Speech-to-Text (OpenAI-compatible)

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=Systran/faster-whisper-tiny"

Text-to-Speech (OpenAI-compatible)

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"pyttsx3","input":"Hello, world!","response_format":"wav"}' \
  --output speech.wav
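The speech endpoint takes plain JSON, so a Python equivalent needs nothing beyond the standard library. A minimal sketch (the helper name is illustrative) that mirrors the curl payload above and builds the request without sending it:

```python
import json
import urllib.request


def build_speech_request(text: str,
                         model: str = "pyttsx3",
                         response_format: str = "wav",
                         base_url: str = "http://localhost:8000"):
    """Build the JSON POST for the OpenAI-compatible /v1/audio/speech endpoint."""
    payload = json.dumps(
        {"model": model, "input": text, "response_format": response_format}
    ).encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running:
#   with urllib.request.urlopen(build_speech_request("Hello, world!")) as resp:
#       with open("speech.wav", "wb") as out:
#           out.write(resp.read())
```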

Voice Cloning

curl -X POST http://localhost:8000/v1/audio/clone \
  -F "text=Synthesize in my voice." \
  -F "reference_audio=@speaker.wav" \
  --output clone.wav

Model Management (Ollama-style)

curl http://localhost:8000/v1/models               # list
curl -X POST http://localhost:8000/v1/models/Systran/faster-whisper-tiny/download
curl -X DELETE http://localhost:8000/v1/models/Systran/faster-whisper-tiny
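The three management calls above map onto GET, POST, and DELETE on the `/v1/models` routes. A thin stdlib sketch (function names are illustrative, not SDK API) that constructs each request for later dispatch:

```python
import urllib.request

BASE_URL = "http://localhost:8000"


def list_models_request():
    """GET /v1/models — list installed models."""
    return urllib.request.Request(f"{BASE_URL}/v1/models")


def download_model_request(name: str):
    """POST /v1/models/{name}/download — pull a model, Ollama-style."""
    return urllib.request.Request(f"{BASE_URL}/v1/models/{name}/download",
                                  method="POST")


def delete_model_request(name: str):
    """DELETE /v1/models/{name} — remove a model from the local cache."""
    return urllib.request.Request(f"{BASE_URL}/v1/models/{name}",
                                  method="DELETE")


# Execute any of these with urllib.request.urlopen(...) against a running server.
```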

SDK

from vocal_sdk import VocalClient
from vocal_sdk.api.audio import text_to_speech_v1_audio_speech_post
from vocal_sdk.models import TTSRequest

client = VocalClient(base_url="http://localhost:8000")
audio = text_to_speech_v1_audio_speech_post.sync(
    client=client,
    body=TTSRequest(model="pyttsx3", input="Hello from the SDK."),
)
open("output.wav", "wb").write(audio)

Cross-Platform

Platform      TTS Engine             Notes
Windows       SAPI5 (pyttsx3)        Built-in, no extra install
macOS         NSSpeechSynthesizer    Built-in, no extra install
Linux / WSL   espeak-ng (pyttsx3)    sudo apt install espeak-ng ffmpeg

All audio formats (mp3, wav, opus, aac, flac, pcm) work on all platforms via ffmpeg.


Contributing

git clone https://github.com/niradler/vocal.git
cd vocal
make install
make lint && make test

See docs/developer/contributing.md for the full workflow.


License

Server Side Public License (SSPL-1.0) — free to use and self-host. If you offer Vocal as a managed service to third parties, you must open-source your full service stack under the same license.

Built with FastAPI, faster-whisper, HuggingFace Hub, and uv.

