
Vocal

Ollama for Voice Models — Self-hosted Speech AI Platform

License: SSPL-1.0 · Python 3.11+

Vocal manages STT (Speech-to-Text) and TTS (Text-to-Speech) models the way Ollama manages LLMs. It provides an OpenAI-compatible REST API, a Python SDK, and a CLI — with model download, caching, and multi-backend support built in.


Quick Start

# Run without installing
uvx --from vocal-ai vocal serve

# Or install permanently
pip install vocal-ai
vocal serve

Interactive API docs are at http://localhost:8000/docs.

# Pull a model and transcribe
vocal models pull Systran/faster-whisper-tiny
vocal transcribe your_audio.wav

# Text-to-speech (built-in, no download)
vocal speak "Hello, world!"

# Real-time microphone transcription
vocal listen

# Full voice agent (STT → LLM → TTS)
vocal chat    # requires Ollama running locally

Optional backends (base install already includes torch + faster-whisper + transformers + silero-vad):

| Extra | What you get | Install |
| --- | --- | --- |
| kokoro | Kokoro-82M neural TTS, #1 on TTS Arena | pip install "vocal-ai[kokoro]" |
| piper | Piper offline TTS — fast, multilingual | pip install "vocal-ai[piper]" |
| qwen3-tts | Qwen3-TTS voice cloning (CUDA required) | pip install "vocal-ai[qwen3-tts]" |
| whisperx | WhisperX — word-level timestamps + diarization | pip install "vocal-ai[whisperx]" |
| nemo | NVIDIA NeMo STT (Parakeet-TDT, Canary-Qwen) | pip install "vocal-ai[nemo]" |
| chatterbox | Chatterbox voice cloning TTS | pip install "vocal-ai[chatterbox]" |
| voxtral | Voxtral-Mini-4B STT + TTS (CUDA, 16 GB+) | pip install "vocal-ai[voxtral]" |

Missing a backend? The error message will tell you exactly which command to run.


Features

  • OpenAI-compatible API — /v1/audio/transcriptions, /v1/audio/speech, /v1/realtime
  • Ollama-style model management — pull, list, delete models from the CLI or API
  • Auto-generated SDK — typed Python client generated from the live OpenAPI spec
  • Streaming TTS — first audio bytes before full synthesis completes
  • WebSocket ASR — ~200 ms latency with server-side VAD
  • Voice agent — full STT → LLM → TTS loop, OpenAI Realtime protocol compatible
  • Voice selection — list and select voices per model
  • Voice cloning — clone a voice from a 3–30 s reference recording
  • Cross-platform — Windows, macOS, Linux (WSL supported)
  • GPU acceleration — automatic CUDA detection with VRAM optimization
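
The streaming TTS behavior above can be exercised with a short stdlib-only client. This is a sketch, not part of Vocal: it assumes the server is running at http://localhost:8000, uses the /v1/audio/speech endpoint from the API Overview below, and the chunk size is illustrative.

```python
import json
import urllib.request


def build_tts_request(model: str, text: str, fmt: str = "wav") -> bytes:
    """Serialize an OpenAI-style TTS request body."""
    return json.dumps({"model": model, "input": text, "response_format": fmt}).encode()


if __name__ == "__main__":
    req = urllib.request.Request(
        "http://localhost:8000/v1/audio/speech",
        data=build_tts_request("pyttsx3", "Hello, world!"),
        headers={"Content-Type": "application/json"},
    )
    # Write chunks as they arrive instead of buffering the whole response,
    # so playback tooling can start before synthesis completes.
    with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as out:
        for chunk in iter(lambda: resp.read(4096), b""):
            out.write(chunk)
```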

Documentation

  • Getting Started — install, first transcription, platform notes
  • Available Models — STT/TTS catalog, hardware guide
  • CLI Reference — all commands with options
  • Configuration — environment variables, .env
  • Contributing — dev setup, PR workflow
  • Architecture — package structure, adapter pattern
  • Adding Models — new STT/TTS backends
  • Testing — test tiers, CI, cross-platform
  • Release Process — version bump, PyPI publish

API Overview

Speech-to-Text (OpenAI-compatible)

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=Systran/faster-whisper-tiny"

Text-to-Speech (OpenAI-compatible)

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"pyttsx3","input":"Hello, world!","response_format":"wav"}' \
  --output speech.wav

Voice Cloning

curl -X POST http://localhost:8000/v1/audio/clone \
  -F "text=Synthesize in my voice." \
  -F "reference_audio=@speaker.wav" \
  --output clone.wav

Model Management (Ollama-style)

curl http://localhost:8000/v1/models               # list
curl -X POST http://localhost:8000/v1/models/Systran/faster-whisper-tiny/download
curl -X DELETE http://localhost:8000/v1/models/Systran/faster-whisper-tiny
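
A small Python helper can derive these three endpoints from a base URL — a sketch that assumes only the routes shown in the curl examples above:

```python
def model_endpoints(base_url: str, model_id: str) -> dict:
    """Map a model id to its Ollama-style management URLs."""
    base = base_url.rstrip("/")
    return {
        "list": f"{base}/v1/models",                      # GET
        "pull": f"{base}/v1/models/{model_id}/download",  # POST
        "delete": f"{base}/v1/models/{model_id}",         # DELETE
    }


if __name__ == "__main__":
    import json
    import urllib.request

    urls = model_endpoints("http://localhost:8000", "Systran/faster-whisper-tiny")
    with urllib.request.urlopen(urls["list"]) as resp:
        print(json.dumps(json.load(resp), indent=2))
```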

SDK

from vocal_sdk import VocalClient
from vocal_sdk.api.audio import text_to_speech_v1_audio_speech_post
from vocal_sdk.models import TTSRequest

client = VocalClient(base_url="http://localhost:8000")
audio = text_to_speech_v1_audio_speech_post.sync(
    client=client,
    body=TTSRequest(model="pyttsx3", input="Hello from the SDK."),
)
open("output.wav", "wb").write(audio)

Cross-Platform

| Platform | TTS Engine | Notes |
| --- | --- | --- |
| Windows | SAPI5 (pyttsx3) | Built-in, no extra install |
| macOS | NSSpeechSynthesizer | Built-in, no extra install |
| Linux / WSL | espeak-ng (pyttsx3) | sudo apt install espeak-ng ffmpeg |

All audio formats (mp3, wav, opus, aac, flac, pcm) work on all platforms via ffmpeg.


Contributing

git clone https://github.com/niradler/vocal.git
cd vocal
make install
make lint && make test

See docs/developer/contributing.md for the full workflow.


License

Server Side Public License (SSPL-1.0) — free to use and self-host. If you offer Vocal as a managed service to third parties, you must open-source your full service stack under the same license.

Built with FastAPI, faster-whisper, HuggingFace Hub, and uv.
