OpenAI-compatible HTTP server for OmniVoice TTS
omnivoice-server
OpenAI-compatible HTTP server for OmniVoice text-to-speech.
Author: zamery (@maemreyo) | Email: matthew.ngo1114@gmail.com
⚠️ Early Development Notice
This is a new repository built on top of OmniVoice (released 2026). Both the upstream model and this server wrapper are under active development. Expect:
- API changes and breaking updates
- Performance improvements as PyTorch MPS support matures
- New features and bug fixes
- Documentation updates
Current Status: Functional on CPU and CUDA. MPS (Apple Silicon) has known issues. See Verification Status below.
Features
- OpenAI-compatible API - Drop-in replacement for OpenAI TTS endpoints
- Three voice modes:
- Auto: Model selects voice automatically
- Design: Specify voice attributes (gender, age, accent, pitch, style)
- Clone: Voice cloning from reference audio
- Voice profile management - Save and reuse cloned voices
- Streaming synthesis - Low-latency sentence-level streaming
- Concurrent requests - Configurable thread pool for parallel synthesis
- Multiple audio formats - WAV and raw PCM output
- Speed control - 0.25x to 4.0x playback speed
- Optional authentication - Bearer token support
- Production-ready - Request timeouts, health checks, metrics
Quick Start
Prerequisites
PyTorch must be installed before installing omnivoice-server. The correct PyTorch variant depends on your hardware:
# CPU only (works everywhere, but slow)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
# NVIDIA GPU (CUDA) - recommended for production
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Apple Silicon (MPS) - currently broken, use CPU instead
# See docs/verification/MPS_ISSUE.md for details
For other CUDA versions or more options, see the official PyTorch installation guide.
Installation
# Option 1: Install from PyPI (recommended)
pip install omnivoice-server
# Option 2: Install with uv (faster)
uv tool install omnivoice-server
# Option 3: Install from GitHub (latest development version)
pip install git+https://github.com/maemreyo/omnivoice-server.git
# Option 4: Clone and install locally for development
git clone https://github.com/maemreyo/omnivoice-server.git
cd omnivoice-server
pip install -e .
Start the Server
# Basic usage (downloads model on first run)
omnivoice-server
# With custom settings
omnivoice-server --host 0.0.0.0 --port 8880 --device cuda
# With authentication
export OMNIVOICE_API_KEY="your-secret-key"
omnivoice-server
The server will start at http://127.0.0.1:8880 by default.
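When `OMNIVOICE_API_KEY` is set, clients have to present the same key on every request. The sketch below assumes the server reads the conventional `Authorization: Bearer <key>` header, which is what "Bearer token support" normally implies; the helper name `auth_headers` is hypothetical and not part of omnivoice-server.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for a server started with an API key.

    Assumption: the server expects the standard `Authorization: Bearer <key>` header.
    """
    # Fall back to the same environment variable the server reads.
    key = api_key if api_key is not None else os.environ.get("OMNIVOICE_API_KEY", "")
    # An empty key means authentication is disabled, so send no header at all.
    return {"Authorization": f"Bearer {key}"} if key else {}
```

Pass the result to the HTTP client, for example `httpx.post(url, json=payload, headers=auth_headers())`.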
⚠️ Verification Status
Last Updated: 2026-04-04
Status: ✅ Working (verified on CPU only)
Quick Summary
- ✅ System works - Produces clear, high-quality audio for English and Vietnamese
- ❌ MPS broken - Apple Silicon GPU has PyTorch bugs, use CPU instead
- ⚠️ CPU slow - RTF=4.92 (5x slower than real-time, ~10s per voice)
- ✅ No memory leaks - Stable memory usage verified
Benchmark Results (CPU)
| Metric | Value | Status |
|---|---|---|
| Latency (mean) | 10.2 seconds | ⚠️ Slow |
| RTF (Real-Time Factor) | 4.92 | ⚠️ 5x slower than real-time |
| Memory leak | None | ✅ Stable |
| Audio quality | Excellent | ✅ Clear speech |
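The latency and RTF figures are linked. Assuming the usual definition of real-time factor (synthesis time divided by the duration of the audio produced), the numbers in the table imply clips of roughly two seconds. The function names below are illustrative only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def implied_audio_seconds(synthesis_seconds: float, rtf: float) -> float:
    """Invert the RTF definition to recover the clip length."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return synthesis_seconds / rtf


# Benchmark figures from the table: 10.2 s mean latency at RTF 4.92.
clip_seconds = implied_audio_seconds(10.2, 4.92)
print(f"{clip_seconds:.2f} s of audio per request")  # 2.07 s of audio per request
```

An RTF below 1.0 means synthesis is faster than playback; 4.92 means each second of audio takes about five seconds to produce.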
Production Recommendation
For production, deploy on NVIDIA GPU (CUDA):
- 20-25x faster than CPU (RTF~0.2)
- Cloud options: AWS g5.xlarge ($1/hr), GCP T4/V100, RunPod ($0.40/hr)
Detailed reports: See docs/verification/ for full verification results and technical details.
Audio Samples
Listen to verified voice samples:
English (Female, American accent) - 199KB
Vietnamese (Female) - 203KB
Both samples demonstrate clear, natural speech quality on CPU device.
First Request
curl -X POST http://127.0.0.1:8880/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "omnivoice",
"input": "Hello, this is OmniVoice text-to-speech!",
"voice": "auto"
}' \
--output speech.wav
API Usage
Basic Synthesis
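import httpx
response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "Hello world!",
        "voice": "auto",
        "response_format": "wav",
    },
    # httpx times out after 5 seconds by default; CPU synthesis takes ~10 seconds
    timeout=60.0,
)
response.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(response.content)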
Voice Design
Specify voice attributes to design a custom voice:
response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "This voice has specific attributes.",
        "voice": "design:female,british accent,young adult,high pitch",
    },
    timeout=60.0,  # synthesis can exceed httpx's 5-second default
)
Available attributes:
- Gender: male, female
- Age: child, young adult, middle-aged, elderly
- Pitch: very low, low, medium, high, very high
- Style: whisper
- Accent (English): American, British, Australian, Indian, Irish
- Dialect (Chinese): 四川话, 陕西话, 粤语, 闽南话
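The `voice` value in the request above is the prefix `design:` followed by comma-separated attributes. A small helper can assemble it; `design_voice` is a hypothetical name, and it does not check attributes against the list above.

```python
def design_voice(*attributes: str) -> str:
    """Join attributes into a `design:` voice string, e.g. design:female,high pitch."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [a.strip() for a in attributes if a and a.strip()]
    if not cleaned:
        raise ValueError("at least one attribute is required")
    return "design:" + ",".join(cleaned)


voice = design_voice("female", "british accent", "young adult", "high pitch")
print(voice)  # design:female,british accent,young adult,high pitch
```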
Voice Cloning
Option 1: Save a Profile (Reusable)
# Create a profile
with open("reference.wav", "rb") as f:
    response = httpx.post(
        "http://127.0.0.1:8880/v1/voices/profiles",
        data={
            "profile_id": "my_voice",
            "ref_text": "This is the reference text.",
        },
        files={"ref_audio": f},
    )
# Use the profile
response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "This uses my cloned voice.",
        "voice": "clone:my_voice",
    },
    timeout=60.0,  # synthesis can exceed httpx's 5-second default
)
Option 2: One-Shot Cloning
with open("reference.wav", "rb") as f:
    response = httpx.post(
        "http://127.0.0.1:8880/v1/audio/speech/clone",
        data={
            "text": "This is one-shot cloning.",
            "ref_text": "Reference text.",
        },
        files={"ref_audio": f},
        timeout=60.0,  # synthesis can exceed httpx's 5-second default
    )
Streaming
Stream audio in real-time for lower latency:
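with httpx.stream(
    "POST",
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "Long text to stream...",
        "voice": "auto",
        "stream": True,
    },
    timeout=60.0,  # the first chunk can take longer than httpx's 5-second default
) as response:
    for chunk in response.iter_bytes():
        # Process PCM audio chunks; play_audio is a placeholder for your own playback function
        play_audio(chunk)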
See examples/streaming_player.py for a complete example.
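If no audio device is available, the streamed chunks can be written to a WAV file instead of being played. This is a sketch using only the standard library. The sample rate, channel count, and sample width are assumptions (24 kHz, mono, 16-bit), not values documented above; check them against the server before relying on the output.

```python
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    path: str,
    sample_rate: int = 24000,  # assumption: confirm the server's actual rate
    channels: int = 1,         # assumption: mono
    sample_width: int = 2,     # assumption: 16-bit samples
) -> int:
    """Write raw PCM chunks to a WAV file and return the number of bytes written."""
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # The WAV header is finalized when the file is closed.
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

Inside the `httpx.stream` block, this would be called as `pcm_chunks_to_wav(response.iter_bytes(), "speech.wav")`.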
CLI Usage
# Start server with defaults
omnivoice-server
# Custom host and port
omnivoice-server --host 0.0.0.0 --port 8880
# Use GPU
omnivoice-server --device cuda
# Adjust inference quality (higher = better quality, slower)
omnivoice-server --num-step 32
# Enable authentication
omnivoice-server --api-key your-secret-key
# Adjust concurrency
omnivoice-server --max-concurrent 4
# Custom model path
omnivoice-server --model-id /path/to/local/model
Environment Variables
All CLI options can be set via environment variables with OMNIVOICE_ prefix:
export OMNIVOICE_HOST=0.0.0.0
export OMNIVOICE_PORT=8880
export OMNIVOICE_DEVICE=cuda
export OMNIVOICE_API_KEY=your-secret-key
export OMNIVOICE_NUM_STEP=32
export OMNIVOICE_MAX_CONCURRENT=4
omnivoice-server
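The naming rule is mechanical: drop the leading dashes, replace the remaining dashes with underscores, uppercase the result, and add the prefix. The helper below is an illustration of that rule (the name `env_var_for` is hypothetical); it reproduces every pairing listed in the Configuration table.

```python
def env_var_for(option: str, prefix: str = "OMNIVOICE_") -> str:
    """Map a CLI flag such as --max-concurrent to its environment variable name."""
    name = option.lstrip("-").replace("-", "_").upper()
    if not name:
        raise ValueError("option must contain a name")
    return prefix + name


for flag in ("--host", "--num-step", "--max-concurrent"):
    print(flag, "->", env_var_for(flag))
```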
Configuration
| Option | Env Var | Default | Description |
|---|---|---|---|
| --host | OMNIVOICE_HOST | 127.0.0.1 | Bind host |
| --port | OMNIVOICE_PORT | 8880 | Bind port |
| --device | OMNIVOICE_DEVICE | cpu | Device: cpu, cuda (MPS broken) |
| --num-step | OMNIVOICE_NUM_STEP | 32 | Inference steps (1-64, higher=better quality) |
| --max-concurrent | OMNIVOICE_MAX_CONCURRENT | 2 | Max concurrent requests |
| --api-key | OMNIVOICE_API_KEY | "" | Bearer token (empty = no auth) |
| --model-id | OMNIVOICE_MODEL_ID | k2-fsa/OmniVoice | HuggingFace repo or local path |
| --profile-dir | OMNIVOICE_PROFILE_DIR | ~/.omnivoice/profiles | Voice profiles directory |
| --log-level | OMNIVOICE_LOG_LEVEL | info | Logging level |
API Reference
Endpoints
POST /v1/audio/speech
Generate speech from text (OpenAI-compatible).
Request body:
{
"model": "omnivoice",
"input": "Text to synthesize",
"voice": "auto",
"response_format": "wav",
"speed": 1.0,
"stream": false,
"num_step": 32
}
Response: Audio file (WAV or PCM)
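Checking parameters on the client avoids a round trip for obviously invalid requests. The sketch below builds the request body shown above and enforces the documented speed range (0.25 to 4.0). It also applies the 1-64 range given for `--num-step` to the per-request `num_step` field, which is an assumption. The function name `build_speech_request` is hypothetical.

```python
def build_speech_request(
    text: str,
    voice: str = "auto",
    response_format: str = "wav",
    speed: float = 1.0,
    stream: bool = False,
    num_step: int = 32,
) -> dict:
    """Validate parameters client-side and return a JSON-serializable request body."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    # Assumption: the per-request field shares the CLI option's 1-64 range.
    if not 1 <= num_step <= 64:
        raise ValueError("num_step must be between 1 and 64")
    return {
        "model": "omnivoice",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
        "num_step": num_step,
    }
```

The returned dictionary can be passed directly as the `json=` argument of `httpx.post`.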
POST /v1/audio/speech/clone
One-shot voice cloning (multipart form).
Form fields:
- text (required): Text to synthesize
- ref_audio (required): Reference audio file
- ref_text (optional): Reference transcript
- speed (optional): Playback speed (default: 1.0)
- num_step (optional): Inference steps
Response: Audio file (WAV)
GET /v1/voices
List available voices and profiles.
Response:
{
"voices": [
{"id": "auto", "type": "auto", "description": "..."},
{"id": "design:<attributes>", "type": "design", "description": "..."},
{"id": "clone:my_voice", "type": "clone", "profile_id": "my_voice"}
],
"design_attributes": {...},
"total": 3
}
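To find which saved profiles can be used as `clone:<profile_id>`, filter the `voices` array by `type`. This sketch works on the response shape shown above; the function name is hypothetical and the sample data mirrors the example rather than real server output.

```python
from typing import List


def clone_profile_ids(voices_response: dict) -> List[str]:
    """Return the profile IDs of all clone-type voices in a GET /v1/voices response."""
    return [
        voice["profile_id"]
        for voice in voices_response.get("voices", [])
        if voice.get("type") == "clone" and "profile_id" in voice
    ]


sample = {
    "voices": [
        {"id": "auto", "type": "auto", "description": "..."},
        {"id": "design:<attributes>", "type": "design", "description": "..."},
        {"id": "clone:my_voice", "type": "clone", "profile_id": "my_voice"},
    ],
    "total": 3,
}
print(clone_profile_ids(sample))  # ['my_voice']
```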
POST /v1/voices/profiles
Create a voice cloning profile.
Form fields:
- profile_id (required): Unique identifier (alphanumeric, dashes, underscores)
- ref_audio (required): Reference audio file
- ref_text (optional): Reference transcript
- overwrite (optional): Overwrite existing profile (default: false)
Response:
{
"profile_id": "my_voice",
"created_at": "2026-04-04T12:00:00Z",
"ref_text": "Reference text"
}
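The `profile_id` rule (alphanumeric characters, dashes, underscores) can be checked before uploading reference audio. The pattern below is one reading of that rule; the server may enforce additional limits, such as a maximum length, that are not documented here.

```python
import re

# Letters, digits, dashes, and underscores; at least one character.
_PROFILE_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_profile_id(profile_id: str) -> bool:
    """Return True if profile_id uses only the documented character set."""
    return _PROFILE_ID.fullmatch(profile_id) is not None


print(is_valid_profile_id("my_voice"))  # True
print(is_valid_profile_id("my voice"))  # False
```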
GET /v1/voices/profiles/{profile_id}
Get profile details.
PATCH /v1/voices/profiles/{profile_id}
Update profile (ref_audio and/or ref_text).
DELETE /v1/voices/profiles/{profile_id}
Delete a profile.
GET /v1/models
List available models (OpenAI-compatible).
GET /health
Health check endpoint.
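Because the model is downloaded on first run, a client or deployment script may need to wait before sending requests. The sketch below polls until a check succeeds. It assumes `/health` answers with HTTP 200 once the server is ready; the response body is not documented here, so it is ignored. Both function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_health_check(url: str = "http://127.0.0.1:8880/health") -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed to mean ready)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status == 200


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 1.0,
) -> bool:
    """Call `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Connection refused or HTTP error while the server is still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A typical call is `wait_until_healthy(http_health_check, attempts=60, delay_seconds=2.0)`.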
GET /metrics
Prometheus-style metrics.
Examples
See the examples/ directory:
- python_client.py - Comprehensive Python client examples
- streaming_player.py - Real-time streaming audio player
- curl_examples.sh - cURL command examples
Run examples:
# Python client
cd examples
python python_client.py
# Streaming player (requires pyaudio)
pip install pyaudio
python streaming_player.py "Hello, this is streaming audio!"
# cURL examples
chmod +x curl_examples.sh
./curl_examples.sh
Docker Deployment
Quick Start with Docker Compose
# Start the server
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the server
docker-compose down
The server will be available at http://localhost:8880. Voice profiles are persisted in the ./profiles directory.
Build and Run Manually
# Build the image
docker build -t omnivoice-server .
# Run the container
docker run -d \
-p 8880:8880 \
-v $(pwd)/profiles:/app/profiles \
-e OMNIVOICE_API_KEY=your-secret-key \
--name omnivoice \
omnivoice-server
# View logs
docker logs -f omnivoice
Configuration
Set environment variables in docker-compose.yml or pass them with -e:
- OMNIVOICE_HOST=0.0.0.0 - Bind host (must be 0.0.0.0 in Docker)
- OMNIVOICE_PORT=8880 - Server port
- OMNIVOICE_DEVICE=cpu - Device (cpu, cuda)
- OMNIVOICE_NUM_STEP=32 - Inference steps
- OMNIVOICE_API_KEY=secret - Optional authentication
For CUDA GPU support, see comments in docker-compose.yml.
Development
Setup
# Clone repository
git clone https://github.com/maemreyo/omnivoice-server.git
cd omnivoice-server
# Install with dev dependencies
pip install -e ".[dev]"
Run Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=omnivoice_server --cov-report=term-missing
# Run specific test
pytest tests/test_streaming.py -v
Code Quality
# Lint
ruff check omnivoice_server/ tests/
# Format
ruff format omnivoice_server/ tests/
# Type check
mypy omnivoice_server/
CI/CD
GitHub Actions workflow runs on every push:
- Linting (ruff)
- Type checking (mypy)
- Tests (pytest)
- Python 3.10, 3.11, 3.12
Hardware Requirements
- CPU: 4+ cores recommended
- RAM: 8GB minimum, 16GB recommended
- GPU:
- ✅ NVIDIA GPU with CUDA - Recommended for production (20-25x faster than CPU)
- ❌ Apple Silicon (MPS) - Currently broken due to PyTorch bugs, do not use
- ✅ CPU - Works but slow (5x slower than real-time)
- Storage: 3GB for model cache
Device Comparison
| Device | Audio Quality | Speed (RTF) | Status |
|---|---|---|---|
| CPU | ✅ Excellent | 4.92 (slow) | Use for dev |
| MPS (Apple Silicon) | ❌ Broken | N/A | Do not use |
| CUDA (NVIDIA GPU) | ✅ Excellent | ~0.2 (fast) | Use for prod |
Note: Default device is now cpu due to MPS issues. See docs/verification/MPS_ISSUE.md for technical details.
Performance
Verified benchmark results (CPU, num_step=32):
| Metric | Value |
|---|---|
| Latency | 10.2 seconds per voice |
| RTF (Real-Time Factor) | 4.92 |
| Memory | Stable, no leaks |
Expected performance on different hardware:
| Hardware | num_step | Latency (short text) | RTF |
|---|---|---|---|
| CPU (Intel i7) | 32 | ~10s | 4.92 |
| GPU (RTX 3090) | 32 | ~0.5s | ~0.2 |
| Apple M1 Max (MPS) | 32 | ❌ Broken audio | N/A |
Streaming mode reduces perceived latency by sending audio as soon as the first sentence is ready.
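A rough estimate shows why. The model below is a simplification, not a measurement: it assumes synthesis time scales linearly with audio duration (constant RTF) and that sentences are synthesized in order. The sentence durations are hypothetical.

```python
from typing import List


def time_to_first_audio(rtf: float, sentence_seconds: List[float], streaming: bool) -> float:
    """Estimate seconds until the client receives its first audio."""
    if not sentence_seconds:
        raise ValueError("need at least one sentence")
    # Streaming waits only for the first sentence; otherwise for the whole text.
    audio_needed = sentence_seconds[0] if streaming else sum(sentence_seconds)
    return rtf * audio_needed


sentences = [3.0, 4.0, 5.0]  # hypothetical audio duration of each sentence, in seconds
print(f"{time_to_first_audio(0.2, sentences, streaming=False):.1f} s")  # 2.4 s
print(f"{time_to_first_audio(0.2, sentences, streaming=True):.1f} s")   # 0.6 s
```

Total synthesis time is unchanged; only the wait before playback can begin gets shorter.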
Troubleshooting
Model Download Issues
The model is downloaded from HuggingFace on first run. If you encounter issues:
# Pre-download the model
python -c "from omnivoice import OmniVoice; OmniVoice.from_pretrained('k2-fsa/OmniVoice')"
# Or use a local model
omnivoice-server --model-id /path/to/local/model
CUDA Out of Memory
Reduce concurrent requests or use CPU:
omnivoice-server --max-concurrent 1 --device cpu
Audio Quality Issues
Increase inference steps above the default of 32 for better quality (valid range: 1-64; higher values are slower):
omnivoice-server --num-step 64
Documentation
Comprehensive technical documentation is available in the docs/ directory:
| Document | Description |
|---|---|
| verification/VERIFICATION_RESULTS.md | ⭐ Verification results and benchmark data |
| verification/MPS_ISSUE.md | Technical analysis of Apple Silicon MPS bug |
| system/ecosystem.md | System context, hardware requirements, deployment |
| system/specification.md | Complete system specification |
| architecture/overview.md | Architecture diagrams and component maps |
| design/dataflow.md | Data flow and API design details |
License
MIT
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run code quality checks
- Submit a pull request
Acknowledgments
Built on top of OmniVoice by k2-fsa.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions