OpenAI-compatible HTTP server for OmniVoice TTS

Maintainers

zamery

omnivoice-server

OpenAI-compatible HTTP server for OmniVoice text-to-speech.

Author: zamery (@maemreyo) | Email: matthew.ngo1114@gmail.com

⚠️ Early Development Notice

This is a new repository built on top of OmniVoice (released 2026). Both the upstream model and this server wrapper are under active development. Expect:

API changes and breaking updates

Performance improvements as PyTorch MPS support matures

New features and bug fixes

Documentation updates

Current Status: Functional on CPU and CUDA. MPS (Apple Silicon) has known issues. See Verification Status below.

Features

OpenAI-compatible API - Drop-in replacement for OpenAI TTS endpoints
Three voice modes:
- Auto: Model selects voice automatically
- Design: Specify voice attributes (gender, age, accent, pitch, style)
- Clone: Voice cloning from reference audio
Voice profile management - Save and reuse cloned voices
Streaming synthesis - Low-latency sentence-level streaming
Concurrent requests - Configurable thread pool for parallel synthesis
Multiple audio formats - WAV and raw PCM output
Speed control - 0.25x to 4.0x playback speed
Optional authentication - Bearer token support
Production-ready - Request timeouts, health checks, metrics

Quick Start

Prerequisites

PyTorch must be installed before installing omnivoice-server. The correct PyTorch variant depends on your hardware:

# CPU only (works everywhere, but slow)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# NVIDIA GPU (CUDA) - recommended for production
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

# Apple Silicon (MPS) - currently broken, use CPU instead
# See docs/verification/MPS_ISSUE.md for details

For other CUDA versions or more options, see the official PyTorch installation guide.

Installation

# Option 1: Install from PyPI (recommended)
pip install omnivoice-server

# Option 2: Install with uv (faster)
uv tool install omnivoice-server

# Option 3: Install from GitHub (latest development version)
pip install git+https://github.com/maemreyo/omnivoice-server.git

# Option 4: Clone and install locally for development
git clone https://github.com/maemreyo/omnivoice-server.git
cd omnivoice-server
pip install -e .

Start the Server

# Basic usage (downloads model on first run)
omnivoice-server

# With custom settings
omnivoice-server --host 0.0.0.0 --port 8880 --device cuda

# With authentication
export OMNIVOICE_API_KEY="your-secret-key"
omnivoice-server

The server will start at http://127.0.0.1:8880 by default.

⚠️ Verification Status

Last Updated: 2026-04-04
Status: ✅ Working (verified on CPU only)

Quick Summary

✅ System works - Produces clear, high-quality audio for English and Vietnamese
❌ MPS broken - Apple Silicon GPU has PyTorch bugs, use CPU instead
⚠️ CPU slow - RTF=4.92 (5x slower than real-time, ~10s per voice)
✅ No memory leaks - Stable memory usage verified

Benchmark Results (CPU)

Metric	Value	Status
Latency (mean)	10.2 seconds	⚠️ Slow
RTF (Real-Time Factor)	4.92	⚠️ 5x slower than real-time
Memory leak	None	✅ Stable
Audio quality	Excellent	✅ Clear speech
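
The RTF figure relates synthesis time to the length of the audio produced. Assuming the usual definition (RTF = synthesis time / audio duration), which this README does not spell out, the benchmark numbers imply a clip of roughly two seconds, and an RTF of about 0.2 on CUDA would be roughly 25x faster. A small sketch of that arithmetic, with hypothetical helper names:

```python
def audio_seconds(latency_s: float, rtf: float) -> float:
    """Audio duration implied by a latency and a real-time factor.

    Assumes RTF = synthesis time / audio duration (the common definition).
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return latency_s / rtf


def speedup(rtf_slow: float, rtf_fast: float) -> float:
    """How many times faster the second device is for the same audio."""
    if rtf_fast <= 0:
        raise ValueError("rtf_fast must be positive")
    return rtf_slow / rtf_fast


# Figures taken from the benchmark table above; 0.2 is the quoted CUDA RTF.
cpu_clip = audio_seconds(latency_s=10.2, rtf=4.92)
print(f"Implied clip length: {cpu_clip:.2f} s")
print(f"CPU -> CUDA speedup: {speedup(4.92, 0.2):.1f}x")
```

An RTF above 1.0 means synthesis takes longer than playback, which is why CPU output cannot keep up with real-time listening.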

Production Recommendation

For production, deploy on NVIDIA GPU (CUDA):

20-25x faster than CPU (RTF~0.2)
Cloud options: AWS g5.xlarge (~$1/hr), GCP T4/V100, RunPod (~$0.40/hr)

Detailed reports: See docs/verification/ for full verification results and technical details.

Audio Samples

Listen to verified voice samples:

English (Female, American accent) - 199KB

Download English sample

Vietnamese (Female) - 203KB

Download Vietnamese sample

Both samples were generated on CPU and demonstrate clear, natural speech.

First Request

curl -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "omnivoice",
    "input": "Hello, this is OmniVoice text-to-speech!",
    "voice": "auto"
  }' \
  --output speech.wav

API Usage

Basic Synthesis

import httpx

response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "Hello world!",
        "voice": "auto",
        "response_format": "wav"
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)
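
If the server was started with an API key, each request has to carry it. The sketch below assumes the key is sent as a standard `Authorization: Bearer <key>` header (the README only says "Bearer token support"). It uses the standard library rather than httpx so it has no extra dependencies; `build_speech_request` and `synthesize_to_file` are hypothetical helper names, not part of omnivoice-server.

```python
import json
import urllib.request


def build_speech_request(text, voice="auto", api_key=None,
                         base_url="http://127.0.0.1:8880"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": "omnivoice",
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key goes in a standard Bearer Authorization header.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize_to_file(request, path, timeout=120):
    """Send the request and write the returned audio bytes to `path`."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so an error body is never saved as if it were audio.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a server running, `synthesize_to_file(build_speech_request("Hello world!", api_key="your-secret-key"), "output.wav")` would save the result. The generous timeout reflects the roughly 10 second CPU latency reported above.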

Voice Design

Specify voice attributes to design a custom voice:

response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "This voice has specific attributes.",
        "voice": "design:female,british accent,young adult,high pitch"
    }
)

Available attributes:

Gender: male, female
Age: child, young adult, middle-aged, elderly
Pitch: very low, low, medium, high, very high
Style: whisper
Accent (English): American, British, Australian, Indian, Irish
Dialect (Chinese): 四川话, 陕西话, 粤语, 闽南话

Voice Cloning

Option 1: Save a Profile (Reusable)

# Create a profile
with open("reference.wav", "rb") as f:
    response = httpx.post(
        "http://127.0.0.1:8880/v1/voices/profiles",
        data={
            "profile_id": "my_voice",
            "ref_text": "This is the reference text."
        },
        files={"ref_audio": f}
    )

# Use the profile
response = httpx.post(
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "This uses my cloned voice.",
        "voice": "clone:my_voice"
    }
)

Option 2: One-Shot Cloning

with open("reference.wav", "rb") as f:
    response = httpx.post(
        "http://127.0.0.1:8880/v1/audio/speech/clone",
        data={
            "text": "This is one-shot cloning.",
            "ref_text": "Reference text."
        },
        files={"ref_audio": f}
    )

Streaming

Stream audio as it is generated to reduce the time to first audio:

with httpx.stream(
    "POST",
    "http://127.0.0.1:8880/v1/audio/speech",
    json={
        "model": "omnivoice",
        "input": "Long text to stream...",
        "voice": "auto",
        "stream": True
    }
) as response:
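    for chunk in response.iter_bytes():
        # Each chunk is raw PCM audio; play_audio is a placeholder for
        # your own playback function (see examples/streaming_player.py)
        play_audio(chunk)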

See examples/streaming_player.py for a complete example.
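
Raw PCM chunks carry no header, so saving them straight to disk gives a file most players cannot open. A minimal sketch of wrapping streamed chunks in a WAV container with the standard `wave` module follows. The sample rate, channel count, and sample width used here are assumptions, not documented values; check the server's actual output format first. `write_pcm_as_wav` is a hypothetical helper.

```python
import io
import wave


def write_pcm_as_wav(chunks, target, sample_rate=24000, channels=1,
                     sample_width=2):
    """Wrap raw PCM chunks in a WAV container.

    `target` is a file path or a writable, seekable binary file object.
    The defaults (24 kHz, mono, 16-bit) are assumptions.
    """
    total_bytes = 0
    with wave.open(target, "wb") as wav_out:
        wav_out.setnchannels(channels)
        wav_out.setsampwidth(sample_width)
        wav_out.setframerate(sample_rate)
        for chunk in chunks:
            wav_out.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes


# Self-check with silent fake chunks instead of a live server.
buffer = io.BytesIO()
written = write_pcm_as_wav([b"\x00\x00" * 100, b"\x00\x00" * 50], buffer)

buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    frames = wav_in.getnframes()
print(written, frames)
```

In the streaming loop above, `response.iter_bytes()` could be passed as `chunks` with a file path as `target`.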

CLI Usage

# Start server with defaults
omnivoice-server

# Custom host and port
omnivoice-server --host 0.0.0.0 --port 8880

# Use GPU
omnivoice-server --device cuda

# Adjust inference quality (higher = better quality, slower)
omnivoice-server --num-step 32

# Enable authentication
omnivoice-server --api-key your-secret-key

# Adjust concurrency
omnivoice-server --max-concurrent 4

# Custom model path
omnivoice-server --model-id /path/to/local/model

Environment Variables

All CLI options can also be set via environment variables with the OMNIVOICE_ prefix:

export OMNIVOICE_HOST=0.0.0.0
export OMNIVOICE_PORT=8880
export OMNIVOICE_DEVICE=cuda
export OMNIVOICE_API_KEY=your-secret-key
export OMNIVOICE_NUM_STEP=32
export OMNIVOICE_MAX_CONCURRENT=4

omnivoice-server

Configuration

Option	Env Var	Default	Description
`--host`	`OMNIVOICE_HOST`	`127.0.0.1`	Bind host
`--port`	`OMNIVOICE_PORT`	`8880`	Bind port
`--device`	`OMNIVOICE_DEVICE`	`cpu`	Device: cpu, cuda (MPS broken)
`--num-step`	`OMNIVOICE_NUM_STEP`	`32`	Inference steps (1-64, higher=better quality)
`--max-concurrent`	`OMNIVOICE_MAX_CONCURRENT`	`2`	Max concurrent requests
`--api-key`	`OMNIVOICE_API_KEY`	`""`	Bearer token (empty = no auth)
`--model-id`	`OMNIVOICE_MODEL_ID`	`k2-fsa/OmniVoice`	HuggingFace repo or local path
`--profile-dir`	`OMNIVOICE_PROFILE_DIR`	`~/.omnivoice/profiles`	Voice profiles directory
`--log-level`	`OMNIVOICE_LOG_LEVEL`	`info`	Logging level
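
The table follows a regular naming pattern: drop the leading dashes, replace the remaining dashes with underscores, uppercase, and add the `OMNIVOICE_` prefix. A sketch of that mapping, useful when launching the server from a script; `option_to_env` and `server_env` are hypothetical helpers, and the pattern is inferred from the rows above rather than from the server's source.

```python
import os


def option_to_env(option: str) -> str:
    """Map a CLI flag such as '--num-step' to 'OMNIVOICE_NUM_STEP'."""
    name = option.lstrip("-")
    if not name:
        raise ValueError("empty option name")
    return "OMNIVOICE_" + name.replace("-", "_").upper()


def server_env(**options):
    """Return a copy of os.environ with the given options applied.

    Keyword names use underscores, e.g. server_env(num_step=32).
    """
    env = dict(os.environ)
    for key, value in options.items():
        env[option_to_env(key.replace("_", "-"))] = str(value)
    return env


print(option_to_env("--max-concurrent"))
```

The returned mapping can be handed to `subprocess.run(["omnivoice-server"], env=server_env(device="cuda"))`.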

API Reference

Endpoints

`POST /v1/audio/speech`

Generate speech from text (OpenAI-compatible).

Request body:

{
  "model": "omnivoice",
  "input": "Text to synthesize",
  "voice": "auto",
  "response_format": "wav",
  "speed": 1.0,
  "stream": false,
  "num_step": 32
}

Response: Audio file (WAV or PCM)
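
Validating the request body on the client side catches out-of-range values before a slow synthesis call is made. The sketch below uses the ranges stated elsewhere in this README (speed 0.25 to 4.0, num_step 1 to 64). The `"pcm"` format identifier is an assumption, since only `"wav"` appears in the examples, and `speech_payload` is a hypothetical helper.

```python
def speech_payload(text, voice="auto", response_format="wav",
                   speed=1.0, stream=False, num_step=32):
    """Build a request body for POST /v1/audio/speech with basic checks."""
    if not text:
        raise ValueError("input text must not be empty")
    # Assumption: raw PCM is requested with the identifier "pcm".
    if response_format not in ("wav", "pcm"):
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if not 1 <= num_step <= 64:
        raise ValueError("num_step must be between 1 and 64")
    return {
        "model": "omnivoice",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
        "num_step": num_step,
    }


payload = speech_payload("Text to synthesize", speed=1.25)
print(payload)
```

The returned dictionary can be passed as the `json=` argument of `httpx.post`.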

`POST /v1/audio/speech/clone`

One-shot voice cloning (multipart form).

Form fields:

text (required): Text to synthesize
ref_audio (required): Reference audio file
ref_text (optional): Reference transcript
speed (optional): Playback speed (default: 1.0)
num_step (optional): Inference steps

Response: Audio file (WAV)

`GET /v1/voices`

List available voices and profiles.

Response:

{
  "voices": [
    {"id": "auto", "type": "auto", "description": "..."},
    {"id": "design:<attributes>", "type": "design", "description": "..."},
    {"id": "clone:my_voice", "type": "clone", "profile_id": "my_voice"}
  ],
  "design_attributes": {...},
  "total": 3
}
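
A client will often want only the saved clone profiles from this listing. A minimal sketch, assuming the response has exactly the shape shown above; `clone_voice_ids` is a hypothetical helper and the sample data is illustrative, not real server output.

```python
def clone_voice_ids(voices_response):
    """Return the voice ids of all clone-type entries in a /v1/voices response."""
    return [
        voice["id"]
        for voice in voices_response.get("voices", [])
        if voice.get("type") == "clone" and "id" in voice
    ]


# Illustrative sample mirroring the documented response shape.
sample = {
    "voices": [
        {"id": "auto", "type": "auto", "description": "..."},
        {"id": "clone:my_voice", "type": "clone", "profile_id": "my_voice"},
    ],
    "total": 2,
}

print(clone_voice_ids(sample))
```

Each returned id can be used directly as the `voice` field of a speech request.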

`POST /v1/voices/profiles`

Create a voice cloning profile.

Form fields:

profile_id (required): Unique identifier (alphanumeric, dashes, underscores)
ref_audio (required): Reference audio file
ref_text (optional): Reference transcript
overwrite (optional): Overwrite existing profile (default: false)

Response:

{
  "profile_id": "my_voice",
  "created_at": "2026-04-04T12:00:00Z",
  "ref_text": "Reference text"
}
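
The `profile_id` rule (alphanumeric, dashes, underscores) can be checked before uploading reference audio. This sketch assumes ASCII characters only and no length limit, neither of which the README specifies; `is_valid_profile_id` is a hypothetical helper.

```python
import re

# Assumption: ASCII letters and digits, plus "-" and "_", at least one character.
_PROFILE_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_profile_id(profile_id: str) -> bool:
    """True if the id uses only letters, digits, dashes, and underscores."""
    return _PROFILE_ID.fullmatch(profile_id) is not None


for candidate in ("my_voice", "narrator-2", "bad id!", ""):
    print(repr(candidate), is_valid_profile_id(candidate))
```

The server remains the authority on what it accepts; this check only avoids obviously malformed requests.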

`GET /v1/voices/profiles/{profile_id}`

Get profile details.

`PATCH /v1/voices/profiles/{profile_id}`

Update profile (ref_audio and/or ref_text).

`DELETE /v1/voices/profiles/{profile_id}`

Delete a profile.

`GET /v1/models`

List available models (OpenAI-compatible).

`GET /health`

Health check endpoint.
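
Because the model is downloaded and loaded on first run, scripts that start the server usually need to wait before sending requests. The sketch below polls `/health` and treats an HTTP 200 as "ready". That readiness signal is an assumption, since the response format is not documented here, and both function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://127.0.0.1:8880", timeout=2.0):
    """True if GET /health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(probe=is_healthy, attempts=30, delay=2.0,
                       sleep=time.sleep):
    """Call `probe` until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Calling `wait_until_healthy()` after launching the server returns the attempt on which the server answered, or `None` if it never did. The `probe` and `sleep` parameters exist so the loop can be tested without a live server.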

`GET /metrics`

Prometheus-style metrics.

Examples

See the examples/ directory:

python_client.py - Comprehensive Python client examples
streaming_player.py - Real-time streaming audio player
curl_examples.sh - cURL command examples

Run examples:

# Python client
cd examples
python python_client.py

# Streaming player (requires pyaudio)
pip install pyaudio
python streaming_player.py "Hello, this is streaming audio!"

# cURL examples
chmod +x curl_examples.sh
./curl_examples.sh

Docker Deployment

Quick Start with Docker Compose

# Start the server
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the server
docker-compose down

The server will be available at http://localhost:8880. Voice profiles are persisted in the ./profiles directory.

Build and Run Manually

# Build the image
docker build -t omnivoice-server .

# Run the container
docker run -d \
  -p 8880:8880 \
  -v "$(pwd)/profiles:/app/profiles" \
  -e OMNIVOICE_API_KEY=your-secret-key \
  --name omnivoice \
  omnivoice-server

# View logs
docker logs -f omnivoice

Configuration

Set environment variables in docker-compose.yml or pass them with -e:

OMNIVOICE_HOST=0.0.0.0 - Bind host (must be 0.0.0.0 in Docker)
OMNIVOICE_PORT=8880 - Server port
OMNIVOICE_DEVICE=cpu - Device (cpu, cuda)
OMNIVOICE_NUM_STEP=32 - Inference steps
OMNIVOICE_API_KEY=secret - Optional authentication

For CUDA GPU support, see comments in docker-compose.yml.

Development

Setup

# Clone repository
git clone https://github.com/maemreyo/omnivoice-server.git
cd omnivoice-server

# Install with dev dependencies
pip install -e ".[dev]"

Run Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=omnivoice_server --cov-report=term-missing

# Run specific test
pytest tests/test_streaming.py -v

Code Quality

# Lint
ruff check omnivoice_server/ tests/

# Format
ruff format omnivoice_server/ tests/

# Type check
mypy omnivoice_server/

CI/CD

GitHub Actions workflow runs on every push:

Linting (ruff)
Type checking (mypy)
Tests (pytest)
Python 3.10, 3.11, 3.12

Hardware Requirements

CPU: 4+ cores recommended
RAM: 8GB minimum, 16GB recommended
GPU:
- ✅ NVIDIA GPU with CUDA - Recommended for production (20-25x faster than CPU)
- ❌ Apple Silicon (MPS) - Currently broken due to PyTorch bugs, do not use
- ✅ CPU - Works but slow (5x slower than real-time)
Storage: 3GB for model cache

Device Comparison

Device	Audio Quality	Speed (RTF)	Status
CPU	✅ Excellent	4.92 (slow)	Use for dev
MPS (Apple Silicon)	❌ Broken	N/A	Do not use
CUDA (NVIDIA GPU)	✅ Excellent	~0.2 (fast)	Use for prod

Note: Default device is now cpu due to MPS issues. See docs/verification/MPS_ISSUE.md for technical details.

Performance

Verified benchmark results (CPU, num_step=32):

Metric	Value
Latency	10.2 seconds per voice
RTF (Real-Time Factor)	4.92
Memory	Stable, no leaks

Expected performance on different hardware:

Hardware	num_step	Latency (short text)	RTF
CPU (Intel i7)	32	~10s	4.92
GPU (RTX 3090)	32	~0.5s	~0.2
Apple M1 Max (MPS)	32	❌ Broken audio	N/A

Streaming mode reduces perceived latency by sending audio as soon as the first sentence is ready.

Troubleshooting

Model Download Issues

The model is downloaded from HuggingFace on first run. If you encounter issues:

# Pre-download the model
python -c "from omnivoice import OmniVoice; OmniVoice.from_pretrained('k2-fsa/OmniVoice')"

# Or use a local model
omnivoice-server --model-id /path/to/local/model

CUDA Out of Memory

Reduce concurrent requests or use CPU:

omnivoice-server --max-concurrent 1 --device cpu

Audio Quality Issues

Increase inference steps above the default of 32 (up to 64) for better quality at the cost of speed:

omnivoice-server --num-step 64

Documentation

Comprehensive technical documentation is available in the docs/ directory:

Document	Description
verification/VERIFICATION_RESULTS.md	⭐ Verification results and benchmark data
verification/MPS_ISSUE.md	Technical analysis of Apple Silicon MPS bug
system/ecosystem.md	System context, hardware requirements, deployment
system/specification.md	Complete system specification
architecture/overview.md	Architecture diagrams and component maps
design/dataflow.md	Data flow and API design details

License

MIT

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes with tests
Run code quality checks
Submit a pull request

Acknowledgments

Built on top of OmniVoice by k2-fsa.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Release history

0.2.4

May 12, 2026

0.2.3

Apr 25, 2026

0.2.2

Apr 20, 2026

0.2.1

Apr 18, 2026

0.2.0

Apr 17, 2026

0.1.2

Apr 17, 2026

0.1.1

Apr 16, 2026

This version

0.1.0

Apr 5, 2026

Download files

Source Distribution

omnivoice_server-0.1.0.tar.gz (398.9 kB view details)

Uploaded Apr 5, 2026 Source

Built Distribution

omnivoice_server-0.1.0-py3-none-any.whl (28.3 kB view details)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file omnivoice_server-0.1.0.tar.gz.

File metadata

Download URL: omnivoice_server-0.1.0.tar.gz
Upload date: Apr 5, 2026
Size: 398.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnivoice_server-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`21c0d69dbc454af214236b21a0212b3836ffe659fa8618ca99d7f495c381d3a6`
MD5	`9d3c4af60d5546ae2e777caacfcd45f0`
BLAKE2b-256	`b18302034315822c2a51108c216beb2231771769cbe12e7bfb6f83d32635e1c9`

Provenance

The following attestation bundles were made for omnivoice_server-0.1.0.tar.gz:

Publisher: publish.yml on maemreyo/omnivoice-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omnivoice_server-0.1.0.tar.gz
- Subject digest: 21c0d69dbc454af214236b21a0212b3836ffe659fa8618ca99d7f495c381d3a6
- Sigstore transparency entry: 1237864999
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: maemreyo/omnivoice-server@1ad79fbcff1da98454c763f952ff4d566c784bdd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/maemreyo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ad79fbcff1da98454c763f952ff4d566c784bdd
- Trigger Event: release

File details

Details for the file omnivoice_server-0.1.0-py3-none-any.whl.

File metadata

Download URL: omnivoice_server-0.1.0-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 28.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnivoice_server-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba70e8e8a90a03dc896875f800c0d6f2f113b7c3fcd6f66b54be093220600287`
MD5	`47d3d878351bf1e34bb612dd1abbfab6`
BLAKE2b-256	`766a3c694d25c91b45ef96c9b2d27568187ae5dc3f490eef4659620ccdb054cd`

Provenance

The following attestation bundles were made for omnivoice_server-0.1.0-py3-none-any.whl:

Publisher: publish.yml on maemreyo/omnivoice-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omnivoice_server-0.1.0-py3-none-any.whl
- Subject digest: ba70e8e8a90a03dc896875f800c0d6f2f113b7c3fcd6f66b54be093220600287
- Sigstore transparency entry: 1237865005
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: maemreyo/omnivoice-server@1ad79fbcff1da98454c763f952ff4d566c784bdd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/maemreyo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1ad79fbcff1da98454c763f952ff4d566c784bdd
- Trigger Event: release
