VoxCPM TTS model with Apple Neural Engine backend server

VoxCPMANE

VoxCPM TTS model with an Apple Neural Engine (ANE) backend server. The Core ML models are available in the Hugging Face repository.

🎤 Voice Cloning: Support for custom voice prompts and cached voices
📡 Streaming Support: Real-time audio streaming for low latency
🎧 Server-side Playback: Direct audio playback on the server
🌐 Web Interface: Interactive playground for testing

Voice Cloning

https://github.com/user-attachments/assets/02ffa400-b2fd-422e-a3ad-a0ea232a55aa

Included Voices (listen to the samples)

https://github.com/user-attachments/assets/28880ed2-2e21-4eb4-b0ce-18a100403e87

Installation

Prerequisites

macOS with Apple Silicon for ANE acceleration
Python 3.9-3.12
uv package manager (recommended)

Install with `pip` or `uv`

uv pip install voxcpmane

pip install voxcpmane

After installation, start the server with the voxcpmane-server command (see Command Line Options). It listens on http://localhost:8000 by default, and the web playground is served at the root URL.

Configuration

Command Line Options

uv run voxcpmane-server --help

--host: Host to bind the server to (default: 0.0.0.0)
--port: Port to run the server on (default: 8000)

API Reference

The server provides OpenAI-compatible endpoints for text-to-speech generation.

Base URL

http://localhost:8000

Request Model

All TTS endpoints accept the following request parameters:

{
  "model": "voxcpm-0.5b",           // Model identifier (fixed)
  "input": "Text to synthesize",     // Required: Text to generate speech for
  "voice": "voice_name",            // Optional: Use cached voice
  "prompt_wav_path": "/path/to/audio.wav",  // Optional: Path to prompt audio file
  "prompt_text": "Transcription of prompt audio",  // Optional: Text matching prompt audio
  "response_format": "wav",         // Optional: Audio format (wav, mp3, flac, opus, aac, pcm)
  "max_length": 2048,               // Optional: Max generated sequence length (1-2048)
  "cfg_value": 2.0,                 // Optional: Classifier-free guidance (0.0-10.0)
  "inference_timesteps": 10         // Optional: Diffusion steps (1-100)
}
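As a rough illustration of the request model, the sketch below builds a request body in Python and rejects tuning values outside the ranges listed in the comments above. The helper name `build_tts_payload` and the `LIMITS` table are hypothetical, not part of the package, and the server may validate differently.

```python
import json

# Ranges copied from the comments in the request model above.
LIMITS = {
    "max_length": (1, 2048),
    "cfg_value": (0.0, 10.0),
    "inference_timesteps": (1, 100),
}


def build_tts_payload(text, **options):
    """Return a request body dict, rejecting out-of-range tuning values.

    Hypothetical client-side helper; not part of voxcpmane itself.
    """
    if not text:
        raise ValueError("'input' is required and must be non-empty")
    payload = {"model": "voxcpm-0.5b", "input": text}
    for name, value in options.items():
        if name in LIMITS:
            low, high = LIMITS[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
        payload[name] = value
    return payload


body = build_tts_payload("Hello from VoxCPM.", response_format="wav", cfg_value=2.0)
print(json.dumps(body, indent=2))
```

Checking ranges on the client only saves a round trip; the server remains the authority on what it accepts.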

Voice Selection

You have two options for voice control:

1. Cached Voices: Use pre-computed voice embeddings
   - Set the voice parameter to a cached voice name
   - List the available voices via the /voices endpoint
   - prompt_wav_path and prompt_text are ignored when voice is set
2. Custom Voice Cloning: Provide your own audio prompt
   - Set prompt_wav_path to the path of a local WAV file
   - Set prompt_text to the exact transcription of that audio
   - If prompt_wav_path is empty, speech is generated with a random voice
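The precedence described above can be sketched as a small Python helper that returns only the voice-related fields to merge into a request body. The function name `voice_fields` is hypothetical. Requiring `prompt_text` whenever `prompt_wav_path` is given is an assumption made here for safety; the server lists both fields as optional.

```python
def voice_fields(voice=None, prompt_wav_path=None, prompt_text=None):
    """Pick the voice-related request fields (hypothetical helper).

    A cached voice wins: the prompt fields are dropped because the
    server ignores them when "voice" is set.
    """
    if voice:
        return {"voice": voice}
    if prompt_wav_path:
        # Assumption: insist on a transcript so the prompt audio and
        # text stay matched; the server itself may be more lenient.
        if not prompt_text:
            raise ValueError("prompt_text should transcribe prompt_wav_path")
        return {"prompt_wav_path": prompt_wav_path, "prompt_text": prompt_text}
    # Neither option given: the server generates with a random voice.
    return {}


print(voice_fields(voice="voice1", prompt_wav_path="/path/to/audio.wav"))
print(voice_fields(prompt_wav_path="/path/to/audio.wav", prompt_text="Hello there."))
```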

Parameters

max_length: Controls maximum generated audio length (each unit ≈ 0.04 seconds)
cfg_value: Classifier-free guidance strength.
inference_timesteps: Number of diffusion steps, defaults to 10.
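To make the max_length unit concrete, here is a small sketch that converts between units and seconds using the approximate 0.04 s figure above. The function names are hypothetical, and the results are estimates, since the per-unit duration is only approximate.

```python
import math

SECONDS_PER_UNIT = 0.04  # approximate value from the parameter description
MAX_UNITS = 2048         # upper bound of max_length in the request model


def max_audio_seconds(max_length):
    """Approximate upper bound on audio duration for a given max_length."""
    return max_length * SECONDS_PER_UNIT


def units_for_seconds(seconds):
    """Smallest max_length that covers `seconds`, capped at MAX_UNITS."""
    return min(MAX_UNITS, math.ceil(seconds / SECONDS_PER_UNIT))


print(f"max_length=2048 allows roughly {max_audio_seconds(2048):.2f} s of audio")
print(f"10 s of audio needs max_length of about {units_for_seconds(10)}")
```

Under this estimate the maximum setting of 2048 corresponds to a little under 82 seconds, so longer texts would need to be split across several requests.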

Endpoints

1. Generate Speech (File)

POST /v1/audio/speech

Generates a complete audio file and returns it for download.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "Hello, this is a test of the VoxCPM TTS system.",
    "response_format": "wav"
  }'

Response: Binary audio file with appropriate Content-Type header

Supported formats: wav, mp3, flac, opus, aac, pcm
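The same call can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl request above; the helper names `speech_request` and `save_speech` are hypothetical, and it assumes the server is reachable at the default base URL.

```python
import json
import urllib.request


def speech_request(text, base_url="http://localhost:8000", response_format="wav"):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    body = json.dumps(
        {"model": "voxcpm-0.5b", "input": text, "response_format": response_format}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With the server running, `save_speech("Hello, this is a test of the VoxCPM TTS system.", "hello.wav")` should write the generated audio to `hello.wav`.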

2. Stream Speech (Real-time)

POST /v1/audio/speech/stream

Streams audio chunks in real-time for low-latency playback.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/stream" \
  -H "Content-Type: application/json" \
  -H "Accept: application/octet-stream" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "This speech will be streamed in real-time.",
    "response_format": "pcm"
  }'

Response: Streaming binary audio data (16-bit PCM, 16 kHz)

Headers:

X-Sample-Rate: Sample rate of the audio (typically 16000)
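Raw PCM has no header, so a client usually wraps the streamed chunks in a container before saving them. The sketch below does this with Python's standard `wave` module. The helper names are hypothetical, and mono output is an assumption: the source states 16-bit samples and the sample rate, but not the channel count.

```python
import wave


def iter_response(response, chunk_size=4096):
    """Yield chunks from a file-like HTTP response until it is exhausted."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk


def write_pcm_stream(chunks, out_path, sample_rate=16000, channels=1):
    """Wrap raw 16-bit PCM chunks in a WAV file; return the bytes written.

    channels=1 is an assumption (the channel count is not documented).
    """
    total = 0
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            total += len(chunk)
    return total


# Example wiring, assuming `request` is a urllib.request.Request for
# /v1/audio/speech/stream with "response_format": "pcm":
#
#   with urllib.request.urlopen(request) as response:
#       rate = int(response.headers.get("X-Sample-Rate", "16000"))
#       write_pcm_stream(iter_response(response), "stream.wav", rate)
```

Reading the rate from the X-Sample-Rate header rather than hard-coding it keeps the client correct if the server ever reports a different value.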

3. Play on Server

POST /v1/audio/speech/playback

Generates speech and plays it directly on the server with progress indicators.

Request:

{
  "model": "voxcpm-0.5b",
  "input": "This will play on the server"
}

Response:

{
  "status": "success",
  "message": "Audio playback completed on server",
  "duration_seconds": 5.23
}

4. Cancel Generation

POST /v1/audio/speech/cancel

Cancels the currently running audio generation.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/cancel"

Response:

{
  "status": "success",
  "message": "Cancellation signal sent to Job 123"
}

5. List Available Voices

GET /voices

Returns a list of available cached voice names.

Request:

curl -X GET "http://localhost:8000/voices"

Response:

{
  "voices": ["voice1", "voice2", "voice3"],
  "count": 3,
  "cache_directory": "assets/caches"
}
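A client can use this listing to check a voice name before sending a synthesis request. The sketch below works on an already decoded response shaped like the example above; `resolve_voice` is a hypothetical helper, not part of the package.

```python
def resolve_voice(listing, requested):
    """Return `requested` if the /voices listing contains it, else raise."""
    voices = listing.get("voices", [])
    if requested in voices:
        return requested
    available = ", ".join(voices) or "none"
    raise ValueError(f"unknown voice {requested!r}; available: {available}")


# Sample data copied from the example response above.
listing = {
    "voices": ["voice1", "voice2", "voice3"],
    "count": 3,
    "cache_directory": "assets/caches",
}
print(resolve_voice(listing, "voice2"))
```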

6. Health Check

GET /health

Returns server status and current processing state.

Request:

curl -X GET "http://localhost:8000/health"

Response:

{
  "status": "healthy",
  "is_processing": true,
  "current_job_id": 123,
  "queue_pending": false,
  "model": "voxcpm-0.5b"
}
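Because the response exposes is_processing and queue_pending, a client can poll this endpoint to wait for the server to become idle before submitting more work. The following is a minimal sketch under that assumption; the function names, timeout, and polling interval are illustrative choices, not documented behavior.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:8000"):
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health") as response:
        return json.load(response)


def wait_until_idle(fetch=fetch_health, timeout=60.0, interval=0.5, sleep=time.sleep):
    """Poll until nothing is processing or queued, then return the last state.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch()
        if not state.get("is_processing") and not state.get("queue_pending"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("server still busy after waiting")
        sleep(interval)
```

With the server running, `wait_until_idle()` returns the health payload once no job is active.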

7. Web Playground

GET /

Interactive web interface for testing the TTS functionality.

Access at: http://localhost:{PORT}

Acknowledgments

VoxCPM - Original TTS model

Release history

- 0.0.5b4 (pre-release): Dec 16, 2025
- 0.0.5b3 (pre-release): Dec 15, 2025
- 0.0.5b2 (pre-release): Dec 14, 2025
- 0.0.5b1 (pre-release): Dec 14, 2025
- 0.0.4: Dec 8, 2025
- 0.0.3: Dec 2, 2025
- 0.0.2 (this version): Nov 10, 2025
- 0.0.1.post1: Nov 7, 2025
- 0.0.1: Nov 7, 2025

Download files

Source Distribution

voxcpmane-0.0.2.tar.gz (33.6 kB), uploaded Nov 10, 2025, Source

Built Distribution

voxcpmane-0.0.2-py3-none-any.whl (32.3 kB), uploaded Nov 10, 2025, Python 3

File details

Details for the file voxcpmane-0.0.2.tar.gz.

File metadata

Download URL: voxcpmane-0.0.2.tar.gz
Upload date: Nov 10, 2025
Size: 33.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.7

File hashes

Hashes for voxcpmane-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`38f7f0d31167214ce01567833eb23c93987b4fc897a16680835e46503952fb4c`
MD5	`e7ae8a9d6d9fe0b904383d04783bde2d`
BLAKE2b-256	`a33e43706524ce917e0b742f1f6f62dcd238e88bd5e2fb122103ae814c0aeba8`

File details

Details for the file voxcpmane-0.0.2-py3-none-any.whl.

File metadata

Download URL: voxcpmane-0.0.2-py3-none-any.whl
Upload date: Nov 10, 2025
Size: 32.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.7

File hashes

Hashes for voxcpmane-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`179bd99c6b8efa323bf81d449238cc60b467f2076752b7d60aee5b647a491b36`
MD5	`12211925503ff7756d34ec1bcdd7d1a2`
BLAKE2b-256	`957c10811a49934899949a3b72b423ec2818be67e33f3f5bb9ad1f1857c3727d`
