Skip to main content

VoxCPM TTS model with Apple Neural Engine backend server

Project description

VoxCPMANE

VoxCPM TTS model with Apple Neural Engine (ANE) backend server. CoreML models available in Huggingface repository.

  • 🎤 Voice Cloning: Support for custom voice prompts and cached voices
  • 📡 Streaming Support: Real-time audio streaming for low latency
  • 🎧 Server-side Playback: Direct audio playback on the server
  • 🌐 Web Interface: Interactive playground for testing

Voice Cloning

https://github.com/user-attachments/assets/02ffa400-b2fd-422e-a3ad-a0ea232a55aa

Included Voices Listen samples

https://github.com/user-attachments/assets/28880ed2-2e21-4eb4-b0ce-18a100403e87

Installation

Prerequisites

  • macOS with Apple Silicon for ANE acceleration
  • Python 3.9-3.12
  • uv package manager (recommended)

Quick Start

# Clone the repository
git clone https://github.com/0seba/VoxCPMANE.git
cd VoxCPMANE

# Install dependencies
uv sync

# Start the server
uv run voxcpmane-server -p ${PORT}

# Or run directly
uv run python -m voxcpmane.server

The server will start on http://localhost:8000 by default. You can access the web playground at the root URL.

Configuration

Command Line Options

uv run voxcpmane-server --help
  • --host: Host to bind the server to (default: 0.0.0.0)
  • --port: Port to run the server on (default: 8000)

API Reference

The server provides OpenAI-compatible endpoints for text-to-speech generation.

Base URL

http://localhost:8000

Request Model

All TTS endpoints accept the following request parameters:

{
  "model": "voxcpm-0.5b",           // Model identifier (fixed)
  "input": "Text to synthesize",     // Required: Text to generate speech for
  "voice": "voice_name",            // Optional: Use cached voice
  "prompt_wav_path": "/path/to/audio.wav",  // Optional: Path to prompt audio file
  "prompt_text": "Transcription of prompt audio",  // Optional: Text matching prompt audio
  "response_format": "wav",         // Optional: Audio format (wav, mp3, flac, opus, aac, pcm)
  "max_length": 2048,               // Optional: Max generated sequence length (1-2048)
  "cfg_value": 2.0,                 // Optional: Classifier-free guidance (0.0-10.0)
  "inference_timesteps": 10         // Optional: Diffusion steps (1-100)
}

Voice Selection

You have two options for voice control:

  1. Cached Voices: Use pre-computed voice embeddings

    • Set voice parameter to a cached voice name
    • Available voices can be listed via /voices endpoint
    • Ignores prompt_wav_path and prompt_text parameters
  2. Custom Voice Cloning: Provide your own audio prompt

    • Set prompt_wav_path to the path of local WAV file
    • Set prompt_text to the exact transcription of the audio
    • If prompt_wav_path is empty, generates with random voice

Parameters

  • max_length: Controls maximum generated audio length (each unit ≈ 0.04 seconds)
  • cfg_value: Classifier-free guidance strength.
  • inference_timesteps: Number of diffusion steps, defaults to 10.

Endpoints

1. Generate Speech (File)

POST /v1/audio/speech

Generates a complete audio file and returns it for download.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "Hello, this is a test of the VoxCPM TTS system.",
    "response_format": "wav"
  }'

Response: Binary audio file with appropriate Content-Type header

Supported formats: wav, mp3, flac, opus, aac, pcm

2. Stream Speech (Real-time)

POST /v1/audio/speech/stream

Streams audio chunks in real-time for low-latency playback.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/stream" \
  -H "Content-Type: application/json" \
  -H "Accept: application/octet-stream" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "This speech will be streamed in real-time.",
    "response_format": "pcm"
  }'

Response: Streaming binary audio data (16-bit PCM, 16kHz)

Headers:

  • X-Sample-Rate: Sample rate of the audio (typically 16000)

3. Play on Server

POST /v1/audio/speech/playback

Generates speech and plays it directly on the server with progress indicators.

Request:

{
  "model": "voxcpm-0.5b",
  "input": "This will play on the server"
}

Response:

{
  "status": "success",
  "message": "Audio playback completed on server",
  "duration_seconds": 5.23,
}

4. Cancel Generation

POST /v1/audio/speech/cancel

Cancels the currently running audio generation.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/cancel"

Response:

{
  "status": "success",
  "message": "Cancellation signal sent to Job 123"
}

5. List Available Voices

GET /voices

Returns a list of available cached voice names.

Request:

curl -X GET "http://localhost:8000/voices"

Response:

{
  "voices": ["voice1", "voice2", "voice3"],
  "count": 3,
  "cache_directory": "assets/caches"
}

6. Health Check

GET /health

Returns server status and current processing state.

Request:

curl -X GET "http://localhost:8000/health"

Response:

{
  "status": "healthy",
  "is_processing": true,
  "current_job_id": 123,
  "queue_pending": false,
  "model": "voxcpm-0.5b"
}

7. Web Playground

GET /

Interactive web interface for testing the TTS functionality.

Access at: http://localhost:{PORT}

Acknowledgments

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxcpmane-0.0.1.tar.gz (32.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voxcpmane-0.0.1-py3-none-any.whl (31.5 kB view details)

Uploaded Python 3

File details

Details for the file voxcpmane-0.0.1.tar.gz.

File metadata

  • Download URL: voxcpmane-0.0.1.tar.gz
  • Upload date:
  • Size: 32.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for voxcpmane-0.0.1.tar.gz
Algorithm Hash digest
SHA256 34bd63611be57d592c0f048401d0e1eccb6d637275a99aa775653604dd8e1b5c
MD5 890c14af7bab70d798500266aeb3e625
BLAKE2b-256 c2f9fdf4766a7cd888792bf0124518e31cbde76ecfb9af5d04b3a72d4764bf8a

See more details on using hashes here.

File details

Details for the file voxcpmane-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: voxcpmane-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 31.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for voxcpmane-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c9e30fc53994b98f22c78f7e53a123fe028569090b5aaacc42f5f90eb6891e11
MD5 755397101a14445b97995aa3f40e0c8b
BLAKE2b-256 6de311a2f5430171300e540b1ab55e938f20644c6d53ea08d16a05e38b8dd09e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page