VoxCPM TTS model with Apple Neural Engine backend server

VoxCPMANE

VoxCPM TTS model with an Apple Neural Engine (ANE) backend server. Core ML models are available in the Hugging Face repository.

  • 🎤 Voice Cloning: Support for custom voice prompts and cached voices
  • 📡 Streaming Support: Real-time audio streaming for low latency
  • 🎧 Server-side Playback: Direct audio playback on the server
  • 🌐 Web Interface: Interactive playground for testing

Voice Cloning

https://github.com/user-attachments/assets/02ffa400-b2fd-422e-a3ad-a0ea232a55aa

Included Voices (listen to samples)

https://github.com/user-attachments/assets/28880ed2-2e21-4eb4-b0ce-18a100403e87

Installation

Prerequisites

  • macOS with Apple Silicon for ANE acceleration
  • Python 3.9-3.12
  • uv package manager (recommended)

Install with uv or pip (use either command)

uv pip install voxcpmane
pip install voxcpmane

Start the server with the voxcpmane-server command (for example, uv run voxcpmane-server). It listens on http://localhost:8000 by default, and the web playground is served at the root URL.

Configuration

Command Line Options

uv run voxcpmane-server --help
  • --host: Host to bind the server to (default: 0.0.0.0)
  • --port: Port to run the server on (default: 8000)

API Reference

The server provides OpenAI-compatible endpoints for text-to-speech generation.

Base URL

http://localhost:8000

Request Model

All TTS endpoints accept the following request parameters:

{
  "model": "voxcpm-0.5b",           // Model identifier (fixed)
  "input": "Text to synthesize",     // Required: Text to generate speech for
  "voice": "voice_name",            // Optional: Use cached voice
  "prompt_wav_path": "/path/to/audio.wav",  // Optional: Path to prompt audio file
  "prompt_text": "Transcription of prompt audio",  // Optional: Text matching prompt audio
  "response_format": "wav",         // Optional: Audio format (wav, mp3, flac, opus, aac, pcm)
  "max_length": 2048,               // Optional: Max generated sequence length (1-2048)
  "cfg_value": 2.0,                 // Optional: Classifier-free guidance (0.0-10.0)
  "inference_timesteps": 10         // Optional: Diffusion steps (1-100)
}
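The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the package: `build_tts_payload` is a hypothetical helper, and it relies only on the field names, formats, and ranges shown in the request model. Note that the `//` annotations in the request model are explanatory; a real JSON body must not contain them.

```python
import json

# Formats and ranges copied from the request model above.
ALLOWED_FORMATS = {"wav", "mp3", "flac", "opus", "aac", "pcm"}


def build_tts_payload(text, voice=None, response_format="wav",
                      max_length=2048, cfg_value=2.0, inference_timesteps=10):
    """Hypothetical client-side helper: validate fields, return a JSON-ready dict."""
    if not text:
        raise ValueError("input is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 1 <= max_length <= 2048:
        raise ValueError("max_length must be between 1 and 2048")
    if not 0.0 <= cfg_value <= 10.0:
        raise ValueError("cfg_value must be between 0.0 and 10.0")
    if not 1 <= inference_timesteps <= 100:
        raise ValueError("inference_timesteps must be between 1 and 100")

    payload = {
        "model": "voxcpm-0.5b",
        "input": text,
        "response_format": response_format,
        "max_length": max_length,
        "cfg_value": cfg_value,
        "inference_timesteps": inference_timesteps,
    }
    if voice is not None:
        payload["voice"] = voice  # optional: name of a cached voice
    return payload


# Serialize to a valid JSON request body (no comments).
body = json.dumps(build_tts_payload("Text to synthesize"))
```

Validating locally only saves a round trip; the server remains the authority on what it accepts.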

Voice Selection

You have two options for voice control:

  1. Cached Voices: Use pre-computed voice embeddings

    • Set the voice parameter to a cached voice name
    • List the available voices via the /voices endpoint
    • When voice is set, prompt_wav_path and prompt_text are ignored
  2. Custom Voice Cloning: Provide your own audio prompt

    • Set prompt_wav_path to the path of a local WAV file
    • Set prompt_text to the exact transcription of that audio
    • If prompt_wav_path is empty, a random voice is used
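The precedence described above can be sketched as a small function. This is an illustration of the documented rules only; `resolve_voice_mode` is a hypothetical name, and the server's actual handling of edge cases (for example, an unknown voice name) is not covered here.

```python
def resolve_voice_mode(payload):
    """Return which documented voice option a request payload would select.

    "cached": voice is set, so prompt_wav_path and prompt_text are ignored
    "clone":  no voice, but a prompt audio path is given
    "random": neither is given
    """
    if payload.get("voice"):
        return "cached"
    if payload.get("prompt_wav_path"):
        return "clone"
    return "random"


# A cached voice wins even when a prompt path is also present.
mode = resolve_voice_mode({"voice": "voice1", "prompt_wav_path": "/path/to/audio.wav"})
```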

Parameters

  • max_length: Maximum generated sequence length, 1-2048 (each unit ≈ 0.04 seconds of audio, so 2048 units ≈ 82 seconds)
  • cfg_value: Classifier-free guidance strength, 0.0-10.0 (the request example above uses 2.0)
  • inference_timesteps: Number of diffusion steps, 1-100 (defaults to 10)
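To make the max_length arithmetic concrete, the sketch below converts between units and seconds using the approximate 0.04 seconds-per-unit figure given above. The helper names are hypothetical, and because the figure is approximate the results are estimates.

```python
import math

SECONDS_PER_UNIT = 0.04  # approximate figure from the max_length description
MAX_UNITS = 2048         # documented upper bound for max_length


def approx_max_seconds(max_length):
    """Estimate the longest audio a given max_length allows."""
    return max_length * SECONDS_PER_UNIT


def units_for_seconds(seconds):
    """Estimate the max_length needed for `seconds` of audio, clamped to 1-2048."""
    units = math.ceil(seconds / SECONDS_PER_UNIT)
    return min(MAX_UNITS, max(1, units))


longest = approx_max_seconds(MAX_UNITS)  # roughly 82 seconds
```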

Endpoints

1. Generate Speech (File)

POST /v1/audio/speech

Generates a complete audio file and returns it for download.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "Hello, this is a test of the VoxCPM TTS system.",
    "response_format": "wav"
  }'

Response: Binary audio file with appropriate Content-Type header

Supported formats: wav, mp3, flac, opus, aac, pcm
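The same request can be made from Python with only the standard library. This is a minimal sketch assuming the server is reachable at the default base URL; `build_speech_request` and `synthesize_to_file` are hypothetical helper names, not functions shipped with the package.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000", response_format="wav"):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    body = json.dumps({
        "model": "voxcpm-0.5b",
        "input": text,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())


# Usage, with the server running:
# synthesize_to_file("Hello, this is a test of the VoxCPM TTS system.", "hello.wav")
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.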

2. Stream Speech (Real-time)

POST /v1/audio/speech/stream

Streams audio chunks in real-time for low-latency playback.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/stream" \
  -H "Content-Type: application/json" \
  -H "Accept: application/octet-stream" \
  -d '{
    "model": "voxcpm-0.5b",
    "input": "This speech will be streamed in real-time.",
    "response_format": "pcm"
  }'

Response: Streaming binary audio data (16-bit PCM, 16 kHz)

Headers:

  • X-Sample-Rate: Sample rate of the audio (typically 16000)
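Raw PCM has no header, so most players cannot open a saved stream directly. The sketch below wraps received chunks in a WAV container with the standard `wave` module. It assumes mono, little-endian 16-bit samples; the channel count is not stated above, so treat it as an assumption, and read the sample rate from the X-Sample-Rate header rather than hard-coding it where possible. `pcm_chunks_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=16000, channels=1):
    """Wrap raw 16-bit PCM chunks in a WAV container and return the bytes.

    channels=1 is an assumption; sample_rate should come from X-Sample-Rate.
    """
    # Join first: a network chunk boundary may fall in the middle of a sample.
    pcm = b"".join(chunks)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Usage: collect the chunks read from the streaming response, then
# open("speech.wav", "wb").write(pcm_chunks_to_wav(chunks, sample_rate=16000))
```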

3. Play on Server

POST /v1/audio/speech/playback

Generates speech and plays it directly on the server with progress indicators.

Request:

{
  "model": "voxcpm-0.5b",
  "input": "This will play on the server"
}

Response:

{
  "status": "success",
  "message": "Audio playback completed on server",
  "duration_seconds": 5.23
}

4. Cancel Generation

POST /v1/audio/speech/cancel

Cancels the currently running audio generation.

Request:

curl -X POST "http://localhost:8000/v1/audio/speech/cancel"

Response:

{
  "status": "success",
  "message": "Cancellation signal sent to Job 123"
}

5. List Available Voices

GET /voices

Returns a list of available cached voice names.

Request:

curl -X GET "http://localhost:8000/voices"

Response:

{
  "voices": ["voice1", "voice2", "voice3"],
  "count": 3,
  "cache_directory": "assets/caches"
}
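A client can use this response to choose a voice before synthesizing. The following sketch works on the already-parsed JSON shown above; `pick_voice` is a hypothetical helper, and falling back to the first listed voice is an illustrative choice, not server behavior.

```python
import json


def pick_voice(voices_response, preferred=None):
    """Return `preferred` if the server lists it, else the first voice, else None."""
    voices = voices_response.get("voices", [])
    if preferred in voices:
        return preferred
    return voices[0] if voices else None


# Example using the response body shown above.
sample = json.loads(
    '{"voices": ["voice1", "voice2", "voice3"], "count": 3, "cache_directory": "assets/caches"}'
)
chosen = pick_voice(sample, preferred="voice2")
```

Returning None when no voices are cached lets the caller omit the voice field and fall back to the other voice options.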

6. Health Check

GET /health

Returns server status and current processing state.

Request:

curl -X GET "http://localhost:8000/health"

Response:

{
  "status": "healthy",
  "is_processing": true,
  "current_job_id": 123,
  "queue_pending": false,
  "model": "voxcpm-0.5b"
}
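The health response can be polled to wait for the server to become idle before submitting work. This is a sketch under assumptions: the meaning of is_processing and queue_pending is inferred from the field names, and `is_idle` and `wait_until_idle` are hypothetical helpers. The health fetcher is passed in as a callable so the logic can be tried without a running server.

```python
import time


def is_idle(health):
    """True if the parsed /health body reports healthy with no work running or queued."""
    return (
        health.get("status") == "healthy"
        and not health.get("is_processing")
        and not health.get("queue_pending")
    )


def wait_until_idle(fetch_health, timeout=30.0, interval=0.5):
    """Poll fetch_health() until the server is idle; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_idle(fetch_health()):
            return True
        time.sleep(interval)
    return False


# Usage with the standard library, assuming the default base URL:
# import json, urllib.request
# fetch = lambda: json.load(urllib.request.urlopen("http://localhost:8000/health"))
# wait_until_idle(fetch)
```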

7. Web Playground

GET /

Interactive web interface for testing the TTS functionality.

Access at: http://localhost:{PORT}

Acknowledgments
