# Chatterbox TTS API

A Flask-based REST API for ChatterboxTTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities.
## Features

- 🚀 **OpenAI-Compatible API** - Drop-in replacement for OpenAI's TTS API
- 🎭 **Voice Cloning** - Use your own voice samples for personalized speech
- 📝 **Smart Text Processing** - Automatic chunking for long texts
- 🐳 **Docker Ready** - Full containerization support
- ⚙️ **Configurable** - Extensive environment variable configuration
- 🎛️ **Parameter Control** - Real-time adjustment of speech characteristics
## Quick Start

### 1. Local Installation

```bash
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Set up the environment (Python 3.11)
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and customize environment variables
cp .env.example .env

# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3

# Start the API
python api.py
```
### 2. Docker (Recommended)

```bash
# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
cp .env.example .env  # Customize as needed
docker compose up -d

# If you have an NVIDIA GPU with CUDA, this variant may work better
docker compose -f docker-compose.gpu.yml up -d --build

# Watch the logs as it initializes (the first TTS request takes the longest)
docker logs chatterbox-tts-api -f

# Test the API
curl -X POST http://localhost:5123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from ChatterboxTTS!"}' \
  --output test.wav
```
## API Usage

### Basic Text-to-Speech

```bash
curl -X POST http://localhost:5123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output speech.wav
```
### With Custom Parameters

```bash
curl -X POST http://localhost:5123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Dramatic speech!",
    "exaggeration": 1.2,
    "cfg_weight": 0.3,
    "temperature": 0.9
  }' \
  --output dramatic.wav
```
### Python Example

```python
import requests

response = requests.post(
    "http://localhost:5123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8,  # More expressive
    },
)
response.raise_for_status()  # Surface HTTP errors instead of writing them to the file

with open("output.wav", "wb") as f:
    f.write(response.content)
```
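A slightly more defensive client can validate the documented parameter ranges before sending anything over the wire. This is a stdlib-only sketch, not part of the project: the helper names are illustrative, the range limits come from this README's parameter descriptions, and the default port is assumed.

```python
import json
import urllib.request

API_URL = "http://localhost:5123/v1/audio/speech"  # default port from .env

def build_payload(text, exaggeration=None, cfg_weight=None, temperature=None):
    """Build a request body, validating the documented parameter ranges."""
    payload = {"input": text}
    for name, value, lo, hi in (
        ("exaggeration", exaggeration, 0.25, 2.0),
        ("cfg_weight", cfg_weight, 0.0, 1.0),
        ("temperature", temperature, 0.05, 5.0),
    ):
        if value is not None:
            if not lo <= value <= hi:
                raise ValueError(f"{name} must be in [{lo}, {hi}], got {value}")
            payload[name] = value
    return payload

def synthesize(text, out_path="output.wav", **params):
    """POST the payload and write the returned audio to disk."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, **params)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

# With the server running:
#   synthesize("Hello world!", "hello.wav", exaggeration=0.8)
```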
## Configuration

Key environment variables (see `.env.example` for the full list):

| Variable | Default | Description |
|---|---|---|
| `PORT` | `5123` | API server port |
| `EXAGGERATION` | `0.5` | Emotion intensity (0.25-2.0) |
| `CFG_WEIGHT` | `0.5` | Pace control (0.0-1.0) |
| `TEMPERATURE` | `0.8` | Sampling randomness (0.05-5.0) |
| `VOICE_SAMPLE_PATH` | `./voice-sample.mp3` | Voice sample for cloning |
| `DEVICE` | `auto` | Device (`auto`/`cuda`/`mps`/`cpu`) |
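Put together, a minimal `.env` that simply pins the documented defaults might look like this (values taken from the table above):

```bash
PORT=5123
EXAGGERATION=0.5
CFG_WEIGHT=0.5
TEMPERATURE=0.8
VOICE_SAMPLE_PATH=./voice-sample.mp3
DEVICE=auto
```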
## Voice Cloning

Replace the default voice sample:

```bash
# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3

# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env
```
For best results:
- Use 10-30 seconds of clear speech
- Avoid background noise
- Prefer WAV or high-quality MP3
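The duration guideline is easy to check for WAV samples with the standard library alone (MP3s would need an external decoder, which is not shown here; the helper names are illustrative):

```python
import wave

def sample_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def within_guideline(path, lo=10.0, hi=30.0):
    """True if the sample falls in the recommended 10-30 second window."""
    return lo <= sample_duration_seconds(path) <= hi
```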
## Docker Deployment

### Development

```bash
docker compose up
```

### Production

```bash
# Create the production environment
cp .env.example .env
nano .env  # Set FLASK_DEBUG=false, etc.

# Deploy
docker compose -f docker-compose.yml up -d
```

### With GPU Support

```bash
# Uncomment the GPU section in docker-compose.yml
# Ensure the NVIDIA Container Toolkit is installed
docker compose up -d
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/audio/speech` | POST | Generate speech from text |
| `/health` | GET | Health check and status |
| `/config` | GET | Current configuration |
| `/v1/models` | GET | Available models (OpenAI compatibility) |
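The two GET endpoints are convenient for scripted smoke checks. A stdlib-only sketch (the helper names are illustrative and the default port is assumed):

```python
import json
import urllib.request

def endpoint(base, path):
    """Join the API base URL with an endpoint path, tolerating stray slashes."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def fetch_json(url):
    """GET a JSON endpoint and decode the body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

# With the server running:
#   print(fetch_json(endpoint("http://localhost:5123", "/health")))
#   print(fetch_json(endpoint("http://localhost:5123", "/config")))
```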
## Parameters Reference

### Speech Generation Parameters

**Exaggeration (0.25-2.0)**

- `0.3-0.4`: Professional, neutral
- `0.5`: Default, balanced
- `0.7-0.8`: More expressive
- `1.0+`: Very dramatic

**CFG Weight (0.0-1.0)**

- `0.2-0.3`: Faster speech
- `0.5`: Default pace
- `0.7-0.8`: Slower, deliberate

**Temperature (0.05-5.0)**

- `0.4-0.6`: More consistent
- `0.8`: Default balance
- `1.0+`: More creative/random
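One way to get a feel for these settings is to render the same sentence under a few presets and compare the results. The preset names below are illustrative, not part of the API; the JSON fields match the request parameters documented above.

```python
import json
import urllib.request

API_URL = "http://localhost:5123/v1/audio/speech"

# Presets drawn from the ranges above (names are illustrative only)
PRESETS = {
    "neutral":  {"exaggeration": 0.35, "cfg_weight": 0.5, "temperature": 0.5},
    "default":  {"exaggeration": 0.5,  "cfg_weight": 0.5, "temperature": 0.8},
    "dramatic": {"exaggeration": 1.0,  "cfg_weight": 0.3, "temperature": 1.0},
}

def render(text, preset):
    """Synthesize text with one of the presets and return raw WAV bytes."""
    body = json.dumps({"input": text, **PRESETS[preset]}).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# With the server running:
#   for name in PRESETS:
#       with open(f"{name}.wav", "wb") as f:
#           f.write(render("Same sentence, three moods.", name))
```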
## Testing

Run the test suite:

```bash
python test_api.py
```
## Performance
- CPU: Works but slower, reduce chunk size for better memory usage
- GPU: Recommended for production, significantly faster
- Memory: 4GB minimum, 8GB+ recommended
- Concurrency: Single request processing for stability
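Because the server processes one request at a time, a client that fans work out across threads should still serialize its TTS calls. A small client-side sketch (the wrapper name is illustrative):

```python
import threading

_tts_lock = threading.Lock()  # one in-flight TTS request at a time

def serialized(send_fn):
    """Wrap a request function so concurrent callers queue up."""
    def wrapper(*args, **kwargs):
        with _tts_lock:
            return send_fn(*args, **kwargs)
    return wrapper
```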
## Troubleshooting

### Common Issues

**CUDA/CPU compatibility error**

```
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False
```

This happens because the chatterbox-tts models require PyTorch with CUDA support, even when running on CPU. Solutions:

```bash
# Option 1: Use the default setup (should now include CUDA-enabled PyTorch)
docker compose up -d

# Option 2: Use the explicit CUDA setup
docker compose -f docker-compose.gpu.yml up -d

# Option 3: Use the CPU-only setup (may have compatibility issues)
docker compose -f docker-compose.cpu.yml up -d

# Option 4: Clear the model cache and retry with the CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose up -d --build
```

For local development, install PyTorch with CUDA support:

```bash
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install chatterbox-tts
```
**Port conflicts**

```bash
# Change the port
echo "PORT=5002" >> .env
```
**GPU not detected**

```bash
# Force CPU mode
echo "DEVICE=cpu" >> .env
```
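For reference, `DEVICE=auto` implies a fallback order of cuda, then mps, then cpu. A sketch of that selection logic (the actual implementation in api.py may differ):

```python
def pick_device(requested="auto", cuda_ok=False, mps_ok=False):
    """Resolve the DEVICE setting: explicit values win, auto falls back
    cuda -> mps -> cpu based on what the host reports available."""
    if requested != "auto":
        return requested
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"
```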
**Out of memory**

```bash
# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env
```
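`MAX_CHUNK_LENGTH` bounds how much text is synthesized per pass, which in turn bounds peak memory. The idea behind sentence-aware chunking can be sketched as follows (an illustration of the concept, not the server's actual splitter):

```python
import re

def chunk_text(text, max_len=200):
    """Split text into chunks of at most max_len characters,
    preferring to break at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_len:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```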
**Model download fails**

```bash
# Clear the cache and retry
rm -rf models/
python api.py
```
## Development

### Local Development

```bash
# Install in development mode
pip install -e .

# Enable debug mode
export FLASK_DEBUG=true
python api.py
```

### Testing

```bash
# Run the API tests
python test_api.py

# Test a specific endpoint
curl http://localhost:5123/health
```
## License
This API wrapper is provided under the same license terms as the underlying ChatterboxTTS model. See the ChatterboxTTS repository for details.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Related Projects
- ChatterboxTTS - The core TTS model
- Resemble AI - Production TTS services
## Support
- 📖 Documentation: See API_README.md and DOCKER_README.md
- 🐛 Issues: Report bugs and feature requests via GitHub issues
- 💬 Discord: Join the ChatterboxTTS Discord or the Discord for this project
Made with ♥️ for the open source community
## File details

Details for the file `chatterbox_api-1.0.0.tar.gz`.

**File metadata**

- Download URL: chatterbox_api-1.0.0.tar.gz
- Upload date:
- Size: 16.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3

**File hashes**

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ccc6812cbe0922e6e9ea6b34bb9931e55d55283f2236e301d16a26598610b228` |
| MD5 | `b45db00d36069198c1933215b08e0ff0` |
| BLAKE2b-256 | `054b0dd5cad6e2723fa5157680041a3d159af5cc5cb02d9f6910c72ba6a33b12` |
## File details

Details for the file `chatterbox_api-1.0.0-py3-none-any.whl`.

**File metadata**

- Download URL: chatterbox_api-1.0.0-py3-none-any.whl
- Upload date:
- Size: 16.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3

**File hashes**

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3302cd01423a0eafeab08681a62f69ca9d4a7616b11ccaa566687d05eaa2f7d5` |
| MD5 | `fe9924904e2eeb8418822086e064f789` |
| BLAKE2b-256 | `6fb26ff71d7ea6d8df121c3ec4442de2df071d5a73be088b6d4f5876d418afe1` |