Chatterbox MLX: Open Source TTS and Voice Conversion for MLX. Based on Chatterbox by Resemble AI
Project description
Chatterbox MLX - Apple Silicon Optimized TTS
An MLX-optimized fork of Resemble AI's Chatterbox TTS for Apple Silicon, delivering up to 2.4x faster inference.
Installation
```bash
pip install chatterbox-mlx
```
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.11+ (tested with 3.11.12 and 3.12.12)
- ~4GB disk space for model weights
Important: Python must be compiled with lzma support. If you're using pyenv:
```bash
# Install the xz library first (provides liblzma)
brew install xz

# Then install Python (or reinstall if already installed)
pyenv install 3.11.12  # or your preferred version
```
If you see `ModuleNotFoundError: No module named '_lzma'`, install xz and reinstall Python.
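You can confirm your interpreter was built with lzma support before installing anything; this quick check uses only the standard library:

```python
# Verify the running Python was compiled with lzma support.
# A failing build raises ModuleNotFoundError: No module named '_lzma'.
import lzma

data = lzma.compress(b"hello")
assert lzma.decompress(data) == b"hello"
print("lzma support OK")
```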
CLI Usage
Generate speech directly from the terminal:
```bash
# Generate English speech (auto-generated filename)
chatterbox "Artificial intelligence has made remarkable strides in recent years, particularly in the field of natural language processing."

# Generate Spanish speech
chatterbox "La inteligencia artificial ha logrado avances notables en los últimos años." --lang es

# Use the --voice flag to provide a reference audio file for voice cloning
chatterbox "Artificial intelligence has made remarkable strides in recent years, particularly in the field of natural language processing." --voice speaker.wav

# Run the multilingual benchmark (saves to benchmark_output/)
chatterbox --benchmark --languages en es
```
CLI Options
| Option | Description | Default |
|---|---|---|
| `-o, --output` | Output WAV file path | Auto-generated |
| `-l, --lang` | Language code (`en`, `es`, `fr`, `de`, `ja`, `zh`, etc.) | `en` |
| `-v, --voice` | Reference audio for voice cloning | None |
| `--exaggeration` | Emotion intensity (0.0-1.0) | 0.5 |
| `--cfg` | Classifier-free guidance weight | 0.5 |
| `--backend` | Backend: `hybrid-mlx`, `mlx`, `pytorch` | `hybrid-mlx` |
| `--benchmark` | Run multilingual benchmark | False |
| `--languages` | Languages to benchmark | `en es fr de ja zh` |
| `--no-save-audio` | Don't save benchmark audio files | False (saves) |
| `-q, --quiet` | Suppress progress messages | False |
Quick Start
```python
import torchaudio as ta
from chatterbox.tts_mlx import ChatterboxTTSMLX

# Load model (downloads weights automatically on first run).
# Default device is "cpu"; choose "hybrid-mlx" for best performance on Apple Silicon.
model = ChatterboxTTSMLX.from_pretrained(device="hybrid-mlx")

# Generate speech
text = "Hello! This is Chatterbox running with MLX optimization on Apple Silicon."
wav = model.generate(text)
ta.save("output.wav", wav, model.sr)

# Voice cloning with reference audio
wav = model.generate(
    text,
    audio_prompt_path="reference_voice.wav",
    exaggeration=0.5,  # Emotion intensity (0.0-1.0)
    cfg_weight=0.5,    # Classifier-free guidance
)
```
Long-Form Audio Generation
For texts longer than ~50 words, use chunked generation:
```python
long_text = """
Your long text here. It can span multiple paragraphs and sentences.
The generate_long method will automatically split it at sentence boundaries,
generate each chunk separately, and crossfade them together seamlessly.
"""

wav = model.generate_long(
    long_text,
    audio_prompt_path="reference_voice.wav",
    chunk_size_words=50,
    overlap_duration=0.1,
)
ta.save("long_output.wav", wav, model.sr)
```
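The sentence-boundary chunking described above can be approximated in a few lines of plain Python. This is a simplified sketch, not the package's actual splitter, which may handle abbreviations and other edge cases differently:

```python
import re

def chunk_text(text: str, chunk_size_words: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly
    chunk_size_words words, never splitting mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > chunk_size_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = chunk_text("First sentence. " * 20, chunk_size_words=10)
print(len(chunks))  # 4 chunks of 5 two-word sentences each
```

Each chunk is then synthesized independently and the boundaries are smoothed with a crossfade over `overlap_duration` seconds.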
Acknowledgements
This project is built on top of the excellent Chatterbox TTS by Resemble AI. I'm deeply grateful for their work in creating and open-sourcing a production-grade, multilingual text-to-speech system under the MIT license.
This fork focuses specifically on MLX optimizations for Apple Silicon. If you're looking for the original project with CUDA support and the full feature set, please visit the official Resemble AI repository.
What's Different in This Fork?
This package provides native MLX acceleration for Apple Silicon Macs, achieving significant performance improvements:
| Text Length | CPU Baseline | MLX Optimized | Speedup |
|---|---|---|---|
| Short (5 words) | 8.91s | 3.70s | 2.4x faster |
| Medium (31 words) | 57.51s | 24.40s | 2.4x faster |
| Long (94 words) | 137.92s | 62.66s | 2.2x faster |
Key Optimizations
- MLX-Native T3 Model: The 520M parameter Llama 3 backbone runs entirely on MLX
- Float16 KV Cache: Up to 5.8 GB memory savings with 32% faster generation
- Hybrid Architecture: Combines MLX speed with PyTorch quality controls
- Long-Form Generation: Intelligent chunking with crossfade for extended audio
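The crossfade used to join chunks can be illustrated with NumPy. This is an illustrative sketch of a linear crossfade, not this fork's exact implementation:

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join two audio chunks, linearly fading a out and b in
    over `overlap` samples to avoid an audible seam."""
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    mixed = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], mixed, b[overlap:]])

sr = 24_000
overlap = int(0.1 * sr)  # 0.1 s overlap, as with overlap_duration=0.1
a = np.ones(sr)          # 1 s of dummy audio
b = np.ones(sr)
out = crossfade(a, b, overlap)
print(out.shape)  # (45600,): 2 s of audio minus the 0.1 s overlap
```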
Benchmark Results
All benchmarks run on Apple M4 (32GB RAM), macOS 15.4, Python 3.11, PyTorch 2.8.0.
English TTS Performance
| Device | Text | Words | Time | RTF |
|---|---|---|---|---|
| Hybrid-MLX | short | 5 | 4.08s | 0.65x |
| Hybrid-MLX | medium | 31 | 25.24s | 0.73x |
| Hybrid-MLX | long | 94 | 62.66s | 0.74x |
| Pure MLX | short | 5 | 3.70s | 0.69x |
| Pure MLX | medium | 31 | 24.40s | 0.72x |
| Pure MLX | long | 94 | 68.82s | 0.71x |
| CPU | short | 5 | 8.91s | 0.27x |
| CPU | medium | 31 | 57.51s | 0.33x |
| CPU | long | 94 | 137.92s | 0.34x |
Key findings:
- Hybrid-MLX recommended for production (best quality/speed balance)
- Pure MLX fastest for short texts, but quality degrades on long texts
- 2.2-2.4x speedup vs CPU baseline across all text lengths
Multilingual Performance
| Device | Language | Time | RTF |
|---|---|---|---|
| Hybrid-MLX | English | 12.25s | 0.71x |
| Hybrid-MLX | Spanish | 14.74s | 0.76x |
| Pure MLX | English | 14.55s | 0.67x |
| Pure MLX | Spanish | 13.78s | 0.75x |
| MPS | English | 19.96s | 0.50x |
| MPS | Spanish | 21.06s | 0.51x |
| CPU | English | 25.64s | 0.32x |
| CPU | Spanish | 31.31s | 0.33x |
Visual Comparison
```
GENERATION TIME COMPARISON

Short (5 words)
  CPU         ████████████████████████████████████████ 8.91s
  Hybrid-MLX  ██████████████████ 4.08s (2.2x faster)
  Pure MLX    █████████████████ 3.70s (2.4x faster)

Medium (31 words)
  CPU         ████████████████████████████████████████ 57.51s
  Hybrid-MLX  ██████████████████ 25.24s (2.3x faster)
  Pure MLX    █████████████████ 24.40s (2.4x faster)

Long (94 words)
  CPU         ████████████████████████████████████████ 137.92s
  Hybrid-MLX  ██████████████████ 62.66s (2.2x faster) ← Best quality
  Pure MLX    ████████████████████ 68.82s (2.0x faster)
```
Backend Comparison
| Backend | Description | RTF | Memory | Recommendation |
|---|---|---|---|---|
| Hybrid-MLX | T3 (MLX) + S3Gen (PyTorch/MPS) | 0.74x | ~16GB | ✅ Production use |
| Pure MLX | Everything on MLX | 0.71x | ~14GB | Minimal dependencies |
| PyTorch MPS | Full PyTorch on MPS | 0.51x | ~14GB | Fallback |
| CPU | PyTorch on CPU | 0.34x | ~14GB | Baseline |
RTF = Real-Time Factor (audio_duration / generation_time). Higher is better.
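As a concrete check on the table's numbers, RTF is just a ratio; the audio duration below is back-computed from the Hybrid-MLX long-text row and is an assumption for illustration:

```python
# RTF = audio_duration / generation_time; higher is better.
# RTF > 1.0 would mean faster-than-real-time synthesis.
def rtf(audio_duration_s: float, generation_time_s: float) -> float:
    return audio_duration_s / generation_time_s

# Hybrid-MLX long text: 62.66 s to generate roughly 46.4 s of audio
print(round(rtf(46.4, 62.66), 2))  # 0.74
```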
Running Benchmarks
You can reproduce these benchmarks on your own hardware.
English TTS Benchmark
```bash
# Full benchmark (all backends)
python benchmark_mps.py --runs 3 --validate

# Quick test with Hybrid-MLX only
python benchmark_mps.py --hybrid-mlx-only --runs 1

# CPU baseline only
python benchmark_mps.py --cpu-only --runs 1

# With voice cloning
python benchmark_mps.py --audio-prompt speaker.wav --runs 3

# Enable memory debugging
DEBUG_MEMORY=1 python benchmark_mps.py --hybrid-mlx-only
```
Options:
| Flag | Description |
|---|---|
| `--warmup N` | Warmup runs before timing (default: 1) |
| `--runs N` | Number of timed benchmark runs (default: 3) |
| `--devices` | Backends to test: `mps`, `cpu`, `hybrid-mlx`, `mlx`, `mlx-q4` |
| `--audio-prompt FILE` | Reference audio for voice cloning |
| `--output-dir DIR` | Output directory (default: `benchmark_output/`) |
| `--validate` | Enable Whisper transcription validation (computes WER) |
| `--mps-only` | Only benchmark PyTorch MPS |
| `--cpu-only` | Only benchmark CPU |
| `--hybrid-mlx-only` | Only benchmark Hybrid-MLX |
| `--mlx-only` | Only benchmark Pure MLX |
| `--debug-memory` | Enable detailed memory logging |
Multilingual Benchmark
```bash
# Test specific languages
python benchmark_multilingual.py \
    --audio-prompt speaker.wav \
    --languages en es fr de ja zh \
    --runs 3

# Quick test with Hybrid-MLX
python benchmark_multilingual.py \
    --audio-prompt speaker.wav \
    --languages en es \
    --hybrid-mlx-only

# With validation
python benchmark_multilingual.py \
    --audio-prompt speaker.wav \
    --languages en es fr \
    --validate
```
Supported Languages:
en (English), es (Spanish), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), ja (Japanese), zh (Chinese), ko (Korean), ar (Arabic), hi (Hindi), tr (Turkish), pl (Polish), nl (Dutch), sv (Swedish), da (Danish), no (Norwegian), fi (Finnish), el (Greek), he (Hebrew), ms (Malay), sw (Swahili)
Benchmark Output
Results are saved to:

- `benchmark_output/benchmark_results.json` - English TTS results
- `benchmark_multilingual_output/multilingual_results.json` - Multilingual results
- Generated audio files: `{device}_{category}.wav`
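The JSON results are easy to post-process. A minimal sketch follows; the field names below are illustrative assumptions, not the file's documented schema, so inspect your own `benchmark_results.json` for the actual keys:

```python
import json
from pathlib import Path

# Hypothetical results structure, written here only so the
# example is self-contained and runnable.
results = [
    {"device": "hybrid-mlx", "category": "long", "time_s": 62.66},
    {"device": "cpu", "category": "long", "time_s": 137.92},
]
path = Path("benchmark_results.json")
path.write_text(json.dumps(results))

# Load the results and find the fastest backend.
loaded = json.loads(path.read_text())
fastest = min(loaded, key=lambda r: r["time_s"])
print(fastest["device"])  # hybrid-mlx
```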
Architecture
Chatterbox is a two-stage TTS pipeline. This fork accelerates the most compute-intensive component (T3) with MLX:
```
┌─────────────────────────────────────────────────────────────────┐
│                     CHATTERBOX MLX PIPELINE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌─────────────────┐  │
│  │ VoiceEncoder │     │      T3      │     │      S3Gen      │  │
│  │  (PyTorch)   │────▶│    (MLX)     │────▶│  (PyTorch/MPS)  │  │
│  │  ~2M params  │     │ 520M params  │     │   ~80M params   │  │
│  └──────────────┘     └──────────────┘     └─────────────────┘  │
│                              ▲                                  │
│                    2.4x faster with MLX                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
Supported Languages
All 23 languages from the original Chatterbox are supported:
Arabic • Danish • German • Greek • English • Spanish • Finnish • French • Hebrew • Hindi • Italian • Japanese • Korean • Malay • Dutch • Norwegian • Polish • Portuguese • Russian • Swedish • Swahili • Turkish • Chinese
```python
from chatterbox.mtl_tts_mlx import ChatterboxMultilingualTTSMLX

model = ChatterboxMultilingualTTSMLX.from_pretrained(device="mps")

# French
wav = model.generate("Bonjour, comment ça va?", language_id="fr")

# Japanese
wav = model.generate("こんにちは、元気ですか？", language_id="ja")
```
Tips for Best Results
General Use
- Default settings (`exaggeration=0.5`, `cfg_weight=0.5`) work well for most cases
- Ensure reference audio matches the target language to avoid accent transfer

Expressive Speech

- Lower `cfg_weight` (~0.3) + higher `exaggeration` (~0.7) for dramatic delivery
- Higher `exaggeration` speeds up speech; lower `cfg_weight` compensates
Memory Usage
Enable debug logging to monitor memory:
```bash
DEBUG_MEMORY=1 python your_script.py
```
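The flag follows the common environment-variable gating pattern. A minimal sketch of how such a check typically works (illustrative only, not this package's internal code):

```python
import os

def memory_debug_enabled() -> bool:
    """Return True when DEBUG_MEMORY is set to a truthy value."""
    return os.environ.get("DEBUG_MEMORY", "").lower() not in ("", "0", "false")

os.environ["DEBUG_MEMORY"] = "1"
print(memory_debug_enabled())  # True
```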
Differences from Original Chatterbox
| Feature | Original (Resemble AI) | This Fork |
|---|---|---|
| Target Hardware | NVIDIA CUDA | Apple Silicon |
| ML Framework | PyTorch | MLX + PyTorch hybrid |
| T3 Inference | PyTorch | MLX (2.4x faster) |
| KV Cache | Float32 | Float16 (32% faster) |
| Long-form Audio | Basic | Chunked with crossfade |
Credits & Links
- Original Project: Resemble AI's Chatterbox
- Resemble AI: resemble.ai - For creating and open-sourcing this incredible TTS system
- Demo: Hugging Face Space
- Evaluation: Outperforms ElevenLabs
Upstream Dependencies
License
MIT License - Same as the original Chatterbox project.
Citation
If you use this project, please cite the original Chatterbox:
```bibtex
@misc{chatterboxtts2025,
  author = {{Resemble AI}},
  title = {{Chatterbox-TTS}},
  year = {2025},
  howpublished = {\url{https://github.com/resemble-ai/chatterbox}},
  note = {GitHub repository}
}
```
Download files
Source Distribution
Built Distribution
File details
Details for the file chatterbox_mlx-1.0.4.tar.gz.
File metadata
- Download URL: chatterbox_mlx-1.0.4.tar.gz
- Upload date:
- Size: 196.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0214b0ba87d55d08efc635452d010b98dcd3791c4c6809ca74bcc911dc354f84` |
| MD5 | `290156e32fb2566a30114b1c23b76ea6` |
| BLAKE2b-256 | `22b47aaa56e9c6f6e6769136b4ca96a1ce659614436210146b263402ada6ce44` |
File details
Details for the file chatterbox_mlx-1.0.4-py3-none-any.whl.
File metadata
- Download URL: chatterbox_mlx-1.0.4-py3-none-any.whl
- Upload date:
- Size: 243.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `face394ba5467ac07c1596093b6bad5870c70c74b807ef90222e434eb78e223e` |
| MD5 | `76d80e185efa03a0b6f2052e4df1483d` |
| BLAKE2b-256 | `f31d0516f1d8b93fb61196afc53bf61d6ba02bc737f6598e3b1622e6840117c2` |