MLX-IndexTTS
IndexTTS for Apple Silicon using MLX. Zero-shot text-to-speech with voice cloning capabilities.
Features
- Run IndexTTS 1.5/2.0 natively on Apple Silicon
- RTF ~0.5 for v1.5 (about 2x faster than real time on M2 Max); ~1.3 for v2.0
- Voice cloning from reference audio
- v2.0: Emotion control (8 emotions)
- Auto-detect model version (1.5/2.0)
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- uv package manager
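The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; `check_requirements` is a hypothetical helper, not part of mlx-indextts, and it assumes a native (non-emulated) Python build, which reports `arm64` on Apple Silicon.

```python
import platform
import shutil
import sys


def check_requirements() -> dict:
    """Return a pass/fail flag for each requirement listed above."""
    return {
        # A native Python build on an Apple Silicon Mac reports Darwin / arm64.
        "macos": platform.system() == "Darwin",
        "apple_silicon": platform.machine() == "arm64",
        "python_3_10_plus": sys.version_info >= (3, 10),
        # uv must be on PATH for the `uv sync` and `uv run` commands.
        "uv_installed": shutil.which("uv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```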
Installation
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install
git clone https://github.com/user/mlx-indextts.git
cd mlx-indextts
# Basic install (generation only)
uv sync
# With model conversion support (requires torch)
uv sync --extra convert
Quick Start
1. Convert Model (auto-detects version)
# Convert IndexTTS 1.5
uv run mlx-indextts convert \
--model-dir /path/to/indexTTS-1.5 \
-o models/mlx-indexTTS-1.5
# Convert IndexTTS 2.0
uv run mlx-indextts convert \
--model-dir /path/to/indexTTS-2 \
-o models/mlx-indexTTS-2.0
2. Generate Speech (auto-detects version)
# v1.5
uv run mlx-indextts generate \
-m models/mlx-indexTTS-1.5 \
-r reference.wav \
-t "你好,这是一个语音合成测试。" \
-o output.wav
# v2.0
uv run mlx-indextts generate \
-m models/mlx-indexTTS-2.0 \
-r reference.wav \
-t "你好,这是一个语音合成测试。" \
-o output.wav
# v2.0 with emotion control
uv run mlx-indextts generate \
-m models/mlx-indexTTS-2.0 \
-r reference.wav \
-t "今天真是太开心了!" \
-o output.wav \
--emotion happy --emo-alpha 0.8
3. Pre-compute Speaker (Faster Inference)
Pre-compute speaker conditioning to skip audio preprocessing on subsequent generations.
# v1.5
uv run mlx-indextts speaker \
-m models/mlx-indexTTS-1.5 \
-r reference.wav \
-o speaker_v15.npz
# v2.0
uv run mlx-indextts speaker \
-m models/mlx-indexTTS-2.0 \
-r reference.wav \
-o speaker_v20.npz
# Use pre-computed speaker (v2.0 load time drops from ~9s to ~1.5s)
uv run mlx-indextts generate \
-m models/mlx-indexTTS-2.0 \
-r speaker_v20.npz \
-t "你好,世界!" \
-o output.wav
Note: v1.5 and v2.0 speaker files are not interchangeable. Each version requires its own .npz file.
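Because an .npz file is a zip archive of NumPy .npy arrays, its contents can be listed without loading any data, which may help when telling speaker files apart. The sketch below uses only the standard library; `list_npz_arrays` is a hypothetical helper, and the array names stored by mlx-indextts are not documented here, so the function simply reports whatever it finds.

```python
import zipfile


def list_npz_arrays(path: str) -> list:
    """List the array names stored in an .npz file without loading the data."""
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as "<name>.npy" inside the zip archive.
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )
```

Comparing the output for `speaker_v15.npz` and `speaker_v20.npz` is one way to see how the two formats differ, assuming they store differently named arrays.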
Python API
# v1.5
from mlx_indextts.generate import IndexTTS
tts = IndexTTS.load_model("models/mlx-indexTTS-1.5")
audio = tts.generate(text="你好", ref_audio="reference.wav")
tts.save_audio(audio, "output.wav")
# v2.0
from mlx_indextts.generate_v2 import IndexTTSv2
tts = IndexTTSv2("models/mlx-indexTTS-2.0")
audio = tts.generate(
text="你好",
reference_audio="reference.wav",
output_path="output.wav",
emotion="happy",
emo_alpha=0.8,
)
CLI Options
mlx-indextts generate [OPTIONS]
Required:
-m, --model Model directory
-r, --ref-audio Reference audio (.wav or .npz)
-t, --text Text to synthesize
-o, --output Output file
Common options:
--max-tokens Max mel tokens (default: 800 for v1.5, 1500 for v2.0)
--temperature Sampling temperature (default: 1.0 for v1.5, 0.8 for v2.0)
--seed, -s Random seed for reproducibility
-v, --verbose Verbose output
-p, --play Play audio after generation
--quantize, -q Runtime quantization: 4, 8, or fp32
v2.0 only:
--emotion Emotion: happy/sad/angry/afraid/disgusted/melancholic/surprised/calm
--emo-alpha Emotion intensity 0.0-1.0 (default: 1.0)
--diffusion-steps Diffusion steps (default: 25)
--cfg-rate CFG rate (default: 0.7)
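When driving the CLI from a script, it can be convenient to assemble the argument list in Python. This is a minimal sketch that uses only flags listed above; `build_generate_command` is a hypothetical helper, not part of the package, and it does not validate that the v2.0-only flags are paired with a v2.0 model.

```python
import shlex
from typing import Optional


def build_generate_command(
    model: str,
    ref_audio: str,
    text: str,
    output: str,
    *,
    seed: Optional[int] = None,
    emotion: Optional[str] = None,
    emo_alpha: Optional[float] = None,
) -> list:
    """Build an argv list for `uv run mlx-indextts generate`."""
    cmd = [
        "uv", "run", "mlx-indextts", "generate",
        "-m", model,
        "-r", ref_audio,
        "-t", text,
        "-o", output,
    ]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if emotion is not None:  # v2.0 only
        cmd += ["--emotion", emotion]
    if emo_alpha is not None:  # v2.0 only
        cmd += ["--emo-alpha", str(emo_alpha)]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        "models/mlx-indexTTS-2.0", "speaker_v20.npz", "Hello, world!", "output.wav",
        seed=0, emotion="happy", emo_alpha=0.8,
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids quoting problems with the text argument.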
Version Comparison
| Feature | v1.5 | v2.0 |
|---|---|---|
| Sample rate | 24000 Hz | 22050 Hz |
| Max tokens | 800 | 1815 |
| Default temperature | 1.0 | 0.8 |
| Emotion control | ❌ | ✅ 8 emotions |
| S2Mel (CFM) | ❌ | ✅ |
| BigVGAN | Custom | nvidia pretrained |
| Runtime quantization | ✅ | ✅ |
| Speaker pre-compute | ✅ | ✅ |
Supported Emotions (v2.0)
| English | Chinese |
|---|---|
| happy | 高兴 |
| angry | 愤怒 |
| sad | 悲伤 |
| afraid | 恐惧 |
| disgusted | 反感 |
| melancholic | 低落 |
| surprised | 惊讶 |
| calm | 自然 |
Mixed emotions: --emotion "happy:0.6,sad:0.4"
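To illustrate the `name:weight` format, here is a minimal sketch of how such a string could be parsed and validated against the eight emotions in the table. `parse_emotion_spec` is a hypothetical helper, not the project's own parser; treating a bare name such as `happy` as weight 1.0 is an assumption, and whether the real CLI normalizes the weights is not stated.

```python
EMOTIONS = {
    "happy", "angry", "sad", "afraid",
    "disgusted", "melancholic", "surprised", "calm",
}


def parse_emotion_spec(spec: str) -> dict:
    """Parse 'happy:0.6,sad:0.4' into {'happy': 0.6, 'sad': 0.4}."""
    weights = {}
    for part in spec.split(","):
        name, sep, value = part.strip().partition(":")
        if name not in EMOTIONS:
            raise ValueError(f"unknown emotion: {name!r}")
        # Assumption: a bare emotion name means full weight.
        weights[name] = float(value) if sep else 1.0
    return weights
```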
Performance
| Metric | v1.5 | v2.0 |
|---|---|---|
| RTF (M2 Max, lower is faster) | ~0.5 | ~1.3 |
| Load time (.wav) | ~0.3s | ~9s |
| Load time (.npz) | ~0.3s | ~1.5s |
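These figures can be combined into a rough wall-clock estimate. The sketch below takes RTF as generation time divided by audio duration, and assumes the load time is paid once per run and adds to the generation time; both are simplifications, and `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(audio_seconds: float, rtf: float, load_seconds: float) -> float:
    """Estimate total time to produce `audio_seconds` of speech."""
    # Generation time scales with the audio length; loading is a fixed cost.
    return load_seconds + rtf * audio_seconds


if __name__ == "__main__":
    # Values taken from the table above, for 10 seconds of output audio.
    for label, rtf, load in [
        ("v1.5 (.wav)", 0.5, 0.3),
        ("v2.0 (.wav)", 1.3, 9.0),
        ("v2.0 (.npz)", 1.3, 1.5),
    ]:
        print(f"{label}: ~{estimate_seconds(10.0, rtf, load):.1f}s")
```

Under these assumptions, the fixed load cost dominates for short clips on v2.0, which is where a pre-computed .npz speaker file helps most.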
License
MIT License