MLX-IndexTTS

IndexTTS for Apple Silicon using MLX. Zero-shot text-to-speech with voice cloning capabilities.

Features

  • Run IndexTTS 1.5/2.0 natively on Apple Silicon
  • RTF ~0.5 with v1.5 (about 2x faster than real time on M2 Max); ~1.3 with v2.0
  • Voice cloning from reference audio
  • v2.0: Emotion control (8 emotions)
  • Auto-detect model version (1.5/2.0)

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+
  • uv package manager

Installation

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/user/mlx-indextts.git
cd mlx-indextts

# Basic install (generation only)
uv sync

# With model conversion support (requires torch)
uv sync --extra convert

Quick Start

1. Convert Model (auto-detects version)

# Convert IndexTTS 1.5
uv run mlx-indextts convert \
    --model-dir /path/to/indexTTS-1.5 \
    -o models/mlx-indexTTS-1.5

# Convert IndexTTS 2.0
uv run mlx-indextts convert \
    --model-dir /path/to/indexTTS-2 \
    -o models/mlx-indexTTS-2.0

2. Generate Speech (auto-detects version)

# v1.5
uv run mlx-indextts generate \
    -m models/mlx-indexTTS-1.5 \
    -r reference.wav \
    -t "你好,这是一个语音合成测试。" \
    -o output.wav

# v2.0
uv run mlx-indextts generate \
    -m models/mlx-indexTTS-2.0 \
    -r reference.wav \
    -t "你好,这是一个语音合成测试。" \
    -o output.wav

# v2.0 with emotion control
uv run mlx-indextts generate \
    -m models/mlx-indexTTS-2.0 \
    -r reference.wav \
    -t "今天真是太开心了!" \
    -o output.wav \
    --emotion happy --emo-alpha 0.8

3. Pre-compute Speaker (Faster Inference)

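Pre-compute the speaker conditioning once and pass the saved .npz file as the reference to skip reference-audio preprocessing on later generations. The gain is largest for v2.0, where load time drops from ~9s with a .wav to ~1.5s with an .npz; v1.5 loads in ~0.3s either way.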

# v1.5
uv run mlx-indextts speaker \
    -m models/mlx-indexTTS-1.5 \
    -r reference.wav \
    -o speaker_v15.npz

# v2.0
uv run mlx-indextts speaker \
    -m models/mlx-indexTTS-2.0 \
    -r reference.wav \
    -o speaker_v20.npz

# Use pre-computed speaker (much faster loading)
uv run mlx-indextts generate \
    -m models/mlx-indexTTS-2.0 \
    -r speaker_v20.npz \
    -t "你好,世界!" \
    -o output.wav

Note: v1.5 and v2.0 speaker files are incompatible - each version requires its own .npz file.
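
When synthesizing many lines with one voice, the same pre-computed speaker file can be reused for every call. The following is a minimal sketch, assuming the CLI is invoked through `uv run` as in the examples above. `build_generate_cmd` and `synthesize_all` are hypothetical helper names, not part of mlx-indextts, and only flags documented in this README are used.

```python
import subprocess
from pathlib import Path


def build_generate_cmd(model_dir, speaker, text, output):
    """Return the argv list for one `mlx-indextts generate` call."""
    return [
        "uv", "run", "mlx-indextts", "generate",
        "-m", str(model_dir),
        "-r", str(speaker),  # reference .wav or pre-computed .npz
        "-t", text,
        "-o", str(output),
    ]


def synthesize_all(model_dir, speaker, lines, out_dir):
    """Run one generate call per text line, writing line_000.wav, line_001.wav, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, text in enumerate(lines):
        output = out_dir / f"line_{i:03d}.wav"
        cmd = build_generate_cmd(model_dir, speaker, text, output)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Each call starts a new process, so the model is loaded every time. For long batches, the Python API may be a better fit, since it lets one loaded model serve many `generate` calls.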

Python API

# v1.5
from mlx_indextts.generate import IndexTTS

tts = IndexTTS.load_model("models/mlx-indexTTS-1.5")
audio = tts.generate(text="你好", ref_audio="reference.wav")
tts.save_audio(audio, "output.wav")

# v2.0
from mlx_indextts.generate_v2 import IndexTTSv2

tts = IndexTTSv2("models/mlx-indexTTS-2.0")
audio = tts.generate(
    text="你好",
    reference_audio="reference.wav",
    output_path="output.wav",
    emotion="happy",
    emo_alpha=0.8,
)

CLI Options

mlx-indextts generate [OPTIONS]

Required:
  -m, --model        Model directory
  -r, --ref-audio    Reference audio (.wav or .npz)
  -t, --text         Text to synthesize
  -o, --output       Output file

Common options:
  --max-tokens       Max mel tokens (default: 800 for v1.5, 1500 for v2.0)
  --temperature      Sampling temperature (default: 1.0 for v1.5, 0.8 for v2.0)
  --seed, -s         Random seed for reproducibility
  -v, --verbose      Verbose output
  -p, --play         Play audio after generation
  --quantize, -q     Runtime quantization: 4, 8, or fp32

v2.0 only:
  --emotion          Emotion: happy/sad/angry/afraid/disgusted/melancholic/surprised/calm
  --emo-alpha        Emotion intensity 0.0-1.0 (default: 1.0)
  --diffusion-steps  Diffusion steps (default: 25)
  --cfg-rate         CFG rate (default: 0.7)
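
The version-dependent defaults and the v2.0-only options above can be captured in a small lookup for scripts that wrap the CLI. This is a sketch only: `resolve_options` is a hypothetical client-side helper, and whether the CLI itself rejects v2.0-only flags for a v1.5 model is not stated here.

```python
# Defaults and v2.0-only options copied from the option list above.
DEFAULTS = {
    "1.5": {"max_tokens": 800, "temperature": 1.0},
    "2.0": {
        "max_tokens": 1500,
        "temperature": 0.8,
        "emo_alpha": 1.0,
        "diffusion_steps": 25,
        "cfg_rate": 0.7,
    },
}
V2_ONLY = {"emotion", "emo_alpha", "diffusion_steps", "cfg_rate"}


def resolve_options(version, **overrides):
    """Merge user overrides onto the documented defaults for `version`.

    Raises ValueError for an unknown version, for v2.0-only options
    passed with v1.5, or for an emo_alpha outside 0.0-1.0.
    """
    if version not in DEFAULTS:
        raise ValueError(f"unknown version: {version!r}")
    if version == "1.5":
        bad = sorted(V2_ONLY.intersection(overrides))
        if bad:
            raise ValueError(f"v2.0-only options used with v1.5: {bad}")
    options = {**DEFAULTS[version], **overrides}
    alpha = options.get("emo_alpha")
    if alpha is not None and not 0.0 <= alpha <= 1.0:
        raise ValueError("emo_alpha must be between 0.0 and 1.0")
    return options
```

For example, `resolve_options("2.0", emotion="happy", emo_alpha=0.8)` keeps the v2.0 temperature of 0.8 and 25 diffusion steps while overriding the emotion settings.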

Version Comparison

Feature                v1.5       v2.0
Sample rate            24000 Hz   22050 Hz
Max tokens             800        1815
Default temperature    1.0        0.8
Emotion control        ❌         ✅ 8 emotions
S2Mel (CFM)            ❌         ✅
BigVGAN                Custom     nvidia pretrained
Runtime quantization   ✅         ✅
Speaker pre-compute    ✅         ✅

Supported Emotions (v2.0)

English 中文
happy 高兴
angry 愤怒
sad 悲伤
afraid 恐惧
disgusted 反感
melancholic 低落
surprised 惊讶
calm 自然

Mixed emotions: --emotion "happy:0.6,sad:0.4"
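
A mixed-emotion string can be checked before it is passed to the CLI. The sketch below assumes the `name:weight,name:weight` syntax shown above and the eight emotion names from the table. `parse_emotion_spec` is a hypothetical helper; how mlx-indextts itself parses or normalizes the weights is not documented here.

```python
EMOTIONS = {
    "happy", "angry", "sad", "afraid",
    "disgusted", "melancholic", "surprised", "calm",
}


def parse_emotion_spec(spec):
    """Parse 'happy' or 'happy:0.6,sad:0.4' into a {name: weight} dict.

    A bare name gets weight 1.0. Unknown names and negative weights
    raise ValueError.
    """
    weights = {}
    for part in spec.split(","):
        name, _, raw = part.strip().partition(":")
        name = name.strip()
        if name not in EMOTIONS:
            raise ValueError(f"unknown emotion: {name!r}")
        weight = float(raw) if raw.strip() else 1.0
        if weight < 0:
            raise ValueError(f"negative weight for {name!r}")
        weights[name] = weight
    return weights
```

With this helper, `parse_emotion_spec("happy:0.6,sad:0.4")` yields a dict with `happy` at 0.6 and `sad` at 0.4, and a typo such as `"hapy:0.6"` fails before any model is loaded.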

Performance

Metric             v1.5     v2.0
RTF (M2 Max)       ~0.5     ~1.3
Load time (.wav)   ~0.3s    ~9s
Load time (.npz)   ~0.3s    ~1.5s
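
RTF (real-time factor) is synthesis time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. The short sketch below works through the arithmetic for the figures in the table. The function names are illustrative, and the table does not say whether its RTF values include load time, so load time is treated separately here.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(rtf, audio_seconds):
    """Invert the definition: expected synthesis time for a given RTF."""
    return rtf * audio_seconds


# 10 seconds of speech at the RTF values from the table above
v15_seconds = estimated_synthesis_seconds(0.5, 10.0)  # about 5 s
v20_seconds = estimated_synthesis_seconds(1.3, 10.0)  # about 13 s
```

At these rates, v1.5 produces audio about twice as fast as it plays back, while v2.0 takes somewhat longer than the audio it generates.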

License

MIT License

Acknowledgments

  • IndexTTS - Original PyTorch implementation
  • MLX - Apple's ML framework
