IndexTTS2 inference library: zero-shot text-to-speech with emotional control
index-tts-inference
Minimal pip package for IndexTTS2 inference. Wraps the official IndexTTS2 repo, stripped down to only what's needed for inference.
Install
pip install indextts2-inference
Optional extras
SageAttention (alternative attention backend):
pip install "indextts2-inference[sage-attn]"
Flash Attention v2 (acceleration engine with KV cache and CUDA graphs):
pip install "indextts2-inference[flash-attn]"
DeepSpeed:
pip install "indextts2-inference[deepspeed]"
Usage
from indextts import IndexTTS2
# Auto-downloads model from HuggingFace
tts = IndexTTS2()
# Or use local/finetuned checkpoints
tts = IndexTTS2(model_dir="/path/to/checkpoints")
# Basic inference
tts.infer(spk_audio_prompt="voice.wav", text="Hello world", output_path="out.wav")
Attention backends
# Default PyTorch SDPA — auto-selects best kernel, no extra deps needed
tts = IndexTTS2()
# SageAttention — may help on Ampere/Hopper GPUs, requires sageattention package
tts = IndexTTS2(attn_backend="sage", use_fp16=True)
# Flash Attention v2 — acceleration engine with paged KV cache and CUDA graphs
tts = IndexTTS2(attn_backend="flash")
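If the optional extras may or may not be present on a given machine, the backend can be chosen at runtime. The following is a minimal sketch, not part of the package: `pick_attn_backend` is a hypothetical helper, the module names `flash_attn` and `sageattention` are assumed to be what the extras install, and preferring Flash Attention over SageAttention is an arbitrary choice for illustration.

```python
from importlib.util import find_spec
from typing import Callable, Optional


def _installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    return find_spec(module_name) is not None


def pick_attn_backend(
    is_installed: Callable[[str], bool] = _installed,
) -> Optional[str]:
    """Pick an attn_backend value based on which optional packages exist.

    Returns "flash" or "sage" (the values shown above), or None to mean
    "pass nothing and keep the default PyTorch SDPA backend".
    Module names are assumptions about what the extras install.
    """
    if is_installed("flash_attn"):
        return "flash"
    if is_installed("sageattention"):
        return "sage"
    return None


backend = pick_attn_backend()
kwargs = {} if backend is None else {"attn_backend": backend}
# tts = IndexTTS2(**kwargs)
```

Passing the lookup as a parameter keeps the helper easy to exercise without installing either package.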
Language selection
By default the language is auto-detected between Chinese and English. You can set it explicitly:
tts = IndexTTS2(language="es")
tts.infer(spk_audio_prompt="voice.wav", text="Hola, esto es una prueba.", output_path="out.wav")
Emotion control
There are three ways to control the emotion of the generated speech:
# 1. From a reference audio
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="Some text",
    output_path="out.wav",
    emo_audio_prompt="happy_reference.wav",
    emo_alpha=0.7,
)
# 2. With an explicit emotion vector
# [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    emo_vector=[0.8, 0, 0, 0, 0, 0, 0, 0],
)
# 3. Auto-detect emotion from the text itself
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    use_emo_text=True,
)
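Writing the eight-element vector by hand makes it easy to put a weight in the wrong slot. A small helper can build it from named weights instead. This is a sketch only: `make_emo_vector` and `EMOTIONS` are hypothetical names, not part of the package, and the ordering is copied from the comment in option 2 above.

```python
# Order taken from the emotion-vector comment above.
EMOTIONS = (
    "happy", "angry", "sad", "afraid",
    "disgusted", "melancholic", "surprised", "calm",
)


def make_emo_vector(**weights: float) -> list:
    """Build an emo_vector list from keyword weights; unset emotions are 0.0."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion(s): {sorted(unknown)}")
    return [float(weights.get(name, 0.0)) for name in EMOTIONS]


vector = make_emo_vector(happy=0.8)
# tts.infer(..., emo_vector=vector)
```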
Streaming
for chunk in tts.infer(
    spk_audio_prompt="voice.wav",
    text="Long text to synthesize...",
    output_path="out.wav",
    stream_return=True,
):
    if chunk is not None and hasattr(chunk, "shape"):
        audio_np = chunk.squeeze().cpu().numpy()
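The loop above skips items that are `None` or that have no `shape` attribute. If that check is needed in several places, it can be pulled into a small generator. This is a sketch under that assumption; `audio_chunks` is a hypothetical helper, not a function the package provides, and it only repeats the filter already shown.

```python
from typing import Any, Iterable, Iterator


def audio_chunks(stream: Iterable[Any]) -> Iterator[Any]:
    """Yield only the items that look like audio tensors (have a .shape)."""
    for chunk in stream:
        if chunk is not None and hasattr(chunk, "shape"):
            yield chunk


# Usage with the streaming call shown above:
# for chunk in audio_chunks(tts.infer(..., stream_return=True)):
#     audio_np = chunk.squeeze().cpu().numpy()
```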
Generation parameters
You can tune sampling parameters via kwargs:
tts.infer(
    spk_audio_prompt="voice.wav",
    text="Hello",
    output_path="out.wav",
    temperature=0.6,
    top_k=20,
    top_p=0.8,
    max_mel_tokens=2000,
)
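To reuse one set of sampling settings across many calls, the kwargs can be grouped and sanity-checked before they reach `infer`. The sketch below is illustrative: `SamplingParams` is a hypothetical class, its defaults are simply the example values above (not necessarily the library's own defaults), and the range checks reflect the usual meaning of these sampling parameters rather than anything the package documents.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingParams:
    # Defaults copied from the example above; the library's real defaults may differ.
    temperature: float = 0.6
    top_k: int = 20
    top_p: float = 0.8
    max_mel_tokens: int = 2000

    def __post_init__(self) -> None:
        # Assumed conventional ranges, not documented limits of the package.
        if self.temperature <= 0:
            raise ValueError("temperature must be > 0")
        if self.top_k < 1:
            raise ValueError("top_k must be >= 1")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.max_mel_tokens < 1:
            raise ValueError("max_mel_tokens must be >= 1")

    def as_kwargs(self) -> dict:
        """Return the settings as a dict suitable for **kwargs."""
        return asdict(self)


params = SamplingParams(temperature=0.7)
# tts.infer(spk_audio_prompt="voice.wav", text="Hello",
#           output_path="out.wav", **params.as_kwargs())
```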
Logging
By default, index-tts-inference only shows warnings. To see detailed logs:
export INDEXTTS_LOG_LEVEL=DEBUG # DEBUG, INFO, WARNING (default)
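The same variable can be set from Python when exporting it in the shell is inconvenient, for example in a notebook. This sketch assumes the package reads `INDEXTTS_LOG_LEVEL` when it is imported, which the text above does not state, so the call is placed before the import to be safe. `set_indextts_log_level` is a hypothetical helper and accepts only the three levels named above.

```python
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING")


def set_indextts_log_level(level: str) -> str:
    """Validate the level name and store it in the environment."""
    name = level.upper()
    if name not in VALID_LEVELS:
        raise ValueError(f"expected one of {VALID_LEVELS}, got {level!r}")
    os.environ["INDEXTTS_LOG_LEVEL"] = name
    return name


set_indextts_log_level("debug")
# from indextts import IndexTTS2  # import after setting the variable (assumption)
```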
PyTorch with CUDA
This package lists torch and torchaudio as dependencies without pinning a specific CUDA version. Install the CUDA variant you need before installing this package:
# Example: PyTorch with CUDA 12.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128
# Then install the package
pip install indextts2-inference
Or with uv:
# pyproject.toml of your project
[tool.uv.sources]
torch = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]
torchaudio = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]
[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
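After installing by either route, it can be worth confirming which PyTorch build ended up in the environment. The snippet below is a minimal check using standard PyTorch attributes; `describe_torch` is a hypothetical helper name, and it reports rather than fails when torch is missing.

```python
def describe_torch() -> str:
    """Summarize the installed PyTorch build, or say that it is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    cuda_build = torch.version.cuda  # None when the build has no CUDA support
    gpu_ok = torch.cuda.is_available()
    return (
        f"torch {torch.__version__}, "
        f"CUDA build: {cuda_build}, GPU available: {gpu_ok}"
    )


print(describe_torch())
```

A CUDA build of `None` after following the cu128 example usually means a CPU-only wheel was resolved from a different index.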
License
See LICENSE and DISCLAIMER.
File details
Details for the file indextts2_inference-2.1.0.tar.gz.
File metadata
- Download URL: indextts2_inference-2.1.0.tar.gz
- Upload date:
- Size: 344.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6a0dcbaa56ba5a5adb73a735961ac6ea3ce9373399d7e2b4ba636dabe50d389 |
| MD5 | 0324b1809e79ababab25e2fbb7964bb3 |
| BLAKE2b-256 | 92dfbf1fe1aa6fadc395b8e7ec88fd454350fe81f2a3816188bfb9162d432abb |
Provenance
The following attestation bundles were made for indextts2_inference-2.1.0.tar.gz:
Publisher: release.yml on nicokim/indextts2-inference
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: indextts2_inference-2.1.0.tar.gz
- Subject digest: a6a0dcbaa56ba5a5adb73a735961ac6ea3ce9373399d7e2b4ba636dabe50d389
- Sigstore transparency entry: 997339047
- Sigstore integration time:
- Permalink: nicokim/indextts2-inference@ba9788dd4c89949c17e75556f7db0cfac043150b
- Branch / Tag: refs/tags/v2.1.0
- Owner: https://github.com/nicokim
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ba9788dd4c89949c17e75556f7db0cfac043150b
- Trigger Event: push
File details
Details for the file indextts2_inference-2.1.0-py3-none-any.whl.
File metadata
- Download URL: indextts2_inference-2.1.0-py3-none-any.whl
- Upload date:
- Size: 168.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 062b44fc391ffeef966cf7dab4b61140f6cf9c4f4103181092abbdfa1d302fd8 |
| MD5 | a861f82e7ac78f576e02001c4aa93ce7 |
| BLAKE2b-256 | a7cb470e789c80e4695fe3b83bfbb833db99145cf872c9f8c23e8ebe542159c5 |
Provenance
The following attestation bundles were made for indextts2_inference-2.1.0-py3-none-any.whl:
Publisher: release.yml on nicokim/indextts2-inference
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: indextts2_inference-2.1.0-py3-none-any.whl
- Subject digest: 062b44fc391ffeef966cf7dab4b61140f6cf9c4f4103181092abbdfa1d302fd8
- Sigstore transparency entry: 997339072
- Sigstore integration time:
- Permalink: nicokim/indextts2-inference@ba9788dd4c89949c17e75556f7db0cfac043150b
- Branch / Tag: refs/tags/v2.1.0
- Owner: https://github.com/nicokim
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ba9788dd4c89949c17e75556f7db0cfac043150b
- Trigger Event: push