IndexTTS2 inference library: zero-shot text-to-speech with emotional control

index-tts-inference

Minimal pip package for IndexTTS2 inference. Wraps the official IndexTTS2 repo, stripped down to only what's needed for inference.

Install

pip install indextts2-inference

Optional extras

SageAttention (alternative attention backend):

pip install indextts2-inference[sage-attn]

Flash Attention v2 (acceleration engine with KV cache and CUDA graphs):

pip install indextts2-inference[flash-attn]

DeepSpeed:

pip install indextts2-inference[deepspeed]

Usage

from indextts import IndexTTS2

# Auto-downloads model from HuggingFace
tts = IndexTTS2()

# Or use local/finetuned checkpoints
tts = IndexTTS2(model_dir="/path/to/checkpoints")

# Basic inference
tts.infer(spk_audio_prompt="voice.wav", text="Hello world", output_path="out.wav")

Attention backends

# Default PyTorch SDPA — auto-selects best kernel, no extra deps needed
tts = IndexTTS2()

# SageAttention — may help on Ampere/Hopper GPUs, requires sageattention package
tts = IndexTTS2(attn_backend="sage", use_fp16=True)

# Flash Attention v2 — acceleration engine with paged KV cache and CUDA graphs
tts = IndexTTS2(attn_backend="flash")

Language selection

By default the language is auto-detected between Chinese and English. You can set it explicitly:

tts = IndexTTS2(language="es")
tts.infer(spk_audio_prompt="voice.wav", text="Hola, esto es una prueba.", output_path="out.wav")

Emotion control

There are three ways to control the emotion of the generated speech:

# 1. From a reference audio
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="Some text",
    output_path="out.wav",
    emo_audio_prompt="happy_reference.wav",
    emo_alpha=0.7,
)

# 2. With an explicit emotion vector
#    [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    emo_vector=[0.8, 0, 0, 0, 0, 0, 0, 0],
)

# 3. Auto-detect emotion from the text itself
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    use_emo_text=True,
)

Streaming

chunks = []
for chunk in tts.infer(
    spk_audio_prompt="voice.wav",
    text="Long text to synthesize...",
    output_path="out.wav",
    stream_return=True,
):
    if chunk is not None and hasattr(chunk, "shape"):
        chunks.append(chunk.squeeze().cpu().numpy())
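If you want to assemble the streamed chunks into a file yourself rather than rely on `output_path`, the following stdlib-only sketch writes float chunks in [-1, 1] to a 16-bit mono WAV. The sample rate and chunk format are assumptions on my part, not values from the library; adapt them to what your model actually emits (e.g. convert each tensor with `chunk.squeeze().cpu().numpy().tolist()` first):

```python
import struct
import wave

def write_chunks_to_wav(path, chunks, sample_rate=22050):
    """Concatenate float chunks in [-1, 1] into a 16-bit mono PCM WAV.

    `sample_rate` is an assumption; use the rate your model emits.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)          # 16-bit PCM
        wf.setframerate(sample_rate)
        for chunk in chunks:
            # Clamp to [-1, 1] and scale to int16 range.
            pcm = struct.pack(
                "<%dh" % len(chunk),
                *(int(max(-1.0, min(1.0, s)) * 32767) for s in chunk),
            )
            wf.writeframes(pcm)
```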

Generation parameters

You can tune sampling parameters via kwargs:

tts.infer(
    spk_audio_prompt="voice.wav",
    text="Hello",
    output_path="out.wav",
    temperature=0.6,
    top_k=20,
    top_p=0.8,
    max_mel_tokens=2000,
)
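For intuition on `top_p`: nucleus sampling keeps the smallest set of tokens whose probabilities sum to at least `p`, then samples only from that set. The sketch below is illustrative only, not the library's implementation:

```python
def top_p_filter(probs, p=0.8):
    """Return indices of the smallest probability-ranked prefix summing to >= p."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for idx, prob in ranked:
        kept.append(idx)
        total += prob
        if total >= p:
            break
    return kept
```

With `probs=[0.5, 0.3, 0.15, 0.05]` and `p=0.8`, only the first two tokens survive; lowering `temperature` similarly concentrates mass on the top tokens before this cutoff applies.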

Logging

By default, index-tts-inference shows only warnings. To see detailed logs, set the environment variable:

export INDEXTTS_LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING (default)
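You can also set the variable from Python. The caveat (an assumption on my part about typical env-var-driven logging) is that it must be set before the package is imported, since that is usually when logging is configured:

```python
import os

# Set the level before `from indextts import IndexTTS2`,
# so the package sees it when it configures logging.
os.environ["INDEXTTS_LOG_LEVEL"] = "INFO"
```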

PyTorch with CUDA

This package lists torch and torchaudio as dependencies without pinning a specific CUDA version. Install the CUDA variant you need before installing this package:

# Example: PyTorch with CUDA 12.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Then install the package
pip install indextts2-inference

Or with uv:

# pyproject.toml of your project
[tool.uv.sources]
torch = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]
torchaudio = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

License

See LICENSE and DISCLAIMER.
