IndexTTS2 inference library: zero-shot text-to-speech with emotional control

Project description

indextts2-inference


Minimal pip package for IndexTTS2 inference. Wraps the official IndexTTS2 repo, stripped down to only what's needed for inference.

Install

pip install indextts2-inference

Optional extras

Spanish support (NeMo text processing):

pip install indextts2-inference[es]

SageAttention (alternative attention backend):

pip install indextts2-inference[sage-attn]

Flash Attention v2 (acceleration engine with KV cache and CUDA graphs):

pip install indextts2-inference[flash-attn]

DeepSpeed:

pip install indextts2-inference[deepspeed]

Usage

from indextts import IndexTTS2

# Auto-downloads model from HuggingFace
tts = IndexTTS2()

# Or use local/finetuned checkpoints
tts = IndexTTS2(model_dir="/path/to/checkpoints")

# Basic inference
tts.infer(spk_audio_prompt="voice.wav", text="Hello world", output_path="out.wav")

Attention backends

# Default PyTorch SDPA — auto-selects best kernel, no extra deps needed
tts = IndexTTS2()

# SageAttention — may help on Ampere/Hopper GPUs, requires sageattention package
tts = IndexTTS2(attn_backend="sage", use_fp16=True)

# Flash Attention v2 — acceleration engine with paged KV cache and CUDA graphs
tts = IndexTTS2(attn_backend="flash")
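If a machine may or may not have the optional backends installed, you can probe for them before constructing the model. This is a convenience sketch, not library behavior: the "flash" and "sage" values come from the examples above, the module names flash_attn and sageattention are the import names of those pip packages, and falling back by simply omitting attn_backend (so the SDPA default applies) is an assumption.

```python
import importlib.util

def pick_attn_backend():
    """Prefer Flash Attention, then SageAttention; None means use the default SDPA path."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash"
    if importlib.util.find_spec("sageattention") is not None:
        return "sage"
    return None

backend = pick_attn_backend()
# kwargs = {"attn_backend": backend} if backend else {}
# tts = IndexTTS2(**kwargs)
```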

Language selection

By default, the language is auto-detected between Chinese and English. For Spanish, set it explicitly:

tts = IndexTTS2(language="es")
tts.infer(spk_audio_prompt="voice.wav", text="Hola, esto es una prueba.", output_path="out.wav")

Emotion control

There are three ways to control the emotion of the generated speech:

# 1. From a reference audio
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="Some text",
    output_path="out.wav",
    emo_audio_prompt="happy_reference.wav",
    emo_alpha=0.7,
)

# 2. With an explicit emotion vector
#    [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    emo_vector=[0.8, 0, 0, 0, 0, 0, 0, 0],
)

# 3. Auto-detect emotion from the text itself
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    use_emo_text=True,
)
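The eight-slot vector is easy to get wrong by position. A small helper that maps emotion names to slots keeps call sites readable; this is a convenience sketch, not part of the library API, with the slot order taken from the comment in example 2 above.

```python
# Slot order as documented: [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
EMOTIONS = ["happy", "angry", "sad", "afraid", "disgusted", "melancholic", "surprised", "calm"]

def emo_vector(**weights):
    """Build the 8-slot emotion vector from named weights; unnamed slots are zero."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    return [float(weights.get(name, 0.0)) for name in EMOTIONS]

vec = emo_vector(happy=0.8)  # → [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The result can be passed directly as emo_vector= in tts.infer().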

Streaming

Pass stream_return=True to iterate over audio chunks as they are generated:

chunks = []
for chunk in tts.infer(
    spk_audio_prompt="voice.wav",
    text="Long text to synthesize...",
    output_path="out.wav",
    stream_return=True,
):
    if chunk is not None and hasattr(chunk, "shape"):
        chunks.append(chunk.squeeze().cpu().numpy())
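If you collect the per-chunk numpy arrays in a list, the pieces can be joined into one waveform with NumPy. This is a sketch assuming each chunk squeezes to a 1-D float array, as produced by .squeeze().cpu().numpy() above.

```python
import numpy as np

def join_chunks(chunks):
    """Concatenate streamed 1-D audio chunks along the time axis."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(chunks)

# Dummy chunks stand in for streamed audio here.
audio = join_chunks([np.zeros(240, dtype=np.float32), np.zeros(160, dtype=np.float32)])
```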

Generation parameters

You can tune sampling parameters via kwargs:

tts.infer(
    spk_audio_prompt="voice.wav",
    text="Hello",
    output_path="out.wav",
    temperature=0.6,
    top_k=20,
    top_p=0.8,
    max_mel_tokens=2000,
)
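These are the standard autoregressive sampling knobs: temperature flattens or sharpens the token distribution, top_k keeps only the k most likely tokens, and top_p keeps the smallest set of tokens whose cumulative probability reaches p. A plain-Python illustration of the filtering step (illustrative only, not the library's implementation):

```python
def top_k_top_p_filter(probs, k, p):
    """Keep the top-k tokens, then the smallest prefix whose cumulative mass
    reaches p; renormalize what survives."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    kept, cum = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}

# Keeps 'a' and 'b' (cumulative 0.8 >= 0.75) and renormalizes their weights.
filtered = top_k_top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, k=3, p=0.75)
```

Lower temperature and tighter top_k/top_p trade diversity for stability; max_mel_tokens caps the output length.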

Logging

By default, indextts2-inference only shows warnings. To see detailed logs:

export INDEXTTS_LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING (default)
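The same level can be set from Python, provided it happens before the library is imported (an assumption: the env-var mechanism suggests the level is read at import time):

```python
import os

# Must run before `from indextts import IndexTTS2`
os.environ["INDEXTTS_LOG_LEVEL"] = "DEBUG"
```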

PyTorch with CUDA

This package lists torch and torchaudio as dependencies without pinning a specific CUDA version. Install the CUDA variant you need before installing this package:

# Example: PyTorch with CUDA 12.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Then install the package
pip install indextts2-inference

Or with uv:

# pyproject.toml of your project
[tool.uv.sources]
torch = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]
torchaudio = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

License

See LICENSE and DISCLAIMER.

Download files

Download the file for your platform.

Source Distribution

indextts2_inference-2.0.2.tar.gz (347.1 kB)

Uploaded Source

Built Distribution


indextts2_inference-2.0.2-py3-none-any.whl (171.1 kB)

Uploaded Python 3

File details

Details for the file indextts2_inference-2.0.2.tar.gz.

File metadata

  • Download URL: indextts2_inference-2.0.2.tar.gz
  • Size: 347.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for indextts2_inference-2.0.2.tar.gz:

  • SHA256: 707fae172a6964272a86920d2e6fe2f666ac56720d75f714a917ea784446c304
  • MD5: f9875ae88b2d63b871d441ea2b7e8c1b
  • BLAKE2b-256: dbeaa8ae21f0de31170b33e385c0c7b6103b8b3be535a591f1347fa15f6cf0e7


Provenance

The following attestation bundles were made for indextts2_inference-2.0.2.tar.gz:

Publisher: release.yml on nicokim/indextts2-inference

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file indextts2_inference-2.0.2-py3-none-any.whl.

File hashes

Hashes for indextts2_inference-2.0.2-py3-none-any.whl:

  • SHA256: 457551a66d4638707de58b09ab2f38f959855d52005134b5977abd374709a658
  • MD5: 885e3d85e7e42f8067d0b0cd3d214df8
  • BLAKE2b-256: 5a8455548532a0f54d77610c23fc6e3c23d14d1d93574b7942bf464fb7ba41f3


Provenance

The following attestation bundles were made for indextts2_inference-2.0.2-py3-none-any.whl:

Publisher: release.yml on nicokim/indextts2-inference

