IndexTTS2 inference library: zero-shot text-to-speech with emotional control

Project description

indextts2-inference

Minimal pip package for IndexTTS2 inference. Wraps the official IndexTTS2 repo, stripped down to only what's needed for inference.

Install

pip install indextts2-inference

Optional extras

SageAttention (alternative attention backend):

pip install indextts2-inference[sage-attn]

Flash Attention v2 (acceleration engine with KV cache and CUDA graphs):

pip install indextts2-inference[flash-attn]

DeepSpeed:

pip install indextts2-inference[deepspeed]
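The extras above map to upstream packages (`sageattention`, `flash_attn`, `deepspeed`). A small sketch, assuming you want to check at runtime which optional backends are importable; the helper name `available_backends` is ours, not part of the library:

```python
import importlib.util

def available_backends():
    """Return which attention/acceleration backends are importable."""
    optional = {
        "sage": "sageattention",   # installed by the [sage-attn] extra
        "flash": "flash_attn",     # installed by the [flash-attn] extra
        "deepspeed": "deepspeed",  # installed by the [deepspeed] extra
    }
    found = ["sdpa"]  # PyTorch SDPA needs no extra install
    for name, module in optional.items():
        if importlib.util.find_spec(module) is not None:
            found.append(name)
    return found

print(available_backends())
```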

Usage

from indextts import IndexTTS2

# Auto-downloads model from HuggingFace
tts = IndexTTS2()

# Or use local/finetuned checkpoints
tts = IndexTTS2(model_dir="/path/to/checkpoints")

# Basic inference
tts.infer(spk_audio_prompt="voice.wav", text="Hello world", output_path="out.wav")

Attention backends

# Default PyTorch SDPA — auto-selects best kernel, no extra deps needed
tts = IndexTTS2()

# SageAttention — may help on Ampere/Hopper GPUs, requires sageattention package
tts = IndexTTS2(attn_backend="sage", use_fp16=True)

# Flash Attention v2 — acceleration engine with paged KV cache and CUDA graphs
tts = IndexTTS2(attn_backend="flash")

Language selection

By default, the language is auto-detected (Chinese or English). You can set it explicitly:

tts = IndexTTS2(language="es")
tts.infer(spk_audio_prompt="voice.wav", text="Hola, esto es una prueba.", output_path="out.wav")

Emotion control

There are three ways to control the emotion of the generated speech:

# 1. From a reference audio
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="Some text",
    output_path="out.wav",
    emo_audio_prompt="happy_reference.wav",
    emo_alpha=0.7,
)

# 2. With an explicit emotion vector
#    [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    emo_vector=[0.8, 0, 0, 0, 0, 0, 0, 0],
)

# 3. Auto-detect emotion from the text itself
tts.infer(
    spk_audio_prompt="speaker.wav",
    text="I am very happy!",
    output_path="out.wav",
    use_emo_text=True,
)
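For option 2, the vector layout documented above can be wrapped in a small helper so weights are assigned by name rather than by position. A sketch; `make_emo_vector` is our own convenience function, not part of the library:

```python
# Order matches the emo_vector layout documented above.
EMOTIONS = ["happy", "angry", "sad", "afraid",
            "disgusted", "melancholic", "surprised", "calm"]

def make_emo_vector(**weights):
    """Build an 8-dim emotion vector from named weights, e.g. happy=0.8."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    return [float(weights.get(name, 0.0)) for name in EMOTIONS]

print(make_emo_vector(happy=0.8))
# → [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The result can be passed straight to `tts.infer(..., emo_vector=make_emo_vector(happy=0.8))`.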

Streaming

chunks = []
for chunk in tts.infer(
    spk_audio_prompt="voice.wav",
    text="Long text to synthesize...",
    output_path="out.wav",
    stream_return=True,
):
    if chunk is not None and hasattr(chunk, "shape"):
        # Each chunk is a tensor; move it to CPU and collect it
        # for incremental playback or saving
        audio_np = chunk.squeeze().cpu().numpy()
        chunks.append(audio_np)
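Collected chunks can be written incrementally with the standard-library `wave` module. A minimal sketch using synthetic chunks in place of model output; the 24 kHz sample rate here is an assumption, so check the model's actual output rate before relying on it:

```python
import wave
import struct

def write_stream(chunks, path, sample_rate=24000):
    """Append float audio chunks (values in [-1, 1]) to a 16-bit mono WAV.

    sample_rate is an assumption; use the model's actual output rate.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)           # 16-bit PCM
        wf.setframerate(sample_rate)
        for chunk in chunks:
            # Clamp and convert floats to little-endian int16 samples
            ints = [max(-32768, min(32767, int(x * 32767))) for x in chunk]
            wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))

# Example with synthetic chunks in place of streamed model output:
write_stream([[0.0, 0.5, -0.5], [1.0, -1.0]], "stream_demo.wav")
```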

Generation parameters

You can tune sampling parameters via kwargs:

tts.infer(
    spk_audio_prompt="voice.wav",
    text="Hello",
    output_path="out.wav",
    temperature=0.6,
    top_k=20,
    top_p=0.8,
    max_mel_tokens=2000,
)
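To illustrate what `top_k` and `top_p` control, here is a pure-Python sketch of top-k followed by nucleus (top-p) filtering over raw logits. It is an illustration of the sampling idea only, not the library's actual sampler:

```python
import math

def filter_logits(logits, top_k=20, top_p=0.8):
    """Return the token indices that survive top-k then top-p filtering."""
    # Keep only the k highest-scoring tokens.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors (shift by max for numerical stability).
    m = max(logits[i] for i in order)
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for idx, p in zip(order, probs):
        kept.append(idx)
        cum += p
        if cum >= top_p:
            break
    return kept

print(filter_logits([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.8))
# → [0, 1]
```

Lower `temperature` sharpens the distribution before this filtering, so smaller `top_k`/`top_p` values make generation more deterministic.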

Logging

By default, indextts2-inference only shows warnings. To see detailed logs:

export INDEXTTS_LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING (default)
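The same variable can be set from Python, assuming it is set before the package is first imported (if the level is read lazily instead, setting it any time before first use should also work):

```python
import os

# Set the log level before importing the package.
os.environ["INDEXTTS_LOG_LEVEL"] = "DEBUG"

# from indextts import IndexTTS2  # import after setting the variable
print(os.environ["INDEXTTS_LOG_LEVEL"])
```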

PyTorch with CUDA

This package lists torch and torchaudio as dependencies without pinning a specific CUDA version. Install the CUDA variant you need before installing this package:

# Example: PyTorch with CUDA 12.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Then install the package
pip install indextts2-inference

Or with uv:

# pyproject.toml of your project
[tool.uv.sources]
torch = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]
torchaudio = [{ index = "pytorch-cuda", marker = "sys_platform == 'linux'" }]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

License

See LICENSE and DISCLAIMER.

Download files

Download the file for your platform.

Source Distribution

indextts2_inference-2.1.1.tar.gz (344.0 kB)

Uploaded Source

Built Distribution

indextts2_inference-2.1.1-py3-none-any.whl (167.9 kB)

Uploaded Python 3

File details

Details for the file indextts2_inference-2.1.1.tar.gz.

File metadata

  • Download URL: indextts2_inference-2.1.1.tar.gz
  • Upload date:
  • Size: 344.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for indextts2_inference-2.1.1.tar.gz
  • SHA256: 04344efdc3c926f951a9e5d40f8774a872f6859e3f14751526e8bb3ea65d0492
  • MD5: 5e31aa98c69b996fe3a1d07bd10cf43c
  • BLAKE2b-256: dceb8c2ab4a3261a998da59f876485004b0dcec9b4fbd5ce615cb02787ecf254

Provenance

The following attestation bundles were made for indextts2_inference-2.1.1.tar.gz:

Publisher: release.yml on nicokim/indextts2-inference

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file indextts2_inference-2.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for indextts2_inference-2.1.1-py3-none-any.whl
  • SHA256: c032da50330964933034a438d87bd1afe23db9387bea2895cca916e60838d36e
  • MD5: bf68c3dca5a9c263eda4e756346d63aa
  • BLAKE2b-256: a1c093a5aa49d56aadd409fbc5ccd3ca400b79f38d15cf266ce6652eb048699a

Provenance

The following attestation bundles were made for indextts2_inference-2.1.1-py3-none-any.whl:

Publisher: release.yml on nicokim/indextts2-inference

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
