X-Voice multilingual TTS toolkit

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

X-Voice is a flow-matching-based multilingual zero-shot voice cloning system that enables one speaker to speak 30 languages.

News

2026/04/30: The X-Voice codebase, model, demo, dataset, and benchmark are released.

Installation

Create a separate environment if needed

# Create a conda env with python_version>=3.11
conda create -n x-voice python=3.11
conda activate x-voice

# Install FFmpeg if you haven't yet
conda install ffmpeg
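
The environment set up above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of X-Voice: the helper name `check_environment` is hypothetical, and it only checks the two things the steps above mention (a Python version of at least 3.11 and an `ffmpeg` executable on PATH).

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version named in the conda step above


def check_environment(version=None, ffmpeg_path=None):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only, not part of the X-Voice package.
    Pass `version` / `ffmpeg_path` explicitly to override auto-detection.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    if ffmpeg_path is None:
        ffmpeg_path = shutil.which("ffmpeg")  # None when ffmpeg is not on PATH
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, need >= "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if not ffmpeg_path:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    found = check_environment()
    for problem in found:
        print("WARNING:", problem)
    if not found:
        print("Environment looks fine.")
```

Run it inside the activated `x-voice` environment; any warning points at the step to repeat.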

Install PyTorch for your device

NVIDIA GPU

# Install PyTorch built for your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

AMD GPU

# Install PyTorch built for your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2

Intel GPU

# Install PyTorch built for your XPU version, e.g.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

Apple Silicon

# Install the stable PyTorch release, e.g.
pip install torch torchaudio
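
After installing one of the PyTorch builds above, it can help to confirm which backend PyTorch actually sees. The sketch below is an illustration only: `pick_device` is a hypothetical helper, and how X-Voice itself selects a device is not described here. It relies on standard PyTorch calls (`torch.cuda.is_available`, `torch.xpu.is_available`, `torch.backends.mps.is_available`) and guards the ones that may be missing in older builds.

```python
import importlib.util


def pick_device():
    """Return the first available backend name, or None if PyTorch is missing.

    Hypothetical helper for illustration; not part of the X-Voice package.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    if torch.cuda.is_available():  # ROCm builds also report through torch.cuda
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # absent in older PyTorch builds
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)  # Apple Silicon backend
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    if device is None:
        print("PyTorch is not installed.")
    else:
        print("Selected device:", device)
```

If this prints `cpu` on a machine with a GPU, the installed wheel most likely does not match the driver or toolkit version.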

Install X-Voice

git clone https://github.com/sunnyxrxrx/X-Voice.git
cd X-Voice
pip install -e .

Check your eSpeak NG installation:

espeak-ng --version

If not found, run src/x_voice/prepare_ipa.sh first.
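
The same check can be done from Python, for example at the top of a setup script. This is a minimal sketch using only the standard library; the function name `espeak_ng_version` is hypothetical and the script is not shipped with X-Voice.

```python
import shutil
import subprocess


def espeak_ng_version():
    """Return the first line printed by `espeak-ng --version`.

    Returns None when the executable is not on PATH or prints nothing.
    Hypothetical helper for illustration; not part of the X-Voice package.
    """
    exe = shutil.which("espeak-ng")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = espeak_ng_version()
    if version is None:
        print("espeak-ng not found; run src/x_voice/prepare_ipa.sh first.")
    else:
        print(version)
```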

Inference

To get the desired performance, take a moment to read the detailed guidance first.

1. Gradio App

x-voice_infer-gradio --host 0.0.0.0 --port 7860

2. CLI Inference

# X-Voice Stage1
python -m x_voice.infer.infer_cli_stage1 -c src/x_voice/infer/examples/basic/basic_stage1.toml

# X-Voice Stage2
python -m x_voice.infer.infer_cli_stage2 -c src/x_voice/infer/examples/basic/basic_stage2.toml
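
For scripted or batch use, the two commands above can be assembled in Python. The sketch below is an assumption-laden illustration: `build_stage_command` is a hypothetical helper, it uses only the module names and the `-c` option shown above, and it treats each stage as an independent command because the relationship between the two stages is not stated here.

```python
import subprocess
import sys

# Module names copied from the CLI commands above.
STAGE_MODULES = {
    1: "x_voice.infer.infer_cli_stage1",
    2: "x_voice.infer.infer_cli_stage2",
}


def build_stage_command(stage, config_path):
    """Build the argument list for one inference stage.

    Hypothetical helper for illustration; not part of the X-Voice package.
    """
    if stage not in STAGE_MODULES:
        raise ValueError(f"unknown stage {stage!r}, expected 1 or 2")
    return [sys.executable, "-m", STAGE_MODULES[stage], "-c", str(config_path)]


def run_stage(stage, config_path):
    """Run one stage and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_stage_command(stage, config_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_stage(...) to execute it.
    cmd = build_stage_command(
        1, "src/x_voice/infer/examples/basic/basic_stage1.toml"
    )
    print(" ".join(cmd))
```

Using `sys.executable` keeps the subprocess in the same environment as the calling script, which matters when X-Voice is installed in a conda env.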

Training

TTS Model Training

Refer to the training guidance for best practices.

Speaking Rate Predictor Training

Refer to speaking rate predictor guidance for the multilingual speaking rate predictor used in X-Voice.

Evaluation

Refer to evaluation guidance for benchmark and metric scripts.

Repo Structure

X-Voice/
├── ckpts/                  # checkpoints
├── data/                   # datasets and processed data
├── src/
│   ├── rate_pred/          # speaking rate predictor
│   ├── third_party/
│   │   └── BigVGAN/        # BigVGAN submodule
│   └── x_voice/            # main X-Voice package
└── pyproject.toml          # package definition and dependencies
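
A checkout can be compared against this layout with a short script. This is a sketch under assumptions: the helper names are hypothetical, `ckpts/` and `data/` may legitimately be absent in a fresh clone, and the hint about `git submodule update` assumes BigVGAN is a regular git submodule, as the tree suggests.

```python
from pathlib import Path

# Entries taken from the tree above.
EXPECTED_ENTRIES = [
    "ckpts",
    "data",
    "src/rate_pred",
    "src/third_party/BigVGAN",
    "src/x_voice",
    "pyproject.toml",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


def is_empty_dir(path):
    """Return True if `path` is a directory that contains no entries."""
    path = Path(path)
    return path.is_dir() and not any(path.iterdir())


if __name__ == "__main__":
    repo = Path(".")
    for entry in missing_entries(repo):
        print("missing:", entry)
    if is_empty_dir(repo / "src/third_party/BigVGAN"):
        # An empty submodule directory usually means it was never fetched.
        print("BigVGAN is empty; try: git submodule update --init --recursive")
```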

Development

Use pre-commit to ensure code quality:

pip install pre-commit
pre-commit install
pre-commit run --all-files

Acknowledgements

- F5-TTS, brilliant work and the foundation of this codebase
- Cross-Lingual F5-TTS 2, for its supervised fine-tuning strategy with synthetic audio prompts
- Cross-Lingual F5-TTS, for its speaking rate predictor
- NLLB, for translation in the Gradio demo
- torchdiffeq as the ODE solver; Vocos and BigVGAN as vocoders
- FunASR, faster-whisper, UniSpeech, and SpeechMOS as evaluation tools
- MAVL, for Japanese syllable counting

License

Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC because of the training data, Emilia, which is an in-the-wild dataset. Sorry for any inconvenience this may cause.


Release history

- 0.1.1 (this version): Apr 29, 2026
- 0.1.0: Apr 29, 2026

Download files

Source Distribution
- x_voice-0.1.1.tar.gz (29.9 MB), uploaded Apr 29, 2026, Source

Built Distribution
- x_voice-0.1.1-py3-none-any.whl (30.2 MB), uploaded Apr 29, 2026, Python 3

File details

Details for the file x_voice-0.1.1.tar.gz.

File metadata

Download URL: x_voice-0.1.1.tar.gz
Upload date: Apr 29, 2026
Size: 29.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for x_voice-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6c5525f17852125ba6b3171ab35224773af5f501a489abaccc8cd2babbada385`
MD5	`026c2f25bf6517e7c4a3f8a761d9695d`
BLAKE2b-256	`8983fa0f1ef035df45e5fba9b82a4a3c2c0da5572b86ce41cd0df4734ed0889e`


File details

Details for the file x_voice-0.1.1-py3-none-any.whl.

File metadata

Download URL: x_voice-0.1.1-py3-none-any.whl
Upload date: Apr 29, 2026
Size: 30.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for x_voice-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0d8009469d39c6f63ec0476f21fbd3e48bb015ee76905beec841759695a1a386`
MD5	`f0a8e4934e941dbb554b84429c1565fc`
BLAKE2b-256	`0b71789dde5a4e849c230f260a43639f9256433320c73079e917fe43719f2abc`
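
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch: the helper names `file_sha256` and `verify` are hypothetical, and the script assumes the files sit in the current directory under the names shown above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "x_voice-0.1.1.tar.gz":
        "6c5525f17852125ba6b3171ab35224773af5f501a489abaccc8cd2babbada385",
    "x_voice-0.1.1-py3-none-any.whl":
        "0d8009469d39c6f63ec0476f21fbd3e48bb015ee76905beec841759695a1a386",
}


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 matches `expected_hex` (case-insensitive)."""
    return file_sha256(path) == expected_hex.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        if not Path(name).is_file():
            print(f"{name}: not found, skipped")
            continue
        print(f"{name}: {'OK' if verify(name, expected) else 'MISMATCH'}")
```

Reading in chunks keeps memory use flat, which is convenient for files of roughly 30 MB like these.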
