X-Voice multilingual TTS toolkit
X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning
X-Voice is a flow-matching-based multilingual zero-shot voice cloning system that enables one speaker to speak 30 languages.
Installation
Create a separate environment if needed
# Create a conda env with python_version>=3.11
conda create -n x-voice python=3.11
conda activate x-voice
# Install FFmpeg if you haven't yet
conda install ffmpeg
Install PyTorch for your device
NVIDIA GPU
# Install PyTorch for your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
AMD GPU
# Install PyTorch for your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
Intel GPU
# Install PyTorch for your XPU version, e.g.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu
Apple Silicon
# Install the stable PyTorch release, e.g.
pip install torch torchaudio
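After installing PyTorch, it can help to confirm which accelerator the installed build actually reports. The following is a minimal sketch and not part of X-Voice; `detect_device` is a hypothetical helper name. It assumes only the standard PyTorch availability checks (`torch.cuda.is_available()`, which ROCm builds also answer, `torch.xpu.is_available()` on builds that ship it, and `torch.backends.mps.is_available()`), and falls back to `"cpu"` when PyTorch cannot be imported.

```python
def detect_device() -> str:
    """Return the first accelerator PyTorch reports, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"

    # ROCm builds of PyTorch also report through the torch.cuda namespace.
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu exists only on some builds, so probe for it defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    # Apple Silicon is exposed through the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(detect_device())
```

If this prints `cpu` on a machine with a GPU, the installed wheel probably does not match the device, and reinstalling from the matching index URL is the usual remedy.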
Install X-Voice
git clone https://github.com/sunnyxrxrx/X-Voice.git
cd X-Voice
pip install -e .
Check your eSpeak NG installation:
espeak-ng --version
If not found, run src/x_voice/prepare_ipa.sh first.
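The same check can be done from Python, for example at the start of a script that depends on phonemization. This is a sketch under the assumption that the binary is named `espeak-ng` and is expected on `PATH`; `espeak_ng_version` is a hypothetical helper, not an X-Voice function.

```python
import shutil
import subprocess
from typing import Optional


def espeak_ng_version() -> Optional[str]:
    """Return the first line of `espeak-ng --version`, or None if the
    binary is missing, fails to run, or prints nothing."""
    exe = shutil.which("espeak-ng")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = espeak_ng_version()
    if version is None:
        print("espeak-ng not found; run src/x_voice/prepare_ipa.sh first")
    else:
        print(version)
```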
Inference
- To get the best results, take a moment to read the detailed inference guidance.
1. Gradio App
x-voice_infer-gradio --host 0.0.0.0 --port 7860
2. CLI Inference
# X-Voice Stage1
python -m x_voice.infer.infer_cli_stage1 -c src/x_voice/infer/examples/basic/basic_stage1.toml
# X-Voice Stage2
python -m x_voice.infer.infer_cli_stage2 -c src/x_voice/infer/examples/basic/basic_stage2.toml
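When calling the CLI from a script, the two commands above can be assembled programmatically. The sketch below is an illustration only: `build_infer_command` is a hypothetical helper, and it uses nothing beyond the module names and the `-c` flag shown above. It checks that the config file exists and, where the standard-library `tomllib` is available (Python 3.11+), that it parses as TOML. It does not inspect any config keys, since those are not described here.

```python
import sys
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older interpreters: skip the parse check
    tomllib = None


def build_infer_command(stage: int, config: str) -> list[str]:
    """Build the argv for the stage-specific CLI module shown above."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    # Fail early if the config is missing or is not valid TOML.
    with Path(config).open("rb") as fh:
        if tomllib is not None:
            tomllib.load(fh)
    module = f"x_voice.infer.infer_cli_stage{stage}"
    return [sys.executable, "-m", module, "-c", config]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for example with `build_infer_command(1, "src/x_voice/infer/examples/basic/basic_stage1.toml")`.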
Training
TTS Model Training
Refer to the training guidance for best practices.
Speaking Rate Predictor Training
Refer to the speaking rate predictor guidance for details on the multilingual speaking rate predictor used in X-Voice.
Evaluation
Refer to the evaluation guidance for benchmark and metric scripts.
Repo Structure
X-Voice/
├── ckpts/ # checkpoints
├── data/ # datasets and processed data
├── src/
│ ├── rate_pred/ # speaking rate predictor
│ ├── third_party/
│ │ └── BigVGAN/ # BigVGAN submodule
│ └── x_voice/ # main X-Voice package
└── pyproject.toml # package definition and dependencies
Development
Use pre-commit to ensure code quality:
pip install pre-commit
pre-commit install
pre-commit run --all-files
Acknowledgements
- F5-TTS, a brilliant work and the foundation of this codebase
- Cross-Lingual F5-TTS 2 for its supervised fine-tuning strategy with synthetic audio prompts
- Cross-Lingual F5-TTS for its speaking rate predictor
- NLLB for translation in the Gradio demo
- torchdiffeq as the ODE solver; Vocos and BigVGAN as vocoders
- FunASR, faster-whisper, UniSpeech, SpeechMOS for evaluation tools
- MAVL for Japanese syllable counting
License
Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
Download files
File details
Details for the file x_voice-0.1.1.tar.gz.
File metadata
- Download URL: x_voice-0.1.1.tar.gz
- Upload date:
- Size: 29.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c5525f17852125ba6b3171ab35224773af5f501a489abaccc8cd2babbada385 |
| MD5 | 026c2f25bf6517e7c4a3f8a761d9695d |
| BLAKE2b-256 | 8983fa0f1ef035df45e5fba9b82a4a3c2c0da5572b86ce41cd0df4734ed0889e |
File details
Details for the file x_voice-0.1.1-py3-none-any.whl.
File metadata
- Download URL: x_voice-0.1.1-py3-none-any.whl
- Upload date:
- Size: 30.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d8009469d39c6f63ec0476f21fbd3e48bb015ee76905beec841759695a1a386 |
| MD5 | f0a8e4934e941dbb554b84429c1565fc |
| BLAKE2b-256 | 0b71789dde5a4e849c230f260a43639f9256433320c73079e917fe43719f2abc |
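A downloaded file can be checked against the SHA256 digests listed above with the Python standard library. This is a minimal sketch; `sha256_of` and `matches` are hypothetical helper names, and the file path passed in is assumed to be wherever the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("x_voice-0.1.1.tar.gz", "6c5525f17852125ba6b3171ab35224773af5f501a489abaccc8cd2babbada385")` should return `True` for an intact copy of the source distribution.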