
X-Voice multilingual TTS toolkit

Project description

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning


X-Voice is a flow-matching-based multilingual zero-shot voice cloning system that enables one speaker to speak 30 languages.


Installation

Create a separate environment if needed

# Create a conda env with python_version>=3.11
conda create -n x-voice python=3.11
conda activate x-voice

# Install FFmpeg if you haven't yet
conda install ffmpeg
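Before installing anything else, a quick preflight check can confirm the two prerequisites above: Python 3.11 or newer and an ffmpeg executable on PATH. This is a minimal standard-library sketch. The helper names python_ok and ffmpeg_path are hypothetical and not part of X-Voice.

```python
import shutil
import sys
from typing import Optional

# The README asks for a conda env with Python >= 3.11.
MIN_PYTHON = (3, 11)


def python_ok(version_info=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the given (or the running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def ffmpeg_path() -> Optional[str]:
    """Return the path of the ffmpeg executable on PATH, or None if it is missing."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    print("Python >= 3.11:", python_ok())
    print("ffmpeg:", ffmpeg_path() or "not found (try: conda install ffmpeg)")
```

Run it inside the activated x-voice environment so that it checks the interpreter you will actually use.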

Install PyTorch for your device

NVIDIA GPU
# Install pytorch with your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
AMD GPU
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
Intel GPU
# Install pytorch with your XPU version, e.g.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu
Apple Silicon
# Install the stable pytorch, e.g.
pip install torch torchaudio
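After installing one of the builds above, you can confirm which accelerator PyTorch actually detects. The sketch below relies only on public PyTorch APIs (torch.cuda, torch.xpu, torch.backends.mps). Note that ROCm builds also report AMD GPUs through torch.cuda. The helper name pick_device is hypothetical, and X-Voice itself may choose its device differently.

```python
def pick_device() -> str:
    """Return a torch device string for the installed PyTorch build, or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed yet

    # NVIDIA (CUDA) and AMD (ROCm) builds both expose GPUs through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"
    # Intel GPUs use the XPU backend, which only exists in recent releases.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    # Apple Silicon GPUs are exposed through Metal Performance Shaders (MPS).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("device:", pick_device())
```

If this prints "cpu" on a machine with a GPU, the installed wheel most likely does not match your driver (CUDA/ROCm) or platform.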

Install X-Voice

git clone https://github.com/sunnyxrxrx/X-Voice.git
cd X-Voice
pip install -e .
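To confirm that the editable install worked, you can check that the x_voice module is importable and read the installed distribution version. This sketch uses only importlib. The distribution name x_voice is taken from the published file names below, and installed_version is a hypothetical helper.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "x_voice", module_name: str = "x_voice") -> Optional[str]:
    """Return the installed version, "unknown" if only the module is found, or None."""
    if importlib.util.find_spec(module_name) is None:
        return None  # the package is not importable from this environment
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata under that name


if __name__ == "__main__":
    print("x_voice:", installed_version() or "not installed")
```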

Check your eSpeak NG installation:

espeak-ng --version

If the command is not found, run src/x_voice/prepare_ipa.sh first.
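The same check can be scripted, for example in a setup script that tells the user to run prepare_ipa.sh. This is a standard-library sketch. espeak_ng_version is a hypothetical helper, and it assumes that espeak-ng --version prints its version information on stdout (only the first line is returned).

```python
import shutil
import subprocess
from typing import Optional


def espeak_ng_version() -> Optional[str]:
    """Return the first line of `espeak-ng --version`, or None if it is unavailable."""
    exe = shutil.which("espeak-ng")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = espeak_ng_version()
    if version is None:
        print("espeak-ng not found; run src/x_voice/prepare_ipa.sh first")
    else:
        print(version)
```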

Inference

1. Gradio App

x-voice_infer-gradio --host 0.0.0.0 --port 7860
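The Gradio app above binds to port 7860 on all interfaces. If another process (for example a second Gradio app) already holds that port, the launch will likely fail, so a quick check beforehand can save a round trip. This is a standard-library sketch. port_is_free is a hypothetical helper, and the launch line simply reuses the command shown above.

```python
import socket
import subprocess


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


if __name__ == "__main__":
    host, port = "0.0.0.0", 7860
    if not port_is_free(port, host):
        raise SystemExit(f"port {port} is busy; pass a different --port")
    subprocess.run(["x-voice_infer-gradio", "--host", host, "--port", str(port)], check=True)
```

Binding to 0.0.0.0 makes the app reachable from other machines on the network; use 127.0.0.1 instead if it should only be accessible locally.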

2. CLI Inference

# X-Voice Stage1
python -m x_voice.infer.infer_cli_stage1 -c src/x_voice/infer/examples/basic/basic_stage1.toml

# X-Voice Stage2
python -m x_voice.infer.infer_cli_stage2 -c src/x_voice/infer/examples/basic/basic_stage2.toml

Training

TTS Model Training

Refer to the training guidance for best practices.

Speaking Rate Predictor Training

Refer to the speaking rate predictor guidance for the multilingual speaking rate predictor used in X-Voice.

Evaluation

Refer to the evaluation guidance for benchmark and metric scripts.

Repo Structure

X-Voice/
├── ckpts/                  # checkpoints
├── data/                   # datasets and processed data
├── src/
│   ├── rate_pred/          # speaking rate predictor
│   ├── third_party/
│   │   └── BigVGAN/        # BigVGAN submodule
│   └── x_voice/            # main X-Voice package
└── pyproject.toml          # package definition and dependencies
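After cloning, you can check that your checkout matches this layout. A common pitfall is an empty src/third_party/BigVGAN directory: git leaves submodule directories empty when a repository is cloned without its submodules, and git submodule update --init --recursive fetches them. The sketch below only checks paths from the tree above. It skips ckpts/ and data/ on the assumption that they are populated later. missing_paths is a hypothetical helper.

```python
from pathlib import Path

# Paths taken from the repository tree above; ckpts/ and data/ are left out
# on the assumption that they are filled in later (downloads, preprocessing).
EXPECTED_PATHS = [
    "pyproject.toml",
    "src/x_voice",
    "src/rate_pred",
    "src/third_party/BigVGAN",
]


def missing_paths(repo_root) -> list:
    """List expected paths that are absent, or directories that exist but are empty."""
    root = Path(repo_root)
    problems = []
    for rel in EXPECTED_PATHS:
        path = root / rel
        if not path.exists():
            problems.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            problems.append(f"{rel} (empty)")
    return problems


if __name__ == "__main__":
    issues = missing_paths(".")
    print("layout OK" if not issues else "problems: " + ", ".join(issues))
```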

Development

Use pre-commit to ensure code quality:

pip install pre-commit
pre-commit install
pre-commit run --all-files


License

Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC because their training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.



Download files


Source Distribution

x_voice-0.1.0.tar.gz (29.9 MB)

Uploaded Source

Built Distribution


x_voice-0.1.0-py3-none-any.whl (30.2 MB)

Uploaded Python 3

File details

Details for the file x_voice-0.1.0.tar.gz.

File metadata

  • Download URL: x_voice-0.1.0.tar.gz
  • Upload date:
  • Size: 29.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for x_voice-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7e4545f3113c74bc8bbc5649101f340d7628ed0015afe29c3bfab7aa185b5446
MD5 624db92f19c1b2e1a76d3be1b44c0b97
BLAKE2b-256 e027a77da45ee934277024753093fa4a543ee612c17711e0116c2c60a1927014

See more details on using hashes here.

File details

Details for the file x_voice-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: x_voice-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for x_voice-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9351c65ee8e8c469ede767ccfef2ac8a205f22b9b35e213ebdfc3b7a70a2274f
MD5 dd31bddd7f7eadfeed085d11e8becbad
BLAKE2b-256 fb764ed06304d7c8fcaa15c5325a6b8be79b6d6808e3615c5122ce4196f7f4ad
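To check a downloaded file against the SHA256 digests published above, you can hash it locally with hashlib. This is a minimal sketch. The digests are copied from the two "File hashes" tables, and sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "x_voice-0.1.0.tar.gz": "7e4545f3113c74bc8bbc5649101f340d7628ed0015afe29c3bfab7aa185b5446",
    "x_voice-0.1.0-py3-none-any.whl": "9351c65ee8e8c469ede767ccfef2ac8a205f22b9b35e213ebdfc3b7a70a2274f",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published hash for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode: list x_voice==0.1.0 --hash=sha256:<digest> in a requirements file and install with --require-hashes. In that mode every dependency must also be pinned with a hash.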

