X-Voice multilingual TTS toolkit

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

Paper Demo Python HF Space HF Dataset HF Benchmark ModelScope X-LANCE SII Geely CLSP

X-Voice is a flow-matching-based multilingual zero-shot voice cloning system that enables one speaker to speak 30 languages.

News

Installation

Create a separate environment if needed

# Create a conda env with python_version>=3.11
conda create -n x-voice python=3.11
conda activate x-voice

# Install FFmpeg if you haven't yet
conda install ffmpeg

Install PyTorch with matched device

NVIDIA GPU

# Install pytorch with your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

AMD GPU

# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2

Intel GPU

# Install pytorch with your XPU version, e.g.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

Apple Silicon

# Install the stable pytorch, e.g.
pip install torch torchaudio
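
After installing, it can help to confirm that PyTorch actually sees the device you installed it for. The following is a minimal sketch, not part of X-Voice: the helper name detect_torch_device is hypothetical, and it only uses standard PyTorch availability checks. ROCm builds report through torch.cuda, so an AMD GPU shows up as "cuda" here.

```python
def detect_torch_device() -> str:
    """Return the first available accelerator backend name, else "cpu".

    Hypothetical helper for illustration; X-Voice does not ship this function.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so there is nothing to probe.
        return "cpu"

    # CUDA builds and ROCm builds both report through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # Intel GPU support lives under torch.xpu in recent PyTorch releases;
    # guard with getattr because older builds lack the attribute.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # Apple Silicon uses the Metal Performance Shaders (MPS) backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

Calling print(detect_torch_device()) inside the activated environment should name the backend that matches the wheel you installed; a result of "cpu" on a GPU machine usually points to a mismatched wheel or driver.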

Install X-Voice

git clone https://github.com/sunnyxrxrx/X-Voice.git
cd X-Voice
pip install -e .

Check your eSpeak NG installation:

espeak-ng --version

If the command is not found, run src/x_voice/prepare_ipa.sh first.
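
The same check can be done from Python, for example at the top of a setup script. This is a sketch under assumptions: the function name espeak_version is hypothetical, and it only relies on the espeak-ng executable and its --version flag shown above.

```python
import shutil
import subprocess
from typing import Optional


def espeak_version(executable: str = "espeak-ng") -> Optional[str]:
    """Return the version banner of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of the X-Voice package.
    """
    path = shutil.which(executable)
    if path is None:
        # Same situation as "command not found" in the shell.
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    # An empty banner is treated the same as a missing installation.
    return result.stdout.strip() or None
```

If espeak_version() returns None, run src/x_voice/prepare_ipa.sh as described above and try again.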

Inference

1. Gradio App

x-voice_infer-gradio --host 0.0.0.0 --port 7860

2. CLI Inference

# X-Voice Stage1
python -m x_voice.infer.infer_cli_stage1 -c src/x_voice/infer/examples/basic/basic_stage1.toml

# X-Voice Stage2
python -m x_voice.infer.infer_cli_stage2 -c src/x_voice/infer/examples/basic/basic_stage2.toml
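
To launch either command from a Python script, one option is a thin subprocess wrapper. The sketch below is an illustration only: build_command and run_stage are hypothetical names, and the module and config paths are copied from the two commands above. It passes nothing but the -c flag shown there and makes no assumption about what the TOML files contain.

```python
import subprocess
import sys
from typing import List, Optional

# Module names and example configs copied from the CLI commands above.
STAGES = {
    1: ("x_voice.infer.infer_cli_stage1",
        "src/x_voice/infer/examples/basic/basic_stage1.toml"),
    2: ("x_voice.infer.infer_cli_stage2",
        "src/x_voice/infer/examples/basic/basic_stage2.toml"),
}


def build_command(stage: int, config: Optional[str] = None) -> List[str]:
    """Build the argv list for one stage, using the example config by default."""
    module, default_config = STAGES[stage]
    # sys.executable keeps the call inside the currently active environment.
    return [sys.executable, "-m", module, "-c", config or default_config]


def run_stage(stage: int, config: Optional[str] = None) -> int:
    """Run one stage and return its exit code (hypothetical convenience helper)."""
    return subprocess.run(build_command(stage, config), check=False).returncode
```

For example, run_stage(1) runs the Stage1 example from the repository root, and run_stage(2, "my_config.toml") runs Stage2 with your own config file.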

Training

TTS Model Training

Refer to training guidance for best practice.

Speaking Rate Predictor Training

Refer to speaking rate predictor guidance for the multilingual speaking rate predictor used in X-Voice.

Evaluation

Refer to evaluation guidance for benchmark and metric scripts.

Repo Structure

X-Voice/
├── ckpts/                  # checkpoints
├── data/                   # datasets and processed data
├── src/
│   ├── rate_pred/          # speaking rate predictor
│   ├── third_party/
│   │   └── BigVGAN/        # BigVGAN submodule
│   └── x_voice/            # main X-Voice package
└── pyproject.toml          # package definition and dependencies
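
A quick way to compare a local checkout against this layout is to list which of the paths are absent. This is a minimal sketch with a hypothetical helper name, missing_paths; the paths are taken from the tree above. It is an assumption that ckpts/ and data/ may legitimately be missing until checkpoints are downloaded or data is prepared.

```python
from pathlib import Path
from typing import List

# Paths taken from the tree above, relative to the repository root.
EXPECTED_PATHS = (
    "ckpts",
    "data",
    "src/rate_pred",
    "src/third_party/BigVGAN",
    "src/x_voice",
    "pyproject.toml",
)


def missing_paths(repo_root: str) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Running missing_paths(".") from the repository root returns an empty list when every entry is present.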

Development

Use pre-commit to ensure code quality:

pip install pre-commit
pre-commit install
pre-commit run --all-files

Acknowledgements

License

Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC because they were trained on Emilia, an in-the-wild dataset. Sorry for any inconvenience this may cause.
