🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

This is the official code implementation of 🍵 Matcha-TTS.

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:

  • Is probabilistic
  • Has a compact memory footprint
  • Sounds highly natural
  • Is very fast to synthesise from

Check out our demo page. Read our arXiv preprint for more details.
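
As a rough intuition for what ODE-based synthesis involves, the sketch below integrates a hand-written toy vector field from t=0 to t=1 with a fixed-step Euler solver. Everything in it (the function names, the toy field, the start and target values) is an illustrative assumption and not code from this repository; in Matcha-TTS the vector field is a trained neural network, which this toy does not model. It only shows that when the flow moves samples along straight lines, a handful of Euler steps can already land on the end point.

```python
def toy_vector_field(x, t, target):
    """Hypothetical velocity field: moves x in a straight line so that it
    reaches `target` at t = 1. A stand-in for a learned network."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_solve(x0, target, n_steps):
    """Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = list(x0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h  # t stays strictly below 1, so the field never divides by zero
        v = toy_vector_field(x, t, target)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


start = [0.0, 2.0]    # plays the role of a noise sample
target = [1.0, -1.0]  # plays the role of the data point the flow leads to

for n in (1, 2, 10):
    print(n, euler_solve(start, target, n))
```

With a learned, imperfect vector field the paths are not exactly straight, so the number of steps trades speed against accuracy.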

Pretrained models are downloaded automatically by the CLI or the Gradio interface.

Try 🍵 Matcha-TTS on HuggingFace 🤗 spaces!


Installation

  1. Create an environment (suggested but optional)
conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts
  2. Install Matcha TTS using pip or from source
pip install matcha-tts

or, from source:

pip install git+https://github.com/shivammehta25/Matcha-TTS.git
  3. Run CLI / gradio app / jupyter notebook
# This will download the required models
matcha-tts --text "<INPUT TEXT>"

or

matcha-tts-app

or open synthesis.ipynb in Jupyter Notebook

CLI Arguments

  • To synthesise from given text, run:
matcha-tts --text "<INPUT TEXT>"
  • To synthesise from a file, run:
matcha-tts --file <PATH TO FILE>
  • To batch synthesise from a file, run:
matcha-tts --file <PATH TO FILE> --batched

Additional arguments

  • Speaking rate
matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0
  • Sampling temperature
matcha-tts --text "<INPUT TEXT>" --temperature 0.667
  • Euler ODE solver steps
matcha-tts --text "<INPUT TEXT>" --steps 10
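
When calling the CLI from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags listed above; the helper name `build_matcha_command` and its validation rules are assumptions made for illustration, not part of the package. In particular, restricting `--batched` to file input is a guess based on the only documented usage.

```python
import shlex


def build_matcha_command(text=None, file=None, batched=False,
                         speaking_rate=None, temperature=None, steps=None):
    """Hypothetical helper: build an argv list for the matcha-tts CLI."""
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of `text` or `file`")
    if batched and file is None:
        # Assumption: the README only shows --batched together with --file.
        raise ValueError("`batched` is only documented together with `file`")

    cmd = ["matcha-tts"]
    if text is not None:
        cmd += ["--text", text]
    else:
        cmd += ["--file", str(file)]
    if batched:
        cmd.append("--batched")
    if speaking_rate is not None:
        cmd += ["--speaking_rate", str(speaking_rate)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    return cmd


cmd = build_matcha_command(text="Hello world", speaking_rate=1.0,
                           temperature=0.667, steps=10)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Once matcha-tts is installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems in the input text.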

Citation information

If you find this work useful, please cite our paper:

@article{mehta2023matcha,
  title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  journal={arXiv preprint arXiv:2309.03199},
  year={2023}
}

Train with your own dataset

Let's assume we are training with LJSpeech

  1. Download the dataset from here, extract it to data/LJSpeech-1.1, and prepare the filelists so that they point to the extracted data, as described in item 5 of the setup instructions in the Tacotron2 repo.

  2. Clone and enter this repository

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
  3. Install the package from source
pip install -e .
  4. Go to configs/data/ljspeech.yaml and change
train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt

to the paths of your train and validation filelists.

  5. Generate normalisation statistics with the yaml file of dataset configuration
matcha-data-stats -i ljspeech.yaml
# Output:
#{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}

Update these values in configs/data/ljspeech.yaml under the data_statistics key.

data_statistics:  # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101

  6. Run the training script
make train-ljspeech

or

python matcha/train.py experiment=ljspeech
  • for a minimum memory run
python matcha/train.py experiment=ljspeech_min_memory
  • for multi-gpu training, run
python matcha/train.py experiment=ljspeech 'trainer.devices=[0,1]'
  7. Synthesise from the custom trained model
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>
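
The filelist preparation in the first training step amounts to making every audio path in the filelists point at the extracted dataset. The sketch below shows one way to do that in Python. It assumes each filelist line has the form `audio_path|transcript` and that the paths start with a placeholder prefix (`DUMMY` in this example); both the helper names and that placeholder are assumptions, so check your own filelists before relying on them.

```python
def repoint_filelist_line(line, old_prefix, new_prefix):
    """Swap the directory prefix of the audio path in one `path|text` line."""
    audio_path, sep, transcript = line.rstrip("\n").partition("|")
    if not sep:
        raise ValueError(f"expected 'audio_path|transcript', got: {line!r}")
    if audio_path.startswith(old_prefix):
        audio_path = new_prefix + audio_path[len(old_prefix):]
    return f"{audio_path}|{transcript}"


def repoint_filelist(lines, old_prefix, new_prefix):
    """Apply the prefix swap to every non-empty line of a filelist."""
    return [repoint_filelist_line(line, old_prefix, new_prefix)
            for line in lines if line.strip()]


# Illustrative input line; real filelists contain one such line per utterance.
example = ["DUMMY/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned\n"]
fixed = repoint_filelist(example, "DUMMY", "data/LJSpeech-1.1/wavs")
print(fixed[0])
```

To rewrite a real filelist, read its lines with `open(path, encoding="utf-8")`, pass them through `repoint_filelist`, and write the result back with one entry per line.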

Acknowledgements

Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.

Other source code I would like to acknowledge:

  • Coqui-TTS: For helping me figure out how to make Cython binaries pip installable, and for the encouragement
  • Hugging Face Diffusers: For their awesome diffusers library and its components
  • Grad-TTS: For source code of MAS
  • torchdyn: Useful for trying other ODE solvers during research and development
  • labml.ai: For RoPE implementation
