🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter
This is the official code implementation of 🍵 Matcha-TTS.
We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method:
- Is probabilistic
- Has a compact memory footprint
- Sounds highly natural
- Is very fast to synthesise from
Check out our demo page and read our arXiv preprint for more details.
Pre-trained models will be automatically downloaded with the CLI or gradio interface.
Try 🍵 Matcha-TTS on HuggingFace 🤗 spaces!
Installation
- Create an environment (suggested but optional)
conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts
- Install Matcha TTS using pip or from source
pip install matcha-tts
or install from source
pip install git+https://github.com/shivammehta25/Matcha-TTS.git
- Run CLI / gradio app / jupyter notebook
# This will download the required models
matcha-tts --text "<INPUT TEXT>"
or
matcha-tts-app
or open synthesis.ipynb in Jupyter Notebook
CLI Arguments
- To synthesise from given text, run:
matcha-tts --text "<INPUT TEXT>"
- To synthesise from a file, run:
matcha-tts --file <PATH TO FILE>
- To batch synthesise from a file, run:
matcha-tts --file <PATH TO FILE> --batched
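When synthesis is driven from a script rather than typed by hand, the three invocations above can be assembled programmatically. The following is a minimal sketch; `build_matcha_command` is a hypothetical helper, not part of the package, and it uses only the `--text`, `--file`, and `--batched` flags shown above.

```python
import shlex
from typing import List, Optional


def build_matcha_command(
    text: Optional[str] = None,
    file: Optional[str] = None,
    batched: bool = False,
) -> List[str]:
    """Build an argument list for the matcha-tts CLI (hypothetical helper)."""
    # Exactly one input source is expected: literal text or a file path.
    if (text is None) == (file is None):
        raise ValueError("provide exactly one of `text` or `file`")

    command = ["matcha-tts"]
    if text is not None:
        command += ["--text", text]
    else:
        command += ["--file", file]

    # The README only shows --batched together with --file.
    if batched:
        command.append("--batched")
    return command


if __name__ == "__main__":
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(build_matcha_command(file="sentences.txt", batched=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the package is installed; passing a list avoids shell quoting problems with the input text.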
Additional arguments
- Speaking rate
matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0
- Sampling temperature
matcha-tts --text "<INPUT TEXT>" --temperature 0.667
- Euler ODE solver steps
matcha-tts --text "<INPUT TEXT>" --steps 10
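To give some intuition for the `--steps` and `--temperature` options, here is a toy sketch of fixed-step Euler integration in plain Python. It is not the Matcha-TTS implementation: `decay_field` is a made-up stand-in for the learned vector field, and scaling the initial Gaussian noise by the temperature is an assumption about how the option is applied. The point is only that more Euler steps reduce the discretisation error, at the cost of more evaluations of the vector field.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def sample_noise(dim: int, temperature: float, seed: int = 0) -> Vector:
    """Draw Gaussian noise scaled by the temperature (assumed usage)."""
    rng = random.Random(seed)
    return [temperature * rng.gauss(0.0, 1.0) for _ in range(dim)]


def euler_solve(
    field: Callable[[Sequence[float], float], Vector],
    x0: Sequence[float],
    steps: int,
) -> Vector:
    """Integrate dx/dt = field(x, t) from t=0 to t=1 with `steps` Euler steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        velocity = field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


def decay_field(x: Sequence[float], t: float) -> Vector:
    """Toy vector field dx/dt = -x, whose exact solution is x(1) = x(0) * exp(-1)."""
    return [-xi for xi in x]


if __name__ == "__main__":
    x0 = sample_noise(dim=4, temperature=0.667)
    exact = [xi * math.exp(-1.0) for xi in x0]
    for steps in (1, 10, 100):
        approx = euler_solve(decay_field, x0, steps)
        error = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"steps={steps:3d}  max error={error:.5f}")
```

In the real model each step is one forward pass of the decoder network, so the number of steps sets the trade-off between synthesis speed and how closely the ODE is followed.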
Train with your own dataset
Let's assume we are training with LJ Speech
- Download the dataset from here, extract it to data/LJSpeech-1.1, and prepare the file lists to point to the extracted data, as in item 5 of the setup in the NVIDIA Tacotron 2 repo.
- Clone and enter the Matcha-TTS repository
git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
- Install the package from source
pip install -e .
- Go to configs/data/ljspeech.yaml and change
train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
to the paths of your train and validation filelists.
- Generate normalisation statistics using the dataset configuration YAML file
matcha-data-stats -i ljspeech.yaml
# Output:
#{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}
Update these values in configs/data/ljspeech.yaml under the data_statistics key.
data_statistics: # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101
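To illustrate what the `mel_mean` and `mel_std` values represent, the sketch below computes a global mean and standard deviation over all mel-spectrogram values and then standardises a spectrogram with them. This is an assumption about how the statistics are defined and used, written in plain Python on nested lists; `mel_statistics` and `normalise` are hypothetical names, and `matcha-data-stats` remains the tool to use on real data.

```python
import math
from typing import Iterable, List, Tuple

Mel = List[List[float]]  # one spectrogram: a list of frames, each a list of bins


def mel_statistics(mels: Iterable[Mel]) -> Tuple[float, float]:
    """Global mean and (population) standard deviation over every mel value."""
    total = 0.0
    total_sq = 0.0
    count = 0
    for mel in mels:
        for frame in mel:
            for value in frame:
                total += value
                total_sq += value * value
                count += 1
    if count == 0:
        raise ValueError("no mel values provided")
    mean = total / count
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def normalise(mel: Mel, mean: float, std: float) -> Mel:
    """Standardise a spectrogram to roughly zero mean and unit variance."""
    return [[(value - mean) / std for value in frame] for frame in mel]


if __name__ == "__main__":
    toy_mels = [[[-6.0, -5.0], [-4.0, -7.0]], [[-5.5, -5.5]]]
    mean, std = mel_statistics(toy_mels)
    print({"mel_mean": mean, "mel_std": std})
    print(normalise(toy_mels[0], mean, std))
```

Because the statistics depend on the data, they need to be regenerated whenever the dataset or the mel-spectrogram settings change.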
- Run the training script
make train-ljspeech
or
python matcha/train.py experiment=ljspeech
- for a minimum memory run
python matcha/train.py experiment=ljspeech_min_memory
- for multi-gpu training, run
python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
- Synthesise from the custom trained model
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>
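For the first step above, the file lists have to point at the extracted audio. The sketch below assumes the Tacotron 2 convention of one `DUMMY/<name>.wav|<transcript>` entry per line, where `DUMMY` is a placeholder directory; check your own file lists before relying on it. `point_filelist_at_data` is a hypothetical helper that rewrites the placeholder to a real directory.

```python
def point_filelist_at_data(
    filelist_text: str,
    wav_dir: str,
    placeholder: str = "DUMMY",
) -> str:
    """Replace a leading placeholder directory in each `path|transcript` line."""
    wav_dir = wav_dir.rstrip("/")
    rewritten = []
    for line in filelist_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only at the first "|" so transcripts may contain the character.
        path, separator, transcript = line.partition("|")
        if path.startswith(placeholder + "/"):
            path = wav_dir + path[len(placeholder):]
        rewritten.append(path + separator + transcript)
    return "\n".join(rewritten) + "\n"


if __name__ == "__main__":
    example = "DUMMY/LJ001-0001.wav|Printing, in the only sense\n"
    print(point_filelist_at_data(example, "data/LJSpeech-1.1/wavs"), end="")
```

To apply this to a real file list, read the file's text, pass it through the function, and write the result back; entries that do not start with the placeholder are left unchanged.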
Citation information
If you use our code or otherwise find this work useful, please cite our paper:
@article{mehta2023matcha,
title={Matcha-TTS: A fast TTS architecture with conditional flow matching},
author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
journal={arXiv preprint arXiv:2309.03199},
year={2023}
}
Acknowledgements
Since this code uses Lightning-Hydra-Template, you have all the powers that come with it.
Other source code I would like to acknowledge:
- Coqui-TTS: For helping me figure out how to make cython binaries pip installable, and for their encouragement
- Hugging Face Diffusers: For their awesome diffusers library and its components
- Grad-TTS: For the monotonic alignment search source code
- torchdyn: Useful for trying other ODE solvers during research and development
- labml.ai: For the RoPE implementation