Deep learning for Text to Speech by Coqui.
🐸TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and designed to achieve the best trade-off among ease of training, speed, and quality. 🐸TTS comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.
💬 Where to ask questions
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.
|🚨 Bug Reports||GitHub Issue Tracker|
|🎁 Feature Requests & Ideas||GitHub Issue Tracker|
|👩‍💻 Usage Questions||GitHub Discussions|
|🗯 General Discussion||GitHub Discussions or Gitter Room|
🔗 Links and Resources
|📌 Road Map||Main Development Plans|
|🚀 Released Models||TTS Releases and Experimental Models|
🥇 TTS Performance
Underlined "TTS*" and "Judy*" are 🐸TTS models
- High-performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
- Fast and efficient model training.
- Detailed training logs on the terminal and Tensorboard.
- Support for Multi-speaker TTS.
- Efficient, flexible, lightweight but feature complete.
- Released and ready-to-use models.
- Tools to curate Text2Speech datasets under
- Utilities to use and test your models.
- Modular (but not too much) code base enabling easy implementation of new ideas.
- Tacotron: paper
- Tacotron2: paper
- Glow-TTS: paper
- Speedy-Speech: paper
- Align-TTS: paper
- FastPitch: paper
- FastSpeech: paper
- VITS: paper
- Guided Attention: paper
- Forward Backward Decoding: paper
- Graves Attention: paper
- Double Decoder Consistency: blog
- Dynamic Convolutional Attention: paper
- Alignment Network: paper
- MelGAN: paper
- MultiBandMelGAN: paper
- ParallelWaveGAN: paper
- GAN-TTS discriminators: paper
- WaveRNN: origin
- WaveGrad: paper
- HiFiGAN: paper
- UnivNet: paper
You can also help us implement more models.
🐸TTS is tested on Ubuntu 18.04 with Python >= 3.7, < 3.11.
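Before installing, it can help to confirm that the interpreter falls inside the tested range. The following is a minimal sketch using only the standard library; the function name `is_supported_python` is hypothetical and not part of 🐸TTS.

```python
import sys

# Bounds taken from the tested range stated above: >= 3.7 and < 3.11.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported_python(version=None):
    """Return True if the (major, minor) pair lies in the tested range.

    `version` is any sequence whose first two items are major and minor;
    it defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if not is_supported_python():
    print("Warning: this Python version is outside the tested range.")
```

Versions outside the range may still work; the check only reflects what the project states it tests against.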
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
pip install TTS
If you plan to code or train models, clone 🐸TTS and install it locally.
git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras
If you are on Ubuntu (Debian), you can also run the following commands for installation.
$ make system-deps  # Intended for Ubuntu (Debian). Let us know if you have a different OS.
$ make install
If you are on Windows, 👑@GuyPaddock wrote installation instructions here.
Single Speaker Models
List provided models:
$ tts --list_models
Run TTS with default models:
$ tts --text "Text for TTS"
Run a TTS model with its default vocoder model:
$ tts --text "Text for TTS" --model_name "<language>/<dataset>/<model_name>"
Run with specific TTS and vocoder models from the list:
$ tts --text "Text for TTS" --model_name "<language>/<dataset>/<model_name>" --vocoder_name "<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
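When the `tts` command is called from a script, assembling the arguments as a list avoids shell quoting problems. The sketch below only uses flags that appear in this section (`--text`, `--model_name`, `--vocoder_name`, `--out_path`); the helper name `build_tts_command` is hypothetical, and the code only builds the command without running it.

```python
import shlex


def build_tts_command(text, out_path, model_name=None, vocoder_name=None):
    """Assemble an argument list for the `tts` CLI.

    Optional flags are added only when a value is given, so omitting
    both names produces the "default models" invocation.
    """
    cmd = ["tts", "--text", text]
    if model_name is not None:
        cmd += ["--model_name", model_name]
    if vocoder_name is not None:
        cmd += ["--vocoder_name", vocoder_name]
    cmd += ["--out_path", out_path]
    return cmd


cmd = build_tts_command("Text for TTS", "output/path/speech.wav")
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once 🐸TTS is installed, the list can be passed to `subprocess.run(cmd, check=True)` to perform the synthesis.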
Run your own TTS model (Using Griffin-Lim Vocoder):
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
Run your own TTS and Vocoder models:
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
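The `--model_path` and `--config_path` arguments are easy to swap by accident. The following sketch flags the most common mix-up before the command is run. It assumes, as in the examples here, that the config is a `.json` file and the checkpoint is not; the helper name `check_model_paths` is hypothetical.

```python
from pathlib import Path


def check_model_paths(model_path, config_path):
    """Return a list of human-readable problems; empty if none are found.

    Only file extensions are inspected, so this works without the files
    being present on disk.
    """
    problems = []
    if Path(model_path).suffix == ".json":
        problems.append("--model_path points at a .json file; expected a checkpoint")
    if Path(config_path).suffix != ".json":
        problems.append("--config_path should point at a .json config file")
    return problems


for problem in check_model_paths("path/to/model.pth", "path/to/config.json"):
    print(problem)
```

The same check can be applied to the `--vocoder_path` and `--vocoder_config_path` pair.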
Multi-speaker Models
List the available speakers and choose a <speaker_id> from them:
$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
Run the multi-speaker TTS model with the target speaker ID:
$ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
Run your own multi-speaker TTS model:
$ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
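To render the same sentence with several speakers of a released multi-speaker model, one command per speaker can be generated. This is a minimal sketch that only builds the argument lists; the helper name `commands_for_speakers` and the `<speaker_id>.wav` output naming are assumptions made for illustration.

```python
from pathlib import Path


def commands_for_speakers(text, model_name, speaker_ids, out_dir):
    """Build one `tts` argument list per speaker ID.

    Each command writes to its own file inside `out_dir`, named after
    the speaker ID, so outputs do not overwrite each other.
    """
    out_dir = Path(out_dir)
    commands = []
    for speaker_id in speaker_ids:
        out_path = out_dir / f"{speaker_id}.wav"
        commands.append([
            "tts",
            "--text", text,
            "--out_path", str(out_path),
            "--model_name", model_name,
            "--speaker_idx", str(speaker_id),
        ])
    return commands
```

The speaker IDs themselves should come from the `--list_speaker_idxs` output of the chosen model; each resulting list can then be passed to `subprocess.run`.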
|- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/           (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
      |- train*.py                  (train your target model.)
      |- distribute.py              (train your TTS model using Multiple GPUs.)
      |- compute_statistics.py      (compute dataset statistics for normalization.)
      |- ...
    |- tts/             (text to speech models)
        |- layers/          (model layer definitions)
        |- models/          (model definitions)
        |- utils/           (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)