
Deep learning for Text-to-Speech by Coqui.

Project description


📣 Clone your voice with a single click on 🐸Coqui.ai

📣 🐸Coqui Studio is launching soon! Join our waiting list!


🐸TTS is a library for advanced Text-to-Speech generation. It's built on the latest research and designed to achieve the best trade-off among ease of training, speed, and quality. 🐸TTS comes with pretrained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects.


📰 Subscribe to 🐸Coqui.ai Newsletter

📢 English Voice Samples and SoundCloud playlist

📄 Text-to-Speech paper collection

💬 Where to ask questions

Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.

Type Platforms
🚨 Bug Reports GitHub Issue Tracker
🎁 Feature Requests & Ideas GitHub Issue Tracker
👩‍💻 Usage Questions GitHub Discussions
🗯 General Discussion GitHub Discussions or Discord

🔗 Links and Resources

Type Links
💼 Documentation ReadTheDocs
💾 Installation TTS/README.md
👩‍💻 Contributing CONTRIBUTING.md
📌 Road Map Main Development Plans
🚀 Released Models TTS Releases and Experimental Models

🥇 TTS Performance

In the performance comparison chart, "TTS*" and "Judy*" (underlined) are 🐸TTS models.

Features

  • High-performance Deep Learning models for Text2Speech tasks.
    • Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
    • Speaker Encoder to compute speaker embeddings efficiently.
    • Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN)
  • Fast and efficient model training.
  • Detailed training logs on the terminal and Tensorboard.
  • Support for Multi-speaker TTS.
  • Efficient, flexible, lightweight but feature complete Trainer API.
  • Released and ready-to-use models.
  • Tools to curate Text-to-Speech datasets under dataset_analysis.
  • Utilities to use and test your models.
  • Modular (but not too much) code base enabling easy implementation of new ideas.

Implemented Models

Spectrogram models

End-to-End Models

Attention Methods

  • Guided Attention: paper
  • Forward Backward Decoding: paper
  • Graves Attention: paper
  • Double Decoder Consistency: blog
  • Dynamic Convolutional Attention: paper
  • Alignment Network: paper

Speaker Encoder
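
The speaker encoder maps an utterance to a fixed-size embedding vector, and embeddings are typically compared with cosine similarity: same-speaker pairs score close to 1.0, different speakers much lower. A minimal sketch of that comparison (the vectors below are toy data, not real encoder output, and real embeddings are higher-dimensional, e.g. 256-d):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for encoder output.
same_speaker_a = [0.9, 0.1, 0.2, 0.4]
same_speaker_b = [0.85, 0.15, 0.25, 0.38]
other_speaker = [-0.3, 0.8, -0.5, 0.1]

print(cosine_similarity(same_speaker_a, same_speaker_b))  # close to 1.0
print(cosine_similarity(same_speaker_a, other_speaker))   # much lower
```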

Vocoders

You can also help us implement more models.

Install TTS

🐸TTS is tested on Ubuntu 18.04 with Python >= 3.7, < 3.11.

If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.

pip install TTS

If you plan to code or train models, clone 🐸TTS and install it locally.

git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras

If you are on Ubuntu (Debian), you can also run the following commands to install.

$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install

If you are on Windows, 👑@GuyPaddock wrote installation instructions here.

Docker Image

You can also try TTS without installing it by using the Docker image. Simply run the following commands:

docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models  # To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits  # To start a server

You can then access the TTS server at http://localhost:5002. More details about the Docker images (such as GPU support) can be found in the documentation.
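
Once the server is up, you can request synthesis over HTTP. A hedged sketch that only builds the request URL with the standard library; the `/api/tts` endpoint and its `text` query parameter are assumptions about the demo server's API, so verify them against your server version:

```python
from urllib.parse import urlencode

def build_tts_url(text, host="localhost", port=5002):
    """Build a synthesis request URL for the demo server.

    The /api/tts endpoint and the `text` query parameter are assumptions
    about the demo server's HTTP API; check your server version.
    """
    query = urlencode({"text": text})  # percent-encodes the text safely
    return f"http://{host}:{port}/api/tts?{query}"

url = build_tts_url("Hello world!")
print(url)  # http://localhost:5002/api/tts?text=Hello+world%21

# With the container from above running, the response body is WAV audio:
# from urllib.request import urlopen
# open("output.wav", "wb").write(urlopen(url).read())
```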

Synthesizing speech by 🐸TTS

🐍 Python API

from TTS.api import TTS

# Running a multi-speaker and multi-lingual model

# List available 🐸TTS models and choose the first one
model_name = TTS.list_models()[0]
# Init TTS
tts = TTS(model_name)
# Run TTS
# ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
# Text to speech with a numpy output
wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

# Running a single speaker model

# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")
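
`tts.tts()` above returns raw audio samples rather than a file. A sketch of writing such output to a mono 16-bit WAV with the standard library; the 22050 Hz sample rate here is an assumption (the real rate is model-dependent), and the sine tone stands in for actual model output:

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=22050):
    """Write float samples in [-1, 1] to a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit samples
        f.setframerate(sample_rate)
        for s in samples:
            s = max(-1.0, min(1.0, s))  # clip to the valid range
            f.writeframes(struct.pack("<h", int(s * 32767)))

# Stand-in for model output: a 440 Hz sine tone, 0.1 s long.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 22050) for t in range(2205)]
write_wav("tone.wav", tone)
```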

Command line tts

Single Speaker Models

  • List provided models:

    $ tts --list_models
    
  • Get model info (for both tts_models and vocoder_models):

    • Query by type/name: The --model_info_by_name option uses the model name as listed by --list_models.

      $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
      

      For example:

      $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
      
      $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
      
    • Query by type/idx: The --model_info_by_idx option uses the corresponding index from --list_models.

      $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
      

      For example:

      $ tts --model_info_by_idx tts_models/3
      
  • Run TTS with default models:

    $ tts --text "Text for TTS" --out_path output/path/speech.wav
    
  • Run a TTS model with its default vocoder model:

    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
    

    For example:

    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
    
  • Run with specific TTS and vocoder models from the list:

    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
    

    For example:

    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
    
  • Run your own TTS model (Using Griffin-Lim Vocoder):

    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
    
  • Run your own TTS and Vocoder models:

    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav \
        --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
    

Multi-speaker Models

  • List the available speakers and choose a <speaker_id> from among them:

    $ tts --model_name "<model_type>/<language>/<dataset>/<model_name>" --list_speaker_idxs
    
  • Run the multi-speaker TTS model with the target speaker ID:

    $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<model_type>/<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
    
  • Run your own multi-speaker TTS model:

    $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
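
The CLI invocations above can also be scripted from Python. A sketch that only assembles the argument list (the flags are copied from the examples above; the helper name `build_tts_cmd` is ours, not part of 🐸TTS):

```python
import subprocess  # only needed for the actual run, shown commented out below

def build_tts_cmd(text, out_path, model_name=None, vocoder_name=None, speaker_idx=None):
    """Assemble a `tts` command line from the flags shown above."""
    cmd = ["tts", "--text", text, "--out_path", out_path]
    if model_name:
        cmd += ["--model_name", model_name]
    if vocoder_name:
        cmd += ["--vocoder_name", vocoder_name]
    if speaker_idx is not None:
        cmd += ["--speaker_idx", str(speaker_idx)]
    return cmd

cmd = build_tts_cmd("Text for TTS", "speech.wav",
                    model_name="tts_models/en/ljspeech/glow-tts")
print(" ".join(cmd))
# With 🐸TTS installed: subprocess.run(cmd, check=True)
```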
    

Directory Structure

|- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/           (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
      |- train*.py                  (train your target model.)
      |- ...
    |- tts/             (text to speech models)
        |- layers/          (model layer definitions)
        |- models/          (model definitions)
        |- utils/           (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)
