EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

EmotiVoice is a powerful and modern open-source text-to-speech engine. It speaks both English and Chinese and offers over 2000 different voices (refer to the List of Voices for details). Its most prominent feature is emotional synthesis, which lets you create speech with a wide range of emotions, including happy, excited, sad, angry and others.

An easy-to-use web interface is provided. There is also a scripting interface for batch generation of results.

Quickstart

EmotiVoice Docker image

The easiest way to try EmotiVoice is to run the Docker image. You need a machine with an NVIDIA GPU. If you have not done so already, set up the NVIDIA Container Toolkit by following the instructions for Linux or Windows WSL2. Then run EmotiVoice with:

docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest

The Docker image was updated on November 21, 2023. If you have an older version, please update it by running the following commands:

docker pull syq163/emoti-voice:latest
docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest

Now open your browser and navigate to http://localhost:8501 to start using EmotiVoice's powerful TTS capabilities.
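
The container may need some time after `docker run` before the web interface responds. As a minimal sketch (not part of EmotiVoice), the following polls the URL above until it answers. The helper name `wait_for_ui` and its timeout parameters are hypothetical, and it assumes the page returns HTTP 200 once it is ready.

```python
import time
import urllib.request


def wait_for_ui(url="http://localhost:8501", timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Hypothetical helper: assumes the demo page returns 200 once it is up.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError are OSError subclasses: the server is not
            # reachable yet (connection refused, reset, timed out, ...).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_ui()` right after starting the container returns `True` once the page is reachable and `False` if it does not come up within the timeout.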

Full installation

conda create -n EmotiVoice python=3.8 -y
conda activate EmotiVoice
pip install torch torchaudio
pip install numpy numba scipy transformers==4.26.1 soundfile yacs g2p_en jieba pypinyin
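
After installing, it can help to confirm that every package is importable in the active conda environment. The sketch below is an illustration only: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the pip package names listed above.

```python
import importlib.util

# Import names assumed to match the pip packages installed above.
REQUIRED_MODULES = [
    "torch", "torchaudio", "numpy", "numba", "scipy", "transformers",
    "soundfile", "yacs", "g2p_en", "jieba", "pypinyin",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not import them or verify versions such as transformers==4.26.1.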

Prepare model files

If you encounter any issues, we recommend referring to the wiki page How to download the pretrained model files.

git lfs install
git lfs clone https://huggingface.co/WangZeJun/simbert-base-chinese WangZeJun/simbert-base-chinese

Alternatively, you can run:

mkdir -p WangZeJun/simbert-base-chinese
wget https://huggingface.co/WangZeJun/simbert-base-chinese/resolve/main/config.json -P WangZeJun/simbert-base-chinese
wget https://huggingface.co/WangZeJun/simbert-base-chinese/resolve/main/pytorch_model.bin -P WangZeJun/simbert-base-chinese
wget https://huggingface.co/WangZeJun/simbert-base-chinese/resolve/main/vocab.txt -P WangZeJun/simbert-base-chinese
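
A partial download is easy to miss. In particular, a clone made without Git LFS content leaves a small pointer file in place of pytorch_model.bin. The following is a minimal sketch of a sanity check, using the directory and file names from the commands above; the helper name `incomplete_simbert_files` is hypothetical.

```python
from pathlib import Path

# Directory and file names taken from the wget commands above.
SIMBERT_DIR = Path("WangZeJun/simbert-base-chinese")
SIMBERT_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")

# A Git LFS pointer file (left behind when LFS content was not fetched)
# is a small text file that starts with this prefix.
LFS_POINTER_PREFIX = b"version https://git-lfs"


def incomplete_simbert_files(model_dir=SIMBERT_DIR):
    """Return names of files that are missing, empty, or still LFS pointers."""
    problems = []
    for name in SIMBERT_FILES:
        path = Path(model_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                problems.append(name)
    return problems


if __name__ == "__main__":
    print(incomplete_simbert_files() or "all simbert files look complete")
```

An empty list means all three files are present and non-empty. The check does not validate file contents or checksums.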

Inference

  1. Download the pretrained models, then run:
mkdir -p outputs/style_encoder/ckpt
mkdir -p outputs/prompt_tts_open_source_joint/ckpt
  2. Place g_* and do_* under outputs/prompt_tts_open_source_joint/ckpt, and put checkpoint_* in outputs/style_encoder/ckpt.
  3. The inference text format is <speaker>|<style_prompt/emotion_prompt/content>|<phoneme>|<content>.
  • Inference text example: 8051|Happy|<sos/eos> [IH0] [M] [AA1] [T] engsp4 [V] [OY1] [S] engsp4 [AH0] engsp1 [M] [AH1] [L] [T] [IY0] engsp4 [V] [OY1] [S] engsp1 [AE1] [N] [D] engsp1 [P] [R] [AA1] [M] [P] [T] engsp4 [K] [AH0] [N] [T] [R] [OW1] [L] [D] engsp1 [T] [IY1] engsp4 [T] [IY1] engsp4 [EH1] [S] engsp1 [EH1] [N] [JH] [AH0] [N] . <sos/eos>|Emoti-Voice - a Multi-Voice and Prompt-Controlled T-T-S Engine.
  4. You can get phonemes by running python frontend_en.py data/my_text.txt > data/my_text_for_tts.txt.

  5. Then run:

TEXT=data/inference/text
python inference_am_vocoder_joint.py \
--logdir prompt_tts_open_source_joint \
--config_folder config/joint \
--checkpoint g_00140000 \
--test_file $TEXT

The synthesized speech is under outputs/prompt_tts_open_source_joint/test_audio.

  6. Or, if you just want to use the interactive TTS demo page, run:
pip install streamlit
streamlit run demo_page.py
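
The four-field inference text format described above is simple enough to build and edit with a small script, for example to reuse the same phonemes with a different emotion prompt. This is a minimal sketch, not code from the EmotiVoice repository: the names `InferenceLine`, `parse_inference_line`, `format_inference_line`, and `with_prompt` are hypothetical, and it assumes that the first three fields never contain the `|` separator.

```python
from typing import NamedTuple


class InferenceLine(NamedTuple):
    speaker: str   # speaker ID, e.g. "8051"
    prompt: str    # style/emotion prompt, e.g. "Happy"
    phonemes: str  # phoneme sequence wrapped in <sos/eos> markers
    content: str   # the original text


def parse_inference_line(line):
    """Split one '<speaker>|<prompt>|<phoneme>|<content>' line into fields."""
    # maxsplit=3 keeps any '|' inside the trailing content field intact.
    fields = line.rstrip("\n").split("|", 3)
    if len(fields) != 4:
        raise ValueError("expected 4 '|'-separated fields, got %d" % len(fields))
    return InferenceLine(*fields)


def format_inference_line(speaker, prompt, phonemes, content):
    """Join the four fields back into one inference text line."""
    for value in (speaker, prompt, phonemes):
        if "|" in value or "\n" in value:
            raise ValueError("field must not contain '|' or a newline: %r" % value)
    if "\n" in content:
        raise ValueError("content must fit on a single line")
    return "|".join((speaker, prompt, phonemes, content))


def with_prompt(line, prompt):
    """Return `line` with only its style/emotion prompt replaced."""
    return format_inference_line(*parse_inference_line(line)._replace(prompt=prompt))
```

For instance, `with_prompt(line, "Sad")` keeps the speaker, phonemes, and content of an existing line and swaps only the prompt, which makes it easy to generate several emotional variants of one sentence for batch inference.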

Wiki page

You can find more information on our wiki page.

Training

Please see the Example Recipe.

Roadmap & Future work

  • Our future plans can be found in the ROADMAP file.
  • The current implementation focuses on emotion/style control by prompts. It uses only pitch, speed, energy, and emotion as style factors, and does not use gender. However, it is not complicated to change it to style/timbre control.
  • Suggestions are welcome. You can file issues or contact @ydopensource on Twitter.

WeChat group

Scan the QR code below to join the WeChat group.

Credits

License

EmotiVoice is provided under the Apache-2.0 License - see the LICENSE file for details.

The interactive page is provided under the terms of the User Agreement file.

