End-to-end text to speech using IPA and onnx models

Larynx

End-to-end text to speech system using gruut and onnx.

Larynx's goals are:

  • "Good enough" synthesis to avoid using a cloud service
  • Faster than realtime performance on a Raspberry Pi 4
  • Broad language support
  • Voices trained purely from public datasets

Samples

Listen to voice samples from all of the pre-trained models.

Installation

$ pip install larynx

For Raspberry Pi (ARM), you will first need to manually install phonetisaurus.

Language Download

Larynx uses gruut to transform text into phonemes, so you must install the appropriate gruut language before using Larynx. U.S. English is included with gruut; for other languages, run:

$ python3 -m gruut <LANGUAGE> download

Voice/Vocoder Download

Voices and vocoders are available to download from the release page. They can be extracted anywhere; the directory simply needs to be referenced on the command line (e.g., --glow-tts /path/to/voice).
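
As a rough sketch of the extraction step, the Python snippet below unpacks a downloaded archive into a directory of your choice and returns the directory, which you can then reference with --glow-tts or --hifi-gan. It assumes the release asset is a gzip-compressed tarball; the extract_voice helper and the file names in the usage comment are hypothetical, not part of Larynx.

```python
import tarfile
from pathlib import Path


def extract_voice(archive_path, voices_dir):
    """Unpack a downloaded voice or vocoder archive into voices_dir.

    Assumption: the archive is a .tar.gz file. Returns the target directory.
    """
    target = Path(voices_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target


# Usage (hypothetical file name; use the asset you actually downloaded):
#   voices = extract_voice("harvard-glow_tts.tar.gz", "voices")
#   print(voices.resolve())
```

Only extract archives you trust, since a tarball can contain arbitrary paths.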

Web Server

You can run a local web server with:

$ python3 -m larynx.server --voices-dir /path/to/voices

Visit http://localhost:5002 to view the site and try out voices. See http://localhost:5002/openapi for documentation on the available HTTP endpoints.

See --help for more options.
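
To check from a script that the server is up, something like the following Python sketch may help. It only requests the /openapi page mentioned above and returns the raw response text; it makes no assumption about the individual endpoints. The helper names are hypothetical, and the default port of 5002 is taken from the URL above.

```python
from urllib.request import urlopen


def openapi_url(host="localhost", port=5002):
    """Build the URL of the server's OpenAPI page."""
    return f"http://{host}:{port}/openapi"


def fetch_openapi(host="localhost", port=5002, timeout=5.0):
    """Return the body of the OpenAPI page as text.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urlopen(openapi_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the server already running:
#   print(fetch_openapi()[:200])
```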

Command-Line Example

The command below synthesizes multiple sentences and saves them to a directory. The --csv flag indicates that each input line has the form id|text, where id becomes the name of the WAV file.

$ cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
  larynx \
    --debug \
    --csv \
    --glow-tts local/en-us/harvard-glow_tts \
    --hifi-gan local/hifi_gan/universal_large \
    --output-dir wavs \
    --language en-us \
    --denoiser-strength 0.001
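
The same batch can be driven from Python. The sketch below formats a list of sentences as id|text lines and pipes them to larynx on standard input, using only flags that appear in the example above. The helper names and the id numbering scheme are hypothetical, and the sketch assumes larynx is on your PATH.

```python
import subprocess


def to_csv_lines(sentences, prefix="s"):
    """Format sentences as `id|text` lines with ids s01, s02, ..."""
    return [f"{prefix}{i:02d}|{text}" for i, text in enumerate(sentences, start=1)]


def larynx_command(glow_tts_dir, hifi_gan_dir, output_dir, language="en-us"):
    """Assemble the larynx invocation as an argument list."""
    return [
        "larynx",
        "--csv",
        "--glow-tts", glow_tts_dir,
        "--hifi-gan", hifi_gan_dir,
        "--output-dir", output_dir,
        "--language", language,
    ]


def synthesize(sentences, command):
    """Send the `id|text` lines to larynx via standard input."""
    stdin_text = "\n".join(to_csv_lines(sentences)) + "\n"
    subprocess.run(command, input=stdin_text, text=True, check=True)


# Usage (paths copied from the shell example):
#   cmd = larynx_command("local/en-us/harvard-glow_tts",
#                        "local/hifi_gan/universal_large", "wavs")
#   synthesize(["The birch canoe slid on the smooth planks."], cmd)
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.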

You can use the --interactive flag instead of --output-dir to type sentences and have the audio played immediately using sox.

GlowTTS Settings

The GlowTTS voices support two additional parameters:

  • --noise-scale - determines the speaker volatility during synthesis (0-1, default is 0.333)
  • --length-scale - makes the speaker talk slower (> 1) or faster (< 1)
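
One way to pick values for these two parameters is to synthesize the same text under several combinations and compare the results by ear. The sketch below only generates the extra command-line arguments for each combination, with one output directory per setting; the settings_grid helper and the directory naming are hypothetical.

```python
from itertools import product


def settings_grid(noise_scales, length_scales):
    """Yield (output_dir, extra_args) for every noise/length combination."""
    for noise, length in product(noise_scales, length_scales):
        # The documented range for --noise-scale is 0-1.
        if not 0.0 <= noise <= 1.0:
            raise ValueError(f"--noise-scale must be within 0-1, got {noise}")
        output_dir = f"wavs/noise{noise}_length{length}"
        extra_args = [
            "--noise-scale", str(noise),
            "--length-scale", str(length),
            "--output-dir", output_dir,
        ]
        yield output_dir, extra_args


# Usage: append extra_args to your usual larynx arguments for each run.
#   for output_dir, extra_args in settings_grid([0.333, 0.667], [0.8, 1.0]):
#       print(output_dir, extra_args)
```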

Vocoder Settings

  • --denoiser-strength - runs the denoiser if > 0; a small value like 0.005 is recommended.

Text to Speech Models

Vocoders

  • Hi-Fi GAN
    • Universal large
    • VCTK medium
    • VCTK small
  • WaveGlow
    • 256 channel trained on LJ Speech
