End-to-end text to speech using IPA and onnx models

Project description

Larynx

End-to-end text to speech system using gruut and onnx.

Larynx's goals are:

- "Good enough" synthesis to avoid using a cloud service
- Faster than realtime performance on a Raspberry Pi 4
- Broad language support
- Voices trained purely from public datasets

Samples

Listen to voice samples from all of the pre-trained models.

Installation

$ pip install larynx

For Raspberry Pi (ARM), you will first need to manually install phonetisaurus.

Language Download

Larynx uses gruut to transform text into phonemes. You must install the appropriate gruut language before using Larynx. U.S. English is included with gruut, but for other languages:

$ python3 -m gruut <LANGUAGE> download

Voice/Vocoder Download

Voices and vocoders are available to download from the release page. They can be extracted anywhere; the directory simply needs to be referenced on the command line (e.g., --glow-tts /path/to/voice).

Web Server

You can run a local web server with:

$ python3 -m larynx.server --voices-dir /path/to/voices

Visit http://localhost:5002 to view the site and try out voices. See http://localhost:5002/openapi for documentation on the available HTTP endpoints.

See --help for more options.
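To confirm from a script that the server started by the command above is up, a small standard-library check is enough. This is only a sketch: `base_url` and `is_reachable` are hypothetical helper names, and it assumes the server is listening on port 5002 as described above.

```python
import urllib.error
import urllib.request


def base_url(host: str = "localhost", port: int = 5002) -> str:
    """Build the root URL of a locally running Larynx web server."""
    return f"http://{host}:{port}"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, etc.
        return False


if __name__ == "__main__":
    root = base_url()
    for url in (root, root + "/openapi"):
        print(url, "reachable:", is_reachable(url))
```

Only the two URLs mentioned above are probed; the individual HTTP endpoints are documented on the /openapi page itself.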

Command-Line Example

The command below synthesizes multiple sentences and saves them to a directory. The --csv command-line flag indicates that each sentence is of the form id|text where id will be the name of the WAV file.

$ cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
  larynx \
    --debug \
    --csv \
    --glow-tts local/en-us/harvard-glow_tts \
    --hifi-gan local/hifi_gan/universal_large \
    --output-dir wavs \
    --language en-us \
    --denoiser-strength 0.001

You can use the --interactive flag instead of --output-dir to type sentences and have the audio played immediately using sox.
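The batch command above can also be assembled from a script. The sketch below formats sentences as id|text lines and builds the same flags shown in the shell example (minus --debug). The helper names `to_csv_lines` and `larynx_command` are hypothetical, and the rule that ids must not contain "|" is an assumption about the id|text format rather than documented behavior.

```python
import shlex
from typing import Dict, List


def to_csv_lines(sentences: Dict[str, str]) -> str:
    """Format {id: text} pairs as the id|text lines expected with --csv."""
    lines = []
    for utt_id, text in sentences.items():
        # Assumption: an id containing "|" or a newline would be ambiguous.
        if "|" in utt_id or "\n" in utt_id or "\n" in text:
            raise ValueError(f"cannot encode {utt_id!r} as one id|text line")
        lines.append(f"{utt_id}|{text}")
    return "\n".join(lines) + "\n"


def larynx_command(glow_tts: str, hifi_gan: str, output_dir: str,
                   language: str = "en-us",
                   denoiser_strength: float = 0.001) -> List[str]:
    """Assemble the flags used in the shell example as an argument list."""
    return [
        "larynx",
        "--csv",
        "--glow-tts", glow_tts,
        "--hifi-gan", hifi_gan,
        "--output-dir", output_dir,
        "--language", language,
        "--denoiser-strength", str(denoiser_strength),
    ]


if __name__ == "__main__":
    sentences = {
        "s01": "The birch canoe slid on the smooth planks.",
        "s02": "Glue the sheet to the dark blue background.",
    }
    cmd = larynx_command("local/en-us/harvard-glow_tts",
                         "local/hifi_gan/universal_large", "wavs")
    print(shlex.join(cmd))
    print(to_csv_lines(sentences), end="")
    # To actually synthesize (requires larynx and the voice files):
    #   import subprocess
    #   subprocess.run(cmd, input=to_csv_lines(sentences), text=True, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.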

GlowTTS Settings

The GlowTTS voices support two additional parameters:

--noise-scale - determines the speaker volatility during synthesis (0-1, default is 0.333)
--length-scale - makes the voice speak slower (> 1) or faster (< 1)
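For intuition about these two flags, the toy sketch below mimics how Glow-TTS-style models typically apply them at inference time: predicted phoneme durations are multiplied by the length scale, and the random noise added to the predicted latent means is multiplied by the noise scale. It is not Larynx's code; the function names and all numbers are invented for illustration.

```python
import math
import random
from typing import List


def scale_durations(frames_per_phoneme: List[float],
                    length_scale: float) -> List[int]:
    """Multiply each duration by length_scale and round up to whole frames."""
    return [math.ceil(d * length_scale) for d in frames_per_phoneme]


def sample_latent(means: List[float], std_devs: List[float],
                  noise_scale: float, rng: random.Random) -> List[float]:
    """Sample around the means; noise_scale=0 returns the means unchanged."""
    return [m + s * noise_scale * rng.gauss(0.0, 1.0)
            for m, s in zip(means, std_devs)]


if __name__ == "__main__":
    durations = [2.0, 4.0, 3.0]  # made-up frame counts per phoneme
    for scale in (0.5, 1.0, 2.0):
        frames = scale_durations(durations, scale)
        print(f"length_scale={scale}: {sum(frames)} frames total")

    rng = random.Random(0)
    means, stds = [0.0, 1.0], [0.5, 0.5]
    for noise in (0.0, 0.333, 1.0):
        print(f"noise_scale={noise}:", sample_latent(means, stds, noise, rng))
```

More frames for the same phonemes means slower speech, and a larger noise scale moves the samples further from the means, which is heard as more variation between runs.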

Vocoder Settings

--denoiser-strength - runs the denoiser if > 0; a small value like 0.005 is recommended.
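How the denoiser works internally is not described here. One common approach for neural vocoders, used by the WaveGlow reference denoiser, is spectral subtraction: an estimate of the vocoder's constant bias spectrum is scaled by the strength and subtracted from each magnitude bin, with negative results clamped to zero. The sketch below shows only that arithmetic on plain lists; the function name and the example values are hypothetical, and Larynx's denoiser may differ.

```python
from typing import List


def denoise_magnitudes(magnitudes: List[float], bias: List[float],
                       strength: float) -> List[float]:
    """Subtract strength * bias from each magnitude bin, clamping at zero."""
    if len(magnitudes) != len(bias):
        raise ValueError("magnitudes and bias must have the same length")
    if strength <= 0:
        # Mirrors the flag's behavior: the denoiser only runs if > 0.
        return list(magnitudes)
    return [max(m - strength * b, 0.0) for m, b in zip(magnitudes, bias)]


if __name__ == "__main__":
    spectrum = [1.0, 0.25, 0.5]  # made-up magnitude bins
    bias = [2.0, 2.0, 0.0]       # made-up bias estimate
    print(denoise_magnitudes(spectrum, bias, 0.0))
    print(denoise_magnitudes(spectrum, bias, 0.25))
```

In this picture a small strength removes a faint constant hiss while leaving most of the signal untouched, whereas a large strength starts zeroing out quiet bins.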

Text to Speech Models

GlowTTS (35 voices)
- English (en-us, 20 voices)
  - blizzard_fls (F, accent, Blizzard)
  - cmu_aew (M, Arctic)
  - cmu_ahw (M, Arctic)
  - cmu_aup (M, accent, Arctic)
  - cmu_bdl (M, Arctic)
  - cmu_clb (F, Arctic)
  - cmu_eey (F, Arctic)
  - cmu_fem (M, Arctic)
  - cmu_jmk (M, Arctic)
  - cmu_ksp (M, accent, Arctic)
  - cmu_ljm (F, Arctic)
  - cmu_lnh (F, Arctic)
  - cmu_rms (M, Arctic)
  - cmu_rxr (M, Arctic)
  - cmu_slp (F, accent, Arctic)
  - cmu_slt (F, Arctic)
  - ek (F, accent, M-AILabs)
  - harvard (F, accent, CC/Attr/NC)
  - kathleen (F, CC0)
  - ljspeech (F, Public Domain)
- German (de-de, 1 voice)
  - thorsten (M, CC0)
- French (fr-fr, 3 voices)
  - gilles_le_blanc (M, M-AILabs)
  - siwis (F, CC/Attr)
  - tom (M, ODbL)
- Spanish (es-es, 2 voices)
  - carlfm (M, Public Domain)
  - karen_savage (F, M-AILabs)
- Dutch (nl, 3 voices)
  - bart_de_leeuw (M, Apache2)
  - flemishguy (M, CC0)
  - rdh (M, CC0)
- Italian (it-it, 2 voices)
  - lisa (F, M-AILabs)
  - riccardo_fasol (M, Apache2)
- Swedish (sv-se, 1 voice)
  - talesyntese (M, CC0)
- Russian (ru-ru, 3 voices)
  - hajdurova (F, M-AILabs)
  - nikolaev (M, M-AILabs)
  - minaev (M, M-AILabs)
Tacotron2
- Coming soon

Vocoders

Hi-Fi GAN
- Universal large
- VCTK medium
- VCTK small
WaveGlow
- 256 channel trained on LJ Speech

Project details

Release history

- 1.1.0 (Nov 11, 2021)
- 1.0.3 (Oct 22, 2021)
- 1.0.2 (Oct 21, 2021)
- 1.0.1 (Oct 20, 2021)
- 1.0.0 (Oct 20, 2021)
- 0.5.0 (Aug 23, 2021)
- 0.3.1 (Mar 31, 2021), this version
- 0.3.0 (Mar 28, 2021)

Download files

Source Distribution

larynx-0.3.1.tar.gz (31.9 kB)

Uploaded Mar 31, 2021 Source

File details

Details for the file larynx-0.3.1.tar.gz.

File metadata

Download URL: larynx-0.3.1.tar.gz
Upload date: Mar 31, 2021
Size: 31.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for larynx-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`2482811505a7d8296293a011357c7a87baa01f4e07e90fe4493bf778132ba954`
MD5	`0a75573145d033bee1175e38b4fbc1ae`
BLAKE2b-256	`f84da2e3a5feffe04730def2fcc6712b3b182a4c89bd8bd08fa410e058a6130f`
