Generate TTS audio samples for training wake word systems

Piper Sample Generator

Generate spoken audio samples using Piper for training a wake word system like openWakeWord or microWakeWord.

It supports standard Piper voices as well as a special generator model that can mix speaker embeddings (English only).

Install

Create a virtual environment and install the requirements:

git clone https://github.com/rhasspy/piper-sample-generator.git
cd piper-sample-generator/

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -e .

Piper Voices

Download one or more Piper voices (both the .onnx and .onnx.json files for each voice). Audio samples are available.

As an example, we'll download the U.S. English "lessac" voice in medium quality:

mkdir -p voices
wget -O voices/en_US-lessac-medium.onnx 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx?download=true'
wget -O voices/en_US-lessac-medium.onnx.json 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json?download=true'
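The two download URLs above follow a visible pattern built from the voice name. The sketch below reproduces that pattern for the "lessac" example; `voice_urls` is a hypothetical helper, and the assumption that other voices use the same directory layout is not confirmed by this README.

```python
BASE_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def voice_urls(voice: str) -> dict[str, str]:
    """Map each file of a voice to its download URL.

    Assumes the name has the form <locale>-<speaker>-<quality>,
    as in "en_US-lessac-medium" (layout inferred from the example above).
    """
    locale, speaker, quality = voice.split("-")
    language = locale.split("_")[0]  # "en_US" -> "en"
    folder = f"{BASE_URL}/{language}/{locale}/{speaker}/{quality}"
    return {
        f"{voice}{ext}": f"{folder}/{voice}{ext}?download=true"
        for ext in (".onnx", ".onnx.json")  # both files are required
    }


for filename, url in voice_urls("en_US-lessac-medium").items():
    print(f"voices/{filename} <- {url}")
```

Each printed pair corresponds to one `wget -O <file> '<url>'` call.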

Generate a small set of samples with the CLI:

python3 -m piper_sample_generator 'okay piper.' --model voices/en_US-lessac-medium.onnx --max-samples 10 --output-dir okay_piper/

Check the okay_piper/ directory for 10 WAV files (named 0.wav to 9.wav).
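If you prefer to check the output programmatically, something like the following may help. It is a minimal sketch using only the Python standard library; `check_samples` is a hypothetical helper, not part of piper_sample_generator, and it assumes the samples are plain PCM WAV files named 0.wav, 1.wav, and so on.

```python
import wave
from pathlib import Path


def check_samples(output_dir: str, expected: int):
    """Return (found, missing) for the files 0.wav .. <expected - 1>.wav.

    found holds (file name, sample rate in Hz, duration in seconds);
    missing holds the names of files that do not exist.
    """
    found, missing = [], []
    for index in range(expected):
        path = Path(output_dir) / f"{index}.wav"
        if not path.is_file():
            missing.append(path.name)
            continue
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
            seconds = wav_file.getnframes() / rate
        found.append((path.name, rate, seconds))
    return found, missing


found, missing = check_samples("okay_piper", expected=10)
for name, rate, seconds in found:
    print(f"{name}: {rate} Hz, {seconds:.2f} s")
if missing:
    print("Missing:", ", ".join(missing))
```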

You can add multiple --model <voice> arguments to cycle between different voices when generating samples.

See --help for more options, including --length-scales (speaking speeds).

Generator

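Download the LibriTTS-R generator (exported from a checkpoint). The command saves the release file en_US-libritts_r-medium.pt under the local name models/en-us-libritts-high.pt, which is the path the commands in this section use:

mkdir -p models
wget -O models/en-us-libritts-high.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'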

Generate a small set of samples with the CLI:

python3 -m piper_sample_generator 'okay piper.' --model models/en-us-libritts-high.pt --max-samples 10 --output-dir okay_piper/

Check the okay_piper/ directory for 10 WAV files (named 0.wav to 9.wav).

Generation can be much faster and more efficient if you have a GPU available and PyTorch is configured to use it. In this case, increase the batch size:

python3 -m piper_sample_generator 'okay piper.' --model models/en-us-libritts-high.pt --max-samples 100 --batch-size 10 --output-dir okay_piper/

On an NVIDIA 2080 Ti with 11 GB of memory, a batch size of 100 was possible (generating approximately 100 samples per second).

Setting --max-speakers to a value less than 904 (the number of speakers in LibriTTS) is recommended. Because the original dataset contained very few samples of the later speakers, using them can cause audio artifacts.

See --help for more options, including --length-scales (speaking speeds) and --slerp-weights (speaker blending), which are cycled per batch.
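To give a feel for what speaker blending with slerp (spherical linear interpolation) means, here is a small standard-library sketch. It is not the generator's actual code: the 2-dimensional "embeddings", the weight values, and the way one weight is picked per batch are all illustrative assumptions.

```python
import math
from itertools import cycle


def slerp(a, b, weight):
    """Spherically interpolate from vector a (weight 0) to vector b (weight 1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))  # angle between a and b
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - weight) * x + weight * y for x, y in zip(a, b)]
    scale_a = math.sin((1 - weight) * omega) / sin_omega
    scale_b = math.sin(weight * omega) / sin_omega
    return [scale_a * x + scale_b * y for x, y in zip(a, b)]


# Toy speaker embeddings and hypothetical weights, for illustration only.
speaker_a, speaker_b = [1.0, 0.0], [0.0, 1.0]
weights = cycle([0.0, 0.25, 0.5])
for batch_index in range(4):
    weight = next(weights)  # one weight per batch, then wrap around
    blended = slerp(speaker_a, speaker_b, weight)
    print(batch_index, weight, [round(v, 3) for v in blended])
```

A weight of 0 keeps the first speaker unchanged, while larger weights move the blend along the arc toward the second speaker.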

Augmentation

Once you have generated samples, you can augment them using audiomentations:

python3 -m piper_sample_generator.augment --sample-rate 22050 okay_piper/ okay_piper_augmented/

This will do several things to each sample:

  1. Randomly decrease the volume
    • The original samples are normalized, so different volume levels are needed
  2. Randomly apply an impulse response using the files in piper_sample_generator/impulses/
    • Change the acoustics of the sample to sound like the speaker was in a room with echo or using a poor quality microphone
  3. Resample to 16 kHz for training (e.g., openWakeWord)
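The three steps above can be sketched in plain Python to show what each one does to the audio. This is a simplified illustration, not the augment module's implementation: the gain range, the toy impulse response, the 22050 Hz input rate, and the naive linear-interpolation resampler are all assumptions, and a real resampler would also low-pass filter to avoid aliasing.

```python
import math
import random


def decrease_volume(samples, rng, min_gain=0.1, max_gain=1.0):
    """Step 1: scale the sample by a random gain no greater than 1."""
    gain = rng.uniform(min_gain, max_gain)
    return [s * gain for s in samples]


def apply_impulse_response(samples, impulse):
    """Step 2: convolve with an impulse response, trimmed to the input length."""
    out = [0.0] * len(samples)
    for i, s in enumerate(samples):
        for j, h in enumerate(impulse):
            if i + j < len(out):
                out[i + j] += s * h
    return out


def resample_linear(samples, source_rate, target_rate):
    """Step 3: resample by linear interpolation (no anti-aliasing filter)."""
    count = len(samples) * target_rate // source_rate
    out = []
    for n in range(count):
        position = n * source_rate / target_rate
        left = int(position)
        right = min(left + 1, len(samples) - 1)
        frac = position - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


rng = random.Random(0)
# One second of a 440 Hz tone at an assumed input rate of 22050 Hz.
sample = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
impulse = [1.0, 0.0, 0.3]  # toy impulse response: direct sound plus one echo
augmented = decrease_volume(sample, rng)
augmented = apply_impulse_response(augmented, impulse)
augmented = resample_linear(augmented, 22050, 16000)
print(len(augmented), "samples after resampling")
```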
