Skip to main content

wavegrad

Project description

WaveGrad

PyPI Release License

WaveGrad is a fast, high-quality neural vocoder designed by the folks at Google Brain. The architecture is described in WaveGrad: Estimating Gradients for Waveform Generation. In short, this model takes a log-scaled Mel spectrogram and converts it to a waveform via iterative refinement.

Status

  • stable training (22 kHz, 24 kHz)
  • high-quality synthesis
  • mixed-precision training
  • custom noise schedule (faster inference)
  • programmatic API
  • PyPI package
  • audio samples
  • pretrained models

Audio samples

...coming soon...

Pretrained models

...coming soon...

Install

Install using pip:

pip install wavegrad

or from GitHub:

git clone https://github.com/lmnt-com/wavegrad.git
cd wavegrad
pip install .

Training

Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22 kHz. If you need to change this value, edit params.py.

python -m wavegrad.preprocess /path/to/dir/containing/wavs
python -m wavegrad /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible speech by ~20k steps (~1.5h on a 2080 Ti).

Inference API

Basic usage:

from wavegrad.inference import predict as wavegrad_predict

model_dir = '/path/to/model/dir'
spectrogram = # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir)

# audio is a GPU tensor in [N,T] format.

If you have a custom noise schedule (see below):

from wavegrad.inference import predict as wavegrad_predict

params = { 'noise_schedule': np.load('/path/to/noise_schedule.npy') }
model_dir = '/path/to/model/dir'
spectrogram = # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir, params=params)

# `audio` is a GPU tensor in [N,T] format.

Inference CLI

python -m wavegrad.inference /path/to/model /path/to/spectrogram -o output.wav

Noise schedule

The default implementation uses 1000 iterations to refine the waveform, which runs slower than real-time. WaveGrad is able to achieve high-quality, faster than real-time synthesis with as few as 6 iterations without re-training the model with new hyperparameters.

To achieve this speed-up, you will need to search for a noise schedule that works well for your dataset. This implementation provides a script to perform the search for you:

python -m wavegrad.noise_schedule /path/to/trained/model /path/to/preprocessed/validation/dataset
python -m wavegrad.inference /path/to/trained/model /path/to/spectrogram -n noise_schedule.npy -o output.wav

The default settings should give good results without spending too much time on the search. If you'd like to find a better noise scheudle or use a different number of inference iterations, run the noise_schedule script with --help to see additional configuration options.

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wavegrad-0.1.1.tar.gz (11.9 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page