DiffWave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with white noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. a log-scaled Mel spectrogram). The model and architecture details are described in the paper "DiffWave: A Versatile Diffusion Model for Audio Synthesis".
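To make "iterative refinement" concrete, here is a minimal NumPy sketch of a reverse diffusion loop in the standard DDPM style. It is not this package's implementation: `denoiser` is a hypothetical stand-in for the trained network, and the schedule values (50 steps, beta from 1e-4 to 0.05) are assumptions chosen for illustration.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (illustrative values, not read from params.py)."""
    beta = np.linspace(beta_start, beta_end, num_steps)
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    return beta, alpha, alpha_bar

def sample(denoiser, conditioner, num_samples, rng, schedule=None):
    """Run the reverse chain from white noise to a waveform.

    denoiser(x, t, conditioner) is a hypothetical callable that predicts
    the noise contained in x at step t.
    """
    beta, alpha, alpha_bar = schedule if schedule is not None else make_schedule()
    x = rng.standard_normal(num_samples)  # start from white noise
    for t in range(len(beta) - 1, -1, -1):
        eps = denoiser(x, t, conditioner)
        # Subtract the predicted noise for this step and rescale.
        x = (x - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha[t])
        if t > 0:
            # Re-inject a smaller amount of noise on all but the final step.
            sigma = np.sqrt((1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * beta[t])
            x = x + sigma * rng.standard_normal(num_samples)
    return np.clip(x, -1.0, 1.0)

def toy_denoiser(x, t, conditioner):
    # Stand-in for a trained network: treats the whole signal as noise.
    return x

rng = np.random.default_rng(0)
audio = sample(toy_denoiser, conditioner=None, num_samples=22050, rng=rng)
print(audio.shape)
```

With a trained network in place of `toy_denoiser`, the conditioner (for example a Mel spectrogram) is what steers each refinement step toward the desired speech.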

Status (2020-09-23)

  • stable training
  • high-quality synthesis
  • mixed-precision training
  • multi-GPU training
  • command-line inference
  • programmatic inference API
  • PyPI package
  • audio samples (coming soon)
  • pretrained models (coming soon)

Big thanks to Zhifeng Kong (lead author of DiffWave) for pointers and bug fixes.

Audio samples

...coming soon...

Pretrained models

...coming soon...

Install

Install using pip:

pip install diffwave

or from GitHub:

git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .

Training

Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22.05 kHz. If you need to change this value, edit params.py.

python -m diffwave.preprocess /path/to/dir/containing/wavs
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
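The dataset requirements described in this section (16-bit mono .wav files, 22.05 kHz by default) can be checked before preprocessing. The sketch below uses only the Python standard library; `find_unsupported_wavs` is a hypothetical helper written for this example and is not part of the diffwave package.

```python
import wave
from pathlib import Path

def find_unsupported_wavs(root, expected_rate=22050):
    """Return (path, reason) pairs for .wav files under root that are not
    16-bit mono at the expected sample rate."""
    problems = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as f:
                channels = f.getnchannels()
                sample_width = f.getsampwidth()  # bytes per sample
                rate = f.getframerate()
        except (wave.Error, EOFError) as e:
            problems.append((path, f"unreadable: {e}"))
            continue
        if channels != 1:
            problems.append((path, f"{channels} channels (expected mono)"))
        if sample_width != 2:
            problems.append((path, f"{8 * sample_width}-bit (expected 16-bit)"))
        if rate != expected_rate:
            problems.append((path, f"{rate} Hz (expected {expected_rate} Hz)"))
    return problems
```

Calling `find_unsupported_wavs('/path/to/dir/containing/wavs')` returns an empty list when every file matches. If the sample rate in params.py has been changed, pass the same value as `expected_rate`.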

Inference API

Basic usage:

from diffwave.inference import predict as diffwave_predict

model_dir = '/path/to/model/dir'
spectrogram = ...  # placeholder: replace with a spectrogram in [N,C,W] format
audio, sample_rate = diffwave_predict(spectrogram, model_dir)

# audio is a GPU tensor in [N,T] format.
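To listen to the result, the returned audio has to be moved off the GPU and written to a file. The following sketch uses only the standard library and assumes the samples are floats in [-1, 1] (values outside that range are clipped). `write_wav` is a hypothetical helper for this example, not part of the diffwave API.

```python
import array
import sys
import wave

def write_wav(path, samples, sample_rate):
    """Write a sequence of floats in [-1, 1] as a 16-bit mono PCM file."""
    pcm = array.array(
        "h",
        (int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples),
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

For the first item in a batch, this could be called as `write_wav('output.wav', audio[0].cpu().tolist(), sample_rate)`.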

Inference CLI

python -m diffwave.inference /path/to/model /path/to/spectrogram -o output.wav
