diffwave
Project description
DiffWave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. log-scaled Mel spectrogram). The model and architecture details are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
What's new (2020-10-14)
- new pretrained model trained for 1M steps
- updated audio samples with output from new model
Status (2020-10-14)
- stable training
- high-quality synthesis
- mixed-precision training
- multi-GPU training
- command-line inference
- programmatic inference API
- PyPI package
- audio samples
- pretrained models
- unconditional waveform synthesis
Big thanks to Zhifeng Kong (lead author of DiffWave) for pointers and bug fixes.
Audio samples
Pretrained models
22.05 kHz pretrained model (31 MB, SHA256: d415d2117bb0bba3999afabdd67ed11d9e43400af26193a451d112e2560821a8)
This pretrained model synthesizes speech with a real-time factor of 0.87 (smaller is faster).
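As a rough check, the real-time factor is wall-clock synthesis time divided by the duration of the generated audio. A minimal sketch of measuring it yourself, assuming spectrogram and model_dir are set up as shown in the Inference API section below:

import time
import torch
from diffwave.inference import predict as diffwave_predict

start = time.time()
audio, sample_rate = diffwave_predict(spectrogram, model_dir)
torch.cuda.synchronize()  # make sure GPU work has finished before stopping the clock
elapsed = time.time() - start
rtf = elapsed / (audio.shape[-1] / sample_rate)  # smaller is faster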
Pretrained model details
- trained on 4x 1080Ti
- default parameters
- single precision floating point (FP32)
- trained on LJSpeech dataset excluding LJ001* and LJ002*
- trained for 1,000,578 steps (1,273 epochs)
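To verify the download against the SHA256 checksum listed above, a minimal sketch using Python's hashlib (the checkpoint filename is hypothetical; substitute whatever name you saved the file under):

import hashlib

expected = 'd415d2117bb0bba3999afabdd67ed11d9e43400af26193a451d112e2560821a8'
with open('diffwave-ljspeech-22kHz-1000578.pt', 'rb') as f:  # hypothetical filename
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, 'checksum mismatch'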
Install
Install using pip:
pip install diffwave
or from GitHub:
git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .
Training
Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22.05 kHz. If you need to change this value, edit params.py.
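If your source audio isn't already 16-bit mono at the expected sample rate, the sketch below shows one way to convert it, assuming torchaudio (>= 0.8) is installed; the helper name and paths are illustrative and not part of this project:

from pathlib import Path

import torchaudio
import torchaudio.transforms as T

def convert_for_training(src_dir, dst_dir, sample_rate=22050):
    # Write 16-bit mono WAVs at the sample rate configured in params.py (22.05 kHz by default).
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).rglob('*.wav'):
        audio, sr = torchaudio.load(str(path))    # [channels, samples]
        audio = audio.mean(dim=0, keepdim=True)   # downmix to mono
        if sr != sample_rate:
            audio = T.Resample(sr, sample_rate)(audio)
        torchaudio.save(str(dst / path.name), audio, sample_rate,
                        encoding='PCM_S', bits_per_sample=16)

With the dataset in place, run preprocessing and training: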
python -m diffwave.preprocess /path/to/dir/containing/wavs
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs
# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
Multi-GPU training
By default, this implementation uses as many GPUs in parallel as returned by torch.cuda.device_count(). You can specify which GPUs to use by setting the CUDA_VISIBLE_DEVICES environment variable before running the training module.
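For example, to restrict training to the first two GPUs (the device indices are illustrative):
CUDA_VISIBLE_DEVICES=0,1 python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs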
Inference API
Basic usage:
from diffwave.inference import predict as diffwave_predict
model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a log-scaled Mel spectrogram in [N,C,W] format
audio, sample_rate = diffwave_predict(spectrogram, model_dir)
# audio is a GPU tensor in [N,T] format.
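To write the result to disk, a minimal sketch assuming torchaudio is installed:

import torchaudio

# For a single input spectrogram, audio has shape [1, T]; move it to the CPU before saving.
torchaudio.save('output.wav', audio.cpu(), sample_rate)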
Inference CLI
python -m diffwave.inference /path/to/model /path/to/spectrogram -o output.wav
References
- Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro. DiffWave: A Versatile Diffusion Model for Audio Synthesis.
Download files
Source Distribution
File details
Details for the file diffwave-0.1.6.tar.gz.
File metadata
- Download URL: diffwave-0.1.6.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.1.post20200802 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 53f58a09fc71a734b3652ae8846563ec25d7f8e8935d8a65ce5b596862fbf3d1
MD5 | 58baceb267fd54cfaebd06679c2a084c
BLAKE2b-256 | b65d39c22b5881e4a94a7c1f90c8b98744a36bd6ad9fe5a5cb80a4a03c541e35