DiffWave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with white noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. a log-scaled Mel spectrogram). The model and architecture details are described in the paper "DiffWave: A Versatile Diffusion Model for Audio Synthesis".
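The iterative refinement described above can be illustrated with a minimal NumPy sketch of DDPM-style ancestral sampling. This is not the package's implementation: `sample`, `zero_predictor`, and the noise schedule are hypothetical names and values chosen for illustration, the conditioning signal is omitted, and a trained network would take the place of the stand-in predictor.

```python
import numpy as np

def sample(predict_noise, shape, betas, rng):
    """DDPM-style reverse process: start from white noise, denoise step by step."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from white noise
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t)  # a trained network would be called here
        # Posterior mean: subtract the predicted noise component for step t.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise on every step except the final one.
            sigma = np.sqrt((1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t])
            x = x + sigma * rng.standard_normal(shape)
    return x

# Hypothetical stand-in for the network: always predicts zero noise.
def zero_predictor(x, t):
    return np.zeros_like(x)

betas = np.linspace(1e-4, 0.02, 20)  # illustrative schedule (assumption)
audio = sample(zero_predictor, (1, 16), betas, np.random.default_rng(0))
```

With a real model, `predict_noise` would also receive the conditioning spectrogram, and the output would be a waveform rather than rescaled noise.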
Status (2020-09-22)
- stable training
- high-quality synthesis
- mixed-precision training
- command-line inference
- programmatic inference API
- PyPI package
- audio samples
- pretrained models
Audio samples
...coming soon...
Pretrained models
...coming soon...
Install
Install using pip:
pip install diffwave
or from GitHub:
git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .
Training
Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22.05 kHz. If you need to change this value, edit params.py.
python -m diffwave.preprocess /path/to/dir/containing/wavs
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs
# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
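Before preprocessing, it may help to confirm that every .wav file meets the format requirements stated above (16-bit, mono, 22.05 kHz by default). The following is a minimal sketch using only the Python standard library; `find_incompatible_wavs` is a hypothetical helper and is not part of the diffwave package.

```python
import wave
from pathlib import Path

def find_incompatible_wavs(root, sample_rate=22050):
    """Return (path, reason) pairs for .wav files that are not 16-bit mono at sample_rate."""
    problems = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                channels = wav.getnchannels()
                width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
                rate = wav.getframerate()
        except (wave.Error, EOFError) as err:
            # Non-PCM or truncated files cannot be parsed by the wave module.
            problems.append((path, f"unreadable: {err}"))
            continue
        if channels != 1:
            problems.append((path, f"{channels} channels, expected mono"))
        elif width != 2:
            problems.append((path, f"{8 * width}-bit samples, expected 16-bit"))
        elif rate != sample_rate:
            problems.append((path, f"{rate} Hz, expected {sample_rate} Hz"))
    return problems
```

Calling `find_incompatible_wavs('/path/to/dir/containing/wavs')` returns an empty list when every file matches. If you changed the sample rate in params.py, pass the same value as `sample_rate`.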
Inference API
Basic usage:
from diffwave.inference import predict as diffwave_predict
model_dir = '/path/to/model/dir'
spectrogram = ...  # placeholder: replace with a spectrogram tensor in [N,C,W] format
audio, sample_rate = diffwave_predict(spectrogram, model_dir)
# audio is a GPU tensor in [N,T] format.
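To listen to the result, one item of the batch can be written to a .wav file. The sketch below assumes the samples are floats in the range [-1, 1] (values outside that range are clipped) and that the tensor has already been moved to the CPU and converted, for example with `audio[0].cpu().numpy()`. `save_wav` is a hypothetical helper, not part of the diffwave package.

```python
import wave
import numpy as np

def save_wav(path, samples, sample_rate):
    """Write a 1-D float array in [-1, 1] as a 16-bit mono PCM .wav file."""
    samples = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    pcm = (samples * 32767.0).astype("<i2")  # little-endian 16-bit integers
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(int(sample_rate))
        wav.writeframes(pcm.tobytes())
```

For example, `save_wav('output.wav', audio[0].cpu().numpy(), sample_rate)` would store the first waveform of the batch returned by `diffwave_predict`.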
Inference CLI
python -m diffwave.inference /path/to/model /path/to/spectrogram -o output.wav