DillWave
DillWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts from Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. a log-scaled Mel spectrogram). The model and architecture are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
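The iterative refinement above follows the DDPM-style reverse process: start from pure Gaussian noise and repeatedly denoise it toward a waveform. A minimal numpy sketch of that loop (the noise schedule and `denoise_fn` are illustrative stand-ins, not dillwave's actual network or settings):

```python
import numpy as np

def sample(denoise_fn, cond, steps=50, length=16000, seed=0):
    """Toy DDPM-style reverse process. `denoise_fn(x, t, cond)` stands in
    for the trained network, which predicts the noise present in x."""
    rng = np.random.default_rng(seed)
    beta = np.linspace(1e-4, 0.05, steps)   # illustrative noise schedule
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    x = rng.standard_normal(length)         # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoise_fn(x, t, cond)        # predicted noise at step t
        # Remove the predicted noise contribution for this step.
        x = (x - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha[t])
        if t > 0:                           # re-inject noise except at the last step
            x += np.sqrt(beta[t]) * rng.standard_normal(length)
    return x
```

With a real model, `cond` would be the Mel spectrogram and fewer steps can be used with a tuned "fast" schedule.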
Credit to the original DiffWave repository.
Install
(First install PyTorch; the GPU build is recommended!)
As a package:
pip install dillwave
From GitHub:
git clone https://github.com/dillfrescott/dillwave
pip install -e dillwave
or
pip install git+https://github.com/dillfrescott/dillwave
You need Git installed for either of these "From GitHub" install methods to work.
Training
python -m dillwave.preprocess /path/to/dir/containing/wavs # expects 48000 Hz, 1 channel
python -m dillwave /path/to/model/dir /path/to/dir/containing/wavs
# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
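Since preprocessing expects 48 kHz mono input, it can help to sanity-check a directory first. A quick sketch using only the standard library (`check_wavs` is a hypothetical helper, not part of dillwave):

```python
import wave
from pathlib import Path

def check_wavs(wav_dir):
    """Return (name, sample_rate, channels) for files that are not 48 kHz mono."""
    bad = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            if wf.getframerate() != 48000 or wf.getnchannels() != 1:
                bad.append((path.name, wf.getframerate(), wf.getnchannels()))
    return bad
```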
You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
Inference CLI
python -m dillwave.inference /path/to/model --spectrogram_path /path/to/spectrogram -o output.wav [--fast]
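The conditioning input passed via `--spectrogram_path` is a log-scaled Mel spectrogram. A minimal numpy sketch of how such a feature can be computed (the FFT size, hop, and Mel-band count here are illustrative assumptions, not dillwave's exact settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=48000, n_fft=1024, n_mels=80):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, mid):            # rising slope
            fb[i, b] = (b - lo) / max(mid - lo, 1)
        for b in range(mid, hi):            # falling slope
            fb[i, b] = (hi - b) / max(hi - mid, 1)
    return fb

def log_mel(wav, sr=48000, n_fft=1024, hop=256, n_mels=80):
    # Magnitude STFT via a sliding Hann window, then mel projection and log.
    window = np.hanning(n_fft)
    frames = [wav[s:s + n_fft] * window
              for s in range(0, len(wav) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.stack(frames), axis=-1))
    mel = mel_filterbank(sr, n_fft, n_mels) @ mag.T
    return np.log(np.clip(mel, 1e-5, None))   # clip to avoid log(0)
```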
I plan to release a pretrained model if it turns out good enough! :)