
DillWave

DillWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. log-scaled Mel spectrogram). The model and architecture details are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
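
To make the iterative refinement idea concrete, here is a rough conceptual sketch of a standard DDPM-style sampling loop conditioned on a mel spectrogram. This is not the dillwave API; model, betas, and the hop size of 256 are placeholder assumptions.

# Conceptual sketch of iterative refinement (plain DDPM sampling), not the dillwave API.
import torch

def refine(model, mel, betas):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, mel.shape[-1] * 256)  # start from Gaussian noise; hop size 256 is assumed
    for t in reversed(range(len(betas))):
        eps = model(x, torch.tensor([t]), mel)            # predict the noise, conditioned on the mel
        coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])      # remove part of the predicted noise
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)  # re-inject a smaller amount of noise
    return x  # the refined waveform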

Credit goes to the original DiffWave repo.

Install

(First install PyTorch; the GPU version is recommended.)

As a package:

pip install dillwave

From GitHub:

git clone https://github.com/dillfrescott/dillwave
pip install -e dillwave

or

pip install git+https://github.com/dillfrescott/dillwave

You need Git installed for either of these "From GitHub" install methods to work.

Training

python -m dillwave.preprocess /path/to/dir/containing/wavs # 48000 Hz, 1 channel
python -m dillwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
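The preprocess step expects 48000 Hz, mono wavs. If your source audio is in another format, a minimal conversion sketch (assuming torchaudio is installed; both paths are placeholders) looks like this:

from pathlib import Path
import torchaudio

src = Path("/path/to/raw/audio")
dst = Path("/path/to/dir/containing/wavs")
dst.mkdir(parents=True, exist_ok=True)

for wav_path in src.glob("*.wav"):
    waveform, sr = torchaudio.load(str(wav_path))
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to 1 channel
    if sr != 48000:
        waveform = torchaudio.transforms.Resample(sr, 48000)(waveform)
    torchaudio.save(str(dst / wav_path.name), waveform, 48000)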

Inference CLI

python -m dillwave.inference /path/to/model --spectrogram_path /path/to/spectrogram -o output.wav [--fast]
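
The --spectrogram_path argument points to a conditioning spectrogram. If you need to build one yourself, a hedged sketch follows; the mel parameters (n_fft, hop length, number of mel bins) and the .npy format are assumptions and must match whatever dillwave.preprocess produced for your trained model:

import numpy as np
import torch
import torchaudio

# All parameters below are assumptions; match them to your preprocessing settings.
waveform, sr = torchaudio.load("reference.wav")  # expected to be 48000 Hz, mono
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=80)(waveform)
log_mel = torch.log(torch.clamp(mel, min=1e-5))  # log-scale, clamp to avoid log(0)
np.save("/path/to/spectrogram.npy", log_mel.squeeze(0).numpy())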

I plan to release a pretrained model if it turns out good enough! :)
