
DillWave

DillWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. log-scaled Mel spectrogram). The model and architecture details are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
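
To make the iterative refinement idea concrete, here is a rough conceptual sketch of a standard DDPM-style sampling loop conditioned on a mel spectrogram. This is not the dillwave API; model, betas, and the hop size of 256 are placeholder assumptions.

# Conceptual sketch of iterative refinement (plain DDPM sampling), not the dillwave API.
import torch

def refine(model, mel, betas):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, mel.shape[-1] * 256)  # start from Gaussian noise; hop size 256 is assumed
    for t in reversed(range(len(betas))):
        eps = model(x, torch.tensor([t]), mel)            # predict the noise, conditioned on the mel
        coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])      # remove part of the predicted noise
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)  # re-inject a smaller amount of noise
    return x  # the refined waveform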

Credit goes to the original DiffWave repo.

Install

(First install PyTorch; the GPU version is recommended.)

As a package:

pip install dillwave

From GitHub:

git clone https://github.com/dillfrescott/dillwave
pip install -e dillwave

or

pip install git+https://github.com/dillfrescott/dillwave

You need Git installed for either of these "From GitHub" install methods to work.

Training

python -m dillwave.preprocess /path/to/dir/containing/wavs # 48000 Hz, 1 channel
python -m dillwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
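The preprocess step expects 48000 Hz, mono wavs. If your source audio is in another format, a minimal conversion sketch (assuming torchaudio is installed; both paths are placeholders) looks like this:

from pathlib import Path
import torchaudio

src = Path("/path/to/raw/audio")
dst = Path("/path/to/dir/containing/wavs")
dst.mkdir(parents=True, exist_ok=True)

for wav_path in src.glob("*.wav"):
    waveform, sr = torchaudio.load(str(wav_path))
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to 1 channel
    if sr != 48000:
        waveform = torchaudio.transforms.Resample(sr, 48000)(waveform)
    torchaudio.save(str(dst / wav_path.name), waveform, 48000)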

Inference CLI

python -m dillwave.inference /path/to/model --spectrogram_path /path/to/spectrogram -o output.wav [--fast]
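
The --spectrogram_path argument points to a conditioning spectrogram. If you need to build one yourself, a hedged sketch follows; the mel parameters (n_fft, hop length, number of mel bins) and the .npy format are assumptions and must match whatever dillwave.preprocess produced for your trained model:

import numpy as np
import torch
import torchaudio

# All parameters below are assumptions; match them to your preprocessing settings.
waveform, sr = torchaudio.load("reference.wav")  # expected to be 48000 Hz, mono
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=80)(waveform)
log_mel = torch.log(torch.clamp(mel, min=1e-5))  # log-scale, clamp to avoid log(0)
np.save("/path/to/spectrogram.npy", log_mel.squeeze(0).numpy())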

I plan to release a pretrained model if it turns out good enough! :)
