Towards Achieving Robust Universal Neural Vocoding
A PyTorch implementation of Towards Achieving Robust Universal Neural Vocoding. Audio samples can be found here.
Fig 1: Architecture of the vocoder.
Quick Start
Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
pip install univoc
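Before installing, it can help to confirm that the environment meets the versions stated above. The following is a minimal sketch, assuming both numbers are minimums (Python 3.6 or later, PyTorch 1.7 or later); the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and not part of `univoc`.

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    # "1.10.0+cu113" -> (1, 10, 0); anything after the numeric prefix is ignored.
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_text, minimum):
    """True if version_text is at least `minimum`, a tuple such as (1, 7)."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Return (python_ok, torch_ok) for the assumed minimum versions."""
    python_ok = sys.version_info[:2] >= (3, 6)
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
        torch_ok = meets_minimum(torch.__version__, (1, 7))
    except ImportError:
        torch_ok = False
    return python_ok, torch_ok
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "1.10" as older than "1.7".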
Example Usage
import torch
import soundfile as sf
from univoc import Vocoder
# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()
# load log-Mel spectrogram from file or tts
mel = ...
# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)
# save output
sf.write("path/to/save.wav", wav, sr)
Train from Scratch
- Clone the repo:
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
- Install requirements:
pip install -r requirements.txt
- Download and extract the LJ-Speech dataset:
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
- Download the train split here and extract it in the root directory of the repo.
- Extract Mel spectrograms and preprocess audio:
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
- Train the model:
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
Pretrained Models
Pretrained weights for the 10-bit LJ-Speech model are available here.
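To make "10-bit" concrete: a vocoder of this kind typically predicts one of 2^10 = 1024 discrete classes per audio sample. The sketch below assumes the classes come from mu-law companding, which is common for autoregressive vocoders but is not stated in this README; the function names are illustrative and are not the package's API.

```python
import math

BITS = 10            # bit depth named in the text
LEVELS = 2 ** BITS   # 1024 quantisation classes
MU = LEVELS - 1      # companding constant, 1023 for 10 bits


def mulaw_encode(sample):
    """Map a float sample in [-1, 1] to an integer class in [0, MU]."""
    sample = max(-1.0, min(1.0, sample))  # clamp out-of-range input
    # Logarithmic compression keeps more resolution near zero amplitude.
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    # Shift [-1, 1] to [0, MU] and round to the nearest class.
    return int(math.floor((compressed + 1) / 2 * MU + 0.5))


def mulaw_decode(index):
    """Invert mulaw_encode up to quantisation error."""
    compressed = 2 * index / MU - 1
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )
```

Under this assumption, silence maps to the middle class and full-scale samples map to classes 0 and 1023, with round-trip error growing towards the extremes of the amplitude range.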
Notable Differences from the Paper
- Trained on 16 kHz audio from a single speaker. For an older version trained on 102 different speakers from the ZeroSpeech 2019: TTS without T English dataset, click here.
- Uses an embedding layer instead of one-hot encoding.
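The second difference is a change in computation rather than in what the model can represent: multiplying a one-hot vector by a weight matrix selects a single row, which is exactly what an embedding lookup returns. The sketch below illustrates that equivalence in plain Python with hypothetical sizes; it is not the package's code, which would more plausibly use a layer such as `torch.nn.Embedding`.

```python
import random

NUM_CLASSES = 8   # hypothetical; a 10-bit model would have 1024 classes
EMBED_DIM = 4     # hypothetical embedding width

random.seed(0)
# Weight matrix with one row per class, as an embedding table would store it.
weights = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for _ in range(NUM_CLASSES)
]


def one_hot(index, size):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_projection(index):
    """Multiply the one-hot row vector by the weight matrix."""
    vec = one_hot(index, NUM_CLASSES)
    return [
        sum(vec[r] * weights[r][c] for r in range(NUM_CLASSES))
        for c in range(EMBED_DIM)
    ]


def embedding_lookup(index):
    """Read the matching row directly, with no multiplication."""
    return list(weights[index])
```

Both functions return the same vector for any class index, but the lookup skips the multiplications by zero, which matters more as the number of classes grows.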
Acknowledgements