Towards Achieving Robust Universal Neural Vocoding
A PyTorch implementation of Towards Achieving Robust Universal Neural Vocoding. Audio samples can be found here.
Fig 1: Architecture of the vocoder.
Quick Start
Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
pip install univoc
Example Usage
import torch
import soundfile as sf
from univoc import Vocoder
# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()
# load log-Mel spectrogram from file or tts
mel = ...
# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)
# save output
sf.write("path/to/save.wav", wav, sr)
Train from Scratch
- Clone the repo:
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
- Install requirements:
pip install -r requirements.txt
- Download and extract the LJ-Speech dataset:
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
- Download the train split here and extract it in the root directory of the repo.
- Extract Mel spectrograms and preprocess audio:
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
- Train the model:
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
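Before running the preprocessing step, it can help to confirm that the archive was extracted where you expect. The following is a minimal sketch, not part of the repository: the helper name `check_ljspeech_layout` is hypothetical, and it only assumes the standard LJSpeech-1.1 layout (a `metadata.csv` file next to a `wavs/` directory).

```python
from pathlib import Path


def check_ljspeech_layout(root):
    """Return a list of problems found under `root`; an empty list means the layout looks right.

    Hypothetical helper for illustration; it is not provided by univoc or the repo.
    """
    root = Path(root)
    problems = []
    if not (root / "metadata.csv").is_file():
        problems.append("missing metadata.csv")
    wav_dir = root / "wavs"
    if not wav_dir.is_dir():
        problems.append("missing wavs/ directory")
    elif not any(wav_dir.glob("*.wav")):
        problems.append("wavs/ contains no .wav files")
    return problems


if __name__ == "__main__":
    # Same path that is passed to preprocess.py as in_dir.
    for problem in check_ljspeech_layout("path/to/LJSpeech-1.1"):
        print("LJSpeech check:", problem)
```

An empty result means the directory is at least shaped like the dataset that `preprocess.py` is pointed at via `in_dir`.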
Pretrained Models
Pretrained weights for the 10-bit LJ-Speech model are available here.
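To make "10-bit" concrete: assuming it refers to mu-law quantization of the waveform into 2^10 = 1024 classes (an assumption; the text above does not spell this out), each sample is companded and then rounded to an integer class. The sketch below uses only the standard library, and the function names are hypothetical rather than part of the univoc API.

```python
import math


def mu_law_encode(x: float, bits: int = 10) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, 2**bits - 1]."""
    mu = 2 ** bits - 1
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    # Compand: compress large amplitudes, expand small ones.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class.
    return int(math.floor((y + 1) / 2 * mu + 0.5))


def mu_law_decode(q: int, bits: int = 10) -> float:
    """Approximate inverse of mu_law_encode: class index back to a sample in [-1, 1]."""
    mu = 2 ** bits - 1
    y = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (-1.0, -0.1, 0.0, 0.5, 1.0):
        q = mu_law_encode(sample)
        print(sample, "->", q, "->", round(mu_law_decode(q), 4))
```

With 10 bits the model would predict one of 1024 classes per sample, and the round trip through encode and decode loses only a small amount of precision.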
Notable Differences from the Paper
- Trained on 16 kHz audio from a single speaker. For an older version trained on 102 different speakers from the ZeroSpeech 2019: TTS without T English dataset click here.
- Uses an embedding layer instead of one-hot encoding.
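The second difference is mostly a matter of efficiency: multiplying a one-hot vector by a weight matrix selects a single row of that matrix, so an embedding lookup gives the same result without building the one-hot vector. The sketch below illustrates this in plain Python; the sizes and names are assumptions chosen for illustration (1024 classes to match a 10-bit model, and a tiny hypothetical embedding width), not values taken from the repository. In PyTorch the lookup corresponds to `torch.nn.Embedding`.

```python
import random

NUM_CLASSES = 1024  # assumed: 2**10 classes for 10-bit samples
EMBED_DIM = 4       # hypothetical width, kept tiny for readability

random.seed(0)
# One row of weights per class, as an embedding table would store them.
table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_CLASSES)]


def one_hot(index, num_classes):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def one_hot_projection(index, weights):
    """One-hot vector times the weight matrix: O(num_classes * dim) work."""
    vec = one_hot(index, len(weights))
    dim = len(weights[0])
    return [sum(vec[i] * weights[i][j] for i in range(len(weights))) for j in range(dim)]


def embedding_lookup(index, weights):
    """Direct row lookup: O(dim) work, same result."""
    return weights[index]


if __name__ == "__main__":
    sample_class = 512
    print(one_hot_projection(sample_class, table))
    print(embedding_lookup(sample_class, table))
```

Both functions return the same row, but the lookup skips the multiplications by zero, which matters when there are many classes per sample.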
Acknowledgements