Quasi-Periodic Parallel WaveGAN (QPPWG)
This is the official PyTorch implementation of QPPWG, a non-autoregressive neural speech generation model built on Parallel WaveGAN (PWG) and a quasi-periodic (QP) structure.
In this repo, we provide an example of training and testing QPPWG as a vocoder for WORLD acoustic features. More details can be found on our Demo page.
News
- 2020/5/20: Released the first version.
Requirements
This repository has been tested on Ubuntu 16.04 with a Titan V GPU.
- Python 3.6+
- CUDA 10.0
- CuDNN 7+
- PyTorch 1.0.1+
Environment setup
The code works with both Anaconda and virtualenv. The following example uses Anaconda.
$ conda create -n venvQPPWG python=3.6
$ source activate venvQPPWG
$ git clone https://github.com/bigpon/QPPWG.git
$ cd QPPWG
$ pip install -e .
For more details, please refer to the PWG repo.
Folder architecture
- egs: the folder for projects.
- egs/vcc18: the folder of the VCC2018 project.
- egs/vcc18/exp: the folder for trained models.
- egs/vcc18/conf: the folder for configs.
- egs/vcc18/data: the folder for corpus-related files (wav, feature, list, ...).
- qppwg: the folder of the source code.
Run
Corpus and path setup
- Modify the corresponding CUDA paths in egs/vcc18/run.py.
- Download the Voice Conversion Challenge 2018 (VCC2018) corpus to run the QPPWG example:
$ cd egs/vcc18
# Download training and validation corpus
$ wget -o train.log -O data/wav/train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip
# Download evaluation corpus
$ wget -o eval.log -O data/wav/eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip
# Unzip corpus
$ unzip data/wav/train.zip -d data/wav/
$ unzip data/wav/eval.zip -d data/wav/
- Training wav list: data/scp/vcc18_train_22kHz.scp
- Validation wav list: data/scp/vcc18_valid_22kHz.scp
- Testing wav list: data/scp/vcc18_eval_22kHz.scp
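These .scp files are plain wav-path lists, one file path per line (an assumption based on common list-file conventions; the authoritative format is whatever the repo's data loader expects). A minimal Python sketch for loading such a list:

```python
from pathlib import Path

def load_wav_list(scp_path):
    """Read a wav list file: one path per line, blank lines ignored.
    (Assumed format; check the repo's data loader for the real one.)"""
    with open(scp_path) as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical usage with a tiny example list
example = Path("example.scp")
example.write_text(
    "data/wav/train/VCC2SF1/10001.wav\n"
    "data/wav/train/VCC2SF1/10002.wav\n"
)
paths = load_wav_list(example)
print(len(paths))  # 2
```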
Preprocessing
# Extract WORLD acoustic features and statistics of training and testing data
$ bash run.sh --stage 0 --config PWG_30
- WORLD-related settings can be changed in egs/vcc18/conf/vcc18.PWG_30.yaml.
- To extract features from another corpus, please create a corresponding config and a file containing the power thresholds and f0 ranges, like egs/vcc18/data/pow_f0_dict.yml.
- For more details about feature extraction, please refer to the QPNet repo.
- The lists of auxiliary features will be automatically generated.
- Training aux list: data/scp/vcc18_train_22kHz.list
- Validation aux list: data/scp/vcc18_valid_22kHz.list
- Testing aux list: data/scp/vcc18_eval_22kHz.list
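For illustration only, a per-speaker power-threshold and f0-range file in the spirit of pow_f0_dict.yml might look like the fragment below. The field names and values here are assumptions, not the repo's actual schema; consult egs/vcc18/data/pow_f0_dict.yml for the real layout.

```yaml
# Hypothetical layout: per-speaker power threshold (dB) and f0 search range (Hz)
VCC2SF1:
  pow_th: -20
  f0_min: 120
  f0_max: 320
VCC2SM1:
  pow_th: -20
  f0_min: 60
  f0_max: 180
```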
QPPWG/PWG training
# Train a QPPWG model with the 'QPPWGaf_20' config and the 'vcc18_train_22kHz' and 'vcc18_valid_22kHz' sets.
$ bash run.sh --gpu 0 --stage 1 --conf QPPWGaf_20 \
--trainset vcc18_train_22kHz --validset vcc18_valid_22kHz
- The GPU ID can be set by --gpu GPU_ID (default: 0).
- The model architecture can be set by --conf CONFIG (default: PWG_30).
- Training can be resumed from a saved checkpoint with --resume NUM (default: None).
QPPWG/PWG testing
# QPPWG/PWG decoding w/ natural acoustic features
$ bash run.sh --gpu 0 --stage 2 --conf QPPWGaf_20 \
--iter 400000 --trainset vcc18_train_22kHz --evalset vcc18_eval_22kHz
# QPPWG/PWG decoding w/ scaled f0 (e.g., halved f0)
$ bash run.sh --gpu 0 --stage 3 --conf QPPWGaf_20 --scaled 0.50 \
--iter 400000 --trainset vcc18_train_22kHz --evalset vcc18_eval_22kHz
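Stage 3 decodes with a scaled f0 (here, halved via --scaled 0.50). The underlying transform is simple: multiply voiced f0 values by the factor while leaving unvoiced frames (f0 = 0) untouched. A minimal sketch of that idea, not the repo's actual implementation:

```python
def scale_f0(f0, factor):
    """Scale voiced f0 values by `factor`; keep unvoiced frames (0.0) at zero.
    Sketch of the f0-transformation idea, not the repo's code."""
    return [v * factor if v > 0.0 else 0.0 for v in f0]

f0 = [0.0, 200.0, 210.0, 0.0, 190.0]   # toy f0 contour in Hz
print(scale_f0(f0, 0.5))  # [0.0, 100.0, 105.0, 0.0, 95.0]
```

Because QPPWG's dilated convolutions are pitch-dependent, decoding with a scaled f0 like this is the paper's test of pitch controllability.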
Monitor training progress
$ tensorboard --logdir exp
- The real-time factor (RTF) of PWG_30 decoding with a Titan V is 0.016.
- The RTF of PWG_20 decoding with a TITAN V is 0.011.
- The RTF of QPPWGaf_20 decoding with a TITAN V is 0.018.
- The training time of PWG_30 with a TITAN V is around 3 days.
- The training time of QPPWGaf_20 with a TITAN V is around 5 days.
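RTF here is the real-time factor: time spent synthesizing divided by the duration of the generated audio, so values below 1.0 mean faster than real time. A small helper with toy numbers (the timings are illustrative, not measurements from this repo):

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    return synthesis_seconds / audio_seconds

# e.g. generating 10 s of audio in 0.18 s of compute
rtf = real_time_factor(0.18, 10.0)
print(f"RTF: {rtf:.3f}")  # RTF: 0.018
```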
Results
[TODO] We will release the pre-trained models and all generated samples around June 2020.
References
The QPPWG repository was developed based on the PWG and QPNet repositories and the paper below.
Citation
If you find the code helpful, please cite the following article.
@article{wu2020qppwg,
  title={Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation},
  author={Wu, Yi-Chiao and Hayashi, Tomoki and Okamoto, Takuma and Kawai, Hisashi and Toda, Tomoki},
  journal={arXiv preprint arXiv:2005.08654},
  year={2020}
}
Authors
Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
Advisor:
Tomoki Toda @ Nagoya University
E-mail: tomoki@icts.nagoya-u.ac.jp