Parallel WaveGAN implementation
Parallel WaveGAN implementation with PyTorch
This repository provides an UNOFFICIAL Parallel WaveGAN implementation with PyTorch.
You can listen to our samples on our demo page!
The goal of this repository is to provide a real-time neural vocoder that is compatible with ESPnet-TTS.
Source of the figure: https://arxiv.org/pdf/1910.11480.pdf
Requirements
This repository is tested on Ubuntu 16.04 with a Titan V GPU.
- Python 3.6+
- CUDA 10.0
- cuDNN 7+
- NCCL 2+ (for distributed multi-GPU training)
Other CUDA versions should work but have not been explicitly tested.
All of the code is tested on PyTorch 1.0.1, 1.1, 1.2, 1.3, and 1.3.1.
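To compare a local environment against the requirements above, a small check along these lines may help. This is only a sketch: the helper names `python_ok` and `describe_environment` are hypothetical and not part of this repository, and the PyTorch fields are simply left as `None` when PyTorch is not installed.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum Python version from the requirements list


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def describe_environment():
    """Collect version information; PyTorch fields stay None without torch."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda": None,
        "cudnn": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built with (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    for name, value in describe_environment().items():
        print("{}: {}".format(name, value))
```

The reported values can then be compared by hand with the versions listed above.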
Setup
You can choose between two installation methods.
A. Use pip
$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN
$ pip install -e .
# If you want to use distributed training, please install
# apex manually by following https://github.com/NVIDIA/apex
$ ...
Note that, to install apex, your CUDA version must exactly match the version used to build your PyTorch binary.
To install PyTorch compiled with a different CUDA version, see tools/Makefile.
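The version match mentioned above could be checked before attempting to build apex. The sketch below is an illustration only: `parse_nvcc_release` and `cuda_versions_match` are hypothetical helpers, and the sample string assumes the usual "release X.Y" wording printed by `nvcc --version`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(toolkit_version, torch_cuda_version):
    """Return True if both versions share the same major and minor numbers."""
    if toolkit_version is None or torch_cuda_version is None:
        return False
    return toolkit_version.split(".")[:2] == torch_cuda_version.split(".")[:2]


# Sample line in the format nvcc is assumed to print.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
toolkit = parse_nvcc_release(sample)
print(toolkit, cuda_versions_match(toolkit, "10.0"))
```

In practice the second argument would come from `torch.version.cuda`, and the first from the real output of `nvcc --version`.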
B. Make virtualenv
$ git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
$ cd ParallelWaveGAN/tools
$ make
# If you want to use distributed training, please run the following
# command to install apex.
$ make apex
Run
This repository provides Kaldi-style recipes, in the same way as ESPnet.
Currently, four recipes are supported.
- CMU Arctic: English speakers
- LJSpeech: English female speaker
- JSUT: Japanese female speaker
- CSMSC: Mandarin female speaker
To run a recipe, please follow the instructions below.
# Move to the recipe directory
$ cd egs/ljspeech/voc1
# Run the recipe from scratch
$ ./run.sh
# You can change the config via the command line
$ ./run.sh --conf <your_customized_yaml_config>
# You can select the stage to start and stop
$ ./run.sh --stage 2 --stop_stage 2
# If you want to specify the gpu
$ CUDA_VISIBLE_DEVICES=1 ./run.sh --stage 2
Integration with job schedulers such as Slurm can be done via cmd.sh and conf/slurm.conf.
If you want to use it, please check this page.
All of the hyperparameters are written in a single YAML configuration file.
Please check this example in the LJSpeech recipe.
Training requires ~3 days on a single GPU (TITAN V).
Training runs at 0.5 seconds per iteration, for a total of ~200,000 seconds (= 2.31 days).
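The arithmetic behind these figures can be reproduced in a few lines. The iteration count is not stated in the text; it is derived here from the two quoted numbers and should be read as an implied value.

```python
SECONDS_PER_ITERATION = 0.5  # quoted in the text
TOTAL_SECONDS = 200000       # quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60

# Implied number of training iterations (derived, not stated in the text).
implied_iterations = TOTAL_SECONDS / SECONDS_PER_ITERATION
total_days = TOTAL_SECONDS / SECONDS_PER_DAY

print("implied iterations:", int(implied_iterations))      # 400000
print("training time: {:.2f} days".format(total_days))     # 2.31 days
```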
You can monitor the training progress via tensorboard.
$ tensorboard --logdir exp
If you want to accelerate training, you can try distributed multi-GPU training based on apex.
Distributed training requires apex, so please make sure you have already installed it.
Then you can run distributed multi-GPU training via the following command:
# Example with 8 GPUs
$ CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" ./run.sh --stage 2 --n_gpus 8
In distributed training, the batch size is automatically multiplied by the number of GPUs.
Please keep this in mind when choosing the batch size in your config.
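The batch-size scaling described above can be sketched as follows. The function name and the batch size of 8 are hypothetical, chosen only for illustration; the sketch assumes the configured batch size applies per GPU, as the statement above implies.

```python
def effective_batch_size(configured_batch_size, n_gpus):
    """Total batch size per iteration when training on n_gpus GPUs."""
    if configured_batch_size < 1 or n_gpus < 1:
        raise ValueError("batch size and number of GPUs must be positive")
    return configured_batch_size * n_gpus


# Hypothetical configured batch size of 8, run with --n_gpus 8.
print(effective_batch_size(8, 8))  # 64
```

If the total batch size should stay the same as in single-GPU training, the configured value would need to be divided by the number of GPUs.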
The decoding speed is RTF = 0.016 with a TITAN V, much faster than real time.
[decode]: 100%|██████████| 250/250 [00:30<00:00, 8.31it/s, RTF=0.0156]
2019-11-03 09:07:40,480 (decode:127) INFO: finished generation of 250 utterances (RTF = 0.016).
Even on the CPU (Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz 16 threads), it can generate faster than real time.
[decode]: 100%|██████████| 250/250 [22:16<00:00, 5.35s/it, RTF=0.841]
2019-11-06 09:04:56,697 (decode:129) INFO: finished generation of 250 utterances (RTF = 0.734).
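For readers unfamiliar with the RTF values in the logs above, the sketch below uses the common definition of the real-time factor: processing time divided by the duration of the generated audio. That definition is an assumption here, since the text does not spell out how the decoder computes it, and the function names are hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(rtf):
    """An RTF below 1 means the audio is generated faster than it plays."""
    return rtf < 1.0


# The two RTF values reported in the logs above.
for rtf in (0.016, 0.734):
    print(rtf, is_faster_than_real_time(rtf))
```

Under this definition, an RTF of 0.016 corresponds to generating 10 seconds of audio in 0.16 seconds.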
Results
You can listen to the samples and download pretrained models on our Google Drive.
Training is still ongoing. Please check the latest progress at https://github.com/kan-bayashi/ParallelWaveGAN/issues/1.
Acknowledgement
The author would like to thank Ryuichi Yamamoto (@r9y9) for his great repository, paper and valuable discussions.
Author
Tomoki Hayashi (@kan-bayashi)
E-mail: hayashi.tomoki<at>g.sp.m.is.nagoya-u.ac.jp