Non-parallel Voice Conversion called crank

crank

Non-parallel voice conversion based on vector-quantized variational autoencoder with adversarial learning

Setup

  • Install Python dependencies
$ git clone https://github.com/k2kobayashi/crank.git
$ cd crank/tools
$ make
  • Install the dependency for MOSNet
$ sudo apt install ffmpeg   # mosnet dependency

Recipes

  • English
  • Japanese
    • jsv_ver1

Conversion samples

Several converted audio samples from the VCC 2018 dataset are available at the URL.

Run VCC2020 recipe

crank provides a recipe for the Voice Conversion Challenge 2020. A crank recipe implements non-parallel voice conversion in 8 stages, numbered 0 through 7.

  • stage 0
    • download dataset
  • stage 1
    • initialization
      • generate scp files and figures used to determine speaker-dependent parameters
  • stage 2
    • feature extraction
      • extract mlfb and mcep features
  • stage 3
    • training
  • stage 4
    • reconstruction
      • generate reconstructed features for fine-tuning the neural vocoder
  • stage 5
    • evaluation
      • convert evaluation waveform
  • stage 6
    • synthesis
      • synthesize waveforms with a pre-trained ParallelWaveGAN
      • synthesize waveforms with Griffin-Lim
  • stage 7
    • objective evaluation
      • mel-cepstrum distortion
      • mosnet
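
The stage list above can be sketched as a small lookup, which helps when choosing values for --stage and --stop_stage. This is a hypothetical illustration of the common Kaldi-style stage gating pattern, not crank's actual run.sh. It assumes that both options are inclusive bounds. The function names stage_name and list_stages are made up for this example.

```shell
# Hypothetical sketch of Kaldi-style stage gating; not crank's actual run.sh.

# Map a stage number to the name used in the list above.
stage_name() {
    case "$1" in
        0) echo "download dataset" ;;
        1) echo "initialization" ;;
        2) echo "feature extraction" ;;
        3) echo "training" ;;
        4) echo "reconstruction" ;;
        5) echo "evaluation" ;;
        6) echo "synthesis" ;;
        7) echo "objective evaluation" ;;
        *) echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

# Print every stage in the inclusive range [stage, stop_stage] (assumed inclusive).
list_stages() (
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "stage $i: $(stage_name "$i")"
        i=$((i + 1))
    done
)

# Show what a run from stage 2 to stage 5 would cover.
list_stages 2 5
```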

Put the dataset in downloads

Note that the dataset has been released only to challenge participants (as of 2020/05/26).

$ cd egs/vaevc/vcc2020v1
$ mkdir downloads && cd downloads
$ mv <path_to_zip>/vcc2020_{training,evaluation}.zip .
$ unzip vcc2020_training.zip
$ unzip vcc2020_evaluation.zip
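
Before unzipping, it can be useful to confirm that both archives are actually in place. The helper below is a minimal sketch and is not part of crank. The function name check_archives is hypothetical. The archive names are the ones used in the commands above.

```shell
# Hypothetical helper (not part of crank): confirm both challenge archives
# are present in the given directory. Returns non-zero if either is missing.
check_archives() (
    dir=$1
    status=0
    for name in vcc2020_training.zip vcc2020_evaluation.zip; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            status=1
        fi
    done
    exit "$status"
)
```

For example, running check_archives . inside the downloads directory reports any missing archive on standard error.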

Run feature extraction and model training

Because the challenge defines its own training and evaluation sets, the configuration files are already provided. You therefore need to start from stage 2.

$ ./run.sh --n_jobs 10 --stage 2 --stop_stage 5

Here, n_jobs is the number of CPU cores used in the training.

Configuration

Configurations are defined in conf/mlfb_vqvae.yml. The representative parameters are explained below.

  • feature

When you create your own recipe, be careful to set the feature extraction parameters such as fs, fftl, hop_size, framems, shiftms, and mcep_alpha. These parameters depend on the sampling frequency.
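
To illustrate the dependence on the sampling frequency, the sketch below converts a frame shift in milliseconds to a frame shift in samples. It assumes that hop_size and shiftms describe the same frame shift in different units, which is not confirmed by this README. The numeric values are illustrative and are not taken from a crank configuration file. The function name samples_from_ms is hypothetical.

```shell
# Illustrative values only; not taken from a crank configuration file.
fs=24000      # sampling frequency in Hz
shiftms=5     # frame shift in milliseconds

# Frame shift in samples: fs [samples/s] * shift [ms] / 1000 [ms/s].
samples_from_ms() {
    echo $(( $1 * $2 / 1000 ))
}

# Assumption: hop_size is the sample-domain counterpart of shiftms.
hop_size=$(samples_from_ms "$fs" "$shiftms")
echo "hop_size=$hop_size"
```

Under the same assumption, changing fs to 16000 while keeping the 5 ms shift gives 80 samples, so a value copied from a recipe with a different sampling frequency would no longer match.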

  • feat_type

You can set feat_type to either mlfb or mcep. If you choose mlfb, the converted waveforms are generated by either the Griffin-Lim vocoder or the ParallelWaveGAN vocoder. If you choose mcep, the converted waveforms are generated by the WORLD vocoder (i.e., excitation generation and MLSA filtering).
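
The mapping described above can be summarized as a small lookup. This is only a restatement of the paragraph in shell form, not code from crank. The function name vocoders_for is hypothetical.

```shell
# Summarize the vocoder options this README describes for each feat_type.
# Hypothetical helper; crank itself does not provide this function.
vocoders_for() {
    case "$1" in
        mlfb) echo "Griffin-Lim or ParallelWaveGAN" ;;
        mcep) echo "WORLD" ;;
        *) echo "unsupported feat_type: $1" >&2; return 1 ;;
    esac
}
```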

  • trainer_type

We support training with vqvae, lsgan, cyclegan, and stargan using the same generator network.

  • vqvae: default vqvae setting
  • lsgan: vqvae with adversarial learning
  • cyclegan: vqvae with adversarial learning and cyclic constraints
  • stargan: vqvae with adversarial learning similar to cyclegan

Create your recipe

Copy recipe template

Please copy the template directory to start creating your recipe.

$ cp -r egs/vaevc/template egs/vaevc/<new_recipe>
$ cd egs/vaevc/<new_recipe>

Put .wav files

You need to put the wav files in the appropriate directory, either by modifying download.sh or by placing the files manually. In either case, the wav files must be organized by speaker as follows: <new_recipe>/downloads/wav/{spkr1, spkr2, ..., spkr3}/*.wav.

If you modify download.sh,

$ vim local/download.sh

If you place the wav files manually,

$ mkdir downloads
$ mv <path_to_your_wav_directory> downloads/wav
$ touch downloads/.done
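
After placing the files, a quick check of the per-speaker layout can catch mistakes before initialization. The helper below is a minimal sketch and is not part of crank. The function name count_wavs is hypothetical. It assumes the layout described above, with one directory per speaker directly under the wav directory.

```shell
# Hypothetical helper (not part of crank): print "<speaker> <count>" for each
# speaker directory directly under the given wav root.
count_wavs() (
    wav_root=$1
    [ -d "$wav_root" ] || { echo "not a directory: $wav_root" >&2; exit 1; }
    for spkr_dir in "$wav_root"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$spkr_dir" ] || continue
        spkr=$(basename "$spkr_dir")
        # Count only .wav files directly inside the speaker directory.
        n=$(find "$spkr_dir" -maxdepth 1 -type f -name '*.wav' | wc -l | tr -d ' ')
        echo "$spkr $n"
    done
)
```

For example, count_wavs downloads/wav lists each speaker with its number of wav files. A speaker with a count of 0 usually means the files sit one level too deep or too shallow.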

Run initialization

The initialization process generates Kaldi-style scp files.

$ ./run.sh --stage 0 --stop_stage 1

Then adjust the speaker-dependent parameters in conf/spkr.yml based on the generated figures. Pages 20 to 22 of the slides explain how to set these parameters.

Run feature extraction, training, reconstruction, and evaluation

After preparing the configuration, run the remaining stages.

$ ./run.sh --stage 2 --stop_stage 7

Citation

Please cite this paper when you use crank.

K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda,
"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder",
Proc. ICASSP, 2021. (accepted)

Acknowledgements

Thanks to @kan-bayashi for many contributions and much encouragement.
