
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping


PhaseAug

Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung @ MINDsLab Inc., SNU, KAIST


Abstract : Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes overfitting for both the discriminators and the generator, leading to the periodicity artifacts in the generated audio signal. In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping. With our proposed method, we outperform baselines without any architecture modification. Code and audio samples will be available at https://github.com/mindslab-ai/phaseaug.
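
The idea of rotating the phase of each frequency bin can be illustrated with a small sketch. It is not the package's implementation: it uses a naive single-frame DFT in plain Python instead of an STFT, and the function names (`dft`, `idft`, `rotate_phase`), the toy frame, and the way the angles are drawn are all assumptions made for illustration. It shows that a per-bin rotation changes the waveform while leaving the magnitude spectrum untouched, which is one way a single spectrum can correspond to many waveforms.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; fine for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def rotate_phase(x, phi):
    """Rotate bin k of a real frame x by phi[k] radians.

    Negative-frequency bins get the conjugate rotation so the result stays
    real; DC (and Nyquist, for even lengths) are left unrotated.
    """
    n = len(x)
    spec = dft(x)
    for k in range(1, (n + 1) // 2):
        rot = cmath.exp(1j * phi[k])
        spec[k] *= rot
        spec[n - k] *= rot.conjugate()
    return [v.real for v in idft(spec)]


# Hypothetical toy frame: two sinusoids, 16 samples.
n = 16
frame = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
         for t in range(n)]

# One random angle per bin (illustrative choice of distribution).
rng = random.Random(0)
phi = [rng.uniform(-math.pi, math.pi) for _ in range(n // 2 + 1)]

augmented = rotate_phase(frame, phi)
mag_before = [abs(c) for c in dft(frame)]
mag_after = [abs(c) for c in dft(augmented)]
print("max magnitude change:", max(abs(a - b) for a, b in zip(mag_before, mag_after)))
print("max waveform change:", max(abs(a - b) for a, b in zip(frame, augmented)))
```

The magnitude change printed here should be at floating-point noise level, while the waveform change is not.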

Accepted to ICASSP 2023


TODO

  • PyTorch 2.0 has been released; STFT and iSTFT need to be modified for complex support (solved in 1.0.0).
  • arXiv version updated.
  • Errata in the paper will be fixed: in Section 2.5, the transition band half-width should be 0.012, not 0.06.
  • In Section 2.5, a note about multiplying the rotation matrix on "the left side of F(x)" will be added. -> transpose m,k to reduce ambiguity
  • Upload PhaseAug to PyPI.
  • Upload VITS+PhaseAug samples to the demo page.
  • Refactor the code for packaging.

Use PhaseAug

  • Install alias-free-torch==0.0.6 and phaseaug
pip install alias-free-torch==0.0.6 phaseaug 
  • Insert PhaseAug into your code; see train.py for an example.
from phaseaug.phaseaug import PhaseAug
...
# define phaseaug
aug = PhaseAug()
...
# discriminator update phase
aug_y, aug_y_g = aug.forward_sync(y, y_g_hat.detach())
y_df_hat_r, y_df_hat_g, _, _ = mpd(aug_y, aug_y_g)
y_ds_hat_r, y_ds_hat_g, _, _ = msd(aug_y, aug_y_g)
...
# generator update phase
aug_y, aug_y_g = aug.forward_sync(y, y_g_hat)
y_df_hat_r, y_df_hat_g, fmap_f_r, fmap_f_g = mpd(aug_y, aug_y_g)
y_ds_hat_r, y_ds_hat_g, fmap_s_r, fmap_s_g = msd(aug_y, aug_y_g)
  • If you use torch.cuda.amp.autocast as VITS does, wrap PhaseAug in autocast(enabled=False) to avoid ComplexHalf issues.
from torch.cuda.amp import autocast
...
with autocast(enabled=True):
    # wrap PhaseAug in autocast(enabled=False)
    with autocast(enabled=False):
        aug_y, aug_y_g = aug.forward_sync(y, y_g_hat)
    # the net_d forward pass usually stays inside autocast(enabled=True)
    y_df_hat_r, y_df_hat_g, fmap_f_r, fmap_f_g = net_d(aug_y, aug_y_g)
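
The snippets above pass the real and generated waveforms through a single `forward_sync` call. The name suggests that both signals receive the same random rotation; that reading is an assumption here, not something this README states. The sketch below, again using a naive single-frame DFT in plain Python rather than the package's STFT, illustrates why a shared rotation is convenient: the operation is linear, so the magnitude spectrum of the difference between the pair is preserved, whereas independent rotations change it. All names and values (`rotate`, `diff_magnitudes`, the toy signals, the angles) are hypothetical.

```python
import cmath
import math


def _dft(x, sign):
    """Naive DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rotate(x, phi):
    """Rotate bin k of a real frame by phi[k], keeping the output real."""
    n = len(x)
    spec = _dft(x, -1)
    for k in range(1, (n + 1) // 2):
        spec[k] *= cmath.exp(1j * phi[k])
        spec[n - k] *= cmath.exp(-1j * phi[k])
    return [v.real / n for v in _dft(spec, +1)]


def diff_magnitudes(a, b):
    """Magnitude spectrum of the sample-wise difference a - b."""
    return [abs(c) for c in _dft([p - q for p, q in zip(a, b)], -1)]


n = 16
real = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# Toy stand-in for a generator output: slightly off in gain and phase.
fake = [0.8 * math.sin(2 * math.pi * 2 * t / n + 0.1) for t in range(n)]
phi_a = [0.5] * (n // 2 + 1)
phi_b = [-1.0] * (n // 2 + 1)

reference = diff_magnitudes(real, fake)
synced = diff_magnitudes(rotate(real, phi_a), rotate(fake, phi_a))
unsynced = diff_magnitudes(rotate(real, phi_a), rotate(fake, phi_b))

print("synced change:  ", max(abs(s - r) for s, r in zip(synced, reference)))
print("unsynced change:", max(abs(u - r) for u, r in zip(unsynced, reference)))
```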

Requirements

docker build -t=phaseaug --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) --build-arg USER_NAME=$USER .  # trailing "." assumes the Dockerfile is in the current directory

Training

  1. Clone this repository and copy the Python files to the hifi-gan folder
git clone --recursive https://github.com/mindslab-ai/phaseaug
cp ./phaseaug/*.py ./phaseaug/hifi-gan/
cd ./phaseaug/hifi-gan
  • Optional: MelGAN generator
cp ./phaseaug/config_v1_melgan.json ./phaseaug/hifi-gan/
  • Change the generator to the MelGAN generator in train.py
# import MelGANGenerator as Generator in train.py
# from models import Generator  # remove the original Generator import
from models import MelGANGenerator as Generator
  2. Modify the dataset path in train.py
     parser.add_argument('--input_wavs_dir',
                         default='path/LJSpeech-1.1/wavs_22k')
     parser.add_argument('--input_mels_dir',
                         default='path/LJSpeech-1.1/wavs_22k')
  3. Run train.py
python train.py --config config_v1.json --aug --filter --data_ratio {0.01/0.1/1.} --name phaseaug_hifigan
python train.py --config config_v1_melgan.json --aug --filter --data_ratio {0.01/0.1/1.} --name phaseaug_melgan
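
The `{0.01/0.1/1.}` placeholder stands for one of three data ratios, so a full sweep means three runs per config. A minimal sketch of a helper that builds those command lines is shown below. The helper `train_command` and the run-name suffixes are hypothetical; only the flags that appear above are used. Since `--input_wavs_dir` is an argparse option with a default, it can presumably also be passed on the command line instead of editing train.py, which the optional `wavs_dir` argument illustrates.

```python
import shlex


def train_command(config, name, data_ratio, wavs_dir=None):
    """Build the argument list for one train.py run (hypothetical helper)."""
    cmd = ["python", "train.py",
           "--config", config,
           "--aug", "--filter",
           "--data_ratio", str(data_ratio),
           "--name", name]
    if wavs_dir is not None:
        # Override the argparse default rather than editing train.py.
        cmd += ["--input_wavs_dir", wavs_dir]
    return cmd


# Print one command per data ratio; run names are illustrative only.
for ratio in (0.01, 0.1, 1.0):
    cmd = train_command("config_v1.json", f"phaseaug_hifigan_{ratio}", ratio)
    print(shlex.join(cmd))
```

The lists can be passed to `subprocess.run` as they are; printing them first makes it easy to check the sweep before launching long training jobs.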

References

This implementation uses code from the following repositories:

This README and the webpage for the audio samples are inspired by:

Citation & Contact

If this repository is useful for your research, please consider citing!

@inproceedings{phaseaug,
  author={Lee, Junhyeok and Han, Seungu and Cho, Hyunjae and Jung, Wonbin},
  title={{PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping}},
  journal = {arXiv preprint arXiv:2211.04610},
  year=2022,
}

Bibtex will be updated after ICASSP 2023.

If you have any questions or inquiries, please contact Junhyeok Lee at jun3518@icloud.com


