Unit HiFi-GAN

Minimal re-implementation of HiFi-GAN training on discrete speech units.

The package is available on PyPI:

pip install unit-hifigan

It is based on the "Speech Resynthesis from Discrete Disentangled Self-Supervised Representations" repository, with a clean and minimal re-implementation for both training and inference. We follow their default hyperparameters and model configurations.

Usage

Vocoder

The vocoder is available via the UnitVocoder class. You can condition it on speakers and styles (F0 conditioning is not supported yet).

Load a pretrained model from a local directory or a remote Hugging Face repository:

from unit_hifigan import UnitVocoder

vocoder = UnitVocoder.from_pretrained("coml/hubert-phoneme-classification", revision="vocoder-base-l11")

You can also load models from the legacy implementation or from textlesslib with UnitVocoder.from_legacy_pretrained:

import requests
from unit_hifigan import UnitVocoder

url = "https://dl.fbaipublicfiles.com/textless_nlp/expresso/checkpoints/hifigan_expresso_lj_vctk_hubert_base_ls960_L9_km500/"
vocoder = UnitVocoder.from_legacy_pretrained(url + "generator.pt")
vocoder.speakers = requests.get(url + "speakers.txt").text.splitlines()
vocoder.styles = requests.get(url + "styles.txt").text.splitlines()

You can then generate audio at 16 kHz with UnitVocoder.generate:

import torch
from unit_hifigan import UnitVocoder

units, speaker, style = torch.randint(0, 500, (1, 100)), ["speaker-0"], ["reading"]
vocoder = UnitVocoder(500, speakers=["speaker-0", "speaker-1"], styles=["reading", "crying", "laughing"])
audio = vocoder.generate(units, speaker=speaker, style=style)
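
As a rough sanity check on output length, the sketch below estimates how much audio a unit sequence should produce. It assumes one unit per 20 ms frame, i.e. 320 samples per unit at 16 kHz, which is the usual HuBERT frame rate; this hop size is an assumption rather than something stated here, so check it against your model's configuration. The helper names are hypothetical and not part of unit_hifigan.

```python
SAMPLE_RATE = 16_000    # output sample rate stated above
SAMPLES_PER_UNIT = 320  # ASSUMPTION: one unit per 20 ms frame (HuBERT-style encoder)


def expected_num_samples(num_units: int, samples_per_unit: int = SAMPLES_PER_UNIT) -> int:
    """Number of waveform samples expected for a sequence of `num_units` units."""
    return num_units * samples_per_unit


def expected_duration_seconds(
    num_units: int,
    samples_per_unit: int = SAMPLES_PER_UNIT,
    sample_rate: int = SAMPLE_RATE,
) -> float:
    """Duration in seconds of the audio expected for `num_units` units."""
    return expected_num_samples(num_units, samples_per_unit) / sample_rate


# The example above uses 100 units per sequence.
print(expected_num_samples(100))       # 32000
print(expected_duration_seconds(100))  # 2.0
```

If the generated waveform is much shorter or longer than this estimate, the units were probably extracted at a different frame rate than the vocoder expects.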

Training a Unit HiFi-GAN

Data preparation

To train, you need manifest files for the training and validation datasets. Each manifest is a JSONL file whose entries have the fields audio (string) and units (list of integers), plus the optional fields speaker (string) and style (string):

{"audio":"audio-1.wav","units":[24,24,173,289,289,441,487,370,370],"speaker":"spkr02"}
  • audio: full path to the audio file
  • units: discrete units from the speech encoder (for example, HuBERT layer 11 and K-means K=500)
  • speaker: name of the speaker (for speaker conditioning)
  • style: name of the style (for style conditioning)
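
The format above can be produced with the standard library alone. The following is a minimal sketch, not a utility shipped with the package: write_manifest and read_manifest are hypothetical helper names, and the audio paths and unit values are placeholders.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(entries, path):
    """Write one JSON object per line (JSONL), omitting unset optional fields."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            record = {
                "audio": str(entry["audio"]),
                "units": [int(u) for u in entry["units"]],
            }
            # speaker and style are optional; only write them when provided.
            for key in ("speaker", "style"):
                if entry.get(key) is not None:
                    record[key] = entry[key]
            f.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_manifest(path):
    """Read a JSONL manifest back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder entries; real manifests should use full paths to existing files.
entries = [
    {"audio": "/data/audio-1.wav", "units": [24, 24, 173, 289], "speaker": "spkr02"},
    {"audio": "/data/audio-2.wav", "units": [441, 487, 370], "speaker": "spkr01", "style": "reading"},
]

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "train.jsonl"
    write_manifest(entries, manifest_path)
    loaded = read_manifest(manifest_path)

print(loaded[0])
```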

Training

Via the CLI:

# Minimal command with default configuration
python -m unit_hifigan.train --train $TRAIN_MANIFEST --val $VAL_MANIFEST --units $N_UNITS

# If you have a JSON config file
python -m unit_hifigan.train --config $CONFIG

You can also use the unit_hifigan.train.train function in your Python code if you prefer. Check out unit_hifigan.train.TrainConfig for the list of configuration options.

The pipeline supports DistributedDataParallel (DDP) out of the box when launched with either torchrun or Slurm. See unit_hifigan.utils.init_distributed for how distributed training is initialized.
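
To illustrate how a training script can tell the two launchers apart, here is a minimal sketch based on the environment variables that torchrun (RANK, LOCAL_RANK, WORLD_SIZE) and Slurm (SLURM_PROCID, SLURM_LOCALID, SLURM_NTASKS) set for each process. It is not the code in unit_hifigan.utils.init_distributed, which may work differently; detect_distributed is a hypothetical name.

```python
import os


def detect_distributed(env=None):
    """Return (rank, local_rank, world_size) inferred from launcher variables."""
    env = os.environ if env is None else env
    # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE for every worker.
    if "RANK" in env and "WORLD_SIZE" in env:
        return int(env["RANK"]), int(env.get("LOCAL_RANK", 0)), int(env["WORLD_SIZE"])
    # Slurm (srun) exports SLURM_PROCID, SLURM_LOCALID and SLURM_NTASKS for every task.
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return int(env["SLURM_PROCID"]), int(env.get("SLURM_LOCALID", 0)), int(env["SLURM_NTASKS"])
    # No launcher detected: fall back to a single process.
    return 0, 0, 1


print(detect_distributed({"RANK": "3", "LOCAL_RANK": "1", "WORLD_SIZE": "4"}))             # (3, 1, 4)
print(detect_distributed({"SLURM_PROCID": "2", "SLURM_LOCALID": "0", "SLURM_NTASKS": "8"}))  # (2, 0, 8)
print(detect_distributed({}))                                                              # (0, 0, 1)
```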

Synthesis

Via the CLI:

python -m unit_hifigan.inference $GENERATIONS $PRETRAINED_MODEL $INFERENCE_MANIFEST

where the inference manifest has the same format as the training and validation ones.

Evaluation

Whisper ASR

python -m unit_hifigan.wer.whisper $GENERATIONS $ASR_MANIFEST $JSONL_OUTPUT --model $MODEL

where $MODEL is the name of the Whisper variant (large-v3 by default). The ASR manifest has the following fields:

  • audio: path to the audio file, relative to the $GENERATIONS directory
  • transcript: ground-truth transcription

Wav2vec 2.0 ASR

python -m unit_hifigan.wer.torchaudio $GENERATIONS $ASR_MANIFEST $JSONL_OUTPUT --model $MODEL

where $MODEL is the name of the torchaudio ASR pipeline to use (by default WAV2VEC2_ASR_LARGE_LV60K_960H).
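
Both commands live under unit_hifigan.wer, which suggests they score the ASR output against the ground-truth transcripts with word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance divided by the number of reference words. The text normalization here (lowercasing and whitespace splitting) is an assumption; the package's own normalization and scoring may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings, divided by the reference length."""
    # ASSUMPTION: simple normalization; real evaluations often also strip punctuation.
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the processed reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the hypothesis
            insertion = current[j - 1] + 1  # extra word in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The second call drops one of six reference words, so it yields one deletion out of six words.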

TODO

  • Add MCD evaluation
  • Add support for F0

Acknowledgements
