Unit HiFi-GAN
Minimal re-implementation of HiFi-GAN training on discrete speech units.
The package is available on PyPI:
pip install unit-hifigan
It is based on the "Speech Resynthesis from Discrete Disentangled Self-Supervised Representations" repository, with a clean and minimal re-implementation for both training and inference. We follow their default hyperparameters and model configurations.
Usage
Vocoder
The vocoder is available via the UnitVocoder class. You can condition it on speakers and styles (support for f0 will come later).
Load a pretrained model from a local directory or from a remote Hugging Face repository:
from unit_hifigan import UnitVocoder
vocoder = UnitVocoder.from_pretrained("coml/hubert-phoneme-classification", revision="vocoder-base-l11")
You can also load models from the legacy implementation or from textlesslib with UnitVocoder.from_legacy_pretrained:
import requests
from unit_hifigan import UnitVocoder
url = "https://dl.fbaipublicfiles.com/textless_nlp/expresso/checkpoints/hifigan_expresso_lj_vctk_hubert_base_ls960_L9_km500/"
vocoder = UnitVocoder.from_legacy_pretrained(url + "generator.pt")
vocoder.speakers = requests.get(url + "speakers.txt").text.splitlines()
vocoder.styles = requests.get(url + "styles.txt").text.splitlines()
You can then generate audio at 16 kHz with UnitVocoder.generate:
import torch
from unit_hifigan import UnitVocoder
units, speaker, style = torch.randint(0, 500, (1, 100)), ["speaker-0"], ["reading"]
vocoder = UnitVocoder(500, speakers=["speaker-0", "speaker-1"], styles=["reading", "crying", "laughing"])
audio = vocoder.generate(units, speaker=speaker, style=style)
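As a rough sanity check on the generated audio, you can estimate how long it should be from the number of input units. The sketch below assumes one unit per 20 ms frame (the 50 Hz frame rate of HuBERT at 16 kHz); this frame rate and the helper names are assumptions made for illustration and are not part of unit_hifigan. The actual output length may differ slightly depending on the encoder and on padding.

```python
SAMPLE_RATE = 16_000    # output sample rate stated above
UNITS_PER_SECOND = 50   # assumption: one unit per 20 ms frame (HuBERT-style encoder)


def expected_num_samples(num_units, sample_rate=SAMPLE_RATE, units_per_second=UNITS_PER_SECOND):
    """Approximate number of audio samples produced for `num_units` units."""
    return num_units * sample_rate // units_per_second


def expected_duration_seconds(num_units, units_per_second=UNITS_PER_SECOND):
    """Approximate duration in seconds of the audio for `num_units` units."""
    return num_units / units_per_second


# The example above feeds 100 units per utterance.
num_samples = expected_num_samples(100)
duration = expected_duration_seconds(100)
```

Under these assumptions, the 100 random units in the example correspond to about two seconds of audio.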
Training a Unit HiFi-GAN
Data preparation
For training, you need to have manifest files for the training and validation datasets.
The manifests are JSONL files (one JSON object per line) with the required fields audio (string) and units (list of integers), and the optional fields speaker (string) and style (string):
{"audio":"audio-1.wav","units":[24,24,173,289,289,441,487,370,370],"speaker":"spkr02"}
- audio: full path to the audio file
- units: discrete units from the speech encoder (for example, HuBERT layer 11 and K-means K=500)
- speaker: name of the speaker (for speaker conditioning)
- style: name of the style (for style conditioning)
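The manifest format described above can be produced with the standard library alone. The following is a minimal sketch, not part of the package: the helper names (make_entry, write_manifest, read_manifest), the audio paths, and the unit values are hypothetical, and the units would normally come from your own speech encoder and K-means model.

```python
import json
import tempfile
from pathlib import Path


def make_entry(audio, units, speaker=None, style=None):
    """Build one manifest entry; speaker and style are only added when given."""
    entry = {"audio": str(audio), "units": [int(u) for u in units]}
    if speaker is not None:
        entry["speaker"] = speaker
    if style is not None:
        entry["style"] = style
    return entry


def write_manifest(entries, path):
    """Write one compact JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, separators=(",", ":")) + "\n")


def read_manifest(path):
    """Read a JSONL manifest back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries: the paths and units are placeholders.
entries = [
    make_entry("/data/audio-1.wav", [24, 24, 173, 289, 289, 441, 487, 370, 370], speaker="spkr02"),
    make_entry("/data/audio-2.wav", [5, 5, 12], speaker="spkr07", style="reading"),
]
manifest_path = Path(tempfile.mkdtemp()) / "train.jsonl"
write_manifest(entries, manifest_path)
```

Reading the file back with read_manifest is a cheap way to check that every line parses before starting a training run.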
Training
Via the CLI:
# Minimal command with default configuration
python -m unit_hifigan.train --train $TRAIN_MANIFEST --val $VAL_MANIFEST --units $N_UNITS
# If you have a JSON config file
python -m unit_hifigan.train --config $CONFIG
You can also use the unit_hifigan.train.train function in your Python code if you prefer. Check out unit_hifigan.train.TrainConfig for the list of configuration options.
The pipeline supports DDP by default, when run with either torchrun or Slurm. Have a look at unit_hifigan.utils.init_distributed for how distributed training is initialized.
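To give an idea of what launcher detection typically involves, here is a minimal sketch that reads the environment variables set by torchrun (RANK, WORLD_SIZE, LOCAL_RANK) and by Slurm (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID). It is an illustration only: the function detect_distributed_env is hypothetical and is not the implementation of unit_hifigan.utils.init_distributed, which may behave differently.

```python
import os


def detect_distributed_env(environ=None):
    """Infer rank information from torchrun or Slurm environment variables.

    Hypothetical helper for illustration; it does not initialize any process group.
    """
    env = os.environ if environ is None else environ
    # torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for every worker.
    if "RANK" in env and "WORLD_SIZE" in env:
        return {
            "launcher": "torchrun",
            "rank": int(env["RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "local_rank": int(env.get("LOCAL_RANK", 0)),
        }
    # Slurm exports one SLURM_PROCID per task started with srun.
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return {
            "launcher": "slurm",
            "rank": int(env["SLURM_PROCID"]),
            "world_size": int(env["SLURM_NTASKS"]),
            "local_rank": int(env.get("SLURM_LOCALID", 0)),
        }
    # Fall back to a single, non-distributed process.
    return {"launcher": "none", "rank": 0, "world_size": 1, "local_rank": 0}


torchrun_info = detect_distributed_env({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"})
slurm_info = detect_distributed_env({"SLURM_PROCID": "1", "SLURM_NTASKS": "4", "SLURM_LOCALID": "1"})
single_info = detect_distributed_env({})
```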
Synthesis
Via the CLI:
python -m unit_hifigan.inference $GENERATIONS $PRETRAINED_MODEL $INFERENCE_MANIFEST
where the inference manifest has the same format as the training and validation ones.
Evaluation
Whisper ASR
python -m unit_hifigan.wer.whisper $GENERATIONS $ASR_MANIFEST $JSONL_OUTPUT --model $MODEL
where $MODEL is the name of the Whisper variant (large-v3 if not provided).
The ASR manifest has the following fields:
- audio: path to the audio file, relative to the $GENERATIONS directory
- transcript: ground truth transcription
Wav2vec 2.0 ASR
python -m unit_hifigan.wer.torchaudio $GENERATIONS $ASR_MANIFEST $JSONL_OUTPUT --model $MODEL
where $MODEL is the name of the torchaudio ASR pipeline to use (by default WAV2VEC2_ASR_LARGE_LV60K_960H).
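Both ASR commands are meant to score generations against the ground truth transcripts by word error rate (WER). As a reference for what that metric measures, here is a minimal, dependency-free sketch of corpus-level WER using word-level edit distance. It is an assumption-laden illustration: the function names are hypothetical, the only text normalization is lowercasing and whitespace splitting, and the package's own scoring may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        errors += edit_distance(ref_words, hyp_words)
        total += len(ref_words)
    return errors / total if total else 0.0


perfect = word_error_rate([("the cat sat", "The cat sat")])
one_insertion = word_error_rate([("hello world", "hello there world")])
```

Summing errors and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.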
TODO
- Add MCD evaluation
- Add support for F0
Acknowledgements