Fourier-based neural vocoder for high-quality audio synthesis

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Audio samples | Paper [abs] [pdf]

Installation

To use Vocos only in inference mode, install it using:

pip install vocos

If you wish to train the model, install it with additional dependencies:

pip install "vocos[train]"

Usage

Reconstruct audio from mel-spectrogram

import torch

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)  # B, C, T

with torch.no_grad():
    audio = vocos.decode(mel)
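
The `# B, C, T` comment above means the last dimension counts spectrogram frames, so the length of the synthesized audio scales with T. The following is a minimal sketch for estimating that length; the helper name `approx_duration_seconds` is hypothetical, and the default hop length of 256 samples is an assumption about the pretrained 24 kHz configuration rather than something stated here. The exact sample count can differ by about one frame depending on the padding mode.

```python
def approx_duration_seconds(
    num_frames: int,
    hop_length: int = 256,      # assumed hop size in samples, not confirmed by this page
    sample_rate: int = 24000,   # the models on this page operate at 24 kHz
) -> float:
    """Estimate the duration of audio decoded from `num_frames` feature frames."""
    if num_frames < 0 or hop_length <= 0 or sample_rate <= 0:
        raise ValueError("num_frames must be >= 0; hop_length and sample_rate must be > 0")
    # Each frame advances the output by roughly one hop of samples.
    return num_frames * hop_length / sample_rate


# The random mel tensor above has T = 256 frames.
print(f"{approx_duration_seconds(256):.2f} s")  # prints "2.73 s"
```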

Copy-synthesis from a file:

import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

with torch.no_grad():
    y_hat = vocos(y)

Reconstruct audio from EnCodec

Additionally, you need to provide a bandwidth_id: the index of the target bandwidth in the list [1.5, 3.0, 6.0, 12.0] (in kbps), which selects the corresponding lookup embedding.

vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")

quantized_features = torch.randn(1, 128, 256)
bandwidth_id = torch.tensor([3])  # 12 kbps

with torch.no_grad():
    audio = vocos.decode(quantized_features, bandwidth_id=bandwidth_id)  
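
Because bandwidth_id is a list index rather than a bitrate, it is easy to pass the wrong value. A small lookup helper can make the mapping explicit; this is a sketch, and `BANDWIDTHS_KBPS` and `bandwidth_to_id` are hypothetical names, not part of the vocos API.

```python
# Bandwidths listed in this README, in kbps; bandwidth_id is an index into this list.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]


def bandwidth_to_id(kbps: float) -> int:
    """Return the index of `kbps` in BANDWIDTHS_KBPS, or raise if it is unsupported."""
    try:
        return BANDWIDTHS_KBPS.index(float(kbps))
    except ValueError:
        raise ValueError(
            f"unsupported bandwidth {kbps!r}; choose one of {BANDWIDTHS_KBPS}"
        ) from None


print(bandwidth_to_id(12.0))  # prints 3, matching torch.tensor([3]) above
```

The result can be wrapped as `torch.tensor([bandwidth_to_id(6.0)])` before it is passed as bandwidth_id.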

Copy-synthesis from a file. The forward pass extracts and quantizes features with EnCodec, then reconstructs the audio with Vocos in a single call:

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

with torch.no_grad():
    y_hat = vocos(y, bandwidth_id=bandwidth_id)

Pre-trained models

The provided models were trained for up to 2.5 million generator iterations, which yields slightly better objective scores than those reported in the paper.

Model Name                    Dataset        Training Iterations  Parameters
charactr/vocos-mel-24khz      LibriTTS       2.5 M                13.5 M
charactr/vocos-encodec-24khz  DNS Challenge  2.5 M                7.9 M

Training

Prepare filelists of audio files for the training and validation sets:

find "$TRAIN_DATASET_DIR" -name "*.wav" > filelist.train
find "$VAL_DATASET_DIR" -name "*.wav" > filelist.val
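
Where `find` is unavailable (for example on Windows), the same filelists can be produced from Python. The sketch below assumes the filelist format is simply one audio path per line, as the `find` commands above produce; the function name `write_filelist` is hypothetical.

```python
from pathlib import Path


def write_filelist(dataset_dir, output_path, pattern="*.wav") -> int:
    """Recursively collect files matching `pattern` and write one path per line.

    Returns the number of paths written.
    """
    root = Path(dataset_dir)
    # Sort for a reproducible file order; `find` itself gives no ordering guarantee.
    paths = sorted(str(p) for p in root.rglob(pattern) if p.is_file())
    Path(output_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

Calling `write_filelist(train_dir, "filelist.train")` and `write_filelist(val_dir, "filelist.val")`, with `train_dir` and `val_dir` pointing at your dataset directories, mirrors the two shell commands.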

Fill in a config file, e.g. vocos.yaml, with your filelist paths, then start training with:

python train.py -c configs/vocos.yaml

Refer to the PyTorch Lightning documentation for details on customizing the training pipeline.

Citation

If this code contributes to your research, please cite our work:

@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}

License

The code in this repository is released under the MIT license as found in the LICENSE file.
