The DualCodec neural audio codec.

DualCodec

Installation

pip install dualcodec

News

  • 2025-01-22: Added training and finetuning instructions for DualCodec, as well as a Gradio interface. The version is v0.3.0.
  • 2025-01-16: Released the DualCodec inference code as v0.1.0. The latest versions are synced to PyPI.

Available models

Model_ID | Frame Rate | RVQ Quantizers             | Semantic Codebook Size (RVQ-1 Size) | Acoustic Codebook Size (RVQ-rest Size) | Training Data
12hz_v1  | 12.5Hz     | Any from 1-8 (maximum 8)   | 16384                               | 4096                                   | 100K hours Emilia
25hz_v1  | 25Hz       | Any from 1-12 (maximum 12) | 16384                               | 1024                                   | 100K hours Emilia
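
The table implies a nominal bitrate for each configuration. The sketch below is illustrative arithmetic only, not part of the dualcodec package: it assumes every code is stored with log2(codebook size) bits and ignores any entropy coding. The names `MODELS` and `nominal_bitrate` are made up for this example.

```python
import math

# Values copied from the "Available models" table above.
MODELS = {
    "12hz_v1": {"frame_rate": 12.5, "semantic_size": 16384, "acoustic_size": 4096},
    "25hz_v1": {"frame_rate": 25.0, "semantic_size": 16384, "acoustic_size": 1024},
}


def nominal_bitrate(model_id, n_quantizers):
    """Bits per second, assuming log2(codebook size) bits per code (an assumption)."""
    spec = MODELS[model_id]
    # RVQ-1 uses the semantic codebook; the remaining quantizers use the acoustic one.
    bits_per_frame = math.log2(spec["semantic_size"]) + (n_quantizers - 1) * math.log2(
        spec["acoustic_size"]
    )
    return spec["frame_rate"] * bits_per_frame


print(nominal_bitrate("12hz_v1", 8))   # 12.5 * (14 + 7 * 12) = 1225.0
print(nominal_bitrate("25hz_v1", 12))  # 25 * (14 + 11 * 10) = 3100.0
```

Using fewer quantizers lowers the figure proportionally; for example, 12hz_v1 with only the semantic quantizer comes to 12.5 * 14 = 175 bits per second under the same assumption.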

How to run inference with DualCodec

1. Download the checkpoints to a local directory:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec dualcodec_12hz_16384_4096.safetensors dualcodec_25hz_16384_1024.safetensors w2vbert2_mean_var_stats_emilia.pt --local-dir dualcodec_ckpts

The second command downloads the two DualCodec model checkpoints (12hz_v1 and 25hz_v1) and the w2v-bert-2.0 mean and variance statistics to the local directory dualcodec_ckpts.
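
Before loading a model, it can help to confirm that the download completed. The following is a minimal sketch using only the standard library; the file names come from the command above, while the helper name `missing_checkpoints` is hypothetical and not part of dualcodec.

```python
from pathlib import Path

# File names taken from the huggingface-cli command above.
EXPECTED_FILES = [
    "dualcodec_12hz_16384_4096.safetensors",
    "dualcodec_25hz_16384_1024.safetensors",
    "w2vbert2_mean_var_stats_emilia.pt",
]


def missing_checkpoints(ckpt_dir="dualcodec_ckpts"):
    """Return the expected file names that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_checkpoints()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All checkpoint files found.")
```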

2. Programmatic usage:

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

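# run inference on your wav file
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")  # audio shape: [channels, samples]
# downmix to mono so that the reshape below does not concatenate channels
audio = audio.mean(dim=0, keepdim=True)
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1,1,-1)  # [batch, channel, samples]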
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)

See "example.ipynb" for a running example.
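
The `n_quantizers` argument has to respect the per-model maximum from the "Available models" table, and it determines the code shapes noted in the comments above. The helper below is a hypothetical sketch that works on plain Python values, not a dualcodec API; it only restates those two facts so they can be checked before calling `inference.encode`.

```python
# Maximum number of RVQ quantizers per model, from the "Available models" table.
MAX_QUANTIZERS = {"12hz_v1": 8, "25hz_v1": 12}


def expected_code_shapes(model_id, n_quantizers, num_frames):
    """Return (semantic_shape, acoustic_shape) for a single utterance.

    num_frames is the number of code frames T; it is supplied by the caller
    here rather than derived from the audio length.
    """
    max_q = MAX_QUANTIZERS[model_id]
    if not 1 <= n_quantizers <= max_q:
        raise ValueError(f"{model_id} supports 1 to {max_q} quantizers, got {n_quantizers}")
    semantic_shape = (1, 1, num_frames)                 # RVQ-1 (semantic codes)
    acoustic_shape = (1, n_quantizers - 1, num_frames)  # remaining RVQ layers
    return semantic_shape, acoustic_shape


print(expected_code_shapes("12hz_v1", 8, 25))  # ((1, 1, 25), (1, 7, 25))
```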

3. Gradio interface:

If you want to use the Gradio interface, you can run the following command:

python -m dualcodec.app

This will launch an app that allows you to upload a wav file and get the output wav file.

DualCodec-based TTS models

We're releasing DualCodec-based TTS models. Stay tuned!

Finetuning DualCodec

  1. Install other necessary components for training:

pip install "dualcodec[train]"

  2. Clone this repository and cd to the project root folder (the folder that contains this readme).

  3. Get discriminator checkpoints:

huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts

  4. Run the example finetuning on Emilia German data. The data is streamed, so no files need to be downloaded, but network access to Hugging Face is required:

accelerate launch train.py --config-name=dualcodec_ft_12hzv1 \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

This finetunes a 12hz_v1 model with a training batch size of 3. In practice you typically need a larger batch size, such as 10.

To finetune a 25hz_v1 model:

accelerate launch train.py --config-name=dualcodec_ft_25hzv1 \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000
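
To get a feel for the `data.segment_speech.segment_length=24000` override, the sketch below converts a segment length into seconds and code frames. It rests on two assumptions that this readme does not state: that the value is measured in audio samples, and that training uses the same 24 kHz sample rate as the inference example. The function name `segment_summary` is hypothetical.

```python
SAMPLE_RATE = 24000  # assumption: same 24 kHz rate as the inference example


def segment_summary(segment_length, frame_rate, sample_rate=SAMPLE_RATE):
    """Return (duration in seconds, number of code frames) for one training segment."""
    seconds = segment_length / sample_rate
    frames = seconds * frame_rate
    return seconds, frames


print(segment_summary(24000, 12.5))  # (1.0, 12.5) for 12hz_v1
print(segment_summary(24000, 25.0))  # (1.0, 25.0) for 25hz_v1
```

Under these assumptions, each training segment in the example commands covers one second of audio.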

Training DualCodec from scratch

  1. Install other necessary components for training:

pip install "dualcodec[train]"

  2. Clone this repository and cd to the project root folder (the folder that contains this readme).

  3. Run the example training on Emilia German data:

accelerate launch train.py --config-name=dualcodec_train \
model=dualcodec_12hz_16384_4096_8vq \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000

This trains a 12hz_v1 model from scratch with a training batch size of 3. In practice you typically need a larger batch size, such as 10.

To train a 25hz_v1 model:

accelerate launch train.py --config-name=dualcodec_train \
model=dualcodec_25hz_16384_1024_12vq \
trainer.batch_size=3 \
data.segment_speech.segment_length=24000
