Skip to main content

The DualCodec neural audio codec.

Project description

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

About

Installation

pip install dualcodec

News

  • 2025-01-17: DualCodec inference code is released!

Available models

Model_ID Frame Rate RVQ Quantizers Semantic Codebook Size (RVQ-1 Size) Acoustic Codebook Size (RVQ-rest Size) Training Data
12hz_v1 12.5Hz Any from 1-8 (maximum 8) 16384 4096 100K hours Emilia
25hz_v1 25Hz Any from 1-12 (maximum 12) 16384 1024 100K hours Emilia

How to inference DualCodec

1. Download checkpoints to local:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts

2. To inference an audio in a python script:

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

# do inference for your wav
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1,1,-1)
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)

See "example.ipynb" for a running example.

DualCodec-based TTS models

DualCodec-based TTS

Benchmark results

DualCodec audio quality

DualCodec-based TTS

Training DualCodec

Stay tuned for the training code release!

Citation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dualcodec-0.2.1.tar.gz (1.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dualcodec-0.2.1-py3-none-any.whl (34.4 kB view details)

Uploaded Python 3

File details

Details for the file dualcodec-0.2.1.tar.gz.

File metadata

  • Download URL: dualcodec-0.2.1.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.2.1.tar.gz
Algorithm Hash digest
SHA256 479b90cf718869f761454c5f760c001354e933486e82c9962c7e9739b01b1f2c
MD5 1dc891bba47167045eaf41c90d7e7217
BLAKE2b-256 18354b15ffbc005b07b17d973a6549f3dbc724ca66dc487333fe98da1b82d432

See more details on using hashes here.

File details

Details for the file dualcodec-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: dualcodec-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8aac88633180f3c802a3d3b931a752c36728094a59040402d2b4ce9e4c58a15e
MD5 ae0121a0ec39e0a9f6cac13d3b3c69ee
BLAKE2b-256 6f4413fc18366b0dd1c5f137b0f7fc76292854ca829fb7d06172fbb6969cdcd0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page