Skip to main content

The DualCodec neural audio codec.

Project description

DualCodec: A Speech Generation-Oriented Neural Audio Codec with Dual Encoding of Waveform and Self-Supervised Feature

Installation

pip install dualcodec

Available models

Model_ID Frame Rate RVQ Quantizers Semantic Codebook Size (RVQ-1 Size) Acoustic Codebook Size (RVQ-rest Size) Training Data
12hz_v1 12.5Hz Any from 1-8 (maximum 8) 16384 4096 100K hours Emilia
25hz_v1 25Hz Any from 1-12 (maximum 12) 16384 1024 100K hours Emilia

How to inference DualCodec

1. Download checkpoints to local:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts

2. To inference an audio in a python script:

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

# do inference for your wav
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1,1,-1)
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)

See "example.ipynb" for a running example.

DualCodec-based TTS models

Benchmarking

Link to DualCodec-based TTS repositories

Training DualCodec

Stay tuned for the training code release! Should be within two weeks.

Citation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dualcodec-0.1.2a2.tar.gz (7.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dualcodec-0.1.2a2-py3-none-any.whl (27.1 kB view details)

Uploaded Python 3

File details

Details for the file dualcodec-0.1.2a2.tar.gz.

File metadata

  • Download URL: dualcodec-0.1.2a2.tar.gz
  • Upload date:
  • Size: 7.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.1.2a2.tar.gz
Algorithm Hash digest
SHA256 bc50d0bbb64c95572c510ddfd7e5a8e6523dc2c35326c27a5c81420300ac0523
MD5 28eca335cab204f4f1b873b8bfedf532
BLAKE2b-256 7e44389002e74d907895bcbca34c9d587cb020c215f2110a542c2b464c803f1e

See more details on using hashes here.

File details

Details for the file dualcodec-0.1.2a2-py3-none-any.whl.

File metadata

  • Download URL: dualcodec-0.1.2a2-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.1.2a2-py3-none-any.whl
Algorithm Hash digest
SHA256 04c5430b7bac458d7060b301873831644e18ed1985b82f78e9eb4ad0703f008c
MD5 fea399c2c13deaf29ef058f38757799c
BLAKE2b-256 e62ef7a040452a64c80adb336026fc2161830463ed626daaec8f3a6d9c170c96

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page