The DualCodec neural audio codec.
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
About
Installation
pip install dualcodec
News
- 2025-01-17: DualCodec inference code is released!
Available models
| Model_ID | Frame Rate | RVQ Quantizers | Semantic Codebook Size (RVQ-1 Size) | Acoustic Codebook Size (RVQ-rest Size) | Training Data |
|---|---|---|---|---|---|
| 12hz_v1 | 12.5Hz | Any from 1-8 (maximum 8) | 16384 | 4096 | 100K hours Emilia |
| 25hz_v1 | 25Hz | Any from 1-12 (maximum 12) | 16384 | 1024 | 100K hours Emilia |
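The table implies a nominal bitrate for each configuration. The sketch below works it out under one assumption: every code is stored with log2(codebook size) bits and no entropy coding is applied. The helper name `nominal_bitrate_bps` and the `MODELS` dictionary are hypothetical and are not part of the `dualcodec` package; the numbers are copied from the table above.

```python
import math

# Values copied from the model table: frame rate in Hz, maximum number of
# RVQ quantizers, semantic (RVQ-1) and acoustic (RVQ-rest) codebook sizes.
MODELS = {
    "12hz_v1": {"frame_rate": 12.5, "max_q": 8, "semantic": 16384, "acoustic": 4096},
    "25hz_v1": {"frame_rate": 25.0, "max_q": 12, "semantic": 16384, "acoustic": 1024},
}


def nominal_bitrate_bps(model_id: str, n_quantizers: int) -> float:
    """Bits per second, assuming log2(codebook size) bits per code (hypothetical helper)."""
    cfg = MODELS[model_id]
    if not 1 <= n_quantizers <= cfg["max_q"]:
        raise ValueError(f"n_quantizers must be in 1..{cfg['max_q']} for {model_id}")
    # The first quantizer uses the semantic codebook, the rest use the acoustic one.
    bits_per_frame = math.log2(cfg["semantic"]) + (n_quantizers - 1) * math.log2(cfg["acoustic"])
    return cfg["frame_rate"] * bits_per_frame


for model_id, cfg in MODELS.items():
    print(model_id, nominal_bitrate_bps(model_id, cfg["max_q"]), "bps at max quantizers")
```

Under this assumption, using fewer quantizers lowers the bitrate linearly, since each dropped acoustic quantizer removes a fixed number of bits per frame.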
How to run inference with DualCodec
1. Download the checkpoints to a local directory:
# export HF_ENDPOINT=https://hf-mirror.com # uncomment this to use a Hugging Face mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
2. To run inference on an audio file in a Python script:
import dualcodec
w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"
dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")
# do inference for your wav
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
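# reshape to [batch=1, channels=1, T]; this assumes a mono file, since
# multi-channel audio would be concatenated along the time axis
audio = audio.reshape(1,1,-1)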
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])
# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)
# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
See "example.ipynb" for a running example.
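To sanity-check the `T` dimension of the returned codes, the expected number of frames can be estimated from the clip length. This is a rough sketch that assumes 24 kHz input (as in the script above) and the frame rates listed in the model table. The true count may differ by a frame or so depending on how the model pads its input, and `samples_per_frame` and `expected_frames` are hypothetical helpers, not `dualcodec` functions.

```python
import math

SAMPLE_RATE = 24000  # Hz, the rate the script above resamples to
FRAME_RATES = {"12hz_v1": 12.5, "25hz_v1": 25.0}  # Hz, from the model table


def samples_per_frame(model_id: str) -> int:
    """Number of 24 kHz samples covered by one code frame (hypothetical helper)."""
    return int(SAMPLE_RATE / FRAME_RATES[model_id])


def expected_frames(num_samples: int, model_id: str) -> int:
    """Approximate T, assuming the last partial frame is padded rather than dropped."""
    return math.ceil(num_samples / samples_per_frame(model_id))


ten_seconds = 10 * SAMPLE_RATE
for model_id in FRAME_RATES:
    print(model_id, samples_per_frame(model_id), expected_frames(ten_seconds, model_id))
```

If the `T` reported by `inference.encode` is far from this estimate, the most likely causes are a skipped resampling step or a multi-channel input.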
DualCodec-based TTS models
DualCodec-based TTS
Benchmark results
DualCodec audio quality
DualCodec-based TTS
Training DualCodec
Stay tuned for the training code release!
Citation