The DualCodec neural audio codec.
DualCodec: A Speech Generation-Oriented Neural Audio Codec with Dual Encoding of Waveform and Self-Supervised Feature
Installation
pip install dualcodec
Available models
| Model_ID | Frame Rate | RVQ Quantizers | Semantic Codebook Size (RVQ-1 Size) | Acoustic Codebook Size (RVQ-rest Size) | Training Data |
|---|---|---|---|---|---|
| 12hz_v1 | 12.5 Hz | 1 to 8 | 16384 | 4096 | 100K hours Emilia |
| 25hz_v1 | 25 Hz | 1 to 12 | 16384 | 1024 | 100K hours Emilia |
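The table implies a nominal bitrate for each configuration: every frame carries one semantic code plus `n_quantizers - 1` acoustic codes, and a codebook of size N needs log2(N) bits per code. The sketch below works through that arithmetic. It is not part of the `dualcodec` package; the `MODELS` dictionary and `nominal_bitrate_bps` are hypothetical names, and the figures assume fixed-length coding with no entropy coding.

```python
import math

# Values copied from the "Available models" table above.
MODELS = {
    "12hz_v1": {"frame_rate_hz": 12.5, "max_quantizers": 8,
                "semantic_codebook": 16384, "acoustic_codebook": 4096},
    "25hz_v1": {"frame_rate_hz": 25.0, "max_quantizers": 12,
                "semantic_codebook": 16384, "acoustic_codebook": 1024},
}


def nominal_bitrate_bps(model_id: str, n_quantizers: int) -> float:
    """Nominal bits per second, assuming fixed-length codes (hypothetical helper)."""
    spec = MODELS[model_id]
    if not 1 <= n_quantizers <= spec["max_quantizers"]:
        raise ValueError(
            f"{model_id} supports 1 to {spec['max_quantizers']} quantizers"
        )
    # RVQ-1 is the semantic codebook; the remaining quantizers are acoustic.
    semantic_bits = math.log2(spec["semantic_codebook"])
    acoustic_bits = math.log2(spec["acoustic_codebook"])
    bits_per_frame = semantic_bits + (n_quantizers - 1) * acoustic_bits
    return spec["frame_rate_hz"] * bits_per_frame


if __name__ == "__main__":
    print(nominal_bitrate_bps("12hz_v1", 8))   # 12.5 * (14 + 7 * 12)
    print(nominal_bitrate_bps("25hz_v1", 12))  # 25 * (14 + 11 * 10)
```

Using fewer quantizers lowers the bitrate proportionally, which is the trade-off the "RVQ Quantizers" column exposes.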
How to run inference with DualCodec
1. Download the checkpoints to a local directory:
# export HF_ENDPOINT=https://hf-mirror.com # uncomment this to use the Hugging Face mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
2. To run inference on an audio file in a Python script:
import dualcodec
w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # or "25hz_v1"
dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")
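# run inference on your wav file
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# downmix to mono so the reshape below does not concatenate channels
audio = audio.mean(dim=0, keepdim=True)
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
# add a batch dimension: (batch=1, channels=1, samples)
audio = audio.reshape(1,1,-1)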
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])
# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)
# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
See "example.ipynb" for a running example.
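As a sanity check on the shapes noted in the script, the number of frames `T` can be estimated from the audio length and the model's frame rate. The sketch below is pure Python and does not call `dualcodec`; `expected_code_shapes` is a hypothetical helper, and rounding up to whole frames is an assumption, so the real `T` may differ by a frame or so depending on how the model pads its input.

```python
import math

SAMPLE_RATE = 24000  # the script above resamples the input to 24 kHz

# Frame rates from the "Available models" table.
FRAME_RATES_HZ = {"12hz_v1": 12.5, "25hz_v1": 25.0}


def expected_code_shapes(model_id: str, num_samples: int, n_quantizers: int):
    """Estimate the shapes of (semantic_codes, acoustic_codes) for one clip."""
    # 24000 / 12.5 = 1920 samples per frame; 24000 / 25 = 960.
    samples_per_frame = SAMPLE_RATE / FRAME_RATES_HZ[model_id]
    # Assumption: a partial trailing frame is rounded up to a full frame.
    num_frames = math.ceil(num_samples / samples_per_frame)
    semantic_shape = (1, 1, num_frames)
    acoustic_shape = (1, n_quantizers - 1, num_frames)
    return semantic_shape, acoustic_shape


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE
    print(expected_code_shapes("12hz_v1", ten_seconds, 8))
    print(expected_code_shapes("25hz_v1", ten_seconds, 12))
```

Comparing these estimates with `semantic_codes.shape` and `acoustic_codes.shape` is a quick way to confirm that the audio was resampled and reshaped as intended.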
DualCodec-based TTS models
Benchmarking
Link to DualCodec-based TTS repositories
Training DualCodec
Stay tuned for the training code release, which should arrive within two weeks.
Citation