The DualCodec neural audio codec.

DualCodec: A Speech Generation-Oriented Neural Audio Codec with Dual Encoding of Waveform and Self-Supervised Feature

Installation

pip install dualcodec

Available models

Model_ID	Frame Rate	RVQ Quantizers	Semantic Codebook Size (RVQ-1 Size)	Acoustic Codebook Size (RVQ-rest Size)	Training Data
12hz_v1	12.5Hz	Any from 1-8 (maximum 8)	16384	4096	100K hours Emilia
25hz_v1	25Hz	Any from 1-12 (maximum 12)	16384	1024	100K hours Emilia
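
The figures in the table determine the token bitrate for a given number of quantizers. The sketch below is a back-of-the-envelope calculation, not part of the dualcodec package: the `MODELS` dictionary and the `bits_per_second` helper are hypothetical names, and the formula assumes each frame emits one semantic token (RVQ-1) plus `n_quantizers - 1` acoustic tokens, each costing log2(codebook size) bits.

```python
import math

# Values copied from the "Available models" table; the dict layout is illustrative.
MODELS = {
    "12hz_v1": {"frame_rate": 12.5, "max_quantizers": 8,
                "semantic_size": 16384, "acoustic_size": 4096},
    "25hz_v1": {"frame_rate": 25.0, "max_quantizers": 12,
                "semantic_size": 16384, "acoustic_size": 1024},
}


def bits_per_second(model_id: str, n_quantizers: int) -> float:
    """Estimate the token bitrate for a model and quantizer count."""
    spec = MODELS[model_id]
    if not 1 <= n_quantizers <= spec["max_quantizers"]:
        raise ValueError(
            f"{model_id} supports 1 to {spec['max_quantizers']} quantizers"
        )
    # One semantic token plus (n_quantizers - 1) acoustic tokens per frame.
    bits_per_frame = (
        math.log2(spec["semantic_size"])
        + (n_quantizers - 1) * math.log2(spec["acoustic_size"])
    )
    return spec["frame_rate"] * bits_per_frame


if __name__ == "__main__":
    print(bits_per_second("12hz_v1", 8))   # 1225.0
    print(bits_per_second("25hz_v1", 12))  # 3100.0
```

Under these assumptions, using only the semantic codebook (one quantizer) gives 175 bits per second for 12hz_v1 and 350 bits per second for 25hz_v1.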

How to run inference with DualCodec

1. Download the checkpoints to a local directory:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
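
Before loading the models, it can help to confirm that both downloads actually landed on disk. The following is a minimal sketch using only the standard library; `missing_checkpoint_dirs` is a hypothetical helper, not a dualcodec function, and it only checks that each directory exists and is non-empty, since the exact file names inside the checkpoints are not listed here.

```python
from pathlib import Path


def missing_checkpoint_dirs(paths):
    """Return the entries of `paths` that are not existing, non-empty directories."""
    missing = []
    for raw in paths:
        directory = Path(raw)
        # A directory counts as present only if it exists and holds at least one entry.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(raw)
    return missing


if __name__ == "__main__":
    # Directory names match the --local-dir arguments used in the download commands.
    problems = missing_checkpoint_dirs(["./w2v-bert-2.0", "./dualcodec_ckpts"])
    if problems:
        print("Missing or empty checkpoint directories:", problems)
    else:
        print("Both checkpoint directories are present.")
```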

2. Run inference on an audio file in a Python script:

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

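# run inference on your wav
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# downmix to mono so the reshape below does not concatenate channels
audio = audio.mean(dim=0, keepdim=True)
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
# add a batch dimension: (batch=1, channels=1, samples)
audio = audio.reshape(1,1,-1)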
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
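
The length T of the code tensors follows from the frame rate and the 24 kHz sample rate used above. The sketch below estimates it; `expected_frames` is a hypothetical helper, and rounding up to a whole frame is an assumption, so the codec's actual padding may make the real T differ by a frame.

```python
import math

SAMPLE_RATE = 24000  # sample rate the audio is resampled to in the script above


def expected_frames(num_samples: int, frame_rate_hz: float) -> int:
    """Estimate the number of code frames T for a clip of `num_samples` samples."""
    # 12.5 Hz gives 1920 samples per frame; 25 Hz gives 960.
    samples_per_frame = SAMPLE_RATE / frame_rate_hz
    # Assumption: a trailing partial frame is padded to a full frame.
    return math.ceil(num_samples / samples_per_frame)


if __name__ == "__main__":
    ten_seconds = 10 * SAMPLE_RATE
    print(expected_frames(ten_seconds, 12.5))  # 125
    print(expected_frames(ten_seconds, 25.0))  # 250
```

Comparing this estimate with `semantic_codes.shape[-1]` is a quick way to check that the audio was resampled and reshaped as intended.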

See "example.ipynb" for a running example.

DualCodec-based TTS models

Benchmarking

Link to DualCodec-based TTS repositories

Training DualCodec

Stay tuned for the training code release, which is expected within two weeks.

Citation

Release history

Version	Release date
0.4.2	Aug 22, 2025
0.4.1	May 27, 2025
0.4.0	May 27, 2025
0.3.7	May 15, 2025
0.3.6	Mar 30, 2025
0.3.3	Feb 17, 2025
0.3.2	Jan 22, 2025
0.3.1	Jan 22, 2025
0.3.0	Jan 22, 2025
0.2.1	Jan 15, 2025
0.1.3 (this version)	Jan 13, 2025
0.1.2	Jan 12, 2025
0.1.2a2 (pre-release)	Jan 13, 2025
0.1.1	Jan 12, 2025
0.1.0	Jan 12, 2025
