The DualCodec neural audio codec.

Project description

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

About

Installation

pip install dualcodec

News

2025-01-17: DualCodec inference code is released!

Available models

Model_ID	Frame Rate	RVQ Quantizers	Semantic Codebook Size (RVQ-1 Size)	Acoustic Codebook Size (RVQ-rest Size)	Training Data
12hz_v1	12.5Hz	Any from 1-8 (maximum 8)	16384	4096	100K hours Emilia
25hz_v1	25Hz	Any from 1-12 (maximum 12)	16384	1024	100K hours Emilia

How to inference DualCodec

1. Download checkpoints to local:

# export HF_ENDPOINT=https://hf-mirror.com      # uncomment this to use huggingface mirror if you're in China
huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts

2. To inference an audio in a python script:

import dualcodec

w2v_path = "./w2v-bert-2.0" # your downloaded path
dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
model_id = "12hz_v1" # select from available Model_IDs, "12hz_v1" or "25hz_v1"

dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")

# do inference for your wav
import torchaudio
audio, sr = torchaudio.load("YOUR_WAV.wav")
# resample to 24kHz
audio = torchaudio.functional.resample(audio, sr, 24000)
audio = audio.reshape(1,1,-1)
# extract codes, for example, using 8 quantizers here:
semantic_codes, acoustic_codes = inference.encode(audio, n_quantizers=8)
# semantic_codes shape: torch.Size([1, 1, T])
# acoustic_codes shape: torch.Size([1, n_quantizers-1, T])

# produce output audio
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)

# save output audio
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)

See "example.ipynb" for a running example.

DualCodec-based TTS models

DualCodec-based TTS

Benchmark results

DualCodec audio quality

DualCodec-based TTS

Training DualCodec

Stay tuned for the training code release!

Citation

Project details

Release history Release notifications | RSS feed

0.4.2

Aug 22, 2025

0.4.1

May 27, 2025

0.4.0

May 27, 2025

0.3.7

May 15, 2025

0.3.6

Mar 30, 2025

0.3.3

Feb 17, 2025

0.3.2

Jan 22, 2025

0.3.1

Jan 22, 2025

0.3.0

Jan 22, 2025

This version

0.2.1

Jan 15, 2025

0.1.3

Jan 13, 2025

0.1.2

Jan 12, 2025

0.1.2a2 pre-release

Jan 13, 2025

0.1.1

Jan 12, 2025

0.1.0

Jan 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dualcodec-0.2.1.tar.gz (1.3 MB view details)

Uploaded Jan 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dualcodec-0.2.1-py3-none-any.whl (34.4 kB view details)

Uploaded Jan 15, 2025 Python 3

File details

Details for the file dualcodec-0.2.1.tar.gz.

File metadata

Download URL: dualcodec-0.2.1.tar.gz
Upload date: Jan 15, 2025
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`479b90cf718869f761454c5f760c001354e933486e82c9962c7e9739b01b1f2c`
MD5	`1dc891bba47167045eaf41c90d7e7217`
BLAKE2b-256	`18354b15ffbc005b07b17d973a6549f3dbc724ca66dc487333fe98da1b82d432`

See more details on using hashes here.

File details

Details for the file dualcodec-0.2.1-py3-none-any.whl.

File metadata

Download URL: dualcodec-0.2.1-py3-none-any.whl
Upload date: Jan 15, 2025
Size: 34.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dualcodec-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8aac88633180f3c802a3d3b931a752c36728094a59040402d2b4ce9e4c58a15e`
MD5	`ae0121a0ec39e0a9f6cac13d3b3c69ee`
BLAKE2b-256	`6f4413fc18366b0dd1c5f137b0f7fc76292854ca829fb7d06172fbb6969cdcd0`

See more details on using hashes here.

dualcodec 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

About

Installation

News

Available models

How to inference DualCodec

1. Download checkpoints to local:

2. To inference an audio in a python script:

DualCodec-based TTS models

DualCodec-based TTS

Benchmark results

DualCodec audio quality

DualCodec-based TTS

Training DualCodec

Citation

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes