A toolkit library for Kernel Audio Distance.

Kernel Audio Distance Toolkit

The Kernel Audio Distance Toolkit (KADTK) provides an efficient and standardized implementation of Kernel Audio Distance (KAD), a distribution-free, unbiased, and computationally efficient metric for evaluating generative audio.

1. Installation

To use the KAD toolkit, you must first install it. The library was developed and tested with Python 3.10 on Linux, but it should work with Python >=3.9,<3.12.

1.1 Install

Requirement: install torch first (see the PyTorch installation pages, including the one for previous versions); only torch >=2.1,<2.6 is officially supported.

To install the KAD toolkit, run:

pip install kadtk

To reproduce our exact tested environment, run:

git clone https://github.com/YoonjinXD/kadtk.git &&
cd kadtk &&
pip install poetry==2.0.1 &&
poetry install &&
pip install -e .
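
The version ranges stated above (Python >=3.9,<3.12 and torch >=2.1,<2.6) can be checked before running the toolkit. The following is a minimal sketch using only the standard library; the helper names (major_minor, in_range, check_environment) are hypothetical and not part of kadtk.

```python
import re
import sys
from importlib import metadata


def major_minor(version: str):
    """Extract (major, minor) from a version string such as '2.5.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_range(version: str, low, high) -> bool:
    """Return True if low <= (major, minor) < high."""
    return low <= major_minor(version) < high


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    if not in_range(py, (3, 9), (3, 12)):
        problems.append(f"Python {py} is outside >=3.9,<3.12")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not in_range(torch_version, (2, 1), (2, 6)):
            problems.append(f"torch {torch_version} is outside >=2.1,<2.6")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The check compares only major and minor version numbers, which matches the granularity of the ranges given in this section.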

1.2 Troubleshooting

If scipy raises an error, reinstall it: pip uninstall scipy && pip install scipy==1.11.2
If charset raises an error, (re)install chardet: pip install chardet
If CUDA raises an error, make sure your machine has a CUDA-compatible GPU and that the software required for CUDA support is installed.

2. Usage

The toolkit provides a CLI command for computing KAD scores. It automatically extracts embeddings and computes the KAD score between your reference set (e.g. ground truth) and target evaluation set (e.g. generated audio).

kadtk {model_name} {reference-set dir} {target-set dir}

Note that:

KAD generally takes a different value when the reference set and the target set are swapped, because the kernel bandwidth for the MMD is set to the median distance between the embeddings of the reference set. This keeps the meaning of the score consistent even when the target set changes.
KAD is based on an unbiased, finite-sample estimate of the MMD, so it can be negative if there are too few samples and/or if the two embedding sets are very close in distribution. Refer to our paper for more details.
Make sure that the reference set always contains the ground truth samples (e.g. AudioCaps or Clotho for text-to-audio) and that the target set contains the generated samples.
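
The notes above can be illustrated with a small NumPy sketch. It is not the kadtk implementation: it assumes a Gaussian (RBF) kernel, uses random vectors in place of real audio embeddings, omits any scaling the toolkit may apply, and all function names are hypothetical. It only shows why a bandwidth taken from the reference set makes the score asymmetric, and where the unbiased estimator drops the diagonal terms (which is what allows small negative values).

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    a2 = (a * a).sum(axis=1)[:, None]
    b2 = (b * b).sum(axis=1)[None, :]
    return np.maximum(a2 + b2 - 2.0 * (a @ b.T), 0.0)


def median_bandwidth(ref):
    """Median distance between distinct pairs of reference embeddings."""
    dists = np.sqrt(pairwise_sq_dists(ref, ref))
    upper = np.triu_indices(len(ref), k=1)  # each pair once, no diagonal
    return float(np.median(dists[upper]))


def mmd2_unbiased(ref, tgt, sigma):
    """Unbiased estimate of squared MMD with a Gaussian kernel (assumed)."""
    m, n = len(ref), len(tgt)
    gamma = 1.0 / (2.0 * sigma ** 2)
    k_xx = np.exp(-gamma * pairwise_sq_dists(ref, ref))
    k_yy = np.exp(-gamma * pairwise_sq_dists(tgt, tgt))
    k_xy = np.exp(-gamma * pairwise_sq_dists(ref, tgt))
    # Excluding the diagonal (i == j) is what makes the estimate unbiased;
    # it also means the result is not guaranteed to be non-negative.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


def sketch_kad(ref, tgt):
    """MMD estimate whose bandwidth comes from the reference set only."""
    return mmd2_unbiased(ref, tgt, median_bandwidth(ref))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 8))   # stand-in for real embeddings
generated = rng.normal(2.0, 3.0, size=(200, 8))   # stand-in for generated audio

print("ref -> tgt:", sketch_kad(reference, generated))
print("tgt -> ref:", sketch_kad(generated, reference))
```

With a fixed bandwidth the estimate is symmetric in its two arguments; the two printed values differ only because each call derives the bandwidth from whichever set is passed first.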

(Enable Options)

--fad computes Fréchet Audio Distance instead of Kernel Audio Distance.
--inf uses metric-inf extrapolation.
--indiv calculates the metric for individual audio files.
--force-emb-encode forces re-extraction of embeddings instead of using the cache.
--force_stats-calc forces re-calculation of kernel statistics instead of using the cache.

(Examples)

kadtk panns-wavegram-logmel {reference-set dir} {target-set dir} # calculates KAD between the two dirs (each dir should contain wav files)
kadtk vggish {reference-set dir} {target-set dir} --fad # calculates FAD instead of KAD
kadtk passt-fsd50k {reference-set dir} {target-set dir} --csv scores.csv # saves the results in scores.csv
kadtk-embeds -m wavlm-base -d {reference-set dir} {target-set dir} # only extracts and saves the embeddings
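
When the CLI is called from a script (for example, to loop over several models), the command lines above can be assembled programmatically. The sketch below uses only the positional arguments and the --fad and --csv flags shown in this section; the helper names and the directory paths are hypothetical, and running the command requires kadtk to be installed.

```python
import shlex
import subprocess
from typing import List, Optional


def build_kadtk_command(
    model_name: str,
    reference_dir: str,
    target_dir: str,
    fad: bool = False,
    csv_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for: kadtk {model_name} {reference} {target}."""
    cmd = ["kadtk", model_name, reference_dir, target_dir]
    if fad:
        cmd.append("--fad")  # FAD instead of KAD
    if csv_path is not None:
        cmd += ["--csv", csv_path]  # save results to a CSV file
    return cmd


def run_kadtk(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder directories; replace with real reference and target sets.
    cmd = build_kadtk_command("vggish", "data/reference", "data/generated", fad=True)
    print(shlex.join(cmd))
    # run_kadtk(cmd)  # uncomment once kadtk is installed and the dirs exist
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.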

3. Supported Models

Model	Name in KADtk	Description	Creator
CLAP	`clap-2023`	general audio representation	Microsoft
CLAP	`clap-laion-{audio/music}`	general audio, music representation	LAION
MERT	`MERT-v1-95M-{layer}`	music understanding	m-a-p
VGGish	`vggish`	general audio embedding	Google
PANNs	`panns-cnn14-{16k/32k}, panns-wavegram-logmel`	general audio embedding	Kong, Qiuqiang, et al.
OpenL3	`openl3-{mel256/mel128}-{env/music}`	general audio embedding	Cramer, Aurora et al.
PaSST	`passt-base-{10s/20s/30s}, passt-openmic, passt-fsd50k` (10s default, base for AudioSet)	general audio embedding	Koutini, Khaled et al.
Encodec	`encodec-emb`	audio codec	Facebook/Meta Research
DAC	`dac-44kHz`	audio codec	Descript
CDPAM	`cdpam-{acoustic/content}`	perceptual audio metric	Pranay Manocha et al.
Wav2vec 2.0	`w2v2-{base/large}`	speech representation	Facebook/Meta Research
HuBERT	`hubert-{base/large}`	speech representation	Facebook/Meta Research
WavLM	`wavlm-{base/base-plus/large}`	speech representation	Microsoft
Whisper	`whisper-{tiny/base/small/medium/large}`	speech recognition	OpenAI

Optional Dependencies

You can optionally install dependencies that add support for further embedding models:

CDPAM: pip install cdpam
DAC: pip install descript-audio-codec==1.0.0
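
Before requesting a `cdpam-*` or `dac-44kHz` model, it can help to check whether the optional packages are present. The sketch below assumes the importable module names are `cdpam` and `dac` (the latter for descript-audio-codec); the helper name is hypothetical and not part of kadtk.

```python
from importlib.util import find_spec

# Assumed mapping from model family to the top-level module it needs.
OPTIONAL_MODULES = {
    "CDPAM": "cdpam",
    "DAC": "dac",
}


def optional_dependency_status():
    """Return {family: True/False} depending on whether the module is importable."""
    return {
        family: find_spec(module) is not None
        for family, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for family, available in optional_dependency_status().items():
        print(f"{family}: {'installed' if available else 'missing'}")
```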

4. Citation, Acknowledgments and Licenses

@article{kad,
    author={Chung, Yoonjin and Eu, Pilsun and Lee, Junwon and Choi, Keunwoo and Nam, Juhan and Chon, Ben Sangbae},
    title={KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation}, 
    journal = {arXiv:2502.15602},
    url = {https://arxiv.org/abs/2502.15602},
    year = {2025}
}

We sincerely thank the authors of the following projects and papers for sharing their code as open source: fadtk and fadtk with PANNs.

@article{fad_embeddings,
    author = {Tailleur, Modan and Lee, Junwon and Lagrange, Mathieu and Choi, Keunwoo and Heller, Laurie M. and Imoto, Keisuke and Okamoto, Yuki},
    title = {Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant},
    journal = {arXiv:2403.17508},
    url = {https://arxiv.org/abs/2403.17508},
    year = {2024}
}

@inproceedings{fadtk,
  title = {Adapting Frechet Audio Distance for Generative Music Evaluation},
  author = {Azalea Gui and Hannes Gamper and Sebastian Braun and Dimitra Emmanouilidou},
  booktitle = {Proc. IEEE ICASSP 2024},
  year = {2024},
  url = {https://arxiv.org/abs/2311.01616},
}

Release history

1.1.0 (this version): Mar 16, 2025
1.0.0: Mar 1, 2025

