Metrics to measure the quality of audio

Audio Metrics

This repository contains a Python package for computing distribution-based quality metrics for audio data from embeddings, with a focus on music.

It features the following metrics:

Accompaniment Prompt Adherence (APA; see https://arxiv.org/abs/2503.06346)
Fréchet Distance (see https://arxiv.org/abs/1812.08466)
Kernel Distance/Maximum Mean Discrepancy (see https://arxiv.org/abs/1812.08466)
Density and Coverage (see https://arxiv.org/abs/2002.09797)

The measures share two properties: they compare a set of candidate audio tracks against a set of reference tracks rather than evaluating individual tracks, and they operate on embedding representations of audio obtained from models pretrained on tasks such as audio classification.
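To make the set-versus-set comparison concrete, here is a minimal NumPy sketch of the Fréchet Distance and a Kernel Distance (MMD) estimate computed directly on two embedding arrays of shape (n_items, dim). It is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and the cubic polynomial kernel is an assumed choice that may differ from the kernel the package uses.

```python
import numpy as np


def frechet_distance(ref_emb, cand_emb):
    """Frechet distance between Gaussians fitted to two embedding sets."""
    mu_r, mu_c = ref_emb.mean(axis=0), cand_emb.mean(axis=0)
    cov_r = np.cov(ref_emb, rowvar=False)
    cov_c = np.cov(cand_emb, rowvar=False)
    # tr(sqrt(cov_r @ cov_c)) equals the sum of the square roots of the
    # eigenvalues of the product. These are real and non-negative in exact
    # arithmetic, so drop imaginary parts and clip rounding noise.
    eigvals = np.linalg.eigvals(cov_r @ cov_c).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_r - mu_c
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_c) - 2.0 * tr_sqrt)


def kernel_distance(ref_emb, cand_emb):
    """Unbiased estimate of squared MMD with an assumed cubic polynomial kernel."""
    dim = ref_emb.shape[1]

    def kernel(a, b):
        # Assumption: k(x, y) = (x . y / dim + 1)^3
        return (a @ b.T / dim + 1.0) ** 3

    k_rr = kernel(ref_emb, ref_emb)
    k_cc = kernel(cand_emb, cand_emb)
    k_rc = kernel(ref_emb, cand_emb)
    m, n = len(ref_emb), len(cand_emb)
    # Within-set terms exclude the diagonal (each item paired with itself).
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_cc = (k_cc.sum() - np.trace(k_cc)) / (n * (n - 1))
    return float(term_rr + term_cc - 2.0 * k_rc.mean())


# Usage with random stand-ins for real embeddings
rng = np.random.default_rng(0)
reference_emb = rng.normal(size=(200, 8))
candidate_emb = rng.normal(size=(200, 8)) + 0.5
print(frechet_distance(reference_emb, candidate_emb))
print(kernel_distance(reference_emb, candidate_emb))
```

Both functions return a single number for the whole candidate set, which is why these measures cannot score an individual track.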

Fréchet Distance and Kernel Distance are typically used to measure audio quality (i.e. the naturalness of the sound and the absence of acoustic artifacts). Density and Coverage explicitly measure how well the candidate set coincides with the reference set by comparing the embedding manifolds.
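The manifold comparison behind Density and Coverage can be sketched as follows, based on the definitions in the paper linked above (https://arxiv.org/abs/2002.09797): each reference embedding gets a ball whose radius is the distance to its k-th nearest reference neighbour, and candidates are counted against those balls. The function names and the default k=5 are assumptions made for illustration; the package's own code may differ.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between all rows of a and all rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def density_and_coverage(ref_emb, cand_emb, k=5):
    """Density and coverage of a candidate set relative to a reference set."""
    d_rr = pairwise_distances(ref_emb, ref_emb)
    # After sorting each row, column 0 is the zero self-distance, so
    # column k holds the distance to the k-th nearest other reference item.
    radii = np.sort(d_rr, axis=1)[:, k]
    d_rc = pairwise_distances(ref_emb, cand_emb)  # shape (n_ref, n_cand)
    inside = d_rc < radii[:, None]
    # Density: average number of reference balls containing a candidate,
    # normalised by k.
    density = inside.sum() / (k * cand_emb.shape[0])
    # Coverage: fraction of reference balls containing at least one candidate.
    coverage = inside.any(axis=1).mean()
    return float(density), float(coverage)


# Usage with random stand-ins for real embeddings
rng = np.random.default_rng(0)
reference_emb = rng.normal(size=(100, 4))
candidate_emb = rng.normal(size=(100, 4))
print(density_and_coverage(reference_emb, candidate_emb))
```

Density can exceed 1 when candidates cluster in densely populated reference regions, whereas coverage is bounded by 1.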

The Accompaniment Prompt Adherence measure operates on sets whose elements are pairs of audio tracks, typically a mix and an accompaniment, and quantifies how well the accompaniment fits the mix.

The measures can be combined with embeddings from any of the following models:

VGGish (https://arxiv.org/abs/1609.09430): trained on audio event classification; 128-dimensional embeddings from the last feature layer before the classification layer.
LAION CLAP (https://github.com/LAION-AI/CLAP): using either the checkpoint trained on music and speech or the checkpoint trained on music only; embeddings from the last three layers (512, 512, and 128-dimensional).

Installation

Install from PyPI:

pip install audio-metrics

To run the examples, install with the examples dependencies:

pip install 'audio-metrics[examples]'

Development Installation

For development or to use the latest version from source:

git clone https://github.com/SonyCSLParis/audio-metrics.git
cd audio-metrics
pip install -e '.[dev]'

Usage

The following examples demonstrate the use of the package. For more examples, see the ./examples directory.

import numpy as np
from audio_metrics import AudioMetrics

sr = 48000
n_seconds = 5

n_windows = 100
window_len = sr * n_seconds

# create random audio signals
reference = np.random.random((n_windows, window_len))
candidate = np.random.random((n_windows, window_len))

metrics = AudioMetrics(
    metrics=[
        "prdc",  # precision, recall, density, coverage
        "fad",  # frechet audio distance
        "kd",  # kernel distance
    ],
    input_sr=sr,
)
metrics.add_reference(reference)

print(metrics.evaluate(candidate))

# To compute APA, the input data must be pairs of context and stem (in the
# trailing dimension)
reference = np.random.random((n_windows, window_len, 2))
# Data can also be passed as a generator, to facilitate processing larger
# datasets
candidate = (np.random.random((window_len, 2)) for _ in range(n_windows))

# stem-only metrics (like FAD) can be computed simultaneously with APA
metrics = AudioMetrics(
    metrics=["fad", "apa"],
    input_sr=sr,
)
metrics.add_reference(reference)
print(metrics.evaluate(candidate))

When computing APA, the reference and candidate sets must consist of pairs of context and stem. When FAD and/or PRDC are computed as additional metrics, they are computed on the stems only; the contexts are ignored for these metrics.
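Real recordings usually arrive as long one-dimensional signals rather than ready-made windows. The sketch below shows one way to cut two aligned signals into fixed-length windows and stack them into the (n_windows, window_len, 2) layout used in the example above. The helper names are hypothetical, and placing the context at index 0 and the stem at index 1 of the trailing dimension is an assumption; check the package's examples for the expected order.

```python
import numpy as np


def to_windows(signal, window_len):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    n_windows = len(signal) // window_len
    return signal[: n_windows * window_len].reshape(n_windows, window_len)


def stack_pairs(context, stem):
    """Stack two (n_windows, window_len) arrays along a new trailing axis."""
    if context.shape != stem.shape:
        raise ValueError("context and stem must have the same shape")
    # Assumption: index 0 holds the context, index 1 holds the stem.
    return np.stack([context, stem], axis=-1)


sr = 48000
window_len = sr * 5

# Random stand-ins for 12 seconds of aligned context and stem audio
rng = np.random.default_rng(0)
context_signal = rng.random(sr * 12)
stem_signal = rng.random(sr * 12)

pairs = stack_pairs(
    to_windows(context_signal, window_len),
    to_windows(stem_signal, window_len),
)
print(pairs.shape)  # (2, 240000, 2)
```

The last two seconds of each signal are discarded here because they do not fill a complete window.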

Citation

To cite this work, please use:

M. Grachten and J. Nistal. (2025). Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. IEEE. Hyderabad, India.

Project details

Release history

1.0.4 - Jan 15, 2026
1.0.3 - Oct 17, 2025 (this version)
