X-ARES: eXtensive Audio Representation and Evaluation Suite

Introduction

X-ARES is a benchmark for evaluating audio encoders on various audio tasks. It is heavily inspired by the HEAR benchmark.

Supported tasks

Speech

Speech Commands V2
LibriCount
VoxLingua107
VoxCeleb1
LibriSpeech-Male-Female
Fluent Speech Commands
VocalSound
CREMA-D
RAVDESS
LibriSpeech-Phoneme
ASVspoof2015

Environment

ESC-50
FSD50K
UrbanSound8K
DESED
FSD18-Kaggle
Clotho

Music

MAESTRO
GTZAN Genre
NSynth
FMA

Installation

X-ARES is available on PyPI. You can install it via pip.

pip install xares

For development, you can clone the repository and install the package in editable mode.

git clone <this-repo>
cd xares
pip install -e ".[examples]"

Run with the baseline pretrained audio encoder (Dasheng)

You can run the benchmark with the baseline pretrained audio encoder (Dasheng) with 8 parallel jobs using the following command:

python -m xares.run --max-jobs 8 example/dasheng/dasheng_encoder.py src/tasks/*.py

This command downloads the datasets from Zenodo and then evaluates the encoder on all tasks. If the automatic download fails, you can download the datasets manually using tools/download_manually.sh.

Alternatively, you can run tasks from within Python. Here is an example of running the ASVspoof2015 task in a single process:

>>> from example.dasheng.dasheng_encoder import DashengEncoder
>>> from tasks.asvspoof_task import asvspoof2015_config
>>> from xares.task import XaresTask

>>> task = XaresTask(encoder=DashengEncoder(), config=asvspoof2015_config())
>>> task.run()

Baseline Results

X-ARES provides two evaluation methods to assess the quality of audio representations: MLP (Linear Fine-Tuning) and kNN (Unparameterized Evaluation).

MLP: Linear Fine-Tuning on Task-Specific Data. A linear layer is trained on the embeddings produced by the user's encoder, using predefined hyperparameters for each task. Because the encoder's own parameters are never altered, this method measures how well the fixed, pre-trained representations adapt to new, task-specific contexts.
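The idea of training a single linear layer on frozen embeddings can be illustrated with a minimal NumPy sketch. This is not the X-ARES implementation: the function names, learning rate, epoch count, and toy data are all assumptions chosen for illustration, and random Gaussian clusters stand in for real encoder output.

```python
import numpy as np


def train_linear_probe(embeddings, labels, num_classes, lr=0.1, epochs=200):
    """Fit one softmax layer on fixed embeddings with batch gradient descent.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    num_samples, dim = embeddings.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = embeddings @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the logits.
        grad = (probs - one_hot) / num_samples
        weights -= lr * (embeddings.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(embeddings, weights, bias):
    """Return the class with the highest linear score for each embedding."""
    return np.argmax(embeddings @ weights + bias, axis=1)


# Toy stand-in for encoder output: two well-separated clusters in 8 dimensions.
rng = np.random.default_rng(0)
train_x = np.concatenate(
    [rng.normal(-2.0, 0.5, (50, 8)), rng.normal(2.0, 0.5, (50, 8))]
)
train_y = np.array([0] * 50 + [1] * 50)

weights, bias = train_linear_probe(train_x, train_y, num_classes=2)
accuracy = float((predict(train_x, weights, bias) == train_y).mean())
print(f"train accuracy: {accuracy:.3f}")
```

Only the linear layer's weights and bias are updated here; the embeddings are treated as constants, which mirrors the frozen-encoder setting described above.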

kNN: Unparameterized Evaluation. Pre-trained model embeddings are used directly for k-nearest neighbor (kNN) classification, with no training. This method evaluates the inherent quality of the audio representations without any fine-tuning. Although it may not yield the highest performance in real-world applications, it is a rigorous test of the representational power of the embeddings: with no parameterized layers involved, it shows how well the model itself captures essential features of the audio data.
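As a rough illustration of kNN classification on embeddings, the sketch below assigns each test embedding the majority label among its nearest training embeddings. It is an assumption-laden example, not the X-ARES code: the function name, the value of k, the use of Euclidean distance, and the toy data are all hypothetical.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5):
    """Majority vote among the k nearest training embeddings (Euclidean).

    Hypothetical helper; the distance metric and k are illustrative choices.
    """
    # Pairwise squared distances, shape (num_test, num_train).
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = train_y[nearest]  # shape (num_test, k)
    num_classes = int(train_y.max()) + 1
    # Count the votes each class receives for every test embedding.
    votes = np.stack(
        [(neighbor_labels == c).sum(axis=1) for c in range(num_classes)], axis=1
    )
    return votes.argmax(axis=1)


# Toy stand-in for encoder output: two well-separated clusters in 4 dimensions.
rng = np.random.default_rng(1)
train_x = np.concatenate(
    [rng.normal(-2.0, 0.5, (40, 4)), rng.normal(2.0, 0.5, (40, 4))]
)
train_y = np.array([0] * 40 + [1] * 40)
test_x = np.concatenate(
    [rng.normal(-2.0, 0.5, (10, 4)), rng.normal(2.0, 0.5, (10, 4))]
)
test_y = np.array([0] * 10 + [1] * 10)

knn_accuracy = float((knn_predict(train_x, train_y, test_x) == test_y).mean())
print(f"kNN accuracy: {knn_accuracy:.3f}")
```

No parameters are fitted at any point; the result depends only on the geometry of the embedding space.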

Here are the evaluation results for several baseline models using MLP and kNN methods. The weighted average is calculated using the test set size for each dataset.
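The weighting by test set size can be written out as a short calculation. The task names and test set sizes below are hypothetical placeholders, not the actual X-ARES figures, and the helper function is illustrative only.

```python
def weighted_average(scores, test_sizes):
    """Average per-task scores, weighting each task by its test set size."""
    total = sum(test_sizes[name] for name in scores)
    return sum(scores[name] * test_sizes[name] for name in scores) / total


# Hypothetical scores and test set sizes, for illustration only.
scores = {"task_a": 0.90, "task_b": 0.50}
test_sizes = {"task_a": 100, "task_b": 300}

result = weighted_average(scores, test_sizes)
print(f"weighted average: {result:.3f}")
```

Under this scheme a task with a large test set pulls the overall figure toward its own score, so the weighted average can differ noticeably from the plain mean of the per-task scores.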

MLP Result

Dataset	dasheng	wav2vec2	whisper	data2vec
asvspoof (mini)	0.956	0.914	0.885	0.892
crema_d	0.772	0.568	0.600	0.566
esc50	0.869	0.579	0.614	0.249
fluentspeechcommands_kws	0.916	0.417	0.878	0.962
freemusicarchive_genre	0.640	0.518	0.595	0.360
fsdkaggle2018	0.557	0.352	0.478	0.196
gtzan	0.869	0.681	0.751	0.495
libricount	0.688	0.605	0.549	0.507
librispeech_male_female (mini)	0.859	0.703	0.877	0.692
nsynth_instument	0.261	0.251	0.259	0.223
ravdess	0.725	0.440	0.460	0.469
speechcommandsv1	0.967	0.805	0.955	0.930
urbansound8k	0.835	0.676	0.719	0.443
vocalsound	0.910	0.791	0.871	0.807
voxceleb1 (mini)	0.159	0.020	0.088	0.031
voxlingua33 (mini)	0.411	0.050	0.419	0.345
Weighted Average	0.640	0.479	0.588	0.533

kNN Result

Dataset	dasheng	wav2vec2	whisper	data2vec
asvspoof (mini)	0.833	0.611	0.600	0.919
crema_d	0.381	0.175	0.382	0.325
esc50	0.621	0.091	0.191	0.037
fluentspeechcommands_kws	0.025	0.008	0.032	0.156
freemusicarchive_genre	0.589	0.135	0.396	0.126
gtzan	0.753	0.347	0.504	0.119
libricount	0.310	0.241	0.253	0.186
librispeech_male_female (mini)	0.493	0.552	0.586	0.632
nsynth_instument	0.253	0.235	0.233	0.209
ravdess	0.369	0.171	0.287	0.289
speechcommandsv1	0.903	0.208	0.096	0.850
urbansound8k	0.662	0.334	0.214	0.153
vocalsound	0.336	0.265	0.417	0.295
voxceleb1 (mini)	0.016	0.001	0.007	0.001
voxlingua33 (mini)	0.222	0.017	0.207	0.095
Weighted Average	0.372	0.224	0.251	0.351

Run with your own pretrained audio encoder

Two example audio encoder wrappers can be found at example/dasheng/dasheng_encoder.py and example/wav2vec2/wav2vec2.py.

We provide a check function to verify if the encoder is correctly implemented:

>>> from xares.audio_encoder_checker import check_audio_encoder

>>> encoder = YourEncoder()
>>> check_audio_encoder(encoder)
True

You can then run the benchmark with your own encoder:

python -m xares.run --max-jobs 8 your_encoder.py src/tasks/*.py

Add new tasks

Adding a new task is easy. Refer to the existing task implementations for guidance. You need to create a TaskConfig tailored to your chosen dataset.

Release history

0.1.3 (Sep 1, 2025)
0.1.2 (May 14, 2025)
0.1.1 (Apr 20, 2025)
0.1.0 (Feb 19, 2025)
0.0.6 (Feb 4, 2025), this version
0.0.5 (Feb 2, 2025)
0.0.4 (Jan 31, 2025)
0.0.3 (Jan 22, 2025)
0.0.2 (Sep 1, 2024)
0.0.1a4 pre-release (Sep 1, 2024)
0.0.1a3 pre-release (Jun 16, 2024)
0.0.1a2 pre-release (Jun 16, 2024)
0.0.1a0 pre-release (Jun 16, 2024)

Download files

Source Distribution

xares-0.0.6.tar.gz (46.1 kB)

Uploaded Feb 4, 2025 Source

Built Distribution

xares-0.0.6-py3-none-any.whl (49.5 kB)

Uploaded Feb 4, 2025 Python 3

File details

Details for the file xares-0.0.6.tar.gz.

File metadata

Download URL: xares-0.0.6.tar.gz
Upload date: Feb 4, 2025
Size: 46.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for xares-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`9eede4c52c4d620108af34fa683aec3a4b099c3ae88d7348920d6c682f7804fc`
MD5	`c62c011c3290b7f81a4a0b70e5a0f60a`
BLAKE2b-256	`c60899a6e0a9a18071b27965643456d41997da159cb3b107be7926a1a9ad952c`

File details

Details for the file xares-0.0.6-py3-none-any.whl.

File metadata

Download URL: xares-0.0.6-py3-none-any.whl
Upload date: Feb 4, 2025
Size: 49.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for xares-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`79957b2af95ad4390206b48d2e627238b7668e9358215a152098c4c0026460a7`
MD5	`653bd6671b92477c99527650c4d92ce3`
BLAKE2b-256	`d3cc2930b4c1b218717dea59c81bede0c23aca2b89fe2aec7a89044acc6f3b17`
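A downloaded distribution file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do this; the helper names and the local file path are assumptions, and the expected digest is the SHA256 value listed above for the wheel.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for xares-0.0.6-py3-none-any.whl.
EXPECTED_SHA256 = "79957b2af95ad4390206b48d2e627238b7668e9358215a152098c4c0026460a7"

# Assumed download location; adjust to wherever the file was saved.
wheel_path = Path("xares-0.0.6-py3-none-any.whl")
if wheel_path.exists():
    print("hash matches:", matches_published_hash(wheel_path, EXPECTED_SHA256))
else:
    print(f"{wheel_path} not found; download it first")
```

The same check applies to the source distribution by substituting its file name and its SHA256 digest.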
