Skip to main content

eXtensive Audio Representation and Evaluation Suite

Project description

xares

X-ARES: eXtensive Audio Representation and Evaluation Suite

This project is still in the early stage of development and welcomes contributions, especially in expanding supported tasks and improving robustness.

Introduction

X-ARES is a toolkit for training, evaluating, and exporting audio encoders for various audio tasks. It is heavily inspired by the HEAR benchmark.

Planned supported tasks

Currently, only four tasks (ESC-50, VocalSound, CREMA-D and LibriSpeech-Male-Female) have been implemented. However, adding new tasks is straightforward: it typically only need to implement the audio data download and packaging functions, as manual audio decoding and training implementations are not necessary. We encourage anyone to submit pull requests to expand the set of supported tasks.

Speech

  • Speech Commands V1
  • Speech Commands V2
  • LibriCount
  • VoxLingua107
  • VoxCeleb1
  • LibriSpeech-Male-Female
  • LibriSpeech-Phoneme
  • Fluent Speech Commands
  • VocalSound
  • CREMA-D
  • RAVDESS
  • ASV2015
  • DiCOVA
  • speechocean762

Environment

  • ESC-50
  • FSD50k
  • UrbanSound 8k
  • DESED
  • (A task designed for car)
  • (Another task designed for car)
  • (A task designed for soundbox)
  • (A task designed for headphone)
  • LITIS Rouen
  • FSD18-Kaggle
  • AudioCaps

Music

  • MAESTRO
  • GTZAN Genre
  • NSynth
  • MTG-Jamendo
  • FMA

Installation

X-ARES is available on PyPI. You can install it via pip.

pip install xares

For development, you can clone the repository and install the package in editable mode.

git clone <this-repo>
cd xares
pip install -e .[examples]

Run with the baseline pretrained audio encoder (Dasheng)

The ESC-50 task is used as an example.

from example.dasheng.dasheng_encoder import DashengEncoder
from tasks.esc50 import esc50_task

task = esc50_task.ESC50Task(env_root="./env", encoder=DashengEncoder())

task.run_all()

Run with your own pretrained audio encoder

An example of audio encoder wrapper could be found at example/dasheng/dasheng_encoder.py. It is very simple because the "dasheng" model is already in the required in/out format.

from dataclasses import dataclass
from dasheng import dasheng_base
from xares.audio_encoder_base import AudioEncoderBase


@dataclass
class DashengEncoder(AudioEncoderBase):
    model = dasheng_base()
    sampling_rate = 16_000
    output_dim = 768

    def __call__(self, audio, sampling_rate):
        # Since the "dasheng" model is already in the required in/out format, we directly use the super class method
        return super().__call__(audio, sampling_rate)

Another example could be found at example/wav2vec2/wav2vec2.py. It is more complex, you need to covert the input audio and the encoded embedding to the required format.

from dataclasses import dataclass
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from xares.audio_encoder_base import AudioEncoderBase


@dataclass
class Wav2vec2Encoder(AudioEncoderBase):
    output_dim = 768

    def __post_init__(self):
        self.feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
        self.model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

    def __call__(self, audio, sampling_rate):
        input_values = (
            self.feature_extractor(
                self.pre_process_audio(audio, sampling_rate), sampling_rate=self.sampling_rate, return_tensors="pt"
            )
            .input_values.squeeze()
            .to(self.device)
        )

        encoded_audio = self.encode_audio(input_values)["last_hidden_state"]

        if not self.check_encoded_audio(encoded_audio):
            raise ValueError("Invalid encoded audio")

        return self.encoded_audio

Add your own task

To add a new task, refer to existing task implementations for guidance. Essentially, create a task class that inherits from TaskBase and implements the make_audio_tar method tailored to your chosen dataset, handling data download, preprocessing, and packaging into the required format.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xares-0.0.2.tar.gz (26.2 kB view hashes)

Uploaded Source

Built Distribution

xares-0.0.2-py3-none-any.whl (27.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page