A comprehensive Python-based system for training, evaluating, and analyzing audio representation learning models with support for both supervised and self-supervised learning paradigms

avex - Animal Vocalization Encoder Library

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

Description

The Animal Vocalization Encoder library (avex) provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:

  • Model Loading: Load pre-trained models with checkpoints and class mappings
  • Embedding Extraction: Extract features from audio for downstream tasks
  • Probe System: Flexible probe heads (linear, MLP, LSTM, attention, transformer) for transfer learning
  • Training & Evaluation: Scripts for supervised learning experiments
  • Plugin Architecture: Register and use custom models seamlessly

Installation

Prerequisites

  • Python 3.10, 3.11, or 3.12

Install with pip

pip install avex

Install with uv

uv add avex

For a development installation that includes the training and evaluation tools, see the Contributing guide.

Quick Start

import torch
import librosa
from avex import load_model, list_models

# List available models
print(list_models().keys())

# Load a pre-trained model
model = load_model("esp_aves2_sl_beats_all", device="cpu")

# Load audio as mono and resample to 16 kHz, the rate BEATs expects
audio, sr = librosa.load("your_audio.wav", sr=16000)
audio_tensor = torch.tensor(audio).unsqueeze(0)  # Shape: (1, num_samples)

# Run inference
with torch.no_grad():
    logits = model(audio_tensor)
    predicted_class = logits.argmax(dim=-1).item()

# Get human-readable label
if model.label_mapping:
    label = model.label_mapping.get(str(predicted_class), predicted_class)
    print(f"Predicted: {label}")
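
The Quick Start reports only the single best class. When several species are plausible, a ranked list is often more useful. The sketch below is plain Python rather than part of the avex API: `top_k_labels` is a hypothetical helper, the logits and mapping are made-up illustration values, and it assumes (as the `str(predicted_class)` lookup above suggests) that the mapping is keyed by string indices.

```python
import math


def top_k_labels(logits, label_mapping, k=3):
    """Return the k most likely (label, probability) pairs for one clip."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]

    # Fall back to the raw index when the mapping has no entry for a class.
    return [(label_mapping.get(str(i), i), probs[i]) for i in ranked]


# Hypothetical logits and mapping, for illustration only.
example_logits = [0.2, 2.5, -1.0, 1.1]
example_mapping = {"0": "sparrow", "1": "humpback whale", "3": "tree frog"}

top = top_k_labels(example_logits, example_mapping, k=2)
for label, prob in top:
    print(f"{label}: {prob:.3f}")
```

With a real model, the list of logits for the first clip in a batch could be obtained with `logits[0].tolist()`.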

Embedding Extraction

# Load for embedding extraction (no classifier head).
# Reuses load_model and audio_tensor from the Quick Start above.
model = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

with torch.no_grad():
    embeddings = model(audio_tensor)
    # Shape: (batch, time_steps, 768) for BEATs

# Pool to get fixed-size embedding
embedding = embeddings.mean(dim=1)  # Shape: (batch, 768)
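
Mean pooling is one of several ways to collapse the time axis. The following sketch uses only PyTorch on a random stand-in tensor, so it does not call avex. It compares mean and max pooling and adds a masked mean for batches where shorter clips were zero-padded. The `lengths` tensor (valid frames per clip) is an assumption: how to derive it from the audio length depends on the backbone.

```python
import torch


def masked_mean_pool(embeddings: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Average over time, counting only the first lengths[i] frames of clip i."""
    time_steps = embeddings.shape[1]
    positions = torch.arange(time_steps, device=embeddings.device)
    # mask[i, t] is True where frame t belongs to clip i rather than padding.
    mask = positions.unsqueeze(0) < lengths.to(embeddings.device).unsqueeze(1)
    mask = mask.unsqueeze(-1).to(embeddings.dtype)  # (batch, time_steps, 1)
    summed = (embeddings * mask).sum(dim=1)         # (batch, features)
    # clamp avoids division by zero for a clip with no valid frames.
    return summed / mask.sum(dim=1).clamp(min=1)


# Stand-in for backbone output: 2 clips, 5 frames, 768 features.
embeddings = torch.randn(2, 5, 768)
lengths = torch.tensor([5, 3])  # hypothetical number of valid frames per clip

mean_pooled = embeddings.mean(dim=1)                   # (2, 768)
max_pooled = embeddings.max(dim=1).values              # (2, 768)
masked_pooled = masked_mean_pool(embeddings, lengths)  # (2, 768)
```

For a clip with no padding, the masked mean matches the plain mean. For padded clips it keeps the padding frames from diluting the embedding.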

Transfer Learning with Probes

from avex import load_model
from avex.configs import ProbeConfig
from avex.models.probes import build_probe_from_config

# Load backbone for feature extraction
base = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

# Define a probe head for your task
probe_config = ProbeConfig(
    probe_type="linear",
    target_layers=["last_layer"],
    aggregation="mean",
    freeze_backbone=True,
    online_training=True,
)

probe = build_probe_from_config(
    probe_config=probe_config,
    base_model=base,
    num_classes=10,  # Your number of classes
    device="cpu",
)

Documentation

Full documentation: docs/index.md

Core Documentation

  • API Reference - Complete API documentation for model loading, registry, and management functions
  • Architecture - Framework architecture, core components, and plugin system
  • Supported Models - List of supported models and their configurations
  • Configuration - ModelSpec parameters, audio requirements, and configuration options

Examples: See the examples/ directory:

  • 00_quick_start.py - Basic model loading
  • 01_basic_model_loading.py - Loading models with different configurations
  • 02_checkpoint_loading.py - Working with checkpoints
  • 03_custom_model_registration.py - Custom model registration
  • 04_training_and_evaluation.py - Training and evaluation examples
  • 05_embedding_extraction.py - Feature extraction
  • 06_classifier_head_loading.py - Classifier head behavior

Supported Models

The framework supports the following audio representation learning models:

  • EfficientNet - EfficientNet-based models for audio classification
  • BEATs - BEATs transformer models for audio representation learning
  • EAT - Efficient Audio Transformer models
  • AVES - AVES model for bioacoustics
  • BirdMAE - BirdMAE masked autoencoder for bioacoustic representation learning
  • ATST - Audio Teacher-Student Transformer models
  • ResNet - ResNet models (ResNet18, ResNet50, ResNet152)
  • CLIP - CLIP-style contrastive language-audio pretraining models
  • BirdNet - BirdNet models for bioacoustic classification
  • Perch - Perch models for bioacoustics
  • SurfPerch - SurfPerch models

See Supported Models for detailed information and configuration examples.

Supported Probes

The framework provides flexible probe heads for transfer learning:

  • Linear - Simple linear classifier (fastest, most memory-efficient)
  • MLP - Multi-layer perceptron with configurable hidden layers
  • LSTM - Long Short-Term Memory network for sequence modeling
  • Attention - Self-attention mechanism for sequence modeling
  • Transformer - Full transformer encoder architecture

Probes can be trained:

  • Online: End-to-end with the backbone (raw audio input)
  • Offline: On pre-computed embeddings
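
To make the offline mode concrete, the sketch below trains a linear classifier on embeddings that are assumed to have been computed once by a frozen backbone and cached. It is written in plain PyTorch and does not use the avex probe API. The random tensors, sizes, and hyperparameters are placeholders for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-ins for pooled backbone embeddings and their labels. In practice these
# would be computed once with the frozen backbone and loaded from disk.
num_clips, embed_dim, num_classes = 64, 768, 10
cached_embeddings = torch.randn(num_clips, embed_dim)
labels = torch.randint(0, num_classes, (num_clips,))

# A linear probe is a single affine layer from embedding to class logits.
probe = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(20):
    optimizer.zero_grad()
    loss = loss_fn(probe(cached_embeddings), labels)
    loss.backward()  # gradients reach only the probe; no backbone is involved
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the backbone never runs inside the loop, each step costs one small matrix multiply. That is the main appeal of offline training when comparing many probe configurations. Online training instead feeds raw audio through the backbone at every step.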

See Probe System and API Probes for detailed documentation.

Citing

If you use this framework in your research, please cite:

@article{miron2025matters,
  title={What Matters for Bioacoustic Encoding},
  author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Pietquin, Olivier and Geist, Matthieu and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and others},
  journal={arXiv preprint arXiv:2508.11845},
  year={2025}
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Development setup
  • Running tests
  • Code style guidelines
  • Adding new functionality
  • Pull request process

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of PyTorch
  • Integrates with various pre-trained audio models
