A comprehensive Python-based system for training, evaluating, and analyzing audio representation learning models with support for both supervised and self-supervised learning paradigms

avex - Animal Vocalization Encoder Library

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

Description

The Animal Vocalization Encoder library (avex) provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:

Model Loading: Load pre-trained models with checkpoints and class mappings
Embedding Extraction: Extract features from audio for downstream tasks
Probe System: Flexible probe heads (linear, MLP, LSTM, attention, transformer) for transfer learning
Training & Evaluation: Scripts for supervised learning experiments
Plugin Architecture: Register and use custom models seamlessly

Installation

Prerequisites

Python 3.10, 3.11, or 3.12

Install with pip

pip install avex

Install with uv

uv add avex

For development installation with training/evaluation tools, see the Contributing guide.
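After installing, a quick environment check can confirm that the interpreter is one of the versions listed under Prerequisites and that the package is visible to it. The sketch below uses only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and are not part of avex.

```python
import sys
from importlib import metadata
from typing import Optional

# Interpreter versions listed under Prerequisites.
SUPPORTED_MINORS = {(3, 10), (3, 11), (3, 12)}


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) is one of the listed Python versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("avex version:", installed_version("avex") or "not installed")
```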

Quick Start

import torch
import librosa
from avex import load_model, list_models

# List available models
print(list_models().keys())

# Load a pre-trained model
model = load_model("esp_aves2_sl_beats_all", device="cpu")

# Load and preprocess audio (BEATs expects 16kHz)
audio, sr = librosa.load("your_audio.wav", sr=16000)
audio_tensor = torch.tensor(audio).unsqueeze(0)  # Shape: (1, num_samples)

# Run inference
with torch.no_grad():
    logits = model(audio_tensor)
    predicted_class = logits.argmax(dim=-1).item()

# Get human-readable label
if model.label_mapping:
    label = model.label_mapping.get(str(predicted_class), predicted_class)
    print(f"Predicted: {label}")
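The Quick Start assumes that a file named your_audio.wav already exists. If no recording is at hand, a synthetic tone is enough to check that loading and inference run end to end. The following sketch writes a mono 16 kHz WAV file with the standard library; `write_test_tone` is a hypothetical helper, not an avex function, and a pure tone will not produce a meaningful prediction.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # matches the 16 kHz rate used in the Quick Start


def write_test_tone(path: str, seconds: float = 1.0, freq_hz: float = 2000.0) -> int:
    """Write a mono 16-bit PCM sine tone and return the number of samples."""
    num_samples = int(SAMPLE_RATE * seconds)
    amplitude = 0.3 * 32767  # leave headroom so the signal does not clip
    frames = bytearray()
    for n in range(num_samples):
        value = int(amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        frames += struct.pack("<h", value)  # little-endian signed 16-bit sample
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return num_samples


if __name__ == "__main__":
    print(write_test_tone("your_audio.wav"), "samples written")
```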

Embedding Extraction

# Load for embedding extraction (no classifier head).
# Reuses torch, load_model, and audio_tensor from the Quick Start.
model = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

with torch.no_grad():
    embeddings = model(audio_tensor)
    # Shape: (batch, time_steps, 768) for BEATs

# Pool to get fixed-size embedding
embedding = embeddings.mean(dim=1)  # Shape: (batch, 768)
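Mean pooling is one of several ways to collapse the time axis. The sketch below uses a random tensor as a stand-in for backbone output (so it runs without downloading a model) and compares mean and max pooling, then computes pairwise cosine similarity between clips. The shapes follow the BEATs example above; everything else is an illustration in plain PyTorch rather than avex behavior.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for backbone output: (batch, time_steps, feature_dim).
# Random values are used here; a real run would use model(audio_tensor).
embeddings = torch.randn(2, 50, 768)

mean_pooled = embeddings.mean(dim=1)       # (2, 768): average over time
max_pooled = embeddings.max(dim=1).values  # (2, 768): largest value per feature

# L2-normalise so that dot products equal cosine similarity.
unit = F.normalize(mean_pooled, dim=-1)
similarity = unit @ unit.T                 # (2, 2): pairwise cosine similarity

print(mean_pooled.shape, max_pooled.shape, similarity.shape)
```

Mean pooling treats every frame equally, while max pooling keeps the strongest response per feature, which can suit short calls embedded in long stretches of background noise. Which works better depends on the downstream task.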

Transfer Learning with Probes

from avex import load_model
from avex.models.probes import build_probe_from_config
from avex.configs import ProbeConfig

# Load backbone for feature extraction
base = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

# Define a probe head for your task
probe_config = ProbeConfig(
    probe_type="linear",
    target_layers=["last_layer"],
    aggregation="mean",
    freeze_backbone=True,
    online_training=True,
)

probe = build_probe_from_config(
    probe_config=probe_config,
    base_model=base,
    num_classes=10,  # Your number of classes
    device="cpu",
)
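Conceptually, a linear probe with a frozen backbone and mean aggregation amounts to a single trainable linear layer on top of pooled features. The sketch below shows that idea in plain PyTorch. `FrozenBackbone` and `LinearProbeSketch` are hypothetical stand-ins written for this illustration; they are not the classes that `build_probe_from_config` returns, and the real implementation may differ.

```python
import torch
from torch import nn


class FrozenBackbone(nn.Module):
    """Hypothetical stand-in for a feature-extracting backbone."""

    def __init__(self, feature_dim: int = 768, hop: int = 320) -> None:
        super().__init__()
        self.hop = hop
        self.proj = nn.Linear(hop, feature_dim)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # Split (batch, num_samples) audio into non-overlapping frames.
        batch, num_samples = audio.shape
        usable = (num_samples // self.hop) * self.hop
        frames = audio[:, :usable].reshape(batch, -1, self.hop)
        return self.proj(frames)  # (batch, time_steps, feature_dim)


class LinearProbeSketch(nn.Module):
    """Frozen backbone, mean pooling over time, one trainable linear layer."""

    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int) -> None:
        super().__init__()
        self.backbone = backbone
        for param in self.backbone.parameters():
            param.requires_grad = False  # only the head receives gradients
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            features = self.backbone(audio)
        pooled = features.mean(dim=1)  # (batch, feature_dim)
        return self.head(pooled)       # (batch, num_classes)


sketch_probe = LinearProbeSketch(FrozenBackbone(), feature_dim=768, num_classes=10)
sketch_logits = sketch_probe(torch.randn(4, 16000))  # four 1-second clips at 16 kHz
trainable = [name for name, p in sketch_probe.named_parameters() if p.requires_grad]
print(sketch_logits.shape, trainable)
```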

Documentation

Full documentation: docs/index.md

Core Documentation

API Reference - Complete API documentation for model loading, registry, and management functions
Architecture - Framework architecture, core components, and plugin system
Supported Models - List of supported models and their configurations
Configuration - ModelSpec parameters, audio requirements, and configuration options

Usage Guides

Training and Evaluation - Guide to training and evaluating models
Embedding Extraction - Working with feature representations and embeddings
Examples - Comprehensive examples and use cases

Advanced Topics

Probe System - Understanding and using probes for transfer learning
API Probes - API reference for probe-related functionality
Custom Model Registration - Guide on registering custom model classes and loading pre-trained models

Examples: See the examples/ directory:

00_quick_start.py - Basic model loading
01_basic_model_loading.py - Loading models with different configurations
02_checkpoint_loading.py - Working with checkpoints
03_custom_model_registration.py - Custom model registration
04_training_and_evaluation.py - Training and evaluation examples
05_embedding_extraction.py - Feature extraction
06_classifier_head_loading.py - Classifier head behavior

Supported Models

The framework supports the following audio representation learning models:

EfficientNet - EfficientNet-based models for audio classification
BEATs - BEATs transformer models for audio representation learning
EAT - Efficient Audio Transformer models
AVES - AVES model for bioacoustics
BirdMAE - BirdMAE masked autoencoder for bioacoustic representation learning
ATST - Audio Teacher-Student Transformer
ResNet - ResNet models (ResNet18, ResNet50, ResNet152)
CLIP - Contrastive language-audio pretraining (CLAP-style) models
BirdNet - BirdNet models for bioacoustic classification
Perch - Perch models for bioacoustics
SurfPerch - SurfPerch models

See Supported Models for detailed information and configuration examples.

Supported Probes

The framework provides flexible probe heads for transfer learning:

Linear - Simple linear classifier (fastest, most memory-efficient)
MLP - Multi-layer perceptron with configurable hidden layers
LSTM - Long Short-Term Memory network for sequence modeling
Attention - Self-attention mechanism for sequence modeling
Transformer - Full transformer encoder architecture

Probes can be trained:

Online: End-to-end with the backbone (raw audio input)
Offline: On pre-computed embeddings
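The offline mode can be pictured as two stages: embeddings are computed once with the frozen backbone and cached, then a small head is trained on the cached tensors without touching the backbone again. The sketch below illustrates the second stage in plain PyTorch on synthetic embeddings. The data, dimensions, and hyperparameters are invented for the example and say nothing about avex's own training scripts.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes, feature_dim = 3, 16

# Stand-in for embeddings computed once by a frozen backbone and cached.
# Each class gets its own random centre so the toy problem is learnable.
centres = torch.randn(num_classes, feature_dim) * 3
labels = torch.randint(0, num_classes, (300,))
cached_embeddings = centres[labels] + torch.randn(300, feature_dim)

head = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(head(cached_embeddings), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    predictions = head(cached_embeddings).argmax(dim=-1)
accuracy = (predictions == labels).float().mean().item()
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

Because the backbone never runs inside the loop, each step costs only a small matrix multiply, which is why offline probing suits sweeps over many probe configurations. Online training pays the backbone's forward cost on every step but can apply augmentation to the raw audio.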

See Probe System and API Probes for detailed documentation.

Citing

If you use this framework in your research, please cite:

@article{miron2025matters,
  title={What Matters for Bioacoustic Encoding},
  author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Pietquin, Olivier and Geist, Matthieu and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and others},
  journal={arXiv preprint arXiv:2508.11845},
  year={2025}
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

Development setup
Running tests
Code style guidelines
Adding new functionality
Pull request process

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on top of PyTorch
Integrates with various pre-trained audio models
