Use bacpipe to streamline the process of generating embeddings and analysing your PAM datasets.
Welcome to bacpipe (BioAcoustic Collection Pipeline)
bacpipe makes using deep learning models for bioacoustics easy! Using bacpipe you can generate embeddings, classification predictions and clusters. All you need is your audio data and to customize the settings.
And the best part is, it comes with a GUI for you to explore the results.
bacpipe also ties in nicely with acodet, allowing you to generate heatmaps of species activity from your datasets based on predictions of deep learning models. But keep in mind that predictions are only as good as the models you are using: they can seem confident but still be very wrong. bacpipe's aim is to make model evaluations easier and let us improve them.
bacpipe is also available on pip: pip install bacpipe
import bacpipe
# This will execute the whole pipeline
# if nothing is specified it will generate embeddings on
# a set of audio test data using the models birdnet and perch
bacpipe.play()
A more detailed description of the API can be found under API
📚 Table of Contents
- How it works
- API
- Dashboard visualization
- Installation
- Usage
- Available models
- Contribute
- Known issues
- Citation
- Newsletter and Q&A sessions
How it works
This repository aims to streamline the generation and evaluation of embeddings using a large variety of bioacoustic models.
The below image shows a comparison of umap embeddings based on 15 different bioacoustic models. The models are being evaluated on a bird and frog dataset (more details in this conference paper).
bacpipe requires a dataset of audio files, runs them through a series of models, and generates embeddings. These embeddings can then be visualized and evaluated for various tasks such as clustering or classification.
By default the embeddings will be generated for the models specified in the config.yaml file.
Currently these bioacoustic models are supported (more details below):
available_models : [
"audiomae",
"audioprotopnet",
"avesecho_passt",
"aves_especies",
"beats",
"birdaves_especies",
"biolingual",
"birdnet",
"birdmae",
"hbdet",
"insect66",
"insect459",
"mix2",
"naturebeats",
"perch_bird",
"protoclr",
"rcl_fs_bsed",
"surfperch",
"google_whale",
"vggish"
]
Once the embeddings are generated, 2d reduced embeddings will be created using the dimensionality reduction model specified in the config.yaml file. And these dimensionality reduction models are supported:
available_reduction_models: [
"pca",
"sparse_pca",
"t_sne",
"umap"
]
Furthermore, the embeddings can be evaluated using different metrics. The evaluation is done using the evaluate.py script, which takes the generated embeddings and computes various metrics such as clustering performance and classification performance. The evaluation results are saved in the bacpipe/results directory.
available_evaluation_tasks: [
"classification",
"clustering"
]
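As a rough illustration of how the three lists above constrain a configuration, the following sketch checks a set of choices against them. The `validate_config` helper and the plain dictionary are hypothetical and not part of bacpipe; only the option names and the keys `models`, `dim_reduction_model` and `evaluation_task` are taken from this README.

```python
# Option names copied from the lists above.
AVAILABLE_MODELS = {
    "audiomae", "audioprotopnet", "avesecho_passt", "aves_especies", "beats",
    "birdaves_especies", "biolingual", "birdnet", "birdmae", "hbdet",
    "insect66", "insect459", "mix2", "naturebeats", "perch_bird", "protoclr",
    "rcl_fs_bsed", "surfperch", "google_whale", "vggish",
}
AVAILABLE_REDUCTION_MODELS = {"pca", "sparse_pca", "t_sne", "umap"}
AVAILABLE_EVALUATION_TASKS = {"classification", "clustering"}


def validate_config(config):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    for model in config.get("models", []):
        if model not in AVAILABLE_MODELS:
            problems.append(f"unknown model: {model}")
    reducer = config.get("dim_reduction_model")
    if reducer not in AVAILABLE_REDUCTION_MODELS:
        problems.append(f"unknown dimensionality reduction model: {reducer}")
    for task in config.get("evaluation_task", []):
        if task not in AVAILABLE_EVALUATION_TASKS:
            problems.append(f"unknown evaluation task: {task}")
    return problems


config = {
    "models": ["birdnet", "perch_bird"],
    "dim_reduction_model": "umap",
    "evaluation_task": ["clustering"],
}
print(validate_config(config))
```

An empty list means every choice matches one of the supported names; a typo such as `perch` instead of `perch_bird` would be reported.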
The repository also includes a panel dashboard for visualizing the generated embeddings. To enable the dashboard, simply set the dashboard variable to True in the config.yaml file. The dashboard will automatically open in your browser (at http://localhost:8050) after running the run_dashboard.py script.
The pipeline is designed to be modular, so you can easily add or remove models as needed. The models are organized into pipelines, which are defined in the bacpipe/embedding_generation_pipelines/feature_extractors directory. If you want to add a different dimensionality reduction model, you can do so by adding a new pipeline to the bacpipe/embedding_generation_pipelines/dimensionality_reduction directory.
Using annotations for evaluation
If you have annotations for your dataset, you can use them to evaluate the generated embeddings. The labels will be used to compute the clustering and classification performance of the embeddings.
To use the annotations for evaluation, create a file called annotations.csv in the directory specified in the audio_dir variable in the config.yaml file. The file should contain the following columns:
audiofilename,start,end,label
Where audiofilename is the name of the audio file, start and end are the start and end times of the annotation in seconds, and label is the label of the annotation.
For reference see the example annotations file.
If this file exists, the evaluation script will automatically use the annotations to compute the clustering and classification performance of the embeddings. The labels will also be used to color the points in the dashboard visualization showing the embeddings.
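A minimal sketch of producing such a file with Python's standard library is shown here. The column names come from the description above; the file names, times and labels are made-up examples, the `write_annotations` helper is hypothetical, and a temporary directory stands in for the audio_dir from config.yaml.

```python
import csv
import tempfile
from pathlib import Path

# Column names as given above; the rows are made-up examples.
FIELDNAMES = ["audiofilename", "start", "end", "label"]
rows = [
    {"audiofilename": "recording_01.wav", "start": 0.0, "end": 2.5, "label": "frog"},
    {"audiofilename": "recording_01.wav", "start": 4.0, "end": 6.0, "label": "bird"},
]


def write_annotations(audio_dir, rows):
    """Write annotations.csv into audio_dir and return its path."""
    path = Path(audio_dir) / "annotations.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


# A temporary directory stands in for the audio_dir from config.yaml.
with tempfile.TemporaryDirectory() as audio_dir:
    path = write_annotations(audio_dir, rows)
    lines = path.read_text().splitlines()
    header = lines[0]
    print(header)
```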
API
bacpipe can be used as a package and installed from pip
pip install bacpipe
import bacpipe
# This will execute the whole pipeline
# if nothing is specified it will generate embeddings on
# a set of audio test data using the models birdnet and perch
bacpipe.play()
# to modify configurations and settings, you can simply access them
# as attributes
# to see available settings and configs run the commands below
bacpipe.config
bacpipe.settings
# to modify the audio data path for example, do
bacpipe.config.audio_dir = '/path/to/your/audio/dir'
# to modify the models you want to run, do
bacpipe.config.models = ['birdnet', 'birdmae', 'naturebeats']
# keep in mind some models require checkpoints, to find out which ones, run
bacpipe.models_needing_checkpoint
# links to checkpoints are to be found in this readme file,
# location of the checkpoints is specified under
bacpipe.settings.model_base_path = '/path/to/model_checkpoints'
# On first execution the birdnet checkpoint is downloaded and the
# model_checkpoints folder is created from the current working directory
# If you just want to run models and get embeddings and don't want
# the dashboard and all of that, define an embedder object and pass it
# the model name, and the settings you modified
# To ensure birdnet is downloaded, run the following (this is done by default
# if you run bacpipe.play(), but without that, you need to call this explicitly)
bacpipe.ensure_std_models(bacpipe.settings.model_base_path)
em = bacpipe.Embedder('birdnet', **vars(bacpipe.settings))
# the vars part is important!
audio_file = '/path/to/all/the/audio/file'
embeddings = em.get_embeddings_from_model(audio_file)
# if the model has a built in classifier, like birdnet
# you can make sure the class predictions are also saved
# by setting
bacpipe.settings.run_pretrained_classifier = True
# the generating of embeddings above will then let you access
# the class predictions using
em.model.classifier_outputs
# If you want to produce embeddings for various models, bacpipe will always store
# them to keep your memory from overfilling. Still you can use the package to easily
# access the embeddings and all the metadata
loader = bacpipe.model_specific_embedding_creation(
**vars(bacpipe.config), **vars(bacpipe.settings)
)
# this call will initiate the embedding generation process, it will check if embeddings
# already exist for the combination of each model and the dataset and if so it will
# be ready to load them. The loader keys will be the model name and the values will
# be the loader objects for each model. Each object contains all the information
# on the generated embeddings. To access them:
loader['birdnet'].embedding_dict()
# this will give you a dictionary with the keys corresponding to embedding files
# and the values corresponding to the embeddings as numpy arrays
loader['birdnet'].metadata_dict
# This will give you a dictionary overview of:
# - where the audio data came from,
# - where the embeddings were saved
# - all the audio files,
# - the embedding size of the model,
# - the audio file lengths,
# - the number of embeddings for each audio file
# - the sample rate
# - the number of samples per window
# - and the total length of the processed dataset in seconds
# This dictionary is also saved as a yaml file in the directory of the embeddings
Dashboard visualization
bacpipe includes a dashboard visualization by default allowing you to easily explore the generated embeddings
Once embeddings are generated, they can be easily visualized using a dashboard (built using panel) by simply setting the dashboard setting in the config.yaml file to True.
Below you can see a gif showing the basic usage of the dashboard.
The dashboard has 3 main sections:
- Single model
- Two models
- All models
In the single model section, you can select a model and visualize the embeddings generated by that model. The embeddings can be colored by:
- metadata extracted from the files (date and time information, and file and parent directory)
- the labels specified in the annotations.csv file
- the cluster labels generated by the clustering algorithm (kmeans)
In the dashboard sidebar you can select the model, the attribute by which to label the embeddings, whether to remove noise, and the type of classification task to show the results for.
The noise removal is done by removing the embeddings that do not correspond to annotated sections of the audio files. This is useful if you want to focus on the annotated sections of the audio files and disregard the rest of the data.
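The idea behind this filtering can be sketched as follows. This is not bacpipe's implementation: the function name is hypothetical, each embedding is assumed to cover one fixed-length window, and a window is assumed to count as annotated as soon as it overlaps an annotation at all.

```python
def annotated_window_indices(n_windows, window_length, annotations):
    """Return indices of windows that overlap at least one annotation.

    annotations is a list of (start, end) tuples in seconds.
    Window i is assumed to cover [i * window_length, (i + 1) * window_length).
    """
    keep = []
    for i in range(n_windows):
        win_start = i * window_length
        win_end = win_start + window_length
        if any(start < win_end and end > win_start for start, end in annotations):
            keep.append(i)
    return keep


# Four 3-second windows, annotations at 1.0-2.0 s and 7.5-9.5 s.
kept = annotated_window_indices(4, 3.0, [(1.0, 2.0), (7.5, 9.5)])
print(kept)  # [0, 2, 3]
```

The window from 3 s to 6 s contains no annotation and is dropped, while the second annotation spans two windows and keeps both.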
The visualizations can be saved as png files by clicking the save button in the bottom right corner of the plot.
Try it out and please feel free to give feedback, ask questions or suggest improvements. If something does not work, please raise an issue.
Installation
Install uv (recommended) or poetry
It is recommended to use python 3.11 for this repository, as some of the models require it.
For speed and stability it is recommended to use uv. To install uv, use the following command (everything also works without uv: simply omit the uv prefix, as all the commands are also pip commands):
pip install uv
(for windows use /c/Users/$USERNAME/AppData/Local/Programs/Python/Python311/python.exe -m pip install uv)
If you prefer to use poetry, you can install it using:
pipx install poetry
Create a virtual environment
python3.11 -m uv venv .env_bacpipe
(for windows use /c/Users/$USERNAME/AppData/Local/Programs/Python/Python311/python.exe -m uv venv .env_bacpipe)
(alternatively for poetry use poetry env use 3.11)
activate the environment
source .env_bacpipe/bin/activate (for windows use source .env_bacpipe\Scripts\activate)
Clone the repository
git clone https://github.com/bioacoustic-ai/bacpipe.git
cd into the bacpipe directory (cd bacpipe)
Install the dependencies once the prerequisites are satisfied.
uv pip install -r pyproject.toml
- this will automatically install requirements based on your os, so windows should also work fine. However, gpu support is not available on windows
For poetry:
poetry lock
poetry install
Alternatively:
uv sync
If for some reason you would prefer requirements files, use these for windows:
uv pip install -r requirements_windows.txt
If you do not have admin rights and encounter a permission denied error when using pip install, use python -m pip install ... instead.
OPTIONAL: Add other model checkpoints that are not included by default.
Download the ones that are available from here and create directories corresponding to the pipeline-names and place the checkpoints within them.
Test the installation was successful
By doing so you will also ensure that the directory structure for the model checkpoints will be created.
pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py
The tests could take a while, so to run a small test, you can also pass the model you would like to test:
pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py --models=birdnet,perch
(keep in mind you have to have the checkpoints locally for the models that require it)
In case of a permission denied error, run
python -m pytest -v --disable-warnings bacpipe/tests/test_embedding_creation.py
If everything passes then you've successfully installed bacpipe and can now proceed to use it.
Usage
Configurations and settings
To see the capabilities of bacpipe, go ahead and run the run_pipeline.py script. This will run the pipeline with the default settings and configurations on a small set of test data.
To use bacpipe on your own data, you will need to modify the configuration files.
The only two files that need to be modified are the config.yaml and settings.yaml files. The config.yaml is used for the standard configurations:
- path to audio files
- models to run
- dimensionality reduction model
- evaluation tasks
- whether to run the dashboard or not
The settings.yaml file is used for more advanced configurations and does not need to be modified unless you have specific preferences. It includes settings such as to run on a cpu or a cuda (gpu) (by default cpu), the paths where results are saved, configurations for the evaluation tasks and more.
Modify the config.yaml file in the root directory to specify the path to your dataset. Define what models to run by specifying the strings in the models list (copy and paste as needed, I usually just comment out the models I don't want to run).
If you have already computed embeddings on the dataset specified in audio_dir, and you want to do the dimensionality reduction and evaluation for the models you have already run, you can set the already_computed variable to True. This will only select the models that have already been computed.
In either case, if you have already computed embeddings with a model, bacpipe will skip the model and use the already computed embeddings (if they are still located in the same directory). Even if overwrite is set to True, bacpipe will not overwrite the embeddings if they already exist. It will, however, recompute the clusterings and the label generation.
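The skipping behaviour can be pictured with a small sketch. It is only an illustration under assumptions: the `models_to_compute` helper is hypothetical, and the layout with one sub-directory per model and `.npy` files inside is assumed for the example rather than taken from bacpipe.

```python
import tempfile
from pathlib import Path


def models_to_compute(models, embeddings_dir):
    """Return the models that have no saved embeddings yet.

    Assumes (hypothetically) one sub-directory per model inside embeddings_dir.
    """
    todo = []
    for model in models:
        model_dir = Path(embeddings_dir) / model
        has_files = model_dir.is_dir() and any(model_dir.iterdir())
        if not has_files:
            todo.append(model)
    return todo


with tempfile.TemporaryDirectory() as tmp:
    embeddings_dir = Path(tmp) / "YOUR_DATASET" / "embeddings"
    # Pretend birdnet embeddings already exist (file name is made up).
    (embeddings_dir / "birdnet").mkdir(parents=True)
    (embeddings_dir / "birdnet" / "file_01.npy").touch()
    todo = models_to_compute(["birdnet", "birdmae"], embeddings_dir)
    print(todo)  # ['birdmae']
```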
Running the pipeline
Once the configuration is complete, execute the run_pipeline.py file (make sure the environment is activated)
python run_pipeline.py
While the scripts are executed, directories will be created corresponding to the main_results_dir setting. Embeddings will be saved in main_results_dir/YOUR_DATASET/embeddings (see here for more info) and if selected, reduced dimensionality embeddings will be saved in main_results_dir/evaluation/dim_reduced_embeddings (see here for more info).
Model selection
Select the models you want to run in the config.yaml file. The models are specified in this ReadMe and in the test_file. You can select the models you want to run by adding them to the models list in the config.yaml file.
Dimensionality reduction
Different dimensionality reduction models can be selected in the config.yaml file. The available models are specified in the section Dimensionality reduction models. Insert the name of the selected model in the dim_reduction_model variable in the config.yaml file. The default is umap, but you can also select pca, sparse_pca or t_sne.
Dashboard
The dashboard is a panel application that allows you to visualize the generated embeddings. To enable the dashboard, set the dashboard variable in the config.yaml file to True. The dashboard will automatically open in your browser (at http://localhost:8050) after running the run_dashboard.py script.
Evaluation
You can use bacpipe to evaluate the generated embeddings using different metrics. To evaluate the embeddings, you need annotations for your dataset. The annotations should be in a file called annotations.csv in the directory specified in the audio_dir variable in the config.yaml file or the results directory of your dataset main_results_dir/YOUR_DATASET. The file should contain the following columns:
audiofilename,start,end,label:species
Where audiofilename is the name of the audio file, start and end are the start and end times of the annotation in seconds, and label is the label of the annotation.
species is a placeholder here and can be replaced with any label description. So if you have labelled call types, change it to label:call_type. But it's important that there are no spaces and that it contains label:. By doing this you will be able to visualize your data based on all of these label columns.
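To make the naming rule concrete, here is a small sketch that picks out every column following the label: convention from a header. The CSV content is made up and the `label_columns` helper is hypothetical; it only demonstrates the convention, not how bacpipe parses the file.

```python
import csv
import io

# Made-up annotation content with two label columns.
csv_text = (
    "audiofilename,start,end,label:species,label:call_type\n"
    "recording_01.wav,0.0,2.5,frog_a,advertisement\n"
)


def label_columns(header):
    """Map each 'label:<name>' column to its <name> part."""
    return {
        column: column.split(":", 1)[1]
        for column in header
        if column.startswith("label:")
    }


reader = csv.DictReader(io.StringIO(csv_text))
columns = label_columns(reader.fieldnames)
print(columns)
```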
The labels can then be used to perform clustering and classification evaluation. This can be done only in regard to one label, so specify the main label column in the label_column variable in settings.yaml. This defaults to species. Only labels that exceed the min_label_occurances value will be used. This is to make sure you have enough data to train linear classifiers and do meaningful evaluations. If you have enough labeled data, feel free to increase this.
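The effect of the min_label_occurances threshold can be sketched as below. The `frequent_labels` helper is hypothetical, and the strict comparison follows the wording "exceed" above; bacpipe's exact rule may differ.

```python
from collections import Counter


def frequent_labels(labels, min_label_occurances):
    """Return the set of labels occurring more often than the threshold."""
    counts = Counter(labels)
    return {label for label, n in counts.items() if n > min_label_occurances}


labels = ["frog", "frog", "frog", "bird", "bird", "insect"]
kept = frequent_labels(labels, min_label_occurances=1)
print(sorted(kept))  # ['bird', 'frog']
```

Here "insect" occurs only once and is therefore excluded from the evaluation.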
See the file annotations.csv for an example of what the annotations file should look like.
Once the annotations file is created, add either classification or clustering (or both) to the evaluation_task variable in the config.yaml file (use double quotes: "classification" or "clustering"). You can run the evaluation using the normal python run_pipeline.py command. The evaluation script will automatically use the annotations to compute the clustering and classification performance of the embeddings. The results will be saved in the bacpipe/results/YOUR_DATASET/evaluation directory.
If you selected classification, a linear classifier will be trained and saved in the classification subdirectory of the evaluation folder. This .pt file can be used to generate class predictions with a model that wasn't originally trained on these classes. A tutorial will be available shortly explaining this in more detail. The .pt file can be used in the repository acodet to generate class predictions with the combination of a feature extractor and the trained linear classifier.
Models with classifiers
The following models already contain classification heads:
- AudioProtoPNet
- BirdNET
- Perch_bird
- SurfPerch
- google_whale
With all of these models, you only need to set run_pretrained_classifier to True and then the model will save the classification outputs in the classification/original_classifier_outputs folder. Only predictions exceeding the classifier_threshold value will be saved. A csv file in the shape of the annotations.csv file is also saved corresponding to the class predictions. The dashboard will also contain an extra label_by option default_classifier.
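A rough sketch of how thresholded predictions could be turned into rows shaped like annotations.csv is given here. The `predictions_to_rows` helper, the per-window score dictionaries and the class names are all hypothetical; only the classifier_threshold setting and the column names come from this README.

```python
def predictions_to_rows(filename, window_length, scores, classifier_threshold):
    """Turn per-window class scores into annotation-style rows.

    scores is a list (one entry per window) of {class_name: score} dicts.
    """
    rows = []
    for i, window_scores in enumerate(scores):
        for label, score in window_scores.items():
            # Keep only predictions that exceed the threshold.
            if score > classifier_threshold:
                rows.append(
                    {
                        "audiofilename": filename,
                        "start": i * window_length,
                        "end": (i + 1) * window_length,
                        "label": label,
                    }
                )
    return rows


scores = [{"species_a": 0.9, "species_b": 0.2}, {"species_a": 0.4, "species_b": 0.7}]
rows = predictions_to_rows("recording_01.wav", 3.0, scores, classifier_threshold=0.5)
for row in rows:
    print(row)
```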
Available models
The models all have their model specific code to ensure inference runs smoothly. More info on the models and their pipelines can be found here.
Models currently include:
| Name | ref paper | ref code | sampling rate | input length | embedding dimension |
|---|---|---|---|---|---|
| AudioMAE | paper | code | 16 kHz | 10 s | 768 |
| AudioProtoPNet | paper | code | 32 kHz | 5 s | 1024 |
| AvesEcho_PASST | paper | code | 32 kHz | 3 s | 768 |
| AVES_ESpecies | paper | code | 16 kHz | 1 s | 768 |
| BEATs | paper | code | 16 kHz | 10 s | 768 |
| BioLingual | paper | code | 48 kHz | 10 s | 512 |
| BirdAVES_ESpecies | paper | code | 16 kHz | 1 s | 1024 |
| BirdMAE | paper | code | 32 kHz | 10 s | 1280 |
| BirdNET | paper | code | 48 kHz | 3 s | 1024 |
| Google_Whale | paper | code | 24 kHz | 5 s | 1280 |
| HumpbackNET | paper | code | 2 kHz | 3.9124 s | 2048 |
| Insect66NET | paper | code | 44.1 kHz | 5.5 s | 1280 |
| Insect459NET | paper | pending | 44.1 kHz | 5.5 s | 1280 |
| Mix2 | paper | code | 16 kHz | 3 s | 960 |
| NatureBEATs | paper | code | 16 kHz | 10 s | 768 |
| Perch_Bird | paper | code | 32 kHz | 5 s | 1280 |
| ProtoCLR | paper | code | 16 kHz | 6 s | 384 |
| RCL_FS_BSED | paper | code | 22.05 kHz | 0.2 s | 2048 |
| SurfPerch | paper | code | 32 kHz | 5 s | 1280 |
| VGGish | paper | code | 16 kHz | 0.96 s | 128 |
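The sampling rate and input length columns together determine how many audio samples each model sees per window. The following worked arithmetic uses values from the table above; the helper names are hypothetical, and the window count assumes the last partial window is padded, which may not match how every model pipeline handles file ends.

```python
import math


def samples_per_window(sample_rate_hz, input_length_s):
    """Number of audio samples in one model input window."""
    return round(sample_rate_hz * input_length_s)


def n_windows(file_length_s, input_length_s):
    """Windows needed to cover a file, assuming the last partial window is padded."""
    return math.ceil(file_length_s / input_length_s)


print(samples_per_window(48_000, 3))     # BirdNET: 144000
print(samples_per_window(16_000, 0.96))  # VGGish: 15360
print(n_windows(60, 3))                  # 20 windows for a 60 s file
```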
More details on the models
| Name | paper | code | training | CNN/Trafo | architecture | checkpoint link |
|---|---|---|---|---|---|---|
| AudioMAE | paper | code | ssl + ft | trafo | ViT | weights |
| AudioProtoPNet | paper | code | sup l | CNN | ConvNext | included |
| AvesEcho_PaSST | paper | code | sup l | trafo | PaSST | weights |
| AVES_ESpecies | paper | code | ssl | trafo | HuBERT | weights |
| BEATs | paper | code | ssl | trafo | ViT | weights |
| BioLingual | paper | code | ssl | trafo | CLAP | included |
| BirdAVES_ESpecies | paper | code | ssl | trafo | HuBERT | weights |
| BirdMAE | paper | code | ssl | trafo | ViT | included |
| BirdNET | paper | code | sup l | CNN | EffNetB0 | weights |
| Google_Whale | paper | code | sup l | CNN | EffNetb0 | included |
| HumpbackNET | paper | code | sup l | CNN | ResNet50 | weights |
| Insect66NET | paper | code | sup l | CNN | EffNetv2s | weights |
| Insect459NET | paper | pending | sup l | CNN | EffNetv2s | pending |
| Mix2 | paper | code | sup l | CNN | MobNetv3 | release pending |
| NatureBEATs | paper | code | ssl | trafo | BEATs | weights |
| Perch_Bird | paper | code | sup l | CNN | EffNetB1 | included |
| ProtoCLR | paper | code | sup cl | trafo | CvT-13 | weights |
| RCL_FS_BSED | paper | code | sup cl | CNN | ResNet9 | weights |
| SurfPerch | paper | code | sup l | CNN | EffNetB1 | included |
| VGGish | paper | code | sup l | CNN | VGG | weights |
Brief description of models
All information is extracted from the respective repositories and manuscripts. Please refer to them for more details
AudioMAE
- spectrogram input
- self-supervised pretrained model, fine-tuned
- vision transformer
- trained on general audio
AudioMAE from the facebook research group is a vision transformer pretrained on AudioSet-2M data and fine-tuned on AudioSet-20K.
AudioProtoPNet
- spectrogram input
- supervised learning, trained using asymmetric loss
- ConvNext architecture
- trained on the xeno-canto large section of BirdSet
This CNN is trained in two phases. The main contribution of this model is its interpretability. It learned prototypes during its second training phase which can be used at inference time to visualize the sections of the spectrogram that were most important for classification. It also reaches competitive performance on bird classification tasks. The included classifier can distinguish 9736 classes.
AvesEcho_PaSST
- transformer
- supervised pretrained model, fine-tuned
- pretrained on general audio and bird song data
AvesEcho_PaSST is a vision transformer trained on AudioSet and (deep) fine-tuned on xeno-canto. The model is based on the PaSST framework.
AVES_ESpecies
- transformer
- self-supervised pretrained model
- trained on general audio
AVES_ESpecies is short for Animal Vocalization Encoder based on Self-Supervision by the Earth Species Project. The model is based on the HuBERT-base architecture. The model is pretrained on unannotated audio datasets AudioSet-20K, FSD50K and the animal sounds from AudioSet and VGGSound.
BEATs
- trafo
- self-supervised learning
- trained on AudioSet
BEATs is Microsoft's SotA audio model based on audio pre-training with acoustic tokenizers. The model reaches competitive results with many bioacoustic models in benchmarks for linear and attentive probing, and is therefore also included in bacpipe as a general audio baseline model.
BioLingual
- transformer
- spectrogram input
- contrastive-learning
- self-supervised pretrained model
- trained on animal sound data (primarily bird song)
BioLingual is a language-audio model trained on captioning bioacoustic datasets including xeno-canto and iNaturalist. The model architecture is based on the CLAP model architecture.
BirdAVES_ESpecies
- transformer
- self-supervised pretrained model
- trained on general audio and bird song data
BirdAVES_ESpecies is short for Bird Animal Vocalization Encoder based on Self-Supervision by the Earth Species Project. The model is based on the HuBERT-large architecture. The model is pretrained on unannotated audio datasets AudioSet-20K, FSD50K and the animal sounds from AudioSet and VGGSound as well as bird vocalizations from xeno-canto.
BirdMAE
- trafo (ViT)
- self-supervised model
- trained on XCM
BirdMAE is a masked autoencoder inspired by Meta's AudioMAE; however, the model was heavily adapted for the bioacoustic domain. The model was trained on the xeno-canto M dataset (1.7 million samples) from BirdSet and evaluated on various soundscape datasets, where it outperformed all competing models (including SotA bioacoustic models).
BirdNET
- CNN
- supervised training model
- trained on bird song data
BirdNET (v2.4) is based on an EfficientNet (B0) architecture. The model is trained on a large amount of bird vocalizations from the xeno-canto database alongside other bird song databases.
Google_Whale
- CNN
- supervised training model
- trained on 7 whale species
Google_Whale (multispecies_whale) is an EfficientNet B0 model trained on whale vocalizations and other marine sounds.
HumpbackNET
- CNN
- supervised training model
- trained on humpback whale song
HumpbackNET is a binary classifier based on a ResNet-50 model trained on humpback whale data from different parts in the North Atlantic.
Insect66NET
- CNN
- supervised training model
- trained on insect sounds
InsectNET66 is an EfficientNet v2 s model developed by the winning team of the Capgemini Global Data Science Challenge 2023. It is trained on the Insect66 dataset, which includes sounds of grasshoppers, crickets and cicadas.
Insect459NET
- CNN
- supervised training model
- trained on insect sounds
InsectNET459 is an EfficientNet v2 s model trained on the Insect459 dataset (publication pending).
Mix2
- CNN
- supervised training model
- trained on frog sounds
Mix2 is a MobileNet v3 model trained on the AnuranSet which includes sounds of 42 different species of frogs from different regions in Brazil. The model was trained using a mixture of Mixup augmentations to handle the class imbalance of the data.
NatureBEATs
- trafo
- self-supervised training model
- trained on diverse set of bioacoustics, general sound, music, human speech
NatureLM-Audio is a very ambitious foundational model specifically for bioacoustics. It uses Microsoft's BEATs backbone as an audio encoder along with Meta's Llama-3.1-8B large language model capabilities. In the implementation used here in bacpipe, only the support for BEATs audio-encoder with NatureLM-Audio's weights, referred to here as NatureBEATs, is provided.
RCL_FS_BSED
- CNN
- supervised contrastive learning
- trained on the DCASE 2023 task 5 dataset
RCL_FS_BSED stands for Regularized Contrastive Learning for Few-shot Bioacoustic Sound Event Detection and features a model based on a ResNet model. The model was originally created for the DCASE bioacoustic few shot challenge (task 5) and later improved.
ProtoCLR
- transformer
- supervised contrastive learning
- trained on bird song data
ProtoCLR stands for Prototypical Contrastive Learning for robust representation learning. The architecture is a CvT-13 (Convolutional vision transformer) with 20M parameters. ProtoCLR has been validated on transfer learning tasks for bird sound classification, showing strong domain-invariance in few-shot scenarios. The model was trained on the xeno-canto dataset.
Perch_Bird
- CNN
- supervised training model
- trained on bird song data
Perch_Bird is an EfficientNet B1 model trained on the entire Xeno-canto database.
SurfPerch
- CNN
- supervised training model
- trained on bird song, fine-tuned on tropical reef data
SurfPerch is an EfficientNet B1 model trained on the entire Xeno-canto database and fine-tuned on coral reef and unrelated sounds.
VGGish
- CNN
- supervised training model
- trained on general audio
VGGish is a model based on the VGG architecture. The model is trained on audio from YouTube videos (YouTube-8M).
Dimensionality reduction models
To evaluate the generated embeddings a number of dimensionality reduction models are included in this repository:
| name | reference | code reference | linear |
|---|---|---|---|
| UMAP | paper | code | No |
| t-SNE | paper | code | No |
| PCA | paper | code | Yes |
| Sparse_PCA | paper | code | Yes |
Add a new model
To add a new model, simply add a pipeline with the name of your model. Make sure your model follows the following criteria:
- define the model specific sampling rate
- define the model specific input segment length
- define a class called "Model" which inherits the ModelBaseClass from bacpipe.utils
- define the __init__, preprocess, and __call__ methods so that the model can be called
- if necessary save the checkpoint in the bacpipe.model_checkpoints dir with the name corresponding to the name of the model
- if you need to import code where your specific model class is defined, create a directory in bacpipe.model_specific_utils corresponding to your model name "newmodel" and add all the necessary code in there
Here is an example:
import torch
from bacpipe.model_specific_utils.newmodel.module import MyClass
from ..utils import ModelBaseClass

SAMPLE_RATE = 12345
LENGTH_IN_SAMPLES = int(10 * SAMPLE_RATE)


class Model(ModelBaseClass):
    def __init__(self, **kwargs):
        super().__init__(sr=SAMPLE_RATE, segment_length=LENGTH_IN_SAMPLES, **kwargs)
        self.model = MyClass()
        state_dict = torch.load(
            self.model_base_path + "/newmodel/checkpoint_path.pth",
            weights_only=True,
        )
        self.model.load_state_dict(state_dict)

    def preprocess(self, audio):  # audio is a torch.tensor object
        # insert your preprocessing steps
        processed_audio = audio  # placeholder: replace with your preprocessing
        return processed_audio

    def __call__(self, x):
        # by default the model will be called with .eval() mode
        return self.model(x)
Most of the models are based on pytorch. For tensorflow models, see birdnet, hbdet or vggish.
Contribute
This repository is intended to be a collaborative project for people working in the field of bioacoustics. If you think there is some improvement that could be useful, please raise an issue, submit a PR or get in touch.
There are two main intentions for this repository that should always be considered when contributing:
1. Only add new requirements if truly necessary
Given the large number of different models, there are already a lot of requirements. To ensure that the repository is stable, and installation errors are kept minimal, please only add code with new requirements if truly necessary.
2. The main purpose of bacpipe is quickly generating embeddings from models
There should always be a baseline minimal use case, where embeddings are created from different feature extractors and everything else is an add-on.
Known issues
Given that this repository compiles a large number of very different deep learning models with different requirements, some issues have been noted.
Please raise issues if there are questions or bugs.
Previous versions of bacpipe included models like animal2vec, but the requirements conflicts led me to remove them. In the future I hope there will be an updated version of those models and then they will be included again.
Citation
A lot of work has gone into creating these bioacoustic models, both by data collectors and by machine learning practitioners. Please cite the authors of the respective models (all models are referenced in the table above).
This work is first described in a conference paper. If you use bacpipe for your research, please include the following reference:
@misc{kather2025clusteringnovelclassrecognition,
title={Clustering and novel class recognition: evaluating bioacoustic deep learning feature extractors},
author={Vincent S. Kather and Burooj Ghani and Dan Stowell},
year={2025},
eprint={2504.06710},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2504.06710},
}
Newsletter and Q&A sessions
Reading from the traffic on the repository, there seems to be an interest in bacpipe. I have set up a newsletter under this link: https://buttondown.com/vskode. Once more than 30 people have signed up for the newsletter, I will schedule a Q&A session and post the link in the newsletter. Hopefully I can then help answer questions and address issues that people are running into.