Tooling for agile modeling on large machine perception embedding databases.
Perch Hoplite
Hoplite is a system for storing large volumes of embeddings from machine perception models. We focus on combining vector search with active learning workflows, aka agile modeling.
In brief, agile modeling is a process for rapidly developing classifiers using embeddings from a pre-trained 'foundation' model. For bioacoustics work, we find that new classifiers can often be developed for new signals in under an hour.
How does it work?
We first use a bioacoustics model to convert your unlabeled audio data into embeddings, which act as semantic 'fingerprints' of 5-second audio clips. You can then search the embeddings of your data by providing an example of what you're looking for, and give feedback on the results by marking which examples are and are not what you're looking for. From this feedback, we can quickly train a classifier. You can then improve the classifier with active learning: examine the classifier outputs, provide more feedback, and re-train the classifier.
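The feedback-then-train loop described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not Hoplite's API: the embeddings are synthetic, and the function names (`train_linear_classifier`, `predict_proba`, `most_uncertain`) are hypothetical. It fits a logistic-regression classifier on fixed embeddings and then picks the least certain examples as candidates for the next round of feedback.

```python
import numpy as np


def train_linear_classifier(embeddings, labels, lr=0.5, steps=500, l2=1e-3):
    """Fit logistic-regression weights on fixed embeddings by gradient descent."""
    n, d = embeddings.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(embeddings @ w + b)))
        err = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (embeddings.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict_proba(embeddings, w, b):
    """Probability that each embedding belongs to the positive class."""
    return 1.0 / (1.0 + np.exp(-(embeddings @ w + b)))


def most_uncertain(scores, k):
    """Indices of the k scores closest to 0.5 (candidates for more feedback)."""
    return np.argsort(np.abs(scores - 0.5))[:k]


# Synthetic stand-ins for labeled embeddings: 20 positives, 20 negatives.
rng = np.random.default_rng(0)
positives = rng.normal(loc=1.0, scale=0.5, size=(20, 16))
negatives = rng.normal(loc=-1.0, scale=0.5, size=(20, 16))
labeled = np.concatenate([positives, negatives])
labels = np.concatenate([np.ones(20), np.zeros(20)])

w, b = train_linear_classifier(labeled, labels)
scores = predict_proba(labeled, w, b)
to_review = most_uncertain(scores, k=5)
```

Because the embeddings are fixed, each re-training step only fits a small linear model, which is why the loop can be repeated quickly after every batch of feedback.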
A key feature of this workflow is that we pre-compute the embeddings. This may take a while if you have a large amount of data, but the subsequent search and classifier training are very efficient.
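To make the pre-computation step concrete, here is a minimal sketch that splits a recording into 5-second windows and embeds each window once. It makes several assumptions: `toy_embed` is a hypothetical stand-in (a fixed random projection) for a real bioacoustics model, the sample rate is unrealistically small to keep the demo light, and a trailing partial window is simply dropped, which a real pipeline may handle differently.

```python
import numpy as np


def frame_audio(audio, sample_rate, window_s=5.0):
    """Split a 1-D signal into non-overlapping windows of window_s seconds."""
    window = int(window_s * sample_rate)
    n_windows = len(audio) // window
    # Assumption: drop the trailing partial window instead of padding it.
    return audio[: n_windows * window].reshape(n_windows, window)


def toy_embed(windows, dim=8, seed=0):
    """Hypothetical stand-in for a model: a fixed random projection."""
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(windows.shape[1], dim))
    return windows @ projection


sample_rate = 100  # deliberately tiny for the demo; real audio uses far more
audio = np.random.default_rng(1).normal(size=sample_rate * 12)  # 12 seconds

windows = frame_audio(audio, sample_rate)  # two full 5-second windows
embeddings = toy_embed(windows)            # computed once, reused afterwards
```

Once the embeddings exist, search and classifier training operate on the small embedding matrix rather than on the raw audio.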
To get started, load up the following Colab/Jupyter notebooks:
- agile/01_embed_audio.ipynb – Computes embeddings of your audio data.
- agile/02_agile_modeling.ipynb – Performs search, classification, and active learning.
Repository Contents
This repository consists of four sub-libraries:
- db – The core database functionality for storing embeddings and related metadata. The database also handles labels applied to embeddings and vector search, both exact and approximate.
- agile – Tooling (and example notebooks) for agile modeling on top of the Hoplite db layer, combining search and active learning approaches. This library includes organizing labeled data and training linear classifiers over embeddings, as well as tooling for embedding large datasets.
- zoo – A bioacoustics model zoo. A basic wrapper class is provided, and any model which can transform windows of audio samples into embeddings can then be used in the agile modeling workflow.
- taxonomy – A database of taxonomic information, especially for handling conversions between the various bird taxonomies.
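As an illustration of what exact vector search means here, the following sketch scores a query against every stored embedding and returns the top matches. It is a generic NumPy example, not the db library's interface: `exact_top_k` is a hypothetical name, the data is synthetic, and cosine similarity is an assumed choice of metric. Approximate search trades some of this exhaustive accuracy for speed on large collections.

```python
import numpy as np


def exact_top_k(query, embeddings, k=5):
    """Brute-force cosine-similarity search; returns (indices, scores)."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q
    k = min(k, len(scores))
    # argpartition finds the k best in linear time; then sort only those k.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


rng = np.random.default_rng(0)
db_embeddings = rng.normal(size=(100, 16))  # synthetic stand-in for stored embeddings
query = db_embeddings[42].copy()            # search using a known example

top_idx, top_scores = exact_top_k(query, db_embeddings, k=5)
```

In this toy setup the query is itself a stored row, so it comes back as the best match with a similarity of 1.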
Each sub-library has its own documentation.
Installation
We recommend using uv or pip for installation. uv is a fast, Rust-based,
pip-compatible package installer and resolver.
First, install system dependencies for audio processing:
sudo apt-get update
sudo apt-get install libsndfile1 ffmpeg
With uv
If you don't have uv, you can install it via pipx install uv or
pip install uv.
If you are developing locally, clone the repository and install in editable
mode:
git clone https://github.com/google-research/perch-hoplite.git
cd perch-hoplite
uv pip install -e .
With pip
You can install the latest stable release from PyPI:
pip install perch-hoplite
Or install the latest version from GitHub:
pip install git+https://github.com/google-research/perch-hoplite.git
After installation, you can run the tests to check that everything is working:
python -m unittest discover -s perch_hoplite/db/tests -p "*test.py"
python -m unittest discover -s perch_hoplite/taxonomy -p "*test.py"
python -m unittest discover -s perch_hoplite/zoo -p "*test.py"
python -m unittest discover -s perch_hoplite/agile/tests -p "*test.py"
Notes on Dependencies
TensorFlow is required for agile modeling (classifier training) and for using the Perch or BirdNET models, but is not installed by default. We recommend installing one of the TensorFlow options:
To install with TensorFlow (CPU version):
pip install 'perch-hoplite[tf]'
To install TensorFlow with CUDA support (for GPU usage):
pip install 'perch-hoplite[tf-cuda]'
The zoo library contains wrappers for various bioacoustic models. Some of
these require JAX. To install with JAX dependencies:
uv pip install -e '.[jax]'
or with pip:
pip install 'perch-hoplite[jax]'
If you are installing with uv in editable mode, you can request several extras
at once, for example uv pip install -e '.[tf,jax]'.
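After choosing extras, it can be handy to confirm which optional backends are importable in the current environment. The sketch below uses only the standard library; `available_backends` is a hypothetical helper, and it assumes the extras provide the usual `tensorflow` and `jax` import names.

```python
import importlib.util


def available_backends(names=("tensorflow", "jax")):
    """Report which optional packages can be imported, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = available_backends()
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'not installed'}")
```

A backend reported as not installed suggests that the matching extra still needs to be added.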
Disclaimer
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
File details
Details for the file perch_hoplite-1.0.0.tar.gz.
File metadata
- Download URL: perch_hoplite-1.0.0.tar.gz
- Upload date:
- Size: 3.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ac11e4df92549dbe7bc7eb91fb115f2ed2b363ca1c3938cb3a2a4aa7a87fb75 |
| MD5 | 4ec66492d001452428492c6136cbc239 |
| BLAKE2b-256 | f05d090e128c4c72ac3182518bbf193b28264308f163fec187c4b3944cdda05a |
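A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the path passed in is wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash("perch_hoplite-1.0.0.tar.gz", "0ac11e4df92549dbe7bc7eb91fb115f2ed2b363ca1c3938cb3a2a4aa7a87fb75")` on a local copy of the source distribution should return `True` if the download is intact.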
Provenance
The following attestation bundles were made for perch_hoplite-1.0.0.tar.gz:
Publisher: publish.yml on google-research/perch-hoplite
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: perch_hoplite-1.0.0.tar.gz
- Subject digest: 0ac11e4df92549dbe7bc7eb91fb115f2ed2b363ca1c3938cb3a2a4aa7a87fb75
- Sigstore transparency entry: 910748640
- Sigstore integration time:
- Permalink: google-research/perch-hoplite@6cc4f2b7c7b490ae6130be77182c9bb479013aeb
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/google-research
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6cc4f2b7c7b490ae6130be77182c9bb479013aeb
- Trigger Event: release
File details
Details for the file perch_hoplite-1.0.0-py3-none-any.whl.
File metadata
- Download URL: perch_hoplite-1.0.0-py3-none-any.whl
- Upload date:
- Size: 3.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0d53cf1e00f355101d24328b83cf9ecb47696bc885cd60a4c3be05213aedfba |
| MD5 | 25fa49c7a9e16ba82ea25a41f4911d2a |
| BLAKE2b-256 | 5d02d37e078440bdcf899ad4af0a9d134ee3fbbc8e8028585e87ec9c884894cb |
Provenance
The following attestation bundles were made for perch_hoplite-1.0.0-py3-none-any.whl:
Publisher: publish.yml on google-research/perch-hoplite
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: perch_hoplite-1.0.0-py3-none-any.whl
- Subject digest: c0d53cf1e00f355101d24328b83cf9ecb47696bc885cd60a4c3be05213aedfba
- Sigstore transparency entry: 910748650
- Sigstore integration time:
- Permalink: google-research/perch-hoplite@6cc4f2b7c7b490ae6130be77182c9bb479013aeb
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/google-research
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6cc4f2b7c7b490ae6130be77182c9bb479013aeb
- Trigger Event: release