Tooling for agile modeling on large machine perception embedding databases.

Perch Hoplite

Hoplite is a system for storing large volumes of embeddings from machine perception models. We focus on combining vector search with active learning workflows, also known as agile modeling.

In brief, agile modeling is a process for rapidly developing classifiers using embeddings from a pre-trained 'foundation' model. For bioacoustics work, we find that new classifiers can often be developed for new signals in under an hour.

How does it work?

We first use a bioacoustics model to convert your unlabeled audio data into embeddings, which act like semantic 'fingerprints' of 5-second audio clips. You can then search the embeddings of your data by providing an example of what you're looking for, and give feedback on the results: which examples are, and are not, what you're looking for. From this feedback, we can quickly train a classifier. You can then improve the classifier with active learning: examine the classifier outputs, provide more feedback, and re-train the classifier.
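The search, feedback, and training loop can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not Hoplite's API: all function names are hypothetical, the "feedback" is simulated from known labels, the linear classifier is a logistic regression fitted by plain gradient descent (Hoplite's agile library uses TensorFlow for this step), and the active-learning step uses uncertainty sampling, which is one common strategy among several.

```python
import numpy as np


def cosine_top_k(embeddings, query, k):
    """Exact search: indices of the k rows of `embeddings` most similar to `query`."""
    # Normalize so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.argsort(-(normed @ q))[:k]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_classifier(x, y, lr=0.1, steps=500):
    """Logistic regression fitted with full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y  # gradient of the log loss w.r.t. the logits
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def most_uncertain(embeddings, w, b, k, exclude):
    """Uncertainty sampling: unlabeled items whose score is closest to 0.5."""
    order = np.argsort(np.abs(sigmoid(embeddings @ w + b) - 0.5))
    return [int(i) for i in order if int(i) not in exclude][:k]


# Synthetic stand-in for precomputed embeddings: rows 0-19 are the target signal.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
embeddings[:20] += 2.0
embeddings[20:] -= 2.0
is_target = np.arange(200) < 20  # ground truth, used only to simulate feedback

# 1. Search by example, using clip 0 as the query.
hits = cosine_top_k(embeddings, embeddings[0], k=10)

# 2. Feedback on the search results plus a few random clips (simulated here).
others = rng.choice(np.setdiff1d(np.arange(200), hits), size=10, replace=False)
labeled = np.concatenate([hits, others])
w, b = train_linear_classifier(embeddings[labeled], is_target[labeled].astype(float))

# 3. Score every clip, then pick the clips to review in the next round.
predictions = sigmoid(embeddings @ w + b) > 0.5
accuracy = (predictions == is_target).mean()
to_review = most_uncertain(embeddings, w, b, k=5, exclude=set(labeled.tolist()))
```

In a real session, a person would label the clips in `to_review`, and steps 2 and 3 would repeat with the enlarged labeled set.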

A key feature of this workflow is that we pre-compute the embeddings. This may take a while if you have a large amount of data, but the subsequent search and classifier training are very efficient.
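To make the cost split concrete, here is a sketch of a one-time embedding pass whose output is cached on disk and reused. The 32 kHz sample rate, the 16-dimensional embedding size, and `toy_embed` (a log-spectrum summary standing in for a real model from the zoo) are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

SAMPLE_RATE = 32_000  # assumption: 32 kHz mono audio
WINDOW_S = 5.0  # 5-second windows, matching the clips described above
EMBED_DIM = 16  # toy size; real models typically produce larger embeddings


def frame_audio(audio, sample_rate=SAMPLE_RATE, window_s=WINDOW_S):
    """Split a 1-D signal into non-overlapping windows, zero-padding the last one."""
    win = int(sample_rate * window_s)
    n_windows = -(-len(audio) // win)  # ceiling division
    padded = np.zeros(n_windows * win, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n_windows, win)


def toy_embed(window):
    """Hypothetical stand-in for a real model: log of mean spectral magnitude in 16 bands."""
    spectrum = np.abs(np.fft.rfft(window))
    bands = np.array_split(spectrum, EMBED_DIM)
    return np.log1p(np.array([band.mean() for band in bands], dtype=np.float32))


def precompute(audio):
    """The expensive pass: run the model once over every window of a recording."""
    return np.stack([toy_embed(w) for w in frame_audio(audio)])


# Embed 12 seconds of synthetic audio once and cache the result on disk.
rng = np.random.default_rng(0)
audio = rng.normal(size=12 * SAMPLE_RATE).astype(np.float32)
embeddings = precompute(audio)  # 3 windows -> shape (3, 16)
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, embeddings)

# Later searches and training runs reload the cache instead of re-running the model.
cached = np.load(path)
```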

To get started, load up the following Colab/Jupyter notebooks:

  • agile/1_embed_audio_v2.ipynb - Computes embeddings of your audio data.
  • agile/2_agile_modeling_v2.ipynb - Performs search, classification, and active learning.

Repository Contents

This repository consists of four sub-libraries:

  • db - The core database functionality for storing embeddings and related metadata. The database also handles labels applied to embeddings and vector search, both exact and approximate.
  • agile - Tooling (and example notebooks) for agile modeling on top of the Hoplite db layer, combining search and active learning approaches. This library includes organizing labeled data and training linear classifiers over embeddings, as well as tooling for embedding large datasets.
  • zoo - A bioacoustics model zoo. A basic wrapper class is provided, and any model which can transform windows of audio samples into embeddings can then be used in the agile modeling workflow.
  • taxonomy - A database of taxonomic information, especially for handling conversions between the various bird taxonomies.
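To illustrate what the db layer is responsible for, the sketch below keeps embeddings, their provenance (source file and time offset), and labels in SQLite, and runs an exact brute-force search. The schema, file names, label name, and functions are hypothetical and are not Hoplite's; approximate search, which trades some recall for speed on large collections, is not shown.

```python
import sqlite3

import numpy as np


def create_store():
    """In-memory store with one table for embeddings and one for labels."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE embeddings ("
        "id INTEGER PRIMARY KEY, source TEXT, offset_s REAL, vec BLOB)"
    )
    con.execute(
        "CREATE TABLE labels ("
        "embedding_id INTEGER REFERENCES embeddings(id), label TEXT, is_positive INTEGER)"
    )
    return con


def insert_embedding(con, source, offset_s, vec):
    # Store the vector as raw float32 bytes next to its source file and time offset.
    cur = con.execute(
        "INSERT INTO embeddings (source, offset_s, vec) VALUES (?, ?, ?)",
        (source, offset_s, np.asarray(vec, dtype=np.float32).tobytes()),
    )
    return cur.lastrowid


def add_label(con, embedding_id, label, is_positive):
    con.execute(
        "INSERT INTO labels VALUES (?, ?, ?)", (embedding_id, label, int(is_positive))
    )


def exact_search(con, query, k):
    """Brute-force inner-product search over every stored embedding."""
    rows = con.execute("SELECT id, vec FROM embeddings").fetchall()
    ids = np.array([row[0] for row in rows])
    vecs = np.stack([np.frombuffer(row[1], dtype=np.float32) for row in rows])
    scores = vecs @ np.asarray(query, dtype=np.float32)
    top = np.argsort(-scores)[:k]
    return [(int(ids[i]), float(scores[i])) for i in top]


con = create_store()
for t, vec in enumerate(np.eye(4)):
    insert_embedding(con, "site_a/recording_001.wav", 5.0 * t, vec)
add_label(con, 3, "hypothetical_call_type", True)
results = exact_search(con, [0.0, 0.0, 1.0, 0.0], k=2)
```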

Each sub-library has its own documentation.

Installation

The repository can be installed with either pip or poetry. Poetry allows more granular management of dependencies.

First, install some basic dependencies. Note that for GPU support, you may install tensorflow[and-cuda] instead of tensorflow-cpu.

sudo apt-get update
sudo apt-get install libsndfile1 ffmpeg
pip install absl-py
pip install requests
# You may skip tensorflow installation if only using the hoplite/db library.
# However, these are required for agile modeling and most models in the zoo.
pip install tensorflow-cpu
pip install tensorflow-hub

Then, to install with pip:

pip install git+https://github.com/google-research/perch-hoplite.git

Then, from the root of a clone of the repository (the test paths below are relative to it), run the tests and check that they pass:

python -m unittest discover -s perch_hoplite/db/tests -p "*test.py"
python -m unittest discover -s perch_hoplite/taxonomy -p "*test.py"
python -m unittest discover -s perch_hoplite/zoo -p "*test.py"
python -m unittest discover -s perch_hoplite/agile/tests -p "*test.py"

Or, install with poetry:

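# Install Poetry for package management
curl -sSL https://install.python-poetry.org | python3 -

# Clone the repository; Poetry reads its configuration from pyproject.toml.
git clone https://github.com/google-research/perch-hoplite.git
cd perch-hoplite

# Install all dependencies specified in the poetry configs.
poetry install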

Notes on Dependencies

Machine learning framework libraries are pretty heavy! It can also be difficult to coordinate CUDA versions across multiple frameworks to ensure good GPU behavior. Thus, we let you select dependencies according to your needs.

TensorFlow is used in the agile library for training linear classifiers. If you do not need the agile library or any of the TensorFlow models in the zoo, you may skip the TensorFlow installation steps when installing with pip. Alternatively, you can use poetry to install without TensorFlow like so:

poetry install --without tf

The primary place where multiple frameworks may be needed is in the zoo library, which provides wrappers for various bioacoustic models. To install with JAX (allowing use of some models in the zoo):

poetry install --with jax
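Before choosing between these options, it can help to check which frameworks are already importable in your environment. A small sketch using only the standard library; the module list is an assumption covering the frameworks mentioned in this section.

```python
import importlib.util

# Top-level module names for the optional frameworks mentioned above.
OPTIONAL_FRAMEWORKS = ("tensorflow", "tensorflow_hub", "jax")


def available_frameworks(names=OPTIONAL_FRAMEWORKS):
    """Map each module name to whether it is importable, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_frameworks().items():
        print(f"{name}: {'installed' if found else 'not installed'}")
```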

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Download files

Download the file for your platform.

Source Distribution

perch_hoplite-0.1.3.tar.gz (3.6 MB)

Built Distribution

perch_hoplite-0.1.3-py3-none-any.whl (3.8 MB)

File details

Details for the file perch_hoplite-0.1.3.tar.gz.

File metadata

  • Download URL: perch_hoplite-0.1.3.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for perch_hoplite-0.1.3.tar.gz
  • SHA256: 034ebb6ba6dd7533d59c8073c68916ee1adc9ea9e6d51c0445ac44c42f2d0067
  • MD5: 156fefd521e4542be848df40dd110a8b
  • BLAKE2b-256: 80cdcdc57c8d606e8373783dfb099222c55801489ea780c5318e1043d62b9c8c
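To check a downloaded archive against the SHA256 digest listed for it, you can hash it locally. A minimal sketch using Python's `hashlib`; the function names are illustrative, and the file path in the usage comment assumes the archive was downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks (1 MiB by default) so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the sdist was downloaded to the current directory):
# matches_digest(
#     "perch_hoplite-0.1.3.tar.gz",
#     "034ebb6ba6dd7533d59c8073c68916ee1adc9ea9e6d51c0445ac44c42f2d0067",
# )
```

pip can also enforce digests itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).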

Provenance

The following attestation bundles were made for perch_hoplite-0.1.3.tar.gz:

Publisher: publish.yml on google-research/perch-hoplite

File details

Details for the file perch_hoplite-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: perch_hoplite-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 3.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for perch_hoplite-0.1.3-py3-none-any.whl
  • SHA256: 2756c3ca16016b23c3eec014f00a519211ab4d8a49bd5a2cdc5b97575fbb1632
  • MD5: 0d88e0a241a2662d76d22d13f8f5f886
  • BLAKE2b-256: a39a2a7b17af13589e4cc2541d8b0bbb944bc3b61b0abbb3e3042b22f56267ec

Provenance

The following attestation bundles were made for perch_hoplite-0.1.3-py3-none-any.whl:

Publisher: publish.yml on google-research/perch-hoplite
