A metric learning toolkit

BioEncoder

BioEncoder is a toolbox for image classification and trait discovery in organismal biology. It uses image classification models trained with metric learning to learn species trait data (i.e., features) from images. The implementation is based on SupCon and timm-vis.

Preprint on bioRxiv: https://doi.org/10.1101/2024.04.03.587987

Features

  • Taxon-agnostic dataloaders (making it applicable to any dataset - not just biological ones)
  • Support for timm models and pytorch-optimizer
  • Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace
  • Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance
  • LRFinder for the second stage of the training.
  • Easy customization of hyperparameters, including augmentations, through YAML configs (check the config-templates folder for examples)
  • Custom augmentation techniques via albumentations
  • TensorBoard logs and checkpoints (soon to come: WandB integration)
  • Streamlit app with rich model visualizations (e.g., Grad-CAM)
  • Interactive t-SNE and PCA plots using Bokeh
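
The two weight-averaging features in the list differ in how they combine model weights over time. The sketch below is not BioEncoder's implementation: it is a plain-Python illustration on lists of floats, and the function names `ema_update` and `swa_average` as well as the decay value are hypothetical, chosen only for this example.

```python
def ema_update(ema_weights, new_weights, decay=0.9):
    """Exponential moving average: blend the running average with the newest weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, new_weights)]


def swa_average(checkpoints):
    """Stochastic weight averaging: equal-weight mean over a set of checkpoints."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]


# Three toy "checkpoints" with two parameters each.
checkpoints = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

# EMA is updated step by step and favors recent weights.
ema = checkpoints[0]
for weights in checkpoints[1:]:
    ema = ema_update(ema, weights, decay=0.5)

# SWA treats all collected checkpoints equally.
swa = swa_average(checkpoints)

print("EMA:", ema)
print("SWA:", swa)
```

With real models the same arithmetic is applied to every parameter tensor rather than to a short list.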

Quickstart

(for more detailed information consult the help files)

1. Install BioEncoder (into a virtual environment with PyTorch/CUDA):

pip install bioencoder
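
Before continuing, it can help to confirm that the environment is set up as expected. The following is a minimal sketch using only the standard library plus PyTorch if it is present; the helper name `check_environment` is hypothetical and not part of BioEncoder.

```python
import importlib.util


def check_environment(packages=("torch", "bioencoder")):
    """Return a dict mapping each package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = check_environment()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")

# Only query CUDA when PyTorch is actually installed.
if status["torch"]:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```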

2. Download the example dataset from the data repo: https://zenodo.org/records/10909614/files/BioEncoder-data.zip. The archive contains the images and configuration files needed for steps 3 and 4, as well as the final model checkpoints and a script to reproduce the results and figures presented in the paper. To play around with the interactive figures and the model explorer, you can also skip the training and SWA steps, since the archive includes the final checkpoints.
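
If you prefer to fetch and unpack the archive from Python, something like the following may work. It is a sketch using only the standard library; the helper names `fetch_archive` and `extract_archive` and the target paths in the commented usage lines are assumptions, and the layout of the unpacked archive is not described here, so inspect the returned top-level entries yourself.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the step above.
DATA_URL = "https://zenodo.org/records/10909614/files/BioEncoder-data.zip"


def fetch_archive(url, zip_path):
    """Download the archive unless a copy already exists at zip_path."""
    zip_path = Path(zip_path).expanduser()
    if not zip_path.exists():
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_archive(zip_path, target_dir):
    """Unpack the archive and return the sorted names of its top-level entries."""
    target_dir = Path(target_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Example usage (hypothetical paths, uncomment to run):
# archive_path = fetch_archive(DATA_URL, "~/Downloads/BioEncoder-data.zip")
# print(extract_archive(archive_path, "~/Downloads"))
```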

3. Start an interactive session (e.g., in Spyder or VS Code) and run the following commands one by one:

## use overwrite=True to redo a step

import bioencoder

## global setup
bioencoder.configure(root_dir=r"~/bioencoder_wd", run_name="v1")

## split dataset
bioencoder.split_dataset(image_dir=r"~/Downloads/damselflies-aligned-trai_val", max_ratio=6, random_seed=42, val_percent=0.1, min_per_class=20)

## train stage 1
bioencoder.train(config_path=r"bioencoder_configs/train_stage1.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage1.yml")

## explore embedding space and model from stage 1
bioencoder.interactive_plots(config_path=r"bioencoder_configs/plot_stage1.yml")
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage1.yml")

## (optional) learning rate finder for stage 2
bioencoder.lr_finder(config_path=r"bioencoder_configs/lr_finder.yml")

## train stage 2
bioencoder.train(config_path=r"bioencoder_configs/train_stage2.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage2.yml")

## explore model from stage 2
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage2.yml")

## inference (stage 1 = embeddings, stage 2 = classification)
bioencoder.inference(config_path="bioencoder_configs/inference.yml", image="path/to/image.jpg")
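
To give a feel for what the arguments of the `split_dataset` call above might control, here is a toy per-class train/validation split. The semantics are assumptions made for illustration, not BioEncoder's documented behavior: classes with fewer than `min_per_class` images are dropped, no class may exceed `max_ratio` times the size of the smallest kept class, and `val_percent` of each class goes to validation. The function name `split_by_class` is hypothetical.

```python
import random


def split_by_class(images_by_class, val_percent=0.1, min_per_class=20,
                   max_ratio=6, random_seed=42):
    """Toy per-class train/val split (assumed semantics, see the text above)."""
    rng = random.Random(random_seed)
    # Assumption: classes with fewer than min_per_class images are dropped.
    kept = {name: list(imgs) for name, imgs in images_by_class.items()
            if len(imgs) >= min_per_class}
    if not kept:
        return {}, {}
    # Assumption: no class may exceed max_ratio times the smallest kept class.
    cap = max_ratio * min(len(imgs) for imgs in kept.values())
    train, val = {}, {}
    for name, imgs in kept.items():
        rng.shuffle(imgs)
        imgs = imgs[:cap]
        n_val = max(1, round(len(imgs) * val_percent))
        val[name], train[name] = imgs[:n_val], imgs[n_val:]
    return train, val


# Toy data: file names grouped by class label.
toy = {
    "a": [f"a{i}.jpg" for i in range(200)],
    "b": [f"b{i}.jpg" for i in range(20)],
    "c": [f"c{i}.jpg" for i in range(5)],
}
train, val = split_by_class(toy)
for name in train:
    print(name, len(train[name]), len(val[name]))
```

Fixing the seed makes the split reproducible, which matters when comparing runs trained on the same data.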

4. Alternatively, you can directly use the command line interface:

## use the flag "--overwrite" to redo a step

bioencoder_configure --root-dir "~/bioencoder_wd" --run-name v1
bioencoder_split_dataset --image-dir "~/Downloads/damselflies-aligned-trai_val" --max-ratio 6 --random-seed 42
bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"
bioencoder_interactive_plots --config-path "bioencoder_configs/plot_stage1.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage1.yml"
bioencoder_lr_finder --config-path "bioencoder_configs/lr_finder.yml"
bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage2.yml"
bioencoder_inference --config-path "bioencoder_configs/inference.yml" --image "path/to/image.jpg"
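
When running the training steps unattended, it may be convenient to chain the commands so that a failure stops the run. The sketch below wraps a subset of the commands listed above with Python's `subprocess` module; the helper `run_pipeline` and its `dry_run` flag are hypothetical, and the sketch assumes the configure and split steps have already been run.

```python
import shlex
import subprocess

# Commands copied from the listing above; the lr_finder step is optional.
PIPELINE = [
    'bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"',
    'bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"',
    'bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"',
    'bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"',
]


def run_pipeline(commands, dry_run=True):
    """Run commands in order; with dry_run=False, stop at the first failure."""
    executed = []
    for command in commands:
        argv = shlex.split(command)
        executed.append(argv)
        if dry_run:
            print("would run:", " ".join(argv))
            continue
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
    return executed


# Dry run by default: prints the commands without executing them.
planned = run_pipeline(PIPELINE, dry_run=True)
```

Call `run_pipeline(PIPELINE, dry_run=False)` to actually execute the steps.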

Citation

Please cite BioEncoder as follows:

@UNPUBLISHED{Luerig2024-ov,
  title    = "{BioEncoder}: a metric learning toolkit for comparative
              organismal biology",
  author   = "Luerig, Moritz D and Di Martino, Emanuela and Porto, Arthur",
  journal  = "bioRxiv",
  pages    = "2024.04.03.587987",
  month    =  apr,
  year     =  2024,
  language = "en",
  doi      = "10.1101/2024.04.03.587987"
}
