A metric learning toolkit

Project description

BioEncoder

BioEncoder is a toolbox for image classification and trait discovery in organismal biology. It uses image classification models trained with metric learning to learn species trait data (i.e., features) from images. The implementation is based on SupCon and timm-vis.
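To make the idea of a metric loss concrete, here is a minimal pure-Python sketch of a supervised contrastive (SupCon-style) loss. It is not BioEncoder's implementation: the function names, the list-based embeddings, and the default temperature are assumptions chosen for illustration. The loss is low when images with the same label sit close together in embedding space and high when they do not.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    For each anchor i, every other sample with the same label is a positive;
    all other samples form the contrast set. Anchors without any positive
    are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    per_anchor = []
    for i in range(len(z)):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(len(z))
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        loss_i = -sum(sims[p] - log_denom for p in positives) / len(positives)
        per_anchor.append(loss_i)
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Hypothetical 2-D embeddings: two tight clusters.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supcon_loss(points, ["a", "a", "b", "b"])  # labels match clusters
scrambled = supcon_loss(points, ["a", "b", "a", "b"])  # labels cut across clusters
print(clustered < scrambled)
```

In this toy batch the loss is smaller when the labels agree with the geometric clusters, which is the property a metric-learning stage optimizes for.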

Features

  • Taxon-agnostic dataloaders (applicable to any image dataset, not just biological ones)
  • Support for timm models and pytorch-optimizer
  • Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace
  • Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance
  • LRFinder for the second stage of training
  • Easy customization of hyperparameters, including augmentations, through YAML configs
  • Custom augmentation techniques via albumentations
  • TensorBoard logs and checkpoints (WandB integration coming soon)
  • Streamlit app with rich model visualizations (e.g., Grad-CAM)
  • Interactive t-SNE and PCA plots using Bokeh
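The two weight-averaging ideas in the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not BioEncoder's code: weights are flat lists of floats, `ema_update` and `uniform_average` are hypothetical helper names, and the equal-weight checkpoint average is assumed to be the idea behind the `swa` step.

```python
def ema_update(ema_weights, new_weights, decay=0.99):
    """One exponential-moving-average step.

    The running average keeps `decay` of its old value and takes
    `1 - decay` from the latest weights, which smooths out noisy updates.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, new_weights)]


def uniform_average(checkpoints):
    """Equal-weight average of several weight vectors (one per checkpoint)."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]


# Hypothetical weights from three training steps / checkpoints.
steps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

ema = steps[0]
for weights in steps[1:]:
    ema = ema_update(ema, weights, decay=0.9)

print(ema)                     # stays close to the earliest weights
print(uniform_average(steps))  # midpoint of all checkpoints
```

The difference in design is that the moving average is updated continuously during training and favors history, while the checkpoint average is computed afterwards and treats every saved model equally.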

Quickstart

(For more detailed information, consult the help files.)

1. Install BioEncoder (into a virtual environment with pytorch/CUDA):

pip install bioencoder

2. Download the example dataset (includes images and configs): https://osf.io/download/gsd5z/

3. Start an interactive session (e.g., in Spyder or VS Code) and run:

import bioencoder

## global setup
bioencoder.configure(root_dir=r"bioencoder_wd", run_name="v1")

## split dataset
bioencoder.split_dataset(image_dir=r"~/Downloads/damselflies-aligned-trai_val", max_ratio=6, random_seed=42)

## train stage 1
bioencoder.train(config_path=r"bioencoder_configs/train_stage1.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage1.yml")

## explore embedding space and model from stage 1
bioencoder.interactive_plots(config_path=r"bioencoder_configs/plot_stage1.yml")
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage1.yml")

## (optional) learning rate finder for stage 2
bioencoder.lr_finder(config_path=r"bioencoder_configs/lr_finder.yml")

## train stage 2
bioencoder.train(config_path=r"bioencoder_configs/train_stage2.yml", overwrite=True)
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage2.yml")

## explore model from stage 2
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage2.yml")
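For readers who want an intuition for what a seeded dataset split does before training, the following is a generic sketch of a reproducible per-class train/validation split. It does not reproduce `bioencoder.split_dataset`: the function name `split_by_class`, the `val_fraction` parameter, and the `(path, label)` sample format are assumptions, and the `max_ratio` option from the quickstart is not modeled here.

```python
import random
from collections import defaultdict


def split_by_class(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train and validation sets, class by class.

    A fixed seed makes the shuffle, and therefore the split, reproducible.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])  # sort first so input order does not matter
        rng.shuffle(paths)
        # Keep at least one validation image per class when the class has > 1 image.
        n_val = max(1, round(len(paths) * val_fraction)) if len(paths) > 1 else 0
        val.extend((p, label) for p in paths[:n_val])
        train.extend((p, label) for p in paths[n_val:])
    return train, val


# Hypothetical file names and class labels.
samples = [(f"a_{i}.jpg", "species_a") for i in range(10)]
samples += [(f"b_{i}.jpg", "species_b") for i in range(5)]

train, val = split_by_class(samples, val_fraction=0.2, seed=42)
print(len(train), len(val))
```

Splitting within each class keeps rare classes represented in both sets, which matters for the imbalanced image collections common in organismal biology.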

Citation

Please cite BioEncoder as follows:

@UNPUBLISHED{Lurig2024-pb,
    title    = "BioEncoder: a metric learning toolkit for comparative organismal biology",
    author   = "L{\"u}rig, Moritz D and Di Martino, Emanuela and Porto, Arthur",
    journal  = "bioRxiv",
    language = "en",
    doi      = "xxxx"
}
