A metric learning toolkit
BioEncoder
BioEncoder is a toolkit for supervised metric learning to i) learn and extract features from images, ii) enhance biological image classification, and iii) identify the features most relevant to classification. Designed for diverse and complex datasets, the package and the available metric losses can handle unbalanced classes and subtle phenotypic differences more effectively than non-metric approaches. The package includes taxon-agnostic data loaders, custom augmentation techniques, hyperparameter tuning through YAML configuration files, and rich model visualizations, providing a comprehensive solution for high-throughput analysis of biological images.
Preprint on bioRxiv: https://doi.org/10.1101/2024.04.03.587987
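To make the idea of a metric loss concrete, the following is a minimal plain-Python sketch of a supervised contrastive (SupCon-style) loss. It is not BioEncoder's implementation: the function name `supcon_loss`, the default temperature, and the toy embeddings are assumptions chosen for illustration. The loss is low when samples with the same label sit close together in embedding space and high when they do not.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch)."""
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in a 2-D embedding space.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(batch, [0, 0, 1, 1]))  # labels agree with the clusters
print(supcon_loss(batch, [0, 1, 0, 1]))  # labels cut across the clusters
```

The first call should print a much smaller value than the second, because only in the first case do same-label samples lie next to each other.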
Features
>> Full list of available model architectures, losses, optimizers, schedulers, and augmentations <<
- Taxon-agnostic dataloaders (applicable to any image dataset, not just biological ones)
- Support for timm models and pytorch-optimizer
- Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace.
- Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance.
- LRFinder for the second stage of the training.
- Easy customization of hyperparameters, including augmentations, through YAML configs (check the config-templates folder for examples)
- Custom augmentation techniques via albumentations
- TensorBoard logs and checkpoints (soon to come: WandB integration)
- Streamlit app with rich model visualizations (e.g., Grad-CAM and timm-vis)
- Interactive t-SNE and PCA plots using Bokeh
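The two weight-averaging schemes in the feature list differ in how they weigh past model states. The sketch below shows that difference on plain lists of numbers. It is a conceptual illustration only, not the code BioEncoder runs: `ema_update`, `swa_average`, the decay value, and the snapshots are all made up for the example.

```python
def ema_update(ema_weights, new_weights, decay=0.9):
    """Exponential moving average: recent weights count more than old ones."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, new_weights)]


def swa_average(checkpoints):
    """Equal-weight average of several weight snapshots (the idea behind SWA)."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]


# Three hypothetical snapshots of a two-parameter model.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

ema = snapshots[0]
for weights in snapshots[1:]:
    ema = ema_update(ema, weights, decay=0.5)

print(ema)                     # leans towards the latest snapshot
print(swa_average(snapshots))  # every snapshot contributes equally
```

In practice EMA is typically updated during training, while SWA averages checkpoints saved along the way, which matches the separate SWA step in the quickstart.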
Quickstart
>> Comprehensive help files <<
1. Install BioEncoder (into a virtual environment with PyTorch/CUDA):
pip install bioencoder
2. Download the example dataset from the data repo: https://zenodo.org/records/10909614/files/BioEncoder-data.zip. This archive contains the images and configuration files needed for steps 3 and 4, as well as the final model checkpoints and a script to reproduce the results and figures presented in the paper. To play around with the interactive figures and the model explorer, you can also skip the training / SWA steps.
3. Start an interactive session (e.g., in Spyder or VS Code) and run the following commands one by one:
## use overwrite=True to redo a step
import bioencoder
## global setup
bioencoder.configure(root_dir=r"~/bioencoder_wd", run_name="v1")
## split dataset
bioencoder.split_dataset(image_dir=r"~/Downloads/damselflies-aligned-trai_val", max_ratio=6, random_seed=42, val_percent=0.1, min_per_class=20)
## train stage 1
bioencoder.train(config_path=r"bioencoder_configs/train_stage1.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage1.yml")
## explore embedding space and model from stage 1
bioencoder.interactive_plots(config_path=r"bioencoder_configs/plot_stage1.yml")
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage1.yml")
## (optional) learning rate finder for stage 2
bioencoder.lr_finder(config_path=r"bioencoder_configs/lr_finder.yml")
## train stage 2
bioencoder.train(config_path=r"bioencoder_configs/train_stage2.yml")
bioencoder.swa(config_path=r"bioencoder_configs/swa_stage2.yml")
## explore model from stage 2
bioencoder.model_explorer(config_path=r"bioencoder_configs/explore_stage2.yml")
## inference (stage 1 = embeddings, stage 2 = classification)
## "image" can be a path to an image file (as below) or a NumPy array
bioencoder.inference(config_path="bioencoder_configs/inference.yml", image="path/to/image.jpg")
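Stage 1 inference yields embeddings, and a common way to use them is to compare images by the similarity of their embedding vectors. The sketch below assumes the embeddings are available as plain sequences of floats; the exact return type of `bioencoder.inference` is not described here, so treat that as an assumption. The specimen names and vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, reference):
    """Return the (name, similarity) of the reference embedding closest to the query."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in reference.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical embeddings of two reference specimens and one query image.
reference = {"specimen_a": [0.9, 0.1, 0.0], "specimen_b": [0.0, 0.2, 0.9]}
name, similarity = nearest([0.8, 0.2, 0.1], reference)
print(name, round(similarity, 3))
```

For large reference sets a vectorized or indexed search would be preferable; the loop above only illustrates the comparison itself.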
4. Alternatively, you can directly use the command line interface:
## use the flag "--overwrite" to redo a step
bioencoder_configure --root-dir "~/bioencoder_wd" --run-name v1
bioencoder_split_dataset --image-dir "~/Downloads/damselflies-aligned-trai_val" --max-ratio 6 --random-seed 42
bioencoder_train --config-path "bioencoder_configs/train_stage1.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage1.yml"
bioencoder_interactive_plots --config-path "bioencoder_configs/plot_stage1.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage1.yml"
bioencoder_lr_finder --config-path "bioencoder_configs/lr_finder.yml"
bioencoder_train --config-path "bioencoder_configs/train_stage2.yml"
bioencoder_swa --config-path "bioencoder_configs/swa_stage2.yml"
bioencoder_model_explorer --config-path "bioencoder_configs/explore_stage2.yml"
bioencoder_inference --config-path "bioencoder_configs/inference.yml" --path "path/to/image.jpg"
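The dataset-splitting step takes `max_ratio`, `val_percent`, `min_per_class`, and `random_seed`. The sketch below shows one plausible reading of these parameters; it is not BioEncoder's implementation, and the semantics are assumptions: classes with fewer than `min_per_class` images are dropped, no class keeps more than `max_ratio` times as many images as the smallest retained class, and `val_percent` of each class goes to validation. The function name `sketch_split` and the file names are hypothetical. Consult the package help files for the actual behaviour.

```python
import random


def sketch_split(images_by_class, max_ratio=6, val_percent=0.1,
                 min_per_class=20, random_seed=42):
    """Illustrative train/val split; NOT BioEncoder's implementation."""
    rng = random.Random(random_seed)
    # Assumption: classes with too few images are dropped.
    kept = {
        name: sorted(paths)
        for name, paths in images_by_class.items()
        if len(paths) >= min_per_class
    }
    if not kept:
        return {}
    # Assumption: no class may exceed max_ratio times the smallest kept class.
    cap = max_ratio * min(len(paths) for paths in kept.values())
    split = {}
    for name, paths in kept.items():
        rng.shuffle(paths)
        paths = paths[:cap]
        n_val = max(1, round(len(paths) * val_percent))
        split[name] = {"val": paths[:n_val], "train": paths[n_val:]}
    return split


# Hypothetical file names for three classes of very different size.
dataset = {
    "common": [f"common_{i}.jpg" for i in range(500)],
    "rare": [f"rare_{i}.jpg" for i in range(30)],
    "tiny": [f"tiny_{i}.jpg" for i in range(5)],
}
for name, parts in sketch_split(dataset).items():
    print(name, len(parts["train"]), len(parts["val"]))
```

Under these assumptions the "tiny" class is excluded and the "common" class is downsampled, which is one way a split can keep class imbalance within bounds.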
Citation
Please cite BioEncoder as follows:
@UNPUBLISHED{Luerig2024-ov,
title = "{BioEncoder}: a metric learning toolkit for comparative
organismal biology",
author = "Luerig, Moritz D and Di Martino, Emanuela and Porto, Arthur",
journal = "bioRxiv",
pages = "2024.04.03.587987",
month = apr,
year = 2024,
language = "en",
doi = "10.1101/2024.04.03.587987"
}
Download files
File details
Details for the file bioencoder-1.0.0.tar.gz.
File metadata
- Download URL: bioencoder-1.0.0.tar.gz
- Upload date:
- Size: 42.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | dbe1206e468e985381fe225d756b1a9e0d5fb6931a4452dfc672f3464d5df088
MD5 | 95cc9c7f6487a83af6005660af483c4c
BLAKE2b-256 | 10536494726441521e2c8d8041fc37161ad32f3467f3f138d7a56d1c5c9ecccb
File details
Details for the file bioencoder-1.0.0-py3-none-any.whl.
File metadata
- Download URL: bioencoder-1.0.0-py3-none-any.whl
- Upload date:
- Size: 52.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 70d069c301b51688afa52ec81ea4bfed289bca4dc415bfceb7e633367e5cf338
MD5 | b57ffd6d3be9a4eb31f9b96a2daae9ff
BLAKE2b-256 | 293cadcafb55acb7546c22faf66a11d526a27e5546febbffd211630eae239ddc
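The SHA256 digests listed for each file can be used to check a manually downloaded archive. A minimal sketch using only the Python standard library follows; the helper names `sha256_of` and `matches` are made up for this example, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches` with the path to the downloaded file and the SHA256 value from the corresponding table returns `True` only if the file is intact. Installing with pip performs hash checks on its own only when run in hash-checking mode, so this is mainly useful for manual downloads.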