BioEncoder: A toolkit for imageomics

About

BioEncoder is a rich toolset for image classification and trait discovery in organismal biology. It relies on image classification models trained using metric learning to learn species trait data (i.e., features) from images. This implementation is based on SupCon and timm-vis. It includes the following features:

  • Taxon-agnostic dataloaders (making it applicable to any biological dataset)
  • Streamlit app with rich model visualizations (e.g., Grad-CAM)
  • Custom augmentation techniques via albumentations
  • Easy customization of hyperparameters, including augmentations, through YAML configs
  • Interactive t-SNE and PCA plots using Bokeh
  • Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance
  • Automatic data parallelization for multi-GPU training and automatic mixed precision for larger batch sizes (support varies across graphics cards)
  • Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace
  • LRFinder for the second stage of the training (FC).
  • TensorBoard logs and checkpoints (soon, Weights-and-Biases integration)
  • Support for timm models and pytorch-optimizer

Install

1. Create a clean virtual environment

mamba create -n bioencoder python=3.9
mamba activate bioencoder

2. Install PyTorch with CUDA support. Go to https://pytorch.org/get-started/locally/ and pick the build that matches your system, e.g.:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

3. Install bioencoder from PyPI:

pip install bioencoder
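After installing, it can help to confirm which versions ended up in the environment and whether PyTorch sees a GPU. The following is a small sketch using only the Python standard library plus an optional torch import; the helper names are made up for this example and are not part of bioencoder.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("torch", "torchvision", "bioencoder")):
    """Map each package name to its installed version (None if absent)."""
    return {name: installed_version(name) for name in packages}


def cuda_available():
    """Return True/False if torch is importable, or None if torch is missing."""
    if find_spec("torch") is None:
        return None
    import torch  # imported lazily so the report works without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
    print("CUDA available:", cuda_available())
```

If `cuda_available()` returns False on a machine with an NVIDIA GPU, the installed PyTorch build may be CPU-only, in which case revisiting step 2 is a reasonable first check.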

Get started (CLI mode)

(For detailed information, consult the help files.)

1. Download the example image dataset and the YAML configuration files, then unzip them

2. Activate your environment

mamba activate bioencoder

3. Run bioencoder_configure to set the bioencoder root dir and the run name, for example:

bioencoder_configure --root-dir bioencoder --run-name damselflies-example

This will create a root folder inside your project, where all relevant bioencoder data, logs, etc. will be stored. It will look like this:

project-dir/
    bioencoder-root-dir/
        data
            <run-name>
                train
                    class_1/
                        image_1.jpg
                        image_2.jpg
                        ...
                    class_2/
                        image_1.jpg
                        image_2.jpg
                        ...
                    ...
                val
                    ...
        logs
            <run-name>
                <run-name>.log
        plots
            <run-name>.html
        runs
            <run-name>
                <run-name>_first
                    events.out.tfevents.1700919284.machine-name.15832.0
                <run-name>_second
                    events.out.tfevents.1700919284.machine-name.15832.1
        weights
            <run-name>
                first
                    epoch0
                    epoch1
                    ...
                    swa
                second
                    epoch0
                    epoch1
                    ...
                    swa
    ...
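Given the layout above, a quick way to sanity-check a dataset is to count the images per class in each split. The sketch below assumes the data/<run-name>/train and data/<run-name>/val structure shown in the tree; the function names and the set of accepted file suffixes are assumptions for illustration, not part of bioencoder.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your data


def count_images(split_dir):
    """Map each class folder in a split (e.g. train/) to its image count."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def summarize_run(root_dir, run_name):
    """Return per-class image counts for every split folder that exists."""
    data_dir = Path(root_dir) / "data" / run_name
    return {
        split: count_images(data_dir / split)
        for split in ("train", "val")
        if (data_dir / split).is_dir()
    }
```

For the example in this guide, `summarize_run("bioencoder", "damselflies-example")` would return a dictionary with one entry per split, which makes empty or very small classes easy to spot before training.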

4. Run bioencoder_split_dataset to create the data folder containing the training and validation images:

bioencoder_split_dataset --image-dir data_raw\damselflies_aligned_resized

5. Use train_stage1.yml to train the first stage of the model:

bioencoder_train --config-path damselflies_config_files\train_stage1.yml

Continue as follows:

bioencoder_swa --config-path damselflies_config_files\swa_stage1.yml
bioencoder_train --config-path damselflies_config_files\train_stage2.yml
bioencoder_swa --config-path damselflies_config_files\swa_stage2.yml

Inspect the training runs with:

tensorboard --logdir bioencoder\runs\damselflies-example

6. Create interactive plots:

bioencoder_interactive_plots --config-path damselflies_config_files\plot_stage1.yml

7. Run the model explorer:

bioencoder_model_explorer --config-path damselflies_config_files\explore_stage1.yml

Interactive mode

import os
import bioencoder

## set your project dir
os.chdir(r"D:\temp\bioencoder-test")

## set bioencoder root dir and run name
bioencoder.configure(root_dir=r"bioencoder", run_name="damselflies1")

## split dataset
bioencoder.split_dataset(image_dir=r"data_raw\damselflies_aligned_resized")

## training / swa
bioencoder.train(config_path=r"damselflies_config_files\train_stage1.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage1.yml")
bioencoder.train(config_path=r"damselflies_config_files\train_stage2.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage2.yml")

## interactive plots
bioencoder.interactive_plots(config_path=r"damselflies_config_files\plot_stage1.yml")

## model explorer
bioencoder.model_explorer(config_path=r"damselflies_config_files\explore_stage1.yml")
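The paths in the examples above use Windows-style backslashes. As a sketch of one way to keep the training sequence portable across operating systems, the config paths can be built with pathlib and passed to the same bioencoder.train and bioencoder.swa calls shown above. The `PIPELINE` list and the helper names are assumptions made for this example; it also assumes that bioencoder.configure and bioencoder.split_dataset have already been run.

```python
from pathlib import Path

# Training steps in the order used in this guide:
# (bioencoder function name, config file name).
PIPELINE = [
    ("train", "train_stage1.yml"),
    ("swa", "swa_stage1.yml"),
    ("train", "train_stage2.yml"),
    ("swa", "swa_stage2.yml"),
]


def pipeline_steps(config_dir):
    """Return (function name, config path) pairs with OS-appropriate separators."""
    config_dir = Path(config_dir)
    return [(func, str(config_dir / name)) for func, name in PIPELINE]


def run_pipeline(config_dir):
    """Run both training stages, each followed by its SWA step."""
    import bioencoder  # imported lazily so pipeline_steps works without it

    for func, config_path in pipeline_steps(config_dir):
        getattr(bioencoder, func)(config_path=config_path)
```

Calling `run_pipeline("damselflies_config_files")` would then execute the four steps in order; the plotting and model explorer calls can be added the same way if desired.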

Download files

Source distribution: bioencoder-0.1.0.tar.gz (33.7 kB)
Built distribution: bioencoder-0.1.0-py3-none-any.whl (42.4 kB, Python 3)
