BioEncoder: A toolkit for imageomics

About

BioEncoder is a rich toolset for image classification and trait discovery in organismal biology. It relies on image classification models trained using metric learning to learn species trait data (i.e., features) from images. This implementation is based on SupCon and timm-vis. It includes the following features:

  • Taxon-agnostic dataloaders (making it applicable to any biological dataset)
  • Streamlit app with rich model visualizations (e.g., Grad-CAM)
  • Custom augmentation techniques via albumentations
  • Easy customization of hyperparameters, including augmentations, through YAML configs
  • Interactive t-SNE and PCA plots using Bokeh
  • Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance.
  • Automatic data parallelization for multi-GPU training and automatic mixed precision for larger batch sizes (support varies across graphics cards)
  • Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace.
  • LRFinder for the second training stage (the fully connected classifier, FC).
  • TensorBoard logs and checkpoints (Weights & Biases integration is planned)
  • Support for timm models and pytorch-optimizer
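
To make the difference between the two averaging schemes in the list above concrete, here is a minimal sketch on toy numbers. It is not BioEncoder's implementation: the function names `ema_update` and `swa_average`, the `decay` value, and the two-parameter "model" are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def ema_update(avg: Sequence[float], new: Sequence[float], decay: float = 0.9) -> List[float]:
    """One EMA step: keep `decay` of the running average, blend in the rest."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, new)]


def swa_average(snapshots: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight mean over saved weight snapshots (the idea behind SWA)."""
    n = len(snapshots)
    return [sum(column) / n for column in zip(*snapshots)]


# Hypothetical weights of a two-parameter model after three epochs.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

# EMA is updated step by step, so recent snapshots weigh more.
ema = list(snapshots[0])
for weights in snapshots[1:]:
    ema = ema_update(ema, weights, decay=0.5)

# SWA averages the stored snapshots with equal weight.
swa = swa_average(snapshots)

print("EMA:", ema)
print("SWA:", swa)
```

EMA tracks the most recent weights more closely, while SWA treats every stored snapshot equally, which is why the two results differ even on this tiny example.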

Install

1. Create a clean virtual environment

mamba create -n bioencoder python=3.9
mamba activate bioencoder

2. Install PyTorch with CUDA support. Go to https://pytorch.org/get-started/locally/ and pick the command that matches your setup, for example:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

3. Install bioencoder from PyPI:

pip install bioencoder
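
After installing, it can be worth confirming that PyTorch actually sees a GPU before starting a long training run. The following is a small sketch using standard PyTorch calls; the helper name `describe_torch` is hypothetical and not part of bioencoder.

```python
import importlib.util


def describe_torch() -> str:
    """Report whether torch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        return f"torch {torch.__version__}: CUDA available on {count} device(s)"
    return f"torch {torch.__version__}: CUDA not available (CPU only)"


print(describe_torch())
```

If the report says CUDA is not available although a GPU is present, the installed wheel probably does not match the local CUDA setup, and the selector on the PyTorch page is the place to look.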

Get started (CLI mode)

(For detailed information, consult the help files.)

1. Download the example image dataset and the YAML configuration files, then unzip them

2. Activate your environment

mamba activate bioencoder

3. Run bioencoder_configure to set the bioencoder root dir and the run name - for example:

bioencoder_configure --root-dir bioencoder --run-name damselflies-example

This creates a root folder inside your project directory, where all bioencoder data, logs, plots, and weights are stored. Once the later steps have run, it looks like this:

project-dir/
    bioencoder-root-dir/
        data
            <run-name>
                train
                    class_1/
                        image_1.jpg
                        image_2.jpg
                        ...
                    class_2/
                        image_1.jpg
                        image_2.jpg
                        ...
                    ...
                val
                    ...
        logs
            <run-name>
                <run-name>.log
        plots
            <run-name>.html
        runs
            <run-name>
                <run-name>_first
                    events.out.tfevents.1700919284.machine-name.15832.0
                <run-name>_second
                    events.out.tfevents.1700919284.machine-name.15832.1
        weights
            <run-name>
                first
                    epoch0
                    epoch1
                    ...
                    swa
                second
                    epoch0
                    epoch1
                    ...
                    swa
    ...
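
Once the data folder has been populated, a quick count of images per class can catch empty or badly split classes early. The sketch below only assumes the `data/<run-name>/train` and `data/<run-name>/val` layout shown in the tree above; the helper names and the set of accepted file extensions are assumptions, not part of bioencoder.

```python
from pathlib import Path
from typing import Dict

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(split_dir: Path) -> Dict[str, int]:
    """Map each class folder in `split_dir` to its number of image files."""
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def summarize_run(root_dir: Path, run_name: str) -> Dict[str, Dict[str, int]]:
    """Collect per-class counts for the train and val splits that exist."""
    data_dir = Path(root_dir) / "data" / run_name
    return {
        split: count_images(data_dir / split)
        for split in ("train", "val")
        if (data_dir / split).is_dir()
    }


if __name__ == "__main__":
    # Root dir and run name as used in the example commands of this guide.
    print(summarize_run(Path("bioencoder"), "damselflies-example"))
```

An empty result means that neither split folder exists yet under the given root dir and run name.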

4. Run bioencoder_split_dataset to create the data folder containing the training and validation images:

bioencoder_split_dataset --image-dir data_raw\damselflies_aligned_resized

5. Use train_stage1.yml to train the first stage of the model:

bioencoder_train --config-path damselflies_config_files\train_stage1.yml

Then run SWA on the first stage, train the second stage, and run SWA again:

bioencoder_swa --config-path damselflies_config_files\swa_stage1.yml
bioencoder_train --config-path damselflies_config_files\train_stage2.yml
bioencoder_swa --config-path damselflies_config_files\swa_stage2.yml

Inspect the training runs with TensorBoard:

tensorboard --logdir bioencoder\runs\damselflies-example

6. Create interactive plots:

bioencoder_interactive_plots --config-path damselflies_config_files\plot_stage1.yml

7. Run the model explorer:

bioencoder_model_explorer --config-path damselflies_config_files\explore_stage1.yml
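
Because every command in this workflow takes a YAML file, a missing or misnamed config only shows up when that step is reached. A small preflight check can list the gaps up front. This is a sketch: the helper `missing_configs` is hypothetical, and the file names are simply the ones used in the commands of this guide.

```python
from pathlib import Path
from typing import List, Sequence

# Config files referenced by the commands in this guide.
STAGE_CONFIGS = [
    "train_stage1.yml",
    "swa_stage1.yml",
    "train_stage2.yml",
    "swa_stage2.yml",
    "plot_stage1.yml",
    "explore_stage1.yml",
]


def missing_configs(config_dir, names: Sequence[str] = STAGE_CONFIGS) -> List[str]:
    """Return the config file names that do not exist in `config_dir`."""
    config_dir = Path(config_dir)
    return [name for name in names if not (config_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_configs("damselflies_config_files")
    print("missing:", missing if missing else "none")
```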

Interactive mode

import os
import bioencoder

## set your project dir
os.chdir(r"D:\temp\bioencoder-test")

## set bioencoder root dir and run name
bioencoder.configure(root_dir = r"bioencoder", run_name = "damselflies1")

## split dataset 
bioencoder.split_dataset(image_dir=r"data_raw\damselflies_aligned_resized")

## training / swa
bioencoder.train(config_path=r"damselflies_config_files\train_stage1.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage1.yml")
bioencoder.train(config_path=r"damselflies_config_files\train_stage2.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage2.yml")

## interactive plots
bioencoder.interactive_plots(config_path=r"damselflies_config_files\plot_stage1.yml")

## model explorer
bioencoder.model_explorer(config_path=r"damselflies_config_files\explore_stage1.yml")
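
The paths above use Windows-style backslashes. On Linux or macOS, one option is to build them with pathlib so that the separator matches the platform. The sketch below assumes only what the calls above show, namely that `bioencoder.train` and `bioencoder.swa` accept a `config_path` keyword; the helper `run_training` and the `PIPELINE` list are hypothetical.

```python
from pathlib import Path
from typing import List

# Order of the training calls, taken from the example above.
PIPELINE = [
    ("train", "train_stage1.yml"),
    ("swa", "swa_stage1.yml"),
    ("train", "train_stage2.yml"),
    ("swa", "swa_stage2.yml"),
]


def run_training(api, config_dir) -> List[str]:
    """Call api.train / api.swa in order, with platform-appropriate paths.

    `api` is expected to be the imported bioencoder module (or any object
    exposing `train` and `swa` with a `config_path` keyword).
    """
    config_dir = Path(config_dir)
    executed = []
    for func_name, file_name in PIPELINE:
        config_path = str(config_dir / file_name)
        getattr(api, func_name)(config_path=config_path)
        executed.append(config_path)
    return executed
```

After `bioencoder.configure(...)` and `bioencoder.split_dataset(...)`, it would be used as `run_training(bioencoder, "damselflies_config_files")`. Passing the module in as an argument keeps the helper easy to dry-run with a stand-in object.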
