BioEncoder: A toolkit for imageomics
About
BioEncoder is a rich toolset for image classification and trait discovery in organismal biology. It relies on image classification models trained using metric learning to learn species trait data (i.e., features) from images. This implementation is based on SupCon and timm-vis. It includes the following features:
- Taxon-agnostic dataloaders (making it applicable to any biological dataset)
- Streamlit app with rich model visualizations (e.g., Grad-CAM)
- Custom augmentation techniques via albumentations
- Easy customization of hyperparameters, including augmentations, through YAML configs
- Interactive t-SNE and PCA plots using Bokeh
- Exponential Moving Average (EMA) for stable training, and Stochastic Weight Averaging (SWA) for better generalization and performance
- Automatic data parallelization for multi-GPU training and automatic mixed precision for larger batch sizes (support varies across graphics cards)
- Access to state-of-the-art metric losses, such as SupCon and Sub-center ArcFace
- LRFinder for the second stage of the training (FC).
- TensorBoard logs and checkpoints (soon, Weights-and-Biases integration)
- Support for timm models and pytorch-optimizer
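To make the two weight-averaging ideas in the feature list more concrete, here is a minimal pure-Python sketch. It is an illustration only and not BioEncoder's implementation: the helper names `ema_update` and `swa_average`, the `decay` parameter, and the toy weight lists are all assumptions. It also assumes that the `swa` step averages saved checkpoints with equal weight.

```python
def ema_update(ema_weights, new_weights, decay=0.9):
    """One EMA step: blend the running average with the latest weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, new_weights)]


def swa_average(checkpoints):
    """Equal-weight, element-wise average over a list of weight vectors."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]


# Toy "weights" recorded after three training epochs (hypothetical values).
checkpoints = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]

# EMA is updated step by step during training.
ema = checkpoints[0]
for weights in checkpoints[1:]:
    ema = ema_update(ema, weights, decay=0.5)

# The average over checkpoints is computed once, after training.
swa = swa_average(checkpoints)

print("EMA:", ema)
print("SWA:", swa)
```

The EMA favors recent weights, while the checkpoint average treats all epochs equally. This is one way to read why the list pairs the first with stable training and the second with generalization.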
Install
1. Create a clean virtual environment
mamba create -n bioencoder python=3.9
mamba activate bioencoder
2. Install PyTorch with CUDA. Go to https://pytorch.org/get-started/locally/ and choose the version that matches your system, e.g.:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
3. Install bioencoder from PyPI:
pip install bioencoder
Get started (CLI mode)
(for detailed information consult the help files)
1. Download the example image dataset and the YAML configuration files, then unzip them
2. Activate your environment
mamba activate bioencoder
3. Run bioencoder_configure to set the bioencoder root dir and the run name, for example:
bioencoder_configure --root-dir bioencoder --run-name damselflies-example
This creates a root folder inside your project directory, where all relevant bioencoder data, logs, etc. will be stored. It will look like this:
project-dir/
    bioencoder-root-dir/
        data
            <run-name>
                train
                    class_1/
                        image_1.jpg
                        image_2.jpg
                        ...
                    class_2/
                        image_1.jpg
                        image_2.jpg
                        ...
                    ...
                val
                    ...
        logs
            <run-name>
                <run-name>.log
        plots
            <run-name>.html
        runs
            <run-name>
                <run-name>_first
                    events.out.tfevents.1700919284.machine-name.15832.0
                <run-name>_second
                    events.out.tfevents.1700919284.machine-name.15832.1
        weights
            <run-name>
                first
                    epoch0
                    epoch1
                    ...
                    swa
                second
                    epoch0
                    epoch1
                    ...
                    swa
    ...
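Once the data folder has been populated by the split step, the layout above can be checked with a few lines of standard-library Python. The sketch below is not part of BioEncoder: the helper name `count_images_per_class` and the `IMAGE_SUFFIXES` set are assumptions made for illustration. It only relies on the `data/<run-name>/train/<class>/` structure shown in the tree.

```python
from pathlib import Path

# Assumption: file extensions to treat as images; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root_dir, run_name, split="train"):
    """Return {class_name: image_count} for <root_dir>/data/<run_name>/<split>."""
    split_dir = Path(root_dir) / "data" / run_name / split
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Example call, using the root dir and run name from the configure step:
# print(count_images_per_class("bioencoder", "damselflies-example"))
```

Comparing the counts for `train` and `val` is a quick way to spot empty or very small classes before training.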
5. Now run bioencoder_split_dataset to create the data folder containing training and validation images:
bioencoder_split_dataset --image-dir data_raw\damselflies_aligned_resized
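Conceptually, a per-class train/validation split can be sketched as below. This is a generic illustration and not how bioencoder_split_dataset is implemented: the function `split_filenames`, its `val_fraction` and `seed` parameters, and the example file names are all hypothetical.

```python
import random


def split_filenames(filenames, val_fraction=0.2, seed=42):
    """Shuffle filenames reproducibly and return (train, val) lists."""
    shuffled = sorted(filenames)            # sort first so the result is deterministic
    random.Random(seed).shuffle(shuffled)   # local RNG, leaves global state untouched
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical class folders with their image files.
images_by_class = {
    "class_1": [f"image_{i}.jpg" for i in range(1, 11)],
    "class_2": [f"image_{i}.jpg" for i in range(1, 6)],
}

# Splitting each class separately keeps every class in both subsets.
splits = {name: split_filenames(files) for name, files in images_by_class.items()}

for name, (train, val) in splits.items():
    print(name, "train:", len(train), "val:", len(val))
```

Splitting class by class, rather than over the pooled images, is a common choice because it avoids validation sets that miss rare classes.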
6. Use train_stage1.yml to train the first stage of the model:
bioencoder_train --config-path damselflies_config_files\train_stage1.yml
Continue as follows:
bioencoder_swa --config-path damselflies_config_files\swa_stage1.yml
bioencoder_train --config-path damselflies_config_files\train_stage2.yml
bioencoder_swa --config-path damselflies_config_files\swa_stage2.yml
Inspect the training runs with TensorBoard:
tensorboard --logdir bioencoder\runs\damselflies-example
7. Create interactive plots:
bioencoder_interactive_plots --config-path damselflies_config_files\plot_stage1.yml
8. Run the model explorer:
bioencoder_model_explorer --config-path damselflies_config_files\explore_stage1.yml
Interactive mode
import os
import bioencoder
## set your project dir
os.chdir(r"D:\temp\bioencoder-test")
## set bioencoder root dir and run name
bioencoder.configure(root_dir = r"bioencoder", run_name = "damselflies1")
## split dataset
bioencoder.split_dataset(image_dir=r"data_raw\damselflies_aligned_resized")
## training / swa
bioencoder.train(config_path=r"damselflies_config_files\train_stage1.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage1.yml")
bioencoder.train(config_path=r"damselflies_config_files\train_stage2.yml")
bioencoder.swa(config_path=r"damselflies_config_files\swa_stage2.yml")
## interactive plots
bioencoder.interactive_plots(config_path=r"damselflies_config_files\plot_stage1.yml")
## model explorer
bioencoder.model_explorer(config_path=r"damselflies_config_files\explore_stage1.yml")
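The paths in the example above use Windows-style backslashes. On Linux or macOS, one option is to build them with `pathlib`, as in the sketch below. The directory and file names are taken from the example; the `steps` list and `config_paths` dictionary are illustrative names and not part of the bioencoder API.

```python
from pathlib import Path

# Directory holding the example configs (name taken from the commands above).
config_dir = Path("damselflies_config_files")

# Config file stems in the order the workflow runs them.
steps = ["train_stage1", "swa_stage1", "train_stage2", "swa_stage2"]

# Path joins the parts with the separator of the current operating system.
config_paths = {step: str(config_dir / f"{step}.yml") for step in steps}

for step, path in config_paths.items():
    print(step, "->", path)

# The resulting strings can then be passed on, e.g.:
# bioencoder.train(config_path=config_paths["train_stage1"])
```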