bioimageloader
Load bioimages for machine learning applications
bioimageloader is a Python library that makes it easy to load bioimage datasets for machine learning and deep learning. Bioimages come in numerous, heterogeneous forms. bioimageloader wraps them in unified interfaces so that you can easily concatenate them, apply image augmentation, and batch-load them.
bioimageloader provides
- collections of interfaces for popular and public bioimage datasets
- image augmentation using albumentations, a popular and powerful image augmentation library (for 2D images)
- compatibility with PyTorch
- compatibility with other libraries such as scikit-learn and TensorFlow
Table of Contents
- Quick overview
- Load a single dataset
- Load multiple datasets
- Batch-load datasets
- bioimageloader is not/does not
- Why bioimageloader
- Installation
- Documentation
- Available collections
- QnA
- Contributing
- Contact
Quick overview
Find full guides at bioimageloader-docs:User Guides
Load a single dataset
Load and iterate 2018 Data Science Bowl
    from bioimageloader.collections import DSB2018
    import albumentations as A

    transforms = A.Compose([
        A.RandomCrop(width=256, height=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ])

    dsb2018 = DSB2018('path/to/root_dir', transforms=transforms)
    for data in dsb2018:
        image = data['image']
        mask = data['mask']
Load multiple datasets
Load DSB2018 and Triple Negative Breast Cancer (TNBC)
    from bioimageloader import Config, ConcatDataset
    from bioimageloader.collections import DSB2018, TNBC
    import albumentations as A

    transforms = A.Compose([
        A.RandomCrop(width=256, height=256),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ])

    cfg = {
        'DSB2018': {'root_dir': 'path/to/root_dir'},
        'TNBC': {'root_dir': 'path/to/root_dir'},
    }
    config = Config.from_dict(cfg)
    datasets = config.load_datasets(transforms=transforms)
    cat = ConcatDataset(datasets)
    for meow in cat:
        image = meow['image']
        mask = meow['mask']
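Conceptually, concatenating datasets means mapping one global index onto a (dataset, local index) pair. The following is a minimal sketch of that idea, not bioimageloader's actual `ConcatDataset` implementation; the class name `SimpleConcat` and the toy list-based datasets are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class SimpleConcat:
    """Hypothetical stand-in that chains several indexable datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Cumulative sizes, e.g. lengths [3, 2] -> [3, 5]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose cumulative size exceeds the index
        dataset_idx = bisect_right(self.cumulative_sizes, index)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][index - offset]


# Toy datasets: each item mimics the {'image': ..., 'mask': ...} dictionaries
dsb_like = [{'image': f'dsb_img{i}', 'mask': f'dsb_mask{i}'} for i in range(3)]
tnbc_like = [{'image': f'tnbc_img{i}', 'mask': f'tnbc_mask{i}'} for i in range(2)]

cat_sketch = SimpleConcat([dsb_like, tnbc_like])
items = [cat_sketch[i]['image'] for i in range(len(cat_sketch))]
```

Because every wrapped dataset returns items with the same keys, the consumer never needs to know which dataset a sample came from.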
Batch-load datasets
    from bioimageloader import BatchDataloader

    call_cat = BatchDataloader(cat,
                               batch_size=16,
                               drop_last=True,
                               num_workers=8)
    for meow in call_cat:
        batch_image = meow['image']
        batch_mask = meow['mask']
or directly use PyTorch's DataLoader

    from torch.utils.data import DataLoader

    call_cat = DataLoader(cat,
                          batch_size=16,
                          drop_last=True,
                          num_workers=8)
    for meow in call_cat:
        batch_image = meow['image']
        batch_mask = meow['mask']
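To make `batch_size` and `drop_last` concrete, here is a dependency-free sketch of what batching does to an iterable of samples. It only illustrates the semantics; `iter_batches` is a hypothetical helper, and real loaders such as PyTorch's `DataLoader` additionally collate arrays into tensors and can use worker processes.

```python
def iter_batches(samples, batch_size, drop_last=False):
    """Yield dicts of lists, grouping `batch_size` samples at a time."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield {key: [s[key] for s in batch] for key in batch[0]}
            batch = []
    # A final, smaller batch is kept unless drop_last is set
    if batch and not drop_last:
        yield {key: [s[key] for s in batch] for key in batch[0]}


samples = [{'image': f'img{i}', 'mask': f'mask{i}'} for i in range(5)]
kept = list(iter_batches(samples, batch_size=2))
dropped = list(iter_batches(samples, batch_size=2, drop_last=True))
```

With five samples and a batch size of two, `kept` ends with a one-sample batch, while `dropped` discards it so that every batch has the same size.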
bioimageloader is not/does not
- not a full pipeline for ML/DL
- not a hub to bioimage datasets (if it ever becomes one, it would be awesome though)
- does not host data (only interfaces)
- does not provide one-click links for downloading data
- does not overwrite the source data
Why bioimageloader
bioimageloader is a by-product of my thesis. The library is a collection of bioimage datasets for machine learning and deep learning. I needed many diverse bioimages to train self-supervised neural networks. While I managed to find many great datasets, they all came with different folder structures and formats. In addition, I ran into many issues when loading and processing them, some of them technical and others rooted in the nature of bioimages.
As examples of technical issues, some datasets were missing one or two image-annotation pairs, had broken files, used very specific file formats that cannot be read easily in Python, or provided mask annotations in .xml format rather than as images. Some filenames contained typos, which sometimes made iterating over them fail.
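One way to catch the missing-pair problem early is to compare file stems before iterating. This is a minimal sketch under assumed conventions: images and annotations live in two directories and share a file stem. The function name `find_unpaired` and the directory layout are hypothetical, not part of bioimageloader.

```python
from pathlib import Path
import tempfile


def find_unpaired(image_dir, mask_dir):
    """Return (images without a mask, masks without an image), matched by stem."""
    image_stems = {p.stem for p in Path(image_dir).iterdir() if p.is_file()}
    mask_stems = {p.stem for p in Path(mask_dir).iterdir() if p.is_file()}
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    images = Path(root, 'images')
    masks = Path(root, 'masks')
    images.mkdir()
    masks.mkdir()
    for name in ('cell_01.tif', 'cell_02.tif', 'cell_03.tif'):
        (images / name).touch()
    for name in ('cell_01.xml', 'cell_03.xml'):  # cell_02 has no annotation
        (masks / name).touch()
    missing_masks, missing_images = find_unpaired(images, masks)
```

Matching by stem also surfaces filename typos, since a misspelled file shows up as unpaired on both sides.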
As an example of an issue intrinsic to bioimages, selecting a particular channel was an important feature that I needed, and it was not easy with bioimage datasets. When a dataset provided a separate file for each channel, selecting one was easy. But in many cases all channels were stored together in one image file. Even worse, for 2-channel images (which are quite common) stored in RGB(A) formats such as JPEG or PNG rather than TIFF, I had to figure out manually which channel corresponded to what and which channel was the empty one.
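For the two-channels-in-RGB case, the empty channel can often be found by checking which channel is zero everywhere. A minimal sketch, assuming an image given as nested lists of shape height x width x channels and assuming the unused channel is exactly zero (lossy formats such as JPEG may leave small non-zero values); `find_empty_channels` and `select_channels` are hypothetical helpers, not bioimageloader functions.

```python
def find_empty_channels(image):
    """Return indices of channels whose value is 0 at every pixel."""
    num_channels = len(image[0][0])
    empty = set(range(num_channels))
    for row in image:
        for pixel in row:
            # A channel stops being a candidate once any pixel is non-zero
            empty -= {c for c in empty if pixel[c] != 0}
    return sorted(empty)


def select_channels(image, channels):
    """Keep only the requested channel indices of every pixel."""
    return [[[pixel[c] for c in channels] for pixel in row] for row in image]


# 2x2 RGB image in which only red and blue carry signal
rgb = [
    [[10, 0, 3], [0, 0, 7]],
    [[25, 0, 0], [4, 0, 9]],
]
empty = find_empty_channels(rgb)
used = [c for c in range(3) if c not in empty]
two_channel = select_channels(rgb, used)
```

With a NumPy array of the same shape, the equivalent check would be `~image.any(axis=(0, 1))`. Knowing which remaining channel refers to which stain still requires the dataset's documentation.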
There were, of course, other issues not mentioned above. Dealing with all these edge cases one by one was rather painful. But I did it anyway, and I thought it would be valuable to package the result and share it with the community so that others do not have to suffer, even though the number of implemented datasets is small for the moment.
Installation
Install the latest version from PyPI. bioimageloader requires Python 3.8 or higher. Find more options at bioimageloader-docs:Installation
pip install bioimageloader
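After installing, a quick sanity check can confirm that the interpreter meets the stated Python 3.8 requirement and that the package is visible to it. A small sketch using only the standard library; the helper name `check_install` is hypothetical.

```python
import sys
from importlib import metadata


def check_install(package='bioimageloader', min_python=(3, 8)):
    """Return (python_ok, installed version or None) for the given package."""
    python_ok = sys.version_info >= min_python
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


python_ok, version = check_install()
print(f'Python OK: {python_ok}, bioimageloader version: {version}')
```

A version of `None` means the package is not installed in the environment that is running the script.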
Documentation
Full documentation is available at bioimageloader-docs
Available collections
Go to bioimageloader-docs:Catalogue
QnA
Why no direct download link to each dataset?
bioimageloader provides only code (interfaces) to load data, not the data itself. We believe it is important for you to visit the original sources, read the papers, and understand the terms and licenses so that you can appreciate the authors' work, because bioimages are themselves science and the result of time, effort, and resources. You can still find links to the project pages or papers at bioimageloader-docs:Catalogue, and you need to follow their instructions to get the data. Once you have downloaded and unzipped a dataset (and if it is supported by bioimageloader), simply pass its root directory as the first argument to the corresponding class in bioimageloader.collections.
Dataset that I want is not in the bioimageloader-docs:Catalogue
First of all, I named each dataset class rather arbitrarily. Try searching for the dataset you want by its authors' names or by other keywords (if it has any), and you may find it under an unexpected name. If that is the case, I apologize for the bad naming.
If you still cannot find it, you have two options: either implement it yourself (see the question below, and please consider contributing!), or file an issue so that the community can help.
Don't know how to write my own dataloader.
Writing a dataloader requires some Python skill; there is no easy way around it. Please read the templates carefully and look at how the existing collections are implemented. File an issue if you get stuck, and I will be glad to help.
How to run a ML/DL model?
bioimageloader only helps load images and annotations; it does not run ML/DL models. Still, you may find some useful examples at bioimageloader-docs:User Guides. Also check out ZeroCostDL4Mic.
I want more granular control over datasets individually
Every bioimage dataset is unique, so it is natural to want finer control; that was true for my own work as well. The good news is that bioimageloader provides a template that you can extend by subclassing to suit your needs. The bad news is that you need to know how to write a subclass in Python (not so bad, I hope, since you probably know some Python if you are developing ML/DL in Python anyway). The guide Modifying existing collections covers this.
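The general subclassing pattern looks like the sketch below. It is self-contained and deliberately does not use bioimageloader's real base classes; `TemplateDataset`, `AllChannels`, `SingleChannel`, and `get_image` are hypothetical names chosen for illustration, so consult the Modifying existing collections guide for the actual template.

```python
class TemplateDataset:
    """Hypothetical template: subclasses decide how a single item is read."""

    def __init__(self, file_list):
        self.file_list = list(file_list)

    def __len__(self):
        return len(self.file_list)

    def get_image(self, item):
        raise NotImplementedError

    def __getitem__(self, index):
        # IndexError from the list lookup lets plain for-loops terminate
        return {'image': self.get_image(self.file_list[index])}


class AllChannels(TemplateDataset):
    """Stand-in for an existing collection: returns every channel."""

    def get_image(self, item):
        return item  # here an "image" is just a list of channel values


class SingleChannel(AllChannels):
    """Customized subclass: keep only one channel of each image."""

    def __init__(self, file_list, channel=0):
        super().__init__(file_list)
        self.channel = channel

    def get_image(self, item):
        return [super().get_image(item)[self.channel]]


fake_images = [['nuclei_a', 'membrane_a'], ['nuclei_b', 'membrane_b']]
dataset = SingleChannel(fake_images, channel=1)
selected = [data['image'] for data in dataset]
```

Overriding one narrow method keeps the rest of the interface (length, indexing, iteration) unchanged, so the customized dataset can still be concatenated and batch-loaded like any other.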
Contributing
Find the guide at bioimageloader-docs:Contributing
Also check out the TODO list.
Contact
I am open to any feedback, suggestions, and discussion. Reach out to me via GitHub or email.
Seongbin Lim
- Homepage: https://sbinnee.github.io/
- Email: seongbin.lim at polytechnique.edu, sungbin246 at gmail.com