Converts general images of cells into formats and labels for deep learning pipelines
Cell Data Loader
Cell Data Loader is a simple AI support tool in Python that takes in images of cells (or other image types) and, with minimal effort, outputs them in formats readable by PyTorch (tensors) or TensorFlow (NumPy arrays). Users can output their cell images as whole images or as sliced images or, with the support of CellPose, segment the images by cell and output each cell individually.
It can also be used for general computer vision research, which is why CellPose is not a strict dependency.
Quick Start
To install Cell Data Loader, type the following into a standard UNIX terminal:
pip install cell-data-loader
To try a quick Jupyter Notebook example, navigate to example/CellDataLoaderPlayground.ipynb, download it, and open it in Jupyter. On a Mac, it can be run with the following command:
python3 -m jupyterlab ~/Downloads/CellDataLoaderPlayground.ipynb
Python
The simplest way to use Cell Data Loader is to instantiate a dataloader on a folder of images and iterate over it:
from cell_data_loader import CellDataloader
imfolder = '/path/to/my/images'
dataloader = CellDataloader(imfolder)
for image in dataloader:
    ...
And voilà!
Lists of files are also supported:
imfiles = ['/path/to/image1.png','/path/to/image2.png','/path/to/image3.png']
dataloader = CellDataloader(imfiles)
for image in dataloader:
    ...
Labels
Cell Data Loader has a few ways to support image labels. The simplest is to keep whole images in different folders, with each folder representing a label, and to pass each folder as a separate argument:
imfolder1 = '/path/to/my/images'
imfolder2 = '/path/to/your/images'
dataloader = CellDataloader(imfolder1,imfolder2)
for label,image in dataloader:
    ...
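The folder-per-label convention can be pictured as pairing every file with the index of the folder it came from. The sketch below shows that idea with the standard library only. `files_with_folder_labels` and its `extensions` parameter are hypothetical names, not part of Cell Data Loader, and the library may encode labels differently (for example, as one-hot vectors).

```python
import tempfile
from pathlib import Path

def files_with_folder_labels(*folders, extensions=(".png", ".tif", ".jpg")):
    """Pair every image file name with the index of its folder.

    Hypothetical helper for illustration; folder order defines the labels.
    """
    pairs = []
    for label, folder in enumerate(folders):
        for path in sorted(Path(folder).iterdir()):
            if path.suffix.lower() in extensions:
                pairs.append((label, path.name))
    return pairs

# Demonstration on two throwaway folders holding empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for name, files in (("mine", ["a.png", "b.png"]), ("yours", ["c.png"])):
        folder = Path(root, name)
        folder.mkdir()
        for file_name in files:
            (folder / file_name).touch()
    pairs = files_with_folder_labels(Path(root, "mine"), Path(root, "yours"))

print(pairs)
```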
Alternatively, if you have one folder or file list with images that have different naming conventions, a regex match is supported:
imfiles = ['/path/to/CANCER_image1.png',
           '/path/to/CANCER_image2.png',
           '/path/to/CANCER_image3.png',
           '/path/to/HEALTHY_image1.png',
           '/path/to/HEALTHY_image2.png',
           '/path/to/HEALTHY_image3.png']
dataloader = CellDataloader(imfiles,label_regex = ["CANCER","HEALTHY"])
for label,image in dataloader:
    ...
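One plausible reading of label_regex is that each file receives the index of the first pattern found in its path. The sketch below illustrates that reading with the standard library. `label_from_regex` is a hypothetical helper rather than a Cell Data Loader function, and the library's actual matching rules may differ.

```python
import re

def label_from_regex(path, label_regex):
    """Return the index of the first pattern in label_regex found in path.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    for index, pattern in enumerate(label_regex):
        if re.search(pattern, path):
            return index
    return None

imfiles = ['/path/to/CANCER_image1.png', '/path/to/HEALTHY_image1.png']
labels = [label_from_regex(f, ["CANCER", "HEALTHY"]) for f in imfiles]
print(labels)
```

Trying the patterns this way before building a dataloader is a cheap check that every file name matches exactly one of them.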
Boxes
When you need to cut individual cells out of an image and already have a file of their coordinates, CellDataloader accepts an argument, file_to_label_regex, that translates each image file name into the path of the CSV marking the coordinates and labels of the cells in that image. The format of the CSV is as follows:
| X | Y | W | H | Label |
|---|---|---|---|---|
| 14 | 13 | 5 | 6 | 0 |
| 20 | 25 | 15 | 5 | 1 |
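To make the box format concrete, the sketch below parses rows like those in the table into crop slices. It assumes that X and Y give the top-left corner of each box in pixels and that the CSV has a header row with these column names. Neither assumption is confirmed here, and `read_boxes` is a hypothetical helper.

```python
import csv
import io

def read_boxes(csv_text):
    """Parse X, Y, W, H, Label rows into (row_slice, col_slice, label) tuples.

    Assumes X/Y are the top-left corner in pixels (an assumption).
    """
    boxes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x, y, w, h = (int(row[key]) for key in ("X", "Y", "W", "H"))
        boxes.append((slice(y, y + h), slice(x, x + w), int(row["Label"])))
    return boxes

example_csv = "X,Y,W,H,Label\n14,13,5,6,0\n20,25,15,5,1\n"
boxes = read_boxes(example_csv)
for rows, cols, label in boxes:
    print(label, rows, cols)
```

With an image held as a NumPy array of shape (height, width, channels), `image[rows, cols]` would then cut out one cell.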
The file_to_label_regex is represented as a tuple of two regexes: a matching expression and a replacement expression. So if an image is named '/path/to/AF647-1.tif' and a label file is named '/path/to/For_DL_AF647-1.csv', the following expression would be appropriate:
import os
dataloader = CellDataloader('/path/to/AF647-1.tif',file_to_label_regex=(r'%s([^%s]+)\.tif' % (os.sep,os.sep),r'%sFor_DL_\1.csv' % os.sep))
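To check which label path a given pair of expressions would produce, the pair can be tried with `re.sub` directly. This assumes Cell Data Loader applies the pair in the manner of `re.sub`, which the description suggests but does not confirm. The separator is hard-coded as `/` for readability, and `label_path_for` is a hypothetical helper.

```python
import re

# Matching and replacement expressions, written with a literal "/" separator.
file_to_label_regex = (r'/([^/]+)\.tif$', r'/For_DL_\1.csv')

def label_path_for(image_path, regex_pair):
    """Map an image path to its label CSV path (hypothetical helper)."""
    pattern, replacement = regex_pair
    return re.sub(pattern, replacement, image_path)

label_path = label_path_for('/path/to/AF647-1.tif', file_to_label_regex)
print(label_path)
```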
Arguments
Additional arguments taken by Cell Data Loader include the following:
imfolder = '/path/to/folder'
dataloader = CellDataloader(imfolder,
    dim = (64,64),
    batch_size = 64,
    dtype = "torch", # Can also be "numpy"
    label_regex = None,
    verbose = True,
    segment_image = "whole", # "whole" outputs the whole image, resized
        # to dim; "sliced" cuts the image in a checkerboard pattern into
        # dim-shaped outputs, so it's suitable for large images; "cell"
        # segments cells from the image using CellPose, and raises
        # an error if CellPose is not installed properly. CellPose is
        # not included by default in the dependencies and needs to be
        # installed separately by the user.
    n_channels = 3, # Detected in first image by default; re-samples all
        # images to force this number of channels
    augment_image = True, # Augments the output image in the standard
        # ways -- rotation, color jittering, etc.
    label_balance = True, # Outputs proportional amounts of each label
        # in the dataset
    gpu_ids = None, # GPUs that the outputs are read to, if present.
    channels_first = True # Places channels either first, before the
        # batch dimension, or last
)
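The "sliced" mode is described as cutting a large image into dim-shaped pieces in a checkerboard pattern. A minimal sketch of such a tiling follows, computing only the tile coordinates. It assumes non-overlapping tiles and drops partial tiles at the right and bottom edges. Neither behavior is stated above, and `tile_boxes` is a hypothetical name.

```python
def tile_boxes(height, width, dim):
    """Yield (top, left, bottom, right) for each full dim-shaped tile.

    Illustrative only: tiles do not overlap, and edge remainders that are
    smaller than dim are skipped (both are assumptions).
    """
    tile_h, tile_w = dim
    for top in range(0, height - tile_h + 1, tile_h):
        for left in range(0, width - tile_w + 1, tile_w):
            yield (top, left, top + tile_h, left + tile_w)

# A 200 x 130 image cut into 64 x 64 tiles gives 3 rows and 2 columns.
boxes = list(tile_boxes(200, 130, (64, 64)))
print(len(boxes))
```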