Converts general images of cells into formats and labels for deep learning pipelines

Project description

Cell Data Loader

Cell Data Loader is a simple AI support tool in Python that takes in images of cells (or other image types) and outputs them, with minimal effort, in formats that can be read by PyTorch (Tensor) or TensorFlow (NumPy array). With Cell Data Loader, users can output their cell images as whole images or sliced images or, with the support of CellPose, segment their images by cell and output those individually.

It can also be used for normal computer vision research, which is why CellPose is not a strict dependency.

To install Cell Data Loader, run the following in a standard UNIX terminal:

pip install cell-data-loader

The simplest way to use Cell Data Loader is to instantiate a dataloader as follows:

from cell_data_loader import CellDataloader

imfolder = '/path/to/my/images'

dataloader = CellDataloader(imfolder)

for image in dataloader:
	...

And voilà!

Lists of files are also supported:

imfiles = ['/path/to/image1.png','/path/to/image2.png','/path/to/image3.png']

dataloader = CellDataloader(imfiles)

for image in dataloader:
	...

Labels

Cell Data Loader has a few ways to support image labels. The simplest is whole images that are located in different folders, with each folder representing a label. This can be supported via the following:

imfolder1 = '/path/to/my/images'
imfolder2 = '/path/to/your/images'

dataloader = CellDataloader(imfolder1,imfolder2)

for label,image in dataloader:
	...
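
As a rough illustration of the folder-per-label convention, the sketch below pairs each image file with the position of the folder it came from. It is not the library's implementation: the helper name `files_with_folder_labels`, the extension list, and the choice of the folder's argument position as the label are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical helper, not part of cell_data_loader: illustrates the
# "one folder per label" convention described above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed list


def files_with_folder_labels(*folders):
    """Return (label, path) pairs; label is the position of the folder argument."""
    pairs = []
    for label, folder in enumerate(folders):
        for path in sorted(Path(folder).iterdir()):
            # Skip anything that does not look like an image file.
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((label, str(path)))
    return pairs
```

Calling `files_with_folder_labels(imfolder1, imfolder2)` would tag every image under `imfolder1` with 0 and every image under `imfolder2` with 1.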

Alternatively, if you have one folder or file list with images that have different naming conventions, a regex match is supported:

imfiles = ['/path/to/CANCER_image1.png',
			'/path/to/CANCER_image2.png',
			'/path/to/CANCER_image3.png',
			'/path/to/HEALTHY_image1.png',
			'/path/to/HEALTHY_image2.png',
			'/path/to/HEALTHY_image3.png']

dataloader = CellDataloader(imfiles,label_regex = ["CANCER","HEALTHY"])
for label,image in dataloader:
	...
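
To make the regex matching concrete, here is a minimal sketch of one plausible way a list of patterns could be turned into integer labels. The function `label_from_regex` is hypothetical, and using the index of the first matching pattern as the label is an assumption, not confirmed library behavior.

```python
import re


# Hypothetical helper, not part of cell_data_loader.
def label_from_regex(path, label_regex):
    """Return the index of the first pattern that matches path, or None."""
    for index, pattern in enumerate(label_regex):
        if re.search(pattern, path):
            return index
    return None


imfiles = ['/path/to/CANCER_image1.png', '/path/to/HEALTHY_image1.png']
labels = [label_from_regex(f, ["CANCER", "HEALTHY"]) for f in imfiles]
```

A file that matches none of the patterns gets `None` in this sketch; how the library treats such files is not stated here.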

Boxes

In cases where you need to cut out individual cells from an image and have a coordinates file, CellDataloader accepts an argument, cell_box_filelist, which is a list of files, one per input, that mark out the coordinates and labels of the cells. The format of the csv is as follows:

X   Y   W   H   Label
14  13  5   6   0
20  25  15  5   1

dataloader = CellDataloader('/path/to/file.svs',cell_box_filelist=['/path/to/boxfile.csv'])
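
The box format can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's parser: it assumes the first row is a header, that fields are separated by commas or whitespace, and that X and Y give the top-left corner of a box W pixels wide and H pixels tall. The names `parse_boxes` and `crop` are hypothetical.

```python
import re


# Hypothetical parser for the box format shown above (X Y W H Label).
def parse_boxes(text):
    """Return a list of (x, y, w, h, label) integer tuples."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    boxes = []
    for line in rows[1:]:  # skip the header row
        x, y, w, h, label = (int(v) for v in re.split(r"[,\s]+", line))
        boxes.append((x, y, w, h, label))
    return boxes


def crop(image, box):
    """Crop a box from an image stored as rows of pixels (image[y][x])."""
    x, y, w, h, _ = box
    return [row[x:x + w] for row in image[y:y + h]]


boxes = parse_boxes("X Y W H Label\n14 13 5 6 0\n20 25 15 5 1\n")
```

Each parsed box can then be passed to `crop` along with the image to obtain one labeled patch per cell.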

Arguments

Additional arguments taken by Cell Data Loader include:

imfolder = '/path/to/folder'

dataloader = CellDataloader(imfolder,
			dim = (64,64),
			batch_size = 64,
			dtype = "torch", # Can also be "numpy"
			label_regex = None,
			verbose = True,
			segment_image = "whole", # "whole" outputs the whole image, resized
				# to dim; "sliced" cuts the image checkerboard pattern into
				# dim-shaped outputs, so it's suitable for large images; "cell"
				# segments cells from the image using CellPose, though it throws
				# an error if CellPose is not installed properly. CellPose is
				# not included by default in the dependencies and needs to be
				# installed separately by the user.
			n_channels = 3, # Detected in first image by default; re-samples all
				# images to force this number of channels
			augment_image = True, # Augments the output image in the standard
				# ways -- rotation, color jitter, etc.
			label_balance = True, # Outputs proportional amounts of each label
				# in the dataset
			gpu_ids = None, # GPUs that the outputs are read to, if present.
			channels_first = True # Places channels either first, before the
				# batch dimension, or last
			)
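
To show what the "sliced" option means geometrically, the sketch below lists the top-left corners of non-overlapping `dim`-shaped tiles over an image. It is a hypothetical illustration: the function name `tile_origins` is invented here, and dropping partial tiles at the right and bottom edges is an assumption. The library may pad or resize edge tiles instead.

```python
# Hypothetical sketch of "sliced" mode, not taken from cell_data_loader.
def tile_origins(height, width, dim=(64, 64)):
    """Return (row, col) top-left corners of full tiles inside the image."""
    tile_h, tile_w = dim
    return [
        (row, col)
        for row in range(0, height - tile_h + 1, tile_h)
        for col in range(0, width - tile_w + 1, tile_w)
    ]


# A 200 x 130 image holds three rows and two columns of full 64 x 64 tiles.
origins = tile_origins(200, 130, dim=(64, 64))
```

Each origin `(row, col)` corresponds to the patch `image[row:row + 64, col:col + 64]` in array terms.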

Dependencies

Note that the strict dependencies are installed automatically with

pip install cell-data-loader

However, to segment images by cell (i.e., segment_image="cell"), CellPose needs to be installed separately. GPU integration with CellPose also needs to be set up separately.
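
Because CellPose is optional, a script may want to check for it before requesting cell segmentation. The following is a generic sketch using the standard library, not code from Cell Data Loader. The helper `has_module` is hypothetical, and falling back to "whole" is just one possible choice.

```python
import importlib.util


# Generic check for an optional dependency; not taken from cell_data_loader.
def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# "cellpose" is the import name of the CellPose package.
segment_image = "cell" if has_module("cellpose") else "whole"
```

The resulting `segment_image` value can then be passed to `CellDataloader` through the argument of the same name.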

Strict dependencies:

numpy
torch
torchvision
opencv-python>=4.5.4
slideio==2.4.1
scipy
scikit-image
pillow

Soft dependencies:

CellPose # For cell segmentation support
Tensorflow

Download files

Source Distribution

cell_data_loader-0.0.17.tar.gz (50.1 kB)

Uploaded Source

Built Distribution

cell_data_loader-0.0.17-py3-none-any.whl (17.9 kB)

Uploaded Python 3

Publisher: publish-to-pypi.yml on mleming/CellDataLoader
