An efficient PyTorch dataloader for working with Whole-Slide Images

Project description

WSI Dataloader

The WSI Dataloader library offers a simple implementation that enables online access to whole-slide images (WSI) during the training of deep learning models. In most machine learning frameworks designed for WSI analysis, the very large WSI files are split into patches, usually because of memory limitations. Generating patch datasets can be slow, resource-intensive and sometimes impossible under tight storage constraints.

The WSIDataloader class offers an alternative to generating patch datasets. It is a PyTorch-based implementation encapsulating a DataLoader and a Dataset, and it enables online patch extraction from a given list of WSI files directly during training. It supports the usual DataLoader parameters for parallelizing and speeding up data loading (e.g. num_workers, prefetch_factor).

Supported features

  • Random patch sampling over a list of WSIs
  • Support for data loading over multiple workers
  • CUDA acceleration for data augmentation (more on this below)
  • User-defined patch extraction for flexibility (the user defines how patches should be extracted from WSIs)
  • Support for standard PyTorch Dataloader arguments
  • Easy to adapt to your own pipeline. The wsiloader library only consists of 2 classes: WSIDataloader and WSIIndexDataset, making it easy to create custom classes inheriting from these base classes.
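To picture how random patch sampling over a list of WSIs can work, here is a small self-contained sketch of the indexing idea. Note that the function names and the assumption that each slide exposes a known patch count are ours for illustration; this is not the wsiloader API.

```python
import bisect

# Hypothetical illustration (NOT the wsiloader API): route a flat dataset
# index to the right slide and to a local patch index within that slide,
# assuming each slide exposes a known number of extractable patches.

def build_offsets(patches_per_slide):
    """Cumulative offsets so a flat index can be routed to the right slide."""
    offsets, total = [], 0
    for n in patches_per_slide:
        offsets.append(total)
        total += n
    return offsets, total

def locate(index, offsets):
    """Return (slide_idx, local_patch_idx) for a flat dataset index."""
    slide_idx = bisect.bisect_right(offsets, index) - 1
    return slide_idx, index - offsets[slide_idx]

patches_per_slide = [100, 250, 50]   # e.g. 3 WSIs of different sizes
offsets, total = build_offsets(patches_per_slide)

slide, local = locate(120, offsets)  # flat index 120 falls in slide 1
print(slide, local)                  # -> 1 20
```

An index dataset built this way lets a standard sampler draw flat indices uniformly (or with any custom strategy) while each worker only opens the slide it actually needs.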

CUDA acceleration for data augmentation

The WSIDataloader class supports CUDA acceleration for applying transforms (data augmentation). When the transforms_device parameter is set to "cpu", the default DataLoader behaviour is used and the transforms are applied in the DataLoader workers. When it is set to "cuda", the patches are first loaded by the DataLoader workers, and the transforms are then applied sequentially on the GPU. This decoupling is necessary because CUDA cannot be reliably used inside the DataLoader's worker processes. Depending on the nature of the required transforms, using CUDA for data augmentation can substantially reduce a training loop's iteration time. The basic_example.ipynb notebook provides an example.
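The decoupling described above can be sketched in pure Python. This is an illustration of the control flow only, not the wsiloader implementation: the function names are ours, and the device is simulated with a tag instead of real CUDA calls.

```python
# Illustrative sketch (NOT the wsiloader implementation): workers load raw
# patches on the CPU; transforms then run sequentially on the chosen device.
# Device placement is simulated with a simple tag instead of real CUDA calls.

def load_patch(i):
    """Stand-in for patch extraction done by a DataLoader worker (CPU)."""
    return {"data": [i, i + 1], "device": "cpu"}

def to_device(batch, device):
    """Stand-in for tensor.to(device)."""
    return {**batch, "device": device}

def augment(batch):
    """Stand-in for a transform; runs wherever the batch currently lives."""
    return {**batch, "data": [x * 2 for x in batch["data"]]}

def iterate(indices, transforms_device="cpu"):
    for i in indices:
        batch = load_patch(i)                 # always loaded by CPU workers
        if transforms_device == "cuda":
            batch = to_device(batch, "cuda")  # move first, then transform
        batch = augment(batch)                # transform on the target device
        yield batch

out = list(iterate([0, 1], transforms_device="cuda"))
print(out[0])   # -> {'data': [0, 2], 'device': 'cuda'}
```

The key point is the ordering: with transforms_device="cuda", augmentation happens after the worker hand-off, in the main process, so the workers never touch CUDA.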

Installation

Install the wsiloader library using pip from PyPI:

$ pip install wsiloader

or from GitHub:

$ pip install git+https://github.com/gafaua/wsi-dataloader.git@main

Confirm the installation by importing the WSIDataloader class:

$ python -c "from wsiloader import WSIDataloader"

Examples

Example notebooks can be found in the examples directory. We recommend taking a look at these to get a better idea of how to take advantage of the wsiloader library in your own pipeline.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wsiloader-0.0.2.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

wsiloader-0.0.2-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file wsiloader-0.0.2.tar.gz.

File metadata

  • Download URL: wsiloader-0.0.2.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.5

File hashes

Hashes for wsiloader-0.0.2.tar.gz

  • SHA256: b4b1f211b0fc9eccbcbfdc71e35e2ebf1bdc532bbe3c0e44837c4a4abe856aef
  • MD5: b7cf1f8b0e3152653f37e9a2063e6fec
  • BLAKE2b-256: cc471264be5ba0827298102020fac02ccd4bf423c756dd1c8218ea51dc6f00bf

See more details on using hashes here.

File details

Details for the file wsiloader-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: wsiloader-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.5

File hashes

Hashes for wsiloader-0.0.2-py3-none-any.whl

  • SHA256: 7694c05f46803024cda71813bccee6c540acba7df85ff195b611c57392773aab
  • MD5: aa41deca6b690f6d6d32fef270574fbd
  • BLAKE2b-256: 36556f3d9d68c964a8d9cb9d0ec955cea5e93676197bf74d6240050767610d1c

