An efficient PyTorch dataloader for working with Whole-Slide Images
WSI Dataloader
The WSI Dataloader library offers a simple implementation that enables online access to whole-slide images (WSI) during the training of deep learning models. In most machine learning frameworks designed for WSI analysis, the very large WSI files are split into patches, usually because of memory limitations. Generating patch datasets can be slow, resource-intensive, and sometimes impossible under tight storage constraints.
The WSIDataloader class offers an alternative to generating patch datasets. It is a PyTorch-based implementation encapsulating a Dataloader and a Dataset. It enables online patch extraction across a given list of WSI files, directly during training. It supports the usual Dataloader parameters to parallelize and speed up data loading (num_workers, prefetch_factor).
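The core idea behind online patch extraction is to map a flat dataset index to a (slide, x, y) patch location at access time, so no patch ever has to be written to disk. A minimal, self-contained sketch of that mapping (an illustration of the concept only, not wsiloader's actual implementation):

```python
def patch_location(flat_idx, slide_dims, patch_size):
    """Map a flat dataset index to (slide_idx, x, y) coordinates
    on a non-overlapping patch grid. Illustrative only."""
    for slide_idx, (width, height) in enumerate(slide_dims):
        cols = width // patch_size
        rows = height // patch_size
        n_patches = cols * rows
        if flat_idx < n_patches:
            x = (flat_idx % cols) * patch_size
            y = (flat_idx // cols) * patch_size
            return slide_idx, x, y
        flat_idx -= n_patches
    raise IndexError("patch index out of range")

# Two slides: 2048x1024 (8 patches) and 1024x1024 (4 patches) at size 512
dims = [(2048, 1024), (1024, 1024)]
print(patch_location(0, dims, 512))   # → (0, 0, 0)
print(patch_location(9, dims, 512))   # → (1, 512, 0)
```

A Dataset's __getitem__ can then open the selected slide and read only the requested region, which is what makes online access memory- and storage-friendly.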
Supported features
- Random patch sampling over a list of WSIs
- Support for data loading over multiple workers
- CUDA acceleration for data augmentation (more on this below)
- User-defined patch definition for flexibility (the user defines how patches should be extracted from WSIs)
- Support for standard PyTorch Dataloader arguments
- Easy to adapt to your own pipeline. The wsiloader library only consists of 2 classes, WSIDataloader and WSIIndexDataset, making it easy to create custom classes inheriting from these base classes.
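As an illustration of a user-defined patch definition, the callable below reads a fixed-size region from a slide handle with an OpenSlide-style read_region call. The signature is an assumption for illustration, not wsiloader's documented interface; refer to the library's example notebooks for the interface it actually expects:

```python
class DummySlide:
    """Stand-in for an OpenSlide handle, so the sketch runs
    without a real WSI file."""
    def read_region(self, location, level, size):
        return ("patch", location, level, size)

def extract_patch(slide, x, y, size=512):
    # Read a size x size region at level 0, OpenSlide-style:
    # read_region(location, level, size).
    return slide.read_region((x, y), 0, (size, size))

print(extract_patch(DummySlide(), 1024, 2048))
# → ('patch', (1024, 2048), 0, (512, 512))
```

With a real slide, DummySlide would be replaced by an openslide.OpenSlide handle and the returned region converted to a tensor.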
CUDA acceleration for data augmentation
The WSIDataloader class supports CUDA acceleration when applying transforms (data augmentation). When the transforms_device parameter is set to "cpu", the default Dataloader behaviour is used and the transforms are applied in the Dataloader workers. When it is set to "cuda", the patches are first loaded by the Dataloader workers, and the transforms are then applied sequentially on the GPU in the main process. This decoupling is necessary because CUDA cannot be reinitialized inside forked worker subprocesses. Depending on the nature of the required transforms, using CUDA for data augmentation can substantially reduce a training loop's iteration time. The basic_example.ipynb notebook provides an example.
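The two-stage pattern described above (parallel loading first, sequential transform application afterwards) can be illustrated schematically with plain-Python stand-ins; no CUDA and no wsiloader API is involved here:

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # Stands in for worker-side patch loading (runs in parallel).
    return [i, i + 1]

def augment(batch):
    # Stands in for a transform that must run on the GPU, and therefore
    # in the main process rather than in the workers.
    return [x * 2 for x in batch]

# Stage 1: workers load raw batches in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    raw_batches = list(pool.map(load_batch, range(3)))

# Stage 2: transforms are applied sequentially in the main process.
batches = [augment(b) for b in raw_batches]
print(batches)  # → [[0, 2], [2, 4], [4, 6]]
```

The design trade-off: GPU transforms run sequentially rather than in parallel across workers, but for transforms that are much faster on GPU the overall iteration time can still drop substantially.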
Installation
Install the wsiloader library using pip from PyPI:
$ pip install wsiloader
or from GitHub:
$ pip install git+https://github.com/gafaua/wsi-dataloader@main
Confirm the installation by importing the WSIDataloader class:
$ python -c "from wsiloader import WSIDataloader"
Examples
Example notebooks can be found in the examples directory. We recommend taking a look at these to get a better idea of how to take advantage of the wsiloader library in your pipeline.