Modular online patch streaming from whole-slide images for computational pathology
Stream patches directly from WSIs during training — no disk pre-extraction, no storage overhead. Every component is pluggable: backends, tissue detectors, samplers, filters, transforms, dataset adapters.
Install
pip install "wsistream[openslide]" # with OpenSlide
pip install "wsistream[tiffslide]" # with TiffSlide (pure Python)
pip install "wsistream[torch]" # add PyTorch integration (WsiStreamDataset, DDP)
pip install "wsistream[all]" # everything (OpenSlide + TiffSlide + PyTorch + albumentations + matplotlib)
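After installing, it can help to confirm which optional dependencies are importable in the current environment. The snippet below is a small sketch that relies only on the standard library; it assumes the extras install packages importable as `openslide`, `tiffslide`, and `torch`, and the helper name `available_backends` is made up for this example.

```python
from importlib.util import find_spec


def available_backends():
    """Map each optional dependency to whether it can be imported."""
    # Assumed import names for the extras listed above.
    modules = ("openslide", "tiffslide", "torch")
    return {name: find_spec(name) is not None for name in modules}


status = available_backends()
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```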
For development:
git clone https://github.com/RamonKaspar/wsistream.git
cd wsistream
pip install -e ".[dev]"
Documentation
Full documentation: ramonkaspar.github.io/wsistream
To build locally:
pip install mkdocs-material
mkdocs serve # local preview at http://127.0.0.1:8000
How it works
Each slide goes through a fixed pipeline:
- Open slide: via an explicit backend (OpenSlideBackend or TiffSlideBackend)
- Detect tissue: run a TissueDetector on a low-res thumbnail to get a binary mask
- Sample coordinates: a PatchSampler proposes (x, y) locations within tissue regions
- Extract patch: read the pixel data from the slide at each coordinate
- Filter patch: a PatchFilter accepts or rejects the tile based on its pixels
- Transform patch: apply augmentations (HEDColorAugmentation, RandomFlipRotate, etc.)
- Yield result: PatchResult with image, coordinates, tissue fraction, and metadata
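The stages above can be pictured as a chain of plain Python steps. The sketch below is not wsistream's implementation: the toy grayscale "slide" (a nested list), the fixed brightness threshold, and every function name are assumptions made purely for illustration.

```python
import random

BACKGROUND_THRESHOLD = 200  # assumed: pixels darker than this count as tissue


def detect_tissue(slide):
    """Binary mask: True where a pixel is darker than the background threshold."""
    return [[px < BACKGROUND_THRESHOLD for px in row] for row in slide]


def sample_coordinates(mask, patch_size, num_patches, rng):
    """Propose top-left (x, y) positions whose corner pixel lies on tissue."""
    height, width = len(mask), len(mask[0])
    candidates = [
        (x, y)
        for y in range(height - patch_size + 1)
        for x in range(width - patch_size + 1)
        if mask[y][x]
    ]
    return rng.sample(candidates, min(num_patches, len(candidates)))


def extract_patch(slide, x, y, patch_size):
    """Read a patch_size x patch_size window from the slide."""
    return [row[x:x + patch_size] for row in slide[y:y + patch_size]]


def tissue_fraction(patch):
    """Fraction of patch pixels that look like tissue."""
    pixels = [px for row in patch for px in row]
    return sum(px < BACKGROUND_THRESHOLD for px in pixels) / len(pixels)


def stream_patches(slide, patch_size=2, num_patches=4, min_fraction=0.5, seed=0):
    """Detect, sample, extract, filter, transform, yield."""
    rng = random.Random(seed)
    mask = detect_tissue(slide)
    for x, y in sample_coordinates(mask, patch_size, num_patches, rng):
        patch = extract_patch(slide, x, y, patch_size)
        fraction = tissue_fraction(patch)
        if fraction < min_fraction:
            continue  # filter step: reject mostly-background patches
        if rng.random() < 0.5:
            patch = [row[::-1] for row in patch]  # transform step: horizontal flip
        yield {"image": patch, "x": x, "y": y, "tissue_fraction": fraction}


# Toy 6x6 slide: a dark tissue blob on a white background.
slide = [
    [255, 255, 255, 255, 255, 255],
    [255, 120, 110, 130, 255, 255],
    [255, 100, 90, 140, 255, 255],
    [255, 115, 105, 125, 255, 255],
    [255, 255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
results = list(stream_patches(slide))
for r in results:
    print(r["x"], r["y"], r["tissue_fraction"])
```

Because each stage is a separate function, any one of them can be swapped without touching the others, which is the property the pluggable components are meant to provide.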
Quick start
from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import CLAMTissueDetector
from wsistream.sampling import RandomSampler
from wsistream.filters import HSVPatchFilter
from wsistream.transforms import ComposeTransforms, HEDColorAugmentation, RandomFlipRotate, ResizeTransform
from wsistream.datasets import TCGAAdapter
pipeline = PatchPipeline(
    slide_paths="/data/tcga",  # directory or list of files
    backend=OpenSlideBackend(),
    tissue_detector=CLAMTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=-1, target_mpp=0.5),
    patch_filter=HSVPatchFilter(min_pixel_fraction=0.6),
    transforms=ComposeTransforms(transforms=[
        HEDColorAugmentation(sigma=0.05),
        RandomFlipRotate(),
        ResizeTransform(target_size=224),
    ]),
    dataset_adapter=TCGAAdapter(),
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)
for result in pipeline:
    print(result.image.shape)  # (224, 224, 3) uint8
    print(result.coordinate.mpp)  # ~0.5
    print(result.tissue_fraction)  # 0.87
    print(result.slide_metadata.patient_id)  # TCGA-3L-AA1B
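The sampler above asks for 256-pixel patches at roughly 0.5 µm per pixel. How wsistream maps target_mpp onto a pyramid level is not described here; the sketch below shows one common approach (pick the coarsest level that is still at least as fine as the target, read a proportionally larger region, then resize). The function name `plan_read` and the slide properties are hypothetical.

```python
def plan_read(base_mpp, level_downsamples, target_mpp, patch_size):
    """Choose a pyramid level and read size that give patch_size pixels at target_mpp.

    level_downsamples is assumed to be sorted ascending, with level 0 first.
    """
    wanted = target_mpp / base_mpp  # desired downsample relative to level 0
    level = 0  # fall back to full resolution if no level qualifies
    for index, downsample in enumerate(level_downsamples):
        if downsample <= wanted + 1e-6:
            level = index  # keep the coarsest level that is not coarser than wanted
    level_mpp = base_mpp * level_downsamples[level]
    # Pixels to read at the chosen level so the region covers the same physical area.
    read_size = round(patch_size * target_mpp / level_mpp)
    return level, read_size, level_mpp


# Slide scanned at 0.25 µm/px with levels at 1x, 4x, 16x:
# no level matches 0.5 µm/px exactly, so read 512 px at level 0 and resize to 256.
plan = plan_read(base_mpp=0.25, level_downsamples=[1.0, 4.0, 16.0],
                 target_mpp=0.5, patch_size=256)
print(plan)  # (0, 512, 0.25)

# With a 2x level available, the patch can be read directly at 256 px.
print(plan_read(0.25, [1.0, 2.0, 4.0], 0.5, 256))  # (1, 256, 0.5)
```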
Pool-based slide interleaving
The pipeline keeps pool_size slides open at once and draws patches from them in round-robin order. Once a slide has yielded patches_per_slide patches, it is closed and the next slide is opened in its place. With cycle=True, closed slides are re-queued, so the stream never ends. patches_per_visit (default 1) sets how many consecutive patches are read from one slide before moving on to the next; raising it can significantly improve I/O throughput on network filesystems.
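To make the visiting order concrete, here is a small standalone simulation of pool-based interleaving. It is a sketch of the scheduling idea only, under the assumption that the pool is served in round-robin order; the function `interleave` is hypothetical and does not open any slides.

```python
from collections import deque
from itertools import islice


def interleave(slides, pool_size, patches_per_slide, patches_per_visit=1, cycle=False):
    """Yield (slide, patch_index) pairs in the order a pool-based reader might visit them."""
    pending = deque(slides)  # slides waiting to be opened
    pool = deque()           # open slides as [slide, patches_taken]

    def refill():
        while len(pool) < pool_size and pending:
            pool.append([pending.popleft(), 0])

    refill()
    while pool:
        slide, taken = pool.popleft()
        burst = min(patches_per_visit, patches_per_slide - taken)
        for _ in range(burst):
            yield slide, taken
            taken += 1
        if taken < patches_per_slide:
            pool.append([slide, taken])  # back of the line for the next round
        else:
            if cycle:
                pending.append(slide)    # "close" the slide and re-queue it
            refill()                     # open the next pending slide, if any


order = list(interleave(["A", "B", "C"], pool_size=2, patches_per_slide=2))
print(order)
# [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('C', 0), ('C', 1)]

bursts = list(interleave(["A", "B", "C"], pool_size=2, patches_per_slide=2,
                         patches_per_visit=2))
print(bursts)
# [('A', 0), ('A', 1), ('B', 0), ('B', 1), ('C', 0), ('C', 1)]

# cycle=True never terminates, so take a finite prefix.
looped = list(islice(interleave(["A", "B"], pool_size=1, patches_per_slide=1,
                                cycle=True), 4))
print(looped)
# [('A', 0), ('B', 0), ('A', 0), ('B', 0)]
```

With patches_per_visit=1 consecutive patches come from different slides, which mixes batches well; larger values trade some of that mixing for longer sequential reads from a single file.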
PyTorch integration
wsistream.torch provides WsiStreamDataset (an IterableDataset), MonitoredLoader for throughput tracking, and partition_slides_by_rank for DDP. Worker-level slide partitioning is handled automatically.
from torch.utils.data import DataLoader
from wsistream.backends import OpenSlideBackend
from wsistream.sampling import RandomSampler
from wsistream.tissue import OtsuTissueDetector
from wsistream.torch import WsiStreamDataset, partition_slides_by_rank
# rank, world_size, device, and total_steps are supplied by your training script
my_slides = partition_slides_by_rank("/data/tcga", rank=rank, world_size=world_size)
dataset = WsiStreamDataset(
    slide_paths=my_slides,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=-1, target_mpp=0.5),
)
loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
loader_iter = iter(loader)
for step in range(total_steps):
    batch = next(loader_iter)
    images = batch["image"].to(device, non_blocking=True)  # (B, 3, H, W) float32
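Sharding slides across DDP ranks and DataLoader workers is commonly done with strided slicing, so that every process sees a disjoint subset. The sketch below illustrates that general idea with hypothetical helper names (`shard`, `slides_for_worker`); it is not the code behind partition_slides_by_rank or WsiStreamDataset, whose exact strategy is not described here.

```python
def shard(items, index, count):
    """Return every count-th item starting at index (a strided split)."""
    if not 0 <= index < count:
        raise ValueError("index must satisfy 0 <= index < count")
    return items[index::count]


def slides_for_worker(slide_paths, rank, world_size, worker_id, num_workers):
    """Split first across ranks, then across the workers of one rank."""
    # Sorting gives every process the same ordering before the split.
    per_rank = shard(sorted(slide_paths), rank, world_size)
    return shard(per_rank, worker_id, num_workers)


slides = [f"slide_{i:02d}.svs" for i in range(10)]
assignments = {
    (rank, worker): slides_for_worker(slides, rank, 2, worker, 2)
    for rank in range(2)
    for worker in range(2)
}
for key, paths in assignments.items():
    print(key, paths)
```

In a real IterableDataset the worker index and worker count would typically come from torch.utils.data.get_worker_info() inside `__iter__`.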
File details
Details for the file wsistream-0.1.4.tar.gz.
File metadata
- Download URL: wsistream-0.1.4.tar.gz
- Upload date:
- Size: 31.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37e9e9341c8f6c174b41b31a6c2b2e7f438b23b15d878d0f2da86d40b514cb14 |
| MD5 | 3de92620a95a091ad05a6eb1378478e0 |
| BLAKE2b-256 | 474a3ad0d78bb56e22abd48e45d42b871f1892834c563e7dc1a26d3cfd5b9056 |
File details
Details for the file wsistream-0.1.4-py3-none-any.whl.
File metadata
- Download URL: wsistream-0.1.4-py3-none-any.whl
- Upload date:
- Size: 55.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c1081788b54f678d225bce6ac870db9fa4635450f2b05968df498c27785eedf |
| MD5 | 51ada18c7c3993002fcff6c65053bf0b |
| BLAKE2b-256 | ee12f5c7508ae5029c2dd1cb1d96c1fcad8cf0419fb38bf5abcf38680e0df66e |