
wsistream

Modular online patch streaming from whole-slide images (WSIs) for computational pathology. Stream patches directly from WSIs during training: no disk pre-extraction, no storage overhead.

Every component is pluggable: backends, tissue detectors, samplers, filters, transforms, dataset adapters.

Install

git clone https://github.com/RamonKaspar/wsistream.git
cd wsistream
pip install -e ".[openslide]"   # with OpenSlide
pip install -e ".[tiffslide]"   # with TiffSlide (pure Python)
pip install -e ".[torch]"       # add PyTorch integration (WsiStreamDataset, DDP)
pip install -e ".[all]"         # everything (OpenSlide + TiffSlide + PyTorch + albumentations + matplotlib)

Documentation

pip install mkdocs-material
mkdocs serve          # local preview at http://127.0.0.1:8000
mkdocs build          # static site in site/

How it works

Each slide goes through a fixed pipeline:

  1. Open slide: via an explicit backend (OpenSlideBackend or TiffSlideBackend)
  2. Detect tissue: run a TissueDetector on a low-res thumbnail to get a binary mask
  3. Sample coordinates: a PatchSampler proposes (x, y) locations within tissue regions
  4. Extract patch: read the pixel data from the slide at each coordinate
  5. Filter patch: a PatchFilter accepts or rejects the tile based on its pixels
  6. Transform patch: apply augmentations (HEDColorAugmentation, RandomFlipRotate, etc.)
  7. Yield result: PatchResult with image, coordinates, tissue fraction, and metadata
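The fixed per-slide pipeline above can be sketched in plain Python. The names here (`detect_tissue`, `sample_coordinates`, `stream_patches`) and the dict-based slide are hypothetical stand-ins for illustration, not wsistream's API; in practice the backend, detector, sampler, filter, and transform objects fill these roles.

```python
import random

def detect_tissue(thumbnail):
    # Step 2: binary mask over the low-res thumbnail (here: everything counts as tissue).
    return [[True] * len(row) for row in thumbnail]

def sample_coordinates(mask, num_patches):
    # Step 3: propose (x, y) locations inside tissue regions.
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return random.sample(coords, min(num_patches, len(coords)))

def stream_patches(slide, num_patches=4):
    thumbnail = slide["thumbnail"]              # step 1: slide already opened by a backend
    mask = detect_tissue(thumbnail)
    for x, y in sample_coordinates(mask, num_patches):
        patch = slide["pixels"][(x, y)]         # step 4: extract pixel data at the coordinate
        if patch["tissue_fraction"] < 0.5:      # step 5: filter on patch content
            continue
        patch = dict(patch, image=patch["image"][::-1])  # step 6: a trivial "transform"
        yield {"coordinate": (x, y), **patch}            # step 7: yield the result
```

Each stage is a plain function here; wsistream's pluggable components are the same idea with swappable classes per stage.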

Pipeline overview (diagram)

Quick start

from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import CLAMTissueDetector
from wsistream.sampling import RandomSampler
from wsistream.filters import HSVPatchFilter
from wsistream.transforms import ComposeTransforms, HEDColorAugmentation, RandomFlipRotate, ResizeTransform
from wsistream.datasets import TCGAAdapter

pipeline = PatchPipeline(
    slide_paths="/data/tcga",  # directory or list of files
    backend=OpenSlideBackend(),
    tissue_detector=CLAMTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=1000, target_mpp=0.5),
    patch_filter=HSVPatchFilter(min_pixel_fraction=0.6),
    transforms=ComposeTransforms(transforms=[
        HEDColorAugmentation(sigma=0.05),
        RandomFlipRotate(),
        ResizeTransform(target_size=224),
    ]),
    dataset_adapter=TCGAAdapter(),
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)

for result in pipeline:
    print(result.image.shape)                # (224, 224, 3) uint8
    print(result.coordinate.mpp)             # ~0.5
    print(result.tissue_fraction)            # 0.87
    print(result.slide_metadata.patient_id)  # TCGA-3L-AA1B

Pool-based slide interleaving

The pipeline keeps pool_size slides open simultaneously and takes patches_per_slide patches from each before closing it and opening the next. With cycle=True, slides are re-queued for infinite streaming.
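The scheduling logic can be sketched as follows; `interleave` and `open_slide` are hypothetical names for illustration, not wsistream's implementation:

```python
from collections import deque

def interleave(slide_ids, open_slide, pool_size, patches_per_slide):
    """Sketch of pool-based interleaving: keep up to pool_size slides open,
    round-robin one patch at a time, retire a slide after patches_per_slide."""
    queue = deque(slide_ids)
    pool = deque()  # entries: [slide_id, patch_iterator, patches_taken]
    while queue or pool:
        # Top up the pool from the queue of unopened slides.
        while len(pool) < pool_size and queue:
            sid = queue.popleft()
            pool.append([sid, iter(open_slide(sid)), 0])
        entry = pool.popleft()
        sid, patches, taken = entry
        patch = next(patches, None)
        if patch is None:
            continue  # slide exhausted early: drop it, pool refills next pass
        yield sid, patch
        entry[2] = taken + 1
        if entry[2] < patches_per_slide:
            pool.append(entry)  # quota not yet reached: keep the slide open
        # else: quota reached, slide is "closed"; the next one opens above
```

With cycle=True, the retired slide id would be appended back onto the queue instead of being dropped, giving an infinite stream.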

PyTorch integration

wsistream.torch provides WsiStreamDataset (an IterableDataset) and partition_slides_by_rank for DDP. Worker-level slide partitioning is handled automatically.

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader

from wsistream.backends import OpenSlideBackend
from wsistream.sampling import RandomSampler
from wsistream.tissue import OtsuTissueDetector
from wsistream.torch import WsiStreamDataset, partition_slides_by_rank

# rank and world_size come from the initialized process group
rank, world_size = dist.get_rank(), dist.get_world_size()
my_slides = partition_slides_by_rank("/data/tcga", rank=rank, world_size=world_size)

dataset = WsiStreamDataset(
    slide_paths=my_slides,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=1000, target_mpp=0.5),
)

loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
loader_iter = iter(loader)

device = torch.device("cuda", rank)
for step in range(total_steps):  # total_steps: set by your training schedule
    batch = next(loader_iter)
    images = batch["image"].to(device, non_blocking=True)  # (B, 3, H, W) float32

See examples/train_single_gpu.py and examples/train_ddp.py for complete training examples.
