
wsistream

Modular online patch streaming from whole-slide images (WSIs) for computational pathology. Stream patches directly from WSIs during training: no disk pre-extraction, no storage overhead.

Every component is pluggable: backends, tissue detectors, samplers, filters, transforms, dataset adapters.

Install

git clone https://github.com/RamonKaspar/wsistream.git
cd wsistream
pip install -e ".[openslide]"   # with OpenSlide
pip install -e ".[tiffslide]"   # with TiffSlide (pure Python)
pip install -e ".[torch]"       # add PyTorch integration (WsiStreamDataset, DDP)
pip install -e ".[all]"         # everything (OpenSlide + TiffSlide + PyTorch + albumentations + matplotlib)

Documentation

pip install mkdocs-material
mkdocs serve          # local preview at http://127.0.0.1:8000
mkdocs build          # static site in site/

How it works

Each slide goes through a fixed pipeline:

  1. Open slide: via an explicit backend (OpenSlideBackend or TiffSlideBackend)
  2. Detect tissue: run a TissueDetector on a low-res thumbnail to get a binary mask
  3. Sample coordinates: a PatchSampler proposes (x, y) locations within tissue regions
  4. Extract patch: read the pixel data from the slide at each coordinate
  5. Filter patch: a PatchFilter accepts or rejects the tile based on its pixels
  6. Transform patch: apply augmentations (HEDColorAugmentation, RandomFlipRotate, etc.)
  7. Yield result: PatchResult with image, coordinates, tissue fraction, and metadata
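The fixed per-slide pipeline above can be sketched in plain Python. The names here (`detect_tissue`, `sample_coordinates`, `stream_patches`) and the dict-based slide are hypothetical stand-ins for illustration, not wsistream's API; in practice the backend, detector, sampler, filter, and transform objects fill these roles.

```python
import random

def detect_tissue(thumbnail):
    # Step 2: binary mask over the low-res thumbnail (here: everything counts as tissue).
    return [[True] * len(row) for row in thumbnail]

def sample_coordinates(mask, num_patches):
    # Step 3: propose (x, y) locations inside tissue regions.
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return random.sample(coords, min(num_patches, len(coords)))

def stream_patches(slide, num_patches=4):
    thumbnail = slide["thumbnail"]              # step 1: slide already opened by a backend
    mask = detect_tissue(thumbnail)
    for x, y in sample_coordinates(mask, num_patches):
        patch = slide["pixels"][(x, y)]         # step 4: extract pixel data at the coordinate
        if patch["tissue_fraction"] < 0.5:      # step 5: filter on patch content
            continue
        patch = dict(patch, image=patch["image"][::-1])  # step 6: a trivial "transform"
        yield {"coordinate": (x, y), **patch}            # step 7: yield the result
```

Each stage is a plain function here; wsistream's pluggable components are the same idea with swappable classes per stage.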

Pipeline overview (diagram)

Quick start

from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import CLAMTissueDetector
from wsistream.sampling import RandomSampler
from wsistream.filters import HSVPatchFilter
from wsistream.transforms import ComposeTransforms, HEDColorAugmentation, RandomFlipRotate, ResizeTransform
from wsistream.datasets import TCGAAdapter

pipeline = PatchPipeline(
    slide_paths="/data/tcga",  # directory or list of files
    backend=OpenSlideBackend(),
    tissue_detector=CLAMTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=1000, target_mpp=0.5),
    patch_filter=HSVPatchFilter(min_pixel_fraction=0.6),
    transforms=ComposeTransforms(transforms=[
        HEDColorAugmentation(sigma=0.05),
        RandomFlipRotate(),
        ResizeTransform(target_size=224),
    ]),
    dataset_adapter=TCGAAdapter(),
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)

for result in pipeline:
    print(result.image.shape)                # (224, 224, 3) uint8
    print(result.coordinate.mpp)             # ~0.5
    print(result.tissue_fraction)            # 0.87
    print(result.slide_metadata.patient_id)  # TCGA-3L-AA1B

Pool-based slide interleaving

The pipeline keeps pool_size slides open simultaneously and takes patches_per_slide patches from each before closing it and opening the next. With cycle=True, slides are re-queued for infinite streaming.
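The scheduling logic can be sketched as follows; `interleave` and `open_slide` are hypothetical names for illustration, not wsistream's implementation:

```python
from collections import deque

def interleave(slide_ids, open_slide, pool_size, patches_per_slide):
    """Sketch of pool-based interleaving: keep up to pool_size slides open,
    round-robin one patch at a time, retire a slide after patches_per_slide."""
    queue = deque(slide_ids)
    pool = deque()  # entries: [slide_id, patch_iterator, patches_taken]
    while queue or pool:
        # Top up the pool from the queue of unopened slides.
        while len(pool) < pool_size and queue:
            sid = queue.popleft()
            pool.append([sid, iter(open_slide(sid)), 0])
        entry = pool.popleft()
        sid, patches, taken = entry
        patch = next(patches, None)
        if patch is None:
            continue  # slide exhausted early: drop it, pool refills next pass
        yield sid, patch
        entry[2] = taken + 1
        if entry[2] < patches_per_slide:
            pool.append(entry)  # quota not yet reached: keep the slide open
        # else: quota reached, slide is "closed"; the next one opens above
```

With cycle=True, the retired slide id would be appended back onto the queue instead of being dropped, giving an infinite stream.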

PyTorch integration

wsistream.torch provides WsiStreamDataset (an IterableDataset) and partition_slides_by_rank for DDP. Worker-level slide partitioning is handled automatically.

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader

from wsistream.backends import OpenSlideBackend
from wsistream.sampling import RandomSampler
from wsistream.tissue import OtsuTissueDetector
from wsistream.torch import WsiStreamDataset, partition_slides_by_rank

# rank and world_size come from the initialized process group
rank, world_size = dist.get_rank(), dist.get_world_size()
my_slides = partition_slides_by_rank("/data/tcga", rank=rank, world_size=world_size)

dataset = WsiStreamDataset(
    slide_paths=my_slides,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=1000, target_mpp=0.5),
)

loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
loader_iter = iter(loader)

device = torch.device("cuda", rank)
for step in range(total_steps):  # total_steps: set by your training schedule
    batch = next(loader_iter)
    images = batch["image"].to(device, non_blocking=True)  # (B, 3, H, W) float32

See examples/train_single_gpu.py and examples/train_ddp.py for complete training examples.
