Skip to main content

Modular online patch streaming from whole-slide images for computational pathology

Project description

wsistream

Modular online patch streaming from whole-slide images for computational pathology. Stream patches directly from WSIs during training (no disk pre-extraction, no storage overhead).

Every component is pluggable: backends, tissue detectors, samplers, filters, transforms, dataset adapters.

Install

pip install "wsistream[openslide]"   # with OpenSlide
pip install "wsistream[tiffslide]"   # with TiffSlide (pure Python)
pip install "wsistream[torch]"       # add PyTorch integration (WsiStreamDataset, DDP)
pip install "wsistream[all]"         # everything (OpenSlide + TiffSlide + PyTorch + albumentations + matplotlib)

For development:

git clone https://github.com/RamonKaspar/wsistream.git
cd wsistream
pip install -e ".[dev]"

Documentation

Full documentation: ramonkaspar.github.io/wsistream

To build locally:

pip install mkdocs-material
mkdocs serve          # local preview at http://127.0.0.1:8000

How it works

Each slide goes through a fixed pipeline:

  1. Open slide: via an explicit backend (OpenSlideBackend or TiffSlideBackend)
  2. Detect tissue: run a TissueDetector on a low-res thumbnail to get a binary mask
  3. Sample coordinates: a PatchSampler proposes (x, y) locations within tissue regions
  4. Extract patch: read the pixel data from the slide at each coordinate
  5. Filter patch: a PatchFilter accepts or rejects the tile based on its pixels
  6. Transform patch: apply augmentations (HEDColorAugmentation, RandomFlipRotate, etc.)
  7. Yield result: PatchResult with image, coordinates, tissue fraction, and metadata

Quick start

from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import CLAMTissueDetector
from wsistream.sampling import RandomSampler
from wsistream.filters import HSVPatchFilter
from wsistream.transforms import ComposeTransforms, HEDColorAugmentation, RandomFlipRotate, ResizeTransform
from wsistream.datasets import TCGAAdapter

pipeline = PatchPipeline(
    slide_paths="/data/tcga",  # directory or list of files
    backend=OpenSlideBackend(),
    tissue_detector=CLAMTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=-1, target_mpp=0.5),
    patch_filter=HSVPatchFilter(min_pixel_fraction=0.6),
    transforms=ComposeTransforms(transforms=[
        HEDColorAugmentation(sigma=0.05),
        RandomFlipRotate(),
        ResizeTransform(target_size=224),
    ]),
    dataset_adapter=TCGAAdapter(),
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)

for result in pipeline:
    print(result.image.shape)                # (224, 224, 3) uint8
    print(result.coordinate.mpp)             # ~0.5
    print(result.tissue_fraction)            # 0.87
    print(result.slide_metadata.patient_id)  # TCGA-3L-AA1B

Pool-based slide interleaving

The pipeline keeps pool_size slides open simultaneously and takes patches_per_slide patches from each before closing it and opening the next. With cycle=True, slides are re-queued for infinite streaming.

PyTorch integration

wsistream.torch provides WsiStreamDataset (an IterableDataset), MonitoredLoader for throughput tracking, and partition_slides_by_rank for DDP. Worker-level slide partitioning is handled automatically.

from torch.utils.data import DataLoader
from wsistream.backends import OpenSlideBackend
from wsistream.sampling import RandomSampler
from wsistream.tissue import OtsuTissueDetector
from wsistream.torch import WsiStreamDataset, partition_slides_by_rank

my_slides = partition_slides_by_rank("/data/tcga", rank=rank, world_size=world_size)

dataset = WsiStreamDataset(
    slide_paths=my_slides,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=-1, target_mpp=0.5),
)

loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)
loader_iter = iter(loader)

for step in range(total_steps):
    batch = next(loader_iter)
    images = batch["image"].to(device, non_blocking=True)  # (B, 3, H, W) float32

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wsistream-0.1.1.tar.gz (71.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wsistream-0.1.1-py3-none-any.whl (53.0 kB view details)

Uploaded Python 3

File details

Details for the file wsistream-0.1.1.tar.gz.

File metadata

  • Download URL: wsistream-0.1.1.tar.gz
  • Upload date:
  • Size: 71.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for wsistream-0.1.1.tar.gz
Algorithm Hash digest
SHA256 cbf61dff4ae85dd6776cb04590775e7cb91a1d47bc2c98c5741277189330e3d2
MD5 edb5770de2f8be09ec86d9d696bc7f79
BLAKE2b-256 a40b578cea854fbd296553cc15e2868eb9ca0983982ae26ada8a2209e21e69c0

See more details on using hashes here.

File details

Details for the file wsistream-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: wsistream-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 53.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for wsistream-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ab50f56f6a8627490c26e93499599e136f87c72d3b071daf4719ad664263a8e8
MD5 1c445dd9c9438fae32b736ecb0fe5064
BLAKE2b-256 69dcd6cf4d8ad4a95d5261c393fbe9ea3224b030a4f3076d8f2e453643109e6e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page