
Optimized slide tiling library for histopathology

Project description

hs2p


hs2p is a Python package for efficient slide tiling and tile sampling at any requested spacing, whether or not that spacing is natively present in the whole-slide image. It is designed for computational pathology workflows that need reproducible coordinates.

hs2p can be used through two interfaces:

  • a Python API for library-style integration
  • a CLI for batch preprocessing from a CSV and YAML config

Installation

pip install hs2p

If a mask is not provided, hs2p can segment tissue directly from the slide; if you want to precompute tissue masks, a standalone script is available.

Workflows

Tiling

Tiling computes a reproducible grid of tile coordinates for each slide and saves them as named artifacts with extraction metadata, ready for downstream use.
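To make the idea of a reproducible grid concrete, here is a minimal sketch of regular grid generation. It is not hs2p's implementation: the function name `grid_coordinates`, the treatment of `overlap` as a fraction of the tile size, and the choice to drop partial tiles at the slide edge are all assumptions made for illustration.

```python
import numpy as np


def grid_coordinates(width, height, tile_size, overlap=0.0):
    """Return an (N, 2) array of top-left (x, y) tile coordinates.

    Hypothetical helper: only tiles that fit entirely inside the
    width x height region are kept.
    """
    # Distance between consecutive tile origins; overlap is a fraction in [0, 1).
    step = max(1, int(round(tile_size * (1.0 - overlap))))
    xs = np.arange(0, width - tile_size + 1, step)
    ys = np.arange(0, height - tile_size + 1, step)
    # indexing="ij" makes x the slow axis, so rows are ordered by x, then y.
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


coords = grid_coordinates(width=1000, height=500, tile_size=224, overlap=0.0)
print(coords.shape)
```

Because the grid depends only on the slide dimensions and the tiling parameters, rerunning with the same inputs yields the same coordinates.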


Sampling

Sampling filters or partitions tile coordinates by annotation coverage so you can keep only tiles relevant to a tissue class or label.
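The coverage test behind sampling can be sketched with NumPy as follows. This is an illustration under stated assumptions, not hs2p's code: the helpers `label_coverage` and `sample_tiles` are hypothetical, and the annotation mask is assumed to be a 2D integer array at the same resolution as the tile coordinates.

```python
import numpy as np


def label_coverage(mask, x, y, tile_size, label_value):
    """Fraction of the tile at (x, y) whose mask pixels equal label_value."""
    tile = mask[y:y + tile_size, x:x + tile_size]
    if tile.size == 0:
        return 0.0
    return float(np.mean(tile == label_value))


def sample_tiles(mask, coords, tile_size, label_value, min_coverage):
    """Keep only the coordinates whose coverage reaches min_coverage."""
    return [
        (x, y)
        for x, y in coords
        if label_coverage(mask, x, y, tile_size, label_value) >= min_coverage
    ]


# Toy 8x8 annotation mask: the right half carries label 1.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[:, 4:] = 1

kept = sample_tiles(
    mask,
    coords=[(0, 0), (4, 0), (2, 0)],
    tile_size=4,
    label_value=1,
    min_coverage=0.5,
)
print(kept)
```

In this toy case the tile at (0, 0) has no label-1 pixels and is dropped, while the tiles at (4, 0) and (2, 0) have 100% and 50% coverage and are kept.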


Python API

hs2p supports pre-extracted tissue masks. If you don't have such tissue masks, you can either:

  • use our standalone tissue segmentation script (recommended)
  • tune the SegmentationConfig parameters and let hs2p segment tissue on the fly

Minimal tiling example:

from pathlib import Path

from hs2p import (
    FilterConfig,
    SegmentationConfig,
    TilingConfig,
    WholeSlide,
    overlay_mask_on_slide,
    save_tiling_result,
    tile_slide,
    write_tiling_preview,
)

result = tile_slide(
    WholeSlide(
        sample_id="slide-1",
        image_path=Path("/data/wsi/slide-1.tif"),
        mask_path=Path("/data/mask/slide-1.tif"),
    ),
    tiling=TilingConfig(
        backend="openslide",
        target_spacing_um=0.5,
        target_tile_size_px=224,
        tolerance=0.07,
        overlap=0.0,
        tissue_threshold=0.1,
    ),
    segmentation=SegmentationConfig(downsample=64),
    filtering=FilterConfig(ref_tile_size=224, a_t=4, a_h=2),
    num_workers=1,
)

artifacts = save_tiling_result(result, output_dir=Path("output"))
tiling_preview_path = write_tiling_preview(
    result=result,
    output_dir=Path("output"),
    downsample=32,
)

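# Render the mask on top of a downsampled view of the slide.
mask_overlay = overlay_mask_on_slide(
    wsi_path=result.image_path,
    annotation_mask_path=Path("/data/mask/slide-1.tif"),
    downsample=32,
    backend=result.backend,
)
# Make sure the target directory exists before saving the overlay.
overlay_dir = Path("output/visualization/mask")
overlay_dir.mkdir(parents=True, exist_ok=True)
mask_overlay.save(overlay_dir / "slide-1.jpg")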

print(artifacts.tiles_npz_path)
print(artifacts.tiles_meta_path)
print(tiling_preview_path)

result is a TilingResult for one slide. It gives downstream pipelines the tile coordinates plus the metadata needed to relate those coordinates back to the slide pyramid and persist them as reusable named artifacts.
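One way to relate a requested spacing to the slide pyramid is sketched below. The source does not describe hs2p's actual selection rule, so everything here is an assumption: the helper `pick_level` is hypothetical, `tolerance` is interpreted as a relative deviation from the target spacing, and the fallback reads from the closest finer level and resizes.

```python
def pick_level(level_spacings_um, target_spacing_um, tolerance):
    """Return (level, resize_factor) for a requested spacing.

    A resize_factor of 1.0 means the level can be read as-is. A factor of
    2.0 means: read a region twice the tile size at that level, then
    downscale it to the target tile size.
    """
    # Prefer a native level whose spacing is within the relative tolerance.
    for level, spacing in enumerate(level_spacings_um):
        if abs(spacing - target_spacing_um) / target_spacing_um <= tolerance:
            return level, 1.0
    # Otherwise use the coarsest level that is still finer than the target,
    # falling back to level 0 (upsampling) when no finer level exists.
    finer = [i for i, s in enumerate(level_spacings_um) if s < target_spacing_um]
    level = max(finer, key=lambda i: level_spacings_um[i]) if finer else 0
    return level, target_spacing_um / level_spacings_um[level]


# 0.5 um/px is not native here, so level 0 (0.25 um/px) is read and resized.
level, factor = pick_level([0.25, 1.0, 4.0], target_spacing_um=0.5, tolerance=0.07)
read_size_px = round(224 * factor)
print(level, factor, read_size_px)
```

With spacings of 0.243, 0.486 and 0.972 um/px, the same call would return level 1 with a factor of 1.0, because 0.486 is within 7% of 0.5.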

More API details: docs/api.md

CLI

Both CLI entrypoints use the same input CSV schema:

sample_id,image_path,mask_path
slide-1,/data/wsi/slide-1.tif,/data/mask/slide-1.tif
slide-2,/data/wsi/slide-2.tif,
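If you generate or validate this CSV from your own code, a small reader like the one below can help. It uses only the standard library; the function `read_slide_csv` is a hypothetical helper, and treating an empty mask_path as "no precomputed mask" is an assumption based on the second example row.

```python
import csv
import io
from pathlib import Path


def read_slide_csv(handle):
    """Parse rows of sample_id,image_path,mask_path into dictionaries."""
    rows = []
    for row in csv.DictReader(handle):
        mask = (row.get("mask_path") or "").strip()
        rows.append(
            {
                "sample_id": row["sample_id"],
                "image_path": Path(row["image_path"]),
                # An empty mask_path is mapped to None (assumed: no mask given).
                "mask_path": Path(mask) if mask else None,
            }
        )
    return rows


example = io.StringIO(
    "sample_id,image_path,mask_path\n"
    "slide-1,/data/wsi/slide-1.tif,/data/mask/slide-1.tif\n"
    "slide-2,/data/wsi/slide-2.tif,\n"
)
slides = read_slide_csv(example)
for slide in slides:
    print(slide["sample_id"], slide["mask_path"])
```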

For a first run, start from hs2p/configs/default.yaml and edit only the essentials:

  • csv
  • output_dir
  • tiling.backend
  • tiling.params.target_spacing_um
  • tiling.params.target_tile_size_px

Run tiling:

python -m hs2p.tiling --config-file /path/to/config.yaml

Run sampling:

python -m hs2p.sampling --config-file /path/to/config.yaml

For sampling, add tiling.sampling_params.pixel_mapping and tiling.sampling_params.tissue_percentage for the annotations you want to keep.

More CLI details: docs/cli.md

Outputs

hs2p writes explicit named artifacts rather than anonymous coordinate dumps.

  • Tiling writes coordinates/{sample_id}.tiles.npz and coordinates/{sample_id}.tiles.meta.json
  • Sampling writes the same pair under coordinates/<annotation>/
  • Batch runs also write process_list.csv
  • Saved coordinate arrays use a deterministic column-major order: numeric x first, then numeric y within each shared x
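The ordering described in the last item can be reproduced with NumPy, which is useful when comparing coordinates from another tool against hs2p output. This is a sketch of the ordering only; the (N, 2) layout with x in the first column and y in the second is an assumption, not a statement about the saved array format.

```python
import numpy as np

# Unordered (x, y) tile origins.
coords = np.array([[448, 0], [0, 224], [224, 224], [0, 0], [224, 0]])

# np.lexsort treats the LAST key as the primary one, so passing (y, x)
# sorts by x first and by y within each shared x.
order = np.lexsort((coords[:, 1], coords[:, 0]))
sorted_coords = coords[order]
print(sorted_coords.tolist())
```

Sorting on integer values matters here: a string sort would place "1000" before "224".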

Artifact field reference: docs/artifacts.md

Docker


If you prefer running hs2p in a container, a published Docker image is available:

docker pull waticlems/hs2p:latest
docker run --rm -it -v /path/to/your/data:/data waticlems/hs2p:latest

