
Optimized slide tiling library for histopathology

Project description

hs2p

Requires Python 3.10+.

hs2p is a Python package for fast, scalable whole-slide tiling. You can request tiles at any spacing, whether or not that spacing is natively present in the image pyramid. It is designed for computational pathology workflows that need reproducible coordinates.
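To serve a spacing that the pyramid does not store, a tiler typically reads from a nearby finer level and resamples. The sketch below illustrates that idea and is not hs2p's implementation. The function name is hypothetical, and the relative-tolerance test (loosely inspired by the tolerance argument of TilingConfig in the API example) is an assumption.

```python
def pick_level(level_spacings_um, target_spacing_um, tolerance=0.07):
    """Return (level_index, resize_factor) for reading tiles at target_spacing_um.

    Illustrative assumption: a level "matches" if its spacing is within a relative
    `tolerance` of the target. Otherwise we read from the coarsest level that is
    still finer than the target and downsample by resize_factor (> 1 means shrink).
    """
    # 1. A pyramid level close enough to the target: read it as-is.
    for idx, spacing in enumerate(level_spacings_um):
        if abs(spacing - target_spacing_um) / target_spacing_um <= tolerance:
            return idx, 1.0
    # 2. No match: pick the coarsest level that is still finer than the target.
    finer = [(s, i) for i, s in enumerate(level_spacings_um) if s < target_spacing_um]
    if not finer:
        raise ValueError("target spacing is finer than the highest-resolution level")
    spacing, idx = max(finer)
    return idx, target_spacing_um / spacing


# Hypothetical pyramid with levels at 0.25, 1.0 and 4.0 um/px.
print(pick_level([0.25, 1.0, 4.0], 0.5))  # read level 0, then downsample by 2
```

Reading from a finer level and downsampling avoids upsampling artifacts, at the cost of reading more pixels.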

hs2p can be used in two ways:

  • a Python API for library-style integration
  • a CLI for batch preprocessing

Demo

Try hs2p interactively: hs2p-demo on HuggingFace Spaces
You can adjust tiling parameters (spacing, tile size, tissue threshold, overlap) and instantly see a tiling preview and tissue mask overlay.
You can also upload your own pyramidal WSI (up to 1 GB).

Installation

pip install hs2p

For GPU-accelerated tile reading via cuCIM:

pip install "hs2p[cucim]"

This extra installs cucim-cu12, cupy-cuda12x, and nvidia-nvimgcodec-cu12, which enable batched GPU JPEG decoding during tar export. These are CUDA 12 builds, so make sure they match your CUDA runtime (or install the cuCIM wheel built for your runtime). The base hs2p install does not require cuCIM.
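Because cuCIM is an optional extra, scripts that run on mixed machines may want to fall back when it is missing. A minimal sketch, assuming only that the cuCIM package imports as cucim and that "cucim" and "openslide" are valid tiling.backend values (both appear in this README); the helper name is hypothetical.

```python
import importlib.util


def choose_backend(preferred: str = "cucim", fallback: str = "openslide") -> str:
    """Hypothetical helper: use cuCIM when it is importable, else fall back."""
    # find_spec returns None when the top-level module cannot be found,
    # without importing it (so no GPU initialization happens here).
    if preferred == "cucim" and importlib.util.find_spec("cucim") is None:
        return fallback
    return preferred


print(choose_backend())  # "cucim" on machines with the extra, else "openslide"
```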

Workflows

Tiling

Tiling computes a reproducible grid of tile coordinates for each slide and saves them as named artifacts with extraction metadata, ready for downstream use.
When no precomputed tissue mask is provided, hs2p segments tissue on the fly. If you prefer to precompute tissue masks, a standalone segmentation script is available.

Figure: hs2p tiling workflow
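The core of grid tiling can be illustrated on a toy binary tissue mask: lay a regular grid whose stride comes from the tile size and overlap, then keep tiles whose tissue fraction reaches a threshold. This is a simplified sketch with a hypothetical function name. It works in a single pixel space, ignores the spacing and pyramid handling hs2p performs, and treating overlap as a fraction of the tile size is an assumption.

```python
import numpy as np


def grid_tiles(mask: np.ndarray, tile: int, overlap: float = 0.0,
               tissue_threshold: float = 0.1) -> np.ndarray:
    """Return (x, y) top-left corners of tiles whose tissue fraction >= threshold.

    mask: 2-D boolean array (rows = y, cols = x), True where tissue is present.
    overlap: assumed to be a fraction of the tile size in [0, 1).
    """
    stride = max(1, int(round(tile * (1.0 - overlap))))
    h, w = mask.shape
    kept = []
    # Only full tiles that fit entirely inside the mask are considered.
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            fraction = mask[y:y + tile, x:x + tile].mean()
            if fraction >= tissue_threshold:
                kept.append((x, y))
    return np.array(kept, dtype=np.int64).reshape(-1, 2)


toy_mask = np.zeros((8, 8), dtype=bool)
toy_mask[:, :4] = True  # tissue only in the left half
print(grid_tiles(toy_mask, tile=4).tolist())  # two tiles on the left edge
```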

Sampling

Sampling filters or partitions tile coordinates by annotation coverage so you can keep only tiles relevant to a tissue class or label.

Figure: hs2p sampling workflow
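Filtering by annotation coverage boils down to measuring, per tile, the fraction of pixels that carry a given label. A minimal sketch on a toy label mask: the pixel_mapping dict (annotation name to integer label) and the per-annotation minimum fractions only loosely mirror the tiling.sampling_params.pixel_mapping and tissue_percentage settings described in the CLI section; their exact format and semantics in hs2p are assumptions here, and the function name is hypothetical.

```python
import numpy as np


def sample_by_annotation(label_mask, coords, tile, pixel_mapping, min_fraction):
    """Group tile coordinates by annotation, keeping tiles with enough coverage.

    label_mask: 2-D integer array of per-pixel labels (rows = y, cols = x).
    coords: iterable of (x, y) tile corners in the same pixel space.
    pixel_mapping: {annotation_name: label_value}    (hypothetical format)
    min_fraction: {annotation_name: float in [0, 1]} (hypothetical format)
    """
    kept = {name: [] for name in pixel_mapping}
    for x, y in coords:
        patch = label_mask[y:y + tile, x:x + tile]
        for name, label in pixel_mapping.items():
            # Fraction of pixels in this tile that belong to the annotation.
            if (patch == label).mean() >= min_fraction[name]:
                kept[name].append((int(x), int(y)))
    return kept
```

A tile may land in several groups if it straddles annotations and meets each threshold.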

Python API

hs2p can use pre-extracted tissue masks. If you don't have tissue masks, you can either:

  • use our standalone tissue segmentation script (recommended), or
  • tune the SegmentationConfig parameters and let hs2p segment tissue on the fly

Minimal tiling example:

from pathlib import Path

from hs2p import (
    SlideSpec,
    TilingConfig,
    overlay_mask_on_slide,
    save_tiling_result,
    tile_slide,
    write_tiling_preview,
)

result = tile_slide(
    SlideSpec(
        sample_id="slide-1",
        image_path=Path("/data/wsi/slide-1.tif"),
        mask_path=Path("/data/mask/slide-1.tif"),
    ),
    tiling=TilingConfig(
        backend="openslide",
        target_spacing_um=0.5,
        target_tile_size_px=224,
        tolerance=0.07,
        overlap=0.0,
        tissue_threshold=0.1,
    ),
)

artifacts = save_tiling_result(result, output_dir=Path("output"))

print(artifacts.coordinates_npz_path)   # output/tiles/slide-1.coordinates.npz ; more info in docs/artifacts.md
print(artifacts.coordinates_meta_path)  # output/tiles/slide-1.coordinates.meta.json ; more info in docs/artifacts.md

tiling_preview_path = write_tiling_preview(
    result=result,
    output_dir=Path("output"),
    downsample=32,
)
print(tiling_preview_path)  # output/preview/tiling/slide-1.jpg ; low resolution preview of tiling result, good for QC

mask_overlay = overlay_mask_on_slide(
    wsi_path=result.image_path,
    annotation_mask_path=Path("/data/mask/slide-1.tif"),
    downsample=32,
    backend=result.backend,
)
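mask_overlay_path = Path("output/preview/mask/slide-1.jpg")
mask_overlay_path.parent.mkdir(parents=True, exist_ok=True)  # create the preview folder if needed
mask_overlay.save(mask_overlay_path)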

result is a TilingResult for one slide. It gives downstream pipelines the tile coordinates plus the metadata needed to relate those coordinates back to the slide pyramid and persist them as reusable named artifacts.
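Relating tile coordinates back to the pyramid is mostly a matter of scale factors. The arithmetic below is a hedged illustration, assuming coordinates are expressed in level-0 pixels, that preview downsampling is relative to level 0, and a hypothetical slide whose level-0 spacing is 0.25 µm/px; see docs/artifacts.md for how hs2p actually records this. Both function names are hypothetical.

```python
def level0_footprint(tile_size_px: int, target_spacing_um: float,
                     level0_spacing_um: float) -> int:
    """Side length, in level-0 pixels, covered by one tile at the target spacing."""
    # Physical side length is tile_size_px * target_spacing_um (micrometers);
    # dividing by the level-0 spacing converts it back to level-0 pixels.
    return round(tile_size_px * target_spacing_um / level0_spacing_um)


def preview_footprint(level0_px: int, downsample: int) -> float:
    """Side length of the same tile in a preview downsampled from level 0."""
    return level0_px / downsample


side = level0_footprint(224, target_spacing_um=0.5, level0_spacing_um=0.25)
print(side)                         # 448 level-0 pixels (= 112 µm)
print(preview_footprint(side, 32))  # 14.0 pixels in a downsample=32 preview
```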

More API details: docs/api.md

CLI

The CLI is intended for fast batch processing of many slides that share one config. Both CLI entrypoints (hs2p.tiling and hs2p.sampling) expect the same input CSV schema:

sample_id,image_path,mask_path
slide-1,/data/wsi/slide-1.tif,/data/mask/slide-1.tif
slide-2,/data/wsi/slide-2.tif,
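For larger cohorts, the input CSV can be generated rather than written by hand. A minimal sketch using only the standard library; the helper name and directory layout (masks share their slide's file name, .tif extension) are hypothetical, and a missing mask is written as an empty mask_path cell, as for slide-2 above.

```python
import csv
from pathlib import Path


def write_slide_csv(wsi_dir: Path, mask_dir: Path, out_csv: Path,
                    pattern: str = "*.tif") -> int:
    """Write a sample_id,image_path,mask_path CSV and return the number of rows."""
    rows = []
    for image_path in sorted(wsi_dir.glob(pattern)):
        # Hypothetical convention: the mask has the same file name as the slide.
        mask_path = mask_dir / image_path.name
        rows.append({
            "sample_id": image_path.stem,
            "image_path": str(image_path),
            "mask_path": str(mask_path) if mask_path.exists() else "",
        })
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sample_id", "image_path", "mask_path"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```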

For a first run, start from hs2p/configs/default.yaml and edit only the essentials:

  • csv
  • output_dir
  • tiling.backend
  • tiling.params.target_spacing_um
  • tiling.params.target_tile_size_px
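If you prefer to script these edits, the dotted names can be applied as overrides to the parsed default config. A hedged sketch: it assumes the dotted names map directly onto nested YAML keys, uses a hypothetical helper, and works on a plain dict (loading the YAML, for example with PyYAML's yaml.safe_load, is left out). The same approach works for the optional keys below.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with dotted-key overrides applied.

    "tiling.params.target_spacing_um" sets config["tiling"]["params"]["target_spacing_um"];
    intermediate dicts are created when missing. The input dict is not modified.
    """
    updated = deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return updated


# Placeholder values; the spacing and tile size match the Python API example.
cfg = apply_overrides(
    {"tiling": {"backend": "openslide", "params": {}}},
    {
        "csv": "/path/to/slides.csv",
        "output_dir": "output",
        "tiling.params.target_spacing_um": 0.5,
        "tiling.params.target_tile_size_px": 224,
    },
)
```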

Optional:

  • save_tiles
    • also writes tiles/{sample_id}.tiles.tar archives. With tiling.backend="cucim", tar extraction uses batched cuCIM reads; other backends coalesce dense 8x8 or 4x4 blocks of tiles into single region reads, then slice each region back into individual tiles
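The coalescing idea can be sketched with NumPy: when every tile of an aligned k x k block is present (k = 8 or 4 above), read the block as one region and slice it back into tiles. This sketch is illustrative only; the block alignment rule and the function names are assumptions, not hs2p internals, and the actual region read is left to whichever backend you use.

```python
import numpy as np


def coalesce_blocks(coords, tile, k):
    """Split (x, y) tile corners into full k x k blocks and leftover tiles.

    Assumes coords lie on a regular grid with step `tile` (no overlap) and that
    blocks are aligned to multiples of k * tile.
    """
    present = {(int(x), int(y)) for x, y in coords}
    span = k * tile
    blocks, leftovers = [], set(present)
    candidates = sorted({(x // span * span, y // span * span) for x, y in present})
    for bx, by in candidates:
        members = {(bx + i * tile, by + j * tile) for i in range(k) for j in range(k)}
        if members <= present:  # every tile of the block exists
            blocks.append((bx, by))
            leftovers -= members
    return blocks, sorted(leftovers)


def slice_block(region: np.ndarray, tile: int, k: int):
    """Cut a (k*tile, k*tile, C) region read in one call into k*k tiles.

    Tiles are returned x-major (column by column), matching column-major order.
    """
    return [region[j * tile:(j + 1) * tile, i * tile:(i + 1) * tile]
            for i in range(k) for j in range(k)]
```

Each full block then costs one read instead of k*k reads; leftover tiles are read individually.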

Run tiling:

python -m hs2p.tiling --config-file /path/to/config.yaml

Run sampling:

python -m hs2p.sampling --config-file /path/to/config.yaml

For sampling, add tiling.sampling_params.pixel_mapping and tiling.sampling_params.tissue_percentage for the annotations you want to keep.

Progress UX

When stdout is an interactive terminal, both CLI entrypoints show live rich progress with:

  • slide-level batch progress
  • elapsed and remaining time
  • live tile counts for tiling discovery or sampling retention
  • final summary panels with output and process_list.csv locations

When stdout is redirected or otherwise non-interactive, hs2p falls back to concise plain-text stage updates.
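Switching between a live display and plain-text updates usually hinges on whether stdout is a terminal. A minimal sketch of that check; both function names are hypothetical and this is not hs2p's code.

```python
import sys
from typing import Optional, TextIO


def progress_mode(stream: Optional[TextIO] = None) -> str:
    """Return "live" for an interactive terminal, "plain" otherwise."""
    stream = sys.stdout if stream is None else stream
    # isatty() is False when output is redirected to a file or piped elsewhere.
    return "live" if stream.isatty() else "plain"


def report(stage: str, done: int, total: int, stream: Optional[TextIO] = None) -> None:
    """Print one concise plain-text stage update (the non-interactive fallback)."""
    stream = sys.stdout if stream is None else stream
    print(f"[{stage}] {done}/{total} slides", file=stream)
```

A CLI would call progress_mode() once at startup and, in "plain" mode, call report after each stage.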

If a run fails, check output_dir/logs/log.txt for the full log stream.

More CLI details: docs/cli.md

Outputs

hs2p writes explicit named artifacts rather than anonymous coordinate dumps.

  • Tiling writes tiles/{sample_id}.coordinates.npz and tiles/{sample_id}.coordinates.meta.json
  • Sampling writes the same pair under tiles/<annotation>/
  • Batch runs also write process_list.csv
  • Saved coordinate arrays use a deterministic column-major order: sorted by x first, then by y among tiles that share the same x
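The same ordering can be reproduced with NumPy's lexsort, whose last key is the primary one. A sketch, assuming coordinates are held as an (N, 2) array of (x, y) rows; the function name is hypothetical.

```python
import numpy as np


def column_major_order(coords: np.ndarray) -> np.ndarray:
    """Sort (x, y) rows by x first, then by y among rows sharing the same x."""
    coords = np.asarray(coords)
    # np.lexsort sorts by the LAST key first, so pass (y, x) to make x primary.
    order = np.lexsort((coords[:, 1], coords[:, 0]))
    return coords[order]


print(column_major_order(np.array([[448, 0], [0, 448], [0, 0], [448, 448]])).tolist())
# [[0, 0], [0, 448], [448, 0], [448, 448]]
```

Because lexsort is stable, exact duplicates keep their input order, which keeps the result deterministic.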

Artifact field reference: docs/artifacts.md

Docker


If you prefer running hs2p in a container, a published Docker image is available:

docker pull waticlems/hs2p:latest
docker run --rm -it -v /path/to/your/data:/data waticlems/hs2p:latest

Download files


Source Distribution

hs2p-2.5.0.tar.gz (101.8 kB)

Built Distribution


hs2p-2.5.0-py3-none-any.whl (64.6 kB)

File details

Details for the file hs2p-2.5.0.tar.gz.

File metadata

  • Download URL: hs2p-2.5.0.tar.gz
  • Size: 101.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for hs2p-2.5.0.tar.gz
Algorithm Hash digest
SHA256 3b45ff92e899e13151145c9ecb7948aa70dac3f9f1742a27659d8277271f3db5
MD5 43836ed72269ad1b3129937741f9b6ef
BLAKE2b-256 b1c2bf710690e5d93b2d7717056377ffcec5f57dd2400ff6ee24bc18ac742156


File details

Details for the file hs2p-2.5.0-py3-none-any.whl.

File metadata

  • Download URL: hs2p-2.5.0-py3-none-any.whl
  • Size: 64.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for hs2p-2.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 79b8936b3e12b197b6348f26a687bd70e7c054b82d4c40628dd5fec0e464b873
MD5 ccf6eae8dbca25be1ab0091237470844
BLAKE2b-256 969fdf1bc246d59f4e6678f37bfc6b2a786bb4652b7b27284978502a673a9aa1

