
A histopathology toolkit for working with whole slide images and their annotations.

Histokit

Histokit is a Python package and command line tool for preprocessing histopathology whole slide images.

Install as a tool

Histokit can be installed as a command line tool directly from this repository using uv. Here is an example:

uv tool update-shell
uv tool install git+https://github.com/davemor/histokit

CLI Interface

 histokit --help

 Usage: histokit [OPTIONS] COMMAND [ARGS]...

 histokit  histopathology toolkit CLI.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show this message and exit.                    │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ list     List available built-in pipelines.                                  │
│ plan     Show pipeline stages and resolved parameters.                       │
│ run      Run a pipeline on a dataset and save the resulting PatchSet.        │
│ preview  Preview a pipeline on a single sample with diagnostic images.       │
│ export   Export patch images from a saved PatchSet to label-name             │
│          directories.                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯

histokit list

Show all built-in pipelines that ship with histokit. Each entry shows the import reference and stage count.

histokit list
Available pipelines:

  histokit.pipelines.presets.basic:pipeline  (4 stages)
  histokit.pipelines.presets.research:pipeline  (4 stages)

histokit plan

Inspect a pipeline's stages and see what parameters they use. Use --set to preview overrides without running anything.

# Show default parameters
histokit plan histokit.pipelines.presets.basic:pipeline

# Preview with overrides
histokit plan histokit.pipelines.presets.basic:pipeline --set patch_size=512 --set level=0

histokit run

Run a pipeline on a full dataset and save the combined PatchSet to disk.

# Basic run
histokit run histokit.pipelines.presets.basic:pipeline \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output runs/cervical_basic

# With parameter overrides
histokit run histokit.pipelines.presets.basic:pipeline \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output runs/cervical_512 \
  --set patch_size=512

# Overwrite a previous run
histokit run histokit.pipelines.presets.basic:pipeline \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output runs/cervical_basic \
  --overwrite

histokit preview

Run a pipeline on a single sample and save diagnostic images (thumbnail, patch overlay) to an output directory. Useful for checking pipeline settings before a full run.

# Preview the first sample in the dataset
histokit preview histokit.pipelines.presets.basic:pipeline \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output preview/cervical

# Preview a specific sample
histokit preview histokit.pipelines.presets.basic:pipeline \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output preview/cervical \
  --sample IC-CX-00001-01

histokit export

Export patch images from a saved PatchSet to label-name subdirectories, compatible with torchvision.datasets.ImageFolder.

histokit export runs/cervical_basic \
  --index data/icaird/cervical_mini/index.csv \
  --labels data/icaird/cervical_mini/labels.json \
  --output patches/cervical_basic

The output directory will contain one folder per label with individual patch PNG files and a provenance.json recording how the export was produced.
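
As a quick sanity check on an export, the one-folder-per-label layout described above can be summarised with the standard library alone. This is a minimal sketch, not part of histokit: the helper name `count_patches_per_label` is hypothetical, and it assumes only that each label folder directly contains `.png` files.

```python
from pathlib import Path


def count_patches_per_label(export_dir):
    """Return {label_name: number_of_png_files} for an export directory.

    Hypothetical helper: assumes one sub-folder per label, each holding
    patch PNG files. Files at the top level (such as provenance.json)
    are ignored.
    """
    root = Path(export_dir)
    counts = {}
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(1 for _ in label_dir.glob("*.png"))
    return counts
```

Calling `count_patches_per_label("patches/cervical_basic")` after the export above would give a rough view of class balance before training.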

Pipeline Stages

Histokit is based around configurable pipelines with the following stages:

  1. Patch Selection
  2. Foreground Identification
  3. Patch Labelling
  4. Patch Filtering
  5. Patch Rendering

Pipelines consume dataset objects that represent a set of slides and, optionally, their annotations. They output a patchset: an object that stores patch coordinates, their labels, and their provenance (the slide they came from and how they were processed). The patchset can then be used to generate patch images or as input to a downstream feature extraction model.

Patch Selection

This stage decides how the patches will be sampled from the whole slide images, based on:

  • geometry (level, patch size, stride)
  • strategy (grid, from annotation)
  • sampling (dense, sparse, multiscale)
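
To make the geometry parameters concrete, the sketch below shows how a dense grid strategy might turn a patch size and stride into patch coordinates. It is an illustration under stated assumptions rather than histokit's implementation: the function name `grid_patch_origins` is hypothetical, coordinates are in pixels at the chosen level, and patches that would overhang the slide edge are simply skipped.

```python
def grid_patch_origins(width, height, patch_size, stride):
    """Return (x, y) top-left corners of patches that fit fully inside the level.

    Hypothetical helper: width and height are the level dimensions in pixels.
    A stride equal to patch_size gives non-overlapping patches; a smaller
    stride gives overlapping ones.
    """
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    # Row-major order: all patches in the first row, then the second, and so on.
    return [(x, y) for y in ys for x in xs]


origins = grid_patch_origins(width=1024, height=512, patch_size=256, stride=256)
print(len(origins))  # 8 patches: 4 columns by 2 rows
```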

Foreground Identification

This stage classifies patches as foreground or background. This usually means tissue detection, but it might also mean detecting blood and mucus.

  • methods (fixed threshold, otsu)
  • resolution (thumbnail vs patch-level)
  • threshold (minimum tissue fraction)
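
The three options above fit together roughly as follows: a method picks an intensity threshold, and a patch counts as foreground when enough of its pixels fall on the tissue side of it. The pure-Python sketch below is an assumption-laden illustration, not histokit's code. It assumes 8-bit greyscale pixel values in which tissue is darker than the bright glass background, and the names `otsu_threshold`, `tissue_fraction`, and `is_foreground` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return a 0-255 threshold that maximises between-class variance (Otsu)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def tissue_fraction(pixels, threshold):
    """Fraction of pixels at or below the threshold (assumed to be tissue)."""
    return sum(1 for p in pixels if p <= threshold) / len(pixels)


def is_foreground(pixels, threshold, min_tissue_fraction=0.5):
    """True if the patch holds at least the minimum fraction of tissue pixels."""
    return tissue_fraction(pixels, threshold) >= min_tissue_fraction
```

Computing the threshold once on a slide thumbnail and reusing it for every patch corresponds to the thumbnail resolution option; computing it per patch corresponds to the patch-level option.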

Patch Labelling

This stage assigns labels to patches based on the annotations. The annotation polygons are rendered and converted to patch labels. A default label may be applied to any patch that no annotation covers.
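
One simple way to turn polygons into patch labels is to test whether each patch centre falls inside an annotation. Note that this swaps in a centre-point test for the polygon rendering described above, purely to keep the example short. The function names, the `"unlabelled"` default, and the `"lesion"` label are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_patches(origins, patch_size, annotations, default_label="unlabelled"):
    """Label each patch by the annotation containing its centre.

    annotations is a list of (label, polygon) pairs; when polygons overlap,
    the later entry wins. Patches outside every polygon get default_label.
    """
    labels = []
    for x, y in origins:
        cx, cy = x + patch_size / 2, y + patch_size / 2
        label = default_label
        for name, polygon in annotations:
            if point_in_polygon(cx, cy, polygon):
                label = name
        labels.append(label)
    return labels


annotations = [("lesion", [(0, 0), (300, 0), (300, 300), (0, 300)])]
labels = label_patches([(0, 0), (256, 0), (512, 0)], 256, annotations)
print(labels)  # ['lesion', 'unlabelled', 'unlabelled']
```

A rendered mask gives finer control, for example labelling by the fraction of the patch covered, at the cost of more memory.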

Patch Filtering

This stage applies filtering rules such as:

  • should the background be dropped?
  • should only labelled patches be used?
  • should unreliable or corrupted patches be removed based on quality?
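
The rules above can be read as a chain of independent predicates applied to each patch. The following sketch assumes a hypothetical `Patch` record with a foreground flag, an optional label, and a quality score in [0, 1]. None of these names come from histokit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patch:
    """Hypothetical patch record used only for this illustration."""
    x: int
    y: int
    is_foreground: bool
    label: Optional[str] = None  # None means no annotation covered the patch
    quality: float = 1.0         # assumed score in [0, 1], higher is better


def filter_patches(patches, drop_background=True, labelled_only=False, min_quality=0.0):
    """Keep only the patches that pass every enabled rule."""
    kept = []
    for p in patches:
        if drop_background and not p.is_foreground:
            continue
        if labelled_only and p.label is None:
            continue
        if p.quality < min_quality:
            continue
        kept.append(p)
    return kept
```

Keeping each rule behind its own flag means a pipeline can enable them independently.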

Patch Rendering

The patchset can then be used to export the patch images. The patches can be normalised using stain normalisation.

Data Model

Datasets - represent a set of slides, their annotations, and metadata. Datasets provide the following information:

  • a list of the slides
  • a list of slide annotations (optional)
  • the format to use to load the slides
  • metadata about each slide (multiple slide-level labels)

Patchset - an index of the patches on a slide, their labels, provenance, and other metadata.

The patchset is constructed incrementally across the pipeline stages.
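
To illustrate the idea of a patchset that accumulates information stage by stage, here is a minimal sketch using an immutable dataclass. The class layout, field names, and `with_stage` method are assumptions made for illustration and are not histokit's actual `PatchSet` API. The label values are hypothetical too.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PatchSet:
    """Illustrative stand-in for a patchset; not histokit's real class."""
    slide_id: str
    coords: tuple = ()      # ((x, y), ...) top-left corners of the patches
    labels: tuple = ()      # one label per coordinate, set by the labelling stage
    provenance: tuple = ()  # ordered names of the stages applied so far

    def with_stage(self, stage_name, **changes):
        """Return a copy with the given fields changed and the stage recorded."""
        return replace(self, provenance=self.provenance + (stage_name,), **changes)


ps = PatchSet(slide_id="IC-CX-00001-01")
ps = ps.with_stage("patch_selection", coords=((0, 0), (256, 0)))
ps = ps.with_stage("patch_labelling", labels=("lesion", "unlabelled"))
print(ps.provenance)  # ('patch_selection', 'patch_labelling')
```

Returning a new object from each stage, rather than mutating in place, keeps intermediate results available for inspection.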
