A histopathology toolkit for working with whole slide images and their annotations.
Histokit
Histokit is a Python package and command line tool for preprocessing histopathology whole slide images.
Install as a tool
Histokit can be installed as a command line tool directly from this repository using uv. Here is an example:
uv tool update-shell
uv tool install git+https://github.com/davemor/histokit
Command Line Interface
❯ histokit --help
Usage: histokit [OPTIONS] COMMAND [ARGS]...
histokit — histopathology toolkit CLI.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy │
│ it or customize the installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ list List available built-in pipelines. │
│ plan Show pipeline stages and resolved parameters. │
│ run Run a pipeline on a dataset and save the resulting PatchSet. │
│ preview Preview a pipeline on a single sample with diagnostic images. │
│ export Export patch images from a saved PatchSet to label-name │
│ directories. │
╰──────────────────────────────────────────────────────────────────────────────╯
histokit list
Show all built-in pipelines that ship with histokit. Each entry shows the import reference and stage count.
histokit list
Available pipelines:
histokit.pipelines.presets.basic:pipeline (4 stages)
histokit.pipelines.presets.research:pipeline (4 stages)
histokit plan
Inspect a pipeline's stages and see what parameters they use. Use --set to preview overrides without running anything.
# Show default parameters
histokit plan histokit.pipelines.presets.basic:pipeline
# Preview with overrides
histokit plan histokit.pipelines.presets.basic:pipeline --set patch_size=512 --set level=0
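To make the effect of --set concrete, here is a minimal sketch of how key=value overrides could be merged over a pipeline's defaults. It is not histokit's implementation: `parse_overrides` and the default values are hypothetical and only illustrate the idea of "resolved parameters".

```python
from typing import Any


def parse_overrides(pairs: list[str]) -> dict[str, Any]:
    """Turn ["patch_size=512", "level=0"] into {"patch_size": 512, "level": 0}."""
    overrides: dict[str, Any] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        # Try int, then float; keep the raw string if neither parses.
        for cast in (int, float):
            try:
                overrides[key] = cast(raw)
                break
            except ValueError:
                continue
        else:
            overrides[key] = raw
    return overrides


# Hypothetical defaults; the real preset values are not listed in this README.
defaults = {"patch_size": 256, "level": 1, "stride": 256}

# Overrides win over defaults; untouched keys keep their default value.
resolved = {**defaults, **parse_overrides(["patch_size=512", "level=0"])}
print(resolved)
```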
histokit run
Run a pipeline on a full dataset and save the combined PatchSet to disk.
# Basic run
histokit run histokit.pipelines.presets.basic:pipeline \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output runs/cervical_basic
# With parameter overrides
histokit run histokit.pipelines.presets.basic:pipeline \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output runs/cervical_512 \
--set patch_size=512
# Overwrite a previous run
histokit run histokit.pipelines.presets.basic:pipeline \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output runs/cervical_basic \
--overwrite
histokit preview
Run a pipeline on a single sample and save diagnostic images (thumbnail, patch overlay) to an output directory. Useful for checking pipeline settings before a full run.
# Preview the first sample in the dataset
histokit preview histokit.pipelines.presets.basic:pipeline \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output preview/cervical
# Preview a specific sample
histokit preview histokit.pipelines.presets.basic:pipeline \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output preview/cervical \
--sample IC-CX-00001-01
histokit export
Export patch images from a saved PatchSet to label-name subdirectories, compatible with torchvision.datasets.ImageFolder.
histokit export runs/cervical_basic \
--index data/icaird/cervical_mini/index.csv \
--labels data/icaird/cervical_mini/labels.json \
--output patches/cervical_basic
The output directory will contain one folder per label with individual patch PNG files and a provenance.json recording how the export was produced.
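As a quick sanity check on an export, the layout described above can be inspected with the standard library alone. This is a sketch that assumes only what is stated here (one subdirectory per label containing PNG files); the helper name `count_patches_per_label` is made up for illustration.

```python
from pathlib import Path


def count_patches_per_label(export_dir: str) -> dict[str, int]:
    """Count the PNG files in each label subdirectory of an export."""
    root = Path(export_dir)
    return {
        label_dir.name: sum(1 for _ in label_dir.glob("*.png"))
        for label_dir in sorted(root.iterdir())
        if label_dir.is_dir()  # skips provenance.json and other top-level files
    }
```

Calling `count_patches_per_label("patches/cervical_basic")` after the export above would return a mapping from label name to patch count, which is a cheap way to spot empty or badly imbalanced classes before training.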
Pipeline Stages
Histokit is based around configurable pipelines with the following stages:
- Patch Selection
- Foreground Identification
- Patch Labelling
- Patch Filtering
- Patch Rendering
Pipelines consume dataset objects that represent a set of slides and, optionally, their annotations. They output a patchset: an object that stores patch coordinates, their labels, and their provenance (the slide each patch came from and how it has been processed). The patchset can then be used to generate patch images or as input to a downstream feature extraction model.
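To give a feel for what a patchset holds, here is a small sketch using dataclasses. It is not histokit's PatchSet class: the names `PatchRecord`, `PatchSetSketch`, and `history`, and the label "normal", are assumptions chosen to mirror the description above (coordinates, labels, provenance).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PatchRecord:
    """One patch: where it is, what it is labelled, and which slide it came from."""
    slide_id: str                # slide the patch was taken from
    x: int                       # top-left corner, in pixels
    y: int
    level: int                   # pyramid level the patch is read at
    size: int                    # edge length in pixels
    label: Optional[str] = None  # None until a labelling stage assigns one


@dataclass
class PatchSetSketch:
    """A list of patches plus a log of the processing applied to them."""
    patches: list[PatchRecord] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def label_counts(self) -> dict[str, int]:
        """Count patches per label, grouping unlabelled patches together."""
        counts: dict[str, int] = {}
        for patch in self.patches:
            key = patch.label or "unlabelled"
            counts[key] = counts.get(key, 0) + 1
        return counts


patchset = PatchSetSketch(
    patches=[
        PatchRecord("IC-CX-00001-01", 0, 0, level=0, size=256, label="normal"),
        PatchRecord("IC-CX-00001-01", 256, 0, level=0, size=256),
    ],
    history=["grid selection", "labelling"],
)
print(patchset.label_counts())
```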
Patch Selection
This stage decides how the patches will be sampled from the whole slide images, based on:
- geometry (level, patch size, stride)
- strategy (grid, from annotation)
- sampling (dense, sparse, multiscale)
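The grid strategy can be sketched in a few lines. This example assumes a simple rule (keep only patches that fit entirely inside the slide) and is not taken from histokit's code; `grid_coordinates` is a hypothetical helper.

```python
from typing import Iterator


def grid_coordinates(
    width: int, height: int, patch_size: int, stride: int
) -> Iterator[tuple[int, int]]:
    """Yield top-left (x, y) corners of patches that fit fully inside the region."""
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            yield x, y


# A 1024 x 768 region with non-overlapping 256-pixel patches.
coords = list(grid_coordinates(width=1024, height=768, patch_size=256, stride=256))
print(len(coords))  # 4 columns x 3 rows = 12
```

Choosing a stride smaller than the patch size gives overlapping (denser) sampling, while a larger stride gives sparser coverage.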
Foreground Identification
This stage classifies patches as foreground or background. Foreground usually means tissue, but the stage may also need to detect blood and mucus.
- methods (fixed threshold, otsu)
- resolution (thumbnail vs patch-level)
- threshold (minimum tissue fraction)
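The fixed-threshold method combined with a minimum tissue fraction can be illustrated as follows. This is a sketch under stated assumptions: patches are greyscale with values 0 to 255, background is bright (as on a brightfield slide), and the function names and default values are hypothetical rather than histokit's.

```python
def tissue_fraction(gray_patch: list[list[int]], threshold: int = 200) -> float:
    """Fraction of pixels darker than `threshold` in a 2D grid of 0-255 grey values."""
    pixels = [value for row in gray_patch for value in row]
    if not pixels:
        return 0.0
    return sum(value < threshold for value in pixels) / len(pixels)


def is_foreground(
    gray_patch: list[list[int]],
    threshold: int = 200,
    min_tissue_fraction: float = 0.5,
) -> bool:
    """A patch counts as foreground if enough of its pixels look like tissue."""
    return tissue_fraction(gray_patch, threshold) >= min_tissue_fraction


# Left half is bright background, right half is darker tissue.
patch = [
    [250, 250, 120, 90],
    [250, 240, 110, 100],
]
print(tissue_fraction(patch))  # 0.5
print(is_foreground(patch))    # True
```

An Otsu-based variant would replace the fixed `threshold` with one estimated from the image histogram; the tissue-fraction rule stays the same.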
Patch Labelling
This stage assigns labels to patches based on the annotations. This involves rendering the annotation polygons and converting them to patch labels. A default label may be applied to any patch that no annotation covers.
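The following sketch shows one simple way to turn polygons into patch labels. Note that it swaps in a different technique from the rendering approach described above: it tests whether the patch centre falls inside each polygon (ray casting). The function names, the "tumour" label, and the "background" default are all illustrative assumptions.

```python
Point = tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: list[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_patch(
    x: int,
    y: int,
    size: int,
    annotations: list[tuple[str, list[Point]]],
    default_label: str = "background",
) -> str:
    """Return the label of the first polygon containing the patch centre."""
    cx, cy = x + size / 2, y + size / 2
    for label, polygon in annotations:
        if point_in_polygon(cx, cy, polygon):
            return label
    return default_label


annotations = [("tumour", [(0, 0), (512, 0), (512, 512), (0, 512)])]
print(label_patch(0, 0, 256, annotations))    # tumour
print(label_patch(768, 0, 256, annotations))  # background
```

A centre-point test is cheap but coarse. Rendering the polygons to a mask and measuring the covered fraction of each patch handles partially annotated patches more faithfully.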
Patch Filtering
This stage applies filtering rules such as:
- should the background be dropped?
- should only labelled patches be used?
- remove unreliable or corrupted patches based on quality
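The rules above can be expressed as a single predicate over patch records. This is a sketch only: the dictionary keys (`foreground`, `label`, `quality`) and the option names are hypothetical and do not reflect histokit's actual parameters.

```python
from typing import Any


def keep_patch(
    patch: dict[str, Any],
    drop_background: bool = True,
    labelled_only: bool = False,
    min_quality: float = 0.0,
) -> bool:
    """Return True if the patch passes every enabled filtering rule."""
    if drop_background and not patch["foreground"]:
        return False
    if labelled_only and patch["label"] is None:
        return False
    return patch["quality"] >= min_quality


patches = [
    {"id": 0, "foreground": True, "label": "normal", "quality": 0.9},
    {"id": 1, "foreground": False, "label": None, "quality": 0.8},
    {"id": 2, "foreground": True, "label": None, "quality": 0.7},
    {"id": 3, "foreground": True, "label": "normal", "quality": 0.2},
]

kept = [p["id"] for p in patches if keep_patch(p, labelled_only=True, min_quality=0.5)]
print(kept)  # [0]
```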
Patch Rendering
The patchset can then be used to export the patch images. Stain normalisation can be applied to the patches as they are rendered.
Data Model
Datasets - represent a set of slides, their annotations, and metadata. Datasets provide the following information:
- a list of the slides
- a list of slide annotations (optional)
- the format to use to load the slides
- metadata about each slide (multiple slide-level labels)
Patchset - an index of the patches on a slide, their labels, provenance, and other metadata.
The patchset is constructed throughout the different stages, i.e. the different stages add the