WSI Toolbox

Note: This package is currently unstable. The API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git

Quick Start

As a Python Library

import wsi_toolbox as wt

wt.set_default_model_preset('uni')
cmd = wt.Wsi2HDF5Command(patch_size=256)
result = cmd('input.ndpi', 'output.h5')

See README_API.md for API documentation.

As a CLI Tool

After pip install wsi-toolbox, the CLI is available as wsi-toolbox or wt. For development, use uv run wt.

# Extract tile patches from WSI into HDF5
wt wsi2h5 -i input.ndpi -o output.h5

# Extract features using foundation model
wt extract -i output.h5

# Run Leiden clustering on embeddings
wt cluster -i output.h5

# Compute UMAP projection
wt umap -i output.h5

# Compute PCA projection
wt pca -i output.h5

# Generate cluster overlay preview image
wt preview -i output.h5

# Generate PCA score heatmap preview
wt preview-score -i output.h5 -n pca1

# Show HDF5 file structure
wt show -i output.h5

# Export WSI to Deep Zoom Image format
wt dzi -i input.ndpi -o ./output

# Generate thumbnail from WSI
wt thumb -i input.ndpi

Each subcommand has detailed help: wt <subcommand> --help
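
The subcommands above can also be chained from a script. The following is a minimal sketch that only assembles and runs the commands listed above with the -i and -o flags shown there. The helper names pipeline_commands and run_pipeline are hypothetical and not part of wsi-toolbox, and the sketch assumes that wt is on PATH and that each step depends on the output of the previous one.

```python
import subprocess


def pipeline_commands(wsi_path, h5_path):
    """Return the documented wt invocations, in pipeline order, as argv lists."""
    return [
        ["wt", "wsi2h5", "-i", wsi_path, "-o", h5_path],  # tile the slide into HDF5
        ["wt", "extract", "-i", h5_path],                 # foundation-model features
        ["wt", "cluster", "-i", h5_path],                 # Leiden clustering
        ["wt", "umap", "-i", h5_path],                    # UMAP projection
        ["wt", "preview", "-i", h5_path],                 # cluster overlay preview
    ]


def run_pipeline(wsi_path, h5_path):
    """Run each step in order; check=True stops at the first failing command."""
    for argv in pipeline_commands(wsi_path, h5_path):
        subprocess.run(argv, check=True)
```

Calling run_pipeline('input.ndpi', 'output.h5') would run the five steps in sequence. Passing argv lists instead of a shell string avoids quoting problems with slide paths that contain spaces.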

Streamlit Web Application

uv run task app

HDF5 File Structure

WSI-toolbox stores all data in a single HDF5 file.

Core Data

patches                    # Patch images: [N, H, W, 3]
coordinates                # Patch pixel coordinates: [N, 2]
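
As an illustration of this layout, the sketch below reads one patch together with its coordinate using h5py. It first writes a tiny stand-in file so that it runs without a real slide. Only the dataset names patches and coordinates come from the layout above; the uint8 dtype, the array sizes, and the coordinate values are assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file with the documented dataset names.
# Assumption: patches are uint8 RGB; sizes and values here are made up.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("patches", data=np.zeros((4, 256, 256, 3), dtype=np.uint8))
    f.create_dataset(
        "coordinates",
        data=np.array([[0, 0], [256, 0], [0, 256], [256, 256]]),
    )

# Read a single patch and its pixel coordinate without loading the whole array.
with h5py.File(path, "r") as f:
    n_patches = f["patches"].shape[0]
    patch = f["patches"][2]       # shape (H, W, 3)
    coord = f["coordinates"][2]   # shape (2,)

print(n_patches, patch.shape, coord.tolist())
```

Indexing the dataset object (rather than calling f["patches"][...]) reads only the requested patch from disk, which matters because the patches dataset of a real slide can be large.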

Metadata

Metadata is stored in file attrs (recommended):

import h5py

with h5py.File('output.h5', 'r') as f:
    mpp = f.attrs['mpp']
    patch_size = f.attrs['patch_size']
    patch_count = f.attrs['patch_count']
    # ...

Available attrs: original_mpp, original_width, original_height, image_level, mpp, scale, patch_size, patch_count, cols, rows

Legacy metadata/* datasets are kept for backward compatibility but attrs are preferred.
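
The attrs can be checked and combined with plain Python. The sketch below is not part of wsi-toolbox: missing_attrs and patch_extent_um are hypothetical helpers, and the second one assumes that mpp means microns per pixel at the resolution the patches were extracted at.

```python
# Attribute names as listed above.
DOCUMENTED_ATTRS = (
    "original_mpp", "original_width", "original_height", "image_level",
    "mpp", "scale", "patch_size", "patch_count", "cols", "rows",
)


def missing_attrs(attrs):
    """Return the documented attribute names that are absent from attrs."""
    return [name for name in DOCUMENTED_ATTRS if name not in attrs]


def patch_extent_um(attrs):
    """Edge length of one patch in microns (assumes mpp is microns per pixel)."""
    return float(attrs["patch_size"]) * float(attrs["mpp"])
```

Both helpers accept any mapping, so they work with a plain dict as well as with f.attrs from an open h5py file. For example, a patch_size of 256 at an mpp of 0.5 gives a patch edge of 128 microns.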

Model Features

{model}/features           # Patch features: [N, D]
                           #   uni: [N, 1024]
                           #   gigapath: [N, 1536]
                           #   virchow2: [N, 2560]
{model}/latent_features    # Latent features (optional): [N, L, D]
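
The feature widths listed above can serve as a sanity check after extraction. The following is a minimal sketch; check_feature_shape is a hypothetical helper, not a wsi-toolbox function, and it only compares a shape against the dimensions documented above.

```python
# Feature widths as listed in the layout above.
FEATURE_DIMS = {"uni": 1024, "gigapath": 1536, "virchow2": 2560}


def check_feature_shape(model, shape, patch_count):
    """Raise ValueError unless shape equals (patch_count, D) for the given model."""
    expected = (patch_count, FEATURE_DIMS[model])
    if tuple(shape) != expected:
        raise ValueError(
            f"{model}/features has shape {tuple(shape)}, expected {expected}"
        )
    return True
```

With an open file f, a call such as check_feature_shape('uni', f['uni/features'].shape, f.attrs['patch_count']) would confirm that one feature vector exists per patch.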

Clustering & Analysis (Hierarchical)

Results are stored in a hierarchical namespace structure:

{model}/{namespace}/clusters     # Cluster labels: [N]
{model}/{namespace}/umap         # UMAP coordinates: [N, 2]
{model}/{namespace}/pca1         # PCA scores: [N] or [N, k]

Namespace:

  • Single file: default
  • Multiple files: file1+file2+... (auto-generated from filenames)

Filter hierarchy: Sub-clustering creates nested paths:

# Base clustering
uni/default/clusters

# Sub-cluster patches in clusters 1, 2, 3
uni/default/filter/1+2+3/clusters

# Further sub-cluster within that
uni/default/filter/1+2+3/filter/0+1/clusters

Each level stores its own clusters, umap, and pca results independently.
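
The path scheme above can be reproduced with a small helper when reading results back by hand. This is a sketch of the naming convention only: result_path is a hypothetical function, and whether wsi-toolbox sorts or deduplicates the cluster ids inside a filter segment is not stated here, so the ids are joined in the order given.

```python
def result_path(model, name, namespace="default", filters=()):
    """Build the dataset path for a result under the hierarchical layout.

    filters is a sequence of cluster-id lists, one per sub-clustering level.
    """
    parts = [model, namespace]
    for cluster_ids in filters:
        parts += ["filter", "+".join(str(c) for c in cluster_ids)]
    parts.append(name)
    return "/".join(parts)


print(result_path("uni", "clusters"))
# uni/default/clusters
print(result_path("uni", "clusters", filters=[[1, 2, 3]]))
# uni/default/filter/1+2+3/clusters
print(result_path("uni", "umap", filters=[[1, 2, 3], [0, 1]]))
# uni/default/filter/1+2+3/filter/0+1/umap
```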

Dataset Writing Status

Large datasets (patches, features, latent_features) carry a writing attribute that is True while the dataset is being written and False once it is complete. Incomplete datasets are deleted automatically if an error occurs.

import h5py

with h5py.File('output.h5', 'r') as f:
    ds = f['patches']  # or f['uni/features']
    if ds.attrs.get('writing', False):
        raise RuntimeError('Dataset is incomplete')
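
The same check can be applied to every dataset in a file at once. The sketch below walks the file with h5py's visititems and collects datasets whose writing attribute is set. incomplete_datasets is a hypothetical helper, and the file it inspects is a synthetic stand-in created for the example, not real wsi-toolbox output.

```python
import os
import tempfile

import h5py
import numpy as np


def incomplete_datasets(h5_path):
    """Return the paths of all datasets whose 'writing' attribute is truthy."""
    found = []

    def visit(name, obj):
        # Returning None tells visititems to keep walking the file.
        if isinstance(obj, h5py.Dataset) and obj.attrs.get("writing", False):
            found.append(name)

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return found


# Synthetic file: one finished dataset, one still flagged as being written.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    done = f.create_dataset("patches", data=np.zeros((2, 8, 8, 3), dtype=np.uint8))
    done.attrs["writing"] = False
    partial = f.create_dataset("uni/features", data=np.zeros((2, 1024), dtype=np.float32))
    partial.attrs["writing"] = True

print(incomplete_datasets(path))
```

Datasets without a writing attribute are treated as complete, matching the default of False used in the snippet above.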

Features

  • WSI processing (.ndpi, .svs, .tiff → HDF5)
  • Feature extraction (UNI, Gigapath, Virchow2)
  • Leiden clustering with UMAP visualization
  • Preview generation (cluster overlays, PCA heatmaps)
  • Type-safe command pattern with Pydantic results
  • CLI, Python API, and Streamlit GUI

Development

# Clone and install
git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

# Run CLI
uv run wt --help

# Run Streamlit app
uv run task app

Optional: Gigapath support

uv sync --group gigapath

License

MIT
