Skip to main content

A comprehensive toolkit for Whole Slide Image processing, feature extraction, and clustering analysis

Project description

WSI Toolbox

Note: This package is currently unstable. API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git

Supported Models

The following foundation models are available:

Model Dim HuggingFace
uni (default) 1024 MahmoodLab/UNI
gigapath 1536 prov-gigapath/prov-gigapath
virchow2 2560 paige-ai/Virchow2

Setup: These models require HuggingFace authentication. Accept the license on each model page, then:

huggingface-cli login

Quick Start

# 1. Extract features from WSI
wt extract -i sample.ndpi -o sample.h5

# 2. Run clustering
wt cluster -i sample.h5

# 3. Generate preview image (requires sample.ndpi in same directory)
wt preview -i sample.h5
import wsi_toolbox as wt

wt.set_default_model_preset('uni')
wt.set_default_device('cuda')

# 1. Extract
cmd = wt.FeatureExtractionCommand(batch_size=256)
cmd('sample.h5', wsi_path='sample.ndpi')

# 2. Cluster
cluster_cmd = wt.ClusteringCommand(resolution=1.0)
cluster_cmd(['sample.h5'])

# 3. Preview
preview_cmd = wt.PreviewClustersCommand()
img = preview_cmd('sample.h5')
img.save('sample_preview.jpg')

Important: preview / preview-score commands require the original WSI file with the same stem in the same directory (e.g., sample.h5 needs sample.ndpi).

Commands

CLI is available as wsi-toolbox or wt. Each command has --help.


extract

Extract patch embeddings from WSI using foundation models.

CLI Python
wt extract -i sample.ndpi -o sample.h5 FeatureExtractionCommand()(h5_path, wsi_path=...)
wt extract -i sample.ndpi -o sample.h5
wt extract -i sample.ndpi -M gigapath      # Use Gigapath model
wt extract -i sample.ndpi -M virchow2      # Use Virchow2 model
wt extract -i sample.ndpi -L               # Include latent features
cmd = wt.FeatureExtractionCommand(batch_size=256, with_latent=True)
result = cmd('sample.h5', wsi_path='sample.ndpi')
# result.feature_dim, result.patch_count

cluster

Run Leiden clustering on embeddings.

CLI Python
wt cluster -i sample.h5 ClusteringCommand()(['sample.h5'])
wt cluster -i sample.h5
wt cluster -i sample.h5 --resolution 0.5   # Fewer clusters
cmd = wt.ClusteringCommand(resolution=1.0)
result = cmd(['sample.h5'])
# result.cluster_count, result.target_path

See Advanced Usage for multi-file clustering and sub-clustering.


preview

Generate cluster overlay image. Requires WSI with same stem.

CLI Python
wt preview -i sample.h5 PreviewClustersCommand()('sample.h5')
wt preview -i sample.h5
wt preview -i sample.h5 -f 1 2 3           # Filter to clusters 1,2,3
wt preview -i sample.h5 --size 32          # Smaller thumbnails
cmd = wt.PreviewClustersCommand(size=64)
img = cmd('sample.h5', namespace='default')
img.save('preview.jpg')

umap

Compute UMAP projection.

CLI Python
wt umap -i sample.h5 UmapCommand()(['sample.h5'])
wt umap -i sample.h5
wt umap -i sample.h5 --show                # Display plot
wt umap -i sample.h5 --save                # Save plot
cmd = wt.UmapCommand(n_neighbors=15, min_dist=0.1)
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/umap'

pca

Compute PCA projection.

CLI Python
wt pca -i sample.h5 PCACommand()(['sample.h5'])
wt pca -i sample.h5
wt pca -i sample.h5 -n 2                   # 2 components
wt pca -i sample.h5 --show                 # Display plot
cmd = wt.PCACommand(n_components=1, scaler='minmax')
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/pca1'

preview-score

Generate score heatmap overlay. Requires WSI with same stem.

CLI Python
wt preview-score -i sample.h5 -n pca1 PreviewScoresCommand()('sample.h5', score_name='pca1')
wt preview-score -i sample.h5 -n pca1
wt preview-score -i sample.h5 -n pca1 --cmap viridis
wt preview-score -i sample.h5 -n pca1 --invert
cmd = wt.PreviewScoresCommand(size=64)
img = cmd('sample.h5', score_name='pca1', cmap_name='jet')
img.save('pca_heatmap.jpg')

show

Display HDF5 file structure.

CLI Python
wt show -i sample.h5 ShowCommand()('sample.h5')
wt show -i sample.h5
wt show -i sample.h5 -v                    # Verbose

thumb

Generate thumbnail from WSI.

CLI Python
wt thumb -i sample.ndpi wsi.generate_thumbnail()
wt thumb -i sample.ndpi
wt thumb -i sample.ndpi -w 1024            # Specify width

dzi

Export WSI to Deep Zoom Image format (for OpenSeadragon).

CLI Python
wt dzi -i sample.ndpi -o ./out DziCommand()(wsi_path, output_dir, name)
wt dzi -i sample.ndpi -o ./output
wt dzi -i sample.ndpi -o ./output -t 512   # Tile size

cache (optional)

Pre-cache patch images for repeated access:

wt cache -i sample.ndpi -o sample.h5
wt extract -i sample.h5   # Uses cache
wt preview -i sample.h5   # Uses cache

Structure:

cache/{patch_size}/
├── patches       # [N, H, W, 3] images
└── coordinates   # [N, 2] coords

HDF5 File Structure

All data is stored in a single HDF5 file. Use wt show -i sample.h5 to inspect.

Root Attributes (Metadata)

with h5py.File('sample.h5', 'r') as f:
    # WSI metadata
    f.attrs['original_mpp']      # Original microns per pixel
    f.attrs['original_width']    # Original width (px)
    f.attrs['original_height']   # Original height (px)

    # Patch grid info
    f.attrs['mpp']               # Actual mpp used
    f.attrs['patch_size']        # Patch size (e.g., 256)
    f.attrs['patch_count']       # Total patches
    f.attrs['cols']              # Grid columns
    f.attrs['rows']              # Grid rows

Model Features

Features are stored under {model}/. Supported models: uni, gigapath, virchow2.

{model}/
├── features        # [N, D] patch embeddings
│                   #   uni: D=1024
│                   #   gigapath: D=1536
│                   #   virchow2: D=2560
├── coordinates     # [N, 2] patch coordinates (x, y pixels)
└── latent_features # [N, L, D] optional (with -L flag)
with h5py.File('sample.h5', 'r') as f:
    features = f['uni/features'][:]         # (N, 1024)
    coords = f['uni/coordinates'][:]        # (N, 2)

Analysis Results (Hierarchical)

Results are stored under {model}/{namespace}/.

{model}/{namespace}/
├── clusters     # [N] cluster labels (int)
├── umap         # [N, 2] UMAP coordinates
└── pca1         # [N] PCA scores

Namespace:

  • Single file: default
  • Multi-file: file1+file2+... (auto-generated)

Sub-clustering (filter hierarchy):

uni/default/clusters                           # Base
uni/default/filter/1+2+3/clusters              # Sub-cluster of 1,2,3
uni/default/filter/1+2+3/filter/0+1/clusters   # Further nesting

See Advanced Usage for examples.

Writing Status

Large datasets have a writing attribute (True during write, False when complete).

if f['uni/features'].attrs.get('writing', False):
    raise RuntimeError('Dataset is incomplete')

Advanced Usage

Multi-file Joint Clustering

Cluster multiple WSIs together to find common patterns across samples.

# 1. Extract features from each WSI
wt extract -i sample1.ndpi -o sample1.h5
wt extract -i sample2.ndpi -o sample2.h5

# 2. Joint clustering (namespace auto-generated as "sample1+sample2")
wt cluster -i sample1.h5 sample2.h5

# 3. Analysis on joint clusters
wt pca -i sample1.h5 sample2.h5
wt umap -i sample1.h5 sample2.h5

# 4. Preview each file (uses shared cluster labels)
wt preview -i sample1.h5 -N sample1+sample2
wt preview -i sample2.h5 -N sample1+sample2
# Joint clustering
cmd = wt.ClusteringCommand()
result = cmd(['sample1.h5', 'sample2.h5'])
# → namespace: 'sample1+sample2'
# → uni/sample1+sample2/clusters in both files

Sub-clustering

Analyze a subset of clusters in more detail.

# Sub-cluster within clusters 1,2,3
wt cluster -i sample1.h5 sample2.h5 -f 1 2 3

# PCA/UMAP on filtered subset
wt pca -i sample1.h5 sample2.h5 -f 1 2 3
wt umap -i sample1.h5 sample2.h5 -f 1 2 3

# Preview filtered clusters
wt preview -i sample1.h5 -N sample1+sample2 -f 1 2 3
# Sub-cluster
cmd = wt.ClusteringCommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/clusters

# PCA on filtered subset
cmd = wt.PCACommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/pca1

Streamlit App

uv run task app

Development

git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

uv run wt --help
uv run task app

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wsi_toolbox-0.3.2.tar.gz (241.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wsi_toolbox-0.3.2-py3-none-any.whl (84.2 kB view details)

Uploaded Python 3

File details

Details for the file wsi_toolbox-0.3.2.tar.gz.

File metadata

  • Download URL: wsi_toolbox-0.3.2.tar.gz
  • Upload date:
  • Size: 241.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for wsi_toolbox-0.3.2.tar.gz
Algorithm Hash digest
SHA256 66ac08d9a0a4a34e58845e8b2a3a110ac502ebcb95bbb229aece15d74a97cee8
MD5 28a4c185d7080d94fe1ca221de33146e
BLAKE2b-256 7cad3550dda5e68df1bf6bdf3c0c2130e59249da2e70a9dec851441c39194485

See more details on using hashes here.

File details

Details for the file wsi_toolbox-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: wsi_toolbox-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 84.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for wsi_toolbox-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a505fc2eb7cdceb6d0d49c9447802c369a75a63154a6452fc7f157d0c0104cc7
MD5 cb6a8431a30efb00cb99dc1584068cfa
BLAKE2b-256 9836cb210668924e8910b83ab5cc10c0c44d6a4f4544f04662351203e1ec137b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page