

WSI Toolbox

Note: This package is currently unstable, and the API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git

Supported Models

The following foundation models are available:

Model             Arch      Params  Dim   HuggingFace
uni               ViT-L/16  300M    1024  MahmoodLab/UNI
uni2 (default)    ViT-H/14  681M    1536  MahmoodLab/UNI2-h
gigapath          ViT-g/14  1.1B    1536  prov-gigapath/prov-gigapath
virchow           ViT-H/14  632M    1280  paige-ai/Virchow
virchow2          ViT-H/14  632M    1280  paige-ai/Virchow2
h-optimus-0       ViT-g/14  1.1B    1536  bioptimus/H-optimus-0
conch15           ViT-L/16  300M    1024  MahmoodLab/conchv1_5
conch15_768       ViT-L/16  300M    768   MahmoodLab/conchv1_5
midnight          ViT-g/14  1.1B    1536  SophontAI/OpenMidnight
phikon2           ViT-L/16  300M    1024  owkin/phikon-v2

conch15_768 outputs FC-projected 768-dimensional features (not the cls_token embedding), intended as input to TITAN.

Setup: These models require HuggingFace authentication. Accept the license on each model page, then:

huggingface-cli login

GPU Configuration

Device selection is controlled by --device / -D (CLI) or set_default_device() (Python). The default is auto.

Value          Behavior
auto           (default) Detect all GPUs. Multiple GPUs → parallel inference; single GPU → cuda:0; no GPU → cpu (with a warning)
cuda:0         Use GPU 0 only. Falls back to cpu if unavailable (with a warning)
cuda:1         Use GPU 1 only
cuda:0,1,3     Use the specified GPUs in parallel
cpu            CPU only

wt extract -i sample.ndpi -D auto           # Auto-detect (default)
wt extract -i sample.ndpi -D cuda:0         # Single GPU
wt extract -i sample.ndpi -D cuda:0,1       # 2 GPUs in parallel

import wsi_toolbox as wt
wt.set_default_device('cuda:0,1')  # Use GPU 0 and 1

For the Streamlit app, set the device via an environment variable:

WT_DEVICE=cuda:0 uv run task app
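The device values above behave like a small grammar. The sketch below shows one way to resolve them into a device list and spread batches across several GPUs. It is illustrative only. parse_device_spec and shard_batches are hypothetical names, not wsi_toolbox functions. Applying the CPU fallback to every cuda:N spec generalizes what the table states only for cuda:0. Round-robin batch assignment is an assumed strategy, because the toolbox's actual multi-GPU scheduling is not documented here.

```python
from typing import Dict, List


def parse_device_spec(spec: str, available_gpus: int) -> List[str]:
    """Hypothetical resolver for 'auto', 'cpu', 'cuda:0', 'cuda:0,1,3', ..."""
    spec = spec.strip().lower()
    if spec == "cpu":
        return ["cpu"]
    if spec == "auto":
        # Every visible GPU, or CPU when there are none.
        return [f"cuda:{i}" for i in range(available_gpus)] or ["cpu"]
    if spec.startswith("cuda:"):
        ids = [int(tok) for tok in spec[len("cuda:"):].split(",")]
        # Assumption: unavailable IDs are dropped, and CPU is the last resort.
        usable = [f"cuda:{i}" for i in ids if 0 <= i < available_gpus]
        return usable or ["cpu"]
    raise ValueError(f"unrecognized device spec: {spec!r}")


def shard_batches(n_batches: int, devices: List[str]) -> Dict[str, List[int]]:
    """Assign batch indices to devices round-robin (one possible strategy)."""
    shards: Dict[str, List[int]] = {d: [] for d in devices}
    for b in range(n_batches):
        shards[devices[b % len(devices)]].append(b)
    return shards


devices = parse_device_spec("cuda:0,1", available_gpus=2)
print(devices)                    # ['cuda:0', 'cuda:1']
print(shard_batches(5, devices))  # {'cuda:0': [0, 2, 4], 'cuda:1': [1, 3]}
```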

Quick Start

# 1. Extract features from WSI
wt extract -i sample.ndpi -o sample.h5

# 2. Run clustering
wt cluster -i sample.h5

# 3. Generate preview image (requires sample.ndpi in same directory)
wt preview -i sample.h5

import wsi_toolbox as wt

wt.set_default_model_preset('uni2')
wt.set_default_device('auto')

# 1. Extract
cmd = wt.FeatureExtractionCommand(batch_size=256)
cmd('sample.h5', wsi_path='sample.ndpi')

# 2. Cluster
cluster_cmd = wt.ClusteringCommand(resolution=1.0)
cluster_cmd(['sample.h5'])

# 3. Preview
preview_cmd = wt.PreviewClustersCommand()
img = preview_cmd('sample.h5')
img.save('sample_preview.jpg')

Important: preview / preview-score commands require the original WSI file with the same stem in the same directory (e.g., sample.h5 needs sample.ndpi).
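Below is a minimal sketch of that lookup. It assumes a fixed list of WSI extensions. The toolbox's actual list and search order are not documented here, and find_source_wsi is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed extensions; the toolbox may recognize a different set.
WSI_EXTENSIONS = (".ndpi", ".svs", ".tif", ".tiff", ".mrxs")


def find_source_wsi(h5_path: str,
                    extensions: Iterable[str] = WSI_EXTENSIONS) -> Optional[Path]:
    """Return the WSI next to h5_path that shares its stem, or None if absent."""
    h5 = Path(h5_path)
    for ext in extensions:
        candidate = h5.with_suffix(ext)  # sample.h5 -> sample.ndpi, ...
        if candidate.exists():
            return candidate
    return None
```

For example, find_source_wsi('sample.h5') returns Path('sample.ndpi') when that file exists in the same directory. It returns None otherwise.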

Commands

The CLI is available as both wsi-toolbox and wt. Every command supports --help.


extract

Extract patch embeddings from a WSI using a foundation model.

CLI:    wt extract -i sample.ndpi -o sample.h5
Python: FeatureExtractionCommand()(h5_path, wsi_path=...)
wt extract -i sample.ndpi -o sample.h5
wt extract -i sample.ndpi -M gigapath      # Use Gigapath model
wt extract -i sample.ndpi -M virchow2      # Use Virchow2 model
wt extract -i sample.ndpi -M conch15_768   # CONCH v1.5 (768D via AttentionalPooler)
wt extract -i sample.ndpi -M midnight      # OpenMidnight model
wt extract -i sample.ndpi -L               # Include latent features
wt extract -i sample.ndpi -D cuda:0,1      # Multi-GPU parallel

cmd = wt.FeatureExtractionCommand(batch_size=256, with_latent=True)
result = cmd('sample.h5', wsi_path='sample.ndpi')
# result.feature_dim, result.patch_count

cluster

Run Leiden clustering on the patch embeddings.

CLI:    wt cluster -i sample.h5
Python: ClusteringCommand()(['sample.h5'])
wt cluster -i sample.h5
wt cluster -i sample.h5 --resolution 0.5   # Fewer clusters

cmd = wt.ClusteringCommand(resolution=1.0)
result = cmd(['sample.h5'])
# result.cluster_count, result.target_path

See Advanced Usage for multi-file clustering and sub-clustering.
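Leiden optimizes a quality function over a graph of patches. Assume it uses the common modularity objective with a resolution parameter γ. Leiden can also optimize CPM, and this README states neither which objective the toolbox uses nor how the patch graph is built. Under that assumption, resolution enters the objective as follows:
- A_ij is the edge weight between patches i and j.
- k_i is the weighted degree of patch i.
- m is the total edge weight.
- δ(c_i, c_j) is 1 when both patches share a cluster and 0 otherwise.

```latex
Q(\gamma) = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

A larger γ penalizes large clusters more strongly, so raising it tends to produce more, smaller clusters. Lowering it (e.g. --resolution 0.5) tends to produce fewer, which matches the CLI comment above.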


preview

Generate a cluster overlay image. Requires the WSI file with the same stem in the same directory.

CLI:    wt preview -i sample.h5
Python: PreviewClustersCommand()('sample.h5')
wt preview -i sample.h5
wt preview -i sample.h5 -f 1 2 3           # Filter to clusters 1,2,3
wt preview -i sample.h5 --size 32          # Smaller thumbnails

cmd = wt.PreviewClustersCommand(size=64)
img = cmd('sample.h5', namespace='default')
img.save('preview.jpg')

umap

Compute a UMAP projection of the embeddings.

CLI:    wt umap -i sample.h5
Python: UmapCommand()(['sample.h5'])
wt umap -i sample.h5
wt umap -i sample.h5 --show                # Display plot
wt umap -i sample.h5 --save                # Save plot

cmd = wt.UmapCommand(n_neighbors=15, min_dist=0.1)
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/umap'

pca

Compute a PCA projection of the embeddings.

CLI:    wt pca -i sample.h5
Python: PCACommand()(['sample.h5'])
wt pca -i sample.h5
wt pca -i sample.h5 -n 2                   # 2 components
wt pca -i sample.h5 --show                 # Display plot

cmd = wt.PCACommand(n_components=1, scaler='minmax')
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/pca1'
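pca1 holds one score per patch. To illustrate what a one-component PCA produces, here is a dependency-free sketch. It runs power iteration on the centered features, then min-max rescales the scores to [0, 1]. This is not the toolbox's implementation. We assume scaler='minmax' rescales the output scores, but it could instead scale the input features. The sign of a principal component is arbitrary, so heatmap orientation can differ between implementations.

```python
import math
import random
from typing import List, Tuple


def first_pc_scores(rows: List[List[float]], iters: int = 100,
                    seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (unit direction, per-row scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - mean[j] for j in range(d)] for r in rows]  # centered data
    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(d)]  # random starting direction
    for _ in range(iters):
        # Power-iteration step: v <- (X^T X) v, then renormalize to unit length.
        proj = [sum(xi[j] * v[j] for j in range(d)) for xi in x]
        w = [sum(proj[i] * x[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # constant data: there is no direction of variance
            break
        v = [c / norm for c in w]
    scores = [sum(xi[j] * v[j] for j in range(d)) for xi in x]
    return v, scores


def minmax(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (val - lo) / span for val in values]


# Toy 2-D "features" lying on the line y = 2x.
direction, scores = first_pc_scores([[0, 0], [1, 2], [2, 4], [3, 6]])
print([round(s, 3) for s in minmax(scores)])  # [0.0, 0.333, 0.667, 1.0]
```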

preview-score

Generate a score heatmap overlay. Requires the WSI file with the same stem in the same directory.

CLI:    wt preview-score -i sample.h5 -n pca1
Python: PreviewScoresCommand()('sample.h5', score_name='pca1')
wt preview-score -i sample.h5 -n pca1
wt preview-score -i sample.h5 -n pca1 --cmap viridis
wt preview-score -i sample.h5 -n pca1 --invert

cmd = wt.PreviewScoresCommand(size=64)
img = cmd('sample.h5', score_name='pca1', cmap_name='jet')
img.save('pca_heatmap.jpg')

show

Display the HDF5 file structure.

CLI:    wt show -i sample.h5
Python: ShowCommand()('sample.h5')
wt show -i sample.h5
wt show -i sample.h5 -v                    # Verbose

thumb

Generate a thumbnail from a WSI.

CLI:    wt thumb -i sample.ndpi
Python: wsi.generate_thumbnail()
wt thumb -i sample.ndpi
wt thumb -i sample.ndpi -w 1024            # Specify width

dzi

Export a WSI to Deep Zoom Image (DZI) format for viewing with OpenSeadragon.

CLI:    wt dzi -i sample.ndpi -o ./out
Python: DziCommand()(wsi_path, output_dir, name)
wt dzi -i sample.ndpi -o ./output
wt dzi -i sample.ndpi -o ./output -t 512   # Tile size
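Deep Zoom stores an image pyramid in which each level halves the one above it, down to a single pixel. Assume wt dzi follows the standard Deep Zoom layout used by OpenSeadragon. Its tile overlap and default tile size are not documented here, and the sketch ignores overlap. Under that assumption, the hypothetical helper below estimates the level and tile counts for a slide:

```python
from typing import List, Tuple


def dzi_levels(width: int, height: int,
               tile_size: int) -> List[Tuple[int, int, int, int]]:
    """(level_width, level_height, tiles_x, tiles_y) per Deep Zoom level, ignoring overlap."""
    max_level = (max(width, height) - 1).bit_length()  # equals ceil(log2(longest side))
    levels = []
    for level in range(max_level + 1):
        factor = 2 ** (max_level - level)
        w = -(-width // factor)  # ceiling division
        h = -(-height // factor)
        levels.append((w, h, -(-w // tile_size), -(-h // tile_size)))
    return levels


pyramid = dzi_levels(100_000, 80_000, tile_size=512)
print(len(pyramid))   # 18
print(pyramid[-1])    # (100000, 80000, 196, 157)
```

The example slide dimensions are made up. For real slides, use the width and height reported by the WSI.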

cache (optional)

Pre-cache patch images for repeated access:

wt cache -i sample.ndpi -o sample.h5
wt extract -i sample.h5   # Uses cache
wt preview -i sample.h5   # Uses cache

Structure:

cache/{patch_size}/
├── patches       # [N, H, W, 3] images
└── coordinates   # [N, 2] coords
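The cache stores raw patch pixels, so it can be large. The estimate below assumes uint8 RGB pixels and no HDF5 compression, neither of which is stated above. It covers a hypothetical slide with 10,000 patches of 256 px and ignores the small coordinates array.

```python
def cache_size_bytes(n_patches: int, patch_size: int,
                     channels: int = 3, bytes_per_value: int = 1) -> int:
    """Uncompressed bytes for an [N, H, W, C] patch array (uint8 by default)."""
    return n_patches * patch_size * patch_size * channels * bytes_per_value


size = cache_size_bytes(10_000, 256)
print(f"{size:,} bytes = {size / 2**30:.2f} GiB")  # 1,966,080,000 bytes = 1.83 GiB
```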

migrate

Migrate HDF5 files from the old format to the current format.

wt migrate -i sample.h5
wt migrate -i sample1.h5 sample2.h5      # Multiple files

HDF5 File Structure

All data is stored in a single HDF5 file. Use wt show -i sample.h5 to inspect.

Root Attributes (Metadata)

import h5py

with h5py.File('sample.h5', 'r') as f:
    # WSI metadata
    f.attrs['original_mpp']      # Original microns per pixel
    f.attrs['original_width']    # Original width (px)
    f.attrs['original_height']   # Original height (px)

    # Patch grid info
    f.attrs['mpp']               # Actual mpp used
    f.attrs['patch_size']        # Patch size (e.g., 256)
    f.attrs['patch_count']       # Total patches
    f.attrs['cols']              # Grid columns
    f.attrs['rows']              # Grid rows
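The README does not spell out how these attributes relate. Assume a non-overlapping grid of patch_size × patch_size patches at the working mpp. Under that assumption, cols and rows could be derived as below. patch_grid is a hypothetical helper, and the slide numbers are made up. patch_count is stored separately and may be smaller than cols × rows if background patches are skipped, which we have not confirmed.

```python
from typing import Tuple


def patch_grid(original_width: int, original_height: int,
               original_mpp: float, mpp: float, patch_size: int) -> Tuple[int, int]:
    """Hypothetical (cols, rows) of a non-overlapping patch grid at the target mpp."""
    scale = original_mpp / mpp  # e.g. 0.25 / 0.5 = 0.5 (downsample by 2)
    cols = int(original_width * scale) // patch_size
    rows = int(original_height * scale) // patch_size
    return cols, rows


print(patch_grid(100_000, 80_000, original_mpp=0.25, mpp=0.5, patch_size=256))  # (195, 156)
```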

Model Features

Features are stored under {model}/. Supported models: uni, uni2, gigapath, virchow, virchow2, h-optimus-0, conch15, conch15_768, midnight, phikon2.

{model}/
├── features        # [N, D] patch embeddings
│                   #   uni: D=1024
│                   #   uni2: D=1536
│                   #   gigapath: D=1536
│                   #   virchow: D=1280
│                   #   virchow2: D=1280
│                   #   h-optimus-0: D=1536
│                   #   conch15: D=1024
│                   #   conch15_768: D=768
│                   #   midnight: D=1536
│                   #   phikon2: D=1024
├── coordinates     # [N, 2] patch coordinates (x, y pixels)
└── latent_features # [N, L, D] optional (with -L flag)

import h5py

with h5py.File('sample.h5', 'r') as f:
    features = f['uni/features'][:]         # (N, 1024)
    coords = f['uni/coordinates'][:]        # (N, 2)
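Each model writes features of a fixed width, so a cheap sanity check is to compare the stored shape with the Dim column of the Supported Models table. The helper below is a hypothetical addition, not part of wsi_toolbox. Its dimensions are copied from the table.

```python
from typing import Tuple

# Feature dimensions copied from the Supported Models table.
EXPECTED_DIMS = {
    "uni": 1024, "uni2": 1536, "gigapath": 1536, "virchow": 1280,
    "virchow2": 1280, "h-optimus-0": 1536, "conch15": 1024,
    "conch15_768": 768, "midnight": 1536, "phikon2": 1024,
}


def check_feature_shape(model: str, shape: Tuple[int, ...]) -> None:
    """Raise ValueError if a '{model}/features' shape disagrees with the table."""
    if len(shape) != 2:
        raise ValueError(f"expected [N, D], got {shape}")
    expected = EXPECTED_DIMS[model]
    if shape[1] != expected:
        raise ValueError(f"{model}: expected D={expected}, got D={shape[1]}")


# Usage with h5py (not run here):
# with h5py.File('sample.h5', 'r') as f:
#     check_feature_shape('uni', f['uni/features'].shape)
```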

Analysis Results (Hierarchical)

Results are stored under {model}/{namespace}/.

{model}/{namespace}/
├── clusters     # [N] cluster labels (int)
├── umap         # [N, 2] UMAP coordinates
└── pca1         # [N] PCA scores
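Assume each entry of clusters lines up with the same row of {model}/coordinates, since both have N entries. That alignment is what makes an overlay like preview possible. The minimal sketch below places labels on the patch grid. It assumes coordinates are top-left pixel offsets at the working mpp that align with multiples of patch_size. label_grid is a hypothetical helper, not the toolbox's rendering code.

```python
from typing import List, Optional, Sequence, Tuple


def label_grid(coords: Sequence[Tuple[int, int]], labels: Sequence[int],
               patch_size: int, cols: int, rows: int) -> List[List[Optional[int]]]:
    """Place per-patch labels on a rows x cols grid; None marks cells without a patch."""
    if len(coords) != len(labels):
        raise ValueError("coordinates and labels must both have N entries")
    grid: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    for (x, y), label in zip(coords, labels):
        grid[y // patch_size][x // patch_size] = label  # row from y, column from x
    return grid


print(label_grid([(0, 0), (256, 0), (0, 256)], [2, 0, 1],
                 patch_size=256, cols=2, rows=2))  # [[2, 0], [1, None]]
```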

Namespace:

  • Single file: default
  • Multi-file: file1+file2+... (auto-generated)

Sub-clustering (filter hierarchy):

uni/default/clusters                           # Base
uni/default/filter/1+2+3/clusters              # Sub-cluster of 1,2,3
uni/default/filter/1+2+3/filter/0+1/clusters   # Further nesting

See Advanced Usage for examples.
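The naming rules above can be summarized in a short sketch. namespace_for and result_path are hypothetical helpers written for illustration. In particular, we assume stems are joined in the order the files are given, which the README does not state.

```python
from pathlib import Path
from typing import Sequence


def namespace_for(h5_paths: Sequence[str]) -> str:
    """'default' for a single file, otherwise the file stems joined with '+'."""
    if len(h5_paths) == 1:
        return "default"
    return "+".join(Path(p).stem for p in h5_paths)


def result_path(model: str, namespace: str,
                filters: Sequence[Sequence[int]], name: str) -> str:
    """Build e.g. 'uni/default/filter/1+2+3/clusters' from nested cluster filters."""
    parts = [model, namespace]
    for cluster_ids in filters:
        parts += ["filter", "+".join(str(c) for c in cluster_ids)]
    parts.append(name)
    return "/".join(parts)


print(result_path("uni", namespace_for(["sample.h5"]), [[1, 2, 3], [0, 1]], "clusters"))
# uni/default/filter/1+2+3/filter/0+1/clusters
```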

Writing Status

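Large datasets carry a writing attribute that is True while the dataset is being written and False once writing is complete. Check it before reading:

import h5py
with h5py.File('sample.h5', 'r') as f:
    if f['uni/features'].attrs.get('writing', False):
        raise RuntimeError('Dataset is incomplete')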

Advanced Usage

Multi-file Joint Clustering

Cluster multiple WSIs together to find common patterns across samples.

# 1. Extract features from each WSI
wt extract -i sample1.ndpi -o sample1.h5
wt extract -i sample2.ndpi -o sample2.h5

# 2. Joint clustering (namespace auto-generated as "sample1+sample2")
wt cluster -i sample1.h5 sample2.h5

# 3. Analysis on joint clusters
wt pca -i sample1.h5 sample2.h5
wt umap -i sample1.h5 sample2.h5

# 4. Preview each file (uses shared cluster labels)
wt preview -i sample1.h5 -N sample1+sample2
wt preview -i sample2.h5 -N sample1+sample2

# Joint clustering
cmd = wt.ClusteringCommand()
result = cmd(['sample1.h5', 'sample2.h5'])
# → namespace: 'sample1+sample2'
# → uni/sample1+sample2/clusters in both files
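Conceptually, joint clustering pools the patches from all files so that cluster IDs mean the same thing in every file. One plausible bookkeeping scheme is sketched below. It is an assumption, not taken from the toolbox: concatenate features in file order, cluster once, then split the label array back by each file's patch count. split_joint_labels is a hypothetical helper.

```python
from itertools import accumulate
from typing import List, Sequence


def split_joint_labels(labels: Sequence[int], counts: Sequence[int]) -> List[List[int]]:
    """Split labels computed over concatenated files back into per-file lists."""
    if sum(counts) != len(labels):
        raise ValueError("counts must sum to the number of labels")
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [list(labels[s:e]) for s, e in zip(starts, ends)]


# sample1.h5 has 2 patches, sample2.h5 has 3; labels come from one joint run.
print(split_joint_labels([0, 1, 1, 2, 0], [2, 3]))  # [[0, 1], [1, 2, 0]]
```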

Sub-clustering

Analyze a subset of clusters in more detail.

# Sub-cluster within clusters 1,2,3
wt cluster -i sample1.h5 sample2.h5 -f 1 2 3

# PCA/UMAP on filtered subset
wt pca -i sample1.h5 sample2.h5 -f 1 2 3
wt umap -i sample1.h5 sample2.h5 -f 1 2 3

# Preview filtered clusters
wt preview -i sample1.h5 -N sample1+sample2 -f 1 2 3

# Sub-cluster
cmd = wt.ClusteringCommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/clusters

# PCA on filtered subset
cmd = wt.PCACommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/pca1
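Sub-clustering restricts the analysis to patches whose parent label is in the filter. The minimal sketch below shows that selection and the filter key used in the result path. Both are hypothetical helpers. How the toolbox stores labels for patches outside the filter is not documented here.

```python
from typing import List, Sequence


def select_for_subclustering(parent_labels: Sequence[int], keep: Sequence[int]) -> List[int]:
    """Indices of patches whose parent cluster is in `keep` (e.g. [1, 2, 3])."""
    wanted = set(keep)
    return [i for i, label in enumerate(parent_labels) if label in wanted]


def filter_key(keep: Sequence[int]) -> str:
    """Path component for a filter, e.g. [1, 2, 3] -> '1+2+3'."""
    return "+".join(str(c) for c in keep)


print(select_for_subclustering([0, 1, 4, 2, 3, 1], [1, 2, 3]))  # [1, 3, 4, 5]
print(filter_key([1, 2, 3]))                                     # 1+2+3
```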

Streamlit App

uv run task app

# Environment variables
WT_MODEL=gigapath WT_DEVICE=cuda:1 WT_PREFETCH=2 uv run task app

Development

git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

uv run wt --help
uv run task app

License

MIT

Download files


Source Distribution

wsi_toolbox-0.4.2.tar.gz (175.5 kB)

Uploaded Source

Built Distribution


wsi_toolbox-0.4.2-py3-none-any.whl (92.3 kB)

Uploaded Python 3

File details

Details for the file wsi_toolbox-0.4.2.tar.gz.

File metadata

  • Download URL: wsi_toolbox-0.4.2.tar.gz
  • Upload date:
  • Size: 175.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for wsi_toolbox-0.4.2.tar.gz
Algorithm Hash digest
SHA256 228df63fed292ddafd364b2800b9d70ac2674feb9678e469cdf5159b2f987073
MD5 f54d1dd8b247fa4509a34de5ff08a7a4
BLAKE2b-256 dd76a93aabe2c3fea268f1cdca42a7b51fd489c54f556b9499e59342fc2a9009


File details

Details for the file wsi_toolbox-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: wsi_toolbox-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 92.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for wsi_toolbox-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5a07cfca7f31234e0624cc29c3ffea92d7843bfb10737a6a4eec4cbb44f0b0df
MD5 cf544a354d31ee5feddb5a9758c4dbe3
BLAKE2b-256 9ad9fab0f00a773bbddbadf79800d2a0c7c6b34631c6aacd2f97d3b188251ed9

