# WSI Toolbox

> **Note:** This package is currently unstable; the API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.
## Installation

```sh
# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git
```
## Supported Models

The following foundation models are available:

| Model | Arch | Params | Dim | HuggingFace |
|---|---|---|---|---|
| `uni` | ViT-L/16 | 300M | 1024 | MahmoodLab/UNI |
| `uni2` (default) | ViT-H/14 | 681M | 1536 | MahmoodLab/UNI2-h |
| `gigapath` | ViT-g/14 | 1.1B | 1536 | prov-gigapath/prov-gigapath |
| `virchow` | ViT-H/14 | 632M | 1280 | paige-ai/Virchow |
| `virchow2` | ViT-H/14 | 632M | 1280 | paige-ai/Virchow2 |
| `h-optimus-0` | ViT-g/14 | 1.1B | 1536 | bioptimus/H-optimus-0 |
| `conch15` | ViT-L/16 | 300M | 1024 | MahmoodLab/conchv1_5 |
| `conch15_768` | ViT-L/16 | 300M | 768 | MahmoodLab/conchv1_5 |
| `midnight` | ViT-g/14 | 1.1B | 1536 | SophontAI/OpenMidnight |
| `phikon2` | ViT-L/16 | 300M | 1024 | owkin/phikon-v2 |

`conch15_768` outputs FC-projected features (not the cls_token), intended as input for TITAN.
**Setup:** These models require HuggingFace authentication. Accept the license on each model's page, then log in:

```sh
huggingface-cli login
```
## GPU Configuration

Device selection is controlled by `--device` / `-D` (CLI) or `set_default_device()` (Python). The default is `auto`.

| Value | Behavior |
|---|---|
| `auto` (default) | Detect all GPUs. Multiple GPUs → parallel inference; single GPU → `cuda:0`; no GPU → `cpu` (with a warning) |
| `cuda:0` | Use GPU 0 only; falls back to `cpu` if unavailable (with a warning) |
| `cuda:1` | Use GPU 1 only |
| `cuda:0,1,3` | Use the specified GPUs in parallel |
| `cpu` | CPU only |
```sh
wt extract -i sample.ndpi -D auto       # Auto-detect (default)
wt extract -i sample.ndpi -D cuda:0     # Single GPU
wt extract -i sample.ndpi -D cuda:0,1   # 2 GPUs in parallel
```

```python
wt.set_default_device('cuda:0,1')  # Use GPUs 0 and 1
```
For the Streamlit app, set the device via an environment variable:

```sh
WT_DEVICE=cuda:0 uv run task app
```
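The device strings above follow PyTorch's `cuda:<index>` notation, extended with comma-separated indices for multi-GPU runs. A minimal sketch of how such a spec can be parsed (the `parse_device` helper below is illustrative, not part of the package):

```python
def parse_device(spec: str) -> tuple[str, list[int]]:
    """Split a device spec like 'cuda:0,1,3' into (backend, GPU indices).

    'cpu' and 'auto' carry no indices; 'cuda:N[,M,...]' lists explicit GPUs.
    """
    if spec in ('cpu', 'auto'):
        return spec, []
    backend, _, indices = spec.partition(':')
    return backend, [int(i) for i in indices.split(',')]


print(parse_device('cuda:0,1,3'))  # → ('cuda', [0, 1, 3])
```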
## Quick Start

```sh
# 1. Extract features from the WSI
wt extract -i sample.ndpi -o sample.h5

# 2. Run clustering
wt cluster -i sample.h5

# 3. Generate a preview image (requires sample.ndpi in the same directory)
wt preview -i sample.h5
```

The same pipeline in Python:

```python
import wsi_toolbox as wt

wt.set_default_model_preset('uni2')
wt.set_default_device('auto')

# 1. Extract
cmd = wt.FeatureExtractionCommand(batch_size=256)
cmd('sample.h5', wsi_path='sample.ndpi')

# 2. Cluster
cluster_cmd = wt.ClusteringCommand(resolution=1.0)
cluster_cmd(['sample.h5'])

# 3. Preview
preview_cmd = wt.PreviewClustersCommand()
img = preview_cmd('sample.h5')
img.save('sample_preview.jpg')
```
**Important:** The `preview` / `preview-score` commands require the original WSI file with the same stem in the same directory (e.g., `sample.h5` needs `sample.ndpi`).
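Because the WSI is located purely by stem, it can help to verify the companion file exists before running a preview. A hedged sketch (the helper and its extension list are illustrative; the package may support other slide formats):

```python
from pathlib import Path


def find_companion_wsi(h5_path, extensions=('.ndpi', '.svs', '.tiff')):
    """Return the first WSI sharing the HDF5 file's stem and directory, or None."""
    h5 = Path(h5_path)
    for ext in extensions:
        candidate = h5.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None
```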
## Commands

The CLI is available as `wsi-toolbox` or `wt`. Each command accepts `--help`.
### extract

Extract patch embeddings from a WSI using foundation models.

| CLI | Python |
|---|---|
| `wt extract -i sample.ndpi -o sample.h5` | `FeatureExtractionCommand()(h5_path, wsi_path=...)` |

```sh
wt extract -i sample.ndpi -o sample.h5
wt extract -i sample.ndpi -M gigapath     # Use the Gigapath model
wt extract -i sample.ndpi -M virchow2     # Use the Virchow2 model
wt extract -i sample.ndpi -M conch15_768  # CONCH v1.5 (768D via AttentionalPooler)
wt extract -i sample.ndpi -M midnight     # OpenMidnight model
wt extract -i sample.ndpi -L              # Include latent features
wt extract -i sample.ndpi -D cuda:0,1     # Multi-GPU parallel
```

```python
cmd = wt.FeatureExtractionCommand(batch_size=256, with_latent=True)
result = cmd('sample.h5', wsi_path='sample.ndpi')
# result.feature_dim, result.patch_count
```
### cluster

Run Leiden clustering on the embeddings.

| CLI | Python |
|---|---|
| `wt cluster -i sample.h5` | `ClusteringCommand()(['sample.h5'])` |

```sh
wt cluster -i sample.h5
wt cluster -i sample.h5 --resolution 0.5  # Fewer clusters
```

```python
cmd = wt.ClusteringCommand(resolution=1.0)
result = cmd(['sample.h5'])
# result.cluster_count, result.target_path
```

See Advanced Usage for multi-file clustering and sub-clustering.
### preview

Generate a cluster overlay image. Requires the WSI with the same stem.

| CLI | Python |
|---|---|
| `wt preview -i sample.h5` | `PreviewClustersCommand()('sample.h5')` |

```sh
wt preview -i sample.h5
wt preview -i sample.h5 -f 1 2 3   # Filter to clusters 1, 2, 3
wt preview -i sample.h5 --size 32  # Smaller thumbnails
```

```python
cmd = wt.PreviewClustersCommand(size=64)
img = cmd('sample.h5', namespace='default')
img.save('preview.jpg')
```
### umap

Compute a UMAP projection.

| CLI | Python |
|---|---|
| `wt umap -i sample.h5` | `UmapCommand()(['sample.h5'])` |

```sh
wt umap -i sample.h5
wt umap -i sample.h5 --show  # Display the plot
wt umap -i sample.h5 --save  # Save the plot
```

```python
cmd = wt.UmapCommand(n_neighbors=15, min_dist=0.1)
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/umap'
```
### pca

Compute a PCA projection.

| CLI | Python |
|---|---|
| `wt pca -i sample.h5` | `PCACommand()(['sample.h5'])` |

```sh
wt pca -i sample.h5
wt pca -i sample.h5 -n 2     # 2 components
wt pca -i sample.h5 --show   # Display the plot
```

```python
cmd = wt.PCACommand(n_components=1, scaler='minmax')
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/pca1'
```
### preview-score

Generate a score heatmap overlay. Requires the WSI with the same stem.

| CLI | Python |
|---|---|
| `wt preview-score -i sample.h5 -n pca1` | `PreviewScoresCommand()('sample.h5', score_name='pca1')` |

```sh
wt preview-score -i sample.h5 -n pca1
wt preview-score -i sample.h5 -n pca1 --cmap viridis
wt preview-score -i sample.h5 -n pca1 --invert
```

```python
cmd = wt.PreviewScoresCommand(size=64)
img = cmd('sample.h5', score_name='pca1', cmap_name='jet')
img.save('pca_heatmap.jpg')
```
### show

Display the HDF5 file structure.

| CLI | Python |
|---|---|
| `wt show -i sample.h5` | `ShowCommand()('sample.h5')` |

```sh
wt show -i sample.h5
wt show -i sample.h5 -v  # Verbose
```
### thumb

Generate a thumbnail from a WSI.

| CLI | Python |
|---|---|
| `wt thumb -i sample.ndpi` | `wsi.generate_thumbnail()` |

```sh
wt thumb -i sample.ndpi
wt thumb -i sample.ndpi -w 1024  # Specify the width
```
### dzi

Export a WSI to the Deep Zoom Image format (for OpenSeadragon).

| CLI | Python |
|---|---|
| `wt dzi -i sample.ndpi -o ./out` | `DziCommand()(wsi_path, output_dir, name)` |

```sh
wt dzi -i sample.ndpi -o ./output
wt dzi -i sample.ndpi -o ./output -t 512  # Tile size
```
### cache (optional)

Pre-cache patch images for repeated access:

```sh
wt cache -i sample.ndpi -o sample.h5
wt extract -i sample.h5  # Uses the cache
wt preview -i sample.h5  # Uses the cache
```

Structure:

```
cache/{patch_size}/
├── patches      # [N, H, W, 3] images
└── coordinates  # [N, 2] coords
```
### migrate

Migrate the old HDF5 format to the new format.

```sh
wt migrate -i sample.h5
wt migrate -i sample1.h5 sample2.h5  # Multiple files
```
## HDF5 File Structure

All data is stored in a single HDF5 file. Use `wt show -i sample.h5` to inspect it.

### Root Attributes (Metadata)

```python
with h5py.File('sample.h5', 'r') as f:
    # WSI metadata
    f.attrs['original_mpp']     # Original microns per pixel
    f.attrs['original_width']   # Original width (px)
    f.attrs['original_height']  # Original height (px)

    # Patch grid info
    f.attrs['mpp']              # Actual mpp used
    f.attrs['patch_size']       # Patch size (e.g., 256)
    f.attrs['patch_count']      # Total patches
    f.attrs['cols']             # Grid columns
    f.attrs['rows']             # Grid rows
```
### Model Features

Features are stored under `{model}/`. Supported models: `uni`, `uni2`, `gigapath`, `virchow`, `virchow2`, `h-optimus-0`, `conch15`, `conch15_768`, `midnight`, `phikon2`.

```
{model}/
├── features         # [N, D] patch embeddings
│                    #   uni: D=1024, uni2: D=1536, gigapath: D=1536,
│                    #   virchow: D=1280, virchow2: D=1280, h-optimus-0: D=1536,
│                    #   conch15: D=1024, conch15_768: D=768,
│                    #   midnight: D=1536, phikon2: D=1024
├── coordinates      # [N, 2] patch coordinates (x, y pixels)
└── latent_features  # [N, L, D] optional (with the -L flag)
```

```python
with h5py.File('sample.h5', 'r') as f:
    features = f['uni/features'][:]   # (N, 1024)
    coords = f['uni/coordinates'][:]  # (N, 2)
```
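Once loaded, `features` and `coords` are plain NumPy arrays, so downstream analysis needs no special API. For example, finding the patch nearest to a pixel position (random arrays stand in here for the datasets read above):

```python
import numpy as np

# Stand-ins for f['uni/features'][:] and f['uni/coordinates'][:].
rng = np.random.default_rng(0)
features = rng.standard_normal((100, 1024)).astype(np.float32)
coords = rng.integers(0, 10_000, size=(100, 2))

# Index of the patch whose top-left corner is closest to a query position.
query = np.array([5_000, 5_000])
idx = int(np.argmin(np.linalg.norm(coords - query, axis=1)))
patch_feature = features[idx]  # (1024,) embedding of that patch
```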
### Analysis Results (Hierarchical)

Results are stored under `{model}/{namespace}/`.

```
{model}/{namespace}/
├── clusters  # [N] cluster labels (int)
├── umap      # [N, 2] UMAP coordinates
└── pca1      # [N] PCA scores
```

Namespace:

- Single file: `default`
- Multi-file: `file1+file2+...` (auto-generated)
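With `clusters` and `features` loaded as arrays, per-cluster statistics are one-liners. A sketch using tiny synthetic arrays in place of the real datasets:

```python
import numpy as np

# Stand-ins for f['uni/default/clusters'][:] and f['uni/features'][:].
clusters = np.array([0, 0, 1, 2, 1, 0])
features = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)

# Number of patches per cluster (labels are small non-negative ints).
sizes = np.bincount(clusters)  # → [3, 2, 1]

# Mean embedding per cluster, a common starting point for comparing clusters.
centroids = np.stack([features[clusters == c].mean(axis=0)
                      for c in range(sizes.size)])  # (3, 4)
```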
Sub-clustering (filter hierarchy):

```
uni/default/clusters                          # Base
uni/default/filter/1+2+3/clusters             # Sub-cluster of 1, 2, 3
uni/default/filter/1+2+3/filter/0+1/clusters  # Further nesting
```

See Advanced Usage for examples.
### Writing Status

Large datasets carry a `writing` attribute (`True` during a write, `False` when complete). Check it before trusting the data:

```python
with h5py.File('sample.h5', 'r') as f:
    if f['uni/features'].attrs.get('writing', False):
        raise RuntimeError('Dataset is incomplete')
```
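The check can be wrapped in a small helper and applied to every dataset you read; `dataset` is any object with an h5py-style `.attrs` mapping (the helper and the minimal stub below are illustrative, not part of the package):

```python
def assert_complete(dataset, name='dataset'):
    """Raise if the dataset still has writing=True (in-progress or interrupted write)."""
    if dataset.attrs.get('writing', False):
        raise RuntimeError(f'{name} is incomplete')


# Minimal stand-in mimicking h5py's Dataset.attrs interface.
class FakeDataset:
    def __init__(self, attrs):
        self.attrs = attrs


assert_complete(FakeDataset({'writing': False}))  # passes silently
```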
## Advanced Usage

### Multi-file Joint Clustering

Cluster multiple WSIs together to find patterns shared across samples.

```sh
# 1. Extract features from each WSI
wt extract -i sample1.ndpi -o sample1.h5
wt extract -i sample2.ndpi -o sample2.h5

# 2. Joint clustering (namespace auto-generated as "sample1+sample2")
wt cluster -i sample1.h5 sample2.h5

# 3. Analysis on the joint clusters
wt pca -i sample1.h5 sample2.h5
wt umap -i sample1.h5 sample2.h5

# 4. Preview each file (uses the shared cluster labels)
wt preview -i sample1.h5 -N sample1+sample2
wt preview -i sample2.h5 -N sample1+sample2
```

```python
# Joint clustering
cmd = wt.ClusteringCommand()
result = cmd(['sample1.h5', 'sample2.h5'])
# → namespace: 'sample1+sample2'
# → uni/sample1+sample2/clusters in both files
```
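The auto-generated namespace appears to simply join the input file stems with `+`. An illustrative reconstruction (the real implementation may differ in ordering or de-duplication):

```python
from pathlib import Path


def joint_namespace(paths):
    """Join file stems with '+' to form names like 'sample1+sample2'."""
    return '+'.join(Path(p).stem for p in paths)


print(joint_namespace(['sample1.h5', 'sample2.h5']))  # → sample1+sample2
```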
### Sub-clustering

Analyze a subset of clusters in more detail.

```sh
# Sub-cluster within clusters 1, 2, 3
wt cluster -i sample1.h5 sample2.h5 -f 1 2 3

# PCA/UMAP on the filtered subset
wt pca -i sample1.h5 sample2.h5 -f 1 2 3
wt umap -i sample1.h5 sample2.h5 -f 1 2 3

# Preview the filtered clusters
wt preview -i sample1.h5 -N sample1+sample2 -f 1 2 3
```

```python
# Sub-cluster
cmd = wt.ClusteringCommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/clusters

# PCA on the filtered subset
cmd = wt.PCACommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/pca1
```
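The group paths produced by nested filters follow a regular pattern, so they can be composed mechanically. An illustrative helper that reconstructs the layout shown above (not part of the package API):

```python
def result_path(model, namespace, parent_filters, leaf):
    """Compose an HDF5 group path for (possibly nested) sub-cluster results.

    Each filter level contributes 'filter/<labels joined by +>'.
    """
    parts = [model, namespace]
    for labels in parent_filters:
        parts += ['filter', '+'.join(str(label) for label in labels)]
    parts.append(leaf)
    return '/'.join(parts)


print(result_path('uni', 'sample1+sample2', [[1, 2, 3]], 'pca1'))
# → uni/sample1+sample2/filter/1+2+3/pca1
```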
## Streamlit App

```sh
uv run task app

# Environment variables
WT_MODEL=gigapath WT_DEVICE=cuda:1 WT_PREFETCH=2 uv run task app
```
## Development

```sh
git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

uv run wt --help
uv run task app
```
## License

MIT