A comprehensive toolkit for Whole Slide Image processing, feature extraction, and clustering analysis
Project description
WSI Toolbox
Note: This package is currently unstable. API may change without notice.
A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.
Installation
# From PyPI
pip install wsi-toolbox
# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git
Quick Start
As a Python Library
import wsi_toolbox as wt
# Extract features directly from WSI (no cache needed)
wt.set_default_model_preset('uni')
wt.set_default_device('cuda')
cmd = wt.FeatureExtractionCommand(batch_size=256)
result = cmd('output.h5', wsi_path='input.ndpi')
# Or cache patches first for faster repeated access
cache_cmd = wt.CacheCommand(patch_size=256)
cache_cmd('input.ndpi', 'output.h5')
result = cmd('output.h5') # Uses cache automatically
See README_API.md for API documentation.
As a CLI Tool
After pip install wsi-toolbox, the CLI is available as wsi-toolbox or wt.
For development, use uv run wt.
# Extract features directly from WSI (creates HDF5 with features)
wt extract -i input.ndpi -o output.h5
# Or cache patches first (optional, for repeated access)
wt cache -i input.ndpi -o output.h5
wt extract -i output.h5
# Run Leiden clustering on embeddings
wt cluster -i output.h5
# Compute UMAP projection
wt umap -i output.h5
# Compute PCA projection
wt pca -i output.h5
# Generate cluster overlay preview image
wt preview -i output.h5
# Generate PCA score heatmap preview
wt preview-score -i output.h5 -n pca1
# Show HDF5 file structure
wt show -i output.h5
# Export WSI to Deep Zoom Image format
wt dzi -i input.ndpi -o ./output
# Generate thumbnail from WSI
wt thumb -i input.ndpi
Each subcommand has detailed help: wt <subcommand> --help
Streamlit Web Application
uv run task app
HDF5 File Structure
WSI-toolbox stores all data in a single HDF5 file.
Patch Cache (optional)
cache/{patch_size}/patches # Patch images: [N, H, W, 3]
cache/{patch_size}/coordinates # Patch pixel coordinates: [N, 2]
Cache is optional - extract command can read directly from WSI.
Metadata
Metadata is stored in file attrs and group attrs:
with h5py.File('output.h5', 'r') as f:
mpp = f.attrs['mpp']
patch_size = f.attrs['patch_size']
patch_count = f.attrs['patch_count']
# Also available on cache group: f['cache/256'].attrs['mpp']
Available attrs: original_mpp, original_width, original_height, mpp, patch_size, patch_count, cols, rows
Model Features
{model}/features # Patch features: [N, D]
# uni: [N, 1024]
# gigapath: [N, 1536]
# virchow2: [N, 2560]
{model}/latent_features # Latent features (optional): [N, L, D]
Clustering & Analysis (Hierarchical)
Results are stored in a hierarchical namespace structure:
{model}/{namespace}/clusters # Cluster labels: [N]
{model}/{namespace}/umap # UMAP coordinates: [N, 2]
{model}/{namespace}/pca1 # PCA scores: [N] or [N, k]
Namespace:
- Single file:
default - Multiple files:
file1+file2+...(auto-generated from filenames)
Filter hierarchy: Sub-clustering creates nested paths:
# Base clustering
uni/default/clusters
# Sub-cluster patches in clusters 1, 2, 3
uni/default/filter/1+2+3/clusters
# Further sub-cluster within that
uni/default/filter/1+2+3/filter/0+1/clusters
Each level stores its own clusters, umap, pca results independently.
Dataset Writing Status
Large datasets (patches, features, latent_features) have a writing attribute to indicate write status (True during write, False when complete). Incomplete datasets are automatically deleted on error.
ds = f['patches'] # or f['uni/features']
if ds.attrs.get('writing', False):
raise RuntimeError('Dataset is incomplete')
Features
- WSI processing (.ndpi, .svs, .tiff → HDF5)
- Feature extraction (UNI, Gigapath, Virchow2)
- Leiden clustering with UMAP visualization
- Preview generation (cluster overlays, PCA heatmaps)
- Type-safe command pattern with Pydantic results
- CLI, Python API, and Streamlit GUI
Documentation
- API Guide - Python API documentation
Development
# Clone and install
git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync
# Run CLI
uv run wt --help
# Run Streamlit app
uv run task app
Optional: Gigapath support
uv sync --group gigapath
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wsi_toolbox-0.3.1.tar.gz.
File metadata
- Download URL: wsi_toolbox-0.3.1.tar.gz
- Upload date:
- Size: 163.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cbdec2205ea83afb8291d34f06406de8f1dd34c5026d25604c2aec6857f389fc
|
|
| MD5 |
ab102371f1037232f0aa180c273537a1
|
|
| BLAKE2b-256 |
989df1c718ca38ba525773285a9474be648656fd59e8296a9aaa82224de1fde0
|
File details
Details for the file wsi_toolbox-0.3.1-py3-none-any.whl.
File metadata
- Download URL: wsi_toolbox-0.3.1-py3-none-any.whl
- Upload date:
- Size: 82.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
23643d868aeb7f9fccb2457fd8dced9534733abd2832e3b3b22dee61bb462d17
|
|
| MD5 |
501dcbff89b824cb898cb9efe6c1c5ef
|
|
| BLAKE2b-256 |
183ed82bbf4de49b0f90c226ccd5ea1f822052a03205ded29fd947bff75665b2
|