A comprehensive toolkit for Whole Slide Image processing, feature extraction, and clustering analysis

WSI Toolbox

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

From PyPI

pip install wsi-toolbox

For development

# Clone repository
git clone https://github.com/endaaman/WSI-toolbox.git
cd WSI-toolbox

# Install dependencies
uv sync

Note: The gigapath slide-level encoder (CLI only) needs extra dependencies that must be installed manually:

pip install git+https://github.com/prov-gigapath/prov-gigapath.git@5d77be0
pip install flash-attn einops fairscale

Quick Start

As a Python Library

import wsi_toolbox as wt

# Basic workflow
wt.set_default_model('uni')
cmd = wt.Wsi2HDF5Command(patch_size=256)
result = cmd('input.ndpi', 'output.h5')

See README_API.md for comprehensive API documentation (detailed examples, command patterns, utilities, etc.)
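For more than one slide, the same call can be wrapped in a small loop. The sketch below is only an illustration: `convert_all` is a hypothetical helper, not part of wsi_toolbox, and it assumes the command object can be reused and is called as `cmd(input_path, output_path)` exactly as in the snippet above. The output directory is assumed to exist already.

```python
from pathlib import Path


def convert_all(wsi_paths, out_dir, convert):
    """Call convert(input_path, output_path) once per slide.

    `convert` is any callable with the (input, output) signature used in the
    Quick Start, e.g. wt.Wsi2HDF5Command(patch_size=256). Each output file is
    named after the slide, with an .h5 suffix, inside `out_dir`.
    """
    out_dir = Path(out_dir)
    results = {}
    for wsi_path in map(Path, wsi_paths):
        h5_path = out_dir / (wsi_path.stem + ".h5")
        results[str(wsi_path)] = convert(str(wsi_path), str(h5_path))
    return results


# Intended use (assumes wsi_toolbox is installed and slides/ holds .ndpi files):
#   import wsi_toolbox as wt
#   wt.set_default_model('uni')
#   cmd = wt.Wsi2HDF5Command(patch_size=256)
#   results = convert_all(Path("slides").glob("*.ndpi"), "h5", cmd)
```

Passing the command in as an argument keeps the loop independent of the toolkit, so it can be exercised with a stand-in function before running it on real slides.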

As a CLI Tool

# Convert WSI to HDF5
wsi-toolbox wsi2h5 --in input.ndpi --out output.h5 --patch-size 256

# Extract features
wsi-toolbox embed --in output.h5 --model uni

# Clustering
wsi-toolbox cluster --in output.h5 --resolution 1.0

# For all commands
wsi-toolbox --help
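The three CLI steps can also be chained from a script. The following is a minimal sketch that only uses the subcommands and flags shown above; `build_pipeline` and `run_pipeline` are hypothetical helper names, and the sketch assumes `wsi-toolbox` is on the PATH.

```python
import subprocess


def build_pipeline(wsi_path, h5_path, model="uni", patch_size=256, resolution=1.0):
    """Return the argv lists for convert -> embed -> cluster, in order."""
    return [
        ["wsi-toolbox", "wsi2h5", "--in", wsi_path, "--out", h5_path,
         "--patch-size", str(patch_size)],
        ["wsi-toolbox", "embed", "--in", h5_path, "--model", model],
        ["wsi-toolbox", "cluster", "--in", h5_path, "--resolution", str(resolution)],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; with subprocess.run, check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)


# Example (requires wsi-toolbox to be installed):
#   run_pipeline(build_pipeline("input.ndpi", "output.h5"))
```

Building the argument lists separately from running them makes it easy to print or log the exact commands before any slide is processed.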

Streamlit Web Application

uv run task app

HDF5 File Structure

WSI-toolbox stores all data in a single HDF5 file:

# Core data
'patches'                      # Patch images: [N, H, W, 3], e.g., [3237, 256, 256, 3]
'coordinates'                  # Patch pixel coordinates: [N, 2]

# Metadata
'metadata/original_mpp'        # Original microns per pixel
'metadata/original_width'      # Original image width (level=0)
'metadata/original_height'     # Original image height (level=0)
'metadata/image_level'         # Image level used (typically 0)
'metadata/mpp'                 # Output patch MPP
'metadata/scale'               # Scale factor
'metadata/patch_size'          # Patch size (e.g., 256)
'metadata/patch_count'         # Total patch count
'metadata/cols'                # Grid columns
'metadata/rows'                # Grid rows

# Model features (per model: uni, gigapath, virchow2)
'{model}/features'             # Patch features: [N, D]
                               #   uni: [N, 1024]
                               #   gigapath: [N, 1536]
                               #   virchow2: [N, 2560]
'{model}/latent_features'      # Latent features (optional): [N, K, K, D]
'{model}/clusters'             # Cluster labels: [N]

# Gigapath slide-level (CLI only)
'gigapath/slide_feature'       # Slide-level features: [768]
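The layout above can be turned into a quick sanity check. The sketch below encodes the dataset paths and feature dimensions from the listing; `expected_shapes` and `find_mismatches` are hypothetical helpers, not part of the toolkit, and the patch height and width are assumed to equal the patch size, as in the `[3237, 256, 256, 3]` example.

```python
# Feature dimensions per model, taken from the listing above.
FEATURE_DIMS = {"uni": 1024, "gigapath": 1536, "virchow2": 2560}


def expected_shapes(model, n_patches, patch_size=256):
    """Return {dataset path: expected shape} for one model."""
    if model not in FEATURE_DIMS:
        raise ValueError(f"unknown model: {model!r}")
    return {
        "patches": (n_patches, patch_size, patch_size, 3),
        "coordinates": (n_patches, 2),
        f"{model}/features": (n_patches, FEATURE_DIMS[model]),
        f"{model}/clusters": (n_patches,),
    }


def find_mismatches(actual_shapes, model, n_patches, patch_size=256):
    """Return {path: (expected, actual or None)} for missing or mis-shaped datasets."""
    expected = expected_shapes(model, n_patches, patch_size)
    return {
        path: (shape, actual_shapes.get(path))
        for path, shape in expected.items()
        if actual_shapes.get(path) != shape
    }
```

`actual_shapes` is a plain dictionary, so the check does not depend on any HDF5 library. With h5py, one way to build it is `File.visititems`, recording `obj.shape` for every `h5py.Dataset`. Note that the feature and cluster datasets will be reported as missing until the embed and cluster steps have run.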

Features

  • WSI processing (.ndpi, .svs, .tiff → HDF5)
  • Feature extraction (UNI, Gigapath, Virchow2)
  • Leiden clustering with UMAP visualization
  • Preview generation (cluster overlays, latent PCA)
  • Type-safe command pattern with Pydantic results
  • CLI, Python API, and Streamlit GUI
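As a rough illustration of what a cluster overlay involves, the sketch below places per-patch cluster labels onto the patch grid. It is not the toolkit's implementation: `cluster_grid` is a hypothetical helper, and it assumes that each entry of 'coordinates' is the (x, y) pixel position of a patch's top-left corner, aligned to multiples of the patch size.

```python
def cluster_grid(coordinates, clusters, patch_size, cols, rows, empty=-1):
    """Arrange cluster labels in a rows x cols grid; cells without a patch keep `empty`."""
    grid = [[empty] * cols for _ in range(rows)]
    for (x, y), label in zip(coordinates, clusters):
        # Assumption: (x, y) is the patch's top-left corner in pixels.
        col, row = x // patch_size, y // patch_size
        if 0 <= col < cols and 0 <= row < rows:
            grid[row][col] = label
    return grid


# Three patches on a 2 x 2 grid; the bottom-left cell has no patch.
grid = cluster_grid([(0, 0), (256, 0), (256, 256)], [0, 1, 1],
                    patch_size=256, cols=2, rows=2)
```

The inputs correspond to 'coordinates', '{model}/clusters', 'metadata/patch_size', 'metadata/cols', and 'metadata/rows' in the HDF5 file; the resulting grid can then be colored per label to draw an overlay.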

Documentation

  • API Guide - Comprehensive Python API documentation (in Japanese)
  • CLAUDE.md - Development guidelines

Development

Setup Development Environment

# Clone repository
git clone https://github.com/endaaman/wsi-toolbox.git
cd wsi-toolbox

# Install all dependencies
uv sync

# Install with optional gigapath support
uv sync --extra gigapath

# Install build tools
uv sync --group build

Run Development Tools

# Run CLI
uv run wsi-toolbox --help

# Run Streamlit app
uv run task app

# Run watcher
uv run task watcher

Build and Deploy

Build Package

# Clean previous builds
uv run task clean

# Build package
uv run task build
# or
python -m build

# Check package integrity
uv run task check
# or
python -m twine check dist/*

Deploy to PyPI

Prerequisite: install the build tools first:

uv sync --group build

Deploy:

# Using deploy script (recommended)
./deploy.sh

# Or manually
python -m build
python -m twine check dist/*
python -m twine upload dist/*

Note: Configure your PyPI credentials before deploying:

# Create ~/.pypirc with your API token
# Or use environment variables
export TWINE_USERNAME=__token__
export TWINE_PASSWORD="<your-pypi-token>"

License

MIT
