WSI Toolbox

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

From PyPI

pip install wsi-toolbox

For development

# Clone repository
git clone https://github.com/endaaman/WSI-toolbox.git
cd WSI-toolbox

# Install dependencies
uv sync

Note: the gigapath slide-level encoder (CLI only) requires extra dependencies:

# For flash-attn (requires CUDA, takes time to build)
uv sync --extra build
uv sync --extra build --extra compile

# Then install gigapath (uv-only dependency group, not available via PyPI)
uv sync --group gigapath

Quick Start

As a Python Library

import wsi_toolbox as wt

# Basic workflow
wt.set_default_model_preset('uni')
cmd = wt.Wsi2HDF5Command(patch_size=256)
result = cmd('input.ndpi', 'output.h5')

See README_API.md for comprehensive API documentation (detailed examples, command patterns, utilities, etc.)

As a CLI Tool

# Convert WSI to HDF5
wsi-toolbox wsi2h5 --in input.ndpi --out output.h5 --patch-size 256

# Extract features
wsi-toolbox embed --in output.h5 --model uni

# Clustering
wsi-toolbox cluster --in output.h5 --resolution 1.0

# For all commands
wsi-toolbox --help
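The three commands above form a per-slide pipeline (convert → embed → cluster). For a directory of slides, the same pipeline can be driven from Python's standard library; a minimal sketch, where `slide_pipeline` and its defaults are illustrative helpers, not part of the toolbox:

```python
import subprocess
from pathlib import Path

def slide_pipeline(wsi_path: Path, model: str = "uni", patch_size: int = 256):
    """Build the wsi-toolbox command sequence for one slide."""
    h5 = str(wsi_path.with_suffix(".h5"))
    return [
        ["wsi-toolbox", "wsi2h5", "--in", str(wsi_path), "--out", h5,
         "--patch-size", str(patch_size)],
        ["wsi-toolbox", "embed", "--in", h5, "--model", model],
        ["wsi-toolbox", "cluster", "--in", h5, "--resolution", "1.0"],
    ]

if __name__ == "__main__":
    for wsi in sorted(Path("slides").glob("*.ndpi")):
        for cmd in slide_pipeline(wsi):
            subprocess.run(cmd, check=True)  # abort this slide on the first failing step
```

`check=True` stops a slide at the first failed step, so a broken HDF5 file is never fed to `embed` or `cluster`.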

Streamlit Web Application

uv run task app

HDF5 File Structure

WSI-toolbox stores all data in a single HDF5 file:

# Core data
'patches'                      # Patch images: [N, H, W, 3], e.g., [3237, 256, 256, 3]
'coordinates'                  # Patch pixel coordinates: [N, 2]

# Metadata
'metadata/original_mpp'        # Original microns per pixel
'metadata/original_width'      # Original image width (level=0)
'metadata/original_height'     # Original image height (level=0)
'metadata/image_level'         # Image level used (typically 0)
'metadata/mpp'                 # Output patch MPP
'metadata/scale'               # Scale factor
'metadata/patch_size'          # Patch size (e.g., 256)
'metadata/patch_count'         # Total patch count
'metadata/cols'                # Grid columns
'metadata/rows'                # Grid rows

# Model features (per model: uni, gigapath, virchow2)
'{model}/features'             # Patch features: [N, D]
                               #   uni: [N, 1024]
                               #   gigapath: [N, 1536]
                               #   virchow2: [N, 2560]
'{model}/latent_features'      # Latent features (optional): [N, K, K, D]
'{model}/clusters'             # Cluster labels: [N]

# Gigapath slide-level (CLI only)
'gigapath/slide_feature'       # Slide-level features: [768]
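Because everything lives in one file, downstream analysis needs nothing beyond h5py. A minimal reading sketch following the layout above; `load_features` is an illustrative helper, not a toolbox API:

```python
import h5py

def load_features(h5_path, model="uni"):
    """Read coordinates, patch features, and (if present) cluster labels."""
    with h5py.File(h5_path, "r") as f:
        coords = f["coordinates"][:]          # [N, 2] patch pixel coordinates
        feats = f[f"{model}/features"][:]     # [N, D] patch embeddings
        key = f"{model}/clusters"             # only present after clustering
        labels = f[key][:] if key in f else None
        patch_size = int(f["metadata/patch_size"][()])
    return coords, feats, labels, patch_size
```

For example, `coords, feats, labels, ps = load_features('output.h5')` after running the `embed` and `cluster` steps.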

Features

  • WSI processing (.ndpi, .svs, .tiff → HDF5)
  • Feature extraction (UNI, Gigapath, Virchow2)
  • Leiden clustering with UMAP visualization
  • Preview generation (cluster overlays, latent PCA)
  • Type-safe command pattern with Pydantic results
  • CLI, Python API, and Streamlit GUI
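The cluster overlays listed above can be approximated offline from the stored datasets alone: each patch's grid cell is its pixel coordinate divided by the patch size. A numpy sketch, assuming `coordinates` holds top-left (x, y) pixel offsets consistent with `metadata/cols` and `metadata/rows` (the helper name is illustrative):

```python
import numpy as np

def cluster_grid(coords, labels, patch_size, rows, cols):
    """Scatter per-patch cluster labels onto the patch grid; -1 marks background."""
    grid = np.full((rows, cols), -1, dtype=np.int64)
    cells = coords // patch_size              # (x, y) pixels -> (col, row) cells
    grid[cells[:, 1], cells[:, 0]] = labels
    return grid
```

Passing the result to matplotlib's `imshow` gives a quick cluster-overlay preview without re-running the toolbox.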

Documentation

  • API Guide - Comprehensive Python API documentation (in Japanese)
  • CLAUDE.md - Development guidelines

Development

Setup Development Environment

# Clone repository
git clone https://github.com/endaaman/wsi-toolbox.git
cd wsi-toolbox

# Install all dependencies
uv sync

# Install with optional gigapath support (uv-only dependency group)
uv sync --group gigapath

# Install build tools
uv sync --group build

Run Tests and Development Tools

# Run CLI
uv run wsi-toolbox --help

# Run Streamlit app
uv run task app

# Run watcher
uv run task watcher

Code Quality (Linting)

This project uses Ruff for linting and code formatting.

# Install dev dependencies (includes ruff)
uv sync --group dev

# Check code quality
uv run ruff check wsi_toolbox/

# Auto-fix issues where possible
uv run ruff check wsi_toolbox/ --fix

# Format code
uv run ruff format wsi_toolbox/

# Check formatting without modifying files
uv run ruff format --check wsi_toolbox/

Note: Linting runs automatically on every push/PR via GitHub Actions.

Build and Deploy

Build Package

# Clean previous builds
uv run task clean

# Build package
uv run task build
# or
python -m build

# Check package integrity
uv run task check
# or
python -m twine check dist/*

Deploy to PyPI

Prerequisites: Install build tools first

uv sync --group build

Deploy:

# Using deploy script (recommended)
./deploy.sh

# Or manually
python -m build
python -m twine check dist/*
python -m twine upload dist/*

Note: Configure your PyPI credentials before deploying:

# Create ~/.pypirc with your API token
# Or use environment variables
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=<your-pypi-token>

License

MIT
