Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics

SpatialCheckpoint

Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics data.

SpatialCheckpoint is a bioinformatics pipeline that integrates spatial gene expression profiling, consensus clustering, ensemble ML classification, SHAP interpretability, and clinical survival analysis to characterize immune checkpoint heterogeneity across the tumor microenvironment.

Features

Spatial profiling — region-based checkpoint expression across tumor core, invasive margin, stroma, and immune-enriched zones
80+ spatial features — co-localization scores, spatial gradients, Moran's I autocorrelation, region ratios
Archetype discovery — consensus KMeans + NMF across 6 fixed immune archetypes
Ensemble classification — LightGBM + XGBoost + MLP + Random Forest with SMOTE and Optuna HPO
SHAP interpretability — global and per-class feature importance
Clinical associations — Kaplan-Meier curves, Cox proportional hazards, logistic regression on OS/PFS
Bundled gene panel — 44 curated immune checkpoint genes across 6 functional categories
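
Moran's I, listed among the spatial features above, measures whether nearby spots carry similar expression values. The sketch below is a minimal NumPy illustration, not the package's implementation: the function name `morans_i`, the binary k-nearest-neighbour weights, and the toy grid are all assumptions made for this example.

```python
import numpy as np

def morans_i(values, coords, k=6):
    """Global Moran's I with binary k-nearest-neighbour weights (illustrative)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.shape[0]
    # Pairwise Euclidean distances; the diagonal is masked so a spot is never its own neighbour.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    # w[i, j] = 1 when j is one of the k nearest spots to i.
    neighbours = np.argsort(dists, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    z = values - values.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return 0.0  # constant signal: autocorrelation is undefined, report 0
    return float(n / w.sum() * (z @ w @ z) / denom)

# Toy 10 x 10 grid: a smooth left-to-right gradient versus pure noise.
gx, gy = np.meshgrid(np.arange(10), np.arange(10))
grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
rng = np.random.default_rng(0)
i_smooth = morans_i(grid[:, 0], grid, k=4)
i_noise = morans_i(rng.standard_normal(grid.shape[0]), grid, k=4)
print(f"gradient: {i_smooth:.2f}, noise: {i_noise:.2f}")
```

A spatially structured signal should score well above a random one; values near zero indicate no spatial autocorrelation.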

Installation

pip install spatialcheckpoint

For development:

git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"

Requirements: Python ≥ 3.10

Quick Start

5-Minute Demo (Synthetic Data)

The following demo runs entirely on synthetic data — no real Visium files required.

import numpy as np
import pandas as pd
import scanpy as sc
import spatialcheckpoint as scp

print(f"SpatialCheckpoint v{scp.__version__}")

# ── 1. Gene panel ────────────────────────────────────────────────────────────
genes = scp.get_all_checkpoint_genes()
print(f"Checkpoint panel: {len(genes)} genes")
print(f"  e.g. {genes[:5]}")

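# This category spans several co-inhibitory receptors (PD-1, CTLA-4, LAG-3, ...), not only the PD-1 pathway
receptors = scp.get_category_genes("co_inhibitory_receptors")
print(f"Co-inhibitory receptor genes: {receptors}")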

# ── 2. Synthetic Visium slide ────────────────────────────────────────────────
rng = np.random.default_rng(42)
n_spots, n_genes = 200, 100
checkpoint_genes_subset = genes[:8]
random_genes = [f"GENE{i:04d}" for i in range(n_genes - len(checkpoint_genes_subset))]
all_genes = random_genes + checkpoint_genes_subset

X = rng.negative_binomial(n=2, p=0.5, size=(n_spots, n_genes)).astype(float)
adata = sc.AnnData(X=X)
adata.var_names = pd.Index(all_genes)

# Spatial coordinates (20 × 10 grid)
gx, gy = np.meshgrid(np.arange(20), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
coords += rng.uniform(-0.1, 0.1, size=coords.shape)
adata.obsm["spatial"] = coords

# Region annotations
regions = ["tumor_core", "invasive_margin", "stroma", "immune_enriched", "necrotic"]
region_list = []
for x, y in coords:
    if x < 5 and y < 5:       region_list.append("tumor_core")
    elif x < 10 and y < 8:    region_list.append("invasive_margin")
    elif x >= 15:              region_list.append("immune_enriched")
    elif y >= 8:               region_list.append("necrotic")
    else:                      region_list.append("stroma")
adata.obs["region_type"] = pd.Categorical(region_list, categories=regions)

# ── 3. Spatial feature extraction ───────────────────────────────────────────
engineer = scp.SpatialFeatureEngineer(adata, checkpoint_genes_subset)
features = engineer.extract_all_features(sample_id="demo_sample")
print(f"\nFeature matrix: {features.shape[0]} samples × {features.shape[1]} features")
print(f"  Feature columns (first 5): {list(features.columns[:5])}")

# ── 4. Archetype discovery ───────────────────────────────────────────────────
# Build a multi-sample feature matrix (simulate 30 samples)
n_samples, n_feats = 30, features.shape[1]
feat_data = rng.standard_normal((n_samples, n_feats))
sample_ids = [f"sample_{i:03d}" for i in range(n_samples)]
feature_matrix = pd.DataFrame(feat_data, index=sample_ids, columns=features.columns)

cancer_types = rng.choice(["BRCA", "CRC", "NSCLC"], size=n_samples)
metadata = pd.DataFrame({"cancer_type": cancer_types}, index=sample_ids)

discovery = scp.SpatialArchetypeDiscovery(feature_matrix, metadata)
result = discovery.consensus_clustering(k_range=(2, 5), n_iterations=30)

print("\nConsensus clustering:")
print(f"  Optimal k = {result['optimal_k']}")
print(f"  Label distribution: {dict(pd.Series(result['labels']).value_counts())}")

char_df = discovery.characterize_archetypes(result["labels"])
print("\nArchetype characterization:")
print(char_df[["archetype_name", "n_samples"]].to_string())

# ── 5. NMF soft membership ───────────────────────────────────────────────────
nmf_result = discovery.run_nmf(k=result["optimal_k"])
print("\nNMF decomposition:")
print(f"  W (membership weights): {nmf_result['W'].shape}")
print(f"  H (archetype profiles): {nmf_result['H'].shape}")
print(f"  Explained variance: {nmf_result['explained_variance']:.3f}")
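
NMF factors a non-negative matrix X into W (sample memberships) and H (archetype profiles) so that X ≈ W·H. The demo features above are drawn from a standard normal and so contain negative values; how `run_nmf` handles this is not documented here. The sketch below is a hedged illustration only: it assumes min-max scaling and uses plain Lee-Seung multiplicative updates, and the helper `nmf_multiplicative` is hypothetical, not part of the package.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative X (n x m) into W (n x k) @ H (k x m) with Lee-Seung updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))
    H = rng.uniform(0.1, 1.0, size=(k, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
features = rng.standard_normal((30, 12))   # stand-in feature matrix, contains negatives

# Assumption: min-max scale each column into [0, 1] so the input is non-negative.
col_min = features.min(axis=0)
span = features.max(axis=0) - col_min
X = (features - col_min) / np.where(span == 0, 1.0, span)

W, H = nmf_multiplicative(X, k=3)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"W: {W.shape}, H: {H.shape}, relative error: {rel_error:.3f}")
```

The shapes match the demo output: W is (n_samples, k) and H is (k, n_features).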

Python API

1. Data Preprocessing

import spatialcheckpoint as scp

# From Space Ranger output directory
preprocessor = scp.SpatialDataPreprocessor(spaceranger_out_path="path/to/spaceranger/output")
adata = preprocessor.load_visium()
adata = preprocessor.quality_control(adata, min_genes=200, max_mt_pct=25.0)
adata = preprocessor.normalize(adata)
adata.write_h5ad("data/processed/sample01_preprocessed.h5ad")

# Or from an existing H5AD
preprocessor = scp.SpatialDataPreprocessor(h5_path="existing_data.h5ad")
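
To make the `min_genes` and `max_mt_pct` thresholds concrete, here is a rough NumPy sketch of a spot-level QC filter. It is an assumption about what such thresholds conventionally mean (genes detected per spot, and the share of counts from genes prefixed `MT-`); the helper `qc_mask` is hypothetical and the package's own definitions may differ.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_mt_pct=25.0):
    """Return a boolean mask of spots that pass both QC thresholds."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: human mitochondrial genes carry the "MT-" prefix.
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])
    genes_detected = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    mt_pct = np.divide(
        100.0 * counts[:, is_mt].sum(axis=1), total,
        out=np.zeros_like(total), where=total > 0,
    )
    return (genes_detected >= min_genes) & (mt_pct <= max_mt_pct)

genes = ["MT-CO1", "CD274", "PDCD1", "ACTB"]
counts = [
    [1, 5, 3, 1],   # 4 genes detected, 10% mitochondrial
    [9, 1, 0, 0],   # 90% mitochondrial
    [0, 4, 0, 0],   # only 1 gene detected
]
mask = qc_mask(counts, genes, min_genes=2, max_mt_pct=25.0)
print(mask)
```

Only the first toy spot passes; the second fails the mitochondrial cutoff and the third fails the gene-count cutoff.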

2. Load & Cache

loader = scp.SpatialDataLoader(processed_dir="data/processed/")
adata = loader.load("sample01")   # returns cached .h5ad if present
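
The cache lookup can be pictured with the file-naming convention `data/processed/{sample_id}_preprocessed.h5ad` given under the data contracts. This is a sketch under that assumption; the helpers `preprocessed_path` and `is_cached` are hypothetical names and do not reflect the loader's internals.

```python
from pathlib import Path
import tempfile

def preprocessed_path(processed_dir, sample_id):
    """Build the expected path of a preprocessed sample."""
    return Path(processed_dir) / f"{sample_id}_preprocessed.h5ad"

def is_cached(processed_dir, sample_id):
    """True when the preprocessed file already exists on disk."""
    return preprocessed_path(processed_dir, sample_id).is_file()

with tempfile.TemporaryDirectory() as tmp:
    before = is_cached(tmp, "sample01")          # nothing written yet
    preprocessed_path(tmp, "sample01").touch()   # simulate a finished preprocessing run
    after = is_cached(tmp, "sample01")

print(before, after)
```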

3. Checkpoint Profiling

genes = scp.get_all_checkpoint_genes()   # 44 genes, 6 functional categories

profiler = scp.SpatialCheckpointProfiler(adata, genes)
region_expr = profiler.expression_by_region()   # DataFrame: region × gene
hotspots    = profiler.checkpoint_hotspot_detection()  # Moran's I per gene
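
For intuition, a region × gene table like the one returned by `expression_by_region()` can be approximated with a pandas group-by. This is only a sketch: the choice of the mean as summary statistic and the toy data are assumptions, and the profiler may report different or additional statistics.

```python
import numpy as np
import pandas as pd

region_order = ["tumor_core", "invasive_margin", "stroma", "immune_enriched", "necrotic"]
genes = ["CD274", "PDCD1", "CTLA4"]

# Toy spot-level expression: 12 spots, 4 per region, for 3 of the 5 regions.
rng = np.random.default_rng(1)
spots = pd.DataFrame(rng.poisson(3, size=(12, len(genes))), columns=genes)
spots["region_type"] = pd.Categorical(
    ["tumor_core"] * 4 + ["invasive_margin"] * 4 + ["stroma"] * 4,
    categories=region_order,
)

# observed=True drops region categories that have no spots on this slide.
region_mean = spots.groupby("region_type", observed=True)[genes].mean()
print(region_mean)
```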

4. Spatial Feature Engineering

engineer = scp.SpatialFeatureEngineer(adata, genes)
features  = engineer.extract_all_features(sample_id="sample01")
# → DataFrame with 80+ columns: co-localization, gradients, Moran's I, region ratios

5. Co-localization Analysis

lr_pairs = scp.get_ligand_receptor_pairs()   # [{ligand, receptor, alias}]
analyzer = scp.CheckpointColocalizationAnalyzer(adata, genes)
coloc_df = analyzer.compute_colocalization()
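
One simple way to think about ligand-receptor co-occurrence is the Jaccard index over spots where each gene is detected. The sketch below is an illustrative stand-in, not the analyzer's scoring method (which is not described here); `cooccurrence_jaccard` and the detection threshold are assumptions.

```python
import numpy as np

def cooccurrence_jaccard(ligand, receptor, threshold=0.0):
    """Share of spots expressing both genes among spots expressing either."""
    lig_on = np.asarray(ligand) > threshold
    rec_on = np.asarray(receptor) > threshold
    union = np.logical_or(lig_on, rec_on).sum()
    if union == 0:
        return 0.0  # neither gene detected anywhere
    return float(np.logical_and(lig_on, rec_on).sum() / union)

# Toy counts for six spots (PD-L1 ligand and PD-1 receptor).
cd274 = np.array([0, 2, 5, 0, 1, 0])
pdcd1 = np.array([0, 1, 0, 0, 3, 4])
score = cooccurrence_jaccard(cd274, pdcd1)
print(score)  # 2 shared spots out of 4 expressing either gene -> 0.5
```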

6. Archetype Discovery

# feature_matrix: DataFrame (n_samples × n_features)
# sample_metadata: DataFrame with 'cancer_type' column, same index as feature_matrix
discovery = scp.SpatialArchetypeDiscovery(feature_matrix, sample_metadata)

cc     = discovery.consensus_clustering(k_range=(2, 8), n_iterations=100)
labels = cc["labels"]           # integer cluster labels
char   = discovery.characterize_archetypes(labels)   # archetype names, top features

nmf    = discovery.run_nmf(k=cc["optimal_k"])
# nmf["W"]  → (n_samples, k) soft membership weights
# nmf["H"]  → (k, n_features) archetype profiles
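
The idea behind consensus clustering is to cluster many random subsamples and record how often each pair of samples lands in the same cluster. The sketch below builds such a consensus matrix with scikit-learn's `KMeans`. It is a simplified illustration under assumed settings (80% subsampling, the helper name `consensus_matrix`) and omits the delta-area step used to choose k.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_matrix(X, k, n_iterations=30, subsample=0.8, seed=0):
    """Fraction of subsampled runs in which each sample pair shares a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))   # times a pair was clustered together
    sampled = np.zeros((n, n))    # times a pair was drawn in the same subsample
    m = max(k, int(round(subsample * n)))
    for _ in range(n_iterations):
        idx = rng.choice(n, size=m, replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=int(rng.integers(1_000_000)))
        labels = km.fit_predict(X[idx])
        same = labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    # Pairs never drawn together keep a consensus of 0.
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)

# Two well-separated toy groups of 15 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (15, 4)), rng.normal(5, 0.3, (15, 4))])
C = consensus_matrix(X, k=2)
within = C[:15, :15].mean()
between = C[:15, 15:].mean()
print(f"within-group consensus: {within:.2f}, between-group: {between:.2f}")
```

A clean block structure in the consensus matrix (high within groups, low between) suggests that the chosen k is stable.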

7. Train the Ensemble Classifier

trainer = scp.ArchetypeModelTrainer(
    feature_matrix=feature_matrix,
    archetype_labels=labels,
    output_dir="models/",
)
results = trainer.run(n_optuna_trials=30)
# results["model"]         → trained ensemble
# results["test_metrics"]  → accuracy, F1, AUC

8. SHAP Explanations

explainer = scp.ArchetypeExplainer(results["model"], feature_matrix)
shap_df   = explainer.global_feature_importance()   # DataFrame: feature × archetype

CLI

# Download a registered dataset
spatialcheckpoint download BRCA_visium_10x

# Download all BRCA datasets
spatialcheckpoint download all --cancer-type BRCA

# Preprocess raw Visium output or H5AD
spatialcheckpoint preprocess path/to/spaceranger/  data/processed/
spatialcheckpoint preprocess sample.h5ad           data/processed/

# Run full spatial analysis on a preprocessed sample
spatialcheckpoint analyze sample01

# Discover archetypes from a feature matrix CSV
spatialcheckpoint discover results/sample01/features.csv --k-min 2 --k-max 8

# Train the archetype classifier
spatialcheckpoint classify features.csv archetype_labels.csv --model-dir models/

# Generate publication figures (requires prior analyze run)
spatialcheckpoint figures --results-dir results/ --output-dir paper/figures/

Gene Panel

The bundled panel covers 44 genes across 6 functional categories:

Category	Genes (examples)
Co-inhibitory receptors	`PDCD1` (PD-1), `CTLA4`, `LAG3`, `HAVCR2` (TIM-3), `TIGIT`
Co-inhibitory ligands	`CD274` (PD-L1), `PDCD1LG2` (PD-L2), `LGALS9` (Galectin-9)
Novel checkpoints	`VSIR` (VISTA), `CD276` (B7-H3), `VTCN1` (B7-H4)
Innate checkpoints	`CD47`, `SIRPA`, `LILRB1`, `LILRB2`
Immune enzymes	`IDO1`, `ENTPD1` (CD39), `NT5E` (CD73), `ARG1`
Co-stimulatory reference	`CD28`, `ICOS`, `TNFRSF4` (OX40), `TNFRSF9` (4-1BB)

import spatialcheckpoint as scp

all_genes    = scp.get_all_checkpoint_genes()                      # 44 genes sorted
receptors    = scp.get_category_genes("co_inhibitory_receptors")   # 9 genes
cell_markers = scp.get_immune_cell_markers()                       # {cell_type: [genes]}
lr_pairs     = scp.get_ligand_receptor_pairs()                     # [{ligand, receptor, alias}]

Archetypes

Consensus clusters are labeled with one of six fixed spatial immune archetypes:

Archetype	Spatial signature
`Checkpoint-Hot`	High checkpoint expression, high immune infiltration, strong spatial co-localization
`Checkpoint-Cold`	Low checkpoint and immune activity throughout the tissue
`Checkpoint-Excluded`	Checkpoint expression concentrated at invasive margin; immune cells at periphery
`Checkpoint-Mismatch`	Checkpoint and immune signals spatially separated (non-overlapping)
`Innate-Dominant`	CD47/SIRPα axis dominant over adaptive checkpoints
`Novel-Enriched`	VISTA / B7-H3 / B7-H4 enriched over canonical PD-1/PD-L1 axis

Pipeline Architecture

Raw Visium data (Space Ranger dir or H5AD)
  → SpatialDataPreprocessor      QC, normalize → 'counts' / 'log1p' layers
  → SpatialDataLoader            cache-aware H5AD loader
  → SpatialCheckpointProfiler    region-based expression
                                 (tumor_core, invasive_margin, stroma,
                                  immune_enriched, necrotic)
  → SpatialFeatureEngineer       80+ features per slide:
                                  co-localization, gradients, Moran's I,
                                  region expression ratios
  → SpatialArchetypeDiscovery    consensus KMeans + delta-area k-selection
                                  + NMF soft membership
  → ArchetypeModelTrainer        LightGBM + XGBoost + MLP + RF ensemble,
                                  SMOTE oversampling, RFECV feature selection,
                                  Optuna hyperparameter optimization
  → ArchetypeExplainer           SHAP global / per-class feature importance
  → ClinicalAssociationAnalyzer  KM curves, Cox PH, logistic regression (OS/PFS)
  → Visualization                spatial plots, publication-ready figures

Key data contracts:

Spatial coordinates in adata.obsm['spatial']
Region annotations in adata.obs['region_type'] (categorical)
Preprocessed files: data/processed/{sample_id}_preprocessed.h5ad
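
Before running the pipeline on your own data, it can help to check the first two contracts up front. The function below is a hypothetical helper, not part of the package. It is duck-typed so it works on an AnnData object, and the example uses a `SimpleNamespace` stand-in to stay self-contained.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def check_contracts(adata):
    """Return a list of contract violations (empty when the object conforms)."""
    problems = []
    if "spatial" not in adata.obsm:
        problems.append("missing obsm['spatial']")
    elif np.asarray(adata.obsm["spatial"]).shape[0] != len(adata.obs):
        problems.append("obsm['spatial'] row count differs from number of spots")
    if "region_type" not in adata.obs.columns:
        problems.append("missing obs['region_type']")
    elif not isinstance(adata.obs["region_type"].dtype, pd.CategoricalDtype):
        problems.append("obs['region_type'] is not categorical")
    return problems

# Stand-ins with the same attribute layout as an AnnData object.
good = SimpleNamespace(
    obsm={"spatial": np.zeros((3, 2))},
    obs=pd.DataFrame({"region_type": pd.Categorical(["stroma", "stroma", "necrotic"])}),
)
bad = SimpleNamespace(obsm={}, obs=pd.DataFrame({"region_type": ["stroma"]}))

print(check_contracts(good))
print(check_contracts(bad))
```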

Output Files

Path	Contents
`results/{sample_id}/features.csv`	80+ spatial features
`results/{sample_id}/region_expression.csv`	Region × gene expression stats
`results/{sample_id}/hotspots.csv`	Moran's I per gene
`results/{sample_id}/colocalization.csv`	Ligand-receptor co-occurrence
`results/archetypes/archetype_labels.csv`	Sample → archetype assignment
`results/archetypes/archetype_characteristics.csv`	Per-archetype feature profiles
`results/archetypes/nmf_W.csv`, `nmf_H.csv`	NMF membership weights (W) and archetype profiles (H)
`models/archetype_classifier.joblib`	Serialized ensemble model
`paper/figures/`	Publication-ready PDF/PNG plots
`paper/tables/`	Feature importance and archetype CSV tables

API Reference

Gene Set Utilities

Function	Description
`get_all_checkpoint_genes()`	Sorted list of 44 checkpoint gene symbols
`get_category_genes(category)`	Genes for a specific functional category
`get_immune_cell_markers()`	`{cell_type: [genes]}` reference marker dictionary
`get_ligand_receptor_pairs()`	List of `{ligand, receptor, alias}` pairs

Core Classes

Class	Module	Purpose
`SpatialDataPreprocessor`	`data.preprocess`	QC, normalize, dual-input (Space Ranger or H5AD)
`SpatialDataLoader`	`data.loader`	Cache-aware loader for preprocessed H5ADs
`SpatialCheckpointProfiler`	`analysis.spatial_expression`	Region-based expression, hotspot detection
`SpatialFeatureEngineer`	`analysis.spatial_features`	80+ spatial feature extraction
`CheckpointColocalizationAnalyzer`	`analysis.colocalization`	Ligand-receptor spatial co-occurrence
`SpatialArchetypeDiscovery`	`model.archetype_discovery`	Consensus clustering + NMF
`SpatialArchetypeClassifier`	`model.classifier`	Ensemble classifier (LGBM+XGB+MLP+RF)
`ArchetypeModelTrainer`	`model.trainer`	Full train pipeline with HPO
`ArchetypeExplainer`	`model.explainer`	SHAP global/per-class importance

Development

# Clone and install in dev mode
git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"

# Run tests (uses synthetic fixtures — no real data needed)
pytest tests/ -v

# Lint
ruff check src/

Testing with synthetic data

All tests use synthetic fixtures from tests/conftest.py. No real Visium files are required:

# 200-spot × 100-gene AnnData with spatial coords and region labels
# 50-sample × 80-feature DataFrame
# Clinical data with OS, PFS, ICI response

Dependencies

Core: scanpy, squidpy, anndata, pandas, numpy, scipy, scikit-learn

ML: lightgbm, xgboost, shap, imbalanced-learn, optuna

Stats: lifelines

Viz: matplotlib, seaborn

CLI: typer, rich

Heavy dependencies (squidpy, lightgbm, xgboost, lifelines, optuna, shap, imbalanced-learn) are imported with try/except fallbacks — partial functionality is available even when these are not installed.
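
The try/except fallback pattern typically looks like the sketch below. It is a generic illustration, not the package's actual code; `optional_import` and `require` are hypothetical helper names.

```python
import importlib

def optional_import(module_name):
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

def require(module, name):
    """Raise a clear error when a feature needs a missing optional dependency."""
    if module is None:
        raise ImportError(
            f"{name} is required for this feature; install it with `pip install {name}`"
        )
    return module

# Import succeeds or quietly yields None; the rest of the package stays usable.
lifelines = optional_import("lifelines")
HAS_LIFELINES = lifelines is not None
print(f"lifelines available: {HAS_LIFELINES}")
```

Calling `require(lifelines, "lifelines")` at the start of a survival-analysis function defers the failure to the moment the feature is actually used.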

Citation

If you use SpatialCheckpoint in your research, please cite:

@article{spatialcheckpoint2025,
  title   = {SpatialCheckpoint: Spatial heterogeneity profiling of immune checkpoints
             in spatial transcriptomics},
  author  = {},
  journal = {},
  year    = {2025},
}

License

MIT License — see LICENSE for details.

Release history

0.1.3 (Apr 23, 2026)
0.1.2 (Apr 1, 2026)
0.1.1 (Apr 1, 2026)
0.1.0 (Apr 1, 2026), this version
