Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics

SpatialCheckpoint

Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics data.

SpatialCheckpoint is a bioinformatics pipeline that integrates spatial gene expression profiling, consensus clustering, ensemble ML classification, SHAP interpretability, and clinical survival analysis to characterize immune checkpoint heterogeneity across the tumor microenvironment.


Features

  • Spatial profiling — region-based checkpoint expression across tumor core, invasive margin, stroma, and immune-enriched zones
  • 80+ spatial features — co-localization scores, spatial gradients, Moran's I autocorrelation, region ratios
  • Archetype discovery — consensus KMeans + NMF across 6 fixed immune archetypes
  • Ensemble classification — LightGBM + XGBoost + MLP + Random Forest with SMOTE and Optuna HPO
  • SHAP interpretability — global and per-class feature importance
  • Clinical associations — Kaplan-Meier curves, Cox proportional hazards, logistic regression on OS/PFS
  • Bundled gene panel — 44 curated immune checkpoint genes across 6 functional categories

Installation

pip install spatialcheckpoint

For development:

git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"

Requirements: Python ≥ 3.10


Quick Start

5-Minute Demo (Synthetic Data)

The following demo runs entirely on synthetic data — no real Visium files required.

import numpy as np
import pandas as pd
import scanpy as sc
import spatialcheckpoint as scp

print(f"SpatialCheckpoint v{scp.__version__}")

# ── 1. Gene panel ────────────────────────────────────────────────────────────
genes = scp.get_all_checkpoint_genes()
print(f"Checkpoint panel: {len(genes)} genes")
print(f"  e.g. {genes[:5]}")

co_inhibitory = scp.get_category_genes("co_inhibitory_receptors")
print(f"Co-inhibitory receptor genes: {co_inhibitory}")

# ── 2. Synthetic Visium slide ────────────────────────────────────────────────
rng = np.random.default_rng(42)
n_spots, n_genes = 200, 100
checkpoint_genes_subset = genes[:8]
random_genes = [f"GENE{i:04d}" for i in range(n_genes - len(checkpoint_genes_subset))]
all_genes = random_genes + checkpoint_genes_subset

X = rng.negative_binomial(n=2, p=0.5, size=(n_spots, n_genes)).astype(float)
adata = sc.AnnData(X=X)
adata.var_names = pd.Index(all_genes)

# Spatial coordinates (20 × 10 grid)
gx, gy = np.meshgrid(np.arange(20), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
coords += rng.uniform(-0.1, 0.1, size=coords.shape)
adata.obsm["spatial"] = coords

# Region annotations
regions = ["tumor_core", "invasive_margin", "stroma", "immune_enriched", "necrotic"]
region_list = []
for x, y in coords:
    if x < 5 and y < 5:       region_list.append("tumor_core")
    elif x < 10 and y < 8:    region_list.append("invasive_margin")
    elif x >= 15:              region_list.append("immune_enriched")
    elif y >= 8:               region_list.append("necrotic")
    else:                      region_list.append("stroma")
adata.obs["region_type"] = pd.Categorical(region_list, categories=regions)

# ── 3. Spatial feature extraction ───────────────────────────────────────────
engineer = scp.SpatialFeatureEngineer(adata, checkpoint_genes_subset)
features = engineer.extract_all_features(sample_id="demo_sample")
print(f"\nFeature matrix: {features.shape[0]} samples × {features.shape[1]} features")
print(f"  Feature columns (first 5): {list(features.columns[:5])}")

# ── 4. Archetype discovery ───────────────────────────────────────────────────
# Build a multi-sample feature matrix (simulate 30 samples)
n_samples, n_feats = 30, features.shape[1]
feat_data = rng.standard_normal((n_samples, n_feats))
sample_ids = [f"sample_{i:03d}" for i in range(n_samples)]
feature_matrix = pd.DataFrame(feat_data, index=sample_ids, columns=features.columns)

cancer_types = rng.choice(["BRCA", "CRC", "NSCLC"], size=n_samples)
metadata = pd.DataFrame({"cancer_type": cancer_types}, index=sample_ids)

discovery = scp.SpatialArchetypeDiscovery(feature_matrix, metadata)
result = discovery.consensus_clustering(k_range=(2, 5), n_iterations=30)

print(f"\nConsensus clustering:")
print(f"  Optimal k = {result['optimal_k']}")
print(f"  Label distribution: {dict(pd.Series(result['labels']).value_counts())}")

char_df = discovery.characterize_archetypes(result["labels"])
print(f"\nArchetype characterization:")
print(char_df[["archetype_name", "n_samples"]].to_string())

# ── 5. NMF soft membership ───────────────────────────────────────────────────
nmf_result = discovery.run_nmf(k=result["optimal_k"])
print(f"\nNMF decomposition:")
print(f"  W (membership weights): {nmf_result['W'].shape}")
print(f"  H (archetype profiles): {nmf_result['H'].shape}")
print(f"  Explained variance: {nmf_result['explained_variance']:.3f}")

Python API

1. Data Preprocessing

import spatialcheckpoint as scp

# From Space Ranger output directory
preprocessor = scp.SpatialDataPreprocessor(spaceranger_out_path="path/to/spaceranger/output")
adata = preprocessor.load_visium()
adata = preprocessor.quality_control(adata, min_genes=200, max_mt_pct=25.0)
adata = preprocessor.normalize(adata)
adata.write_h5ad("data/processed/sample01_preprocessed.h5ad")

# Or from an existing H5AD
preprocessor = scp.SpatialDataPreprocessor(h5_path="existing_data.h5ad")

2. Load & Cache

loader = scp.SpatialDataLoader(processed_dir="data/processed/")
adata = loader.load("sample01")   # returns cached .h5ad if present

3. Checkpoint Profiling

genes = scp.get_all_checkpoint_genes()   # 44 genes, 6 functional categories

profiler = scp.SpatialCheckpointProfiler(adata, genes)
region_expr = profiler.expression_by_region()   # DataFrame: region × gene
hotspots    = profiler.checkpoint_hotspot_detection()  # Moran's I per gene
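
Hotspot detection is described above as Moran's I per gene. As a rough illustration of what that statistic measures, here is a minimal pure-Python sketch of global Moran's I with binary distance-band weights. The function name `morans_i`, the `radius` parameter, and the weighting scheme are assumptions made for this example; the package may compute the statistic differently (for instance through squidpy).

```python
import math

def morans_i(values, coords, radius=1.0):
    """Global Moran's I with binary distance-band weights.

    w_ij = 1 when spots i and j are distinct and at most `radius` apart.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if denom == 0 or w_sum == 0:
        return float("nan")  # constant signal or no neighbouring spots
    return (n / w_sum) * (num / denom)

# Toy 4 x 4 grid of spots (hypothetical data)
coords = [(x, y) for y in range(4) for x in range(4)]
clustered = [1.0 if x < 2 else 0.0 for x, _ in coords]      # left half "on"
checkerboard = [float((x + y) % 2) for x, y in coords]      # alternating pattern

i_clustered = morans_i(clustered, coords)
i_checker = morans_i(checkerboard, coords)
print(round(i_clustered, 3), round(i_checker, 3))  # 0.667 -1.0
```

Values near +1 indicate spatially clustered expression, values near -1 indicate a dispersed (alternating) pattern, and values near 0 indicate no spatial structure.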

4. Spatial Feature Engineering

engineer = scp.SpatialFeatureEngineer(adata, genes)
features  = engineer.extract_all_features(sample_id="sample01")
# → DataFrame with 80+ columns: co-localization, gradients, Moran's I, region ratios
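
One of the feature families listed above is region ratios. The sketch below shows one plausible way such a feature could be derived for a single gene: average the expression per region, then divide one region's mean by another's. The helper names, the pseudocount `eps`, and the toy values are assumptions for illustration and are not taken from the package.

```python
from collections import defaultdict
from statistics import fmean

def region_means(expression, regions):
    """Mean expression of one gene per region label."""
    grouped = defaultdict(list)
    for value, region in zip(expression, regions):
        grouped[region].append(value)
    return {region: fmean(vals) for region, vals in grouped.items()}

def region_ratio(means, numerator, denominator, eps=1e-9):
    """Ratio of two region means; eps avoids division by zero."""
    return means.get(numerator, 0.0) / (means.get(denominator, 0.0) + eps)

# Hypothetical per-spot expression of one checkpoint gene
expression = [4.0, 6.0, 1.0, 1.0, 2.0]
regions = ["invasive_margin", "invasive_margin", "tumor_core", "tumor_core", "stroma"]

means = region_means(expression, regions)
margin_vs_core = region_ratio(means, "invasive_margin", "tumor_core")
print(means["invasive_margin"], means["tumor_core"], round(margin_vs_core, 3))
```

With an AnnData object, the same inputs would come from one column of the expression matrix and from adata.obs['region_type'].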

5. Co-localization Analysis

lr_pairs = scp.get_ligand_receptor_pairs()   # [{ligand, receptor, alias}]
analyzer = scp.CheckpointColocalizationAnalyzer(adata, genes)
coloc_df = analyzer.compute_colocalization()
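
To make the idea of ligand-receptor co-occurrence concrete, the following sketch scores one pair by the Jaccard index of the spots where each gene is expressed. This is only one possible definition, chosen here for simplicity; the function name and the expression threshold are assumptions, and the scores returned by compute_colocalization() may be defined differently.

```python
def cooccurrence_jaccard(ligand, receptor, threshold=0.0):
    """Spots expressing both genes divided by spots expressing either gene."""
    both = 0
    either = 0
    for lig, rec in zip(ligand, receptor):
        lig_on = lig > threshold
        rec_on = rec > threshold
        both += lig_on and rec_on
        either += lig_on or rec_on
    return both / either if either else 0.0

# Hypothetical per-spot counts for a ligand and its receptor
ligand_counts = [3, 0, 2, 0, 1]
receptor_counts = [1, 0, 0, 4, 2]

score = cooccurrence_jaccard(ligand_counts, receptor_counts)
print(score)  # 0.5
```

A spot-level score like this ignores neighbouring spots; a neighbourhood-aware variant would first pool expression over adjacent spots before thresholding.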

6. Archetype Discovery

# feature_matrix: DataFrame (n_samples × n_features)
# sample_metadata: DataFrame with 'cancer_type' column, same index as feature_matrix
discovery = scp.SpatialArchetypeDiscovery(feature_matrix, sample_metadata)

cc     = discovery.consensus_clustering(k_range=(2, 8), n_iterations=100)
labels = cc["labels"]           # integer cluster labels
char   = discovery.characterize_archetypes(labels)   # archetype names, top features

nmf    = discovery.run_nmf(k=cc["optimal_k"])
# nmf["W"]  → (n_samples, k) soft membership weights
# nmf["H"]  → (k, n_features) archetype profiles
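
Consensus clustering typically repeats a base clustering on resampled data and records how often each pair of samples lands in the same cluster. The sketch below builds that consensus matrix from precomputed label runs. The input layout (one dict per run mapping sample index to cluster label) is an assumption for this example and says nothing about how SpatialArchetypeDiscovery stores its iterations.

```python
from itertools import combinations

def consensus_matrix(runs, n_samples):
    """Fraction of runs in which two samples co-cluster, among runs containing both."""
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            if i == j:
                matrix[i][j] = 1.0
            elif sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix

# Three hypothetical resampled runs over four samples.
# Cluster ids are arbitrary per run; only co-membership matters.
runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},          # sample 3 was not drawn in this run
    {0: 0, 1: 1, 2: 1, 3: 1},
]
M = consensus_matrix(runs, n_samples=4)
print([round(v, 3) for v in M[0]])  # [1.0, 0.667, 0.0, 0.0]
```

Entries close to 0 or 1 indicate stable assignments; a k whose consensus matrix is mostly 0s and 1s is usually preferred when selecting the number of clusters.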

7. Train the Ensemble Classifier

trainer = scp.ArchetypeModelTrainer(
    feature_matrix=feature_matrix,
    archetype_labels=labels,
    output_dir="models/",
)
results = trainer.run(n_optuna_trials=30)
# results["model"]         → trained ensemble
# results["test_metrics"]  → accuracy, F1, AUC
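
The text does not say how the four base models are combined. A common choice is soft voting, where per-class probabilities are averaged across models and the highest average wins. The sketch below illustrates that idea on hand-written probabilities; it is an assumption about the combination rule, and `soft_vote` is a hypothetical helper, not part of the package.

```python
def soft_vote(model_probs, weights=None):
    """Average class probabilities across models.

    model_probs[m][s][c] is model m's probability of class c for sample s.
    Returns (predicted_labels, averaged_probabilities).
    """
    n_models = len(model_probs)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = [
        [
            sum(w * probs[s][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        for s in range(n_samples)
    ]
    labels = [max(range(n_classes), key=row.__getitem__) for row in averaged]
    return labels, averaged

# Hypothetical outputs of three models for two samples and three classes
model_a = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
model_b = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
model_c = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

labels, averaged = soft_vote([model_a, model_b, model_c])
print(labels)  # [0, 2]
```

Passing `weights` lets better-validated models count for more in the average.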

8. SHAP Explanations

explainer = scp.ArchetypeExplainer(results["model"], feature_matrix)
shap_df   = explainer.global_feature_importance()   # DataFrame: feature × archetype
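
A feature × archetype importance table is commonly built by averaging the absolute SHAP values of each feature over all samples, separately for each class. The sketch below shows that reduction on a small hand-written array; the nesting order of `shap_values` and the feature names are assumptions for illustration, and the package may aggregate differently.

```python
def mean_abs_shap(shap_values, feature_names):
    """Mean |SHAP| per feature and class.

    shap_values[c][s][f] is the attribution of feature f for sample s
    towards class c. Returns {feature_name: [importance per class]}.
    """
    importance = {}
    for f, name in enumerate(feature_names):
        importance[name] = [
            sum(abs(row[f]) for row in class_vals) / len(class_vals)
            for class_vals in shap_values
        ]
    return importance

# Hypothetical attributions: 2 classes, 2 samples, 2 features
shap_values = [
    [[0.2, -0.1], [-0.4, 0.3]],   # class 0
    [[0.0, 0.5], [0.1, -0.5]],    # class 1
]
importance = mean_abs_shap(shap_values, ["feat_a", "feat_b"])
print({k: [round(v, 3) for v in vals] for k, vals in importance.items()})
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out.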

CLI

# Download a registered dataset
spatialcheckpoint download BRCA_visium_10x

# Download all BRCA datasets
spatialcheckpoint download all --cancer-type BRCA

# Preprocess raw Visium output or H5AD
spatialcheckpoint preprocess path/to/spaceranger/  data/processed/
spatialcheckpoint preprocess sample.h5ad           data/processed/

# Run full spatial analysis on a preprocessed sample
spatialcheckpoint analyze sample01

# Discover archetypes from a feature matrix CSV
spatialcheckpoint discover results/sample01/features.csv --k-min 2 --k-max 8

# Train the archetype classifier
spatialcheckpoint classify features.csv archetype_labels.csv --model-dir models/

# Generate publication figures (requires prior analyze run)
spatialcheckpoint figures --results-dir results/ --output-dir paper/figures/

Gene Panel

The bundled panel covers 44 genes across 6 functional categories:

Category                   Genes (examples)
Co-inhibitory receptors    PDCD1 (PD-1), CTLA4, LAG3, HAVCR2 (TIM-3), TIGIT
Co-inhibitory ligands      CD274 (PD-L1), PDCD1LG2 (PD-L2), LGALS9 (Galectin-9)
Novel checkpoints          VSIR (VISTA), CD276 (B7-H3), VTCN1 (B7-H4)
Innate checkpoints         CD47, SIRPA, LILRB1, LILRB2
Immune enzymes             IDO1, ENTPD1 (CD39), NT5E (CD73), ARG1
Co-stimulatory reference   CD28, ICOS, TNFRSF4 (OX40), TNFRSF9 (4-1BB)

import spatialcheckpoint as scp

all_genes     = scp.get_all_checkpoint_genes()                      # 44 genes, sorted
co_inhibitory = scp.get_category_genes("co_inhibitory_receptors")   # 9 genes
cell_markers  = scp.get_immune_cell_markers()                       # {cell_type: [genes]}
lr_pairs      = scp.get_ligand_receptor_pairs()                     # [{ligand, receptor, alias}]

Archetypes

Six fixed spatial immune archetypes are inferred by consensus clustering:

Archetype             Spatial signature
Checkpoint-Hot        High checkpoint expression, high immune infiltration, strong spatial co-localization
Checkpoint-Cold       Low checkpoint and immune activity throughout the tissue
Checkpoint-Excluded   Checkpoint expression concentrated at invasive margin; immune cells at periphery
Checkpoint-Mismatch   Checkpoint and immune signals spatially separated (non-overlapping)
Innate-Dominant       CD47/SIRPα axis dominant over adaptive checkpoints
Novel-Enriched        VISTA / B7-H3 / B7-H4 enriched over canonical PD-1/PD-L1 axis

Pipeline Architecture

Raw Visium data (Space Ranger dir or H5AD)
  → SpatialDataPreprocessor      QC, normalize → 'counts' / 'log1p' layers
  → SpatialDataLoader            cache-aware H5AD loader
  → SpatialCheckpointProfiler    region-based expression
                                 (tumor_core, invasive_margin, stroma,
                                  immune_enriched, necrotic)
  → SpatialFeatureEngineer       80+ features per slide:
                                  co-localization, gradients, Moran's I,
                                  region expression ratios
  → SpatialArchetypeDiscovery    consensus KMeans + delta-area k-selection
                                  + NMF soft membership
  → ArchetypeModelTrainer        LightGBM + XGBoost + MLP + RF ensemble,
                                  SMOTE oversampling, RFECV feature selection,
                                  Optuna hyperparameter optimization
  → ArchetypeExplainer           SHAP global / per-class feature importance
  → ClinicalAssociationAnalyzer  KM curves, Cox PH, logistic regression (OS/PFS)
  → Visualization                spatial plots, publication-ready figures
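
The ClinicalAssociationAnalyzer stage reports Kaplan-Meier curves. For readers unfamiliar with the estimator, here is a minimal pure-Python sketch of the product-limit calculation on toy data. It is illustrative only: the package lists lifelines as its statistics dependency, and the function name and inputs below are assumptions.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time per patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Hypothetical overall-survival data (months); 0 marks a censored patient
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
print([(t, round(s, 3)) for t, s in curve])  # [(5, 0.8), (8, 0.6), (12, 0.3)]
```

Censored patients leave the risk set without lowering the curve, which is why the drop at month 12 is larger than the earlier ones.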

Key data contracts:

  • Spatial coordinates in adata.obsm['spatial']
  • Region annotations in adata.obs['region_type'] (categorical)
  • Preprocessed files: data/processed/{sample_id}_preprocessed.h5ad
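
These contracts can be checked before running the pipeline. The sketch below builds the expected preprocessed file path and reports which required keys are missing. Both helpers are hypothetical and written for this example; they are not part of the spatialcheckpoint API.

```python
from pathlib import Path

def preprocessed_path(sample_id, processed_dir="data/processed"):
    """Expected location of a preprocessed sample, following the naming contract."""
    return Path(processed_dir) / f"{sample_id}_preprocessed.h5ad"

def missing_contracts(obs_columns, obsm_keys):
    """Return the required AnnData fields that are absent."""
    missing = []
    if "spatial" not in obsm_keys:
        missing.append("obsm['spatial']")
    if "region_type" not in obs_columns:
        missing.append("obs['region_type']")
    return missing

path = preprocessed_path("sample01")
print(path.as_posix())   # data/processed/sample01_preprocessed.h5ad
print(path.exists())     # True only if the sample has already been preprocessed

# Plain lists stand in for adata.obs.columns and adata.obsm.keys()
problems = missing_contracts(obs_columns=["region_type"], obsm_keys=[])
print(problems)          # ["obsm['spatial']"]
```

With a real AnnData object the call would be missing_contracts(adata.obs.columns, adata.obsm.keys()).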

Output Files

Path                                               Contents
results/{sample_id}/features.csv                   80+ spatial features
results/{sample_id}/region_expression.csv          Region × gene expression stats
results/{sample_id}/hotspots.csv                   Moran's I per gene
results/{sample_id}/colocalization.csv             Ligand-receptor co-occurrence
results/archetypes/archetype_labels.csv            Sample → archetype assignment
results/archetypes/archetype_characteristics.csv   Per-archetype feature profiles
results/archetypes/nmf_W.csv, nmf_H.csv            NMF membership weights (W) and archetype profiles (H)
models/archetype_classifier.joblib                 Serialized ensemble model
paper/figures/                                     Publication-ready PDF/PNG plots
paper/tables/                                      Feature importance and archetype CSV tables

API Reference

Gene Set Utilities

Function                       Description
get_all_checkpoint_genes()     Sorted list of 44 checkpoint gene symbols
get_category_genes(category)   Genes for a specific functional category
get_immune_cell_markers()      {cell_type: [genes]} reference marker dictionary
get_ligand_receptor_pairs()    List of {ligand, receptor, alias} pairs

Core Classes

Class                             Module                       Purpose
SpatialDataPreprocessor           data.preprocess              QC, normalize, dual-input (Space Ranger or H5AD)
SpatialDataLoader                 data.loader                  Cache-aware loader for preprocessed H5ADs
SpatialCheckpointProfiler         analysis.spatial_expression  Region-based expression, hotspot detection
SpatialFeatureEngineer            analysis.spatial_features    80+ spatial feature extraction
CheckpointColocalizationAnalyzer  analysis.colocalization      Ligand-receptor spatial co-occurrence
SpatialArchetypeDiscovery         model.archetype_discovery    Consensus clustering + NMF
SpatialArchetypeClassifier        model.classifier             Ensemble classifier (LGBM+XGB+MLP+RF)
ArchetypeModelTrainer             model.trainer                Full train pipeline with HPO
ArchetypeExplainer                model.explainer              SHAP global/per-class importance

Development

# Clone and install in dev mode
git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"

# Run tests (uses synthetic fixtures — no real data needed)
pytest tests/ -v

# Lint
ruff check src/

Testing with synthetic data

All tests use synthetic fixtures from tests/conftest.py, so no real Visium files are required:

  • 200-spot × 100-gene AnnData with spatial coords and region labels
  • 50-sample × 80-feature DataFrame
  • Clinical data with OS, PFS, ICI response

Dependencies

Core: scanpy, squidpy, anndata, pandas, numpy, scipy, scikit-learn

ML: lightgbm, xgboost, shap, imbalanced-learn, optuna

Stats: lifelines

Viz: matplotlib, seaborn

CLI: typer, rich

Heavy dependencies (squidpy, lightgbm, xgboost, lifelines, optuna, shap, imbalanced-learn) are imported with try/except fallbacks — partial functionality is available even when these are not installed.
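
The try/except fallback pattern described above can be sketched as follows. This is a generic illustration of optional imports in Python, not the package's actual code; `optional_import` and `available` are hypothetical helper names, and standard-library modules stand in for the heavy dependencies so the example runs anywhere.

```python
import importlib

def optional_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

def available(names):
    """Subset of `names` that can be imported in this environment."""
    return [name for name in names if optional_import(name) is not None]

# 'json' ships with Python; the second name is deliberately bogus
json_mod = optional_import("json")
missing_mod = optional_import("surely_not_an_installed_module_xyz")

print(json_mod is not None)   # True
print(missing_mod is None)    # True
print(available(["json", "surely_not_an_installed_module_xyz"]))  # ['json']
```

Code that depends on an optional backend can then check for None and either skip that step or raise a clear error telling the user which extra package to install.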


Citation

If you use SpatialCheckpoint in your research, please cite:

@article{spatialcheckpoint2025,
  title   = {SpatialCheckpoint: Spatial heterogeneity profiling of immune checkpoints
             in spatial transcriptomics},
  author  = {},
  journal = {},
  year    = {2025},
}

License

MIT License — see LICENSE for details.
