Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics
SpatialCheckpoint
Spatial heterogeneity profiling of immune checkpoints in spatial transcriptomics data.
SpatialCheckpoint is a bioinformatics pipeline that integrates spatial gene expression profiling, consensus clustering, ensemble ML classification, SHAP interpretability, and clinical survival analysis to characterize immune checkpoint heterogeneity across the tumor microenvironment.
Features
- Spatial profiling — region-based checkpoint expression across tumor core, invasive margin, stroma, and immune-enriched zones
- 80+ spatial features — co-localization scores, spatial gradients, Moran's I autocorrelation, region ratios
- Archetype discovery — consensus KMeans + NMF across 6 fixed immune archetypes
- Ensemble classification — LightGBM + XGBoost + MLP + Random Forest with SMOTE and Optuna HPO
- SHAP interpretability — global and per-class feature importance
- Clinical associations — Kaplan-Meier curves, Cox proportional hazards, logistic regression on OS/PFS
- Bundled gene panel — 44 curated immune checkpoint genes across 6 functional categories
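Of the features listed above, the Kaplan-Meier curve has a particularly compact definition. The following is a minimal standard-library sketch of the product-limit estimator; it is not taken from the package (which lists lifelines under its Stats dependencies), and the function name `kaplan_meier` and the toy follow-up data are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct observed-event time.

    times  : follow-up durations (e.g. months of OS or PFS)
    events : 1 if the event was observed, 0 if the subject was censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group every subject that shares this time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy cohort: six subjects, two of them censored.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored subjects leave the risk set without producing a step in the curve, which is why the time point 15 does not appear in the output.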
Installation
```bash
pip install spatialcheckpoint
```
For development:
```bash
git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"
```
Requirements: Python ≥ 3.10
Quick Start
5-Minute Demo (Synthetic Data)
The following demo runs entirely on synthetic data — no real Visium files required.
```python
import numpy as np
import pandas as pd
import scanpy as sc
import spatialcheckpoint as scp

print(f"SpatialCheckpoint v{scp.__version__}")

# ── 1. Gene panel ────────────────────────────────────────────────────────────
genes = scp.get_all_checkpoint_genes()
print(f"Checkpoint panel: {len(genes)} genes")
print(f" e.g. {genes[:5]}")
pd1_pathway = scp.get_category_genes("co_inhibitory_receptors")
print(f"PD-1 pathway genes: {pd1_pathway}")

# ── 2. Synthetic Visium slide ────────────────────────────────────────────────
rng = np.random.default_rng(42)
n_spots, n_genes = 200, 100
checkpoint_genes_subset = genes[:8]
random_genes = [f"GENE{i:04d}" for i in range(n_genes - len(checkpoint_genes_subset))]
all_genes = random_genes + checkpoint_genes_subset
X = rng.negative_binomial(n=2, p=0.5, size=(n_spots, n_genes)).astype(float)
adata = sc.AnnData(X=X)
adata.var_names = pd.Index(all_genes)

# Spatial coordinates (20 × 10 grid)
gx, gy = np.meshgrid(np.arange(20), np.arange(10))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
coords += rng.uniform(-0.1, 0.1, size=coords.shape)
adata.obsm["spatial"] = coords

# Region annotations
regions = ["tumor_core", "invasive_margin", "stroma", "immune_enriched", "necrotic"]
region_list = []
for x, y in coords:
    if x < 5 and y < 5:
        region_list.append("tumor_core")
    elif x < 10 and y < 8:
        region_list.append("invasive_margin")
    elif x >= 15:
        region_list.append("immune_enriched")
    elif y >= 8:
        region_list.append("necrotic")
    else:
        region_list.append("stroma")
adata.obs["region_type"] = pd.Categorical(region_list, categories=regions)

# ── 3. Spatial feature extraction ───────────────────────────────────────────
engineer = scp.SpatialFeatureEngineer(adata, checkpoint_genes_subset)
features = engineer.extract_all_features(sample_id="demo_sample")
print(f"\nFeature matrix: {features.shape[0]} samples × {features.shape[1]} features")
print(f" Feature columns (first 5): {list(features.columns[:5])}")

# ── 4. Archetype discovery ───────────────────────────────────────────────────
# Build a multi-sample feature matrix (simulate 30 samples)
n_samples, n_feats = 30, features.shape[1]
feat_data = rng.standard_normal((n_samples, n_feats))
sample_ids = [f"sample_{i:03d}" for i in range(n_samples)]
feature_matrix = pd.DataFrame(feat_data, index=sample_ids, columns=features.columns)
cancer_types = rng.choice(["BRCA", "CRC", "NSCLC"], size=n_samples)
metadata = pd.DataFrame({"cancer_type": cancer_types}, index=sample_ids)
discovery = scp.SpatialArchetypeDiscovery(feature_matrix, metadata)
result = discovery.consensus_clustering(k_range=(2, 5), n_iterations=30)
print(f"\nConsensus clustering:")
print(f" Optimal k = {result['optimal_k']}")
print(f" Label distribution: {dict(pd.Series(result['labels']).value_counts())}")
char_df = discovery.characterize_archetypes(result["labels"])
print(f"\nArchetype characterization:")
print(char_df[["archetype_name", "n_samples"]].to_string())

# ── 5. NMF soft membership ───────────────────────────────────────────────────
nmf_result = discovery.run_nmf(k=result["optimal_k"])
print(f"\nNMF decomposition:")
print(f" W (membership weights): {nmf_result['W'].shape}")
print(f" H (archetype profiles): {nmf_result['H'].shape}")
print(f" Explained variance: {nmf_result['explained_variance']:.3f}")
```
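The W and H shapes printed in step 5 follow from the factorization X ≈ W·H, with one row of W per sample and one row of H per archetype. The sketch below illustrates this with Lee-Seung multiplicative updates in plain NumPy. It is not the package's `run_nmf` implementation; the helper name `nmf_multiplicative` and the min-shift used to make standardized features non-negative are assumptions made for the example.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (n_samples x n_features) as W @ H."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, k)) + eps
    H = rng.random((k, n_features)) + eps
    for _ in range(n_iter):
        # Lee-Seung updates for the Frobenius-norm objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(42)
X_raw = rng.standard_normal((30, 12))      # 30 samples x 12 features
X_nonneg = X_raw - X_raw.min(axis=0)       # assumption: shift each feature to >= 0

W, H = nmf_multiplicative(X_nonneg, k=3)
membership = W / W.sum(axis=1, keepdims=True)   # soft membership, rows sum to 1
rel_error = np.linalg.norm(X_nonneg - W @ H) / np.linalg.norm(X_nonneg)
print(W.shape, H.shape, f"relative error = {rel_error:.3f}")
```

Normalizing each row of W gives a per-sample weight over archetypes, which is one common way to read NMF output as soft membership.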
Python API
1. Data Preprocessing
```python
import spatialcheckpoint as scp

# From Space Ranger output directory
preprocessor = scp.SpatialDataPreprocessor(spaceranger_out_path="path/to/spaceranger/output")
adata = preprocessor.load_visium()
adata = preprocessor.quality_control(adata, min_genes=200, max_mt_pct=25.0)
adata = preprocessor.normalize(adata)
adata.write_h5ad("data/processed/sample01_preprocessed.h5ad")

# Or from an existing H5AD
preprocessor = scp.SpatialDataPreprocessor(h5_path="existing_data.h5ad")
```
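To make the two QC thresholds concrete, here is a small NumPy sketch of a spot filter that keeps spots with at least `min_genes` detected genes and at most `max_mt_pct` percent mitochondrial counts. The parameter names come from the call above; the function `qc_mask`, the way the thresholds are combined, and the toy count matrix are assumptions rather than the package's actual logic.

```python
import numpy as np


def qc_mask(counts, is_mt_gene, min_genes=200, max_mt_pct=25.0):
    """Return a boolean mask of spots that pass both QC thresholds."""
    genes_detected = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    mt_total = counts[:, is_mt_gene].sum(axis=1)
    # Empty spots get a mitochondrial percentage of 0 instead of dividing by 0.
    mt_pct = np.divide(
        100.0 * mt_total,
        total,
        out=np.zeros_like(total, dtype=float),
        where=total > 0,
    )
    return (genes_detected >= min_genes) & (mt_pct <= max_mt_pct)


# Toy matrix: 4 spots x 5 genes; the last gene plays the mitochondrial role.
toy_counts = np.array([
    [5, 3, 2, 0, 1],    # 4 genes detected, about 9% mitochondrial
    [1, 0, 0, 0, 0],    # only 1 gene detected
    [2, 2, 2, 2, 10],   # about 56% mitochondrial
    [0, 0, 0, 0, 0],    # empty spot
])
toy_is_mt = np.array([False, False, False, False, True])
keep = qc_mask(toy_counts, toy_is_mt, min_genes=3, max_mt_pct=25.0)
print(keep)
```

The toy call lowers `min_genes` to 3 only because the example matrix has five genes; real Visium data would use a value on the order of the 200 shown above.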
2. Load & Cache
```python
loader = scp.SpatialDataLoader(processed_dir="data/processed/")
adata = loader.load("sample01")  # returns cached .h5ad if present
```
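The cache lookup can be pictured as a simple path check against the `{sample_id}_preprocessed.h5ad` naming convention used elsewhere in this README. The helper below is a hypothetical standard-library sketch, not the loader's real code; `cached_path` is an invented name.

```python
from pathlib import Path
import tempfile


def cached_path(processed_dir, sample_id):
    """Return the expected preprocessed file path, or None if it is not cached."""
    path = Path(processed_dir) / f"{sample_id}_preprocessed.h5ad"
    return path if path.is_file() else None


# Demonstrate a cache hit and a cache miss in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample01_preprocessed.h5ad").touch()
    hit = cached_path(tmp, "sample01")
    miss = cached_path(tmp, "sample02")
    hit_name = hit.name if hit is not None else None

print(hit_name, miss)
```

On a miss, a loader built this way would fall back to preprocessing and then write the file so that the next call finds it.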
3. Checkpoint Profiling
```python
genes = scp.get_all_checkpoint_genes()              # 44 genes, 6 functional categories
profiler = scp.SpatialCheckpointProfiler(adata, genes)
region_expr = profiler.expression_by_region()       # DataFrame: region × gene
hotspots = profiler.checkpoint_hotspot_detection()  # Moran's I per gene
```
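Moran's I measures whether neighbouring spots carry similar expression: values near 1 indicate spatial clustering and values near 0 indicate no spatial structure. The sketch below computes the global statistic with binary k-nearest-neighbour weights in NumPy. It is an independent illustration and makes no claim about how `checkpoint_hotspot_detection` builds its weights; the function name `morans_i` and the choice of k = 4 are assumptions.

```python
import numpy as np


def morans_i(values, coords, k=4):
    """Global Moran's I with binary k-nearest-neighbour weights."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    # Pairwise Euclidean distances; a spot is never its own neighbour.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    z = values - values.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return float("nan")  # constant expression: the statistic is undefined
    return (n / weights.sum()) * (z @ weights @ z) / denom


# 20 x 10 grid of spots, as in the synthetic demo.
gx, gy = np.meshgrid(np.arange(20), np.arange(10))
grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)

rng = np.random.default_rng(0)
i_gradient = morans_i(grid[:, 0], grid)               # smooth left-to-right gradient
i_noise = morans_i(rng.standard_normal(200), grid)    # no spatial structure
print(f"gradient: {i_gradient:.2f}, noise: {i_noise:.2f}")
```

A gene whose expression rises smoothly across the slide scores close to 1, while spatially random expression scores close to 0.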
4. Spatial Feature Engineering
```python
engineer = scp.SpatialFeatureEngineer(adata, genes)
features = engineer.extract_all_features(sample_id="sample01")
# → DataFrame with 80+ columns: co-localization, gradients, Moran's I, region ratios
```
5. Co-localization Analysis
```python
lr_pairs = scp.get_ligand_receptor_pairs()  # [{ligand, receptor, alias}]
analyzer = scp.CheckpointColocalizationAnalyzer(adata, genes)
coloc_df = analyzer.compute_colocalization()
```
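One simple way to score ligand-receptor co-occurrence is the Jaccard index of the spots in which each gene is detected. The sketch below shows that idea for the CD274/PDCD1 pair on six toy spots. It is a simplified stand-in: the scoring rule, the function name `cooccurrence_score`, and the detection threshold are assumptions, and it ignores spot neighbourhoods, which a spatial analyzer would normally take into account.

```python
import numpy as np


def cooccurrence_score(ligand_expr, receptor_expr, threshold=0.0):
    """Jaccard index of the spots expressing the ligand and the receptor."""
    lig_on = np.asarray(ligand_expr) > threshold
    rec_on = np.asarray(receptor_expr) > threshold
    either = (lig_on | rec_on).sum()
    if either == 0:
        return 0.0  # neither gene detected anywhere
    return float((lig_on & rec_on).sum() / either)


# Toy expression values for six spots (hypothetical data).
cd274 = np.array([3, 0, 2, 0, 5, 0])   # ligand, PD-L1
pdcd1 = np.array([1, 0, 0, 4, 2, 0])   # receptor, PD-1
score = cooccurrence_score(cd274, pdcd1)
print(score)
```

Here two spots express both genes and four express at least one, so the score is 2 / 4.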
6. Archetype Discovery
```python
# feature_matrix: DataFrame (n_samples × n_features)
# sample_metadata: DataFrame with 'cancer_type' column, same index as feature_matrix
discovery = scp.SpatialArchetypeDiscovery(feature_matrix, sample_metadata)
cc = discovery.consensus_clustering(k_range=(2, 8), n_iterations=100)
labels = cc["labels"]                             # integer cluster labels
char = discovery.characterize_archetypes(labels)  # archetype names, top features
nmf = discovery.run_nmf(k=cc["optimal_k"])
# nmf["W"] → (n_samples, k) soft membership weights
# nmf["H"] → (k, n_features) archetype profiles
```
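Consensus clustering repeats a base clusterer on random subsamples and records how often each pair of samples lands in the same cluster. The following NumPy sketch builds such a consensus matrix for a single k. It is illustrative only: a plain Lloyd's k-means stands in for whatever KMeans implementation the package uses, the 80% subsampling fraction is an assumption, and the delta-area selection of k is not shown.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X, k, n_iterations=30, subsample=0.8, seed=0):
    """Fraction of runs in which each pair of samples shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_iterations):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs never drawn together keep a consensus of 0.
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


# Two well-separated synthetic groups of 15 samples each.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0.0, 0.3, (15, 4)), rng.normal(5.0, 0.3, (15, 4))])
C = consensus_matrix(X_demo, k=2)
within = C[:15, :15].mean()
between = C[:15, 15:].mean()
print(f"within-group consensus = {within:.2f}, between-group = {between:.2f}")
```

A clean block structure (within-group values near 1, between-group values near 0) is the signal that a given k is stable; comparing that structure across k is what a k-selection rule operates on.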
7. Train the Ensemble Classifier
```python
trainer = scp.ArchetypeModelTrainer(
    feature_matrix=feature_matrix,
    archetype_labels=labels,
    output_dir="models/",
)
results = trainer.run(n_optuna_trials=30)
# results["model"] → trained ensemble
# results["test_metrics"] → accuracy, F1, AUC
```
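This README does not state how the four base models are combined. A common choice is soft voting, in which the per-class probabilities from each model are averaged and the highest average wins. The sketch below shows that rule in NumPy under that assumption; `soft_vote` and the probability values are hypothetical.

```python
import numpy as np


def soft_vote(probas, weights=None):
    """Average class-probability matrices from several models, then take the argmax."""
    stacked = np.stack(probas)  # (n_models, n_samples, n_classes)
    avg = np.average(stacked, axis=0, weights=weights)
    return avg, avg.argmax(axis=1)


# Hypothetical predicted probabilities from three base models,
# each for 2 samples and 3 archetype classes.
p_a = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
p_b = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
p_c = np.array([[0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])

avg_proba, predicted = soft_vote([p_a, p_b, p_c])
print(avg_proba.round(3))
print(predicted)
```

Passing `weights` (one value per model) turns this into weighted soft voting, which is useful when some base models are consistently stronger on validation data.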
8. SHAP Explanations
```python
explainer = scp.ArchetypeExplainer(results["model"], feature_matrix)
shap_df = explainer.global_feature_importance()  # DataFrame: feature × archetype
```
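For intuition about what a feature × archetype importance table contains, consider the one case where SHAP values have a closed form: a linear model with independent features, where the contribution of feature i to class c is w[c, i] · (x[i] − mean(x[i])). The NumPy sketch below uses that special case. It does not reproduce the package's explainer, which operates on a tree and neural-network ensemble; the weights and data are invented for illustration.

```python
import numpy as np


def linear_shap(weights, X):
    """SHAP values for a linear model with one weight row per class.

    Returns an array of shape (n_samples, n_classes, n_features).
    """
    centred = X - X.mean(axis=0)  # (n_samples, n_features)
    return centred[:, None, :] * weights[None, :, :]


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 4))
# Hypothetical class weights: class 0 relies on feature 0, class 1 on feature 3.
weights = np.array([
    [2.0, 0.0, -1.0, 0.0],
    [0.0, 0.5, 0.0, 3.0],
])

phi = linear_shap(weights, X_demo)
per_class = np.abs(phi).mean(axis=0)        # (n_classes, n_features)
global_importance = per_class.sum(axis=0)   # one value per feature
print(per_class.round(2))
```

Averaging absolute SHAP values over samples gives the per-class importance, and summing over classes gives a single global ranking; transposing `per_class` yields the feature × class layout.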
CLI
```bash
# Download a registered dataset
spatialcheckpoint download BRCA_visium_10x

# Download all BRCA datasets
spatialcheckpoint download all --cancer-type BRCA

# Preprocess raw Visium output or H5AD
spatialcheckpoint preprocess path/to/spaceranger/ data/processed/
spatialcheckpoint preprocess sample.h5ad data/processed/

# Run full spatial analysis on a preprocessed sample
spatialcheckpoint analyze sample01

# Discover archetypes from a feature matrix CSV
spatialcheckpoint discover results/sample01/features.csv --k-min 2 --k-max 8

# Train the archetype classifier
spatialcheckpoint classify features.csv archetype_labels.csv --model-dir models/

# Generate publication figures (requires prior analyze run)
spatialcheckpoint figures --results-dir results/ --output-dir paper/figures/
```
Gene Panel
The bundled panel covers 44 genes across 6 functional categories:
| Category | Genes (examples) |
|---|---|
| Co-inhibitory receptors | PDCD1 (PD-1), CTLA4, LAG3, HAVCR2 (TIM-3), TIGIT |
| Co-inhibitory ligands | CD274 (PD-L1), PDCD1LG2 (PD-L2), LGALS9 (Galectin-9) |
| Novel checkpoints | VSIR (VISTA), CD276 (B7-H3), VTCN1 (B7-H4) |
| Innate checkpoints | CD47, SIRPA, LILRB1, LILRB2 |
| Immune enzymes | IDO1, ENTPD1 (CD39), NT5E (CD73), ARG1 |
| Co-stimulatory reference | CD28, ICOS, TNFRSF4 (OX40), TNFRSF9 (4-1BB) |
```python
import spatialcheckpoint as scp

all_genes = scp.get_all_checkpoint_genes()                       # 44 genes sorted
pd1_pathway = scp.get_category_genes("co_inhibitory_receptors")  # 9 genes
cell_markers = scp.get_immune_cell_markers()                     # {cell_type: [genes]}
lr_pairs = scp.get_ligand_receptor_pairs()                       # [{ligand, receptor, alias}]
```
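A category-keyed dictionary is one natural way to organise such a panel. The sketch below is a partial, illustrative stand-in that contains only the example genes from the table above (23 of the 44). Apart from `co_inhibitory_receptors`, which appears in the API call above, the category keys and helper names are assumptions.

```python
# Partial panel: only the example genes listed in the table, not the full 44.
PANEL = {
    "co_inhibitory_receptors": ["PDCD1", "CTLA4", "LAG3", "HAVCR2", "TIGIT"],
    "co_inhibitory_ligands": ["CD274", "PDCD1LG2", "LGALS9"],
    "novel_checkpoints": ["VSIR", "CD276", "VTCN1"],
    "innate_checkpoints": ["CD47", "SIRPA", "LILRB1", "LILRB2"],
    "immune_enzymes": ["IDO1", "ENTPD1", "NT5E", "ARG1"],
    "co_stimulatory_reference": ["CD28", "ICOS", "TNFRSF4", "TNFRSF9"],
}


def panel_genes(panel):
    """Sorted, de-duplicated list of every gene in the panel."""
    return sorted({gene for genes in panel.values() for gene in genes})


def category_of(panel, gene):
    """Return the category containing the gene, or None if it is not in the panel."""
    for category, genes in panel.items():
        if gene in genes:
            return category
    return None


example_genes = panel_genes(PANEL)
print(len(example_genes), example_genes[:3])
print(category_of(PANEL, "CD274"))
```

Keeping the panel as plain data makes it easy to intersect with `adata.var_names` before profiling, so that genes absent from a given slide are skipped rather than raising an error.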
Archetypes
Six fixed spatial immune archetypes are inferred by consensus clustering:
| Archetype | Spatial signature |
|---|---|
| Checkpoint-Hot | High checkpoint expression, high immune infiltration, strong spatial co-localization |
| Checkpoint-Cold | Low checkpoint and immune activity throughout the tissue |
| Checkpoint-Excluded | Checkpoint expression concentrated at invasive margin; immune cells at periphery |
| Checkpoint-Mismatch | Checkpoint and immune signals spatially separated (non-overlapping) |
| Innate-Dominant | CD47/SIRPα axis dominant over adaptive checkpoints |
| Novel-Enriched | VISTA / B7-H3 / B7-H4 enriched over canonical PD-1/PD-L1 axis |
Pipeline Architecture
```text
Raw Visium data (Space Ranger dir or H5AD)
  → SpatialDataPreprocessor      QC, normalize → 'counts' / 'log1p' layers
  → SpatialDataLoader            cache-aware H5AD loader
  → SpatialCheckpointProfiler    region-based expression
                                 (tumor_core, invasive_margin, stroma,
                                 immune_enriched, necrotic)
  → SpatialFeatureEngineer       80+ features per slide:
                                 co-localization, gradients, Moran's I,
                                 region expression ratios
  → SpatialArchetypeDiscovery    consensus KMeans + delta-area k-selection
                                 + NMF soft membership
  → ArchetypeModelTrainer        LightGBM + XGBoost + MLP + RF ensemble,
                                 SMOTE oversampling, RFECV feature selection,
                                 Optuna hyperparameter optimization
  → ArchetypeExplainer           SHAP global / per-class feature importance
  → ClinicalAssociationAnalyzer  KM curves, Cox PH, logistic regression (OS/PFS)
  → Visualization                spatial plots, publication-ready figures
```
Key data contracts:
- Spatial coordinates in `adata.obsm['spatial']`
- Region annotations in `adata.obs['region_type']` (categorical)
- Preprocessed files: `data/processed/{sample_id}_preprocessed.h5ad`
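The first two contracts can be checked before running any analysis. The sketch below validates an AnnData-like object by duck typing, so it runs here against a lightweight stand-in built from `SimpleNamespace`; the helper `check_contracts` is hypothetical and not part of the package.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def check_contracts(adata):
    """Return a list of contract violations (empty when the object conforms)."""
    problems = []
    spatial = adata.obsm.get("spatial")
    if spatial is None:
        problems.append("missing adata.obsm['spatial']")
    elif np.asarray(spatial).ndim != 2 or np.asarray(spatial).shape[1] < 2:
        problems.append("adata.obsm['spatial'] must have shape (n_spots, 2)")
    if "region_type" not in adata.obs.columns:
        problems.append("missing adata.obs['region_type']")
    elif not isinstance(adata.obs["region_type"].dtype, pd.CategoricalDtype):
        problems.append("adata.obs['region_type'] must be categorical")
    return problems


# Stand-ins that mimic the two AnnData attributes the check touches.
good = SimpleNamespace(
    obsm={"spatial": np.zeros((4, 2))},
    obs=pd.DataFrame(
        {"region_type": pd.Categorical(["stroma", "tumor_core", "stroma", "necrotic"])}
    ),
)
bad = SimpleNamespace(
    obsm={},
    obs=pd.DataFrame({"region_type": ["stroma", "tumor_core"]}),
)

good_problems = check_contracts(good)
bad_problems = check_contracts(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem with a sample in one pass.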
Output Files
| Path | Contents |
|---|---|
| `results/{sample_id}/features.csv` | 80+ spatial features |
| `results/{sample_id}/region_expression.csv` | Region × gene expression stats |
| `results/{sample_id}/hotspots.csv` | Moran's I per gene |
| `results/{sample_id}/colocalization.csv` | Ligand-receptor co-occurrence |
| `results/archetypes/archetype_labels.csv` | Sample → archetype assignment |
| `results/archetypes/archetype_characteristics.csv` | Per-archetype feature profiles |
| `results/archetypes/nmf_W.csv`, `nmf_H.csv` | NMF basis / coefficient matrices |
| `models/archetype_classifier.joblib` | Serialized ensemble model |
| `paper/figures/` | Publication-ready PDF/PNG plots |
| `paper/tables/` | Feature importance and archetype CSV tables |
API Reference
Gene Set Utilities
| Function | Description |
|---|---|
| `get_all_checkpoint_genes()` | Sorted list of 44 checkpoint gene symbols |
| `get_category_genes(category)` | Genes for a specific functional category |
| `get_immune_cell_markers()` | `{cell_type: [genes]}` reference marker dictionary |
| `get_ligand_receptor_pairs()` | List of `{ligand, receptor, alias}` pairs |
Core Classes
| Class | Module | Purpose |
|---|---|---|
| `SpatialDataPreprocessor` | `data.preprocess` | QC, normalize, dual-input (Space Ranger or H5AD) |
| `SpatialDataLoader` | `data.loader` | Cache-aware loader for preprocessed H5ADs |
| `SpatialCheckpointProfiler` | `analysis.spatial_expression` | Region-based expression, hotspot detection |
| `SpatialFeatureEngineer` | `analysis.spatial_features` | 80+ spatial feature extraction |
| `CheckpointColocalizationAnalyzer` | `analysis.colocalization` | Ligand-receptor spatial co-occurrence |
| `SpatialArchetypeDiscovery` | `model.archetype_discovery` | Consensus clustering + NMF |
| `SpatialArchetypeClassifier` | `model.classifier` | Ensemble classifier (LGBM+XGB+MLP+RF) |
| `ArchetypeModelTrainer` | `model.trainer` | Full train pipeline with HPO |
| `ArchetypeExplainer` | `model.explainer` | SHAP global/per-class importance |
Development
```bash
# Clone and install in dev mode
git clone https://github.com/yourorg/SpatialCheckpoint.git
cd SpatialCheckpoint
pip install -e ".[dev]"

# Run tests (uses synthetic fixtures; no real data needed)
pytest tests/ -v

# Lint
ruff check src/
```
Testing with synthetic data
All tests use synthetic fixtures from tests/conftest.py. No real Visium files are required:
```python
# 200-spot × 100-gene AnnData with spatial coords and region labels
# 50-sample × 80-feature DataFrame
# Clinical data with OS, PFS, ICI response
```
Dependencies
Core: scanpy, squidpy, anndata, pandas, numpy, scipy, scikit-learn
ML: lightgbm, xgboost, shap, imbalanced-learn, optuna
Stats: lifelines
Viz: matplotlib, seaborn
CLI: typer, rich
Heavy dependencies (squidpy, lightgbm, xgboost, lifelines, optuna, shap, imbalanced-learn) are imported with try/except fallbacks — partial functionality is available even when these are not installed.
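The try/except fallback pattern described above can be sketched with the standard library alone. The helper names `optional_import` and `require` are illustrative and not taken from the package: the first returns `None` when a module is missing, and the second raises a clear error only at the point where the optional feature is actually used.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(module, name, feature):
    """Fail with a clear message only when an optional feature is requested."""
    if module is None:
        raise ImportError(f"{feature} requires the optional dependency '{name}'")
    return module


json_mod = optional_import("json")  # standard library, always importable
missing_mod = optional_import("a_module_that_does_not_exist_123")

try:
    require(missing_mod, "a_module_that_does_not_exist_123", "Hotspot detection")
    error_message = None
except ImportError as exc:
    error_message = str(exc)

print(json_mod is not None, missing_mod, error_message)
```

Deferring the failure this way keeps `import spatialcheckpoint`-style top-level imports cheap and lets features that do not need a heavy dependency keep working.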
Citation
If you use SpatialCheckpoint in your research, please cite:
```bibtex
@article{spatialcheckpoint2025,
  title   = {SpatialCheckpoint: Spatial heterogeneity profiling of immune checkpoints
             in spatial transcriptomics},
  author  = {},
  journal = {},
  year    = {2025},
}
```
License
MIT License — see LICENSE for details.