
Optimal Transport-based Matrix Factorization for spatial transcriptomics deconvolution.


spOT-NMF


Optimal Transport-Based Matrix Factorization for Accurate Deconvolution of Spatial Transcriptomics. Abdelkareem, A.O. et al. (2025)

spOT-NMF is a Python package for unsupervised deconvolution and discovery of gene programs in spatial transcriptomics. It integrates Optimal Transport (OT) into a non-negative matrix factorization (NMF) framework, enabling robust topic modeling, high-resolution spatial deconvolution, and rich biological annotation.
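The preprint gives the exact objective. To make the OT ingredient concrete, here is a generic entropic OT (Sinkhorn-Knopp) sketch in NumPy. It is not spOT-NMF's solver, and treating its `eps` as analogous to the `eps` model parameter used later in this README is an assumption.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, n_iter=1000):
    """Entropic OT plan between histograms a and b via Sinkhorn-Knopp scaling.

    Generic textbook sketch; not spOT-NMF's internal solver.
    """
    K = np.exp(-cost / eps)               # Gibbs kernel: larger eps -> blurrier plan
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                   # rescale rows to match a
        v = b / (K.T @ u)                 # rescale columns to match b
    return u[:, None] * K * v[None, :]    # plan P: rows sum to ~a, columns to b

# Toy problem: move mass between two 3-bin histograms on a line
x = np.arange(3, dtype=float)
cost = (x[:, None] - x[None, :]) ** 2
cost /= cost.max()                        # scale costs to [0, 1] so eps is comparable
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
P = sinkhorn(a, b, cost)
print(P.round(3), "transport cost:", (P * cost).sum())
```

Smaller `eps` gives a sharper, closer-to-exact plan but makes the scaling iterations converge more slowly.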

This package supports the analyses in: spOT-NMF: Optimal Transport-Based Matrix Factorization for Accurate Deconvolution of Spatial Transcriptomics — bioRxiv (2025). DOI: 10.1101/2025.08.02.668292


✨ Key Features

  • OT-NMF Deconvolution: Reference-free topic modeling with OT-regularized NMF.
  • HVG Selection: Flexible, batch-aware highly variable gene selection.
  • Biological Annotation: Automated enrichment and gene-set overlap of inferred programs.
  • Spatial Visualization: Publication-quality spatial plots for topic/program usage.
  • Scalable & Modular: Built for large datasets and multi-sample workflows.
  • CLI & Python API: Run from the command line or import in notebooks.
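As an illustration of what "batch-aware" HVG selection can mean, here is a hypothetical sketch in the spirit of Scanpy's `batch_key` option: genes are ranked by variance-to-mean dispersion within each batch, and genes that rank highly in more batches are preferred. spOT-NMF's own selection criteria may differ, and `batch_aware_hvgs` is not part of its API.

```python
import numpy as np

def batch_aware_hvgs(X, batches, n_top=2):
    """Hypothetical helper: pick genes that are highly dispersed in the most batches.

    X: spots x genes count matrix; batches: one batch label per spot.
    Returns gene column indices, best first.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    votes = np.zeros(X.shape[1], dtype=int)   # in how many batches a gene is top-ranked
    mean_disp = np.zeros(X.shape[1])          # dispersion averaged over batches
    for label in labels:
        Xb = X[batches == label]
        mean = Xb.mean(axis=0)
        var = Xb.var(axis=0)
        # variance / mean, defined as 0 for genes with no counts in this batch
        disp = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
        top = np.argsort(disp)[::-1][:n_top]
        votes[top] += 1
        mean_disp += disp / len(labels)
    # Primary key: more batch votes; tie-break: higher mean dispersion
    order = np.lexsort((-mean_disp, -votes))
    return order[:n_top]

X = np.array([[0, 10, 1, 5],
              [0, 0, 1, 5],
              [0, 10, 1, 5],
              [0, 0, 1, 6]])
hvgs = batch_aware_hvgs(X, ["a", "a", "b", "b"])
```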

📦 Installation

spOT-NMF requires Python ≥ 3.12. We recommend uv for a fast, reproducible setup. PyTorch is installed separately so you can pick the build (CPU or CUDA) for your platform.

Recommended: uv

# 1. Create and activate an isolated environment (uv fetches Python 3.12 if needed)
uv venv --python 3.12
# Linux/macOS:  source .venv/bin/activate
# Windows:      .venv\Scripts\activate

# 2. Install PyTorch for your platform (see pytorch.org)
#    CPU-only:
uv pip install torch --index-url https://download.pytorch.org/whl/cpu
#    CUDA 12.x (Linux/Windows with NVIDIA GPUs):
#    uv pip install torch --index-url https://download.pytorch.org/whl/cu121

# 3. Install spOT-NMF
uv pip install spot-nmf

Alternative: pip

python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install spot-nmf

From source (development)

git clone https://github.com/MorrissyLab/spOT-NMF.git
cd spOT-NMF
uv venv --python 3.12
uv pip install torch --index-url https://download.pytorch.org/whl/cpu
uv pip install -e ".[dev]"     # editable install with test dependencies
uv run pytest -q               # run the test suite

Verify the install

spotnmf --help

If no GPU is available, spOT-NMF automatically runs on CPU.
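To check ahead of a long run which device your PyTorch build will expose, here is a small sketch using standard PyTorch calls. `pick_device` is a hypothetical helper, not spOT-NMF's internal device logic.

```python
import importlib.util

def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Computation would run on: {pick_device()}")
```

A CPU-only wheel (the `whl/cpu` index above) always reports `cpu`, even on machines with an NVIDIA GPU.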


🚀 Quick Start

Command Line

Full pipeline (deconvolution → annotation → spatial plots → networks):

spotnmf spotnmf \
  --sample_name SAMPLE1 \
  --adata_path ./data/sample1.h5ad \
  --data_mode h5ad \
  --results_dir ./results \
  --k 5 \
  --genome GRCh38

--data_mode selects how the input is read: h5ad for a single AnnData .h5ad file, visium (the default) for a Space Ranger output directory, or visium_hd for Visium HD. Pass --data_mode h5ad whenever --adata_path points to a .h5ad file.
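When scripting over many inputs, the rule above can be encoded in a hypothetical helper. It cannot tell a Visium directory from a Visium HD one, so it assumes `visium` for anything that is not a `.h5ad` file; pass `visium_hd` explicitly when needed.

```python
from pathlib import Path

def guess_data_mode(adata_path: str) -> str:
    """Hypothetical helper: choose a --data_mode value from the input path."""
    if Path(adata_path).suffix.lower() == ".h5ad":
        return "h5ad"    # single AnnData file
    return "visium"      # assumed Space Ranger output directory (the CLI default)

print(guess_data_mode("./data/sample1.h5ad"))
```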

Other commands:

spotnmf deconvolve --sample_name SAMPLE1 --adata_path ./data/sample1.h5ad --data_mode h5ad --results_dir ./results --k 5
spotnmf plot       --sample_name SAMPLE1 --adata_path ./data/sample1.h5ad --data_mode h5ad --results_dir ./results
spotnmf annotate   --sample_name SAMPLE1 --results_dir ./results --genome GRCh38
spotnmf network    --sample_name SAMPLE1 --results_dir ./results --usage_threshold 0 --n_bins 1000 --edge_threshold 0.199
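For multi-sample workflows, the commands above can be scripted. This sketch uses only flags shown above; `build_commands`, `run_all`, and the sample names are hypothetical, and `GRCh38` should be replaced with your genome.

```python
import subprocess

def build_commands(sample, adata_path, results_dir="./results", k=5, genome="GRCh38"):
    """Return the deconvolve -> plot -> annotate argument lists for one sample."""
    ids = ["--sample_name", sample, "--results_dir", results_dir]
    data = ["--adata_path", adata_path, "--data_mode", "h5ad"]
    return [
        ["spotnmf", "deconvolve", *ids, *data, "--k", str(k)],
        ["spotnmf", "plot", *ids, *data],
        ["spotnmf", "annotate", *ids, "--genome", genome],
    ]

def run_all(samples, **kwargs):
    """Run every step for every sample; check=True stops at the first failing command."""
    for sample, adata_path in samples.items():
        for cmd in build_commands(sample, adata_path, **kwargs):
            subprocess.run(cmd, check=True)

# Usage (hypothetical sample names and paths):
# run_all({"SAMPLE1": "./data/sample1.h5ad", "SAMPLE2": "./data/sample2.h5ad"})
```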

The network command reuses the per-spot usages written by deconvolve. On small datasets no topic pairs may pass --n_bins / --edge_threshold; in that case it prints a notice and skips plotting — lower the thresholds to force a graph.
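The exact edge definition, including how `--n_bins` is used, is not described here. As a purely hypothetical illustration of why a strict edge threshold can leave a small dataset with no edges, this sketch connects topics whose per-spot usages are Pearson-correlated above a cutoff; `topic_edges` is not spOT-NMF's algorithm.

```python
import numpy as np

def topic_edges(usage, usage_threshold=0.0, edge_threshold=0.199):
    """Hypothetical co-occurrence graph over topics.

    usage: spots x topics matrix. Usages at or below usage_threshold are zeroed,
    then topic pairs whose usage correlation exceeds edge_threshold become edges.
    """
    U = np.where(usage > usage_threshold, usage, 0.0)
    corr = np.corrcoef(U, rowvar=False)   # topics x topics (NaN for constant topics)
    k = corr.shape[0]
    return [
        (i, j, float(corr[i, j]))
        for i in range(k)
        for j in range(i + 1, k)
        if corr[i, j] > edge_threshold    # NaN comparisons are False, so no edge
    ]

usage = np.array([[0.6, 0.4, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.1, 0.2, 0.7],
                  [0.0, 0.1, 0.9]])
print(topic_edges(usage))
```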

Python / Notebooks

from pathlib import Path
import spotnmf as spot

# === Configuration === #
DATA_PATH = Path("data/test_data/dataset10_adata_spatial.h5ad")
RESULTS_DIR = Path("data/test_results")
SAMPLE_NAME = "TestSample"
GENOME = "mm10"

# === Read Data === #
adata = spot.io.read_adata(
    data_path=DATA_PATH,
    data_mode="h5ad"
)

# === Model Parameters === #
model_params = {
    "lr": 0.001,         # Learning rate
    "h": 0.01,           # H regularization
    "w": 0.01,           # W regularization
    "eps": 0.05,         # Epsilon
    "normalize_rows": True,
}

# === Run Factorization === #
results = spot.cli.run_experiment(
    adata_spatial=adata,
    k=5,                        # Number of topics/programs (factorization rank)
    sample_name=SAMPLE_NAME,
    results_dir=str(RESULTS_DIR),
    genome=GENOME,
    annotate=False,
    plot=False,
    network=False,
    is_visium=True,
    model_params=model_params,
)

# === Annotate Programs === #
spot.cli.annotate_programs(
    results_dir=str(RESULTS_DIR),
    sample_name=SAMPLE_NAME,
    genome=GENOME,
)
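The README does not prescribe how to choose `k`. One common approach is to fit several ranks and compare the resulting programs. Here is a hedged sketch that reuses only the `run_experiment` keyword arguments shown above; `sweep_k` and its `{sample}_k{k}` naming scheme are hypothetical.

```python
from pathlib import Path

def sweep_k(adata, ks, sample_name, results_dir, genome, run=None, **kwargs):
    """Hypothetical helper: run one factorization per rank k under its own sample name."""
    if run is None:
        import spotnmf as spot  # imported lazily so a stub runner can be injected
        run = spot.cli.run_experiment
    results = {}
    for k in ks:
        results[k] = run(
            adata_spatial=adata,
            k=k,
            sample_name=f"{sample_name}_k{k}",   # keeps each rank's outputs separate
            results_dir=str(Path(results_dir)),
            genome=genome,
            annotate=False,
            plot=False,
            network=False,
            **kwargs,                            # e.g. is_visium=..., model_params=...
        )
    return results

# Usage with the configuration above:
# runs = sweep_k(adata, [4, 5, 6, 8], SAMPLE_NAME, RESULTS_DIR, GENOME,
#                is_visium=True, model_params=model_params)
```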

📓 Tutorials

A fully worked, well-commented notebook runs the entire pipeline end-to-end on the small example dataset that ships with the repo (CPU-only, ~1 minute) — loading data, selecting HVGs, running the OT-NMF deconvolution, mapping programs spatially, extracting marker genes, and validating the recovered programs against ground-truth cell types. All figures are pre-rendered in the notebook.

GitHub renders the notebook, including its figures, directly in the browser.
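The notebook's exact validation procedure is not reproduced here. As a hypothetical sketch of one simple way to compare recovered programs with ground truth, each topic is matched to the cell type whose per-spot proportion correlates best with that topic's usage.

```python
import numpy as np

def match_topics_to_cell_types(usage, proportions, cell_types):
    """Hypothetical helper: best-correlated ground-truth cell type for each topic.

    usage: spots x topics; proportions: spots x cell types (same spot order).
    Returns (topic index, cell type, Pearson r) tuples.
    """
    usage = np.asarray(usage, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    k = usage.shape[1]
    # Stack topics and cell types as variables, then keep the topic x cell-type block
    corr = np.corrcoef(usage.T, proportions.T)[:k, k:]
    best = corr.argmax(axis=1)
    return [(t, cell_types[c], float(corr[t, c])) for t, c in enumerate(best)]

props = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]])
print(match_topics_to_cell_types(props[:, ::-1], props, ["A", "B"]))
```

This greedy match can assign two topics to the same cell type; for a one-to-one assignment, `scipy.optimize.linear_sum_assignment` applied to `-corr` is a standard alternative.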


⚙️ CLI Overview

Command       Description
spotnmf       Full pipeline: deconvolution → annotation → spatial plotting → networks
deconvolve    Run OT-NMF and save results
plot          Visualize spatial topic/program usage
annotate      Enrich and annotate gene programs
network       Visualize niche networks based on topic interactions

Run spotnmf <command> --help for per-command options.


📁 Outputs

  • topics_per_spot_{sample}.csv — topic/program usage per spot
  • genescores_per_topic_{sample}.csv — gene scores per topic
  • ranked_genescores_{sample}.csv — ranked marker genes per topic
  • Pathway enrichment and gene-set overlap tables
  • Spatial plots & QC visualizations
  • Network plots of topic–topic interactions
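To summarize `topics_per_spot_{sample}.csv`, here is a standard-library sketch that labels each spot by its dominant topic. The column layout (spot ID first, one column per topic) and the file's location under the results directory are assumptions; check the header of your own file.

```python
import csv

def dominant_topics(lines):
    """Map each spot ID to its highest-usage topic.

    `lines` is any iterable of CSV text lines (e.g. an open file). Assumes one row
    per spot, the spot ID in the first column and one usage column per topic.
    """
    reader = csv.reader(lines)
    topics = next(reader)[1:]            # header: skip the spot-ID column
    result = {}
    for row in reader:
        if not row:
            continue                     # tolerate blank lines
        usages = [float(x) for x in row[1:]]
        result[row[0]] = topics[usages.index(max(usages))]
    return result

# Usage (hypothetical path):
# with open("results/topics_per_spot_SAMPLE1.csv", newline="") as fh:
#     spot_to_topic = dominant_topics(fh)
```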

🔬 Reproducibility (Manuscript Notebooks)

The main branch provides the reusable software package. The original Jupyter notebooks used to reproduce manuscript figures are maintained in the manuscript branch:

git fetch origin
git checkout manuscript

Notebooks are in:

scripts/manuscript_notebooks/

Use manuscript to regenerate paper figures; use main for running the package on your data.


🧾 Citation

Please cite:

Abdelkareem, A.O., Gill, G.S., Manoharan, V.T., Verhey, T.B., & Morrissy, A.S. spOT-NMF: Optimal Transport-Based Matrix Factorization for Accurate Deconvolution of Spatial Transcriptomics. bioRxiv (2025). https://doi.org/10.1101/2025.08.02.668292

@article{abdelkareem2025spotnmf,
  title   = {spOT-NMF: Optimal Transport-Based Matrix Factorization for Accurate Deconvolution of Spatial Transcriptomics},
  author  = {Abdelkareem, Aly O. and Gill, Gurveer S. and Manoharan, Varsha Thoppey and Verhey, Theodore B. and Morrissy, A. Sorana},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.08.02.668292},
  url     = {https://www.biorxiv.org/content/10.1101/2025.08.02.668292v1},
  note    = {Preprint}
}

🤝 Contributing

We welcome ideas, bug reports, and feature requests—please open a GitHub Issue: https://github.com/MorrissyLab/spOT-NMF/issues


📜 License

GPL-3.0. See LICENSE for details.


💬 Support

Questions or need help? Open an Issue: https://github.com/MorrissyLab/spOT-NMF/issues



Download files


Source Distribution

spot_nmf-0.1.2.tar.gz (412.9 kB)

Uploaded Source

Built Distribution


spot_nmf-0.1.2-py3-none-any.whl (405.6 kB)

Uploaded Python 3

File details

Details for the file spot_nmf-0.1.2.tar.gz.

File metadata

  • Download URL: spot_nmf-0.1.2.tar.gz
  • Upload date:
  • Size: 412.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spot_nmf-0.1.2.tar.gz
Algorithm Hash digest
SHA256 678af8b97f8150c58984a1afb78cfce32500bf9f04f8e9bfdac920b743747b49
MD5 56e9c8387482698899639d85b44dc3b2
BLAKE2b-256 df52e188f916c0ea2bb012b08f093c04130301b3984c5cfd20d0cfb5bff1dace
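To check a downloaded file against the SHA256 digest above, here is a standard-library sketch using `hashlib`; the file name in the usage comment assumes the sdist was saved to the current directory. pip can also enforce digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks instead of reading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED_SDIST_SHA256 = "678af8b97f8150c58984a1afb78cfce32500bf9f04f8e9bfdac920b743747b49"

# Usage, after downloading the source distribution:
# assert sha256_of("spot_nmf-0.1.2.tar.gz") == EXPECTED_SDIST_SHA256
```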


Provenance

The following attestation bundles were made for spot_nmf-0.1.2.tar.gz:

Publisher: publish.yml on MorrissyLab/spOT-NMF

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spot_nmf-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: spot_nmf-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 405.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spot_nmf-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4bf5fb1f3b106e1aefbdaddd2b12295aa7e4320169c07d6cd03d0740500f93cd
MD5 3568f6ddd47e7dc60e357410fe29a387
BLAKE2b-256 064a35a097bdbd264de3ee42f9dad67ad3b5b093b911b797d3e6410baba48b77


Provenance

The following attestation bundles were made for spot_nmf-0.1.2-py3-none-any.whl:

Publisher: publish.yml on MorrissyLab/spOT-NMF

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
