Python reimplementation of scMetabolism for single-cell metabolism analysis

py-scmetabolism

A pure-Python re-implementation of scMetabolism (Wu et al., Cancer Discovery 2021) for quantifying metabolic pathway activity at single-cell resolution.

  • AnnData-native — drop-in for the scanpy ecosystem
  • No rpy2, no R install required
  • 3–45× faster than R scMetabolism through optimized algorithms
  • Correlation with R scMetabolism between 0.93 and 1.00 depending on method (see Benchmarks)

Install

pip install py-scmetabolism

Quick-start

import scanpy as sc
import py_scmetabolism as scm

adata = sc.read_h5ad("mydata.h5ad")

scm.sc_metabolism_anndata(adata, method="AUCell", metabolism_type="KEGG")

scm.dimplot_metabolism(adata, pathway="Glycolysis / Gluconeogenesis", reduction="umap")

Results are stored in adata:

| Slot | Contents |
|---|---|
| adata.obsm['X_metabolism'] | cell × pathway score matrix (obsm arrays are indexed by cell) |
| adata.uns['metabolism_pathways'] | list of pathway names |
| adata.uns['metabolism_method'] | scoring method used |
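
As an illustration of reading these slots, the sketch below ranks pathways by their mean score across cells. It assumes the score matrix has one row per cell and one column per pathway (AnnData requires `obsm` arrays to have `n_obs` rows). `top_pathways` is a hypothetical helper, not part of py-scmetabolism.

```python
import numpy as np

def top_pathways(scores, pathway_names, n=3):
    """Return the n pathways with the highest mean score across cells.

    scores: array of shape (n_cells, n_pathways), an assumed layout.
    pathway_names: sequence of names, one per column of scores.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.shape[1] != len(pathway_names):
        raise ValueError("expected one pathway name per score column")
    mean_per_pathway = scores.mean(axis=0)
    # Stable sort on the negated means gives descending order.
    order = np.argsort(-mean_per_pathway, kind="stable")[:n]
    return [(pathway_names[i], float(mean_per_pathway[i])) for i in order]
```

With a scored object this would be called as `top_pathways(adata.obsm['X_metabolism'], adata.uns['metabolism_pathways'])`.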

Mathematical implementation

Each algorithm below follows the computation of the R reference; measured agreement with R is reported under Benchmarks.

VISION — library-size normalization + z-score

R VISION applies log2 transformation after library-size normalization:

scaled = expression × (median_col_sum / col_sum)
logged = log2(scaled + 1)
z_normed = (logged - col_mean) / col_std  # ddof=1
score = mean(z_normed[pathway_genes, ])
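
The four lines above translate almost directly to NumPy. The following is a minimal sketch, not the package's implementation: it assumes a dense genes × cells matrix in which every cell has a nonzero total and non-constant expression, and `vision_scores` and `pathway_rows` are illustrative names.

```python
import numpy as np

def vision_scores(expr, pathway_rows):
    """Score one pathway per cell from a genes x cells matrix.

    pathway_rows: row indices of the pathway's genes (hypothetical argument).
    """
    expr = np.asarray(expr, dtype=float)
    col_sum = expr.sum(axis=0)                      # library size per cell
    scaled = expr * (np.median(col_sum) / col_sum)  # library-size normalization
    logged = np.log2(scaled + 1.0)
    # Per-cell z-score over genes, sample standard deviation (ddof=1).
    z_normed = (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)
    return z_normed[pathway_rows, :].mean(axis=0)
```

Because of the library-size step, multiplying one cell's counts by a constant leaves its score unchanged.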

AUCell — ordinal ranking recovery curve

  1. Rank genes by descending expression (ties preserved)
  2. Take top aucMaxRank = ceil(0.05 × n_genes) genes
  3. Compute AUC of recovery curve (rank vs binary hit/miss)
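
The three steps can be sketched for a single cell as follows. This is a simplified illustration under stated assumptions, not AUCell's or this package's code: ties are broken by a stable sort, the area is normalized by the largest area the gene set could reach, and the gene set must be non-empty. `aucell_score` and its arguments are hypothetical names.

```python
import math
import numpy as np

def aucell_score(expr_cell, gene_set_idx, top_fraction=0.05):
    """Normalized area under the recovery curve for one cell."""
    expr_cell = np.asarray(expr_cell, dtype=float)
    gene_set_idx = np.asarray(gene_set_idx, dtype=int)
    n_genes = expr_cell.size
    auc_max_rank = math.ceil(top_fraction * n_genes)

    # Step 1: rank 1 = highest expression (stable sort breaks ties by index).
    order = np.argsort(-expr_cell, kind="stable")
    ranks = np.empty(n_genes, dtype=int)
    ranks[order] = np.arange(1, n_genes + 1)

    # Step 2: keep only gene-set members that fall inside the top ranks.
    hits = ranks[gene_set_idx]
    hits = hits[hits <= auc_max_rank]

    # Step 3: a hit at rank h raises the step curve for ranks h..auc_max_rank.
    area = float(np.sum(auc_max_rank - hits + 1))
    k = min(gene_set_idx.size, auc_max_rank)
    max_area = float(np.sum(auc_max_rank - np.arange(1, k + 1) + 1))
    return area / max_area
```

A gene set whose members all sit at the top of the ranking scores 1, and one with no members inside the cutoff scores 0.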

ssGSEA — rank-based position weighting

  1. Column ranks with ties.method="average", truncated to integer
  2. Weight by |R|^α (α = 0.25)
  3. Position weight from descending sort: pos_weight = n - position + 1
  4. Closed-form walk: sum(Ra × pos_weight) / sum(Ra) - sum_out_pos / (n - k)
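
The closed form in step 4 equals the sum of the running difference between the weighted in-set walk and the out-of-set walk, because a gene at sorted position `pos` contributes to every cumulative sum from `pos` to `n`, that is `n - pos + 1` times. The sketch below checks this numerically for one cell. It takes precomputed integer ranks as input and assumes the gene set is neither empty nor the whole gene list; the function names are illustrative.

```python
import numpy as np

def _sorted_by_rank(ranks, in_set):
    """Sort genes by descending rank and carry the set membership along."""
    ranks = np.asarray(ranks, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    order = np.argsort(-ranks, kind="stable")
    return ranks[order], in_set[order]

def ssgsea_walk_sum(ranks, in_set, alpha=0.25):
    """Explicit random walk: sum over positions of (step_in - step_out)."""
    r_sorted, hit = _sorted_by_rank(ranks, in_set)
    n, k = hit.size, int(hit.sum())
    weights = np.where(hit, np.abs(r_sorted) ** alpha, 0.0)
    step_in = np.cumsum(weights) / weights.sum()
    step_out = np.cumsum(~hit) / (n - k)
    return float(np.sum(step_in - step_out))

def ssgsea_closed_form(ranks, in_set, alpha=0.25):
    """Same quantity without building the walk."""
    r_sorted, hit = _sorted_by_rank(ranks, in_set)
    n, k = hit.size, int(hit.sum())
    pos_weight = n - np.arange(1, n + 1) + 1   # n, n-1, ..., 1
    ra = np.abs(r_sorted[hit]) ** alpha
    sum_out_pos = pos_weight[~hit].sum()
    return float(np.sum(ra * pos_weight[hit]) / ra.sum() - sum_out_pos / (n - k))
```

The closed form avoids the two cumulative sums, which is one plausible source of the speedup, though the source does not state where the time is saved.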

GSVA — kernel density estimation

  1. For each gene, compute left_tail = mean(ppois(expr[i,j], expr[i,k] + 0.5))
  2. Apply logit: result = -log((1 - left_tail) / left_tail)
  3. Column ranks with ties.method="last"
  4. Kuiper walk: srs = |p/2 - rank|, dos = p - rank + 1
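
Steps 1 and 2 can be sketched as below; the ranking and Kuiper walk of steps 3 and 4 are not covered. This is an illustration only: it assumes non-negative integer counts, uses a plain term-by-term Poisson CDF instead of a library routine, and runs in time quadratic in the number of cells. For large counts `left_tail` can round to 1 and the logit would fail, which a real implementation would have to guard against. All function names are hypothetical.

```python
import math
import numpy as np

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam), summed term by term."""
    total = 0.0
    term = math.exp(-lam)          # P(X = 0)
    for i in range(int(math.floor(x)) + 1):
        total += term
        term *= lam / (i + 1)      # P(X = i + 1) from P(X = i)
    return total

def gsva_poisson_logit(expr):
    """Steps 1-2 for a genes x cells count matrix."""
    expr = np.asarray(expr, dtype=float)
    n_genes, n_cells = expr.shape
    out = np.empty_like(expr)
    for i in range(n_genes):
        for j in range(n_cells):
            # Step 1: average the CDF over kernels centred on every cell k.
            left_tail = np.mean(
                [poisson_cdf(expr[i, j], expr[i, k] + 0.5) for k in range(n_cells)]
            )
            # Step 2: logit transform of the left-tail probability.
            out[i, j] = -math.log((1.0 - left_tail) / left_tail)
    return out
```

Within a gene, cells with higher counts receive higher transformed values, which is what the subsequent ranking step relies on.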

Benchmarks

All timings on a single Intel Xeon node; correlations computed pathway-by-pathway against R scMetabolism on the same input data (3000 cells × 19281 genes, KEGG pathways).

| Method | Python time | R time | Speedup | Correlation (vs R) |
|---|---|---|---|---|
| VISION | 1.8 s | 83.4 s | 45.7× | 0.9988 |
| AUCell | 3.7 s | 13.0 s | 3.5× | 0.9327 |
| ssGSEA | 7.2 s | 21.5 s | 3.0× | 1.0000 |
| GSVA | 20.4 s | 886.2 s | 43.5× | 0.9870 |

Same algorithms and inputs: 3.0–45.7× faster, with correlations against R between 0.9327 (AUCell) and 1.0000 (ssGSEA).
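
A pathway-by-pathway comparison of this kind could be computed as in the sketch below. It assumes two cell × pathway score matrices with matching row and column order, one from Python and one exported from R. How the per-pathway values are summarized into the single figure per method reported above is not stated, so the mean mentioned afterwards is only one possible choice. `pathway_correlations` is a hypothetical helper.

```python
import numpy as np

def pathway_correlations(py_scores, r_scores):
    """Pearson correlation between two score matrices, one value per pathway."""
    py_scores = np.asarray(py_scores, dtype=float)
    r_scores = np.asarray(r_scores, dtype=float)
    if py_scores.shape != r_scores.shape:
        raise ValueError("score matrices must have the same shape")
    # Correlate matching columns (pathways) across all cells.
    return np.array([
        np.corrcoef(py_scores[:, p], r_scores[:, p])[0, 1]
        for p in range(py_scores.shape[1])
    ])
```

A single summary value could then be taken as `pathway_correlations(py, r).mean()`.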


Notebooks

All notebooks are executed and ship with outputs committed.

| Notebook | What it covers |
|---|---|
| examples/tutorial.ipynb | Complete metabolic pathway analysis pipeline on human adipocyte scRNA-seq |

The tutorial covers:

  1. Loading real single-cell data (3000 cells × 19281 genes)
  2. Computing pathway activity with VISION, AUCell, ssGSEA, GSVA
  3. Visualizing results (UMAP, dot plot, box plot)
  4. Validating Python vs R correlation
  5. Speed comparison

[Figures: UMAP plot, dot plot, box plot]

Supported methods

| Method | Description |
|---|---|
| VISION | Library-size-normalized mean expression with z-score normalization |
| AUCell | Ordinal ranking-based recovery curve AUC within aucMaxRank cutoff |
| ssGSEA | Rank-based enrichment with position-weighted walking |
| GSVA | Kernel density estimation with Kuiper statistic |

Data

Built-in pathway gene sets:

  • KEGG metabolism (85 pathways, 1667 unique genes)
  • REACTOME metabolism (82 pathways)

Citation

If you use this package, please cite the original scMetabolism paper:

Wu, Y. et al. Spatiotemporal Immune Landscape of Colorectal Cancer Liver Metastasis at Single-Cell Level. Cancer Discovery (2021).

License

GNU GPL-3.0
