Python reimplementation of scMetabolism for single-cell metabolism analysis
py-scmetabolism
A pure-Python re-implementation of scMetabolism (Wu et al., Cancer Discovery 2021) for quantifying metabolic pathway activity at single-cell resolution.
- AnnData-native: drop-in for the scanpy ecosystem
- No rpy2, no R install required
- 3–45× faster than R scMetabolism through optimized algorithms
- Correlation with R scMetabolism ≥ 0.98 for VISION, ssGSEA, and GSVA, and 0.93 for AUCell (see Benchmarks below)
Install
pip install py-scmetabolism
Quick-start
import scanpy as sc
import py_scmetabolism as scm
adata = sc.read_h5ad("mydata.h5ad")
scm.sc_metabolism_anndata(adata, method="AUCell", metabolism_type="KEGG")
scm.dimplot_metabolism(adata, pathway="Glycolysis / Gluconeogenesis", reduction="umap")
Results are stored in adata:
| Slot | Contents |
|---|---|
| `adata.obsm['X_metabolism']` | cell × pathway score matrix (one row per cell) |
| `adata.uns['metabolism_pathways']` | list of pathway names |
| `adata.uns['metabolism_method']` | scoring method used |
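As a minimal sketch of reading these slots back, the helper below pulls the per-cell scores for one pathway. `pathway_scores` is a hypothetical name, not part of the package API, and it assumes that rows of `X_metabolism` are cells and that its columns follow the order of `metabolism_pathways`. Plain dictionaries stand in for `adata.obsm` and `adata.uns` so the example runs without AnnData.

```python
import numpy as np

def pathway_scores(obsm, uns, pathway):
    """Return the per-cell scores for one pathway.

    Hypothetical helper: assumes rows are cells and that columns follow
    the order of uns['metabolism_pathways'].
    """
    names = list(uns["metabolism_pathways"])
    column = names.index(pathway)  # raises ValueError for an unknown pathway
    return np.asarray(obsm["X_metabolism"])[:, column]

# Stand-ins for adata.obsm and adata.uns (3 cells, 2 pathways, made-up values).
obsm = {"X_metabolism": np.array([[0.1, 0.9],
                                  [0.4, 0.2],
                                  [0.7, 0.5]])}
uns = {
    "metabolism_pathways": ["Glycolysis / Gluconeogenesis",
                            "Citrate cycle (TCA cycle)"],
    "metabolism_method": "AUCell",
}
scores = pathway_scores(obsm, uns, "Glycolysis / Gluconeogenesis")
```

With a real object, the same call would presumably be `pathway_scores(adata.obsm, adata.uns, ...)`, since both attributes behave like mappings.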
Mathematical implementation
Each algorithm below follows the same computational steps as the R reference; measured agreement with R is reported under Benchmarks.
VISION — library-size normalization + z-score
R VISION applies log2 transformation after library-size normalization:
scaled = expression × (median_col_sum / col_sum)
logged = log2(scaled + 1)
z_normed = (logged - col_mean) / col_std # ddof=1
score = mean(z_normed[pathway_genes, ])
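The four VISION steps above can be sketched in NumPy as follows. This is an illustration rather than the package's actual code: `vision_score` is a hypothetical name, and the input is assumed to be a dense genes × cells count matrix in which every cell has a non-zero library size.

```python
import numpy as np

def vision_score(expression, pathway_rows):
    """Score one pathway per cell from a genes x cells count matrix.

    Hypothetical helper; assumes every column (cell) has a non-zero sum
    and non-constant expression, otherwise the divisions below fail.
    """
    expression = np.asarray(expression, dtype=float)
    col_sum = expression.sum(axis=0)                      # library size per cell
    scaled = expression * (np.median(col_sum) / col_sum)  # equalize library sizes
    logged = np.log2(scaled + 1.0)
    col_mean = logged.mean(axis=0)
    col_std = logged.std(axis=0, ddof=1)                  # sample std, as in R's sd()
    z_normed = (logged - col_mean) / col_std
    return z_normed[pathway_rows, :].mean(axis=0)         # mean z-score of pathway genes

# Toy input: 4 genes x 3 cells (made-up counts).
counts = np.array([[10, 0, 3],
                   [2, 5, 1],
                   [0, 8, 4],
                   [6, 1, 0]])
scores = vision_score(counts, [0, 1])
```

One sanity check: a pathway containing every gene scores zero in every cell, because z-scores within a column average to zero.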
AUCell — ordinal ranking recovery curve
- Rank genes by descending expression (ties preserved)
- Take top `aucMaxRank = ceil(0.05 × n_genes)` genes
- Compute AUC of recovery curve (rank vs binary hit/miss)
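A rough sketch of the steps above for a single cell is shown below. `aucell_score` and `top_fraction` are hypothetical names. The area is taken as a simple step-function sum, normalized by the best achievable area, which is an assumption and may differ in detail from the normalization used by the R AUCell package. Ties are broken by input order here.

```python
import math
import numpy as np

def aucell_score(expr_cell, gene_set_idx, top_fraction=0.05):
    """Recovery-curve AUC for one cell (hypothetical helper).

    expr_cell: 1-D expression vector over genes.
    gene_set_idx: non-empty list of gene indices in the pathway.
    """
    expr_cell = np.asarray(expr_cell, dtype=float)
    auc_max_rank = math.ceil(top_fraction * expr_cell.size)
    # Descending order; the stable sort keeps tied genes in input order.
    order = np.argsort(-expr_cell, kind="stable")
    top = order[:auc_max_rank]
    hits = np.isin(top, gene_set_idx)   # hit/miss at ranks 1..auc_max_rank
    recovery = np.cumsum(hits)          # recovery curve
    # Best case: every recoverable pathway gene sits at the very top.
    max_hits = min(len(set(gene_set_idx)), auc_max_rank)
    best = np.minimum(np.arange(1, auc_max_rank + 1), max_hits)
    return recovery.sum() / best.sum()

# Toy cell with 10 genes; a 50% cutoff keeps the example small.
expr = [9, 1, 7, 3, 8, 0, 2, 6, 5, 4]
top_set = aucell_score(expr, [0, 4], top_fraction=0.5)    # two highest genes
low_set = aucell_score(expr, [1, 5], top_fraction=0.5)    # two lowest genes
mixed_set = aucell_score(expr, [0, 8], top_fraction=0.5)  # ranks 1 and 5
```

A gene set concentrated at the top of the ranking scores 1, and a gene set entirely below the cutoff scores 0.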
ssGSEA — rank-based position weighting
- Column ranks with `ties.method="average"`, truncated to integer
- Weight by `|R|^α` (α = 0.25)
- Position weight from descending sort: `pos_weight = n - position + 1`
- Closed-form walk: `sum(Ra × pos_weight) / sum(Ra) - sum_out_pos / (n - k)`
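The closed-form walk above can be sketched for one cell as below. The function names are hypothetical. `sum_out_pos` is read as the summed position weights of genes outside the set, `n` as the number of genes, and `k` as the gene-set size; these readings are assumptions based on the formula, not taken from the package source.

```python
import numpy as np

def average_ranks(x):
    """Ascending ranks with ties averaged, like R's ties.method='average'."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(x.size)
    ranks[order] = np.arange(1, x.size + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    tie_sums = np.bincount(inverse, weights=ranks)
    return tie_sums[inverse] / counts[inverse]

def ssgsea_score(expr_cell, gene_set_idx, alpha=0.25):
    """Closed-form ssGSEA walk statistic for one cell (hypothetical helper)."""
    R = average_ranks(expr_cell).astype(int)      # truncated to integer
    n = R.size
    gene_ranking = np.argsort(-R, kind="stable")  # descending sort
    position = np.empty(n, dtype=int)
    position[gene_ranking] = np.arange(1, n + 1)
    pos_weight = n - position + 1
    Ra = np.abs(R) ** alpha
    in_set = np.zeros(n, dtype=bool)
    in_set[gene_set_idx] = True
    k = in_set.sum()
    step_in = (Ra[in_set] * pos_weight[in_set]).sum() / Ra[in_set].sum()
    step_out = pos_weight[~in_set].sum() / (n - k)  # sum_out_pos / (n - k)
    return step_in - step_out

expr = [5, 1, 4, 2, 3, 0]               # toy cell with 6 genes
high = ssgsea_score(expr, [0, 2])       # the two most expressed genes
low = ssgsea_score(expr, [1, 5])        # the two least expressed genes
unweighted = ssgsea_score(expr, [0, 2], alpha=0.0)
```

Gene sets near the top of the ranking give a positive statistic and gene sets near the bottom a negative one. With `alpha=0.0` the first term reduces to the plain mean position weight of the set.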
GSVA — kernel density estimation
- For each gene, compute `left_tail = mean(ppois(expr[i,j], expr[i,k] + 0.5))`
- Apply logit: `result = -log((1 - left_tail) / left_tail)`
- Column ranks with `ties.method="last"`
- Kuiper walk: `srs = |p/2 - rank|`, `dos = p - rank + 1`
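The sketch below covers the first three steps plus the two rank-derived quantities, stopping short of the Kuiper walk itself. All function names are hypothetical. The Poisson CDF is written out with the standard library so the block has no SciPy dependency. The direct summation underflows for large rates, so a library routine such as `scipy.stats.poisson.cdf` would be the safer choice on real counts. The double loop is written for clarity, not speed.

```python
import math
import numpy as np

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam), the quantity R's ppois(x, lam) returns."""
    x = math.floor(x)
    if x < 0:
        return 0.0
    term = math.exp(-lam)   # P(X = 0); underflows to 0 for very large lam
    total = term
    for m in range(1, x + 1):
        term *= lam / m
        total += term
    return total

def gsva_gene_statistic(expr):
    """Logit of the Poisson-kernel left tail for a genes x cells count matrix."""
    expr = np.asarray(expr, dtype=float)
    n_genes, n_cells = expr.shape
    left_tail = np.empty_like(expr)
    for i in range(n_genes):
        for j in range(n_cells):
            left_tail[i, j] = np.mean(
                [poisson_cdf(expr[i, j], expr[i, k] + 0.5) for k in range(n_cells)]
            )
    return -np.log((1.0 - left_tail) / left_tail)

def rank_ties_last(column):
    """Ascending ranks where, among ties, later entries get smaller ranks."""
    p = column.size
    order = np.lexsort((-np.arange(p), column))  # primary key: value
    ranks = np.empty(p, dtype=int)
    ranks[order] = np.arange(1, p + 1)
    return ranks

counts = np.array([[0, 2, 5],
                   [3, 3, 1],
                   [1, 0, 4],
                   [7, 2, 2]])       # toy data: 4 genes x 3 cells
stat = gsva_gene_statistic(counts)
p = stat.shape[0]
rank = rank_ties_last(stat[:, 0])    # ranks of genes within the first cell
srs = np.abs(p / 2 - rank)           # symmetric rank statistic
dos = p - rank + 1                   # position in the descending order
```

Within a gene, the statistic rises with the count, so the cell expressing the gene most strongly gets the largest value.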
Benchmarks
All timings on a single Intel Xeon node; correlations computed pathway-by-pathway against R scMetabolism on the same input data (3000 cells × 19281 genes, KEGG pathways).
| Method | Python | R | Speedup | Correlation (vs R) |
|---|---|---|---|---|
| VISION | 1.8 s | 83.4 s | 45.7× | 0.9988 |
| AUCell | 3.7 s | 13.0 s | 3.5× | 0.9327 |
| ssGSEA | 7.2 s | 21.5 s | 3.0× | 1.0000 |
| GSVA | 20.4 s | 886.2 s | 43.5× | 0.9870 |
Same algorithms, same inputs: 3.0–45.7× faster, with pathway-level correlations of 0.93 to 1.00 against R.
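For readers who want to run a comparable check on their own data, a pathway-by-pathway correlation could be computed along these lines. Using Pearson's coefficient and the cells × pathways layout are assumptions, since the benchmark description does not state either, and `pathwise_correlation` is a hypothetical name. The numbers in the example are made up and unrelated to the table above.

```python
import numpy as np

def pathwise_correlation(py_scores, r_scores):
    """Pearson correlation per pathway between two cells x pathways matrices.

    Hypothetical helper; Pearson is an assumption, not the documented metric.
    """
    a = np.asarray(py_scores, dtype=float)
    b = np.asarray(r_scores, dtype=float)
    a = a - a.mean(axis=0)   # center each pathway column
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

# Made-up scores: 4 cells x 2 pathways.
python_scores = np.array([[0.10, 0.80],
                          [0.30, 0.60],
                          [0.20, 0.40],
                          [0.50, 0.10]])
r_like_scores = python_scores.copy()
r_like_scores[:, 1] *= -1.0          # flip the second pathway to show r = -1
per_pathway = pathwise_correlation(python_scores, r_like_scores)
```

A single summary figure like those in the table could then be obtained by averaging `per_pathway`, although the aggregation actually used for the benchmark is not specified.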
Notebooks
All notebooks are executed and ship with outputs committed.
| Notebook | What it covers |
|---|---|
| `examples/tutorial.ipynb` | Complete metabolic pathway analysis pipeline on human adipocyte scRNA-seq |
The tutorial covers:
- Loading real single-cell data (3000 cells × 19281 genes)
- Computing pathway activity with VISION, AUCell, ssGSEA, GSVA
- Visualizing results (UMAP, dot plot, box plot)
- Validating Python vs R correlation
- Speed comparison
Supported methods
| Method | Description |
|---|---|
| VISION | Library-size-normalized mean expression with z-score normalization |
| AUCell | Ordinal ranking-based recovery curve AUC within aucMaxRank cutoff |
| ssGSEA | Rank-based enrichment with position-weighted walking |
| GSVA | Kernel density estimation with Kuiper statistic |
Data
Built-in pathway gene sets:
- KEGG metabolism (85 pathways, 1667 unique genes)
- REACTOME metabolism (82 pathways)
Citation
If you use this package, please cite the original scMetabolism paper:
Wu, Y. et al. Spatiotemporal Immune Landscape of Colorectal Cancer Liver Metastasis at Single-Cell Level. Cancer Discovery (2021).
License
GNU GPL-3.0