Python implementation of the PAGE algorithm
pyPAGE
pyPAGE is a Python implementation of the conditional-information PAGE framework for gene-set enrichment analysis.
It is designed to infer differential activity of pathways and regulons while accounting for annotation and membership biases using information-theoretic methods.
Approach
Bulk PAGE
Standard gene-set enrichment methods test whether pathway members are non-randomly distributed across a ranked gene list. pyPAGE frames this as an information-theoretic question: how much does knowing a gene's pathway membership tell you about its expression bin?
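To make that question concrete, here is a minimal sketch that assumes only NumPy. It is not pyPAGE's own code, and the helper names `equal_frequency_bins` and `mutual_information` are hypothetical. It splits continuous scores into equal-frequency bins and computes a plug-in mutual-information estimate, in bits, between the bin labels and a 0/1 pathway-membership vector.

```python
import numpy as np

def equal_frequency_bins(scores, n_bins):
    """Assign each score to one of n_bins bins holding (almost) equal numbers of genes."""
    scores = np.asarray(scores, dtype=float)
    # Rank the scores (ties broken by input order), then cut the ranks into equal slices.
    ranks = np.argsort(np.argsort(scores, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(scores)

def mutual_information(x, y):
    """Plug-in MI estimate (bits) between two aligned discrete label vectors."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    # Joint frequency table of (x label, y label).
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, b)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# Hypothetical toy data: 1000 genes with random scores, split into 9 bins.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
bins = equal_frequency_bins(scores, 9)
member = bins == 8                          # toy pathway concentrated in the top bin
random_set = rng.random(1000) < 0.1         # toy pathway unrelated to expression

print(mutual_information(bins, member), mutual_information(bins, random_set))
```

Because the toy pathway is a deterministic function of the bins, its MI equals the entropy of the membership vector, while the unrelated gene set scores close to zero (plug-in MI is slightly biased upward on finite samples).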
- Discretize continuous expression scores (e.g. log2 fold-change) into equal-frequency bins
- Compute mutual information (MI) between expression bins and pathway membership — or conditional MI (CMI), which conditions on how many pathways each gene belongs to, correcting for the bias that heavily-annotated genes drive spurious enrichment
- Permutation test to assess significance, with early stopping
- Redundancy filtering removes pathways whose signal is explained by an already-accepted pathway (via CMI between memberships)
- Hypergeometric enrichment per bin produces the iPAGE-style heatmap showing which expression bins drive each pathway's signal
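The annotation-bias correction in the CMI step can be illustrated with a small sketch. It assumes the conditioning variable Z is a per-gene annotation-count group (how pyPAGE actually groups annotation counts is not shown here), computes I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) from plug-in entropies, and uses hypothetical toy data in which both the expression bin and pathway membership depend only on annotation count. The helper names are illustrative, not pyPAGE's API.

```python
import numpy as np

def entropy(*labels):
    """Plug-in joint entropy (bits) of one or more aligned discrete label vectors."""
    stacked = np.column_stack(labels)
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# Hypothetical toy data: 5000 genes, each in an annotation-count group z in {0, 1, 2}.
rng = np.random.default_rng(1)
n = 5000
z = rng.integers(0, 3, size=n)
# Expression bin (0..8) rises with annotation count but is otherwise random.
x = 3 * z + rng.integers(0, 3, size=n)
# Pathway membership also favours heavily annotated genes; given z it is independent of x.
y = rng.random(n) < 0.05 + 0.15 * z

print(f"MI  = {mutual_information(x, y):.4f} bits")
print(f"CMI = {conditional_mi(x, y, z):.4f} bits")
```

In this toy setup plain MI reports a clear association, while the CMI is close to zero because the association is fully explained by how many annotations each gene has.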
Single-Cell PAGE
For single-cell data, the question becomes: are pathway scores spatially coherent across the cell manifold? A pathway whose activity varies smoothly across cell states (rather than randomly) is biologically meaningful.
- Per-cell scoring — for each cell, compute MI or CMI between gene expression bins and pathway membership across all genes. This produces an (n_cells x n_pathways) score matrix
- KNN graph — build a cell-cell k-nearest-neighbor graph from expression (or use a precomputed one from scanpy)
- Geary's C — measure spatial autocorrelation of each pathway's scores on the KNN graph. Report C' = 1 - C, where higher values mean the pathway varies coherently across the manifold rather than randomly
- Permutation test — generate size-matched random gene sets, compute their C', and derive empirical p-values with BH FDR correction
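The Geary's C step and the BH correction can be sketched in NumPy as follows. This is an illustration under stated assumptions, not pyPAGE's implementation: the KNN graph is built by brute force with binary, unsymmetrized weights (pyPAGE can instead use a precomputed scanpy graph), the toy manifold is a hypothetical 1-D trajectory of 300 cells, and the helper names are made up for the example.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Binary k-nearest-neighbour adjacency by brute force (fine for a few thousand cells)."""
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a cell is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]         # indices of the k closest cells
    w = np.zeros(d2.shape)
    w[np.repeat(np.arange(len(coords)), k), nbrs.ravel()] = 1.0
    return w

def gearys_c(values, w):
    """Geary's C = (N-1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i (x_i - mean)^2)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    num = (n - 1) * np.sum(w * (x[:, None] - x[None, :]) ** 2)
    den = 2.0 * w.sum() * np.sum((x - x.mean()) ** 2)
    return num / den

def benjamini_hochberg(pvals):
    """BH-adjusted p-values (q-values) for a 1-D array of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    q_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted
    return q

# Hypothetical toy manifold: 300 cells along a 1-D trajectory, 10 neighbours each.
rng = np.random.default_rng(3)
coords = np.linspace(0.0, 1.0, 300)[:, None]
w = knn_adjacency(coords, k=10)
smooth = coords[:, 0]                  # score that changes smoothly along the trajectory
shuffled = rng.permutation(smooth)     # same values, scattered randomly over cells

c_prime_smooth = 1 - gearys_c(smooth, w)
c_prime_random = 1 - gearys_c(shuffled, w)
print(f"C' smooth = {c_prime_smooth:.3f}, C' shuffled = {c_prime_random:.3f}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A score that varies smoothly along the trajectory gives C' close to 1, while the same values shuffled across cells give C' near 0, which is the contrast the size-matched random gene sets are meant to quantify.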
Installation
Install from PyPI:
pip install bio-pypage
Or install from source:
git clone https://github.com/goodarzilab/pyPAGE
cd pyPAGE
pip install -e .
Quick Start
import pandas as pd
from pypage import PAGE, ExpressionProfile, GeneSets
# 1) Load expression profile (gene, score)
expr = pd.read_csv(
    "example_data/AP2S1.tab.gz",
    sep="\t",
    header=None,
    names=["gene", "score"],
)
exp = ExpressionProfile(expr["gene"], expr["score"], is_bin=True)
# 2) Load annotation (gene, pathway)
ann = pd.read_csv(
    "example_data/GO_BP_2021_index.txt.gz",
    sep="\t",
    header=None,
    names=["gene", "pathway"],
)
gs = GeneSets(ann["gene"], ann["pathway"])
# 3) Run pyPAGE
p = PAGE(exp, gs, n_shuffle=100, k=7, filter_redundant=True)
results, heatmap = p.run()
print(results.head())
heatmap.show()
results contains the following columns:
- pathway: pathway name
- CMI: conditional mutual information score
- z-score: z-score of the observed CMI against the permutation null distribution
- p-value: empirical p-value from the permutation test
- Regulation pattern: 1 for up, -1 for down
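The z-score and empirical p-value columns can be illustrated with a small permutation sketch. For brevity it uses plain MI rather than CMI, builds the null by shuffling the membership vector, and applies an add-one correction to the p-value; these choices, the toy data, and the helper names are assumptions rather than pyPAGE's exact procedure, and pyPAGE's early stopping is omitted.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in MI (bits) between two aligned discrete label vectors."""
    def h(*cols):
        _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    return float(h(x) + h(y) - h(x, y))

def permutation_test(x, y, n_shuffle=100, seed=0):
    """Compare MI(x, y) against a null built by shuffling y (the pathway memberships)."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(x, y)
    null = np.array([mutual_information(x, rng.permutation(y)) for _ in range(n_shuffle)])
    z_score = (observed - null.mean()) / null.std(ddof=1)
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shuffle)
    return observed, z_score, p_value

# Hypothetical toy data: 9 equal-frequency bins of 100 genes each.
rng = np.random.default_rng(2)
bins = np.repeat(np.arange(9), 100)
# Toy pathway strongly enriched in the top expression bin.
member = rng.random(900) < np.where(bins == 8, 0.5, 0.05)

observed, z_score, p_value = permutation_test(bins, member, n_shuffle=100)
print(f"MI = {observed:.4f}, z = {z_score:.1f}, p = {p_value:.4f}")
```

With 100 shuffles the smallest attainable p-value is 1/101, which is what this strongly enriched toy pathway receives; the z-score shows how far above the null it sits.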
Examples
These canonical examples run on the files in example_data/; their outputs are bundled in the same directory.
pypage -e example_data/test_DESeq_logFC.txt \
--gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
--type continuous --n-bins 9 \
--cols GENE,log2FoldChange \
--seed 42 \
--outdir example_data/test_DESeq_logFC_cont_PAGE
pypage -e example_data/test_DESeq_logFC.txt \
--gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
--type discrete \
--cols GENE,log2FoldChange_bin9 \
--seed 42 \
--outdir example_data/test_DESeq_logFC_disc_PAGE
pypage-sc --adata example_data/CRC.h5ad \
--gene-column gene \
--gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
--groupby PhenoGraph_clusters --n-jobs 0 --fast-mode
Expected Outputs (Demo Artifacts)
Bulk continuous (example_data/test_DESeq_logFC_cont_PAGE/):
Bulk discrete (example_data/test_DESeq_logFC_disc_PAGE/):
Single-cell (example_data/CRC_scPAGE/):
- Interactive SC report
- Consistency ranking PDF
- Example UMAP pathway PDF
- Example group-enrichment PDF
- Group-enrichment stats TSV
Preview Graphics (embedded)
Bulk continuous heatmap (PDF | HTML):
Bulk discrete heatmap (PDF | HTML):
Single-cell ranking (PDF | Interactive ranking | SC report):
Single-cell UMAP pathway example (PDF):
Single-cell group-enrichment example (PDF | Stats TSV):
Documentation
The detailed user and API documentation now lives in MANUAL.md.
Updated notebooks:
- Comprehensive Tutorial
- Bulk PAGE Tutorial
- Single-Cell PAGE Tutorial (CRC)
- Single-Cell PAGE Tutorial (Synthetic)
Citation
Bakulin A, Teyssier NB, Kampmann M, Khoroshkin M, Goodarzi H (2024) pyPAGE: A framework for Addressing biases in gene-set enrichment analysis—A case study on Alzheimer's disease. PLoS Computational Biology 20(9): e1012346. https://doi.org/10.1371/journal.pcbi.1012346
License
MIT
About
pyPAGE was developed in the Goodarzi Lab at UCSF by Artemy Bakulin, Noam B. Teyssier, and Hani Goodarzi.
File details
Details for the file bio_pypage-0.2.1.tar.gz.
File metadata
- Download URL: bio_pypage-0.2.1.tar.gz
- Upload date:
- Size: 85.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 598dc73009222e8e3dfaf3af576d4287d6bd4fd4580eacf89f8341cbd46da8e6 |
| MD5 | a21558e862ae25fd982621af878cd4e9 |
| BLAKE2b-256 | 9e87bda3b73537b1b2e11f41475a8e73e7f921cea714d1580e3ec7c8d76c2445 |
File details
Details for the file bio_pypage-0.2.1-py3-none-any.whl.
File metadata
- Download URL: bio_pypage-0.2.1-py3-none-any.whl
- Upload date:
- Size: 73.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26a1f37613e859c3edff52cd835311eb63141e7051246043f76398b55f519618 |
| MD5 | bf45ed26dcd42cbd200b739a9dbeb38b |
| BLAKE2b-256 | 7d6d2e367fec1c4204120b05d9364dd8c7bc57ca01f20a4a7d58dbcd994132db |