Python implementation of the PAGE algorithm

pyPAGE

pyPAGE is a Python implementation of the conditional-information PAGE framework for gene-set enrichment analysis.

It is designed to infer differential activity of pathways and regulons while accounting for annotation and membership biases using information-theoretic methods.

Approach

Bulk PAGE

Standard gene-set enrichment methods test whether pathway members are non-randomly distributed across a ranked gene list. pyPAGE frames this as an information-theoretic question: how much does knowing a gene's pathway membership tell you about its expression bin?

  1. Discretize continuous expression scores (e.g. log2 fold-change) into equal-frequency bins.
  2. Compute the mutual information (MI) between expression bins and pathway membership, or the conditional MI (CMI), which conditions on how many pathways each gene belongs to. Conditioning corrects for the bias whereby heavily annotated genes drive spurious enrichment.
  3. Assess significance with a permutation test that supports early stopping.
  4. Filter redundancy: remove pathways whose signal is explained by an already-accepted pathway (via CMI between memberships).
  5. Compute hypergeometric enrichment per bin to produce the iPAGE-style heatmap, which shows which expression bins drive each pathway's signal.
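
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pyPAGE's implementation: the function names are hypothetical, the toy data are made up, and early stopping and redundancy filtering are left out for brevity.

```python
import math
import random
from collections import Counter

def equal_frequency_bins(scores, n_bins):
    """Rank the scores and split the ranks into n_bins groups of near-equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(scores)
    return bins

def mutual_information(x, y):
    """MI in bits between two equally long sequences of discrete labels."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

def conditional_mutual_information(x, y, z):
    """I(X; Y | Z): MI within each stratum of z, weighted by stratum size."""
    n = len(z)
    total = 0.0
    for value, count in Counter(z).items():
        idx = [i for i in range(n) if z[i] == value]
        total += (count / n) * mutual_information(
            [x[i] for i in idx], [y[i] for i in idx]
        )
    return total

def permutation_pvalue(bins, membership, n_shuffle=200, seed=0):
    """Empirical p-value: how often shuffled bins reach the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(bins, membership)
    shuffled = list(bins)
    hits = 0
    for _ in range(n_shuffle):
        rng.shuffle(shuffled)
        if mutual_information(shuffled, membership) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffle + 1)

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n are pathway members."""
    upper = min(n, N)
    tail = sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / math.comb(M, N)

# Toy data (made up): 90 genes with increasing scores; a hypothetical
# pathway contains exactly the 10 highest-scoring genes.
scores = [i / 10 for i in range(90)]
membership = [1 if i >= 80 else 0 for i in range(90)]

bins = equal_frequency_bins(scores, n_bins=9)
mi, p_value = permutation_pvalue(bins, membership)
top_bin_hits = sum(m for b, m in zip(bins, membership) if b == 8)
top_bin_p = hypergeom_sf(top_bin_hits, M=90, n=10, N=bins.count(8))
print(f"MI = {mi:.3f} bits, permutation p = {p_value:.4f}")
print(f"top-bin over-representation p = {top_bin_p:.3g}")
```

In the CMI variant, `z` would be a per-gene label derived from how many pathways the gene belongs to, so that enrichment is measured within groups of similarly annotated genes.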

Single-Cell PAGE

For single-cell data, the question becomes: are pathway scores spatially coherent across the cell manifold? A pathway whose activity varies smoothly across cell states, rather than randomly from cell to cell, is more likely to reflect real biology than noise.

  1. Per-cell scoring — for each cell, compute MI or CMI between gene expression bins and pathway membership across all genes. This produces an (n_cells x n_pathways) score matrix
  2. KNN graph — build a cell-cell k-nearest-neighbor graph from expression (or use a precomputed one from scanpy)
  3. Geary's C — measure spatial autocorrelation of each pathway's scores on the KNN graph. Report C' = 1 - C, where higher values mean the pathway varies coherently across the manifold rather than randomly
  4. Permutation test — generate size-matched random gene sets, compute their C', and derive empirical p-values with BH FDR correction
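
Steps 2 to 4 can be illustrated with a small standard-library sketch. It assumes a brute-force Euclidean KNN graph with unit edge weights and uses hypothetical function names. The null distribution from size-matched random gene sets is not reproduced here, so the sketch only shows how C' separates a smooth score from a shuffled one and how Benjamini-Hochberg adjustment works.

```python
import math
import random

def knn_graph(points, k):
    """Brute-force k-nearest-neighbour lists using Euclidean distance."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in others[:k]])
    return neighbors

def gearys_c(values, neighbors):
    """Geary's C with weight 1 on every directed KNN edge."""
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values)
    n_edges = sum(len(nb) for nb in neighbors)
    local = sum(
        (values[i] - values[j]) ** 2
        for i, nb in enumerate(neighbors)
        for j in nb
    )
    return (n - 1) * local / (2 * n_edges * total_var)

def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy manifold (made up): 20 cells on a line.
points = [(float(i),) for i in range(20)]
smooth = [float(i) for i in range(20)]  # score changes gradually
noisy = smooth[:]
random.Random(0).shuffle(noisy)         # same values, random placement

neighbors = knn_graph(points, k=2)
c_prime_smooth = 1 - gearys_c(smooth, neighbors)
c_prime_noisy = 1 - gearys_c(noisy, neighbors)
print(f"C' smooth = {c_prime_smooth:.3f}, C' shuffled = {c_prime_noisy:.3f}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The smooth score yields a C' close to 1, while the shuffled score yields a much lower value, which is the contrast the ranking relies on.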

Installation

Install from PyPI:

pip install bio-pypage

Or install from source:

git clone https://github.com/goodarzilab/pyPAGE
cd pyPAGE
pip install -e .

Quick Start

import pandas as pd
from pypage import PAGE, ExpressionProfile, GeneSets

# 1) Load expression profile (gene, score)
expr = pd.read_csv(
    "example_data/AP2S1.tab.gz",
    sep="\t",
    header=None,
    names=["gene", "score"],
)
exp = ExpressionProfile(expr["gene"], expr["score"], is_bin=True)

# 2) Load annotation (gene, pathway)
ann = pd.read_csv(
    "example_data/GO_BP_2021_index.txt.gz",
    sep="\t",
    header=None,
    names=["gene", "pathway"],
)
gs = GeneSets(ann["gene"], ann["pathway"])

# 3) Run pyPAGE
p = PAGE(exp, gs, n_shuffle=100, k=7, filter_redundant=True)
results, heatmap = p.run()

print(results.head())
heatmap.show()

results contains:

  • pathway
  • CMI — conditional mutual information score
  • z-score — z-score of observed CMI vs. permutation null distribution
  • p-value — empirical p-value from permutation test
  • Regulation pattern (1 for up, -1 for down)
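
The Quick Start reads its annotation as a two-column (gene, pathway) table, whereas the command-line examples take gene sets in GMT format. GMT is tab-separated, with a set name, a description, and then the member genes on each line. The sketch below flattens GMT text into (gene, pathway) pairs. `gmt_to_pairs` is a hypothetical helper written for illustration and is not part of pyPAGE, which may provide its own loader (see MANUAL.md).

```python
import io

def gmt_to_pairs(handle):
    """Yield (gene, pathway) pairs from an open GMT file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pathway, _description, *genes = fields
        for gene in genes:
            if gene:  # ignore empty trailing fields
                yield gene, pathway

# Tiny in-memory GMT with two made-up gene sets.
demo = io.StringIO(
    "SET_A\tdescription\tTP53\tMDM2\n"
    "SET_B\tdescription\tTP53\tBRCA1\tATM\n"
)
pairs = list(gmt_to_pairs(demo))
genes = [gene for gene, _ in pairs]
pathways = [pathway for _, pathway in pairs]
print(pairs)
```

The resulting `genes` and `pathways` lists are parallel sequences, the same shape as the `ann["gene"]` and `ann["pathway"]` columns passed to `GeneSets` in the Quick Start.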

Examples

The following canonical commands run on the files bundled in example_data/.

# Bulk, continuous scores
pypage -e example_data/test_DESeq_logFC.txt \
    --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
    --type continuous --n-bins 9 \
    --cols GENE,log2FoldChange \
    --seed 42 \
    --outdir example_data/test_DESeq_logFC_cont_PAGE

# Bulk, pre-binned (discrete) scores
pypage -e example_data/test_DESeq_logFC.txt \
    --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
    --type discrete \
    --cols GENE,log2FoldChange_bin9 \
    --seed 42 \
    --outdir example_data/test_DESeq_logFC_disc_PAGE

# Single-cell
pypage-sc --adata example_data/CRC.h5ad \
    --gene-column gene \
    --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
    --groupby PhenoGraph_clusters --n-jobs 0 --fast-mode

Expected Outputs (Demo Artifacts)

Demo artifacts for the three runs are stored in:

  • Bulk continuous: example_data/test_DESeq_logFC_cont_PAGE/
  • Bulk discrete: example_data/test_DESeq_logFC_disc_PAGE/
  • Single-cell: example_data/CRC_scPAGE/

Preview Graphics (embedded)

  • Bulk continuous heatmap (PDF | HTML)
  • Bulk discrete heatmap (PDF | HTML)
  • Single-cell consistency ranking (PDF | Interactive ranking | SC report)
  • Single-cell UMAP pathway example (PDF)
  • Single-cell group-enrichment example (PDF | Stats TSV)

Documentation

The detailed user and API documentation now lives in MANUAL.md.

Citation

Bakulin A, Teyssier NB, Kampmann M, Khoroshkin M, Goodarzi H (2024) pyPAGE: A framework for Addressing biases in gene-set enrichment analysis—A case study on Alzheimer's disease. PLoS Computational Biology 20(9): e1012346. https://doi.org/10.1371/journal.pcbi.1012346

License

MIT

About

pyPAGE was developed in the Goodarzi Lab at UCSF by Artemy Bakulin, Noam B. Teyssier, and Hani Goodarzi.
