
Integrative pathway enrichment analysis of multivariate omics data (Python port of the ActivePathways R package).


ActivePathways (Python)

ActivePathways is a tool for integrative pathway enrichment analysis of multi-omics data. Given a matrix of genes and their p-values across multiple omics datasets, it identifies gene sets (such as pathways or Gene Ontology terms) that are over-represented among the significant genes. By merging p-values across datasets, ActivePathways can surface biological signal that is not detectable in any single dataset.

This is a Python port of the R ActivePathways package, preserving exact functionality and numerical output.

Citation

If you use ActivePathways please cite:

ActivePathways 2.0 (directional integration): Mykhaylo Slobodyanyuk, Alexander T. Bahcheli, et al. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications 15, 5690 (2024). doi:10.1038/s41467-024-49986-4

ActivePathways 1.0: Marta Paczkowska, Jonathan Barenboim, et al. Integrative pathway enrichment analysis of multivariate omics data. Nature Communications 11, 735 (2020). doi:10.1038/s41467-019-13983-9

Installation

git clone https://github.com/abahcheli/ActivePathways_python.git
cd ActivePathways_python
pip install -e .

To build a distributable conda package instead (the recipe builds from the PyPI source tarball):

conda build conda.recipe/
conda install --use-local activepathways

Note: conda.recipe/meta.yaml fetches the source tarball from PyPI, so the placeholder SHA256 in the recipe must be replaced with the real tarball hash. For a local build before the package is published to PyPI, temporarily switch the source section to path: .. instead.

Dependencies: numpy>=1.22, pandas>=1.5, scipy>=1.9. Python 3.9+ required.

Quick start

The two required inputs are a p-value matrix (genes × omics datasets, TSV) and a GMT file of gene sets. GMT files for common pathway databases can be downloaded from the Bader Lab gene sets page. Gene symbols must match between the matrix and the GMT file.
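GMT is a tab-separated text format with one gene set per line: a term identifier, a term name, then the member genes. Because gene symbols must match between the two inputs, it can be worth checking the overlap before running an analysis. The sketch below uses only the standard library and does not call the package; parse_gmt_lines and symbol_overlap are hypothetical helpers, not part of the ActivePathways API, and the toy gene sets are made up for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_gmt_lines(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Parse GMT lines into {term_id: (term_name, [genes])}.

    Each line is: term_id <TAB> term_name <TAB> gene1 <TAB> gene2 ...
    """
    terms: Dict[str, Tuple[str, List[str]]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and terms with no genes
        term_id, term_name, *genes = fields
        terms[term_id] = (term_name, [g for g in genes if g])  # drop empty fields
    return terms


def symbol_overlap(score_genes: Iterable[str], terms: Dict[str, Tuple[str, List[str]]]) -> float:
    """Fraction of genes in the p-value matrix that appear in at least one GMT term."""
    gmt_genes: Set[str] = {g for _, genes in terms.values() for g in genes}
    score_set = set(score_genes)
    if not score_set:
        return 0.0
    return len(score_set & gmt_genes) / len(score_set)


# Hypothetical toy input, not taken from the package's data/ directory
gmt_text = [
    "TERM:1\tToy pathway A\tTP53\tEGFR\tKRAS\n",
    "TERM:2\tToy pathway B\tEGFR\tMYC\n",
]
terms = parse_gmt_lines(gmt_text)
# 2 of the 3 matrix genes are found in the GMT (the Ensembl ID is not)
print(symbol_overlap(["TP53", "MYC", "ENSG00000141510"], terms))
```

A low fraction, for example when the matrix uses Ensembl IDs and the GMT uses gene symbols, usually means the identifiers need to be mapped to a common namespace first.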

Command line

activepathways \
  --scores data/Adenocarcinoma_scores_subset.tsv \
  --gmt    data/hsapiens_REAC_subset.gmt \
  --output results.csv

Every option other than --scores, --gmt, and --output is optional. A call that sets several of them explicitly:

activepathways \
  --scores            data/Adenocarcinoma_scores_subset.tsv \
  --gmt               data/hsapiens_REAC_subset.gmt \
  --output            results.csv \
  --merge_method      Brown \
  --cutoff            0.1 \
  --significant       0.05 \
  --correction_method holm \
  --geneset_filter_min 5 \
  --geneset_filter_max 1000

Run activepathways --help for a full listing.

Python

import pandas as pd
from activepathways import active_pathways, export_as_csv

scores = pd.read_csv("data/Adenocarcinoma_scores_subset.tsv", sep="\t").set_index("Gene")
scores = scores.fillna(1.0)  # replace missing p-values with 1 (not significant)

results = active_pathways(scores, "data/hsapiens_REAC_subset.gmt")
export_as_csv(results, "results.csv")

print(results[["term_id", "term_name", "adjusted_p_val", "term_size"]].head())
        term_id            term_name  adjusted_p_val  term_size
0  REAC:2424491      DAP12 signaling    4.491268e-05        358
1   REAC:422475        Axon guidance    2.028966e-02        555
2   REAC:177929    Signaling by EGFR    6.245734e-04        366
3  REAC:2559583  Cellular Senescence    6.636060e-05        196
4   REAC:180292     GAB1 signalosome    1.215316e-02        133

In the full results table, the overlap column lists the genes driving each enrichment, and the evidence column lists which input datasets contributed.

For extended examples (directional integration, GMT utilities, Cytoscape output, and merging results), see docs/python_examples.md.

Key parameters

Parameter           Default          Description
scores              required         DataFrame of p-values (genes × datasets). No NAs; replace them with 1.0.
gmt                 required         Path to a GMT file, or a GMT object from read_gmt().
background          all GMT genes    Custom gene universe for the hypergeometric test.
geneset_filter      (5, 1000)        (min, max) pathway size to retain.
cutoff              0.1              P-value cutoff for including genes in the ranked list.
significant         0.05             Adjusted p-value threshold for reporting pathways.
merge_method        "Fisher"         Method for combining p-values across datasets (see table below).
correction_method   "holm"           Multiple testing correction (holm, BH, bonferroni, etc.).
cytoscape_file_tag  None             File prefix for writing Cytoscape output files.
scores_direction    None             Fold-change direction matrix for directional methods.
constraints_vector  None             Expected directional relationships between datasets (1, -1, 0).
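The background, geneset_filter, and cutoff parameters work together: gene sets are sized against the gene universe, and only genes whose merged p-value passes cutoff enter the ranked list. The sketch below shows one plausible reading of that preprocessing in plain Python. It assumes, as we understand the R package to do, that each term is first restricted to background genes before the size filter is applied, and that the cutoff is inclusive. The helper names (make_background_set, filter_gene_sets, ranked_gene_list) are hypothetical, and the package's own implementation may differ in detail.

```python
from typing import Dict, List, Optional, Set, Tuple


def make_background_set(terms: Dict[str, List[str]]) -> Set[str]:
    """Default gene universe: union of all genes across all terms."""
    return {g for genes in terms.values() for g in genes}


def filter_gene_sets(
    terms: Dict[str, List[str]],
    background: Optional[Set[str]] = None,
    geneset_filter: Tuple[int, int] = (5, 1000),
) -> Dict[str, List[str]]:
    """Restrict each term to the background, then keep terms within (min, max) size."""
    if background is None:
        background = make_background_set(terms)
    min_size, max_size = geneset_filter
    kept: Dict[str, List[str]] = {}
    for term_id, genes in terms.items():
        in_bg = [g for g in genes if g in background]
        if min_size <= len(in_bg) <= max_size:
            kept[term_id] = in_bg
    return kept


def ranked_gene_list(merged_p: Dict[str, float], cutoff: float = 0.1) -> List[str]:
    """Genes with merged p-value <= cutoff (assumed inclusive), most significant first."""
    passing = [g for g, p in merged_p.items() if p <= cutoff]
    return sorted(passing, key=lambda g: merged_p[g])


# Hypothetical toy data
terms = {"T1": ["A", "B", "C"], "T2": ["A", "D"], "T3": ["E"]}
print(filter_gene_sets(terms, geneset_filter=(2, 3)))  # T3 is dropped (only 1 gene)
print(ranked_gene_list({"A": 0.01, "B": 0.2, "C": 0.05}, cutoff=0.1))  # B fails the cutoff
```

Restricting terms to the background before sizing them means that a custom, smaller background can push some pathways below the minimum size.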

P-value merging methods

Method                  Directional  Description
"Fisher"                No           Chi-squared combination of log p-values
"Brown"                 No           Fisher's method corrected for between-dataset correlation
"Stouffer"              No           Z-score combination
"Strube"                No           Stouffer corrected for correlation
"Fisher_directional"    Yes          Fisher penalising directional conflicts
"DPM"                   Yes          Brown's method with directional penalty (recommended)
"Stouffer_directional"  Yes          Stouffer with directional penalty
"Strube_directional"    Yes          Strube with directional penalty

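For intuition about the two families of methods, the sketch below implements Fisher's and Stouffer's methods with SciPy, plus a directional Fisher variant written from our reading of the directional penalty described in the 2024 paper. For datasets with a non-zero constraint, each log p-value is multiplied by the sign of the observed fold change and by the expected direction, and the absolute value of that sum is used, so agreeing datasets reinforce each other while conflicting ones cancel and weaken the merged p-value. Datasets with constraint 0 contribute as in plain Fisher. This is an illustrative sketch under those assumptions, not the package's merge_p_values, and it omits the correlation corrections used by Brown, DPM, and Strube.

```python
import numpy as np
from scipy.stats import chi2, norm


def fisher_merge(p):
    """Fisher: X = -2 * sum(ln p_i), compared to chi-squared with 2k degrees of freedom."""
    p = np.asarray(p, dtype=float)
    stat = -2.0 * np.log(p).sum()
    return float(chi2.sf(stat, df=2 * p.size))


def stouffer_merge(p):
    """Stouffer: Z = sum(z_i) / sqrt(k), with one-sided z_i = Phi^-1(1 - p_i)."""
    p = np.asarray(p, dtype=float)
    z = norm.isf(p)
    return float(norm.sf(z.sum() / np.sqrt(p.size)))


def fisher_directional_merge(p, fold_changes, constraints):
    """Directional Fisher sketch (assumed form, see the lead-in).

    fold_changes: observed effect per dataset; only its sign is used.
    constraints: expected relative direction per dataset (1, -1, or 0 for non-directional).
    """
    p = np.asarray(p, dtype=float)
    o = np.sign(np.asarray(fold_changes, dtype=float))
    e = np.asarray(constraints, dtype=float)
    log_p = np.log(p)
    directional = e != 0
    # Agreeing directions add up; conflicting ones cancel and shrink the statistic.
    penalised = -abs(np.sum(log_p[directional] * o[directional] * e[directional]))
    unconstrained = np.sum(log_p[~directional])
    stat = -2.0 * (penalised + unconstrained)
    return float(chi2.sf(stat, df=2 * p.size))


p = [0.01, 0.02]
print(fisher_merge(p))
print(fisher_directional_merge(p, [2.0, 1.5], [1, 1]))   # directions agree: same as Fisher
print(fisher_directional_merge(p, [2.0, -1.5], [1, 1]))  # directions conflict: larger p-value
```

When all directions agree with the constraints, the penalised sum equals the ordinary Fisher sum, so the directional method only ever weakens evidence from conflicting datasets.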
API reference

Function                                        Description
active_pathways(scores, gmt, ...)               Main enrichment function
merge_p_values(scores, method, ...)             Combine p-values across datasets
read_gmt(filename)                              Load a GMT file → GMT object
write_gmt(gmt, filename)                        Write a GMT object to file
make_background(gmt)                            Union of all genes across all GMT terms
export_as_csv(results, filename)                Save results table to CSV
merge_results(...)                              Merge standard and directional results for Cytoscape
p_adjust(p, method)                             Multiple testing correction (mirrors R's p.adjust)
enrichment_analysis(genelist, gmt, background)  Ordered hypergeometric test per term
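As we understand the ordered hypergeometric test from the R package, the ranked gene list is cut at every position where a term gene appears, a hypergeometric tail probability is computed for each cut, and the smallest one is kept together with its cut point. Term-level p-values are then corrected across terms, with Holm as the default correction_method. The sketch below reimplements both steps with SciPy and NumPy for illustration only; ordered_hypergeometric and holm_adjust are hypothetical names, not the package's enrichment_analysis or p_adjust.

```python
from typing import Iterable, Sequence, Tuple

import numpy as np
from scipy.stats import hypergeom


def ordered_hypergeometric(
    ranked_genes: Sequence[str], term_genes: Iterable[str], background: Iterable[str]
) -> Tuple[float, int]:
    """Return (best p-value, 1-based cut position) over prefixes of ranked_genes."""
    universe = set(background)
    term = set(term_genes) & universe
    # Only prefixes that end on a term gene can improve the p-value.
    cuts = [i + 1 for i, g in enumerate(ranked_genes) if g in term]
    if not cuts:
        return 1.0, 0
    M, n = len(universe), len(term)
    # P(X >= k) when drawing `cut` genes from M genes, n of which belong to the term.
    p_vals = [hypergeom.sf(k - 1, M, n, cut) for k, cut in enumerate(cuts, start=1)]
    best = int(np.argmin(p_vals))
    return float(p_vals[best]), cuts[best]


def holm_adjust(p: Sequence[float]) -> np.ndarray:
    """Holm step-down adjustment, following R's p.adjust(method = "holm")."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p, kind="stable")
    scaled = (n - np.arange(n)) * p[order]  # multipliers n, n-1, ..., 1
    adjusted = np.minimum(1.0, np.maximum.accumulate(scaled))
    out = np.empty(n)
    out[order] = adjusted  # restore the input order
    return out


background = [f"G{i}" for i in range(10)]
ranked = ["G0", "G1", "G5"]
print(ordered_hypergeometric(ranked, ["G0", "G1"], background))  # best cut is after 2 genes
print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

Choosing the best cut per term lets strongly ranked pathway genes drive enrichment without fixing a single list length for every term.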

Differences from the R package

R                           Python
ActivePathways()            active_pathways()
export_as_CSV()             export_as_csv()
read.GMT()                  read_gmt()
write.GMT()                 write_gmt()
data.table output           pandas.DataFrame output
scores[is.na(scores)] <- 1  scores.fillna(1.0)

References

  • Slobodyanyuk M*, Bahcheli AT*, et al. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications (2024). doi:10.1038/s41467-024-49986-4
  • Paczkowska M*, Barenboim J*, et al. Integrative pathway enrichment analysis of multivariate omics data. Nature Communications (2020). doi:10.1038/s41467-019-13983-9
  • Reimand J*, Isserlin R*, et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nature Protocols (2019). doi:10.1038/s41596-018-0103-9

Download files

Download the file for your platform.

Source Distribution

activepathways-2.0.6.tar.gz (40.5 kB)

Uploaded Source

Built Distribution


activepathways-2.0.6-py3-none-any.whl (35.8 kB)

Uploaded Python 3

File details

Details for the file activepathways-2.0.6.tar.gz.

File metadata

  • Download URL: activepathways-2.0.6.tar.gz
  • Upload date:
  • Size: 40.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for activepathways-2.0.6.tar.gz
Algorithm Hash digest
SHA256 c84e40aec50695fe85283e9060ccc780e56f92a346ca2ab40882a289ab588dff
MD5 e839ff9e010ed9cfdc8f5e2b7c2dfb2b
BLAKE2b-256 b5a236a229bbac99cc8e2193367fb90a0a7ca0823b5b001b6df01db5bf011dc9


File details

Details for the file activepathways-2.0.6-py3-none-any.whl.

File metadata

  • Download URL: activepathways-2.0.6-py3-none-any.whl
  • Upload date:
  • Size: 35.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for activepathways-2.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 06c67e1db82dfc5572ff94f4a31d358271781493d4f422ef072733e608ccae1a
MD5 8f0926cf364f647f0f18813b07acca88
BLAKE2b-256 7f2d1184160579e6c2f22e8b20c510cc9227e74ea23fd43a1813f543c4c1ca41

