Integrative pathway enrichment analysis of multivariate omics data (Python port of the ActivePathways R package).
ActivePathways (Python)
ActivePathways is a tool for integrative pathway enrichment analysis of multi-omics data. It identifies gene sets (such as pathways or Gene Ontology terms) that are over-represented in a matrix of genes and their p-values across multiple omics datasets. By fusing multiple datasets through p-value merging, ActivePathways surfaces biological signal that is invisible in any single dataset alone.
This is a Python port of the R ActivePathways package, preserving exact functionality and numerical output.
Citation
If you use ActivePathways please cite:
ActivePathways 2.0 (directional integration): Mykhaylo Slobodyanyuk, Alexander T. Bahcheli, et al. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications 15, 5690 (2024). doi:10.1038/s41467-024-49986-4
ActivePathways 1.0: Marta Paczkowska, Jonathan Barenboim, et al. Integrative pathway enrichment analysis of multivariate omics data. Nature Communications 11, 735 (2020). doi:10.1038/s41467-019-13983-9
Installation
```bash
git clone https://github.com/abahcheli/ActivePathways_python.git
cd ActivePathways_python
pip install -e .
```
To build a distributable conda package (targets the PyPI tarball):
```bash
conda build conda.recipe/
conda install --use-local activepathways
```
Note: `conda.recipe/meta.yaml` fetches the package from PyPI. Before the package is published to PyPI, the placeholder SHA256 in the recipe must be replaced with the real tarball hash, or the `source` section temporarily switched to `path: ..` for a local build.
Dependencies: numpy>=1.22, pandas>=1.5, scipy>=1.9. Python 3.9+ required.
Quick start
The two required inputs are a p-value matrix (genes × omics datasets, TSV) and a GMT file of gene sets. GMT files for common pathway databases can be downloaded from the Bader Lab gene sets page. Gene symbols must match between the matrix and the GMT file.
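The GMT layout is simple enough to illustrate directly. The sketch below assumes the standard tab-separated GMT layout (term ID, term name, then one gene symbol per field). The helpers `parse_gmt_lines`, `background_genes` and `filter_by_size` are hypothetical stand-ins written for this example; they are not the package's `read_gmt()` or `make_background()`, and treating the `geneset_filter` bounds as inclusive is an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical container: term_id -> (term_name, genes).
# This is NOT the package's GMT object; it only mirrors the file layout.
GeneSets = Dict[str, Tuple[str, List[str]]]


def parse_gmt_lines(lines: Iterable[str]) -> GeneSets:
    """Parse GMT text: term_id <TAB> term_name <TAB> gene_1 <TAB> gene_2 ..."""
    gene_sets: GeneSets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term_id, term_name = fields[0], fields[1]
        # drop empty fields (e.g. trailing tabs) and duplicates, keeping order
        genes = list(dict.fromkeys(g for g in fields[2:] if g))
        gene_sets[term_id] = (term_name, genes)
    return gene_sets


def background_genes(gene_sets: GeneSets) -> Set[str]:
    """Union of all genes across all terms (the default gene universe)."""
    universe: Set[str] = set()
    for _, genes in gene_sets.values():
        universe.update(genes)
    return universe


def filter_by_size(gene_sets: GeneSets, min_size: int = 5, max_size: int = 1000) -> GeneSets:
    """Keep terms whose gene count lies in [min_size, max_size] (inclusive: an assumption)."""
    return {
        term_id: (name, genes)
        for term_id, (name, genes) in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }
```

For example, with two made-up terms `T1` (genes A, B, C) and `T2` (genes C, D), the background is `{A, B, C, D}` and `filter_by_size(gene_sets, 3, 10)` keeps only `T1`.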
Command line
```bash
activepathways \
  --scores data/Adenocarcinoma_scores_subset.tsv \
  --gmt data/hsapiens_REAC_subset.gmt \
  --output results.csv
```
All options beyond --scores, --gmt, and --output are optional:
```bash
activepathways \
  --scores data/Adenocarcinoma_scores_subset.tsv \
  --gmt data/hsapiens_REAC_subset.gmt \
  --output results.csv \
  --merge_method Brown \
  --cutoff 0.1 \
  --significant 0.05 \
  --correction_method holm \
  --geneset_filter_min 5 \
  --geneset_filter_max 1000
```
Run activepathways --help for a full listing.
Python
```python
import pandas as pd
from activepathways import active_pathways, export_as_csv

scores = pd.read_csv("data/Adenocarcinoma_scores_subset.tsv", sep="\t").set_index("Gene")
scores = scores.fillna(1.0)  # replace missing p-values with 1 (not significant)

results = active_pathways(scores, "data/hsapiens_REAC_subset.gmt")
export_as_csv(results, "results.csv")
print(results[["term_id", "term_name", "adjusted_p_val", "term_size"]].head())
```

```text
        term_id            term_name  adjusted_p_val  term_size
0  REAC:2424491      DAP12 signaling    4.491268e-05        358
1   REAC:422475        Axon guidance    2.028966e-02        555
2   REAC:177929    Signaling by EGFR    6.245734e-04        366
3  REAC:2559583  Cellular Senescence    6.636060e-05        196
4   REAC:180292     GAB1 signalosome    1.215316e-02        133
```
The overlap column lists the genes driving enrichment; evidence lists which input datasets contributed.
For extended examples — directional integration, GMT utilities, Cytoscape output, and merging results — see docs/python_examples.md.
Key parameters
| Parameter | Default | Description |
|---|---|---|
| scores | required | DataFrame of p-values (genes × datasets). No NAs — replace with 1.0. |
| gmt | required | Path to a GMT file, or a GMT object from read_gmt(). |
| background | all GMT genes | Custom gene universe for the hypergeometric test. |
| geneset_filter | (5, 1000) | (min, max) pathway size to retain. |
| cutoff | 0.1 | P-value cutoff for including genes in the ranked list. |
| significant | 0.05 | Adjusted p-value threshold for reporting pathways. |
| merge_method | "Fisher" | Method for combining p-values across datasets (see table below). |
| correction_method | "holm" | Multiple testing correction (holm, BH, bonferroni, etc.). |
| cytoscape_file_tag | None | File prefix for writing Cytoscape output files. |
| scores_direction | None | Fold-change direction matrix for directional methods. |
| constraints_vector | None | Expected directional relationships between datasets (1, -1, 0). |
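The default `correction_method` is Holm's step-down procedure. As a minimal sketch of what that correction does, the hypothetical `holm_adjust` below follows the same rule as R's `p.adjust(p, method = "holm")`: sort the p-values, multiply the i-th smallest by (m - i + 1), take the running maximum, and cap at 1. It is an illustration only, not the package's `p_adjust()`.

```python
import numpy as np


def holm_adjust(p_values):
    """Holm step-down adjustment (hypothetical helper mirroring R's p.adjust 'holm')."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)  # indices of p sorted ascending
    # the i-th smallest p-value (0-based i) is multiplied by (m - i)
    scaled = (m - np.arange(m)) * p[order]
    # running maximum keeps adjusted values monotone; then cap at 1
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original order
    return adjusted
```

For `[0.01, 0.04, 0.03]` this gives `[0.03, 0.06, 0.06]`: the running maximum lifts the third-smallest value from 0.04 to 0.06.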
P-value merging methods
| Method | Directional | Description |
|---|---|---|
| "Fisher" | No | Chi-squared combination of log p-values |
| "Brown" | No | Fisher's method corrected for between-dataset correlation |
| "Stouffer" | No | Z-score combination |
| "Strube" | No | Stouffer corrected for correlation |
| "Fisher_directional" | Yes | Fisher penalising directional conflicts |
| "DPM" | Yes | Brown's method with directional penalty (recommended) |
| "Stouffer_directional" | Yes | Stouffer with directional penalty |
| "Strube_directional" | Yes | Strube with directional penalty |
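To make the non-correlated methods concrete, here is a minimal sketch of Fisher's and Stouffer's combinations for one gene's p-values, plus a directional Fisher variant. The helper names are hypothetical and this is not the package's `merge_p_values()`. Brown and Strube additionally estimate the covariance between datasets, and DPM applies the directional penalty on top of Brown's scaled chi-squared distribution; both are omitted here. The directional formula follows our reading of Slobodyanyuk et al. (2024): datasets with a nonzero constraint contribute |Σ ln(p_i)·o_i·e_i|, where o_i is the sign of the observed direction and e_i the constraint, so conflicting directions cancel. Treat that part as an assumption.

```python
import numpy as np
from scipy import stats


def fisher_merge(p):
    """Fisher: X = -2 * sum(ln p_i), compared to chi-squared with 2k df."""
    p = np.asarray(p, dtype=float)
    x = -2.0 * np.log(p).sum()
    return stats.chi2.sf(x, df=2 * p.size)


def stouffer_merge(p):
    """Stouffer: Z = sum(z_i) / sqrt(k), with z_i = Phi^-1(1 - p_i)."""
    p = np.asarray(p, dtype=float)
    z = stats.norm.isf(p)  # isf(p) equals ppf(1 - p) but is more accurate for tiny p
    return stats.norm.sf(z.sum() / np.sqrt(p.size))


def fisher_directional_merge(p, directions, constraints):
    """Directional Fisher sketch (assumed formula, plain chi-squared with 2k df)."""
    p = np.asarray(p, dtype=float)
    o = np.sign(np.asarray(directions, dtype=float))  # observed direction per dataset
    e = np.asarray(constraints, dtype=float)          # expected relation: 1, -1 or 0
    directional = e != 0
    log_p = np.log(p)
    # agreeing directions add up; conflicting ones cancel inside the absolute value
    directional_term = np.abs(np.sum(log_p[directional] * o[directional] * e[directional]))
    x = -2.0 * (-directional_term + log_p[~directional].sum())
    return stats.chi2.sf(x, df=2 * p.size)
```

With two datasets at p = 0.06 each (neither significant at 0.05 alone), `fisher_merge` gives about 0.024 and `stouffer_merge` about 0.014, which is the kind of combined signal the merging step is meant to surface. If two datasets with equal p-values disagree in direction under a constraint of `[1, 1]`, the directional term cancels and the merged p-value becomes 1.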
API reference
| Function | Description |
|---|---|
| active_pathways(scores, gmt, ...) | Main enrichment function |
| merge_p_values(scores, method, ...) | Combine p-values across datasets |
| read_gmt(filename) | Load a GMT file → GMT object |
| write_gmt(gmt, filename) | Write a GMT object to file |
| make_background(gmt) | Union of all genes across all GMT terms |
| export_as_csv(results, filename) | Save results table to CSV |
| merge_results(...) | Merge standard and directional results for Cytoscape |
| p_adjust(p, method) | Multiple testing correction (mirrors R's p.adjust) |
| enrichment_analysis(genelist, gmt, background) | Ordered hypergeometric test per term |
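The ordered hypergeometric test can be sketched as follows, assuming genes are first filtered by merged p-value against `cutoff` and ranked from most to least significant. For each term, every prefix of the ranked list that ends on a term gene is tested with a hypergeometric upper tail, and the smallest p-value is kept; the term genes in that best prefix form the overlap. `rank_genes` and `ordered_hypergeometric` are hypothetical helpers, not the package's `enrichment_analysis()`, and keeping genes with p <= `cutoff` (rather than strictly below) is an assumption.

```python
import numpy as np
from scipy import stats


def rank_genes(merged_p, cutoff=0.1):
    """Keep genes with merged p-value <= cutoff, sorted ascending (ties broken by name)."""
    kept = [(p, gene) for gene, p in merged_p.items() if p <= cutoff]
    return [gene for p, gene in sorted(kept)]


def ordered_hypergeometric(ranked_genes, term_genes, background):
    """Return (best p-value, overlap genes) for one term."""
    background = set(background)
    ranked = [g for g in ranked_genes if g in background]  # restrict to the universe
    term = set(term_genes) & background
    in_term = np.array([g in term for g in ranked], dtype=bool)
    hits = np.flatnonzero(in_term)  # 0-based ranks where a term gene appears
    if hits.size == 0:
        return 1.0, []
    overlap_counts = np.cumsum(in_term)[hits]  # term genes within each prefix
    prefix_sizes = hits + 1                    # number of genes drawn in each prefix
    # P(X >= k) for X ~ Hypergeom(M=|background|, n=|term|, N=prefix size)
    p_vals = stats.hypergeom.sf(overlap_counts - 1, len(background), len(term), prefix_sizes)
    best = int(np.argmin(p_vals))
    cut = int(hits[best]) + 1
    overlap = [g for g in ranked[:cut] if g in term]
    return float(p_vals[best]), overlap
```

With a made-up background of 10 genes, a term {A, B} and the ranked list [A, B, C], the prefix [A] gives 0.2 and the prefix [A, B] gives 1/45 (about 0.022), so the term's p-value is 1/45 with overlap [A, B]. The per-term p-values would then be corrected across terms with the chosen `correction_method`.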
Differences from the R package
| R | Python |
|---|---|
| ActivePathways() | active_pathways() |
| export_as_CSV() | export_as_csv() |
| read.GMT() | read_gmt() |
| write.GMT() | write_gmt() |
| data.table output | pandas.DataFrame output |
| scores[is.na(scores)] <- 1 | scores.fillna(1.0) |
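The last row above is the only data-preparation step the scores matrix needs. A minimal sketch of loading and checking it, assuming a TSV with a `Gene` column as in the Quick start: the helper `prepare_scores` is hypothetical, and the range check is an extra safeguard added for illustration, not something the package is documented to do.

```python
import io

import pandas as pd


def prepare_scores(tsv_text: str, gene_column: str = "Gene") -> pd.DataFrame:
    """Load a genes x datasets p-value table and replace missing values with 1.0."""
    scores = pd.read_csv(io.StringIO(tsv_text), sep="\t").set_index(gene_column)
    scores = scores.fillna(1.0)  # Python equivalent of scores[is.na(scores)] <- 1
    values = scores.to_numpy(dtype=float)
    if ((values < 0) | (values > 1)).any():
        raise ValueError("all scores must be p-values in [0, 1]")
    return scores
```

For a toy table whose `TP53` row has an empty `RNA` field, the returned frame holds 1.0 in that cell, so the gene is treated as not significant in that dataset.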
References
- Slobodyanyuk M*, Bahcheli AT*, et al. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications (2024). doi:10.1038/s41467-024-49986-4
- Paczkowska M*, Barenboim J*, et al. Integrative pathway enrichment analysis of multivariate omics data. Nature Communications (2020). doi:10.1038/s41467-019-13983-9
- Reimand J*, Isserlin R*, et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nature Protocols (2019). doi:10.1038/s41596-018-0103-9
File details
Details for the file activepathways-2.0.6.tar.gz.
File metadata
- Download URL: activepathways-2.0.6.tar.gz
- Upload date:
- Size: 40.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c84e40aec50695fe85283e9060ccc780e56f92a346ca2ab40882a289ab588dff |
| MD5 | e839ff9e010ed9cfdc8f5e2b7c2dfb2b |
| BLAKE2b-256 | b5a236a229bbac99cc8e2193367fb90a0a7ca0823b5b001b6df01db5bf011dc9 |
File details
Details for the file activepathways-2.0.6-py3-none-any.whl.
File metadata
- Download URL: activepathways-2.0.6-py3-none-any.whl
- Upload date:
- Size: 35.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06c67e1db82dfc5572ff94f4a31d358271781493d4f422ef072733e608ccae1a |
| MD5 | 8f0926cf364f647f0f18813b07acca88 |
| BLAKE2b-256 | 7f2d1184160579e6c2f22e8b20c510cc9227e74ea23fd43a1813f543c4c1ca41 |