Official Enrichr Python package for fast local gene set enrichment.
pyEnrichr - Official Enrichr Python Package
pyEnrichr is a Python package that runs Fisher's exact test against Enrichr gene set libraries and mimics the Enrichr backend locally. It is optimized for large gene set libraries and returns enrichment results almost instantly, even though it is a pure Python implementation of Fisher's exact test. It computes the same p-values as the Enrichr API, but because it runs locally, p-value computation is faster.
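For reference, the test itself is small. The sketch below computes a one-sided (enrichment) Fisher exact test p-value from a 2×2 contingency table using only the Python standard library (`math.comb` needs Python 3.8+). It assumes, without checking pyEnrichr's source, that Enrichr-style enrichment uses the right tail of the hypergeometric distribution. The function name `fisher_right_tail` is hypothetical and is not part of pyEnrichr.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """Hypothetical helper: one-sided Fisher exact test for the 2x2 table

                  in term   not in term
        input        a           b
        rest         c           d

    Returns P(X >= a), X ~ Hypergeometric(total=a+b+c+d, successes=a+c, draws=a+b).
    """
    n_total = a + b + c + d
    term_size = a + c   # genes annotated to the term
    input_size = a + b  # genes in the query set
    # Exact integer sum of the right tail, divided once at the end.
    numerator = sum(
        comb(term_size, k) * comb(n_total - term_size, input_size - k)
        for k in range(a, min(term_size, input_size) + 1)
    )
    return numerator / comb(n_total, input_size)


p_full_overlap = fisher_right_tail(3, 0, 0, 3)  # all 3 input genes hit a 3-gene term
p_no_overlap = fisher_right_tail(0, 5, 5, 10)   # no overlap at all
```

Exact integer arithmetic keeps the sketch easy to verify, but it becomes slow for large backgrounds, which is why a fast implementation would likely avoid it.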
Installation
Install the Python package using pip:
```shell
pip3 install pyenrichr
```
Enrichment Analysis
To run an enrichment analysis in Python, use the following code. The result is a pandas DataFrame with the enriched gene sets of the library as rows, sorted by p-value.
```python
import pyenrichr as pye
# list all libraries from Enrichr
libraries = pye.libraries.list_libraries()
# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")
# get example gene set
gene_set = pye.libraries.example_set()
# calculate enrichment for gene set against all gene sets in library
result = pye.enrichment.fisher(gene_set, lib)
```
`lib` is a dictionary of sets. `pye.enrichment.fisher` expects two inputs: a set of genes (`gene_set`) and a gene set library (`lib`) in the form of a dictionary of sets.
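To make the dictionary-of-sets shape concrete, here is a minimal sketch of how a query set and such a library yield the 2×2 contingency table for each term. The toy library, the function name `contingency_tables`, and the `background_size` of 20000 are all hypothetical; pyEnrichr's actual background size and internals may differ.

```python
def contingency_tables(gene_set, library, background_size):
    """Hypothetical helper: return {term: ((a, b, c, d), sorted overlap genes)}.

    a = genes in both the input set and the term
    b = input genes not in the term
    c = term genes not in the input set
    d = remaining genes in the background (assumes every gene is in the background)
    """
    tables = {}
    for term, term_genes in library.items():
        overlap = gene_set & term_genes
        a = len(overlap)
        b = len(gene_set) - a
        c = len(term_genes) - a
        d = background_size - a - b - c
        tables[term] = ((a, b, c, d), sorted(overlap))
    return tables


# Toy library in the same shape pyEnrichr expects: a dictionary of sets.
toy_library = {
    "TERM_A": {"TP53", "EGFR", "AKT1"},
    "TERM_B": {"STAT3", "IL4"},
}
toy_input = {"TP53", "AKT1", "IL4", "CD86"}
tables = contingency_tables(toy_input, toy_library, background_size=20000)
```

Here `a` corresponds to the overlap size, `a + c` to the gene set size, and the sorted list to the overlapping genes reported for each term.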
Example Output
The results are returned as a pandas DataFrame. Its columns are the term (Term), the p-value (p-value), the Šidák-corrected p-value for multiple hypothesis testing (sidak), the false discovery rate (fdr), the odds ratio (odds), the overlap size (overlap), the gene set size (set-size), and the overlapping genes (gene-overlap).
| # | Term | p-value | sidak | fdr | odds | overlap | set-size | Gene-overlap |
|---|---|---|---|---|---|---|---|---|
| 1 | Regulation Of Cell Population Proliferation... | 1.041581e-41 | 5.655786e-39 | 5.655786e-39 | 8.903394 | 62 | 766 | PDGFRB,TGFB2,CSF1R,CXCL10,CD86,IL4,CTNNB1,STAT... |
| 2 | Positive Regulation Of Cell Population Proliferation... | 2.914662e-37 | 1.582661e-34 | 7.913307e-35 | 11.159420 | 49 | 483 | PDGFRB,TGFB2,CSF1R,CD86,IL4,AKT1,EGFR,JAK2,CDK... |
| 3 | Positive Regulation Of Cell Migration (GO:0030335) | 1.929354e-35 | 1.047639e-32 | 3.492131e-33 | 15.772059 | 39 | 272 | PDGFRB,TGFB2,CSF1R,ATM,PECAM1,TWIST1,IL4,STAT3... |
| 4 | Regulation Of Apoptotic Process (GO:0042981) | 9.892051e-34 | 5.371384e-31 | 1.342846e-31 | 8.269504 | 53 | 705 | CASP9,CXCL10,ATM,RPS6KB1,FAS,IL4,CTNNB1,CD28,A... |
| 5 | Positive Regulation Of Intracellular Signal Transmission... | 3.297600e-33 | 1.790597e-30 | 3.581194e-31 | 9.847619 | 47 | 525 | PDGFRB,TGFB2,CD86,CHI3L1,BECN1,ENG,GAPDH,PPARG... |
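The sidak and fdr columns above appear consistent with the Šidák correction, 1 - (1 - p)^m (approximately m·p for tiny p), and the Benjamini-Hochberg procedure (p·m / rank) with m = 543 tested gene sets. For example, in row 2, 2.914662e-37 × 543 ≈ 1.582661e-34 and 1.582661e-34 / 2 ≈ 7.913e-35. We have not confirmed pyEnrichr's exact implementation, so the sketch below is an assumption-labeled illustration of the two corrections, not the package's code. The function names are hypothetical.

```python
import math


def sidak(pvalues, m=None):
    """Hypothetical helper: Šidák correction 1 - (1 - p)^m for each p-value."""
    m = len(pvalues) if m is None else m
    corrected = []
    for p in pvalues:
        if p >= 1.0:
            corrected.append(1.0)
        else:
            # -expm1(m * log1p(-p)) equals 1 - (1 - p)^m but keeps precision
            # when p is tiny (1 - p would otherwise round to exactly 1.0).
            corrected.append(-math.expm1(m * math.log1p(-p)))
    return corrected


def benjamini_hochberg(pvalues):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values (step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


row1_sidak = sidak([1.041581e-41], m=543)[0]
bh = benjamini_hochberg([0.01, 0.04, 0.03])
```

The `log1p`/`expm1` form matters here: with p-values around 1e-41, computing `1 - (1 - p) ** m` directly in floating point returns 0.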
Fisher Initialization
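When enrichment is computed against multiple libraries, some calculations can be initialized once and reused, which speeds up overall execution. In the example below, the `fisher` object (a `pye.enrichment.FastFisher` instance) must be initialized with a parameter of at least N, where N = a + b + c + d is the sum of the four cells of the 2×2 contingency table used by the Fisher exact test.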
```python
import pyenrichr as pye
# initialize calculations
fisher = pye.enrichment.FastFisher(34000)
# load a gene set library
lib_1 = pye.libraries.get_library("GO_Biological_Process_2023")
lib_2 = pye.libraries.get_library("KEGG_2021_Human")
# get example gene set
gene_set = pye.libraries.example_set()
# calculate enrichment for gene set against all gene sets in library 1 and 2
result_1 = pye.enrichment.fisher(gene_set, lib_1, fisher=fisher)
result_2 = pye.enrichment.fisher(gene_set, lib_2, fisher=fisher)
```
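One plausible reason the initialization parameter must be at least N is that a fast Fisher test can cache log-factorials up to the largest table total it will see. The sketch below illustrates that idea with a hypothetical `LogFactorialFisher` class; it is an assumption about the general technique, not a description of how `FastFisher` works internally.

```python
import math


class LogFactorialFisher:
    """Hypothetical sketch (not pyEnrichr's FastFisher): caches log(k!) for
    k = 0..max_total so each one-sided Fisher test needs only lookups."""

    def __init__(self, max_total):
        self.max_total = max_total
        self.log_fact = [0.0] * (max_total + 1)
        for k in range(1, max_total + 1):
            self.log_fact[k] = self.log_fact[k - 1] + math.log(k)

    def _log_comb(self, n, k):
        # log C(n, k) from the cached log-factorials
        return self.log_fact[n] - self.log_fact[k] - self.log_fact[n - k]

    def pvalue(self, a, b, c, d):
        n_total = a + b + c + d
        if n_total > self.max_total:
            # The cache only covers totals up to max_total, hence "at least N".
            raise ValueError(f"a + b + c + d = {n_total} exceeds cache size {self.max_total}")
        term_size, input_size = a + c, a + b
        log_denom = self._log_comb(n_total, input_size)
        # log P(X = k) for k = a .. min(term_size, input_size): the right tail
        log_terms = [
            self._log_comb(term_size, k)
            + self._log_comb(n_total - term_size, input_size - k)
            - log_denom
            for k in range(a, min(term_size, input_size) + 1)
        ]
        # log-sum-exp keeps the sum stable for very small probabilities
        top = max(log_terms)
        return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


fisher_cache = LogFactorialFisher(34000)  # built once, reused for every library
p_values = [fisher_cache.pvalue(*t) for t in [(3, 0, 0, 3), (0, 5, 5, 10)]]
```

The cache costs O(N) time and memory once, after which each term only needs a short sum of table lookups, which is why reusing one object across several libraries pays off.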
Gene Set Filtering
Small gene sets and small overlaps can be filtered out using the parameters `min_set_size` and `min_overlap`.
```python
import pyenrichr as pye
# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")
# get example gene set
gene_set = pye.libraries.example_set()
# calculate enrichment for gene set against all gene sets in library.
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(gene_set, lib, min_set_size=10, min_overlap=5)
```
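The filtering rule can be sketched in plain Python on a dictionary of sets. This hypothetical `filter_terms` helper assumes both thresholds are inclusive lower bounds (size >= `min_set_size`, overlap >= `min_overlap`); the comment in the example above says "larger than 10 genes", so pyEnrichr's exact boundary handling may differ.

```python
def filter_terms(gene_set, library, min_set_size=10, min_overlap=5):
    """Hypothetical filter: keep terms with at least min_set_size genes and
    at least min_overlap genes shared with gene_set (inclusive bounds assumed)."""
    kept = {}
    for term, term_genes in library.items():
        if len(term_genes) < min_set_size:
            continue  # gene set too small
        if len(term_genes & gene_set) < min_overlap:
            continue  # overlap too small to report
        kept[term] = term_genes
    return kept


toy_library = {
    "big": {f"G{i}" for i in range(12)},
    "small": {"G0", "G1"},
    "no_overlap": {f"G{i}" for i in range(20, 32)},
}
query = {f"G{i}" for i in range(6)}
kept = filter_terms(query, toy_library)
```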
Enrichment of Gene Set Library vs Gene Set Library
When computing enrichment for many gene sets against a gene set library, pyEnrichr uses an optimized implementation of overlap detection and multithreading to speed up computation. The example below computes pairwise enrichment between all gene sets in GO Biological Process. As before, it calls the `fisher` function, but its first argument is a gene set library in dictionary format instead of a single gene set. The output is a list containing one result DataFrame for each gene set of the first library tested against the second library. These results can be consolidated into a single p-value matrix.
```python
import pyenrichr as pye
# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")
# calculate enrichment of every gene set in the library against all gene sets in the same library.
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(lib, lib, min_set_size=10, min_overlap=5)
# consolidate all p-values into a single dataframe
pmat = pye.enrichment.consolidate(result)
```
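The description above only says that pyEnrichr uses "an optimized implementation of overlap detection". One common approach, which may or may not match pyEnrichr's, is an inverted index from each gene to the terms that contain it, so that only pairs of gene sets sharing at least one gene are ever visited. The `pairwise_overlaps` function and toy library below are hypothetical.

```python
from collections import defaultdict


def pairwise_overlaps(library_a, library_b):
    """Hypothetical helper: {term_a: {term_b: overlap_count}} for nonzero overlaps."""
    # Inverted index: gene -> list of library_b terms that contain it.
    index = defaultdict(list)
    for term_b, genes in library_b.items():
        for gene in genes:
            index[gene].append(term_b)

    overlaps = {}
    for term_a, genes in library_a.items():
        row = defaultdict(int)
        for gene in genes:
            # Only terms sharing this gene are visited, so pairs with
            # zero overlap cost nothing.
            for term_b in index.get(gene, ()):
                row[term_b] += 1
        overlaps[term_a] = dict(row)
    return overlaps


toy_library = {"X": {"A", "B", "C"}, "Y": {"C", "D"}, "Z": {"E"}}
overlaps = pairwise_overlaps(toy_library, toy_library)
```

Each nonzero overlap count, together with the two set sizes and a background size, is enough to build the 2×2 table for that pair; the work per query term scales with the number of gene-term memberships it touches rather than with the size of the library.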
Download files
Source Distribution
File details
Details for the file pyenrichr-1.0.2.tar.gz.
File metadata
- Download URL: pyenrichr-1.0.2.tar.gz
- Upload date:
- Size: 10.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7df8e043eb8848591d90b22c5d26c97fb9eec6d4c7fe459e057a8f72e089e525 |
| MD5 | 05b39221f94ba159e6642d2a1a8677fa |
| BLAKE2b-256 | 14d278d41f11d82d0fce14f5da51500fb1789a844f2281e7bcd1e145caedd9dc |