Official Enrichr Python package for fast local gene set enrichment.

pyEnrichr - Official Enrichr Python Package

pyEnrichr is a Python package that runs the Fisher exact test against Enrichr gene set libraries. It mimics the Enrichr backend but executes locally. The test is implemented in pure Python and is designed for high performance on large gene set libraries. It computes the same p-values as the Enrichr API, and because it runs locally, the p-values are computed faster.

Installation

Install the Python library using pip.

pip3 install pyenrichr

Enrichment Analysis

To run pyEnrichr in Python, use the following code. The result is a DataFrame with the enriched gene sets of the library as rows, sorted by p-value.

import pyenrichr as pye

# list all libraries from Enrichr
libraries = pye.libraries.list_libraries()

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library
result = pye.enrichment.fisher(gene_set, lib)

pye.enrichment.fisher expects two inputs: a set of genes (gene_set) and a library (lib). The library is a dictionary of sets.
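As an illustration of the dictionary-of-sets format, the sketch below parses GMT-style text (one term per line, tab-separated: term name, a description field, then gene symbols) into that structure. The helper name parse_gmt and the sample lines are hypothetical and are not part of pyEnrichr.

```python
def parse_gmt(text):
    """Parse GMT-style text into {term_name: set_of_gene_symbols}.

    Hypothetical helper for illustration; not a pyEnrichr function.
    """
    library = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term = fields[0]
        # fields[1] is the description column and is ignored here
        genes = {g.strip() for g in fields[2:] if g.strip()}
        library[term] = genes
    return library


# Made-up sample data, only to show the resulting structure
sample = "Term A\t\tTP53\tEGFR\tAKT1\nTerm B\t\tSTAT3\tJAK2\n"
custom_lib = parse_gmt(sample)
gene_set = {"TP53", "STAT3", "AKT1"}
shared_with_a = custom_lib["Term A"] & gene_set
```

A dictionary built this way has the same shape as lib, so it should be usable as the second argument of pye.enrichment.fisher.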

Example Output

The results are returned as a pandas DataFrame. The columns are term, p-value, Sidak-corrected p-value for multiple hypothesis testing (sidak), false discovery rate (fdr), odds ratio (odds), overlap size (overlap), set-size, and gene-overlap.

#  Term                                                         p-value       sidak         fdr           odds       overlap  set-size  Gene-overlap
1  Regulation Of Cell Population Proliferation...               1.041581e-41  5.655786e-39  5.655786e-39  8.903394   62       766       PDGFRB,TGFB2,CSF1R,CXCL10,CD86,IL4,CTNNB1,STAT...
2  Positive Regulation Of Cell Population Proliferation...      2.914662e-37  1.582661e-34  7.913307e-35  11.159420  49       483       PDGFRB,TGFB2,CSF1R,CD86,IL4,AKT1,EGFR,JAK2,CDK...
3  Positive Regulation Of Cell Migration (GO:0030335)           1.929354e-35  1.047639e-32  3.492131e-33  15.772059  39       272       PDGFRB,TGFB2,CSF1R,ATM,PECAM1,TWIST1,IL4,STAT3...
4  Regulation Of Apoptotic Process (GO:0042981)                 9.892051e-34  5.371384e-31  1.342846e-31  8.269504   53       705       CASP9,CXCL10,ATM,RPS6KB1,FAS,IL4,CTNNB1,CD28,A...
5  Positive Regulation Of Intracellular Signal Transmission...  3.297600e-33  1.790597e-30  3.581194e-31  9.847619   47       525       PDGFRB,TGFB2,CD86,CHI3L1,BECN1,ENG,GAPDH,PPARG...
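To make the sidak and fdr columns concrete, here is a minimal sketch of the two corrections in plain Python. It rests on two assumptions. First, the fdr column is treated as a Benjamini-Hochberg adjustment, which the values listed above are consistent with. Second, the number of tests (543) is inferred from the ratio of sidak to p-value in the first row and is not stated anywhere. The function names are hypothetical and are not pyEnrichr's API.

```python
import math


def sidak(p, n_tests):
    """Sidak correction 1 - (1 - p)^n, computed stably for very small p."""
    if p >= 1.0:
        return 1.0
    return -math.expm1(n_tests * math.log1p(-p))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


N_TESTS = 543  # assumption: inferred from the example output, not documented

sidak_row1 = sidak(1.041581e-41, N_TESTS)
bh_row2 = 2.914662e-37 * N_TESTS / 2  # BH value for the rank-2 term
bh_small = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under these assumptions, sidak_row1 and bh_row2 agree with the first-row sidak value and the second-row fdr value in the table to within the rounding of the printed p-values.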

Fisher Initialization

When enrichment is computed against multiple libraries, some calculations can be initialized in advance, which reduces overall execution time. In the example below, the fisher object must be initialized with a parameter of at least N, where N = a + b + c + d is the sum of the four cells of the 2x2 contingency table used by the Fisher exact test.
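For readers unfamiliar with the a, b, c, d notation, the sketch below builds one common form of the 2x2 contingency table from a query gene set and a library term. The function name, the assignment of b and c, and the background_size parameter are assumptions for illustration; how pyEnrichr chooses its background is not described here.

```python
def contingency_table(gene_set, term_genes, background_size):
    """Return (a, b, c, d) for a gene set vs. one library term.

    a: genes in both the query and the term
    b: genes in the query only
    c: genes in the term only
    d: background genes in neither
    """
    a = len(gene_set & term_genes)
    b = len(gene_set) - a
    c = len(term_genes) - a
    d = background_size - a - b - c
    if d < 0:
        raise ValueError("background_size is smaller than the union of the sets")
    return a, b, c, d


# Made-up inputs; 20000 is a hypothetical background size
query = {"TP53", "EGFR", "AKT1", "JAK2"}
term = {"TP53", "EGFR", "STAT3"}
a, b, c, d = contingency_table(query, term, background_size=20000)
n_total = a + b + c + d
```

In this form N equals the background size, so the initialization parameter has to be at least as large as the background.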

import pyenrichr as pye

# initialize calculations
fisher = pye.enrichment.FastFisher(34000)

# load a gene set library
lib_1 = pye.libraries.get_library("GO_Biological_Process_2023")
lib_2 = pye.libraries.get_library("KEGG_2021_Human")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library 1 and 2
result_1 = pye.enrichment.fisher(gene_set, lib_1, fisher=fisher)
result_2 = pye.enrichment.fisher(gene_set, lib_2, fisher=fisher)
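One common way to make repeated Fisher exact tests cheap is to precompute log-factorials up to N once and reuse them for every table. Whether FastFisher works this way internally is not stated here. The class below is a hypothetical stand-in that only shows why an upper bound N is needed at initialization.

```python
import math


class LogFactorialFisher:
    """Right-tailed Fisher exact test using a precomputed log-factorial table.

    Illustrative only; this is not pyEnrichr's FastFisher.
    """

    def __init__(self, max_n):
        # log_fact[i] == log(i!), built once for all later tests
        self.log_fact = [0.0] * (max_n + 1)
        for i in range(1, max_n + 1):
            self.log_fact[i] = self.log_fact[i - 1] + math.log(i)

    def _log_choose(self, n, k):
        return self.log_fact[n] - self.log_fact[k] - self.log_fact[n - k]

    def right_tail(self, a, b, c, d):
        """P(overlap >= a) under the hypergeometric null."""
        n = a + b + c + d
        if n >= len(self.log_fact):
            raise ValueError("table total exceeds the initialized maximum")
        row1, row2, col1 = a + b, c + d, a + c
        log_denom = self._log_choose(n, col1)
        p = 0.0
        for k in range(a, min(row1, col1) + 1):
            p += math.exp(
                self._log_choose(row1, k)
                + self._log_choose(row2, col1 - k)
                - log_denom
            )
        return min(p, 1.0)


fisher_sketch = LogFactorialFisher(100)
p_right = fisher_sketch.right_tail(3, 1, 1, 3)  # 17/70 for this small table
```

The table is built once and shared across calls, which mirrors the pattern of passing one fisher object to several enrichment runs.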

Gene Set Filtering

Small gene sets and small overlaps can be filtered using the parameters min_set_size and min_overlap.

import pyenrichr as pye

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library.
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(gene_set, lib, min_set_size=10, min_overlap=5)
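The effect of the two thresholds can be sketched on a plain dictionary of sets. The function below is a hypothetical illustration, and it assumes both thresholds are inclusive (a set of exactly min_set_size genes is kept). The exact boundary behavior of pyEnrichr is not stated here.

```python
def filter_candidates(gene_set, library, min_set_size=10, min_overlap=5):
    """Return {term: overlapping_genes} for terms passing both thresholds.

    Assumes inclusive thresholds; not pyEnrichr's implementation.
    """
    kept = {}
    for term, genes in library.items():
        if len(genes) < min_set_size:
            continue  # library gene set too small
        overlap = gene_set & genes
        if len(overlap) < min_overlap:
            continue  # too few shared genes to report
        kept[term] = overlap
    return kept


# Made-up library with small thresholds so the effect is visible
toy_lib = {
    "big": {"A", "B", "C", "D"},
    "small": {"A", "B"},
    "disjoint": {"W", "X", "Y", "Z"},
}
toy_query = {"A", "B", "C"}
kept_terms = filter_candidates(toy_query, toy_lib, min_set_size=3, min_overlap=2)
```

Here "small" is dropped by the set-size threshold and "disjoint" by the overlap threshold, leaving only "big".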

Enrichment of Gene Set Library vs Gene Set Library

When computing enrichment for multiple gene sets against a gene set library, pyEnrichr uses an optimized implementation of overlap detection and multithreading to increase speed. The example below computes all pairwise enrichments between the gene sets in GO Biological Process. It calls the same fisher function as before, but the first parameter is a gene set library in dictionary format instead of a single gene set. The output is a list with one result DataFrame per input gene set, each computed against the whole library. The results can be consolidated into a single p-value matrix.

import pyenrichr as pye

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# calculate enrichment for every gene set in a library against all gene sets in a library (here, the same one).
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(lib, lib, min_set_size=10, min_overlap=5)

# consolidate all p-values into a single dataframe
pmat = pye.enrichment.consolidate(result)
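The optimized overlap detection is not described in detail. One generic technique for finding all pairwise overlaps without comparing every pair of sets is an inverted index from gene to terms. The sketch below shows that technique on a toy library. It is a swapped-in illustration, not pyEnrichr's actual algorithm, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_overlaps(library):
    """Count shared genes for every pair of terms that share at least one gene.

    Uses an inverted index (gene -> terms), so pairs with no shared
    gene are never visited.
    """
    gene_to_terms = defaultdict(list)
    for term, genes in library.items():
        for gene in genes:
            gene_to_terms[gene].append(term)

    counts = defaultdict(int)
    for terms in gene_to_terms.values():
        # Each gene contributes 1 to every pair of terms containing it
        for t1, t2 in combinations(sorted(terms), 2):
            counts[(t1, t2)] += 1
    return dict(counts)


# Made-up library for illustration
toy_library = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g5"},
}
overlaps = pairwise_overlaps(toy_library)
```

Only the pair ("A", "B") appears in the result, because "C" shares no gene with the other terms. In a sparse library this skips most of the pairwise comparisons.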
