Tools for analyzing pathway enrichment of gene lists
pyclusterprofiler
A limited Python implementation of clusterProfiler from R, borrowing some functions and concepts from sharepathway and goatools.
Currently KEGG and GO interfaces are implemented.
Installation
You can install pyclusterprofiler via pip:
pip install pyclusterprofiler
Usage
import pyclusterprofiler
To find enriched KEGG pathways in groupings ("cluster" column) of genes ("gene_id" column) identified in df:
df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='KEGG')
Or using GO terms (passing database="GO-slim" instead uses a reduced set of terms):
df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='GO')
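Both calls above expect df in long format, with one row per gene and a column naming its group. As a minimal sketch, assuming the groups start out as a mapping from cluster label to gene list, such a dataframe could be built as follows. The cluster labels are placeholders and the IDs are a few human NCBI gene IDs chosen only for illustration. The documentation does not say whether IDs should be integers or strings, so integers here are an assumption.

```python
import pandas as pd

# Hypothetical starting point: cluster label -> list of NCBI gene IDs.
clusters = {
    "A": [7157, 672, 675],    # illustrative human gene IDs
    "B": [1956, 2064, 3845],
}

# Flatten to one row per (gene, cluster) pair, using the column names
# from the usage example: "gene_id" and "cluster".
df = pd.DataFrame(
    [(gene, label) for label, genes in clusters.items() for gene in genes],
    columns=["gene_id", "cluster"],
)

print(df)
```

The resulting df can then be passed as the first argument, with 'cluster' as the grouping column.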
Example filter for any pathways/annotations with significant enrichment:
significant_pathways = (
    df_enrichment
    .query('(corrected_pvalue < 0.05) & (cluster_pathway_genes > 3)')
    ['pathway']
    .unique()
)
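To see what this filter selects without running an enrichment, here is a sketch on a hand-made table. The column names come from the query above; the rows and values are invented, and a real df_enrichment likely has more columns.

```python
import pandas as pd

# Toy stand-in for df_enrichment (values are made up for illustration).
toy = pd.DataFrame({
    "cluster": ["A", "A", "B", "B"],
    "pathway": ["p1", "p2", "p1", "p3"],
    "corrected_pvalue": [0.01, 0.20, 0.03, 0.001],
    "cluster_pathway_genes": [5, 10, 2, 8],
})

# Same filter as above: significant after correction, and supported
# by more than 3 genes from the cluster.
significant = (
    toy
    .query("(corrected_pvalue < 0.05) & (cluster_pathway_genes > 3)")
    ["pathway"]
    .unique()
)

# Keep every row for those pathways, in all clusters. This is the same
# selection as query('pathway in @significant_pathways').
kept = toy[toy["pathway"].isin(significant)]
print(kept)
```

Note that a pathway passing the filter in one cluster is kept for every cluster, so the row for p1 in cluster B survives even though it has only 2 genes. This keeps pathways comparable across clusters when plotting.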
Plot results as a dot plot:
ax = pyclusterprofiler.dotplot(df_enrichment.query('pathway in @significant_pathways'))
compare_clusters
arguments
argument | description |
---|---|
df | dataframe with a "gene_id" column containing NCBI gene IDs and a column specifying group membership |
grouping | column or list of columns in df to use for group membership |
enrichment_threshold | threshold on the ratio of observed/expected gene counts for a test to be included in the results (default 1) |
correction | method for correcting p-values for multiple hypothesis testing, passed as an argument to statsmodels.stats.multitest.multipletests (default "fdr_bh") |
organism | organism databases to download. GO uses the NCBI taxid; for KEGG see their organism list (default is the human databases for each) |
database | "KEGG", "GO", or "GO-slim" (default "KEGG") |
exclude | pathway/annotation groupings to exclude. For KEGG, can be "human_diseases", "organismal_systems", or a list of both (see KEGG pathways). For GO, can be "molecular_function", "biological_process", "cellular_component", or a list of one or more (the abbreviations "MF", "BP", "CC" are also accepted) (default None) |
force | force a fresh download of the databases; otherwise previously downloaded files are used if found in the current working directory (default False) |
verbose | if True, prints the provided NCBI gene IDs that could not be found in the database (default True) |
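The enrichment_threshold and correction arguments refer to two standard quantities. The sketch below illustrates them in plain Python and is not the package's code: the helper names are hypothetical, the observed/expected ratio uses one common definition (expected count = cluster size × pathway size / background size), and the Benjamini-Hochberg routine is a hand-written stand-in for what the "fdr_bh" method computes.

```python
def enrichment_ratio(cluster_hits, cluster_size, pathway_size, background_size):
    """Observed / expected number of pathway genes in a cluster (assumed definition)."""
    expected = cluster_size * pathway_size / background_size
    return cluster_hits / expected


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 5 of 50 cluster genes fall in a 100-gene pathway, out of 20000 background genes.
ratio = enrichment_ratio(5, 50, 100, 20000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(ratio, adjusted)
```

With the default enrichment_threshold of 1, a result would be reported only when the observed count exceeds the expected count under this definition.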
Contributing
Contributions are very welcome.
License
Distributed under the terms of the MIT license, "pyclusterprofiler" is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.