Tools for analyzing pathway enrichment of gene lists

pyclusterprofiler

A limited Python implementation of the R package clusterProfiler, borrowing some functions and concepts from sharepathway and goatools.

Currently, the KEGG and GO interfaces are implemented.


Installation

You can install pyclusterprofiler via pip:

pip install pyclusterprofiler

Usage

import pyclusterprofiler

To find enriched KEGG pathways for each group of genes in a dataframe df, where the "gene_id" column holds the genes and the "cluster" column holds the group labels:

df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='KEGG')

Or use GO terms (passing database="GO-slim" instead uses a reduced set of terms):

df_enrichment = pyclusterprofiler.compare_clusters(df,'cluster',database='GO')
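
Both calls assume that df is a pandas dataframe with one row per gene. A minimal sketch of such an input follows. The gene IDs and cluster labels are illustrative only, and storing the IDs as integers is an assumption (the package may expect strings).

```python
import pandas as pd

# Illustrative input: one row per gene, with its NCBI gene ID and cluster label.
# The IDs are human NCBI gene IDs; the grouping into clusters is made up.
df = pd.DataFrame(
    {
        "gene_id": [7157, 672, 1956, 2064, 207, 3845],
        "cluster": ["A", "A", "A", "B", "B", "B"],
    }
)

# Sanity checks before running an enrichment: required column and group sizes.
assert "gene_id" in df.columns
cluster_sizes = df.groupby("cluster")["gene_id"].nunique().to_dict()
print(cluster_sizes)

# With the package installed and network access for the database download:
# import pyclusterprofiler
# df_enrichment = pyclusterprofiler.compare_clusters(df, "cluster", database="KEGG")
```

Checking the group sizes first is worthwhile because very small clusters rarely reach significance after multiple-testing correction.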

Example filter for any pathways/annotations with significant enrichment:

significant_pathways = (df_enrichment
	.query('(corrected_pvalue<0.05)&(cluster_pathway_genes>3)')
	['pathway']
	.unique()
	)
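
To see what this filter keeps, here is a sketch that applies it to a small mock table. Only the column names "pathway", "corrected_pvalue", and "cluster_pathway_genes" come from the filter above. The "cluster" column name and every value in the table are hypothetical.

```python
import pandas as pd

# Hypothetical enrichment table: all values are made up for illustration,
# and the "cluster" column name is an assumption.
df_enrichment = pd.DataFrame(
    {
        "cluster": ["A", "A", "B", "B"],
        "pathway": ["pathway_1", "pathway_2", "pathway_1", "pathway_3"],
        "corrected_pvalue": [0.01, 0.20, 0.03, 0.04],
        "cluster_pathway_genes": [5, 10, 2, 4],
    }
)

# Pathways that pass both thresholds in at least one cluster.
significant_pathways = (
    df_enrichment
    .query("(corrected_pvalue < 0.05) & (cluster_pathway_genes > 3)")
    ["pathway"]
    .unique()
)

# Keep every row belonging to one of those pathways, in any cluster.
df_plot = df_enrichment.query("pathway in @significant_pathways")
print(list(significant_pathways))
print(len(df_plot))
```

Note that filtering by pathway name keeps the pathway_1 row for cluster B even though that row fails the gene-count threshold on its own, so each selected pathway can be compared across all clusters.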

Plot results as a dot plot:

ax = pyclusterprofiler.dotplot(df_enrichment.query('pathway in @significant_pathways'))

compare_clusters arguments

| argument | description |
| --- | --- |
| df | dataframe with a "gene_id" column containing NCBI gene IDs and a column specifying group membership |
| grouping | column or list of columns in df to use for group membership |
| correction | method for correcting p-values for multiple hypothesis testing, passed as an argument to statsmodels.stats.multitest.multipletests (default "fdr_bh") |
| organism | organism databases to download. GO uses the NCBI taxid; for KEGG see their organism list (default is the human databases for each) |
| database | "KEGG", "GO", or "GO-slim" (default "KEGG") |
| exclude | pathway/annotation groupings to exclude. For KEGG, can be "human_diseases", "organismal_systems", or a list of both (see KEGG pathways). For GO, can be "molecular_function", "biological_process", "cellular_component", or a list of one or more (the abbreviations "MF", "BP", and "CC" are also accepted) (default None) |
| force | force a fresh download of the databases; otherwise previously downloaded files are used if found in the current working directory (default False) |
| verbose | if True, prints the provided NCBI gene IDs that could not be found in the database (default True) |
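
For intuition about the default correction, "fdr_bh" is the statsmodels name for the Benjamini-Hochberg procedure. The following is a minimal pure-Python sketch of that adjustment, not the code the package or statsmodels runs. The function name benjamini_hochberg and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four pathways tested together.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw p-value is scaled by the number of tests divided by its rank, so a raw value below 0.05 can end up above 0.05 after correction when many pathways are tested at once.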

Contributing

Contributions are very welcome.

License

Distributed under the terms of the MIT license, "pyclusterprofiler" is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.
