Gene Set Enrichment Analysis in Python

GSEAPY: Gene Set Enrichment Analysis in Python.

Available from PyPI and bioconda. License: MIT. Requires Python 3.5+.

Please note: from version 0.9.5 on, GSEApy only works on Python 3.5+, and Python 2.x is no longer supported. If you need a Python 2 version, install v0.9.4.

The main documentation for GSEApy can be found at http://gseapy.rtfd.io/

For examples of using gseapy, see the Example page in the documentation.

Release notes : https://github.com/zqfang/gseapy/releases

GSEAPY is a Python wrapper for GSEA and Enrichr.

GSEAPY can be used for RNA-seq, ChIP-seq, and microarray data. It offers convenient GO enrichment analysis and produces publication-quality figures in Python.

GSEAPY has five sub-commands available: gsea, prerank, ssgsea, replot, enrichr.

gsea:

The gsea module produces GSEA results. The input requires a txt file (FPKM, expected counts, TPM, etc.), a cls file, and a gene_sets file in gmt format.
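As a rough illustration of the cls and gmt inputs, the sketch below writes minimal versions of both files using only the standard library. The helper names `cls_text` and `gmt_text`, the sample labels, and the gene sets are hypothetical; the layouts follow the GSEA file format conventions (a three-line cls file, and one tab-separated line per gene set in gmt with a name, a description, and the genes). The "na" description is just a placeholder.

```python
from pathlib import Path
import tempfile


def cls_text(labels):
    """Build the three-line .cls content from per-sample class labels."""
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


def gmt_text(gene_sets):
    """Build .gmt content: name, description, then genes, tab-separated."""
    lines = ["\t".join([name, "na", *genes]) for name, genes in gene_sets.items()]
    return "\n".join(lines) + "\n"


# Hypothetical labels and gene sets, for illustration only.
labels = ["A", "A", "A", "B", "B", "B"]
gene_sets = {"SET_ONE": ["CSF1", "BGN", "PTX3"], "SET_TWO": ["EFNA1", "PMP22"]}

outdir = Path(tempfile.mkdtemp())
(outdir / "test.cls").write_text(cls_text(labels))
(outdir / "gene_sets.gmt").write_text(gmt_text(gene_sets))
print(cls_text(labels), end="")
```

The order of the labels on the third cls line should match the order of the sample columns in the expression table.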

prerank:

The prerank module produces Prerank tool results. The input is a pre-ranked gene list with correlation values, provided in .rnk format, and a gene_sets file in gmt format. The prerank module is an API to the GSEA pre-rank tool.
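A .rnk file is a plain two-column, tab-separated table of gene names and ranking scores. The following is a minimal sketch of producing one from a dictionary of scores; the function name `rnk_text` and the example values are made up for illustration, and sorting from highest to lowest score is a convention assumed here rather than something this page specifies.

```python
def rnk_text(scores):
    """Return .rnk content: 'gene<TAB>score' lines, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score}\n" for gene, score in ranked)


# Hypothetical ranking metric per gene (for example, a correlation value).
scores = {"BGN": -1.2, "CSF1": 2.5, "PTX3": 0.3}
print(rnk_text(scores), end="")
```

The resulting text can be written to a file such as gsea_data.rnk and passed to the prerank module.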

ssgsea:

The ssgsea module performs single-sample GSEA (ssGSEA) analysis. The input is a pd.Series (indexed by gene name) or a pd.DataFrame with expression values (a GCT file is also accepted), plus a GMT file. For multi-sample input, ssGSEA recognizes the gct format, too. The ssGSEA enrichment score for a gene set is described by D. Barbie et al. 2009.

replot:

The replot module reproduces the results of the GSEA desktop version. The only input it needs is the location of the GSEA desktop output directory.

enrichr:

The enrichr module lets you perform gene set enrichment analysis using the Enrichr API. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr . It runs very fast.

Please use ‘gseapy COMMAND -h’ to see a detailed description of every option of each module.

The full GSEA is far too extensive to describe here; see GSEA documentation for more information. All files’ formats for GSEApy are identical to GSEA desktop version.
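For orientation only, the core statistic can be sketched in a few lines: the weighted running-sum enrichment score from the original GSEA method. This is a textbook-style illustration, not gseapy's implementation; the function name `enrichment_score` and the toy data are assumptions, and permutation testing, normalization, and FDR estimation are all omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: gene names ordered from highest to lowest score.
    scores: the ranking metric of each gene, in the same order.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must partly overlap the ranked list")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up (weighted by the score) on a hit, step down on a miss.
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


# Toy ranked list, for illustration only.
genes = ["A", "B", "C", "D"]
metric = [3.0, 2.0, -1.0, -2.0]
print(enrichment_score(genes, metric, {"A", "B"}))
print(enrichment_score(genes, metric, {"C", "D"}))
```

A set concentrated at the top of the ranking gets a positive score, and one concentrated at the bottom gets a negative score.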

If you use gseapy in your research, you should cite the original GSEA and Enrichr papers.

Why GSEAPY

I would like to use Pandas to explore my data, but I did not find a convenient tool to do gene set enrichment analysis in python. So, here are my reasons:

  • Ability to run inside python interactive console without having to switch to R!!!

  • User friendly for both wet and dry lab users.

  • Produce or reproduce publishable figures.

  • Perform batch jobs easily.

  • Easy to use in bash shell or your data analysis workflow, e.g. snakemake.

GSEA Java version output:

This is an example of GSEA desktop application output

docs/GSEA_OCT4_KD.png

GSEAPY Prerank module output

Using the same data as the GSEA example, GSEAPY reproduces the figure above.

The prerank or replot module reproduces the same figure as the GSEA Java desktop output.

docs/gseapy_OCT4_KD.png

Generated by GSEAPY

GSEAPY can save figures in any format supported by matplotlib.

Plots saved as .pdf files are easy to modify. Enjoy!

GSEAPY enrichr module

Note: For now, the enrichr module only downloads the enrichment results.

TODO: Save the enriched table, grids, networks, and bar graphs from the website server using PhantomJS and Selenium.

A graphical introduction of Enrichr

docs/enrichr.PNG

Note: Enrichr uses a list of Entrez gene symbols as input. You should convert all gene names to uppercase.
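A small, hedged sketch of the uppercase conversion: the helper below (the name `clean_gene_list` is hypothetical) strips whitespace, uppercases every symbol, and drops blanks and duplicates while keeping the original order. Removing duplicates is an extra convenience assumed here, not a stated Enrichr requirement.

```python
def clean_gene_list(genes):
    """Strip, uppercase, and de-duplicate gene names, keeping first-seen order."""
    cleaned = (gene.strip().upper() for gene in genes)
    return list(dict.fromkeys(gene for gene in cleaned if gene))


# Hypothetical raw input with mixed case, padding, a duplicate and a blank.
raw = ["Scara3", " cmbl", "CLIC6", "cmbl", ""]
genes = clean_gene_list(raw)
print("\n".join(genes))  # one gene name per row
```

The printed text, one gene name per row, matches the layout of a gene list txt file.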

Installation

Install the gseapy package from bioconda or PyPI.
# if you have conda
$ conda install -c bioconda gseapy

# for windows users
$ conda install -c bioninja gseapy

# or use pip to install the latest release
$ pip install gseapy
Alternatively, you can install the development version from GitHub by running
$ pip install git+https://github.com/zqfang/gseapy.git#egg=gseapy

Dependency

  • Python 3.5+ (Python 2.7 is supported only up to v0.9.4)

Mandatory

  • Numpy >= 1.13.0

  • Pandas

  • Matplotlib

  • Beautifulsoup4

  • Requests(for enrichr API)

You may also need to install lxml and html5lib if xml files cannot be parsed.
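To check quickly whether the mandatory packages are importable, something like the following sketch can be used. It relies only on the standard library; the helper name `missing_modules` is hypothetical, and `bs4` is the import name of Beautifulsoup4. It checks availability only, not version numbers.

```python
import sys
from importlib.util import find_spec

# Import names of the mandatory dependencies listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "bs4", "requests"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Python 3.5+:", sys.version_info >= (3, 5))
print("Missing modules:", missing_modules(REQUIRED))
```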

Run GSEAPY

Before you start:

Unless you know exactly how GSEA works, you should convert all gene symbol names to uppercase first.

For command line usage:

# An example to reproduce figures using replot module.
$ gseapy replot -i ./Gsea.reports -o test


# An example to run GSEA using gseapy gsea module
$ gseapy gsea -d exptable.txt -c test.cls -g gene_sets.gmt -o test

# An example to run Prerank using gseapy prerank module
$ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test

# An example to run ssGSEA using gseapy ssgsea module
$ gseapy ssgsea -d expression.txt -g gene_sets.gmt -o test

# An example to use enrichr api
# see details of -g below, -d  is optional
$ gseapy enrichr -i gene_list.txt -g KEGG_2016 -d pathway_enrichment -o test
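For batch jobs, the command lines above can also be generated from a script. The sketch below builds one `gseapy prerank` command per gene set file, using only the `-r`, `-g` and `-o` options shown above; the function name `prerank_commands` and the gmt file names are hypothetical, and the commands are only printed, not executed.

```python
import shlex
from pathlib import PurePosixPath


def prerank_commands(rnk_file, gmt_files, out_root):
    """Build one 'gseapy prerank' argument list per gene set file."""
    commands = []
    for gmt in gmt_files:
        outdir = f"{out_root}/{PurePosixPath(gmt).stem}"  # one folder per gmt
        commands.append(["gseapy", "prerank", "-r", rnk_file, "-g", gmt, "-o", outdir])
    return commands


# Hypothetical gene set files, for illustration only.
for cmd in prerank_commands("gsea_data.rnk", ["kegg.gmt", "go_bp.gmt"], "test"):
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run a command: subprocess.run(cmd, check=True)
```

Writing each run to its own output directory keeps the results of different gene set files from overwriting each other.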

Run gseapy inside python console:

  1. Prepare the expression.txt, gene_sets.gmt and test.cls files required by GSEA, then run:

import gseapy

# run GSEA.
gseapy.gsea(data='expression.txt', gene_sets='gene_sets.gmt', cls='test.cls', outdir='test')

# run prerank
gseapy.prerank(rnk='gsea_data.rnk', gene_sets='gene_sets.gmt', outdir='test')

# run ssGSEA
gseapy.ssgsea(data='expression.txt', gene_sets='gene_sets.gmt', outdir='test')


# An example to reproduce figures using replot module.
gseapy.replot(indir='./Gsea.reports', outdir='test')
  2. If you prefer to use a DataFrame, dict, or list in the interactive Python console, you can do this.

For details, see the Example page in the documentation.

import pandas as pd
import gseapy

# assign an expression dataframe (an empty placeholder here; load your own data),
# and use the enrichr library data set 'KEGG_2016'
expression_dataframe = pd.DataFrame()

sample_names = ['A','A','A','B','B','B'] # always exactly two groups, with any names you like

# assign gene_sets parameter with enrichr library name or gmt file on your local computer.
gseapy.gsea(data=expression_dataframe, gene_sets='KEGG_2016', cls=sample_names, outdir='test')

# using prerank tool
gene_ranked_dataframe = pd.DataFrame()
gseapy.prerank(rnk=gene_ranked_dataframe, gene_sets='KEGG_2016', outdir='test')

# using ssGSEA
ssGSEA_dataframe = pd.DataFrame()
gseapy.ssgsea(data=ssGSEA_dataframe, gene_sets='KEGG_2016', outdir='test')
  3. For enrichr, you can pass a list, pd.Series, or pd.DataFrame object, or a txt file (one gene name per row).

# assign a list object to enrichr
gl = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1', 'CSF1',
     'SYNPO2L', 'TINAGL1', 'PTX3', 'BGN', 'HERC1', 'EFNA1', 'CIB2', 'PMP22', 'TMEM173']

gseapy.enrichr(gene_list=gl, description='pathway', gene_sets='KEGG_2016', outdir='test')

# or a txt file path.
gseapy.enrichr(gene_list='gene_list.txt', description='pathway', gene_sets='KEGG_2016',
               outdir='test', cutoff=0.05, format='png' )

GSEAPY supported gene set libraries:

For the full list of gene set libraries supported by gseapy, see the Library page in the documentation.

Or use the get_library_name function inside the Python console.

# see the full list of the latest enrichr library names, which can be passed to the -g parameter:
names = gseapy.get_library_name()

# show the first 20 entries.
print(names[:20])


['Genome_Browser_PWMs',
'TRANSFAC_and_JASPAR_PWMs',
'ChEA_2013',
'Drug_Perturbations_from_GEO_2014',
'ENCODE_TF_ChIP-seq_2014',
'BioCarta_2013',
'Reactome_2013',
'WikiPathways_2013',
'Disease_Signatures_from_GEO_up_2014',
'KEGG_2016',
'TF-LOF_Expression_from_GEO',
'TargetScan_microRNA',
'PPI_Hub_Proteins',
'GO_Molecular_Function_2015',
'GeneSigDB',
'Chromosome_Location',
'Human_Gene_Atlas',
'Mouse_Gene_Atlas',
'GO_Cellular_Component_2015',
'GO_Biological_Process_2015',
'Human_Phenotype_Ontology',]
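Since the full list is long, it can help to filter it by keyword. The sketch below does this with plain Python; the helper name `find_libraries` is hypothetical, and the hard-coded `names` list is a subset copied from the output above so that the example runs without a network connection. In a real session, `names` would come from `gseapy.get_library_name()`.

```python
def find_libraries(names, keyword):
    """Return the library names that contain keyword, ignoring case."""
    key = keyword.lower()
    return [name for name in names if key in name.lower()]


# Subset of the library names shown above.
names = [
    "ChEA_2013",
    "BioCarta_2013",
    "Reactome_2013",
    "KEGG_2016",
    "GO_Molecular_Function_2015",
    "GO_Cellular_Component_2015",
    "GO_Biological_Process_2015",
]
print(find_libraries(names, "go_"))
```

Any of the returned names can then be passed as the gene_sets argument or the -g option.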

Bug Report

If you run into a bug while running gseapy, don’t hesitate to create an issue on GitHub or email me: fzq518@gmail.com

Getting help with GSEAPY

Visit the documentation site at http://gseapy.rtfd.io/
