
Python CLI and module for running the GSVA R Bioconductor package with Python Pandas inputs and outputs.


# GSVA Python CLI

Python hooks for R's GSVA Bioconductor package that make working in Pandas easier, plus a handy CLI for running GSVA.

Autodoc manual is here: https://jason-weirather.github.io/GSVA/

##### Disclaimer

I am not the creator or author of GSVA. This is a CLI and Python hook created to make the GSVA authors' package easy to use from the command line and from Python.

##### This is not the official site for the GSVA Bioconductor package

Find the official R package here:

https://doi.org/doi:10.18129/B9.bioc.GSVA

##### And if you find this useful, please cite the authors' publication

Hänzelmann S, Castelo R and Guinney J (2013). “GSVA: gene set variation analysis for microarray and RNA-Seq data.” BMC Bioinformatics, 14, pp. 7. doi: 10.1186/1471-2105-14-7, http://www.biomedcentral.com/1471-2105/14/7.

## Get GSVA Python CLI

#### Method 1: Install on your system

1. Install R: https://www.r-project.org/
2. Install the R Bioconductor packages GSEABase and GSVA:

```
$ Rscript -e 'install.packages("BiocManager", repos = "https://cloud.r-project.org")' \
          -e 'BiocManager::install(c("GSEABase", "GSVA"), update = FALSE)'
```

3. Install this package: `$ pip install GSVA`

#### Method 2: Run GSVA via Docker

`$ docker pull vacation/gsva:latest`

## Use GSVA Python CLI in your Python code

First, install the GSVA Python CLI on your system as described above. For details on the parameters of `gsva(expression_df, genesets_df, ...)`, see https://jason-weirather.github.io/GSVA/

### Example: convert a per-sample, per-gene expression matrix into a per-sample, per-pathway enrichment matrix

Consider this Jupyter notebook workflow:

```
import pandas as pd
from GSVA import gsva, gmt_to_dataframe
```

Read in a Broad Institute (MSigDB) pathway GMT file. Note the `member` and `name` columns: if you build your own gene-set DataFrame, these are the required column names.

```
genesets_df = gmt_to_dataframe('c2.cp.v6.0.symbols.gmt')
genesets_df.head()
```

| | description | member | name |
|---|---------------------------------------------------|--------|---------------------------------|
| 0 | http://www.broadinstitute.org/gsea/msigdb/card... | ACSS2 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS |
| 1 | http://www.broadinstitute.org/gsea/msigdb/card... | GCK | KEGG_GLYCOLYSIS_GLUCONEOGENESIS |
| 2 | http://www.broadinstitute.org/gsea/msigdb/card... | PGK2 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS |
| 3 | http://www.broadinstitute.org/gsea/msigdb/card... | PGK1 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS |
| 4 | http://www.broadinstitute.org/gsea/msigdb/card... | PDHB | KEGG_GLYCOLYSIS_GLUCONEOGENESIS |
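If you want to build `genesets_df` yourself, the table above suggests a long format with one row per (gene set, member) pair. The sketch below is a hypothetical parser, not this package's `gmt_to_dataframe`. It assumes the standard GMT layout, where each tab-separated line holds the gene-set name, a description, and then the member genes. The function name `gmt_lines_to_dataframe` and the example gene sets are illustrative only.

```python
import pandas as pd


def gmt_lines_to_dataframe(lines):
    """Hypothetical GMT parser returning one row per (gene set, member) pair.

    Each GMT line is tab-separated: gene-set name, description, member genes.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *members = fields
        for member in members:
            if member:  # ignore empty trailing fields
                rows.append({"description": description, "member": member, "name": name})
    # Same columns as the gmt_to_dataframe output shown above.
    return pd.DataFrame(rows, columns=["description", "member", "name"])


gmt_text = [
    "SET_A\thttp://example.org/a\tGENE1\tGENE2\n",
    "SET_B\thttp://example.org/b\tGENE2\tGENE3\tGENE4\n",
]
genesets_df = gmt_lines_to_dataframe(gmt_text)
print(genesets_df)

# Alternatively, build the required "member"/"name" columns from a dict of gene sets.
custom = {"MY_SET": ["GENE1", "GENE5"]}
custom_df = pd.DataFrame(
    [{"member": gene, "name": set_name} for set_name, genes in custom.items() for gene in genes]
)
print(custom_df)
```

Only `member` and `name` are described as required above, so the dict-based `custom_df` leaves out `description`.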

This example has 200 samples, with gene names as the row index and one column per sample:

```
expression_df = pd.read_csv('example_expression.csv',index_col=0)
expression_df.iloc[0:5,0:5]
```

| gene_name | S-1 | S-2 | S-3 | S-4 | S-5 |
|-----------|--------|--------|--------|--------|--------|
| MT-CO1 | 13.852 | 12.328 | 13.055 | 11.898 | 10.234 |
| MT-CO2 | 13.406 | 12.383 | 13.281 | 11.578 | 11.156 |
| MT-CO3 | 13.234 | 12.109 | 13.352 | 11.531 | 10.422 |
| MT-ATP8 | 13.805 | 11.789 | 13.414 | 11.883 | 11.141 |
| MT-ATP6 | 13.500 | 11.703 | 13.227 | 11.219 | 10.836 |
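The example matrix has gene symbols as its index and one numeric column per sample. Your own data, for example probe-level values mapped to symbols, may contain repeated gene names or non-numeric cells. The following is a minimal cleanup sketch under those assumptions. `prepare_expression` is a hypothetical helper, not part of this package. Averaging duplicated symbols is just one common choice, not something this package is known to require.

```python
import pandas as pd


def prepare_expression(df):
    """Hypothetical cleanup for a genes x samples expression matrix.

    Assumes gene symbols are the index and samples are the columns,
    as in expression_df above.
    """
    # Coerce every column to numbers; unparseable cells become NaN.
    df = df.apply(pd.to_numeric, errors="coerce")
    # Drop genes (rows) that have any missing value.
    df = df.dropna(how="any")
    # Collapse duplicated gene symbols into one row by averaging them.
    return df.groupby(level=0).mean()


raw = pd.DataFrame(
    {"S-1": [1.0, 3.0, 5.0], "S-2": ["2", "4", "n/a"]},
    index=pd.Index(["GENE1", "GENE1", "GENE2"], name="gene_name"),
)
clean = prepare_expression(raw)
print(clean)
```

Here `GENE2` is dropped because its `S-2` value cannot be parsed, and the two `GENE1` rows are averaged into one.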

By default, `gsva` runs without verbose output. Note, however, that gene-set members that are not present in `expression_df` are dropped from the analysis. Depending on the GSVA method you choose, genes without enough expression (for example, all-zero expression) are dropped as well.

```
pathways_df = gsva(expression_df,genesets_df)
pathways_df.iloc[0:5,0:5]
```

| name | S-1 | S-2 | S-3 | S-4 | S-5 |
|-------------------------|-----------|-----------|-----------|----------|-----------|
| BIOCARTA_41BB_PATHWAY | 0.068631 | 0.257169 | -0.146907 | 0.020151 | -0.234537 |
| BIOCARTA_ACE2_PATHWAY | 0.110822 | -0.222310 | -0.161572 | 0.370659 | -0.003318 |
| BIOCARTA_ACH_PATHWAY | 0.514193 | 0.149291 | 0.226279 | 0.289960 | 0.016071 |
| BIOCARTA_ACTINY_PATHWAY | -0.014494 | 0.407871 | -0.062163 | 0.055607 | 0.424726 |
| BIOCARTA_AGPCR_PATHWAY | 0.622482 | -0.012845 | 0.317349 | 0.286368 | 0.022540 |
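Because genes can drop out before scoring, some gene sets may end up much smaller than their GMT definition. The sketch below is a hypothetical pre-check, not part of this package, that roughly mirrors that filtering in pandas. It assumes a member is usable if it appears in the index of `expression_df` and is non-zero in at least one sample. GSVA's own filtering depends on the method and version, so treat the result as a rough guide only. `gene_set_coverage` and the toy data are illustrative.

```python
import pandas as pd


def gene_set_coverage(expression_df, genesets_df):
    """Hypothetical pre-check: how many members of each gene set are usable.

    A member counts as usable if it is in expression_df's index and is
    non-zero in at least one sample.
    """
    nonzero = (expression_df != 0).any(axis=1)  # True for genes with any non-zero value
    usable = set(nonzero[nonzero].index)
    flagged = genesets_df.assign(found=genesets_df["member"].isin(usable))
    # Count all members and usable members per gene set.
    coverage = flagged.groupby("name")["found"].agg(total="size", found="sum")
    coverage["fraction"] = coverage["found"] / coverage["total"]
    return coverage


expr = pd.DataFrame(
    {"S-1": [1.0, 0.5, 0.0], "S-2": [2.0, 0.0, 0.0]},
    index=["GENE1", "GENE2", "GENE3"],  # GENE3 is all zero
)
sets = pd.DataFrame(
    {
        "member": ["GENE1", "GENE2", "GENE2", "GENE3", "GENE4"],  # GENE4 is absent
        "name": ["SET_A", "SET_A", "SET_B", "SET_B", "SET_B"],
    }
)
cov = gene_set_coverage(expr, sets)
print(cov)
```

Gene sets with a low `fraction` are worth a second look before you interpret their enrichment scores.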

