GSVA package for Python

These details have not been verified by PyPI

Project links

Homepage

Project description

pygsva: Gene Set Variation Analysis in Python

Gene Set Variation Analysis (GSVA) is a gene set enrichment method designed for single-sample analysis: it scores each gene set in each sample individually. It enables pathway-centric analyses of molecular data by shifting the functional unit of analysis from individual genes to gene sets. This approach is particularly useful for bulk microarray, RNA-seq, and other molecular profiling data, where it provides a pathway-level view of biological activity.

Overview

GSVA transforms a gene-by-sample expression matrix into a gene-set-by-sample matrix of enrichment scores that represent pathway activity. The transformed matrix can then be used with classical analytical methods such as:

Differential Expression
Classification
Survival Analysis
Clustering
Correlation Analysis

Additionally, GSVA enables pathway comparisons with other molecular data types, such as microRNA expression, binding data, copy-number variation (CNV), or single nucleotide polymorphisms (SNPs).

Performance

Version 0.2.0 includes significant performance optimizations based on vectorized NumPy operations. Runtimes by method and input size (genes × samples):

Method	Small (1K × 100)	Medium (5K × 200)	Large (10K × 500)
GSVA (Gaussian)	0.5s	4.2s	31.6s
GSVA (none)	0.2s	1.5s	10.9s
ssGSEA	0.2s	1.1s	7.1s
PLAGE	0.4s	2.1s	7.2s
Z-score	0.1s	0.5s	1.1s

For very large datasets (19,000 genes × 1,500 samples), GSVA with kcdf='Gaussian' completes in ~4 minutes (previously 52 minutes).

Methods

The pygsva package provides Python implementations of four single-sample gene set enrichment methods:

1. PLAGE (Pathway Level Analysis of Gene Expression)

Reference: Tomfohr, Lu, and Kepler (2005)
Standardizes expression profiles over the samples.
Performs Singular Value Decomposition (SVD) on each gene set.
The coefficients of the first right-singular vector are returned as pathway activity estimates.
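
The PLAGE steps above can be sketched in plain Python. This is a minimal illustration, not pygsva's implementation: the function names and the toy values are hypothetical, and power iteration is swapped in for a full SVD so that the example needs only the standard library.

```python
import math
import statistics


def standardize_rows(rows):
    """Z-score each gene (row) across samples, using the sample standard deviation."""
    out = []
    for row in rows:
        mu = statistics.fmean(row)
        sd = statistics.stdev(row)  # assumes the gene is not constant across samples
        out.append([(x - mu) / sd for x in row])
    return out


def first_right_singular_vector(z, iters=500):
    """Approximate the first right-singular vector of z by power iteration on z^T z."""
    n_samples = len(z[0])
    # Start from a row of z: a constant start vector would be mapped to zero,
    # because every standardized row sums to zero.
    v = list(z[0])
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in z]  # u = z v
        w = [sum(row[j] * u_g for row, u_g in zip(z, u)) for j in range(n_samples)]  # w = z^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy gene set: 3 genes (rows) x 4 samples (columns); values are made up.
gene_set_expr = [
    [1.0, 2.0, 8.0, 9.0],
    [2.0, 1.0, 9.0, 7.0],
    [0.5, 1.5, 6.0, 7.5],
]
activity = first_right_singular_vector(standardize_rows(gene_set_expr))
```

The result has one coefficient per sample. The overall sign of a singular vector is arbitrary, so only the relative pattern across samples is meaningful.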

2. Z-Score Method

Reference: Lee et al. (2008)
Standardizes expression profiles over the samples.
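Combines the standardized values of the genes in a gene set into one score per sample (the sum of the z-scores divided by the square root of the number of genes).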
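
A minimal sketch of the combined z-score, assuming the formulation of Lee et al. (sum of per-gene z-scores divided by the square root of the gene set size) and the sample standard deviation. The function name, data layout, and values are hypothetical and may differ from what pygsva does internally.

```python
import math
import statistics


def zscore_activity(expr, gene_set):
    """expr maps gene -> list of values (one per sample); returns one score per sample."""
    genes = [g for g in gene_set if g in expr]  # ignore genes missing from the matrix
    n_samples = len(next(iter(expr.values())))
    standardized = []
    for g in genes:
        mu = statistics.fmean(expr[g])
        sd = statistics.stdev(expr[g])  # assumes the gene is not constant
        standardized.append([(x - mu) / sd for x in expr[g]])
    k = len(genes)
    # Combined z-score per sample: sum of z-scores scaled by sqrt(k)
    return [sum(row[j] for row in standardized) / math.sqrt(k) for j in range(n_samples)]


# Toy data: 3 genes x 3 samples; gene "X" is not in the matrix and is skipped.
expr = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [5.0, 5.0, 9.0]}
scores = zscore_activity(expr, ["A", "B", "X"])
```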

3. ssGSEA (Single-Sample Gene Set Enrichment Analysis)

Reference: Barbie et al. (2009)
Calculates enrichment scores as the normalized difference in empirical cumulative distribution functions (CDFs) of gene expression ranks inside and outside the gene set.
By default, the pathway scores are normalized by dividing them by the range of calculated values.
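
The ssGSEA score for one sample can be sketched as follows. This is an illustrative reading of Barbie et al., not pygsva's code: the function name is hypothetical, the rank weighting exponent `alpha=0.25` is the value from the original method, and the final range normalization is applied over the toy scores only.

```python
def ssgsea_sample(values, gene_set, alpha=0.25):
    """values maps gene -> expression in one sample; returns the raw enrichment score."""
    ordered = sorted(values, key=values.get)           # ascending expression
    rank = {g: i + 1 for i, g in enumerate(ordered)}   # 1 = lowest expressed gene
    in_set = set(gene_set) & set(values)
    total_weight = sum(rank[g] ** alpha for g in in_set)
    n_out = len(values) - len(in_set)
    p_in = p_out = es = 0.0
    for g in reversed(ordered):                        # walk from highest to lowest rank
        if g in in_set:
            p_in += rank[g] ** alpha / total_weight    # rank-weighted CDF inside the set
        else:
            p_out += 1.0 / n_out                       # uniform CDF outside the set
        es += p_in - p_out                             # accumulate the CDF difference
    return es


# Toy sample with made-up values.
sample = {"g1": 10.0, "g2": 8.0, "g3": 6.0, "g4": 4.0, "g5": 2.0, "g6": 1.0}
raw = [ssgsea_sample(sample, ["g1", "g2"]), ssgsea_sample(sample, ["g5", "g6"])]

# Default-style normalization: divide by the range of the calculated values.
score_range = max(raw) - min(raw)
normalized = [s / score_range for s in raw]
```

A gene set made of the most highly expressed genes gets a positive score, and one made of the least expressed genes gets a negative score.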

4. GSVA (Default Method)

Reference: Hänzelmann, Castelo, and Guinney (2013)
First calculates an expression-level statistic for each gene to bring gene expression profiles with varying dynamic ranges to a common scale.
Then computes a non-parametric enrichment score from the empirical CDFs of gene expression ranks inside and outside the gene set.
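
The two GSVA stages can be sketched as below. This is a simplified illustration under stated assumptions, not pygsva's implementation: it uses a plain empirical CDF for the expression-level statistic, symmetric rank weights with exponent `tau=1`, and the difference between the largest positive and largest negative deviation of the random walk as the score. All names and values are hypothetical.

```python
def ecdf_statistic(expr):
    """expr maps gene -> values across samples; returns each value's ECDF within its gene."""
    return {g: [sum(x <= v for x in row) / len(row) for v in row] for g, row in expr.items()}


def gsva_sample(stat, gene_set, tau=1.0):
    """stat maps gene -> statistic in one sample; returns a KS-like enrichment score."""
    p = len(stat)
    ordered = sorted(stat, key=stat.get, reverse=True)  # highest statistic first
    # Rank p for the top gene down to 1, recentred so both tails get large weights
    weight = {g: abs(p / 2 - (p - i)) ** tau for i, g in enumerate(ordered)}
    in_set = set(gene_set) & set(stat)
    total = sum(weight[g] for g in in_set)              # assumes total > 0
    n_out = p - len(in_set)
    running = max_pos = max_neg = 0.0
    for g in ordered:
        running += weight[g] / total if g in in_set else -1.0 / n_out
        max_pos = max(max_pos, running)
        max_neg = min(max_neg, running)
    return max_pos + max_neg                            # max positive minus |max negative|


# Toy matrix: 6 genes x 3 samples, made-up values.
expr = {
    "g1": [9.0, 1.0, 5.0], "g2": [8.0, 2.0, 4.0], "g3": [1.0, 9.0, 5.0],
    "g4": [2.0, 8.0, 4.0], "g5": [3.0, 7.0, 5.0], "g6": [2.0, 6.0, 4.0],
}
stat = ecdf_statistic(expr)
scores = [gsva_sample({g: s[j] for g, s in stat.items()}, ["g1", "g2"]) for j in range(3)]
```

In the toy data, g1 and g2 are at their highest in the first sample and at their lowest in the second, so the gene set scores positive in the first and negative in the second.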

Installation

# Install from PyPI
pip install pygsva

# Or install from source
git clone https://github.com/guokai8/pygsva
cd pygsva
pip install .

Usage

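from pygsva import (
    gsva, gsvaParam, plageParam, zscoreParam, ssgsea,
    load_hsko_data, load_pbmc_data,
)

# Load example data
hsko = load_hsko_data()
pbmc = load_pbmc_data()
# Build {gene set name: [genes]}: group rows by the third column, take genes from the first
gene_sets = {key: group.iloc[:, 0].tolist() for key, group in hsko.groupby(hsko.iloc[:, 2])}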

# GSVA (default method)
param = gsvaParam(pbmc, gene_sets=gene_sets, kcdf="Gaussian", n_jobs=4)
result = gsva(param)

# For faster computation on large datasets, use kcdf="none"
param_fast = gsvaParam(pbmc, gene_sets=gene_sets, kcdf="none", n_jobs=4)
result_fast = gsva(param_fast)

# ssGSEA
result_ssgsea = ssgsea(pbmc, gene_sets, n_jobs=4)

# PLAGE
param_plage = plageParam(pbmc, gene_sets=gene_sets, min_size=2)
result_plage = gsva(param_plage)

# Z-score
param_zscore = zscoreParam(pbmc, gene_sets)
result_zscore = gsva(param_zscore)
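
The `gene_sets` argument used above is a dictionary that maps each gene set name to a list of genes. As a rough illustration of that structure, the sketch below builds the same kind of dictionary from plain tuples instead of a DataFrame. The rows, the helper name `build_gene_sets`, and the column layout (gene in the first column, grouping label in the third) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical annotation rows: (gene, pathway ID, pathway name). All values are made up.
annotation_rows = [
    ("geneA", "path01", "Pathway one"),
    ("geneB", "path01", "Pathway one"),
    ("geneC", "path02", "Pathway two"),
    ("geneA", "path02", "Pathway two"),
]


def build_gene_sets(rows, gene_col=0, label_col=2):
    """Group genes by the label column, keeping the order in which they appear."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[label_col]].append(row[gene_col])
    return dict(grouped)


gene_sets = build_gene_sets(annotation_rows)
```

A gene may belong to several gene sets, as geneA does here.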

Key Parameters

kcdf: Kernel used to estimate each gene's cumulative distribution function
- "Gaussian": Standard GSVA with a Gaussian kernel (default)
- "Poisson": For integer count data (e.g. RNA-seq counts)
- "none": Empirical CDF only (fastest, recommended for large datasets)
n_jobs: Number of CPU cores for parallel processing (-1 for all cores)
min_size/max_size: Minimum and maximum gene set size; gene sets outside this range are filtered out
use_sparse: Use sparse matrix operations for memory efficiency
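
To make the difference between `kcdf="Gaussian"` and `kcdf="none"` concrete, the sketch below estimates one gene's cumulative distribution both ways. It assumes the Gaussian kernel estimator described in the GSVA paper, with a bandwidth of one quarter of the gene's sample standard deviation. The function names are hypothetical and pygsva's internals may differ.

```python
from statistics import NormalDist, stdev


def gaussian_kcdf(row):
    """Kernel CDF estimate of each value in row, using a Gaussian kernel."""
    n = len(row)
    h = stdev(row) / 4.0  # bandwidth from the GSVA paper; an assumption for pygsva
    std_normal = NormalDist()
    return [sum(std_normal.cdf((x - xk) / h) for xk in row) / n for x in row]


def empirical_cdf(row):
    """Plain empirical CDF: fraction of values less than or equal to each value."""
    n = len(row)
    return [sum(xk <= x for xk in row) / n for x in row]


# One gene across four samples (made-up values, sorted for readability).
gene_values = [1.0, 2.0, 3.0, 10.0]
smooth = gaussian_kcdf(gene_values)
steps = empirical_cdf(gene_values)
```

The kernel estimate is a smoothed version of the step function and costs one kernel evaluation per pair of samples for every gene, which is consistent with `kcdf="none"` being the faster option.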

References

If you use any of the methods in this package, please cite the corresponding articles:

Tomfohr, Lu, and Kepler (2005) - Pathway Level Analysis of Gene Expression (PLAGE)
Lee et al. (2008) - Z-Score Method
Barbie et al. (2009) - Single Sample Gene Set Enrichment Analysis (ssGSEA)
Hänzelmann, Castelo, and Guinney (2013) - Gene Set Variation Analysis (GSVA)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions, email guokai8@gmail.com or open an issue at https://github.com/guokai8/pygsva/issues

Project details

Release history

0.2.0 (this version): Feb 1, 2026
0.1.7: Dec 6, 2024
0.1.6: Dec 6, 2024

Download files

Source Distribution

pygsva-0.2.0.tar.gz (175.5 kB), uploaded Feb 1, 2026, Source

Built Distribution

pygsva-0.2.0-py3-none-any.whl (180.1 kB), uploaded Feb 1, 2026, Python 3

File details

Details for the file pygsva-0.2.0.tar.gz.

File metadata

Download URL: pygsva-0.2.0.tar.gz
Upload date: Feb 1, 2026
Size: 175.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for pygsva-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`307834ca6bcc3faddcb811d7cb1ee6b1aa7c66f3d4003764a33791d1ad42e887`
MD5	`c986fc692d85c4ea6b6e0024e4c42bf3`
BLAKE2b-256	`a2c03aa070c620df7481e268fe1c07f22c47920cde1e00b2ca4a183f034296a6`
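
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper name `sha256_of` is hypothetical, and the file path in the commented usage line assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for pygsva-0.2.0.tar.gz
EXPECTED_SHA256 = "307834ca6bcc3faddcb811d7cb1ee6b1aa7c66f3d4003764a33791d1ad42e887"

# Usage (assumes the archive was downloaded to the current directory):
# assert sha256_of("pygsva-0.2.0.tar.gz") == EXPECTED_SHA256
```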

File details

Details for the file pygsva-0.2.0-py3-none-any.whl.

File metadata

Download URL: pygsva-0.2.0-py3-none-any.whl
Upload date: Feb 1, 2026
Size: 180.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for pygsva-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5d82ca53275d0288bd2fe8d0a7be59dbd75a83e3a4ee9e56d7240d50512a049`
MD5	`4978e40b417ea9687f0ce3642222b05f`
BLAKE2b-256	`30f442682c5affd7264aaa4831353c69435d7f762fbb53e36099d0beb216baea`
