ROTS gene ranking implementation in Python
rots-py
Python implementation of the Reproducibility-Optimized Test Statistic (ROTS) for gene ranking from the Bioconductor ROTS package.
ROTS belongs to a family of gene ranking statistics that rank genes by the evidence for differential expression in two-group comparisons. ROTS is a non-parametric method that uses a permutation test to assess the significance of the observed differential expression. ROTS is designed to be robust to outliers and to be reproducible across different studies.
NOTE: Data should have at least two non-null values per row for both groups.
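One way to meet this requirement is to drop the rows that fail it before running the analysis. The following is a minimal sketch, assuming `data` is a pandas dataframe with samples as columns and `group` holds one label per column in the same order. `filter_rows` is a hypothetical helper for illustration and is not part of rots-py.

```python
import numpy as np
import pandas as pd


def filter_rows(data: pd.DataFrame, group: pd.Series, min_values: int = 2) -> pd.DataFrame:
    """Keep rows with at least `min_values` non-null values in every group.

    Hypothetical helper (not part of rots-py). Assumes `group` lists one
    label per column of `data`, in column order.
    """
    labels = np.asarray(group)
    keep = pd.Series(True, index=data.index)
    for label in pd.unique(labels):
        # Count non-null values per row among the columns of this group.
        counts = data.loc[:, labels == label].notna().sum(axis=1)
        keep &= counts >= min_values
    return data.loc[keep]


# Toy data for illustration: 3 genes, 2 groups of 3 samples each.
data = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],  # only one value in group 0
        [1.0, 2.0, np.nan, np.nan, 5.0, 6.0],  # two values in each group
    ],
    index=["gene_a", "gene_b", "gene_c"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6"],
)
group = pd.Series([0, 0, 0, 1, 1, 1], index=data.columns)

filtered = filter_rows(data, group)
print(list(filtered.index))
```

Filtering up front keeps the row order of the remaining genes intact, so results can still be matched back to the original identifiers.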
Installation
pip install rots-py
Usage
import rots
# Load data
data = ...
group = ...
# Run ROTS
result = rots.rots(data, group, B=500, log=True, verbose=True, progress=True)
# Get the ranking
ranking_statistic = result["d"]
fdr = result["fdr"]
logFC = result["logfc"]
pvalue = result["p"]
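The values read from the result can be collected into one table for sorting and filtering. This is a minimal sketch that assumes the entries under "d", "fdr", "logfc" and "p" are array-like with one value per row of the input data, in row order. `results_table` is a hypothetical helper, and the dictionary below contains hand-made stand-in values rather than real output from rots.rots.

```python
import pandas as pd


def results_table(result, row_names, fdr_cutoff=0.05):
    """Build a per-gene table from a ROTS-style result dict (hypothetical helper)."""
    table = pd.DataFrame(
        {
            "d": result["d"],
            "logfc": result["logfc"],
            "p": result["p"],
            "fdr": result["fdr"],
        },
        index=row_names,
    )
    table["significant"] = table["fdr"] < fdr_cutoff
    # Smallest FDR first; break ties by the larger absolute statistic.
    table["abs_d"] = table["d"].abs()
    table = table.sort_values(["fdr", "abs_d"], ascending=[True, False])
    return table.drop(columns="abs_d")


# Stand-in values for illustration only (not real ROTS output).
mock_result = {
    "d": [0.4, -3.1, 2.2],
    "logfc": [0.1, -1.8, 1.2],
    "p": [0.60, 0.001, 0.01],
    "fdr": [0.60, 0.003, 0.015],
}
table = results_table(mock_result, ["gene_a", "gene_b", "gene_c"])
print(table)
```

The 0.05 cutoff is only a common default; pick the threshold that suits the study.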
Methods
rots
Runs the ROTS analysis on the given data. Returns a Python dictionary.
Parameters
data
: A pandas dataframe with genes/proteins as rows and samples as columns. (required)
group
: A pandas series with the group labels for each sample. (required)
B
: Number of permutations to perform. Default is 500. (optional)
K
: Top-list size. (optional)
paired
: Whether the samples are paired. Default is False. (optional)
seed
: Seed for the random number generator. Default is None. (optional)
a1
: Parameter for the ROTS statistic. If both a1 and a2 are specified, the optimization step is skipped. (optional)
a2
: Parameter for the ROTS statistic. If both a1 and a2 are specified, the optimization step is skipped. (optional)
log
: Whether the data is log-transformed. Default is False. (optional)
progress
: Whether to show a progress bar. Default is False. (optional)
verbose
: Whether to print the progress of the analysis. Default is False. (optional)
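To make the role of a1 and a2 concrete: in the published ROTS method, the statistic for each row has the form d = |mean1 - mean2| / (a1 + a2 * s), where s is the pooled standard error of the difference in group means. The sketch below computes that form with NumPy for unpaired data. It illustrates the formula only and is not taken from this package's source; `rots_statistic` is a hypothetical name. With a1 = 0 and a2 = 1 it reduces to the absolute value of the ordinary two-sample t-statistic, and with a1 = 1 and a2 = 0 to the absolute difference of the group means.

```python
import numpy as np


def rots_statistic(x1, x2, a1, a2):
    """d = |mean(x1) - mean(x2)| / (a1 + a2 * s), computed row by row.

    Illustrative sketch only (hypothetical function, unpaired case).
    x1, x2: 2-D arrays with genes as rows and the samples of one group
    as columns. Missing values (NaN) are ignored.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Number of observed values per row in each group.
    n1 = np.sum(~np.isnan(x1), axis=1)
    n2 = np.sum(~np.isnan(x2), axis=1)
    m1 = np.nanmean(x1, axis=1)
    m2 = np.nanmean(x2, axis=1)
    # Within-group sums of squared deviations.
    ss1 = np.nansum((x1 - m1[:, None]) ** 2, axis=1)
    ss2 = np.nansum((x2 - m2[:, None]) ** 2, axis=1)
    # Pooled standard error of the difference in means.
    s = np.sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2))
    return np.abs(m1 - m2) / (a1 + a2 * s)


# Toy data: one gene, three samples per group.
x1 = [[1.0, 2.0, 3.0]]
x2 = [[4.0, 5.0, 6.0]]
d_t = rots_statistic(x1, x2, a1=0.0, a2=1.0)   # absolute t-statistic
d_fc = rots_statistic(x1, x2, a1=1.0, a2=0.0)  # absolute mean difference
print(d_t, d_fc)
```

When a1 and a2 are not supplied, the method chooses them by optimizing the reproducibility of the top-ranked lists, which is the optimization step referred to in the parameter descriptions.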
Returns
A Python dict object with the following keys:
data
: The original dataframe used as input
B
: Number of permutations
d
: ROTS test statistic for each gene/protein
logfc
: Log2 fold change
p
: P-value
FDR
: False discovery rate
a1
: Optimized parameter a1
a2
: Optimized parameter a2
k
: Top-list size (None if optimization skipped)
R
: Reproducibility score (None if optimization skipped)
Z
: Z-score (None if optimization skipped)
ztable
: Z-score table
cl
: Group labels for each sample
Acknowledgements
This package was developed as part of the EDISS program with the support of the Coffey Lab at the Turku Bioscience Centre.
Changelog
1.1.0
- Ported parts of code to Cython for better performance
- Fixed bugs
1.0.3
- Bug fixes
1.0.2
- Bug fixes
- Added numba for better performance
1.0.0
- Initial release