epcy · PyPI

Evaluation of Predictive CapabilitY for ranking biomarker candidates.


https://img.shields.io/badge/python-3.6-blue.svg

https://travis-ci.org/iric-soft/epcy.svg?branch=master

https://codecov.io/gh/iric-soft/epcy/branch/master/graph/badge.svg

Citing:

EPCY: Evaluation of Predictive CapabilitY for ranking biomarker gene candidates. Poster at ISMB ECCB 2019: https://f1000research.com/posters/8-1349

Introduction:

This tool was developed to Evaluate the Predictive CapabilitY of each gene (feature) to become a predictive (bio)marker candidate. Documentation is available via Read the Docs.

Requirements:

python3
(Optional) virtualenv

Install:

Using pypi:

pip install epcy

From source:

python3 -m venv $HOME/.virtualenvs/epcy
source $HOME/.virtualenvs/epcy/bin/activate
pip install pip setuptools --upgrade
pip install wheel
cd [your_epcy_folder]
# If needed:
# CFLAGS=-std=c99 pip3 install numpy==1.17.0
python3 setup.py install
epcy -h

Usage:

General:

After install:

epcy -h

From source:

cd [your_epcy_folder]
python3 -m epcy -h

Generic case:

EPCY is designed to work on any quantitative data, provided that the values of each feature are comparable between samples (i.e. normalized).
To run a comparative analysis, epcy pred needs two tab-delimited files:
- A matrix of normalized quantitative data, with one column per sample and an “ID” column identifying each feature.
- A design table that describes the comparison.

# Run epcy on any normalized quantification data
epcy pred -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_condition
# If your data require a log2 transformation, add --log
epcy pred --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_condition
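
As a rough illustration of the two inputs described above, the sketch below builds a tiny matrix and design table with Python's standard library. The “ID” column and the Query/Ref labels come from this README; the design column names `sample` and `condition`, the feature names, and the values are assumptions made for the example, so check them against the design.tsv shipped in ./data/small_for_test before relying on them.

```python
import csv
import io

def write_tsv(rows):
    """Serialize a list of rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Matrix: one row per feature, one column per sample, plus the "ID" column.
matrix_rows = [
    ["ID", "s1", "s2", "s3", "s4"],
    ["geneA", 5.1, 4.8, 1.2, 0.9],
    ["geneB", 2.0, 2.1, 2.2, 1.9],
]

# Design: one row per sample (these column names are assumed, not verified).
design_rows = [
    ["sample", "condition"],
    ["s1", "Query"],
    ["s2", "Query"],
    ["s3", "Ref"],
    ["s4", "Ref"],
]

matrix_tsv = write_tsv(matrix_rows)
design_tsv = write_tsv(design_rows)

# Every sample column of the matrix should be described in the design table.
matrix_samples = matrix_rows[0][1:]
design_samples = [row[0] for row in design_rows[1:]]
assert matrix_samples == design_samples
```

Writing `matrix_tsv` and `design_tsv` to disk would give files that could be passed to -m and -d, assuming the column names match what EPCY expects.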

Results are saved in the predictive_capability.xls file, which is detailed below.
You can choose which design-file column and which query value to compare using --condition and --query:

epcy pred_rna -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/condition2 --condition condition2 --query A

Working on RNA sequencing readcounts:

To run EPCY on non-normalized read counts, use the pred_rna tool as follows:

# To run on non-normalized read counts, add --cpm --log
epcy pred_rna --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_condition
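
For readers unfamiliar with these two transformations, here is a minimal sketch of counts per million (CPM) followed by a log2 transform. It shows the general technique only: the pseudo-count of 1 and the function names are assumptions for this example and are not taken from EPCY's source.

```python
import math

def cpm(counts):
    """Scale the raw read counts of one sample to counts per million."""
    library_size = sum(counts)
    return [c * 1_000_000 / library_size for c in counts]

def log2_transform(values, pseudo_count=1.0):
    """Return log2(x + pseudo_count); the pseudo-count value is an assumption."""
    return [math.log2(v + pseudo_count) for v in values]

raw_counts = [10, 30, 60]            # read counts of three genes in one sample
normalized = cpm(raw_counts)         # values now sum to one million
logged = log2_transform(normalized)  # compresses the dynamic range
```

After CPM scaling, samples sequenced at different depths become comparable, which is the condition the generic case above asks for.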

Working on kallisto quantification:

EPCY can work directly on kallisto quantification using the h5 files, which gives access to the bootstrapped samples. To do so, add a kallisto column to the design file (specifying the directory that contains the abundance.h5 file for each sample) and run epcy pred_rna as follows:

# To run on kallisto quantification, add --kal (+ --cpm --log)
epcy pred_rna --kal --cpm --log -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
# Note: kallisto quantification is at the transcript level, not the gene level

To run at the gene level, a gff3 file of the genome annotation is needed to map transcripts to genes. This file can be downloaded from Ensembl.

# To run on kallisto quantification at the gene level, add --gene --anno [file.gff] (+ --kal --cpm --log)
epcy pred_rna --kal --cpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
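
To make the transcript-to-gene correspondence concrete, the sketch below extracts a mapping from GFF3 lines and sums transcript values per gene. It assumes Ensembl-style attributes of the form `ID=transcript:<id>;Parent=gene:<id>` and assumes that gene-level values are obtained by summing; neither is confirmed by this README, and the identifiers are made up.

```python
from collections import defaultdict

def transcript_to_gene(gff3_lines):
    """Map transcript IDs to gene IDs from GFF3 lines.

    Assumes Ensembl-style attributes: ID=transcript:<id>;Parent=gene:<id>.
    """
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # a GFF3 feature line has exactly nine columns
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        feature_id = attrs.get("ID", "")
        parent = attrs.get("Parent", "")
        if feature_id.startswith("transcript:") and parent.startswith("gene:"):
            mapping[feature_id.split(":", 1)[1]] = parent.split(":", 1)[1]
    return mapping

def sum_by_gene(transcript_values, mapping):
    """Sum transcript-level values into gene-level values."""
    gene_values = defaultdict(float)
    for transcript_id, value in transcript_values.items():
        gene_id = mapping.get(transcript_id)
        if gene_id is not None:  # transcripts without a known gene are dropped
            gene_values[gene_id] += value
    return dict(gene_values)

# Made-up identifiers, for illustration only.
gff3 = [
    "##gff-version 3",
    "1\tensembl\tgene\t100\t900\t.\t+\t.\tID=gene:G1;Name=demo",
    "1\tensembl\tmRNA\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
    "1\tensembl\tmRNA\t200\t900\t.\t+\t.\tID=transcript:T2;Parent=gene:G1",
]
mapping = transcript_to_gene(gff3)
gene_level = sum_by_gene({"T1": 3.0, "T2": 4.5, "T9": 1.0}, mapping)
```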

kallisto quantification also makes it possible to work on TPM:

# To work on TPM, replace --cpm with --tpm
epcy pred_rna --kal --tpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/

Output:

predictive_capability.xls

This file is the main output; it contains the evaluation of each feature (genes, proteins, …). It is a tab-delimited file with 9 columns by default:

Default columns:
- id: the id of each feature.
- l2fc: log2 fold change.
- kernel_mcc: Matthews Correlation Coefficient (MCC) computed by a predictor using kernel density estimation (KDE).
- kernel_mcc_low, kernel_mcc_high: bounds of the 90% confidence interval.
- mean_query: mean of the values of samples specified as Query in design.tsv.
- mean_ref: mean of the values of samples specified as Ref in design.tsv.
- bw_query: estimated bandwidth used by the KDE to calculate the density of query samples.
- bw_ref: estimated bandwidth used by the KDE to calculate the density of ref samples.
Using --normal:
- normal_mcc: MCC computed by a predictor using normal distributions.
Using --auc --utest:
- auc: area under the curve.
- u_pv: p-value computed by a Mann-Whitney rank test.
Using --ttest:
- t_pv: p-value computed by ttest_ind.
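
For reference, the MCC reported in the kernel_mcc and normal_mcc columns is a standard score computed from a confusion matrix. The sketch below gives the textbook definition only; how EPCY derives the predictions themselves (KDE or normal distributions) is not shown, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator

perfect = mcc(tp=5, tn=5, fp=0, fn=0)    # every sample correctly predicted
inverted = mcc(tp=0, tn=0, fp=5, fn=5)   # every sample wrongly predicted
partial = mcc(tp=4, tn=4, fp=1, fn=1)    # one error in each group
```

The score ranges from -1 to 1, so features with kernel_mcc close to 1 separate Query from Ref samples best.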

condition_predicted.xls

Using --full, a secondary output file (condition_predicted.xls) specifies, for each feature, whether each sample has been correctly predicted. Building a heatmap from this output can help you explore your data. More details coming soon.

Bagging:

To improve the stability and accuracy of the computed MCC, you can add n rounds of bagging (using -b n):

# Note: this takes about n times longer; using multiple processes (-t) is a good idea.
epcy pred_rna -b 4 -t 4 --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_condition
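
The general idea of bagging (bootstrap aggregating) can be sketched as follows: resample each group with replacement, compute a score on every resample, and average the scores. This is a generic illustration, not EPCY's implementation; `bagged_score` and the toy `mean_difference` score are hypothetical names introduced for this example.

```python
import random
import statistics

def bagged_score(query, ref, score, n_bags, seed=0):
    """Average `score` over n_bags bootstrap resamples of both groups."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bags):
        query_bag = rng.choices(query, k=len(query))  # sample with replacement
        ref_bag = rng.choices(ref, k=len(ref))
        scores.append(score(query_bag, ref_bag))
    return statistics.mean(scores)

def mean_difference(query, ref):
    """Toy score used only for this example."""
    return statistics.mean(query) - statistics.mean(ref)

query_values = [5.1, 4.8, 5.5, 4.9]
ref_values = [1.2, 0.9, 1.4, 1.1]
result = bagged_score(query_values, ref_values, mean_difference, n_bags=4)
```

The score is evaluated once per bag, so the cost grows linearly with n, which matches the note above that -b n takes about n times longer.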

Release history:

0.2.6.4 (Jun 13, 2024)
0.2.6.3 (Jun 13, 2024)
0.2.5 (Dec 19, 2023)
0.2.4 (Mar 8, 2023)
0.2.3 (Dec 7, 2021)
0.2.2 (Jun 18, 2021)
0.2.1 (Jun 7, 2021)
0.2.0 (May 28, 2021)
0.1.3 (May 15, 2021)
0.1.2 (May 15, 2021), this version
0.0.2 (May 5, 2021)
0.0.1 (Apr 1, 2020)
0.0.0 (Mar 31, 2020)

Download files:

Source distribution: epcy-0.1.2.tar.gz (32.9 kB), uploaded May 15, 2021
Built distribution: epcy-0.1.2-py3-none-any.whl (40.5 kB), uploaded May 15, 2021, Python 3

Hashes for epcy-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`01a0612e3dc0f2afa739f6e7a21653c8dee49f3265aab2b5f57f81b14276da2c`
MD5	`b274521375b59ea0635a1b38ade58d77`
BLAKE2b-256	`6b571c7a67d3910ffc2dba9897884cb6cfc6259a94278cb90a07feb072cb7390`

Hashes for epcy-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`549c51ef7d2ea0a093091ac6ae17e4069cd1cfed215c4659215a99cedfa9da91`
MD5	`beb53724104b0db8fc3d909c5236da3a`
BLAKE2b-256	`ddf1c801cbfd5560c9e0389e598b73d004ec4349fd20cd4f299faa5575c0cb13`