Evaluation of Predictive CapabilitY for ranking biomarker candidates.
Citing:
EPCY: Evaluation of Predictive CapabilitY for ranking biomarker gene candidates. Poster at ISMB ECCB 2019: https://f1000research.com/posters/8-1349
Introduction:
This tool was developed to Evaluate the Predictive CapabilitY of each feature as a biomarker candidate.
Requirements:
python3
(Optional) virtualenv
Install:
python3 -m venv $HOME/.virtualenvs/epcy
source $HOME/.virtualenvs/epcy/bin/activate
cd [your_epcy_folder]
CFLAGS=-std=c99 pip3 install numpy==1.17.0
python3 setup.py install
epcy -h
Usage:
General:
From source:
cd [your_epcy_folder]
python3 -m epcy -h
After setup install:
epcy -h
Generic case:
EPCY is designed to work on any quantitative data, provided that the values of each feature are comparable between samples (normalized).
To run a comparative analysis, epcy pred needs two tab-delimited files, a design file (-d) and a quantification matrix (-m):
# Run epcy on any normalized quantification data
epcy pred -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
# If your data require a log2 transformation, add --log
epcy pred --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
Results will be saved in the predictive_capability.xls file, which is detailed below.
You can select a specific comparison from the design file using --subgroup and --query:
epcy pred_rna -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/subgroup2 --subgroup subgroup2 --query A
Working on RNA sequencing readcounts:
To run EPCY on read counts that are not normalized, use the pred_rna tool as follows:
# To run on non-normalized read counts, add --cpm --log
epcy pred_rna --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
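As a rough illustration of what a CPM normalization followed by a log2 transformation does to one sample, here is a minimal Python sketch. It is not EPCY's code: the function names are hypothetical, and the pseudocount of 1 added before the logarithm is an assumption of this sketch.

```python
import math


def cpm(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {feature: c * 1e6 / library_size for feature, c in counts.items()}


def log2_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount); the pseudocount is an assumption here."""
    return {feature: math.log2(v + pseudocount) for feature, v in values.items()}


# Illustrative counts for a single sample (made-up values).
sample = {"geneA": 500, "geneB": 1500, "geneC": 0}
sample_cpm = cpm(sample)
sample_log = log2_transform(sample_cpm)
```

After CPM scaling, the values of a sample sum to one million, which is what makes features comparable between samples with different sequencing depths.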
Working on kallisto quantification:
EPCY can work directly on kallisto quantification using the h5 files, which gives access to the bootstrapped samples. To do so, add a kallisto column to the design file (specifying the directory that contains the abundance.h5 file of each sample) and run epcy pred_rna as follows:
# To run on kallisto quantification, add --kal (+ --cpm --log)
epcy pred_rna --kal --cpm --log -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
# !!! Take care: kallisto quantification is at the transcript level, not the gene level
To run at the gene level, a gff3 file of the genome annotation is needed to map transcripts to genes. This file can be downloaded from Ensembl.
# To run on kallisto quantification at the gene level, add --gene --anno [file.gff] (+ --kal --cpm --log)
epcy pred_rna --kal --cpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
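To make the transcript-to-gene correspondence concrete, the following Python sketch builds a mapping from GFF3 lines and sums transcript values per gene. It assumes Ensembl-style attributes (ID=transcript:... with Parent=gene:...) and assumes that gene values are obtained by summing their transcripts; both are assumptions of this sketch, not a description of how EPCY reads the annotation. The function names and the tiny annotation are hypothetical.

```python
from collections import defaultdict


def transcript_to_gene(gff3_lines):
    """Map transcript ids to gene ids from Ensembl-style GFF3 lines."""
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        tid, parent = attrs.get("ID", ""), attrs.get("Parent", "")
        if tid.startswith("transcript:") and parent.startswith("gene:"):
            mapping[tid.split(":", 1)[1]] = parent.split(":", 1)[1]
    return mapping


def sum_by_gene(transcript_values, mapping):
    """Sum transcript-level values per gene; unmapped transcripts are dropped."""
    totals = defaultdict(float)
    for tid, value in transcript_values.items():
        gene = mapping.get(tid)
        if gene is not None:
            totals[gene] += value
    return dict(totals)


# Tiny made-up annotation: one gene (G1) with two transcripts (T1, T2).
gff3 = [
    "##gff-version 3",
    "1\tensembl\tgene\t100\t900\t.\t+\t.\tID=gene:G1;Name=demo",
    "1\tensembl\tmRNA\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
    "1\tensembl\tmRNA\t200\t900\t.\t+\t.\tID=transcript:T2;Parent=gene:G1",
]
mapping = transcript_to_gene(gff3)
gene_values = sum_by_gene({"T1": 3.0, "T2": 4.5, "T9": 1.0}, mapping)
```

Here T9 has no entry in the annotation, so it does not contribute to any gene total.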
kallisto quantification also allows working on TPM:
# To work on TPM, replace --cpm with --tpm
epcy pred_rna --kal --tpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
Output:
predictive_capability.xls
This file is the main output, which contains the evaluation of each feature (genes, proteins, …). It’s a tab-delimited file with 9 columns:
Default columns:
id: the id of each feature.
l2fc: log2 fold change.
kernel_mcc: Matthews Correlation Coefficient (MCC) computed by a predictor using kernel density estimation (KDE).
kernel_mcc_low, kernel_mcc_high: bounds of the 90% confidence interval.
mean_query: mean(values) of the samples specified as Query in design.tsv.
mean_ref: mean(values) of the samples specified as Ref in design.tsv.
bw_query: estimated bandwidth used by KDE to calculate the density of the query samples.
bw_ref: estimated bandwidth used by KDE to calculate the density of the ref samples.
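To give an intuition for how a KDE-based predictor can lead to an MCC, here is a simplified Python sketch. It builds one Gaussian KDE per group with a fixed bandwidth, assigns each sample to the group with the higher density, and computes the MCC from the resulting confusion counts. This is an assumption-laden illustration, not EPCY's implementation: the function names are hypothetical, the bandwidths are passed in rather than estimated, and each sample is scored against a density that includes itself, which is optimistic.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a density function built from Gaussian kernels on `points`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def kde_mcc(query, ref, bw_query, bw_ref):
    """Classify each sample by the larger group density, then compute the MCC."""
    dens_q = gaussian_kde(query, bw_query)
    dens_r = gaussian_kde(ref, bw_ref)
    tp = sum(dens_q(x) > dens_r(x) for x in query)
    fn = len(query) - tp
    tn = sum(dens_r(x) >= dens_q(x) for x in ref)
    fp = len(ref) - tn
    return mcc(tp, tn, fp, fn)


# Made-up, well-separated expression values for one feature.
query = [8.1, 8.4, 7.9, 8.6]
ref = [2.0, 2.3, 1.8, 2.5]
score = kde_mcc(query, ref, bw_query=0.5, bw_ref=0.5)
```

An MCC of 1 means every sample is assigned to its own group, 0 means the prediction is no better than chance, and -1 means every sample is assigned to the wrong group.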
Using --normal:
Using --auc --utest:
auc: Area Under the Curve.
u_pv: p-value computed by a Mann-Whitney rank test.
Using --ttest:
t_pv: p-value computed by ttest_ind.
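Since the output is a tab-delimited file, it can be loaded with standard tools to rank features. The Python sketch below sorts rows by kernel_mcc in decreasing order. It assumes the file has a header row using the column names listed above; the helper name and the three demo rows are made up for illustration and are not real results.

```python
import csv
import io


def rank_features(handle, key="kernel_mcc"):
    """Read a tab-delimited table and sort its rows by `key`, highest first."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return sorted(rows, key=lambda row: float(row[key]), reverse=True)


# In-memory stand-in for the output file (illustrative values only).
demo = io.StringIO(
    "id\tl2fc\tkernel_mcc\n"
    "feat1\t0.2\t0.10\n"
    "feat2\t3.1\t0.95\n"
    "feat3\t-1.4\t0.60\n"
)
ranked = rank_features(demo)
top_ids = [row["id"] for row in ranked]
```

With a real run, the same helper could be called on a file handle opened with open(path, newline="") on the predictive_capability.xls file written to the output directory.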
subgroup_predicted.xls
Using --full, a secondary output file (subgroup_predicted.xls) specifies, for each feature, whether each sample has been correctly predicted. Building a heatmap from this output can help you explore your data. More details coming soon.
Bagging:
To improve the stability and accuracy of the computed MCC, you can add n bagging iterations (using -b n).
# Take care: this takes n times longer! Using multiple processes (-t) is a good idea.
epcy pred_rna -b 4 -t 4 --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
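The general idea behind bagging (bootstrap aggregating) can be sketched in a few lines of Python: resample each group with replacement n times, compute the score on each resample, and average the results. This is a generic illustration and not necessarily how EPCY implements -b; the function names, the seed parameter, and the toy score (a difference of means) are all hypothetical.

```python
import random
from statistics import mean


def bagged_score(query, ref, score, n_bags, seed=0):
    """Average `score` over n_bags bootstrap resamples of both groups."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bags):
        # Draw each group with replacement, keeping the original group sizes.
        q = rng.choices(query, k=len(query))
        r = rng.choices(ref, k=len(ref))
        scores.append(score(q, r))
    return mean(scores)


def mean_difference(q, r):
    """Toy score used only for this illustration."""
    return mean(q) - mean(r)


# With constant groups, every resample gives the same score.
result = bagged_score([5.0] * 4, [1.0] * 4, mean_difference, n_bags=4)
```

Each bag repeats the whole scoring step, which is why the running time grows linearly with n and why spreading the work across processes helps.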