
Evaluation of Predictive CapabilitY for ranking biomarker candidates.

Project description


Citing:

Introduction:

This tool was developed to Evaluate the Predictive CapabilitY of each feature to become a biomarker candidate.

Requirements:

  • python3

  • (Optional) virtualenv

Install:

python3 -m venv $HOME/.virtualenvs/epcy
source $HOME/.virtualenvs/epcy/bin/activate
cd [your_epcy_folder]
CFLAGS=-std=c99 pip3 install numpy==1.17.0
python3 setup.py install
epcy -h

Usage:

General:

From source:

cd [your_epcy_folder]
python3 -m epcy -h

After setup install:

epcy -h

Generic case:

  • EPCY is designed to work on any quantitative data, provided that the values of each feature are comparable between samples (normalized).

  • To run a comparative analysis, epcy pred needs two tab-separated files:

    • A matrix of normalized quantitative data with one column per sample and an “ID” column to identify each feature.

    • A design table that describes the comparison.

# Run epcy on any normalized quantification data
epcy pred -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
# If your data require a log2 transformation, add --log
epcy pred --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
  • Results are saved in the predictive_capability.xls file, which is detailed below.

  • You can customize how the design file is used with --subgroup and --query:

epcy pred_rna -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/subgroup2 --subgroup subgroup2 --query A
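As a rough illustration of the two input files described above, the sketch below builds a toy matrix and a toy design table with the Python standard library. The “ID” column comes from the description above. The design column names ("sample", "subgroup"), the Query/Ref labels and all sample and feature names are assumptions made for this example, so compare them with ./data/small_for_test/design.tsv before relying on them.

```python
import csv
import io


def write_tsv(handle, header, rows):
    """Write a header line and data rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical samples: three query samples followed by three reference samples.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]

# Matrix: one row per feature, an "ID" column, then one column per sample.
matrix_rows = [
    ["gene_a", 8.1, 7.9, 8.4, 2.0, 2.3, 1.8],
    ["gene_b", 5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
]

# Design: one row per sample (column names are assumed, not confirmed).
design_rows = [[s, "Query" if i < 3 else "Ref"] for i, s in enumerate(samples)]

matrix_buf, design_buf = io.StringIO(), io.StringIO()
write_tsv(matrix_buf, ["ID"] + samples, matrix_rows)
write_tsv(design_buf, ["sample", "subgroup"], design_rows)

matrix_text = matrix_buf.getvalue()
design_text = design_buf.getvalue()
print(matrix_text)
print(design_text)
```

Writing the two strings to exp_matrix.tsv and design.tsv would give files with the general shape that the -m and -d options expect, under the assumptions stated above.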

Working on RNA sequencing readcounts:

  • To run EPCY on read counts that are not normalized, use the pred_rna tool as follows:

# To run on read counts that are not normalized, add --cpm --log
epcy pred_rna --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
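To make the effect of --cpm and --log more concrete, here is a minimal sketch of a counts-per-million scaling followed by a log2 transform. It shows the general technique only: the pseudocount of 1 and the function names are assumptions for this example, and EPCY's own implementation may differ in its details.

```python
import math


def cpm(counts):
    """Scale one sample's raw read counts to counts per million."""
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / library_size for c in counts]


def log2_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount); the pseudocount value is an assumption."""
    return [math.log2(v + pseudocount) for v in values]


# Raw read counts of four features in one hypothetical sample.
raw = [10, 0, 30, 60]
normalized = cpm(raw)
logged = log2_transform(normalized)
print(normalized)
print(logged)
```

After the scaling, every sample sums to one million, which is what makes values comparable between samples with different sequencing depths.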

Working on kallisto quantification:

  • EPCY can work directly on kallisto quantification using h5 files, which gives access to the bootstrapped samples. To do so, add a kallisto column to the design file (giving the directory that contains the abundance.h5 file of each sample) and run epcy pred_rna as follows:

# To run on kallisto quantification, add --kal (+ --cpm --log)
epcy pred_rna --kal --cpm --log -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
# !!! Take care: kallisto quantification is at the transcript level, not the gene level
  • To run at the gene level, a gff3 file of the genome annotation is needed to map each transcript to its gene. This file can be downloaded from Ensembl.

# To run on kallisto quantification at the gene level, add --gene --anno [file.gff] (+ --kal --cpm --log)
epcy pred_rna --kal --cpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
  • kallisto quantification also makes it possible to work on TPM:

# work on TPM, replace --cpm by --tpm
epcy pred_rna --kal --tpm --log --gene --anno ./data/small_genome/Homo_sapiens.GRCh38.84.reduce.gff3 -d ./data/small_leucegene/5_inv16_vs_5/design.tsv -o ./data/small_leucegene/5_inv16_vs_5/
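The transcript-to-gene correspondence mentioned above can be sketched as follows. This is an illustration under assumptions, not EPCY's code: it assumes Ensembl-style gff3 attributes (ID=transcript:... with Parent=gene:...) and assumes that gene-level values are obtained by summing transcript-level values. The annotation lines, identifiers and abundances are made up for the example.

```python
from collections import defaultdict


def transcript_to_gene(gff3_lines):
    """Map transcript IDs to gene IDs from Ensembl-style gff3 attributes."""
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        ident, parent = attrs.get("ID", ""), attrs.get("Parent", "")
        if ident.startswith("transcript:") and parent.startswith("gene:"):
            mapping[ident.split(":", 1)[1]] = parent.split(":", 1)[1]
    return mapping


def sum_by_gene(transcript_values, mapping):
    """Sum transcript-level values into gene-level totals (assumed rule)."""
    totals = defaultdict(float)
    for transcript, value in transcript_values.items():
        gene = mapping.get(transcript)
        if gene is not None:  # transcripts missing from the annotation are skipped
            totals[gene] += value
    return dict(totals)


# Hypothetical annotation lines and abundances, for illustration only.
gff3 = [
    "##gff-version 3",
    "1\tensembl\tgene\t1\t900\t.\t+\t.\tID=gene:G1;Name=toy",
    "1\tensembl\tmRNA\t1\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
    "1\tensembl\tmRNA\t50\t900\t.\t+\t.\tID=transcript:T2;Parent=gene:G1",
]
tx2gene = transcript_to_gene(gff3)
gene_values = sum_by_gene({"T1": 2.5, "T2": 1.5, "T9": 4.0}, tx2gene)
print(gene_values)
```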

Output:

predictive_capability.xls

This file is the main output, which contains the evaluation of each feature (genes, proteins, …). It is a tab-separated file with 9 columns:

  • Default columns:

    • id: the id of each feature.

    • l2fc: log2 Fold change.

    • kernel_mcc: Matthews Correlation Coefficient (MCC) computed by a predictor using kernel density estimation (KDE).

    • kernel_mcc_low, kernel_mcc_high: bounds of the 90% confidence interval.

    • mean_query: mean of the values of the samples specified as Query in design.tsv.

    • mean_ref: mean of the values of the samples specified as Ref in design.tsv.

    • bw_query: estimated bandwidth used by the KDE to calculate the density of query samples.

    • bw_ref: estimated bandwidth used by the KDE to calculate the density of ref samples.

  • Using --normal:

    • normal_mcc: MCC computed by a predictor using normal distributions.

  • Using --auc --utest:

    • auc: Area Under the Curve.

    • u_pv: p-value computed by a Mann-Whitney rank test.

  • Using --ttest:

subgroup_predicted.xls

With --full, a secondary output file (subgroup_predicted.xls) specifies, for each feature, whether each sample has been correctly predicted. Building a heatmap from this output can help you explore your data. More details coming soon.
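To give an intuition for what a KDE-based predictor and its MCC represent for a single feature, here is a minimal sketch using only the standard library. It is not EPCY's implementation: the fixed bandwidth of 0.5, the leave-one-out scheme and all function names are assumptions for this example (EPCY estimates its bandwidths, as the bw_query and bw_ref columns indicate). Each sample is assigned to the group whose density, built without that sample, is higher at the sample's value.

```python
import math


def gaussian_kde(x, points, bandwidth):
    """Average of Gaussian kernels centred on `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def leave_one_out_predictions(query, ref, bandwidth=0.5):
    """Return (is_query, predicted_query) per sample; needs >= 2 samples per group."""
    labelled = [(v, True) for v in query] + [(v, False) for v in ref]
    predictions = []
    for i, (value, is_query) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        q = [v for v, lab in rest if lab]
        r = [v for v, lab in rest if not lab]
        predicted_query = gaussian_kde(value, q, bandwidth) > gaussian_kde(value, r, bandwidth)
        predictions.append((is_query, predicted_query))
    return predictions


def mcc(predictions):
    """Matthews Correlation Coefficient from (truth, prediction) pairs."""
    tp = sum(1 for t, p in predictions if t and p)
    tn = sum(1 for t, p in predictions if not t and not p)
    fp = sum(1 for t, p in predictions if not t and p)
    fn = sum(1 for t, p in predictions if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical values of one well-separated feature.
preds = leave_one_out_predictions([8.1, 7.9, 8.4], [2.0, 2.3, 1.8])
correct = [truth == predicted for truth, predicted in preds]
score = mcc(preds)
print(correct, score)
```

The `correct` list plays the role of one row of a per-sample table such as subgroup_predicted.xls, and `score` plays the role of one MCC value, under the assumptions stated above.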

Bagging:

To improve the stability and accuracy of the computed MCC, you can add n rounds of bagging (using -b n).

# Take care: it takes n times longer! Using multiple processes (-t) is a good idea.
epcy pred_rna -b 4 -t 4 --cpm --log -d ./data/small_for_test/design.tsv -m ./data/small_for_test/exp_matrix.tsv -o ./data/small_for_test/default_subgroup
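The general idea behind bagging, and why the cost grows with n, can be sketched as follows. This is a generic illustration of bootstrap resampling with out-of-bag evaluation, not EPCY's procedure: the nearest-mean predictor is a deliberately simple stand-in, and the function names, the seed and the data are assumptions for this example.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bagged_mcc(query, ref, n_bags, seed=0):
    """Pool out-of-bag predictions over n_bags bootstrap draws, then score them."""
    rng = random.Random(seed)
    tp = tn = fp = fn = 0
    for _ in range(n_bags):  # one full train/predict pass per bag: cost grows with n_bags
        # Draw sample indices with replacement inside each group.
        q_idx = [rng.randrange(len(query)) for _ in query]
        r_idx = [rng.randrange(len(ref)) for _ in ref]
        q_mean = sum(query[i] for i in q_idx) / len(q_idx)
        r_mean = sum(ref[i] for i in r_idx) / len(r_idx)
        # Stand-in predictor: assign each out-of-bag sample to the nearest group mean.
        for i, v in enumerate(query):
            if i not in q_idx:
                if abs(v - q_mean) < abs(v - r_mean):
                    tp += 1
                else:
                    fn += 1
        for i, v in enumerate(ref):
            if i not in r_idx:
                if abs(v - q_mean) < abs(v - r_mean):
                    fp += 1
                else:
                    tn += 1
    return mcc(tp, tn, fp, fn)


# Hypothetical values of one feature in six query and six reference samples.
query = [8.1, 7.9, 8.4, 8.0, 7.7, 8.3]
ref = [2.0, 2.3, 1.8, 2.1, 2.4, 1.9]
score = bagged_mcc(query, ref, n_bags=4)
print(score)
```

Because each bag repeats the whole fit-and-predict pass, running bags in parallel is the natural way to recover the extra time.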
