An enhanced genomic variant classifier

PolyBoost

Description

PolyBoost is a post-analysis tool for the batch processing output of PolyPhen-2. It replaces PolyPhen-2's naive Bayes classifier with an extreme gradient boosting (XGBoost) classifier.

(Note: I am not affiliated with the PolyPhen-2 group.)

Citation

PolyBoost: An enhanced genomic variant classifier using extreme gradient boosting is currently under peer review. If, for some reason, you need to cite this in the meantime, please contact me.

QuickStart

You will need to install PolyBoost and obtain the batch mode output from PolyPhen-2 predictions (http://genetics.bwh.harvard.edu/pph2/bgi.shtml). An example of the batch mode output is found in polyphen2-example.txt in this repository.

Install PolyBoost into Python 3.7 using:

pip3 install polyboost

If you are using Windows, xgboost (a dependency) may fail to install from PyPI. If you get an error message, follow the instructions for installing XGBoost below and then run this command again.

After installation, run PolyBoost as follows:

python -m polyboost.polyboost [PolyPhen2 Output File]

Here, [PolyPhen2 Output File] is the path to the batch mode output from PolyPhen-2. Make sure this file is in your working directory (i.e., the directory you are running the command from).

Example:

python -m polyboost.polyboost polyphen2-example.txt

On some systems with multiple Python distributions, you may need to use python3 (or python3.7) instead of python to run the correct version of Python.

Installation

Requirements

PolyBoost requires Python 3.7, xgboost, numpy and scipy. Installing PolyBoost will attempt to install the xgboost, numpy and scipy dependencies automatically. However, XGBoost may not be installed automatically, because installation from PyPI did not work reliably on Windows at the time of release. If an error occurs, install XGBoost (see below) before installing PolyBoost.

Use of a Python virtualenv is recommended, but not required.

Python 3.7

You will need to install Python 3.7 through standard methods.

XGBoost

Installation of XGBoost is ostensibly as easy as:

pip3 install xgboost

I found, however, that I could not download this from the PyPI repository on Windows. In this case, you can download the XGBoost Python wheel from this location and install it with:

pip3 install xgboost-{version}-{pythonversion}-{architecture}.whl

Example: I installed XGBoost 0.90 for Python 3.7 (32-bit), so I used:

pip3 install xgboost-0.90-cp37-cp37m-win32.whl

I used the 32-bit wheel even though I am running 64-bit Windows, because Python 3.7 was installed in 32-bit mode on my computer. The wheel must match the Python installation, not the operating system.
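If you are unsure which build of Python you have, the following is a minimal sketch (standard library only) that reports the interpreter's version and bitness. The mapping to wheel platform names in the comments reflects the usual Windows wheel naming and is offered as guidance, not as something PolyBoost itself checks.

```python
import struct
import sys

# Size of a C pointer in bytes, times 8, gives the interpreter's bitness.
bits = struct.calcsize("P") * 8

# Major and minor version, e.g. "3.7".
version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)

print("Python version:", version)
print("Interpreter bitness:", bits)

# On Windows, a 32-bit interpreter typically needs a "win32" wheel and a
# 64-bit interpreter a "win_amd64" wheel, regardless of the OS bitness.
```

Run it with the same interpreter you will use for pip3, since a system can have several Python installations.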

If you have difficulty, detailed instructions for installing XGBoost for your platform can be found here: https://xgboost.readthedocs.io/en/latest/build.html

PolyBoost

Install PolyBoost with:

pip3 install polyboost

Numpy and Scipy

You should not need to install numpy and scipy manually, but you can do so with:

pip3 install numpy scipy

Options

Number of Threads (--threads)

You can specify the number of threads used to run predictions. The value must be between 1 and 16. If you do not specify a value, PolyBoost uses 4 threads.

Example using 8 threads:

python -m polyboost.polyboost polyphen2-example.txt --threads 8

Threshold (--threshold)

You can manually set the threshold that separates the binary classifications "benign" and "damaging". The default is 0.5660484. This default was determined during classifier development by maximizing the Youden index (sensitivity + specificity - 1) of the receiver operating characteristic (ROC) curve on an external validation dataset.
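To illustrate what maximizing the Youden index means, here is a minimal sketch in plain Python. It is not the code used to develop PolyBoost: the function name youden_threshold, the toy scores and the toy labels are all made up for illustration, and it assumes that a score at or above the threshold is called "damaging".

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = truly damaging, 0 = truly benign (hypothetical ground truth).
    Assumption: a score >= threshold is predicted "damaging".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    # Every observed score is a candidate cut point on the ROC curve.
    for candidate in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= candidate and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < candidate and y == 0)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Made-up toy data, not taken from the PolyBoost validation set.
toy_scores = [0.05, 0.20, 0.35, 0.40, 0.60, 0.75, 0.90]
toy_labels = [0, 0, 0, 1, 0, 1, 1]

threshold, j = youden_threshold(toy_scores, toy_labels)
print(threshold, j)
```

On this toy data the best cut point is 0.40, where sensitivity is 3/3 and specificity is 3/4.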

Example using a threshold value of 0.25:

python3 -m polyboost.polyboost polyphen2-example.txt --threshold 0.25 
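The effect of changing the threshold can be sketched as follows. This is an illustration, not PolyBoost's internal code: the classify function is hypothetical, and treating a score exactly equal to the threshold as "damaging" is an assumption. The scores are taken from the output example in this document.

```python
DEFAULT_THRESHOLD = 0.5660484  # default stated in this README


def classify(score, threshold=DEFAULT_THRESHOLD):
    """Label a score; assumes scores at or above the threshold are damaging."""
    return "damaging" if score >= threshold else "benign"


# Scores from the output example (P26439 positions 186, 216 and 222 P>T).
for score in (0.35185128, 0.60328233, 0.20633507):
    print(score, classify(score), classify(score, threshold=0.25))
```

With the default threshold, 0.35185128 is labeled benign; lowering the threshold to 0.25 would label it damaging, trading specificity for sensitivity.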

Output (--out)

By default, PolyBoost outputs to the console (standard output). You can optionally output to a file using --out.

Example writing the output to output.txt:

python -m polyboost.polyboost polyphen2-example.txt --out output.txt
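If you want to call PolyBoost from a Python script, one option is to build the command line shown above and run it with subprocess. The sketch below uses only the flags documented in this section; the helper names build_command and run_polyboost are hypothetical, and passing several options in one invocation is an assumption that this README does not explicitly confirm. Using sys.executable runs PolyBoost with the same interpreter as the script, which sidesteps the python versus python3 question.

```python
import subprocess
import sys


def build_command(input_path, threads=None, threshold=None, out=None):
    """Build the argument list for `python -m polyboost.polyboost ...`."""
    command = [sys.executable, "-m", "polyboost.polyboost", input_path]
    if threads is not None:
        # The README states that the thread count must be between 1 and 16.
        if not 1 <= threads <= 16:
            raise ValueError("threads must be between 1 and 16")
        command += ["--threads", str(threads)]
    if threshold is not None:
        command += ["--threshold", str(threshold)]
    if out is not None:
        command += ["--out", out]
    return command


def run_polyboost(input_path, **options):
    """Run PolyBoost and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(input_path, **options), check=True)


# Only print the command here; call run_polyboost(...) to actually execute it.
print(build_command("polyphen2-example.txt", threads=8, out="output.txt"))
```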

Output Example

o_acc   o_pos   o_aa1   o_aa2   polyboost_score     polyboost_prediction
P26439  186     P       L       0.35185128          benign
P26439  205     L       P       0.09412336          benign
P26439  213     S       G       0.37042004          benign
P26439  216     K       E       0.60328233          damaging
P26439  222     P       H       0.06907171          benign
P26439  222     P       Q       0.39627028          benign
P26439  222     P       T       0.20633507          benign
P26439  236     L       S       0.7706197           damaging
P26439  245     A       P       0.17939752          benign
P26439  253     Y       N       0.044733346         benign
P26439  254     Y       D       0.2756629           benign
P26439  259     T       M       0.027224064         benign
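For downstream analysis, output like the above can be loaded into Python with a few lines of code. This is a minimal sketch that assumes the output is whitespace-separated with a single header row, as in the example; the function name parse_polyboost_output is hypothetical and is not part of the polyboost package.

```python
def parse_polyboost_output(text):
    """Parse PolyBoost output text into a list of dicts keyed by column name.

    Assumes whitespace-separated columns and one header row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        record["o_pos"] = int(record["o_pos"])
        record["polyboost_score"] = float(record["polyboost_score"])
        rows.append(record)
    return rows


# Three rows copied from the output example above.
sample = """\
o_acc   o_pos   o_aa1   o_aa2   polyboost_score     polyboost_prediction
P26439  186     P       L       0.35185128          benign
P26439  216     K       E       0.60328233          damaging
P26439  236     L       S       0.7706197           damaging
"""

damaging = [
    row for row in parse_polyboost_output(sample)
    if row["polyboost_prediction"] == "damaging"
]
for row in damaging:
    print(row["o_acc"], row["o_pos"], row["o_aa1"], row["o_aa2"], row["polyboost_score"])
```

To parse a file written with --out, read it first, for example with open("output.txt").read(), and pass the text to the function.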

Questions?

Please e-mail me with questions. I will do my best to respond.
