Tool for the estimation of the difficulty of phylogenetic placements
BAD: Bold Assertor of Difficulty
Description
BAD is a Python tool for predicting the difficulty of phylogenetic placements. It takes RAxML-NG outputs as input and therefore requires a RAxML-NG installation. BAD was trained on empirical datasets from TreeBASE and supports both amino acid (AA) and DNA data. Its output is a score between 0 (easy) and 1 (difficult). BAD also explains its predictions using SHAP, an implementation of Shapley values (GitHub, Paper).
Installation
Using pip
pip install bad-phylo
Usage Example
A simple command line call of BAD looks like this:
bad -msa /test/example.fasta -tree /test/example.bestTree -model /test/example.bestModel -query /test/query.fasta -o test_bad
This command uses the MSA and the query file, both in FASTA format, together with the best tree inferred with RAxML-NG and the model. BAD computes features from all four inputs and predicts the placement difficulty for each taxon in the query file. All output files are stored in a folder called test_bad in the current directory. BAD prints a summary of the explanations for its predictions to the command line. For further details, see the SHAP summary plots or the bad.log file in the output folder.
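For scripted workflows, the same call can be assembled from Python. The following is a minimal sketch and not part of BAD itself: the helper names build_bad_command and run_bad are hypothetical, and only the flags from the example command are used.

```python
import subprocess


def build_bad_command(msa, tree, model, query, out_name):
    """Assemble the argument list for a BAD call.

    Only the flags shown in the example command are used here;
    this helper is a hypothetical convenience, not part of BAD.
    """
    return [
        "bad",
        "-msa", str(msa),
        "-tree", str(tree),
        "-model", str(model),
        "-query", str(query),
        "-o", str(out_name),
    ]


def run_bad(cmd):
    """Run the assembled command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Example (requires bad and raxml-ng to be installed):
# run_bad(build_bad_command(
#     "/test/example.fasta", "/test/example.bestTree",
#     "/test/example.bestModel", "/test/query.fasta", "test_bad"))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.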
Before interpreting the explanations provided by BAD, please make sure you know how to interpret Shapley values. An easy-to-understand introduction to Shapley values is available at https://christophm.github.io/interpretable-ml-book/shapley.html
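As a rough illustration of what a Shapley value expresses, the sketch below computes exact Shapley values for a toy scoring function with two features. Everything in it is an assumption made for illustration: the feature names feature_a and feature_b, the scores, and the brute-force averaging over all orderings. It is not how BAD or SHAP compute their values internally.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(features):
    """Hypothetical difficulty score given the set of 'known' features."""
    score = 0.3  # baseline when no feature is known
    if "feature_a" in features:
        score += 0.2
    if "feature_b" in features:
        score += 0.1
    if {"feature_a", "feature_b"} <= features:
        score += 0.1  # interaction, shared equally between both features
    return score


phi = shapley_values(toy_score, ["feature_a", "feature_b"])
# phi["feature_a"] is about 0.25 and phi["feature_b"] is about 0.15.
```

The two values add up to the difference between the full score (0.7) and the baseline (0.3). Each value therefore describes how much a feature moves the prediction away from the baseline, not how important the feature is in isolation.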
Please keep in mind that BAD requires an installation of RAxML-NG. By default, it uses the command raxml-ng.
If your RAxML-NG installation is not part of the PATH variable, you can specify the path to the RAxML-NG binary file with the parameter -raxmlng PATH_TO_RAXMLNG.
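The choice between the default command and an explicit binary path can be checked before calling BAD. The following is a minimal sketch under stated assumptions: the helper name raxmlng_args is hypothetical, and the only BAD option it relies on is the -raxmlng parameter described above.

```python
import shutil


def raxmlng_args(explicit_path=None, which=shutil.which):
    """Return the extra BAD arguments needed to locate RAxML-NG.

    Hypothetical helper: returns an empty list when the default
    raxml-ng command is found on PATH, or the -raxmlng parameter
    when an explicit binary path is given.
    """
    if explicit_path is not None:
        return ["-raxmlng", str(explicit_path)]
    if which("raxml-ng") is not None:
        return []  # BAD's default command can be resolved via PATH
    raise FileNotFoundError(
        "raxml-ng was not found on PATH; pass the binary path via -raxmlng"
    )
```

The returned list can be appended to the argument list of a BAD call. The which parameter exists only so that the lookup can be replaced in tests.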
References
- A. M. Kozlov, D. Darriba, T. Flouri, B. Morel, and A. Stamatakis (2019). RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35(21): 4453–4455. https://doi.org/10.1093/bioinformatics/btz305
- S. M. Lundberg and S.-I. Lee (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), pages 4768–4777. Curran Associates Inc. ISBN 9781510860964. https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf