Phylodynamic parameter and model inference using pretrained deep neural networks
PhyloDeep
PhyloDeep is a Python library for parameter estimation and model selection from phylogenetic trees, based on deep learning.
Installation
Use the package manager pip to install PhyloDeep.
Usage
We recommend performing an a priori model adequacy check first, to assess whether the input data resemble the simulations on which the neural networks were trained.
### Python
import phylodeep
from phylodeep import BD, BDEI, BDSS, SUMSTATS, FULL
path_to_tree = './Zurich.trees'
# set presumed sampling probability
sampling_proba = 0.25
# a priori check for the BDSS model (repeat with model=BD and model=BDEI to check all three)
check_BDSS = phylodeep.checkdeep(path_to_tree, model=BDSS)
# model selection
model_BDEI_vs_BD_vs_BDSS = phylodeep.modeldeep(path_to_tree, sampling_proba, vector_representation=FULL)
# the selected model is BDSS
# parameter inference
param_BDSS = phylodeep.paramdeep(path_to_tree, sampling_proba, model=BDSS, vector_representation=FULL,
                                 ci_computation=True)
### Command line
# here we use a tree of 200 tips
# a priori model adequacy check: highly recommended
checkdeep -t ./Zurich.trees -m BD -o BD_model_adequacy.png
checkdeep -t ./Zurich.trees -m BDEI -o BDEI_model_adequacy.png
checkdeep -t ./Zurich.trees -m BDSS -o BDSS_model_adequacy.png
# model selection
modeldeep -t ./Zurich.trees -p 0.25 -v CNN_FULL_TREE -o model_selection.csv
# parameter inference
paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v CNN_FULL_TREE -o HIV_Zurich_BDSS_CNN.csv
paramdeep -t ./Zurich.trees -p 0.25 -m BDSS -v FFNN_SUMSTATS -o HIV_Zurich_BDSS_FFNN_CI.csv -c
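The three adequacy checks above differ only in the model name, so they can be generated in a loop. The following is a minimal sketch that only builds the command strings, using the `-t`, `-m` and `-o` flags exactly as shown above; the helper name `build_checkdeep_commands` is hypothetical and not part of PhyloDeep.

```python
import shlex

# Hypothetical helper (not part of PhyloDeep): builds one checkdeep
# command per model, mirroring the command lines shown above.
def build_checkdeep_commands(tree_path, models=("BD", "BDEI", "BDSS")):
    commands = []
    for model in models:
        args = ["checkdeep", "-t", tree_path, "-m", model,
                "-o", f"{model}_model_adequacy.png"]
        # shlex.join quotes arguments only when the shell requires it
        commands.append(shlex.join(args))
    return commands

commands = build_checkdeep_commands("./Zurich.trees")
for command in commands:
    print(command)
```

Each printed line can be pasted into a shell, or the argument lists could be passed to `subprocess.run` if the `checkdeep` executable is on the `PATH`.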
### Example of output and interpretations
Here, we use an HIV tree reconstructed from 200 sequences, published in "Phylodynamics on local sexual contact networks" by Rasmussen et al. in PLoS Computational Biology in 2017, which is available on GitHub.
The a priori model adequacy check results in the following figures:
[Figures: BD model adequacy test; BDEI model adequacy test; BDSS model adequacy test]
For all three models (BD, BDEI and BDSS), the HIV tree data point (represented by a red star) lies well inside the data cloud of simulations, where warm colors correspond to a high density of simulations. Both the simulations and the HIV tree data point were represented as summary statistics before PCA was applied. All three models thus pass the model adequacy check.
We then apply model selection using the full tree representation and obtain the following result:
Model | Probability BDEI | Probability BD | Probability BDSS
------------- | ------------- | ------------- | -------------
Predicted probability | 0.00 | 0.00 | 1.00
The BDSS probability is by far the highest, so the BDSS model is confidently selected.
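The selection rule described here (pick the model with the highest predicted probability) can be sketched in a few lines. The probabilities are copied from the table above. The function `select_model` and its `min_margin` threshold are illustrative assumptions, not part of the PhyloDeep API.

```python
# Probabilities copied from the model selection table above.
probabilities = {"BDEI": 0.00, "BD": 0.00, "BDSS": 1.00}

# Hypothetical helper: returns the most probable model and whether it
# beats the runner-up by at least min_margin (an arbitrary threshold
# chosen for illustration only).
def select_model(probs, min_margin=0.5):
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (_, p_second) = ranked[0], ranked[1]
    return best, (p_best - p_second) >= min_margin

best_model, is_confident = select_model(probabilities)
print(best_model, is_confident)
```

With a margin of 1.00 between BDSS and the next model, any reasonable threshold leads to the same choice.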
Finally, under the selected model BDSS, we predict parameter values together with 95% CIs:
Estimate | R naught | Infectious period | X transmission | Superspreading fraction
------------- | ------------- | ------------- | ------------- | -------------
predicted value | 1.69 | 9.78 | 9.34 | 0.079
CI 2.5% | 1.40 | 8.12 | 6.65 | 0.050
CI 97.5% | 2.08 | 12.26 | 10 | 0.133
The point estimates for the parameters that are not time-related (R naught, X transmission and Superspreading fraction) are well inside the parameter ranges of the simulations and thus seem valid.
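As a quick sanity check on the reported intervals, the sketch below verifies that each point estimate lies within its 95% CI and computes the CI width relative to the point estimate. The numbers are copied from the table above. The helper `summarize_intervals` is hypothetical, and the relative width is only an informal indication of uncertainty, not a quantity reported by PhyloDeep.

```python
# (predicted value, CI 2.5%, CI 97.5%) copied from the table above.
estimates = {
    "R naught": (1.69, 1.40, 2.08),
    "Infectious period": (9.78, 8.12, 12.26),
    "X transmission": (9.34, 6.65, 10.0),
    "Superspreading fraction": (0.079, 0.050, 0.133),
}

# Hypothetical helper: for each parameter, report whether the point
# estimate falls inside its interval and the interval width divided
# by the point estimate (rounded to two decimals).
def summarize_intervals(values):
    summary = {}
    for name, (value, low, high) in values.items():
        inside = low <= value <= high
        relative_width = round((high - low) / value, 2)
        summary[name] = (inside, relative_width)
    return summary

summary = summarize_intervals(estimates)
for name, (inside, relative_width) in summary.items():
    print(f"{name}: inside CI = {inside}, relative width = {relative_width}")
```

In this example the superspreading fraction has the widest interval relative to its estimate, so it is the least precisely estimated of the four parameters.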
Citation
Contributing
License