online-BEAST
This command line tool can be used to add sequences to an ongoing analysis in BEAST2. This framework is called online Bayesian phylodynamic inference (see Gill et al., 2020).
Install
Install online-beast with pip (requires python -V >= 3.6.2).
pip install online-beast
Usage
Give online-beast the path to an XML file from a previous BEAST2 run (i.e. one that has an associated .state file) and a fasta file of sequences to add to the analysis. Sequences in the fasta file must be aligned to the sequences in the XML file and be the same length as them. Only new sequences (new descriptors) will be added to the analysis, so new sequences can be appended to the fasta file as they are acquired.
online-beast data/testGTR.xml data/samples.fasta
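The "only new descriptors are added" rule and the length requirement can be sketched as below. This is a minimal illustration, not the tool's actual code: the helper names `parse_fasta` and `select_new_sequences` and the toy sequences are hypothetical, and the existing alignment is assumed to be available as a plain dictionary rather than read from the XML.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping descriptor -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[name] += line
    return records


def select_new_sequences(existing, fasta_records):
    """Return only records whose descriptor is not already in the analysis.

    `existing` stands in for the sequences already in the XML file
    (hypothetical representation: descriptor -> aligned sequence).
    """
    lengths = {len(seq) for seq in existing.values()}
    if len(lengths) != 1:
        raise ValueError("existing sequences must all have the same length")
    (expected_length,) = lengths

    new_records = {}
    for name, seq in fasta_records.items():
        if name in existing:
            continue  # already part of the analysis, skip it
        if len(seq) != expected_length:
            raise ValueError(f"{name} has length {len(seq)}, expected {expected_length}")
        new_records[name] = seq
    return new_records


# Toy data (made up for illustration).
existing = {"sample_a": "ACGT", "sample_b": "ACGA"}
fasta_text = ">sample_a\nACGT\n>sample_c\nAC\nGG\n"
new_records = select_new_sequences(existing, parse_fasta(fasta_text))
print(new_records)  # {'sample_c': 'ACGG'}
```

Because records with known descriptors are skipped, the same growing fasta file can be passed on every update.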
The new sequences will be added to the XML file and the associated .state file (produced automatically by BEAST2).
The analysis can then be resumed (with the additional sequence data) using the BEAST2 resume flag.
beast -resume testGTR.xml
The online analysis can be visualised in real time using Beastiary. The jumps in the trace show where new sequences have been added.
Date trait data will be automatically parsed. The format of the date trait data (in the fasta descriptor) can be set with the --date-format (default %Y-%m-%d) and --delimiter (default _) flags. If there is no date trait in the XML, use the --no-date-trait flag.
online-beast data/ebola.xml data/ebola.fasta --date-format %d/%m/%Y --date-delimiter _
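To make the role of the date format and delimiter concrete, here is a rough sketch of how a date could be pulled from a fasta descriptor. It is not the tool's implementation: the function name is hypothetical, the second descriptor is made up, and the sketch assumes the date is the last delimiter-separated field of the descriptor, which the text above does not state.

```python
from datetime import datetime


def parse_date_from_descriptor(descriptor, date_format="%Y-%m-%d", delimiter="_"):
    """Extract a date from a fasta descriptor.

    Assumption (not confirmed by the README): the date is the last
    field when the descriptor is split on the delimiter.
    """
    date_field = descriptor.split(delimiter)[-1]
    # strptime raises ValueError if the field does not match the format.
    return datetime.strptime(date_field, date_format).date()


# Default format and delimiter.
print(parse_date_from_descriptor("sample_wuhan_2022-04-05"))  # 2022-04-05

# Day/month/year format; the descriptor here is hypothetical.
print(parse_date_from_descriptor("EBOV_05/04/2015", date_format="%d/%m/%Y"))  # 2015-04-05
```

A descriptor whose date field does not match the format raises a `ValueError`, which is a reasonable point to catch a wrong format string before the XML is modified.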
If there is trait data in the XML file, you need to specify how to extract it from the fasta descriptor line using the --trait flag. The format is 'traitname delimiter group', i.e. a string of three values separated by spaces. For example, to get the location trait from sample_wuhan_2022-04-05 you would use --trait 'location _ 1'. The --trait flag can be used multiple times to specify multiple traits.
online-beast covid.xml data/covid.fasta --trait 'location _ 1'
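The 'traitname delimiter group' specification can be read as "split the descriptor on the delimiter and take field number group". The sketch below illustrates that reading; the function names are hypothetical, and zero-based indexing is inferred from the example above (group 1 of sample_wuhan_2022-04-05 gives wuhan), not taken from the tool's source.

```python
def parse_trait_spec(spec):
    """Split a 'traitname delimiter group' string into its three parts."""
    parts = spec.split()
    if len(parts) != 3:
        raise ValueError(f"expected 'traitname delimiter group', got {spec!r}")
    name, delimiter, group = parts
    return name, delimiter, int(group)


def extract_traits(descriptor, specs):
    """Apply several trait specs to one descriptor, returning name -> value."""
    traits = {}
    for spec in specs:
        name, delimiter, group = parse_trait_spec(spec)
        # Assumption: group is a zero-based index into the split fields.
        traits[name] = descriptor.split(delimiter)[group]
    return traits


print(extract_traits("sample_wuhan_2022-04-05", ["location _ 1"]))
# {'location': 'wuhan'}
```

Passing a list of specs mirrors using the `--trait` flag several times on the command line.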
By default, the new sequences will be appended to the input XML and state files. Output file names can be specified using the --output flag. This will also create a new .state file.
online-beast testGTR.xml samples.fasta --output new_testGTR.xml
If you used the BEAST2 -statefile flag to specify the filename of the state file (i.e. it is not xml_filename + .state), use the --state-file flag to specify the state file path.
online-beast testGTR.xml samples.fasta --state-file beast.state
Explanation
A Markov chain started anywhere near the center of the stationary distribution needs no burn-in (Geyer 2011). Online Bayesian phylodynamic inference is akin to transfer learning in the deep learning field. By starting our MCMC with reasonable states (obtained from a previous run) we reduce the amount of optimisation (burn-in) that must be performed to reach convergence.
Online-beast loosely follows the implementation of Gill et al., 2020 for BEAST1. However, most of the implementation of online-beast is handled by the default state system in BEAST2. New sequences are added from the fasta file one at a time. The Hamming distance is calculated between the new sequence and all the other sequences in the XML file. The new sequence is grafted onto the tree in the .state file, halfway along the branch of the closest sequence in the XML file. The new sequence is then appended to the BEAST XML file.
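The placement step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code online-beast runs: the tree is represented by two hypothetical dictionaries (node to parent, node to branch length), the sequences and node names are made up, and the new tip is given the same branch length as the shortened branch of its closest relative, which ignores tip dates.

```python
def hamming_distance(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_sequence(new_seq, existing):
    """Name of the existing sequence with the smallest Hamming distance."""
    return min(existing, key=lambda name: hamming_distance(new_seq, existing[name]))


def graft_halfway(parent, length, closest, new_leaf):
    """Attach new_leaf halfway along the branch above `closest`.

    parent: node -> parent node; length: node -> branch length to parent.
    Returns updated copies; the inputs are left unchanged.
    """
    parent, length = dict(parent), dict(length)
    half = length[closest] / 2
    internal = f"{closest}+{new_leaf}"  # hypothetical name for the new internal node
    parent[internal], length[internal] = parent[closest], half
    parent[closest], length[closest] = internal, half
    # Assumption: the new tip sits at the same height as its closest relative.
    parent[new_leaf], length[new_leaf] = internal, half
    return parent, length


existing = {"A": "ACGTACGT", "B": "ACGTTTTT"}
closest = closest_sequence("ACGTACGA", existing)
print(closest)  # A

parent, length = graft_halfway(
    {"A": "root", "B": "root"}, {"A": 2.0, "B": 2.0}, closest, "C"
)
print(parent["C"], length["A"], length["C"])  # A+C 1.0 1.0
```

Starting the new tip next to its nearest relative keeps the grafted tree close to a plausible state, which is what lets the resumed chain avoid a long burn-in.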
Ebola example
In this example we will make use of a publicly available dataset of sequences from the 2013-2016 Zaire ebolavirus outbreak in Sierra Leone.
In the data/ folder you'll find an ebola.xml file and several fasta files that contain sequences from the outbreak broken up by date. The script below will run an online Bayesian phylodynamic analysis, adding new sequences after each run finishes.
#!/bin/bash
# Run beast with initial samples
beast data/ebola.xml
# Update analysis with new samples
online-beast data/ebola.xml data/ebola1.fasta --date-format "%d/%m/%Y" --state-file ebola.xml.state --output ebola.xml
# Resume the analysis
beast -resume ebola.xml
# Update analysis with new samples
online-beast ebola.xml data/ebola2.fasta --date-format "%d/%m/%Y" --output ebola.xml
# Resume the analysis
beast -resume ebola.xml
# Update analysis with new samples
online-beast ebola.xml data/ebola3.fasta --date-format "%d/%m/%Y" --output ebola.xml
# Resume the analysis
beast -resume ebola.xml
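The script repeats the same update and resume pair for each batch, so the sequence of commands can also be generated in a loop. The sketch below only builds and prints the commands and does not run BEAST2; the function name `plan_online_run` is hypothetical, and the flags are the ones used in the script above.

```python
import shlex


def plan_online_run(initial_xml, working_xml, fasta_files, date_format):
    """Build the command sequence for an online analysis (dry run only)."""
    commands = [["beast", initial_xml]]  # initial run with the first samples
    current_xml = initial_xml
    for i, fasta in enumerate(fasta_files):
        update = ["online-beast", current_xml, fasta, "--date-format", date_format]
        if i == 0:
            # As in the script above, the first update points at the state
            # file written by the initial run.
            update += ["--state-file", working_xml + ".state"]
        update += ["--output", working_xml]
        commands.append(update)
        commands.append(["beast", "-resume", working_xml])
        current_xml = working_xml
    return commands


commands = plan_online_run(
    "data/ebola.xml",
    "ebola.xml",
    ["data/ebola1.fasta", "data/ebola2.fasta", "data/ebola3.fasta"],
    "%d/%m/%Y",
)
for command in commands:
    # To execute instead of print: subprocess.run(command, check=True)
    print(shlex.join(command))
```

Keeping the commands as argument lists avoids shell quoting problems with the date format if they are later passed to `subprocess.run`.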