online-BEAST

This command line tool can be used to add sequences to an ongoing analysis in BEAST2. This framework is called online Bayesian phylodynamic inference (see Gill et al., 2020).

Install

Install online-beast with pip (requires Python >= 3.6.2).

pip install online-beast

Usage

Give online-beast the path to an XML file from a previous BEAST2 run (i.e. one that has an associated .state file) and a fasta file of sequences to add to the analysis. Sequences in the fasta file must be aligned to the sequences in the XML file and have the same length as them. Only new sequences (new descriptors) will be added to the analysis, so new sequences can be appended to the fasta file as they are acquired.

online-beast data/testGTR.xml data/samples.fasta
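The input rules above (only new descriptors are added, and every sequence must match the alignment length) can be sketched in a few lines of Python. This is an illustrative sketch, not online-beast's own code: the function names `parse_fasta` and `select_new_sequences` are hypothetical, and the existing descriptors and alignment length are assumed to have already been read from the XML file.

```python
from typing import Dict, Iterable, Set


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {descriptor: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def select_new_sequences(
    fasta: Dict[str, str], existing_ids: Set[str], alignment_length: int
) -> Dict[str, str]:
    """Keep only unseen descriptors and check they fit the alignment."""
    new: Dict[str, str] = {}
    for name, seq in fasta.items():
        if name in existing_ids:
            continue  # already part of the analysis, so skip it
        if len(seq) != alignment_length:
            raise ValueError(
                f"{name}: length {len(seq)} != alignment length {alignment_length}"
            )
        new[name] = seq
    return new


fasta = parse_fasta([">a", "ACGT", ">b", "AC", "GT", ">c", "ACGA"])
new_sequences = select_new_sequences(fasta, existing_ids={"a"}, alignment_length=4)
print(sorted(new_sequences))
```

Because already-known descriptors are skipped, the same growing fasta file can be passed on every update.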

The new sequences will be added to the XML file and the associated .state file (produced automatically by BEAST2).

The analysis can then be resumed (with the additional sequence data) using the BEAST2 resume flag.

beast -resume testGTR.xml

The online analysis can be visualised in real-time using Beastiary. The jumps in the trace show where new sequences have been added.

Date trait data will be automatically parsed. The format of the date trait data (in the fasta descriptor) can be set with the --date-format (default %Y-%m-%d) and --delimiter (default _) flags. If there is no date trait in the XML, use the --no-date-trait flag.

online-beast data/ebola.xml data/ebola.fasta --date-format %d/%m/%Y --date-delimiter _
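To make the roles of the date format and delimiter concrete, here is a minimal Python sketch of pulling a date out of a fasta descriptor. It assumes the date is the last delimited field of the descriptor, which is an assumption for illustration rather than documented behaviour. The function name `parse_date_trait` and the descriptor `EBOV_SLE_25/05/2014` are hypothetical.

```python
from datetime import datetime


def parse_date_trait(
    descriptor: str, date_format: str = "%Y-%m-%d", delimiter: str = "_"
) -> datetime:
    """Split the descriptor on the delimiter and parse the final field as a date.

    Assumption: the date is the last field. The real tool may locate it differently.
    """
    date_text = descriptor.split(delimiter)[-1]
    return datetime.strptime(date_text, date_format)


# Default format and delimiter
print(parse_date_trait("sample_wuhan_2022-04-05").date())
# Day/month/year format, as in the command above
print(parse_date_trait("EBOV_SLE_25/05/2014", date_format="%d/%m/%Y").date())
```

The format string uses the standard `strptime` codes, so `%d/%m/%Y` reads `25/05/2014` as 25 May 2014.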

If there is trait data in the XML file, you need to specify how to extract it from the fasta descriptor line using the --trait flag. The format is 'traitname delimiter group', i.e. three values separated by spaces. For example, to get the location trait from sample_wuhan_2022-04-05 you would use --trait 'location _ 1'. The --trait flag can be used multiple times to specify multiple traits.

online-beast covid.xml data/covid.fasta --trait 'location _ 1'
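The 'traitname delimiter group' format can be read as "split the descriptor on the delimiter and take the field at position group". The sketch below illustrates that reading in Python. The helper names are hypothetical, and zero-based group indexing is an assumption inferred from the example, where group 1 of sample_wuhan_2022-04-05 is taken to be wuhan.

```python
from typing import Tuple


def parse_trait_spec(spec: str) -> Tuple[str, str, int]:
    """Split a spec such as 'location _ 1' into (name, delimiter, group)."""
    name, delimiter, group = spec.split()
    return name, delimiter, int(group)


def extract_trait(descriptor: str, spec: str) -> Tuple[str, str]:
    """Return (trait name, trait value) for one descriptor.

    Assumption: groups are counted from zero after splitting on the delimiter.
    """
    name, delimiter, group = parse_trait_spec(spec)
    return name, descriptor.split(delimiter)[group]


print(extract_trait("sample_wuhan_2022-04-05", "location _ 1"))
```

Each additional --trait flag would correspond to one more spec applied to the same descriptor.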

By default the new sequences will be appended to the input XML and state files. Output file names can be specified using the --output flag. This will also create a new .state file.

online-beast testGTR.xml samples.fasta --output new_testGTR.xml 

If you used the BEAST2 -statefile flag to specify the filename of the state file (i.e. it is not xml_filename + .state), use the --state-file flag to specify the state file path.

online-beast testGTR.xml samples.fasta --state-file beast.state 

Explanation

A Markov chain started anywhere near the center of the stationary distribution needs no burn-in (Geyer 2011). Online Bayesian phylodynamic inference is akin to transfer learning in the deep learning field. By starting our MCMC with reasonable states (obtained from a previous run) we reduce the amount of optimisation (burn-in) that must be performed to reach convergence.

Online-beast loosely follows the implementation of Gill et al., 2020 for BEAST1. However, most of the implementation of online-beast is handled by the default state system in BEAST2. New sequences are added from the fasta file one at a time. The Hamming distance is calculated between the new sequence and all the other sequences in the XML file. The new sequence is grafted onto the tree in the .state file, halfway along the branch of the closest sequence in the XML file. The new sequence is then appended to the BEAST XML file.
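The nearest-sequence step described above can be sketched as follows. This is a minimal illustration of Hamming distance and closest-match selection, not the package's own implementation: the function names are hypothetical, and the tree-grafting step on the .state file is not shown.

```python
from typing import Dict, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_sequence(new_seq: str, existing: Dict[str, str]) -> Tuple[str, int]:
    """Return (name, distance) of the existing sequence nearest to new_seq."""
    distances = ((name, hamming_distance(new_seq, seq)) for name, seq in existing.items())
    return min(distances, key=lambda pair: pair[1])


existing = {"a": "ACGTACGT", "b": "ACGTTTTT"}
print(closest_sequence("ACGTACGA", existing))
```

The branch leading to the returned sequence is the one on which the new tip would be grafted, at its midpoint.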

Ebola example

In this example we will make use of a publicly available dataset of sequences from the 2013-2016 Zaire ebolavirus outbreak in Sierra Leone.

In the data/ folder you'll find an ebola.xml file and several fasta files that contain sequences from the outbreak broken up by date. The script below will run an online Bayesian phylodynamic analysis, adding new sequences after each run finishes.

#!/bin/bash

# Run beast with initial samples
beast data/ebola.xml 
# Update analysis with new samples
online-beast data/ebola.xml data/ebola1.fasta --date-format "%d/%m/%Y" --state-file ebola.xml.state --output ebola.xml
# Resume the analysis
beast -resume ebola.xml 
# Update analysis with new samples
online-beast ebola.xml data/ebola2.fasta --date-format "%d/%m/%Y" --output ebola.xml
# Resume the analysis
beast -resume ebola.xml 
# Update analysis with new samples
online-beast ebola.xml data/ebola3.fasta --date-format "%d/%m/%Y" --output ebola.xml
# Resume the analysis
beast -resume ebola.xml 
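The same run, update, resume cycle can be generated for any number of batches. The Python sketch below builds the command list used in the script above and only runs it when `run` is called. The helper names `build_commands` and `run` are hypothetical. It also assumes, as the script does, that the first BEAST2 run writes its state file to the working directory as the XML file name plus .state.

```python
import os
import subprocess
from typing import List, Sequence


def build_commands(
    xml: str,
    batches: Sequence[str],
    date_format: str = "%d/%m/%Y",
    output: str = "ebola.xml",
) -> List[List[str]]:
    """Build the initial run followed by one update and resume pair per batch."""
    commands = [["beast", xml]]
    current = xml
    for index, fasta in enumerate(batches):
        update = ["online-beast", current, fasta, "--date-format", date_format]
        if index == 0:
            # Assumption: the first state file sits in the working directory
            update += ["--state-file", os.path.basename(xml) + ".state"]
        update += ["--output", output]
        commands.append(update)
        commands.append(["beast", "-resume", output])
        current = output
    return commands


def run(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = build_commands(
    "data/ebola.xml",
    ["data/ebola1.fasta", "data/ebola2.fasta", "data/ebola3.fasta"],
)
for command in commands:
    print(" ".join(command))
```

Calling `run(commands)` would execute the sequence, which requires both beast and online-beast to be on the PATH.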
