Generate an appropriate data tag to add constant sites to your BEAST2 XML

Introduction

Based on this suggestion by Remco, we can correctly account for constant sites in a BEAST2 analysis by adding the following data tag below your current data tag:

<data id="xyz" spec="FilteredAlignment" filter="-" data="@xyzOriginal" constantSiteWeights="100 200 300 400"/>

This assumes that your original <data> tag had id=xyz and was renamed to id=xyzOriginal, and that you have 1000 constant sites that were removed from the alignment, with:

  • 100 As

  • 200 Cs

  • 300 Gs

  • 400 Ts
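As a rough illustration of how the four counts map onto the tag, the sketch below builds the same element with Python's standard library. The function name `make_const_sites_tag` is hypothetical and is not part of this package; the A, C, G, T ordering of the weights is taken from the example above.

```python
import xml.etree.ElementTree as ET


def make_const_sites_tag(data_id, counts):
    """Build a FilteredAlignment <data> tag (hypothetical helper).

    data_id: id of the alignment as referenced elsewhere in the XML.
    counts:  dict with the number of constant A, C, G and T sites.
    """
    # Weights are written in A, C, G, T order, as in the example tag.
    weights = " ".join(str(counts[base]) for base in "ACGT")
    elem = ET.Element(
        "data",
        {
            "id": data_id,
            "spec": "FilteredAlignment",
            "filter": "-",
            # Assumes the original <data> tag was renamed to <id>Original.
            "data": f"@{data_id}Original",
            "constantSiteWeights": weights,
        },
    )
    return ET.tostring(elem, encoding="unicode")


tag = make_const_sites_tag("xyz", {"A": 100, "C": 200, "G": 300, "T": 400})
print(tag)
```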

What does this do?

This script takes a FASTA file with a single DNA sequence (e.g., a bacterial chromosome), a VCF file containing the positions of SNPs along that sequence (e.g., as output by snippy-core), and the XML file produced by BEAUti containing only variable sites. It writes a new XML file named <original_xml_name>_plus_const.xml with the added information to account for constant sites. After that, all you need to do is run BEAST2.

It will optionally also take a BED file with positions to mask (e.g., positions of phage).
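The counting step can be pictured roughly as follows. This is a minimal sketch and not the package's actual code: it assumes variable positions are given as 0-based indices (VCF positions are 1-based and would need converting), that masked regions are 0-based half-open intervals as in BED, and that masked positions are simply left out of the counts. The name `count_constant_sites` is hypothetical.

```python
from collections import Counter


def count_constant_sites(reference, variable_positions, masked_intervals=()):
    """Count A, C, G, T at positions that are neither variable nor masked.

    reference:          the reference sequence as a string.
    variable_positions: 0-based positions of variable sites.
    masked_intervals:   (start, end) pairs, 0-based and half-open (BED style).
    """
    excluded = set(variable_positions)
    for start, end in masked_intervals:
        excluded.update(range(start, end))

    counts = Counter()
    for pos, base in enumerate(reference.upper()):
        if pos in excluded:
            continue
        # Anything other than A, C, G, T (e.g. N) is ignored.
        if base in "ACGT":
            counts[base] += 1
    return {base: counts[base] for base in "ACGT"}


# Toy reference: position 1 is variable, positions 4-5 are masked.
counts = count_constant_sites("AACCGGTTNA", {1}, [(4, 6)])
print(counts)
```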

How to install?

pip3 install b2constsites

How to run it?

If installed correctly, a script called run_b2cs should be available in your path:

run_b2cs --help

At a minimum, you need to supply a sequence file with your reference, a VCF file with variants, and the XML output from BEAUti. The reference is assumed to have a single chromosome entry: the script was designed for bacterial genomics, but may also work with viral genomes.

run_b2cs myref.fasta myvar.vcf myxml.xml

A new file called myxml_plus_const.xml will be created in the same folder as myxml.xml.
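The naming rule described above can be reproduced with pathlib, for instance to locate the output in a pipeline. The helper name `plus_const_path` is hypothetical and only mirrors the naming convention stated here; it does not call the package.

```python
from pathlib import Path


def plus_const_path(xml_path):
    """Return the expected output path: same folder, '_plus_const' added to the stem."""
    p = Path(xml_path)
    return p.with_name(f"{p.stem}_plus_const{p.suffix}")


out = plus_const_path("analysis/myxml.xml")
print(out)
```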

ASSUMPTIONS and CAVEATS

This script:

  • Only takes into account SNPs and MNPs annotated in the VCF. Other variant types are ignored.

  • Only considers A, C, G, and T bases in your reference sequence. All other characters are ignored.

  • Has not been tested with BEAST1.8 and, as far as I know, will not work with that version of BEAST. It was designed for use with BEAST2.
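To make the first caveat concrete, one plausible way to tell SNPs and MNPs apart from other variant types is to compare the REF and ALT alleles of a VCF record. The check below is an illustrative assumption, not necessarily how this script classifies variants, and `is_snp_or_mnp` is a hypothetical name.

```python
def is_snp_or_mnp(ref, alt):
    """Return True if REF/ALT describe a SNP (length 1) or MNP (equal length > 1).

    Indels, symbolic alleles such as <DEL>, and alleles containing
    characters other than A, C, G, T are rejected.
    """
    bases = set("ACGT")
    ref, alt = ref.upper(), alt.upper()
    return (
        len(ref) >= 1
        and len(ref) == len(alt)
        and ref != alt
        and set(ref) <= bases
        and set(alt) <= bases
    )


print(is_snp_or_mnp("A", "G"))    # SNP
print(is_snp_or_mnp("AC", "GT"))  # MNP
print(is_snp_or_mnp("A", "AT"))   # insertion, ignored
```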

The output will be, therefore, an approximation. However, it should be a close enough approximation that it will provide a better inference from BEAST2 than if one uses only variable sites, and then corrects in some post hoc manner.

Authors

Anders Gonçalves da Silva
Sarah Baines
Jean Lee
Torsten Seemann

Maintainer

Anders Gonçalves da Silva

Issues or Questions

GitHub Issues
