Generate an appropriate data tag to add constant sites to your BEAST2 XML
Introduction
Based on this suggestion by Remco, we can correctly account for constant sites in a BEAST2 analysis by adding the following data tag below your current data tag:
<data id='xyz' spec='FilteredAlignment' filter='-' data='@xyzOriginal' constantSiteWeights='100 200 300 400'/>
This assumes that your original <data> tag had id=xyz and was renamed to id=xyzOriginal, and that you have 1000 constant sites that were removed from the alignment, with:
100 As
200 Cs
300 Gs
400 Ts
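As a minimal sketch of how the four counts map onto the tag, the snippet below formats them in A, C, G, T order, following the example above. The function name `build_data_tag` is hypothetical and is not part of b2constsites.

```python
def build_data_tag(alignment_id, counts):
    """Return a FilteredAlignment <data> tag for a BEAST2 XML file.

    counts maps each base to its number of constant sites. The weights are
    written in A, C, G, T order, matching the example in the Introduction.
    """
    weights = " ".join(str(counts.get(base, 0)) for base in "ACGT")
    return (
        f"<data id='{alignment_id}' spec='FilteredAlignment' filter='-' "
        f"data='@{alignment_id}Original' constantSiteWeights='{weights}'/>"
    )


tag = build_data_tag("xyz", {"A": 100, "C": 200, "G": 300, "T": 400})
print(tag)
```

With the counts from the example, this prints the same tag as shown above.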
What does this do?
This script takes a FASTA file with a single DNA sequence (e.g., a bacterial chromosome) and a VCF file containing the positions of SNPs along that sequence (e.g., as output by snippy-core), and outputs the <data> tag ready to copy and paste into your XML file.
It will optionally also take a BED file with positions to mask (e.g., positions of phage).
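The counting step could look roughly like the sketch below. It is not the code used by b2constsites: the function name `count_constant_sites` is hypothetical, and it assumes that masked positions are simply left out of the constant-site counts. It relies on VCF positions being 1-based and BED intervals being 0-based and half-open.

```python
from collections import Counter


def count_constant_sites(reference, variant_positions, masked_intervals=()):
    """Count A, C, G and T at positions that are neither variant nor masked.

    reference: the chromosome sequence as a string.
    variant_positions: 1-based positions, as in the VCF POS column.
    masked_intervals: (start, end) pairs, 0-based half-open, as in BED.
    """
    # Convert VCF positions to 0-based string indices.
    excluded = {pos - 1 for pos in variant_positions}
    # Assumption: masked regions do not contribute to the counts.
    for start, end in masked_intervals:
        excluded.update(range(start, end))

    counts = Counter()
    for index, base in enumerate(reference.upper()):
        # Anything other than A, C, G or T (e.g. N) is ignored.
        if index not in excluded and base in "ACGT":
            counts[base] += 1
    return {base: counts[base] for base in "ACGT"}


reference = "AACCGGTTNNAC"
counts = count_constant_sites(
    reference, variant_positions=[1, 4], masked_intervals=[(6, 8)]
)
print(counts)
```

In this toy reference, positions 1 and 4 are variant, the two Ts are masked, and the two Ns are skipped, leaving two each of A, C and G.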
How to install?
pip3 install b2constsites
How to run it?
If installed correctly, a script called run_b2cs should be available in your path:
run_b2cs --help
The reference is assumed to contain a single chromosome entry (the tool was designed for bacterial genomics, but may work with viral genomes too). At a minimum, you need to supply a sequence file with your reference and a VCF file with variants:
run_b2cs myref.fasta myvar.vcf
The data tag will be printed to the screen. Copy and paste it into your XML file just below the original data tag, then rename the ids of the data tags as described in the Introduction above.
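If you would rather script this edit than do it by hand, something along these lines may work. It is only a sketch using Python's standard `xml.etree.ElementTree`: the helper `add_constant_sites` is hypothetical, and the XML below is a toy skeleton rather than a complete BEAST2 file. Note that the default ElementTree parser drops XML comments, so check the result before running it.

```python
import xml.etree.ElementTree as ET


def add_constant_sites(root, alignment_id, new_tag):
    """Rename the original <data> element and insert new_tag just after it.

    Returns True if a <data> element with the given id was found.
    """
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "data" and child.get("id") == alignment_id:
                # The original alignment becomes e.g. 'xyzOriginal' ...
                child.set("id", alignment_id + "Original")
                # ... and the filtered alignment takes over the old id.
                parent.insert(index + 1, ET.fromstring(new_tag))
                return True
    return False


# Toy skeleton for illustration only, not a valid BEAST2 analysis.
xml_text = """<beast version='2.6'>
  <data id='xyz' name='alignment'>
    <sequence taxon='s1' value='AC'/>
  </data>
  <run id='mcmc' spec='MCMC' chainLength='1000'/>
</beast>"""

new_tag = (
    "<data id='xyz' spec='FilteredAlignment' filter='-' "
    "data='@xyzOriginal' constantSiteWeights='100 200 300 400'/>"
)

root = ET.fromstring(xml_text)
found = add_constant_sites(root, "xyz", new_tag)
print(ET.tostring(root, encoding="unicode"))
```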
Assumptions
This script will:
Only take into account SNPs and MNPs annotated in the VCF. Other variant types will be ignored.
Only take into consideration A, C, G, and T bases in your reference sequence. All other characters will be ignored.
The output will therefore be an approximation. However, it should be close enough to give a better inference from BEAST2 than using only variable sites and then correcting in some post hoc manner.
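To illustrate the first assumption, here is one possible way to keep only SNP and MNP records from a VCF. How b2constsites itself classifies variants is not stated here, so this sketch assumes a record counts as a SNP or MNP when REF and every ALT allele have the same length and contain only A, C, G and T. The function name `snp_mnp_positions` is hypothetical.

```python
def snp_mnp_positions(vcf_lines):
    """Yield the 1-based reference positions covered by SNP and MNP records."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
        # Assumption: SNPs and MNPs have ALT alleles as long as REF.
        same_length = all(len(alt) == len(ref) for alt in alts)
        # Reject symbolic or ambiguous alleles such as '*', '<DEL>' or 'N'.
        plain_bases = all(set(a.upper()) <= set("ACGT") for a in [ref, *alts])
        if same_length and plain_bases:
            # An MNP covers len(ref) consecutive positions.
            yield from range(pos, pos + len(ref))


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr\t5\t.\tA\tG\t.\t.\t.",    # SNP
    "chr\t10\t.\tAC\tGT\t.\t.\t.",  # MNP covering positions 10 and 11
    "chr\t20\t.\tA\tAT\t.\t.\t.",   # insertion, ignored
    "chr\t30\t.\tATG\tA\t.\t.\t.",  # deletion, ignored
]
positions = list(snp_mnp_positions(vcf_lines))
print(positions)
```

Only the SNP and the two positions of the MNP are reported; the insertion and deletion are skipped, which is one reason the resulting counts are an approximation.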
Issues or Questions