
Integrated pipeline for ML phylogenetic inference from ABI trace and FASTA data


AB12PHYLO


AB12PHYLO is an integrated, easy-to-use pipeline for Maximum Likelihood (ML) phylogenetic tree inference from ABI traces and FASTA data. At its core, AB12PHYLO runs parallelized instances of RAxML-NG (Kozlov et al. 2019) or IQ-Tree (Nguyen et al. 2015) as well as a BLAST search in a reference database. It enables visual, effortless sample identification based on phylogenetic position and sequence similarity, as well as population subset selection aided by metrics like Tajima's D for estimations of ongoing evolution, or definition of haplotypes.

[Figure: demo screen capture of AB12PHYLO]

Documentation

AB12PHYLO comes in two versions, both started from a terminal. ab12phylo is a graphical user interface intended to be more user-friendly and intuitive, and in some details more powerful, than ab12phylo-cmd. ab12phylo-cmd, on the other hand, is a commandline-only tool for maximum reproducibility and automation of a linear pipeline.
While ab12phylo comes with its own on-screen help, and a very brief example for ab12phylo-cmd is provided below, detailed installation and usage instructions can be found in the GitHub wiki. For the commandline ab12phylo-cmd in particular, also check the in-line help via ab12phylo-cmd -h.

For more individual support or feature requests, please write an email to ab12phylo@gmail.com.

Installation

AB12PHYLO can be installed using conda or pip:

conda install -c lkndl -c conda-forge -c bioconda ab12phylo

or

pip install ab12phylo

Note for Windows users: you must use Anaconda, and run ab12phylo-init before starting the graphical ab12phylo!

When AB12PHYLO is first run, it checks the system for three important non-Python tools: RAxML-NG, IQ-Tree 2 and BLAST+. If any of them is missing or outdated, AB12PHYLO can download the latest static binaries from GitHub or the NCBI, respectively. Check the wiki for more details on troubleshooting, installing from source, or updating the package.
As implied above, start the graphical version via ab12phylo from the terminal, and invoke the commandline version via ab12phylo-cmd.

Quick start and functionality

ABI trace files are the main input for AB12PHYLO. Additionally, wellsplate tables can be used to translate back to original sample IDs, provided the mapping is identical for all sequenced genes. Reference data may be included in FASTA format, and the graphical AB12PHYLO accepts FASTA sequences as the main input format as well.

[Figure: main stages of AB12PHYLO, panels A to C]

A: Sequence data is extracted from ABI trace files using a customisable quality control: Sequence ends are trimmed with a sliding window until a certain number (8 out of 10 by default) of bases reach the minimal accepted phred quality score (between 0 and 60, 30 by default). Bases with low phred quality are replaced by N only if they form a consecutive stretch that is longer than a certain threshold (5 by default).
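The quality control in stage A can be illustrated with a minimal Python sketch. It is not AB12PHYLO's actual code: the function names trim_ends and mask_low_quality and their parameters are hypothetical, and the exact window semantics are an assumption based only on the description above (defaults: 8 of 10 bases, phred 30, stretches longer than 5).

```python
def trim_ends(seq, quals, min_phred=30, window=10, required=8):
    """Trim both ends until a window holds `required` good bases.

    Hypothetical helper, not AB12PHYLO's API. Returns the kept
    (sequence, qualities) pair, or ("", []) if no window qualifies.
    """
    good = [q >= min_phred for q in quals]
    n = len(seq)
    # first window from the left with enough good bases marks the start
    start = next((i for i in range(n - window + 1)
                  if sum(good[i:i + window]) >= required), None)
    if start is None:
        return "", []
    # first qualifying window from the right marks the (exclusive) end
    end = next(j for j in range(n, window - 1, -1)
               if sum(good[j - window:j]) >= required)
    return seq[start:end], list(quals[start:end])


def mask_low_quality(seq, quals, min_phred=30, max_bad_run=5):
    """Replace low-quality bases by N, but only in runs longer than max_bad_run."""
    out = list(seq)
    i = 0
    while i < len(seq):
        if quals[i] >= min_phred:
            i += 1
            continue
        j = i
        while j < len(seq) and quals[j] < min_phred:
            j += 1  # extend the run of consecutive low-quality bases
        if j - i > max_bad_run:
            out[i:j] = "N" * (j - i)
        i = j
    return "".join(out)


# made-up read: poor ends and one poor stretch of six bases in the middle
seq = "ACGT" * 9
quals = [10] * 3 + [40] * 12 + [10] * 6 + [40] * 12 + [10] * 3
trimmed, trimmed_quals = trim_ends(seq, quals)
masked = mask_low_quality(trimmed, trimmed_quals)
print(masked)
```

Note that a sliding window can leave a few low-quality bases at the very ends of the kept region. Because they form short runs, the masking step leaves them untouched.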

B: Samples missing for a single locus are discarded for all genes. Trimmed traces as well as reference and FASTA sequences are aligned into single-gene Multiple Sequence Alignments (MSAs), each of which is then trimmed to a user-defined level of conserved positions using Gblocks 0.91b. For multi-gene analyses, the single-gene MSAs are then concatenated into a multi-gene MSA, which is used for ML tree inference. Trees are reconstructed using either RAxML-NG or IQ-Tree 2; only the latter is available on Windows.
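The sample filtering and concatenation in stage B can be sketched as follows. This is an illustration only, assuming each single-gene MSA is already aligned and trimmed and is held as a plain dictionary; the function concatenate_msas and the gene and sample names are hypothetical, not part of AB12PHYLO.

```python
def concatenate_msas(msas, gene_order):
    """Build a multi-gene MSA from single-gene MSAs.

    msas: {gene: {sample_id: aligned_sequence}}
    Samples that are missing for any gene are dropped for all genes.
    """
    # keep only the samples that are present in every single-gene MSA
    shared = set.intersection(*(set(msas[gene]) for gene in gene_order))
    return {sample: "".join(msas[gene][sample] for gene in gene_order)
            for sample in sorted(shared)}


# made-up example: sample "c" lacks the second gene and is discarded
msas = {
    "ITS": {"a": "AC-T", "b": "ACGT", "c": "ACGA"},
    "EF1": {"a": "GG", "b": "GC"},
}
supermatrix = concatenate_msas(msas, ["ITS", "EF1"])
print(supermatrix)
```

Fixing the gene order explicitly keeps the column layout of the concatenated alignment reproducible between runs.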

C: AB12PHYLO allows editing of the resulting tree and selection of taxa by label matching, shared ancestry or manual picking. For these selected sub-populations, basic population genetics neutrality and diversity metrics are calculated from the conserved MSA positions only, with adjustable tolerance of gaps and unknown characters. The graphical ab12phylo is both less cumbersome and more capable for these applications; the wiki pages (ab12phylo, ab12phylo-cmd) have more details.
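As an illustration of the neutrality and diversity metrics mentioned in stage C, the sketch below computes the number of segregating sites, nucleotide diversity and Tajima's D for a selected set of aligned sequences, using the standard formulas from Tajima (1989). It is not AB12PHYLO's implementation: the function name tajimas_d is hypothetical, and instead of an adjustable tolerance it simply skips every column that contains a gap or unknown character.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Return (S, pi, D) for equally long, aligned sequences.

    Simplifying assumption: columns with any character outside ACGT
    are ignored. D is None when it is undefined (no segregating sites).
    """
    seqs = [s.upper() for s in seqs]
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    seg_sites, pair_diffs = 0, 0
    for column in zip(*seqs):
        if not set(column) <= set("ACGT"):
            continue  # zero tolerance for gaps and unknown characters
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # number of sequence pairs that differ at this site
            pair_diffs += (n * n - sum(c * c for c in counts.values())) // 2
    pi = pair_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    if seg_sites == 0:
        return seg_sites, pi, None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / sqrt(variance)
    return seg_sites, pi, d


# made-up alignment; the last column contains an N and is skipped
selection = ["ACGTAC", "ACGTAN", "ACGAAC", "TCGAAG"]
S, pi, D = tajimas_d(selection)
print(S, pi, D)
```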

A BLAST search for species annotation can be run on a local database, or via the public NCBI BLAST API. However, importing XML results of a web BLAST should be preferred to running remote API calls as a main strategy.

A simple ab12phylo-cmd example

A simple real-world invocation of commandline AB12PHYLO might look like this:

ab12phylo-cmd -abi <seq_dir> \
    -csv <wellsplates_dir> \
    -g <barcode_gene> \
    -rf <ref.fasta> \
    -bst 1000 \
    -dir <results>

where:

  • <seq_dir> contains all input ABI trace files, ending in .ab1
  • <wellsplates_dir> contains the .csv mappings of user-defined IDs to the sequencer's isolate coordinates
  • <barcode_gene> is the gene that was sequenced, see here for more info
  • <ref.fasta> contains full GenBank reference records like this
  • 1000 is the number of bootstrap trees that will be generated (-bst = --bootstrap)
  • <results> is the directory where results will be written
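For automated or repeated runs, the invocation above could be assembled from Python. The following is only a sketch: the helper build_command and all paths and the gene name are hypothetical placeholders, and only the flags shown in the example above are used.

```python
import shlex


def build_command(seq_dir, wellsplates_dir, gene, ref_fasta, results_dir,
                  bootstrap=1000):
    """Assemble the ab12phylo-cmd argument list from the example above."""
    return [
        "ab12phylo-cmd",
        "-abi", str(seq_dir),
        "-csv", str(wellsplates_dir),
        "-g", gene,
        "-rf", str(ref_fasta),
        "-bst", str(bootstrap),
        "-dir", str(results_dir),
    ]


# placeholder values, replace with your own directories and gene name
cmd = build_command("traces", "wellsplates", "ITS1F", "ref.fasta", "results")
print(shlex.join(cmd))
# To actually start the pipeline (requires an AB12PHYLO installation):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.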

Dependencies

Biopython, NumPy, pandas, Toytree <= 1.2.0, Toyplot, matplotlib, PyYAML, lxml, xmltramp2, svgutils, Pillow, Requests, Beautiful Soup and Jinja2

Non-python dependencies

The pipeline will use existing installations of the programs listed below if they are found on the system $PATH and not considered outdated. Otherwise, both ab12phylo and ab12phylo-cmd can download the latest static binaries from GitHub or the NCBI on their initial runs or if run with --initialize.

References

  • Alexey M. Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, and Alexandros Stamatakis (2019) RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, btz305 doi:10.1093/bioinformatics/btz305

  • Lam-Tung Nguyen, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh (2015) IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32, 268–274. doi:10.1093/molbev/msu300
