
Virus reassortment inference software. Infers both recent and ancestral reassortment and uses flexible molecular clock constraints.


TreeSort


TreeSort infers both recent and ancestral reassortment events along the branches of a phylogenetic tree of a fixed genomic segment. It uses a statistical hypothesis testing framework to identify branches where reassortment with other segments has occurred and reports these events.

Below is an example of 2 reassortment events inferred by TreeSort on a swine H1 dataset. The reference phylogeny is the hemagglutinin (HA) segment tree, and the branch annotations indicate reassortment relative to the HA's evolutionary history. The annotations list the acquired gene segments and how distant these segments were (# of nucleotide differences) from the original segments. For example, PB2(136) indicates that a new PB2 was acquired that was approximately 136 nucleotides different from the pre-reassortment PB2.

Citation

If you use TreeSort, please cite it as
Markin, A., Macken, C.A., Baker, A.L., and Anderson, T.K. Revealing reassortment in influenza A viruses with TreeSort. bioRxiv 2024.11.15.623781; doi: https://doi.org/10.1101/2024.11.15.623781.

N.B. TreeSort uses TreeTime in a subroutine to infer substitution rates for segments - please also cite Sagulenko et al. 2018 doi: 10.1093/ve/vex042.

Installation

For a default installation, run pip install treesort. Alternatively, you can download this repository and run pip install . from within the downloaded directory. TreeSort requires Python 3 to run and depends on SciPy, BioPython, DendroPy, and TreeTime (these dependencies will be installed automatically).

For a broader installation of the bioinformatics suite required to align sequences and build phylogenetic trees via the prepare_dataset.sh script that we provide, we recommend using a conda environment that can be set up as follows.

If you haven't already, configure bioconda.

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Then create a new environment with required dependencies and install TreeSort inside that environment.

git clone https://github.com/flu-crew/TreeSort.git
cd TreeSort
conda create -n treesort-env --file conda-requirements.txt
conda activate treesort-env
pip install .
# Run TreeSort on your data here
conda deactivate

Tutorial

We use a swine H1 influenza A virus dataset for this tutorial. We include only HA and NA gene segments in this analysis for simplicity, but it can be expanded to all 8 segments. Please note that all sequences should have the dates of collection included in the deflines, and all metadata fields should be separated by "|". E.g., "A/swine/Iowa/A02934932/2017|1A.3.3.2|2017-05-12".
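Before building trees, it can help to check that every defline follows this layout. The snippet below is a minimal sketch and not part of TreeSort: the helper name parse_defline is hypothetical, and it assumes that the collection date is the last "|"-separated field and is written as YYYY-MM-DD, as in the example defline above. Other layouts and date formats would need a different parser.

```python
from datetime import date


def parse_defline(defline: str) -> dict:
    """Split a '|'-separated defline and parse the trailing collection date.

    Hypothetical helper for illustration only; not part of TreeSort.
    """
    fields = defline.lstrip(">").strip().split("|")
    # Assumption: the last field is the collection date in YYYY-MM-DD form.
    # date.fromisoformat raises ValueError if the field is not a valid date.
    collected = date.fromisoformat(fields[-1])
    return {"strain": fields[0], "metadata": fields[1:-1], "date": collected}


record = parse_defline("A/swine/Iowa/A02934932/2017|1A.3.3.2|2017-05-12")
print(record["strain"], record["metadata"], record["date"].isoformat())
```

A defline that raises ValueError here is worth inspecting by hand before it reaches the molecular clock step.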

To start, we will install TreeSort using the conda method above:

git clone https://github.com/flu-crew/TreeSort.git  # Download this repo
cd TreeSort
conda create -n treesort-env --file conda-requirements.txt  # Create a new conda env and install dependencies
conda activate treesort-env
pip install .  # Install TreeSort

Creating a descriptor file

The input to TreeSort is a descriptor file: a comma-separated (CSV) file that describes where the alignments and trees for individual segments can be found. For our case, the descriptor file could look as follows (the column headings should not be included):

segment name    path to the fasta alignment    path to the newick-formatted tree
*HA             HA-swine_H1_HANA.fasta.aln     HA-swine_H1_HANA.fasta.aln.rooted.tre
NA              NA-swine_H1_HANA.fasta.aln     NA-swine_H1_HANA.fasta.aln.rooted.tre

The star symbol (*) indicates the segment that will be used as the reference phylogeny and reassortment events will be inferred relative to this phylogeny (HA in this case). Note that the reference phylogeny should be rooted, whereas trees for other segments can be unrooted.
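If you already have alignments and trees, a descriptor file like the one described above can be written by hand or with a few lines of Python. The following is a minimal sketch under stated assumptions: the helper names descriptor_rows and write_descriptor are hypothetical, and the file naming pattern simply mirrors the tutorial example rather than anything TreeSort requires.

```python
import csv
import io


def descriptor_rows(segments, reference, stem):
    """Build one row per segment: name, alignment path, tree path.

    Hypothetical helper; the path pattern mirrors the tutorial file names.
    """
    rows = []
    for seg in segments:
        # The reference segment is marked with a leading star.
        name = f"*{seg}" if seg == reference else seg
        alignment = f"{seg}-{stem}.fasta.aln"
        rows.append([name, alignment, f"{alignment}.rooted.tre"])
    return rows


def write_descriptor(rows, handle):
    """Write the rows as comma-separated lines without a header row."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


buffer = io.StringIO()
write_descriptor(descriptor_rows(["HA", "NA"], "HA", "swine_H1_HANA"), buffer)
descriptor_text = buffer.getvalue()
print(descriptor_text, end="")
```

To write to disk instead, pass an open file handle, for example open("descriptor.csv", "w", newline=""), in place of the in-memory buffer.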

We will use the prepare_dataset.sh bash script to automatically build alignments and trees for the two segments in our swine dataset and to compile a descriptor file. The script relies on every sequence having a segment name in the middle of the defline (e.g., |HA| or |4|).
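To see which segment a defline would be assigned to, the idea can be sketched in Python. This is an illustration only and does not reproduce how prepare_dataset.sh is implemented: the function segment_of and the example deflines are hypothetical, and the number-to-name table is the conventional influenza A segment numbering.

```python
# Conventional influenza A segment numbering (1 = PB2 ... 8 = NS).
SEGMENT_NUMBERS = {
    "1": "PB2", "2": "PB1", "3": "PA", "4": "HA",
    "5": "NP", "6": "NA", "7": "MP", "8": "NS",
}
SEGMENT_NAMES = set(SEGMENT_NUMBERS.values())


def segment_of(defline):
    """Return the segment named in a middle '|'-separated field, or None.

    Hypothetical helper; only fields strictly between the first and the
    last one are inspected, matching the '|HA|' or '|4|' convention.
    """
    fields = defline.lstrip(">").strip().split("|")
    for field in fields[1:-1]:
        if field in SEGMENT_NAMES:
            return field
        if field in SEGMENT_NUMBERS:
            return SEGMENT_NUMBERS[field]
    return None


# Made-up deflines used purely for illustration.
print(segment_of(">A/swine/Iowa/X/2017|1A.3.3.2|HA|2017-05-12"))
print(segment_of(">A/swine/Iowa/X/2017|6|2017-05-12"))
```

Deflines for which this returns None carry no recognizable segment field and would likely need to be fixed before running the script.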

./prepare_dataset.sh --fast --segments "HA,NA" tutorial/swH1-dataset/swine_H1_HANA.fasta HA tutorial/swH1-parsed

To make things faster, we use the --fast flag here so that all trees are built using FastTree. However, we do not recommend using this flag for high-precision analyses. When this flag is not used, the script builds the reference phylogeny using IQ-Tree, which is slower but will likely result in a better-quality tree and therefore more accurate reassortment inference.

The required arguments to the script are the path to the main fasta file, the name of the reference segment, and the path to the output directory. If --segments is not specified, the script assumes that the 8 IAV segment names should be used (PB2, PB1, PA, HA, NP, NA, MP, NS).

Running the above command will save the descriptor file, all trees, and alignments to the tutorial/swH1-parsed directory. Note that if you already have trees built for your data, you can create the descriptor file manually without using the script.

Running TreeSort

First make sure to familiarize yourself with the options available in the tool by looking through the help message.

treesort -h

With the descriptor file from above, TreeSort can be run as follows:

cd tutorial/swH1-parsed/
treesort -i descriptor.csv -o swH1-HA.annotated.tre

To run the newest mincut algorithm for reassortment inference, please use:

treesort -i descriptor.csv -o swH1-HA.annotated.tre -m mincut

TreeSort will first estimate molecular clock rates for each segment and then will infer reassortment and annotate the backbone tree. The output tree in nexus format (swH1-HA.annotated.tre) can be visualized in FigTree or icytree.org. You can view the inferred reassortment events by displaying the 'rea' annotations on tree edges, as shown in the Figure above.

In this example TreeSort identifies a total of 93 HA-NA reassortment events:

Inferred reassortment events with NA: 93.
Identified exact branches for 79/93 of them

Additionally, the method outputs the estimated reassortment rate per ancestral lineage per year. The rate translates to the probability that a single strain undergoes a reassortment event over the course of a year. In our case, this probability of reassortment with NA is approximately 4%.
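One way to relate a per-lineage rate to a yearly probability is to treat reassortment as a Poisson process, in which case the probability of at least one event in a year is 1 - exp(-rate). This is an assumption made for illustration, not a statement about how TreeSort computes or reports the value, and the rate of 0.04 below is an illustrative number chosen to match the approximate 4% figure above.

```python
import math


def annual_reassortment_probability(rate_per_year: float) -> float:
    """Probability of at least one event in one year under a Poisson model.

    Illustrative assumption only: events occur independently at a constant
    rate along a lineage.
    """
    return 1.0 - math.exp(-rate_per_year)


illustrative_rate = 0.04  # events per lineage per year (illustrative value)
probability = annual_reassortment_probability(illustrative_rate)
print(f"{probability:.4f}")
```

For small rates the probability and the rate are nearly equal, which is why a rate near 0.04 corresponds to a probability of roughly 4%.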

Below is a part of the TreeSort output, where we see two consecutive NA reassortment events. The NA clade classifications were added to the strain names so that it's easier to interpret these reassortment events. Here we had a 2002 NA -> 1998A NA switch, followed by a 1998A -> 2002B NA switch.

Uncertain reassortment placement (the '?' tag)

Note that this section only applies to the -m local inference method (the default method for TreeSort). The -m mincut method always infers certain reassortment placements.

Sometimes TreeSort does not have enough information to confidently place a reassortment event on a specific branch of the tree. TreeSort always narrows down the reassortment event to a particular ancestral node on a tree, but may not distinguish which of the child branches was affected by reassortment. In those cases, TreeSort will annotate both child branches with a ?<segment-name> tag. For example, ?PB2(26) below indicates that the reassortment with PB2 might have happened on either of the child branches.
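When post-processing annotated trees, tags such as PB2(136) and ?PB2(26) can be turned into structured records. The sketch below is a hypothetical helper, not part of TreeSort. It only assumes that each event is written as an optional '?', a segment name, and a distance in parentheses; how several events are separated within one annotation is not specified here, so the pattern simply picks out every match.

```python
import re
from typing import List, NamedTuple


class ReaEvent(NamedTuple):
    segment: str     # acquired segment, e.g. "PB2"
    distance: int    # approximate number of nucleotide differences
    uncertain: bool  # True when the tag carries the '?' prefix


# Optional '?', then an alphanumeric segment name, then "(digits)".
_EVENT = re.compile(r"(\?)?([A-Za-z0-9]+)\((\d+)\)")


def parse_rea(annotation: str) -> List[ReaEvent]:
    """Extract every segment(distance) tag found in an annotation string."""
    return [
        ReaEvent(segment, int(distance), bool(mark))
        for mark, segment, distance in _EVENT.findall(annotation)
    ]


print(parse_rea("PB2(136)"))
print(parse_rea("?PB2(26)"))
```

Filtering on the uncertain field then separates events placed on an exact branch from those narrowed down only to an ancestral node.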

Typically, this happens when the sampling density is low. Therefore, increasing the sampling density by including more strains in the analysis may resolve such instances.
