TreeSAPP is a functional and taxonomic annotation tool
TreeSAPP: Tree-based Sensitive and Accurate Phylogenetic Profiler
Connor Morgan-Lang, Ryan McLaughlin, Grace Zhang, Kevin Chan, Zachary Armstrong, and Steven J. Hallam
Overview:
TreeSAPP is a Python package for phylogenetically annotating genomes and metagenomes.
[Figure: diagram of the TreeSAPP workflow]
Download and installation:
TreeSAPP can be installed using conda:
conda install -c conda-forge -c bioconda treesapp
If you're working in an HPC environment and don't have conda installed, we also have a singularity container available:
singularity pull library://cmorganl/treesapp
singularity exec treesapp.sif
Finally, if you want to install the latest version of TreeSAPP locally, you can use git clone to pull it down.
We recommend installing TreeSAPP and all of its dependencies inside a virtual environment created with the Python package virtualenv.
cd ~/bin
virtualenv ~/bin/treesapp_venv
source ~/bin/treesapp_venv/bin/activate
pip install treesapp
make rpkm
However, the pipeline will not run without several dependencies.
Downloading dependencies:
If you do not already have the dependencies for TreeSAPP installed on your computer, we've listed how to easily download and install each one below. Good luck!
RAxML
A simple git clone of their GitHub repository should work for Linux and Mac operating systems.
From here, consult the README file in the standard-RAxML directory for installation instructions using make.
We highly recommend using only release 8.2.12, as older versions were found to estimate the pendant distances of placements less accurately.
Note that the executable MUST be named raxmlHPC or it will not be found by TreeSAPP!
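Depending on which Makefile is used, the RAxML build may produce a binary with a suffixed name (for example raxmlHPC-PTHREADS-SSE3) rather than a plain raxmlHPC. The Python sketch below is not part of TreeSAPP; it shows one possible way to expose such a binary under the expected name with a symbolic link. The helper name link_as_raxmlhpc and the paths in the usage note are assumptions for illustration.

```python
import os
from pathlib import Path


def link_as_raxmlhpc(built_binary, bin_dir):
    """Create bin_dir/raxmlHPC as a symlink to an existing RAxML binary.

    Hypothetical helper for illustration; TreeSAPP does not ship this function.
    """
    source = Path(built_binary).resolve()
    if not source.is_file():
        raise FileNotFoundError(f"No RAxML binary at {source}")
    target = Path(bin_dir) / "raxmlHPC"
    if target.exists() or target.is_symlink():
        # Leave any existing raxmlHPC alone rather than overwrite it.
        return target
    os.symlink(source, target)
    return target
```

For instance, link_as_raxmlhpc("standard-RAxML/raxmlHPC-PTHREADS-SSE3", os.path.expanduser("~/bin")) would create ~/bin/raxmlHPC, assuming that binary was built and that ~/bin exists and is on your PATH.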
HMMER
TreeSAPP uses HMMER for identifying marker gene sequences in proteins and genomes. The latest version (v3.3) is available at http://hmmer.org/. Download it from there and follow their installation guide under DOCUMENTATION.
Prodigal
Prodigal (version 2.6.3) can be downloaded from the GitHub page. Follow the installation guide on their GitHub wiki to install. There is an upcoming version 3 so these links may become outdated soon!
MAFFT
MAFFT multiple alignment software is only required for creating and updating reference packages; it is not a part of the main workflow. Feel free to skip installing MAFFT unless you plan on doing either of those tasks. Otherwise, download and installation instructions are available from the MAFFT webpage.
USEARCH
The current version of TreeSAPP uses USEARCH for multiple clustering stages, such as when building reference packages. Fortunately, none of these stages should require huge amounts of RAM, so we can use the free, 32-bit version available from Robert Edgar's drive5 website.
OD-Seq
OD-Seq is used for detecting mis-annotated sequences, or "outliers", in multiple sequence alignments when building new reference packages.
It can be installed into TreeSAPP as a part of make all or in isolation with make odseq.
Source files can also be downloaded from University College Dublin's website.
Finishing up
I hope that wasn't too painful. If you think you have installed everything, try running treesapp info.
It will check for the required executables up front, and you will be quickly notified if any are missing, or at least if TreeSAPP is unable to find them.
If you do not have sudo permissions to move these executables to a globally available directory (e.g. /usr/local/bin/), you can copy them to treesapp_venv/lib/python*/site-packages/treesapp-*.egg/treesapp/sub_binaries/ and TreeSAPP will be able to find and use them.
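Before running TreeSAPP, you can also do a quick independent check that the dependency executables are discoverable. The sketch below uses Python's standard shutil.which; it is not how TreeSAPP itself performs the check. Apart from raxmlHPC, the executable names in REQUIRED are assumptions and may differ on your system.

```python
import shutil

# Executable names are assumptions for illustration, apart from raxmlHPC,
# which the RAxML section requires. Adjust to match your installation.
REQUIRED = ["raxmlHPC", "hmmsearch", "prodigal", "mafft", "usearch"]


def missing_executables(names, search_path=None):
    """Return the names that cannot be found as executables on the search path.

    search_path is a PATH-style string; None means the current PATH.
    """
    return [name for name in names if shutil.which(name, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found.")
```

Passing a directory as search_path (for example, the sub_binaries directory mentioned above) limits the lookup to that location instead of the whole PATH.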
Running TreeSAPP
To list all the sub-commands, run treesapp.
To test the assign workflow, run:
treesapp assign -i ~/bin/TreeSAPP/test_data/marker_test_suite.faa -m prot --trim_align -o assign_test -t M0701,M0702,M0705
To assign sequences in your genome of interest:
treesapp assign -i Any.fasta -o ~/path/to/output/directory/
As in the previous assign command, though, we recommend using the --trim_align flag and increasing the number of threads to use with -n.
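If you script many assign runs, it can help to assemble the command in one place. The following is a minimal sketch, not part of TreeSAPP: the function build_assign_command and its defaults are hypothetical, and it uses only the flags that appear in the commands above (-i, -o, -m, --trim_align, -t and -n).

```python
import shlex


def build_assign_command(fasta, output_dir, threads=4, trim_align=True,
                         molecule=None, targets=None):
    """Assemble a 'treesapp assign' argument list from the flags shown above.

    Hypothetical helper for illustration; the default of 4 threads is arbitrary.
    """
    cmd = ["treesapp", "assign", "-i", str(fasta), "-o", str(output_dir)]
    if molecule:
        cmd += ["-m", molecule]
    if trim_align:
        cmd.append("--trim_align")
    if targets:
        # Reference package codes are joined with commas, as in the test command.
        cmd += ["-t", ",".join(targets)]
    cmd += ["-n", str(threads)]
    return cmd


if __name__ == "__main__":
    cmd = build_assign_command("Any.fasta", "assign_out", threads=8)
    # Print the command for review; to execute it, pass the list to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when input paths contain spaces.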
Tutorials
If we do not yet have a reference package for a gene you are interested in, please try building a new reference package. Of course, if you run into any problems or would like to collaborate on building many reference packages, don't hesitate to email us or create a new issue with an 'enhancement' label.
To determine whether the sequences used to build your new reference package are what you think they are, and whether it might unexpectedly annotate homologous sequences, see the purity tutorial.
If you are working with a particularly complex reference package, from an orthologous group for example, or have extra phylogenetic information you'd like to include in your classifications, try annotating extra features with treesapp layer.
TreeSAPP could previously be set up by using Terraform to provision a Google Cloud Platform instance with TreeSAPP and all its dependencies. This route is currently outdated, and the scripts supporting it are under repair.