Computational pipeline for fast and easy construction of phylogenetic trees.
Simple Genome Tree (SGTree)
Simple Genome Tree (SGTree) is a computational pipeline for fast and easy construction of phylogenetic trees from a set of user-provided genomes and a set of phylogenetic markers, placed in a taxonomic framework of de-replicated reference genomes. SGTree identifies conserved phylogenetic marker proteins, evaluates additional marker copies arising from duplication, horizontal gene transfer, or contamination, and builds a phylogenetic tree from the concatenated alignment of the selected marker proteins.
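A tree built from a concatenated alignment relies on a "supermatrix": the per-marker alignments are joined end to end so that each genome becomes one long row. The sketch below illustrates that general technique only; it is not SGTree's own code. The input layout (a dictionary of marker → genome → aligned sequence), the sorted marker order, and padding missing markers with gap characters are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical input layout: marker name -> {genome id -> aligned protein sequence}.
MarkerAlignments = Dict[str, Dict[str, str]]


def concatenate_alignments(alignments: MarkerAlignments, genomes: List[str]) -> Dict[str, str]:
    """Join per-marker alignments into one supermatrix row per genome.

    Markers are concatenated in sorted order so columns line up across genomes.
    A genome with no sequence for a marker gets gaps ('-') of that marker's
    alignment width, so all rows end up the same length.
    """
    rows: Dict[str, List[str]] = {genome: [] for genome in genomes}
    for marker in sorted(alignments):
        aln = alignments[marker]
        if not aln:
            continue  # nothing aligned for this marker; skip it
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} has sequences of unequal length")
        width = lengths.pop()
        for genome in genomes:
            rows[genome].append(aln.get(genome, "-" * width))
    return {genome: "".join(parts) for genome, parts in rows.items()}


if __name__ == "__main__":
    example = {
        "markerA": {"g1": "MK-L", "g2": "MKAL"},
        "markerB": {"g1": "QRS"},
    }
    for genome, seq in concatenate_alignments(example, ["g1", "g2"]).items():
        print(genome, seq)  # g1 MK-LQRS, then g2 MKAL---
```

Gap padding keeps genomes that lack a marker in the matrix instead of dropping them; how SGTree itself treats missing or extra marker copies is described only at a high level above.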
Activate conda environment and run sgtree
First, clone the git repository.
Make sure you have Anaconda installed; it can be downloaded from https://www.anaconda.com/distribution/#download-section
Depending on your OS, choose osxenv.txt for macOS or ubuntu-env.txt for Ubuntu and other Linux distributions.
Next, run the following, where spec-file.txt is the environment file you chose:

```bash
conda create --name sgtree_env --file spec-file.txt
source activate sgtree_env  # or "conda activate sgtree_env", depending on your conda version
```
- Run sgtree on the provided test set of query genomes and models. Users can control the number of CPUs used, the minimum percentage of models a genome must hit to be included in the dataset, and a directory of reference genomes. In the resulting tree, genomes from the query genomes directory are colored red and genomes from the reference genomes directory grey.
```bash
# test example
python3 sgtree.py testgenomes/Chloroflexi hmms/UNI56 --num_cpus 8

# general example
python3 sgtree.py <genomes directory> <models directory> --num_cpus <integer> --percent_models <integer> --ref <reference genomes directory>
```
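The --percent_models option sets the minimum percentage of models a genome must hit to be kept in the dataset. A minimal sketch of that filtering rule follows, assuming the model hits have already been collected per genome. The filter_genomes function and the hits dictionary are hypothetical, and treating the threshold as inclusive (>=) is an assumption about SGTree's behavior.

```python
from typing import Dict, List, Set


def filter_genomes(hits: Dict[str, Set[str]], models: List[str], percent_models: int) -> List[str]:
    """Return genomes whose hits cover at least `percent_models` percent of the models.

    `hits` maps a genome identifier to the set of model names it had a hit for.
    Hits to names outside `models` are ignored. The threshold is inclusive
    (an assumption, not confirmed by SGTree's documentation).
    """
    model_set = set(models)
    if not model_set:
        raise ValueError("models must not be empty")
    kept = []
    for genome, genome_hits in sorted(hits.items()):
        coverage = 100 * len(genome_hits & model_set) / len(model_set)
        if coverage >= percent_models:
            kept.append(genome)
    return kept


if __name__ == "__main__":
    models = ["m1", "m2", "m3", "m4"]
    hits = {"gA": {"m1", "m2", "m3"}, "gB": {"m1"}}
    print(filter_genomes(hits, models, percent_models=50))  # ['gA']
```

Here gA hits 3 of 4 models (75%) and is kept, while gB hits 1 of 4 (25%) and is dropped.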
Before running sgtree_final.py, create a ref_concat folder wherever you want to store reference database runs.
```bash
# test example
python3 sgtree_final.py testgenomes/Chloroflexi hmms/UNI56 sgtree/references_concat --num_cpus 10 --save_dir sgtree/test --ref sgtree/testgenomes/chlorref

# general example
python3 sgtree_final.py <genomes directory> <models directory> <references directory> --num_cpus <integer> --percent_models <integer> --ref <reference genomes directory>
```
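If you launch sgtree_final.py from a Python script, a small wrapper can create the references folder first and assemble only the arguments shown in the usage lines above. This is a hypothetical helper (build_sgtree_final_cmd is not part of SGTree); it assumes the flags may be given in any order, as is usual for argparse-style command-line tools.

```python
from pathlib import Path
from typing import List, Optional


def build_sgtree_final_cmd(genomes_dir: str, models_dir: str, references_dir: str,
                           num_cpus: int, percent_models: Optional[int] = None,
                           ref_dir: Optional[str] = None,
                           save_dir: Optional[str] = None) -> List[str]:
    """Create the references folder if missing and return the sgtree_final.py argv.

    Only options that appear in the documented usage lines are emitted;
    optional ones are added only when a value is supplied.
    """
    Path(references_dir).mkdir(parents=True, exist_ok=True)
    cmd = ["python3", "sgtree_final.py", genomes_dir, models_dir, references_dir,
           "--num_cpus", str(num_cpus)]
    if percent_models is not None:
        cmd += ["--percent_models", str(percent_models)]
    if save_dir is not None:
        cmd += ["--save_dir", save_dir]
    if ref_dir is not None:
        cmd += ["--ref", ref_dir]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), or printed with shlex.join(cmd) to inspect the exact command line first.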
## Important note

1. Genome files must use the following FASTA header format:

```
>IMG2684622718|2685462912
MLCAFAEEEAKIAETVGKVATELKVKKLLSDFATKEGEEHISTYNKIAMTAKAEGYADIEAMLCAFAEEEAKLQKL
```

where the first field, before the pipe, is the genome identifier and must match the base of the filename, and the second field is a unique protein identifier.
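Because a mismatched header is easy to miss, it can help to check input files before a run. The sketch below is a hypothetical checker, not part of SGTree. It assumes the "filename base" is the name without its last extension (so IMG2684622718.faa gives IMG2684622718) and that protein identifiers need to be unique within each file.

```python
from pathlib import Path
from typing import List, Set, Tuple


def check_headers(fasta_path: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for headers that break the expected format.

    Expected header: >GENOMEID|PROTEINID, where GENOMEID equals the file name
    without its last extension and PROTEINID is not repeated within the file.
    """
    path = Path(fasta_path)
    genome_id = path.stem  # "IMG2684622718.faa" -> "IMG2684622718"
    problems: List[Tuple[int, str]] = []
    seen: Set[str] = set()
    with path.open() as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.startswith(">"):
                continue  # sequence line, not a header
            fields = line[1:].strip().split("|")
            if len(fields) != 2 or not all(fields):
                problems.append((lineno, "expected exactly two non-empty fields: genome|protein"))
                continue
            genome, protein = fields
            if genome != genome_id:
                problems.append((lineno, f"genome field {genome!r} does not match file name base {genome_id!r}"))
            if protein in seen:
                problems.append((lineno, f"duplicate protein identifier {protein!r}"))
            seen.add(protein)
    return problems
```

An empty list means every header in the file follows the format; otherwise each entry points at the offending line.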
| Filename | Size | File type | Python version |
|---|---|---|---|
| sgtree-0.0.10-py3-none-any.whl | 17.6 kB | Wheel | py3 |
| sgtree-0.0.10.tar.gz | 17.1 kB | Source | None |