
Computational pipeline for fast and easy construction of phylogenetic trees.


Simple Genome Tree (SGTree)

v0.1 05/19/2019

Simple Genome Tree (SGTree) is a computational pipeline for fast and easy construction of phylogenetic trees from a set of user-provided genomes and a set of phylogenetic markers, within a taxonomic framework of de-replicated reference genomes. SGTree identifies conserved phylogenetic marker proteins and evaluates additional copies of markers that arise from duplication, horizontal gene transfer, or contamination. It then builds a phylogenetic tree from the concatenated alignment of the selected marker proteins.
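
To illustrate what a "concatenated alignment of selected marker proteins" looks like, here is a minimal sketch of the general technique. It is not SGTree's own code: the function name `concatenate_alignments`, the input layout (one dictionary of aligned sequences per marker), and the choice to pad missing markers with gap characters are all assumptions made for this example.

```python
def concatenate_alignments(alignments, genomes):
    """Join per-marker alignments into one sequence per genome.

    alignments: dict mapping marker name -> {genome id: aligned sequence}
                (hypothetical layout, assumed for this sketch)
    genomes:    list of genome ids to include in the output
    """
    parts = {genome: [] for genome in genomes}
    # Process markers in a fixed order so every genome gets the same column layout.
    for marker in sorted(alignments):
        aligned = alignments[marker]
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not a proper alignment")
        width = lengths.pop()
        for genome in genomes:
            # Assumption: a genome lacking this marker is padded with gaps.
            parts[genome].append(aligned.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


example = {
    "marker1": {"genomeA": "MK-L", "genomeB": "MKAL"},
    "marker2": {"genomeA": "GG"},
}
result = concatenate_alignments(example, ["genomeA", "genomeB"])
print(result)
```

Every output sequence has the same length, which is what a tree-building program expects from an alignment.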

Setup

Activate the conda environment and run sgtree

  1. First, clone the git repository.

  2. Make sure Anaconda is installed. You can download it from https://www.anaconda.com/distribution/#download-section

  3. Choose the environment file for your OS: osxenv.txt for macOS, or ubuntu-env.txt for Ubuntu or other Linux distributions.

  4. Next, run the following, replacing spec-file.txt with the environment file chosen in step 3:

	conda create --name sgtree_env --file spec-file.txt
	source activate sgtree_env  # or "conda activate sgtree_env", depending on your conda version

Run SGTree

  1. Run sgtree with the provided set of query genomes and models to test the installation. You can set the number of CPUs to use (--num_cpus), the minimum percentage of models that must have hits for a genome to be included in the dataset (--percent_models), and a directory of reference genomes (--ref). Genomes from the query genomes directory are colored red, and genomes from the reference genomes directory are colored grey.
	# test example
	python3 sgtree.py testgenomes/Chloroflexi hmms/UNI56 --num_cpus 8
	# general example
	python3 sgtree.py <genomes directory> <models directory> --num_cpus <integer> --percent_models <integer> --ref <reference genomes directory>
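
The --percent_models threshold described above can be pictured with a short sketch. This is only an illustration under stated assumptions, not the code sgtree.py runs: the function name `filter_genomes`, the input layout, and the inclusive comparison (`>=`) are all hypothetical.

```python
def filter_genomes(hits_per_genome, num_models, percent_models):
    """Return the genomes whose share of models with a hit reaches the threshold.

    hits_per_genome: dict mapping genome id -> collection of model names with
                     at least one hit (hypothetical layout for this sketch)
    num_models:      total number of models in the models directory
    percent_models:  minimum percentage (0-100) of models that must have hits
    """
    if num_models <= 0:
        raise ValueError("num_models must be positive")
    kept = []
    for genome, models_hit in hits_per_genome.items():
        # Count each model once, even if it hit several proteins.
        percent = 100 * len(set(models_hit)) / num_models
        # Assumption: a genome exactly at the threshold is kept.
        if percent >= percent_models:
            kept.append(genome)
    return sorted(kept)


hits = {
    "genomeA": ["m1", "m2", "m3", "m4"],
    "genomeB": ["m1", "m1"],
}
kept = filter_genomes(hits, num_models=4, percent_models=50)
print(kept)
```

With four models and a 50 percent threshold, a genome needs hits for at least two distinct models to stay in the dataset.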
To run sgtree_final.py, first create a ref_concat folder wherever you want to store reference database runs.
	# test example
	python3 sgtree_final.py testgenomes/Chloroflexi hmms/UNI56 sgtree/references_concat --num_cpus 10 --save_dir sgtree/test --ref sgtree/testgenomes/chlorref
	# general example
	python3 sgtree_final.py <genomes directory> <models directory> <references directory> --num_cpus <integer> --percent_models <integer> --ref <reference genomes directory>

## Important note
1. Sequence headers in each genome file must use the following format:

```
>IMG2684622718|2685462912
MLCAFAEEEAKIAETVGKVATELKVKKLLSDFATKEGEEHISTYNKIAMTAKAEGYADIEAMLCAFAEEEAKLQKL
```

The first field, before the pipe, is the genome identifier and must match the base name of the file. The second field is a unique protein identifier.
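
Headers can be checked against this convention before starting a run. The sketch below is not part of SGTree; the function name `check_headers` is hypothetical, and it assumes that "filename base" means the file name without its extension (for example IMG2684622718 for a file named IMG2684622718.faa).

```python
from pathlib import Path


def check_headers(fasta_path):
    """Return (line number, header, problem) tuples for headers that break the convention."""
    path = Path(fasta_path)
    genome_id = path.stem  # assumption: filename base = name without extension
    seen_proteins = set()
    problems = []
    with path.open() as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.startswith(">"):
                continue  # sequence line, nothing to check
            text = line[1:].strip()
            # Only the first whitespace-delimited token is treated as the identifier.
            token = text.split()[0] if text else ""
            fields = token.split("|")
            if len(fields) != 2 or not all(fields):
                problems.append((line_number, token, "expected <genome id>|<protein id>"))
                continue
            genome, protein = fields
            if genome != genome_id:
                problems.append((line_number, token, "genome identifier does not match file name"))
            if protein in seen_proteins:
                problems.append((line_number, token, "duplicate protein identifier"))
            seen_proteins.add(protein)
    return problems
```

An empty list means every header in the file follows the `<genome id>|<protein id>` pattern, for example `check_headers("testgenomes/Chloroflexi/IMG2684622718.faa")` (the file name here is a made-up example).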

