Tree2gd
Project description
Tree2gd
Tree2GD provides an integrated pipeline to identify WGD events, with friendful commands in one-step or multiple steps, with smart quality control in custom dataset, with multithreading design costing low time, with well performance in detect WGD signals, and with advanced visualization of GDs and Ks peaks.
Python Requirements
We currently recommend using Python 3.8 from http://www.python.org Tree2GD is currently supported and tested on the following Python implementations:
Python 3.6, 3.7, 3.8 – see http://www.python.org
Pip3.6>=v19.2.3 – see https://pypi.org/project/pip
Installing by sudo apt-get install python3-pip or with get-pip.py using
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py python3 get-pip.py
R Requirements
In order to complete the final drawing result display,We currently recommend using R>=v4.0.0 from https://www.r-project.org
Installation
Installation From Pypi
You can quickly install Tree2gd through the following command, and automatically install the python packages it depends on by run:
pip3 install Tree2gd [--user] #You may need the --user parameter if you do not have administrator rights
If you need to use a specific python3 path to install and use Tree2gd, you can replace the above pip3 with /THE/PATH/OF/YOUR/PYTHON3 -m pip
Installation From Source
You can download and decompress our source code, or fetch it using git. Now change directory to the Tree2gd source code folder and run:
python3 setup.py build python3 setup.py test python3 setup.py install [--user] #You may need the --user parameter if you do not have administrator rights
Testing
After the installation is successful, the main program command Tree2gd and test command Tree2gd_test will be added to your system. You can first check whether Tree2gd is installed successfully by running the following command:
$ Tree2gd -h
If the system feeds back its corresponding parameter description, congratulations on the correct installation of Tree2gd to your system! Next, we strongly recommend that you run Tree2gd_test to use the data we have prepared for a quick and complete Tree2gd test, because you can get the following benefits:
1.Check whether the pre-compiled version of the software we use by default is suitable for your system, and replace the unavailable ones with the configuration file (see the following instructions for configuration).
2.When using for the first time, in the final drawing part, we will spend a few minutes to install several dependent packages in R. After that, the formal use will be faster and more convenient.
3.After the user modifies the configuration file, you can add own new settings through the -config parameter of the command to test, and quickly detect that the new configuration can run successfully.
The Tree2gd_test command will run the complete analysis process with the fastest parameter settings. It only contains two optional parameters command:
$ Tree2gd_test [-t] [--config] -t [int] sets the number of threads for testing (default: 1) --config [str] uses the configuration given by the user File for testing (verify availability of custom configuration)
In the case of 4 cpus, it takes about 5 minutes to complete a round of testing (the first run will take some extra time to download and install the R package). After successful operation, it will generate a folder in the current directory``./Tree2gd_test_out``, You can check it (especially the final drawing result Tree2GD.result.pdf in step6) to fully verify the running effect of the software.
Running
You can complete all WGD analysis only with the simplest commands below and get a perfect drawing display:
$ Tree2gd -i input_dir -tree phytree.nwk
Among them, phytree.nwk is the species evolution tree in newick format.
The input_dir folder contains all the corresponding protein sequences (default postfix .pep) and cds sequences (default postfix .cds) of each species contained in phytree.nwk by fasta format.
In addition, you can add the following optional parameters to make the program run faster and better (especially when using multi-core operation):
- -t t
Thread num.default:1
- -o outputdir
The output dir.default:./output
- --step step_num_str
Which steps you need.default:123456(Choose from numbers: such as ‘234’)
- --log logfile
log file name,or log will print on stdout
- --config config_file
config.ini configuration file, leave it blank to run with default parameters and the program’s own software version.
- --debug
The log file will contain the output of each software itself, which is convenient for finding errors (-log is required)
- --only_script
Only generate scripts, not run automatically.
- --cds2tree
Use cds sequence to construct gene tree.
- --synteny
Using the results of the covariance analysis, the GD ratio and Ks distribution were optimized. Gene annotations information for each species need to be provided in the input folder as *.bed files
Detailed parameter configuration file : config.ini
There are many softwares in the Tree2gd process. The pre-compiled versions of the programs are already used by default. At the same time, these softwares have many parameters that can be adjusted to achieve the best results.
So we used the config.ini file to summarize these settings, input it to the program through the -config parameter, and call the configuration in the corresponding program.
! note! Any item in this file is optional, users only need to add the lines they need in the corresponding section
[software] #The path of all software used by Tree2gd.If one is not set or set to empty,the program will use its own pre-compiled software version (location at /THE/PATH/OF/python/site-packages/software/) diamond =/THE/PATH/OF/python/site-packages/software/diamond muscle=/THE/PATH/OF/python/site-packages/software/muscle iqtree=/THE/PATH/OF/python/site-packages/software/iqtree tree2gd=/THE/PATH/OF/python/site-packages/software/Tree2GD phymcl=/THE/PATH/OF/python/site-packages/software/PhyloMCL KaKs_Calculator=/THE/PATH/OF/python/site-packages/software/KaKs_Calculator calculate_4DTV=/THE/PATH/OF/python/site-packages/software/calculate_4DTV_correction.pl Epal2nal=/THE/PATH/OF/python/site-packages/software/Epal2nal.pl dolloparsimony=/THE/PATH/OF/python/site-packages/software/dolloparsimony [postfix] #The file name postfix of each species protein and cds, the prefix must be exactly the same as in the tree file pep=.pep cds=.cds [diamond] #The parameters used by diamond, in addition to the following default parameters, the user can add any parameter that diamond can recognize -e=1e-10 -p=4 #The number of threads used by each diamond, the number of parallel diamonds in actual operation is Tree2gd thread//it [phymcl] #The parameters used by phymcl, the user can add any parameter that phymcl can recognize [mcl2fasta] min_taxa=4 #The minimum number of species contained in each gene set when doing paper mulberry, cannot be less than 4, otherwise a meaningful tree cannot be built [iqtree] #The parameters used by iqtree, in addition to the following default parameters, the user can add any parameter that iqtree can recognize -B=1000 #Ultrafast bootstrap (>=1000) If you do not set it default to 1000, you can force it to 0 so that bootstrap is not performed, but it is not recommended except for testing -m=JTT+G4 #If the -cds2tree parameter is added, it will default to HKY. Please specify DNA or Protein when defining the tree structure model [tree2gd] #The parameters used by tree2gd, in addition to the following default parameters, the user can add any parameter that tree2gd can recognize --bp=50
Sample output plot show
Summary output plot
Interactive html kaks plot
R kaks diagram
WGD identification by support vector machine (SVM) model
Software and Citation
step1.blastp
[diamond] Buchfink B, Xie C, Huson DH, “Fast and sensitive protein alignment using DIAMOND”, Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176 [seqkit] W Shen, S Le, Y Li*, F Hu*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE. doi:10.1371/journal.pone.0163962.
step2.MCL
[phylomcl] Zhou S , Chen Y , Guo C , et al. PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events[J]. Methods in Ecology and Evolution, 2020.
step3.dollop
[dolloparsimony]
step4.WGD
[Tree2GD](Made some modifications on the 2.4 version) https://tree2gd.sourceforge.io/
[MUSCLE] Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.
[iqtree] B.Q. Minh, H.A. Schmidt, O. Chernomor, D. Schrempf, M.D. Woodhams, A. von Haeseler, R. Lanfear (2020) IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol., 37:1530-1534. https://doi.org/10.1093/molbev/msaa015
[pal2nal.pl] (v14; January 6, 2012) Zhang Zhang (zhangzhang@big.ac.cn)
step5.KaKs
[MUSCLE] Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.
step6.plot_summary
[jcvi] Tang H , Krishnakumar V , Li J . jcvi: JCVI utility libraries[J]. 2015.
[MCscan] Tang H , Bowers J E , Wang X , et al. Synteny and Collinearity in Plant Genomes[J]. Science, 2008, 320(5875):p.486-488.
[ggtree](R package) G Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi: 10.1002/cpbi.96.
[pyecharts](Python package) https://pyecharts.org/
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file Tree2gd-1.0.43.tar.gz
.
File metadata
- Download URL: Tree2gd-1.0.43.tar.gz
- Upload date:
- Size: 13.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 711f68476bbe17163d1efaee9947d081ae23e2946a61029e4477c91a2ae842ff |
|
MD5 | 89b05e759a0f4f6e8e6ea555126cc95f |
|
BLAKE2b-256 | c4d93d20488a92c56b16472ec65bdb2c8de5b26ed80f1a08f34f882bc5f16278 |
File details
Details for the file Tree2gd-1.0.43-py3-none-any.whl
.
File metadata
- Download URL: Tree2gd-1.0.43-py3-none-any.whl
- Upload date:
- Size: 13.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 439231d540e5e9310a18ebf2c498f2ae142e324aa80d7d047ee6529f3ef5ccaa |
|
MD5 | cdcb9f2defb422eb575f4545b9659a9c |
|
BLAKE2b-256 | a82f72c884891bc202087dc8359ff0d60ff9d861ee30f0489e40f7ff9bcecfca |