Fast genome-wide DTL(Duplication-Transfer-Loss) event mapping tool
Project description
(Under Construction) FastDTLmapper: Fast genome-wide DTL event mapper
Table of contents
Overview
Gene gain/loss is considered to be one of the most important evolutionary processes
driving adaptive evolution, but it remains largely unexplored.
Therefore, to investigate the relationship between gene gain/loss and adaptive evolution
in the evolutionary process of organisms, I developed a software pipeline FastDTLmapper
which automatically estimates and maps genome-wide gene gain/loss.
FastDTLmapper takes two inputs, 1. Species tree (Newick format) 2. Genomic CDSs (Fasta|Genbank format),
and performs genome-wide mapping of DTL(Duplication-Transfer-Loss) events by
DTL reconciliation of species tree and gene trees.
Fig. Genome-wide gain/loss map result example (all_gain_loss_map.png)
Each nodes gain/loss data is mapped in following format (NodeID | GeneCount [gain=GainCount los=LossCount])
Map data is embeded in newick format bootstrap value field and user can visualize using SeaView.
Install
FastDTLmapper is implemented in Python3(>=3.7) and runs on Linux (Tested on Ubuntu20.04).
Install PyPI stable version with pip:
pip install fastdtlmapper
Install latest development version with pip:
pip install git+git://github.com/moshi4/FastDTLmapper.git
Dependencies
All of the following dependencies are packaged in src/fastdtlmapper/bin directory.
- OrthoFinder
Orthology inference tool - mafft
Sequences alignment tool - trimal
Alignment sequences trim tool - IQ-TREE
Phylogenetic tree reconstruction tool - AnGST
DTL reconciliation tool (Requires Python 2.X to run) - parallel
Job parallelization tool (Requires Perl to run)
Pipeline Summary
- Grouping ortholog sequences using OrthoFinder
- Align each OG(Ortholog Group) sequences using mafft
- Trim each OG alignment using trimal
- Reconstruct each OG gene tree using IQ-TREE
- Species tree & each OG gene tree DTL reconciliation using AnGST
- Aggregate and map genome-wide DTL reconciliation result
Command Usage
Run Command
FastDTLmapper -i [fasta|genbank directory] -t [species tree file] -o [output directory]
Options
-h, --help show this help message and exit
-i , --indir Input Fasta(*.fa|*.faa|*.fasta), Genbank(*.gb|*.gbk|*.genbank) directory
-t , --tree Input rooted species tree file (Newick format)
-o , --outdir Output directory
-p , --process_num Number of processor (Default: MaxProcessor - 1)
--dup_cost Duplication event cost (Default: 2)
--los_cost Loss event cost (Default: 1)
--trn_cost Transfer event cost (Default: 3)
--inflation MCL inflation parameter (Default: 3.0)
--timetree Use species tree as timetree (Default: off)
--rseed Number of random seed (Default: 0)
Input Limitation
- fasta or genbank files (--indir option)
Following characters cannot be included in file name '_', '-', '|', '.' - species tree file (--tree option)
Species name in species tree must match fasta or genbank file name
Timetree Option
If user set this option, input species tree must be ultrametric tree.
--timetree enable AnGST timetree option below (See AnGST manual for details).
If the branch lengths on the provided species tree represent times, AnGST can restrict the set of possible inferred gene transfers to only those between contemporaneous lineages
Example Command
FastDTLmapper -i ./example/fasta/ -t ./example/species_tree.nwk -o ./fastdtlmapper_result
Output Contents
Output Top Directory
Top directory | Contents |
---|---|
00_user_data | Formatted user input fasta and tree files |
01_orthofinder | OrthoFinder raw output results |
02_dtl_reconciliation | Each OG(Ortholog Group) DTL reconciliation result |
03_aggregate_map_result | Genome-wide DTL reconciliation aggregated and mapped results |
log | Config log and command log files |
Output Directory Structure & Files
.
├── 00_user_data/ -- User input data
│ ├── fasta/ -- Formatted fasta files
│ └── tree/ -- Formatted newick species tree files
│
├── 01_orthofinder/ -- OrthoFinder raw output results
│
├── 02_dtl_reconciliation/ -- Each OG(Ortholog Group) DTL reconciliation result
│ ├── OG0000000/
│ │ ├── OG0000000.fa -- OG fasta file
│ │ ├── OG0000000_aln.fa -- OG alignment fasta file
│ │ ├── OG0000000_aln_trim.fa -- Trimmed OG alignement fasta file
│ │ ├── OG0000000_dtl_map.nwk -- OG DTL event mapped tree file
│ │ ├── OG0000000_gain_loss_map.nwk -- OG Gain-Loss event mapped tree file
│ │ ├── angst/ -- AnGST DTL reconciliation result
│ │ └── iqtree/ -- IQ-TREE gene tree reconstruction result
│ │
│ ├── OG0000001/
│ .
│ .
│ └── OGXXXXXXX/
│
├── 03_aggregate_map_result/ -- Genome-wide DTL reconciliation aggregated and mapped results
│ ├── all_dtl_map.nwk -- Genome-wide DTL event mapped tree file
│ ├── all_gain_loss_map.nwk -- Genome-wide Gain-Loss event mapped tree file
│ ├── all_og_node_event.tsv -- All OG DTL event record file
│ ├── all_transfer_gene_count.tsv -- All transfer gene count file
│ └── all_transfer_gene_list.tsv -- All transfer gene list file
│
└── log/
├── parallel_cmds/ -- Parallel run command log results
└── run_config.log -- Program run config log file
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for FastDTLmapper-0.2.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b44517352e10630707308454e6003ac4a5fa939b602c2aa23af0a66d0a0619b3 |
|
MD5 | 8eec09a22a8c702d9a301c81f24120ab |
|
BLAKE2b-256 | fd8cb3a2d4de5ef32db5eb19b68a37e64202636fdc03530baa1809cc3c460706 |