A toolkit for gene tree rooting, gene duplication identification, ortholog retrieval, phylogenetic noise elimination, species hybridization detection, and visualization.
PhyloTracer
###############################################################################################
██████╗ ██╗ ██╗██╗ ██╗██╗ ██████╗ ████████╗██████╗ █████╗ ██████╗███████╗██████╗
██╔══██╗██║ ██║╚██╗ ██╔╝██║ ██╔═══██╗╚══██╔══╝██╔══██╗██╔══██╗██╔════╝██╔════╝██╔══██╗
██████╔╝███████║ ╚████╔╝ ██║ ██║ ██║ ██║ ██████╔╝███████║██║ █████╗ ██████╔╝
██╔═══╝ ██╔══██║ ╚██╔╝ ██║ ██║ ██║ ██║ ██╔══██╗██╔══██║██║ ██╔══╝ ██╔══██╗
██║ ██║ ██║ ██║ ███████╗╚██████╔╝ ██║ ██║ ██║██║ ██║╚██████╗███████╗██║ ██║
╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝╚══════╝╚═╝ ╚═╝
PhyloTracer: A Versatile Toolkit for Comparative Genomics and Phylogenomics Analysis.
Pypi: https://pypi.org/project/PhyloTracer
Github: https://github.com/YiyongZhao/PhyloTracer
License: MIT license
Release Date: 2023-7
Contacts: Taoli (l948777439@163.com); Yiyong Zhao (yiyongzhao1991@gmail.com)
###############################################################################################
PhyloTracer
A user-friendly toolkit for gene tree rooting, topology summarization, species hybridization signal screening, gene-duplication (GD) and loss profiling, and subgenome-aware ortholog splitting, with practical utilities for tree format manipulation and visualization.
What does PhyloTracer do?
PhyloTracer provides a reproducible workflow centered on accurate gene tree rooting and topology statistics. It further offers utilities for screening hybridization-like signals (via topology patterns such as ABAB/ABBA variants), summarizing GD and loss patterns, and subgenome-informed splitting of multi-copy families.
All modules are designed to be used independently or combined in larger phylogenomic pipelines. Where applicable, methods are documented with input assumptions and recommended validation steps to ensure rigorous interpretation.
Table of Contents
- What does PhyloTracer do?
- Module Features
- Getting started with PhyloTracer
- Installation
- Example input files
- PhyloTracer Results Files
- Command line options
- Bug Reports
- Contributing
- Version History
- License
Module Features
PhyloTracer integrates 16 modular tools covering phylogenetic preprocessing, rooting, orthology refinement, duplication/loss detection, and hybridization analysis. Each module can run independently or be incorporated into larger evolutionary pipelines.
- PhyloTree_CollapseExpand: Transforms a phylogenetic tree into a “comb-like” structure based on a predefined support threshold, and re-expands it back to binary form when needed.
- PhyloSupport_Scaler: Recalibrates branch support values to standardized scales ([0–1] or [1–100]) for consistent computation.
- BranchLength_NumericConverter: Converts branch-length strings to numeric format for downstream quantitative analyses.
- Phylo_Rooter: Implements an accurate, automated rooting algorithm for gene trees to enhance evolutionary inference.
- OrthoFilter_LB: Removes tips with excessively long branches to eliminate phylogenomic noise.
- OrthoFilter_Mono: Prunes non-monophyletic outliers and paralogs under user-defined taxonomic constraints.
- TreeTopology_Summarizer: Summarizes frequencies of absolute and relative topologies across gene trees or predefined clades.
- Tree_Visualizer: Visualizes duplication events, node labels, and expression profiles on gene and species trees.
- GD_Detector: Identifies gene duplication events via reconciliation between gene and species trees.
- GD_Visualizer: Displays detected duplication nodes in a species tree context.
- GD_Loss_Tracker: Tracks duplication-loss patterns following major GD bursts across species tree nodes.
- GD_Loss_Visualizer: Visualizes node/tip-specific gene loss summaries.
- Ortho_Retriever: Extracts putative single-copy orthologs by recursively splitting paralogous clades.
- Hybrid_Tracer: Detects hybridization signals from duplicated genes using coalescent-based phylogenetic invariants.
- Hybrid_Visualizer: Highlights hybridization proportions (γ) and support values across the species tree.
- HaploFinder: Identifies ancient recombination (conversion/crossover) events by tracing subgenome haplotypes.
Together, these modules provide a comprehensive workflow for constructing, refining, and interpreting large-scale phylogenomic data.
Getting started with PhyloTracer
Clone and install environment
# One-click installation with conda (https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html):
git clone https://github.com/YiyongZhao/PhyloTracer.git
cd PhyloTracer
conda env create -f environment.yml
conda activate PhyloTracer
# Alternatively, a one-click installation with pip (the package installer for Python):
chmod +x install_packages.sh
bash install_packages.sh
# Note on visualization: you may see the error 'qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found. This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.'
# Available platform plugins include: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.
# To avoid this error, run the following bash command before running PhyloTracer:
export QT_QPA_PLATFORM=offscreen
Install from PyPI with pip
pip install PhyloTracer
Quick start from GitHub ZIP (download + extract)
# 1) Download and unzip PhyloTracer-main.zip from GitHub
# 2) Enter project directory
cd PhyloTracer-main
# 3) Create environment
conda env create -f environment.yml
conda activate PhyloTracer
# 4) Run help
PhyloTracer -h
# 5) Example run
PhyloTracer GD_Detector \
--input_GF_list example_data/GD_Detector/GF_ID2path.imap \
--input_imap example_data/GD_Detector/gene2sps.imap \
--input_sps_tree example_data/GD_Detector/sptree.nwk \
--gd_support 50 \
--subclade_support 50 \
--dup_species_proportion 0 \
--dup_species_num 2 \
--deepvar 1
Features
- Rooting (core, via Phylo_Rooter):
Uses a two-stage strategy:
  - Stage 1 (fast screening, no RF) computes deep, var, GD, and species_overlap;
  - Stage 2 (fine screening) computes RF only for the top candidates, then selects the best root by sorting on RF first and on the Stage-1 score second.
Stage-1 scoring in code:
$$ \text{score} = w_{\text{deep}}\cdot\text{norm}(\text{deep}) + w_{\text{var}}\cdot\text{norm}(\text{var}) + w_{\text{GD}}\cdot\text{norm}(\text{GD}) - w_{\text{SO}}\cdot\text{norm}(\text{species\_overlap}) $$
where norm(x) is min-max normalization to [0, 1].
📊 Stage-1 weights in Phylo_Rooter (from code):
| Metric | Multi-copy GFs | Single-copy GFs |
|---|---|---|
| deep | 0.30 | 0.70 |
| var | 0.10 | 0.30 |
| GD | 0.50 | 0.00 |
| species_overlap (subtracted) | 0.10 | 0.00 |
| RF | not used in Stage 1 | not used in Stage 1 |
Stage-2 selection in code:
  - keep the top max(20, ceil(0.8 * total_candidates)) candidates by Stage-1 score;
  - compute RF for them;
  - choose the best by sort_values(by=["RF", "score"], ascending=[True, True]).
- Topology statistics (via TreeTopology_Summarizer):
Computes the absolute and relative topology frequencies for single-copy gene trees.
Supports grouped summarization by user-provided labels (e.g., family/order tags) when supplied.
- Hybridization screening (via Hybrid_Tracer):
Hybrid_Tracer extends conventional HyDe-style hybridization detection by leveraging gene duplication (GD)-based signal extraction, offering both grouped and ungrouped analytical modes for cleaner, more node-specific hybridization inference. Two complementary strategies are implemented:
  - Ungrouped mode (concatenation-based):
    For a specific ancestral node, Hybrid_Tracer concatenates the aligned sequences of all duplicated genes descending from that node's GD events, forming a targeted alignment matrix.
    This enables direct HyDe-like inference of hybridization signals while minimizing unrelated noise, since only genes phylogenetically tied to that duplication origin are included.
  - Grouped mode (signal integration):
    Alternatively, GD events can be partitioned into multiple groups according to their gene tree topology or taxonomic context.
    Each group is analyzed independently to estimate its hybridization proportion (γ) and support.
    These group-level results are then integrated (e.g., averaged or weighted) to infer the overall hybridization signal of that ancestral node.
Compared with the traditional HyDe pipeline, which concatenates all single-copy genes across the genome, Hybrid_Tracer focuses exclusively on GD-derived gene sets related to the evolutionary node of interest.
This design produces cleaner, more interpretable, and more localized hybridization signals, reducing interference from unrelated loci and improving the biological relevance of γ estimates.
- GD & loss profiling (via GD_Detector and GD_Loss_Tracker):
Reconciles gene and species trees to summarize gene duplication events and lineage-specific loss patterns.
Paired visualizers (GD_Visualizer, GD_Loss_Visualizer) assist in comparative interpretation.
- HaploFinder:
Detects ancient genome recombination signals, including gene conversions and crossover events, by tracing subgenome-specific haplotypes through phylogenomic profiling.
This module helps characterize the historical exchange of genetic material between subgenomes and provides insights into genome evolution following duplication or hybridization.
Installation
Important: Python 3.13 is NOT supported. PhyloTracer depends on ete3, which uses the cgi module that was removed in Python 3.13. Please use Python 3.8–3.12. We recommend creating a dedicated conda environment:
conda create -n phylotracer python=3.12 -y
conda activate phylotracer
conda install -c conda-forge pyqt=5 -y
pip install PhyloTracer
export QT_QPA_PLATFORM=offscreen # Required for headless/server environments
Required dependencies:
- Python 3.8–3.12 (Python 3.13+ is not supported due to ete3 dependency)
- Python modules:
- ete3==3.1.3
- HyDe (https://github.com/pblischak/HyDe)
- pandas==2.2.1
- numpy==1.26.4
- tqdm==4.66.2
- pypdf>=3.0.0
- matplotlib==3.8.3
- pyqt5==5.15.10
- pyqt5-qt5==5.15.12
- pyqt5-sip==12.13.0
- pillow==10.2.0
- fonttools==4.49.0
- cycler==0.12.1
- kiwisolver==1.4.5
- packaging==23.2
- pyparsing==3.1.1
- python-dateutil==2.8.2
- pytz==2024.1
- six==1.16.0
- tzdata==2024.1
Note: PhyloTracer relies on the Python framework ete3 for basic tree analysis and visualization, and detects species hybridization signals with HyDe, which tests site-pattern frequencies using coalescent-based phylogenetic invariants.
Example input files
Each of the following input files must have two columns separated by a tab.
Provide a two-column file in TSV format: each line contains <gene_family_ID><TAB><file_path>
------------GF_ID2path.imap-------------------------------------------------------------------------------------------------------
OG_104001 example_data/Phylo_Rooter/OG_104001.treefile
OG_104002 example_data/Phylo_Rooter/OG_104002.treefile
OG_104003 example_data/Phylo_Rooter/OG_104003.treefile
Provide a two-column file in TSV format: each line contains <gene_id><TAB><sequence_length>
------------gene2length.imap------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 201
ATCG00500.1 1467
Glyma.07G273800.2 3417
Provide a two-column file in TSV format: each line contains <gene_id><TAB><species>
------------gene2sps.imap---------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 Amborella_trichopoda
ATCG00500.1 Arabidopsis_thaliana
Glyma.07G273800.2 Glycine_max
Provide a two-column file in TSV format: each line contains <gene_id><TAB><plant_family>
------------gene2family.imap------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 Amborellaceae
ATCG00500.1 Brassicaceae
Glyma.07G273800.2 Fabaceae
Provide a two-column file in TSV format: each line contains <gene_id><TAB><plant_order>
------------gene2order.imap-------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 Amborellales
ATCG00500.1 Brassicales
Glyma.07G273800.2 Fabales
Provide a two-column file in TSV format: each line contains <gene_id><TAB><higher-level_taxa>
------------gene2taxa.imap--------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 Angiosperm
ATCG00500.1 Malvids
Glyma.07G273800.2 Fabids
Provide a two-column file in TSV format: each line contains <gene_id><TAB><functional_clade>
------------gene2clade.imap-------------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 Nitrogen-fixing
ATCG00500.1 Nitrogen-fixing
Glyma.07G273800.2 non-Nitrogen-fixing
Provide a two-column file in TSV format: each line contains <gene_id><TAB><expression_values>
------------gene2expression.imap--------------------------------------------------------------------------------------------------
AMTR_s00796p00010580 5.0
ATCG00500.1 12.0
Glyma.07G273800.2 0.0
#Note: You can add any number of imap files. They will sequentially provide annotations to the right of the gene tips according to the order of input.
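Because every .imap file shares the same two-column layout, a single reader covers all of them. The sketch below is a minimal, hypothetical Python loader (the name `parse_imap` is illustrative and not part of PhyloTracer's API); it assumes exactly two tab-separated fields per non-blank line and rejects anything else.

```python
import csv
from typing import Dict, Iterable


def parse_imap(lines: Iterable[str]) -> Dict[str, str]:
    """Parse two-column, tab-separated imap lines into a key -> value dict.

    Hypothetical helper: blank lines are skipped, and any line without
    exactly two tab-separated fields raises ValueError.
    """
    mapping: Dict[str, str] = {}
    for line_no, row in enumerate(csv.reader(lines, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
            )
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# Usage with a file on disk:
# with open("gene2sps.imap", newline="") as handle:
#     gene2sps = parse_imap(handle)
```

Taking an iterable of lines rather than a path keeps the parser easy to test and lets it read from any open file handle.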
PhyloTracer Results Files
Most modules generate task-specific outputs in either the current working directory or module-specific subdirectories. Common outputs include:
- Rooted tree outputs: rooted_trees/
- Long-branch filter outputs: orthofilter_lb/
- Monophyly filter outputs: orthofilter_mono/
- GD detection tables: gd_result_*.txt, gd_type_*.tsv
- GD-loss tables: gd_loss_summary.txt, gd_loss_count_summary.txt, gd_loss.xlsx
- Hybridization outputs: hyde_out.txt, hyde_filtered_out.txt
- Ortholog retrieval outputs: ortho_retriever_summary.txt, ortholog_trees.tsv
- Topology summaries: absolute_*.txt, relative_*.txt, merged PNG summaries
Command line options
This section follows an OrthoFinder-like CLI reference style with compact layout and explicit parameter meanings.
PhyloTree_CollapseExpand
Description:
To transform a phylogenetic tree in Newick format into a 'comb' structure based on a predefined support threshold. It can also revert this 'comb' structure to a fully resolved binary tree, allowing dynamic topology adjustments.
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--support_value Nodes with support less than or equal to this value are collapsed (default = 50)
Optional parameter:
--revert Revert this 'comb' structure to a fully resolved binary tree
Usage:
PhyloTracer PhyloTree_CollapseExpand --input_GF_list GF_ID2path.imap --support_value 50 [--revert]
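As a rough illustration of the collapse step (not PhyloTracer's actual code), the sketch below uses a hypothetical minimal `Node` class, visits the tree bottom-up, and splices the children of every internal node whose support is at or below the threshold into its parent, which produces the polytomies of a 'comb' structure. Branch lengths are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node; tips have no children."""
    name: str = ""
    support: float = 100.0
    children: List["Node"] = field(default_factory=list)


def collapse_weak_nodes(node: Node, support_value: float) -> Node:
    """Collapse internal nodes with support <= support_value into polytomies."""
    kept: List[Node] = []
    for child in node.children:
        collapse_weak_nodes(child, support_value)  # post-order: fix subtrees first
        if child.children and child.support <= support_value:
            kept.extend(child.children)  # splice grandchildren into this node
        else:
            kept.append(child)
    node.children = kept
    return node


def to_newick(node: Node) -> str:
    """Topology-only Newick string (no support values or branch lengths)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# ((A,B)40,(C,D)90); collapsed with support_value = 50
tree = Node(children=[
    Node(support=40, children=[Node("A"), Node("B")]),
    Node(support=90, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse_weak_nodes(tree, 50)) + ";")  # (A,B,(C,D));
```

Processing children before their parent means nested weak nodes collapse all the way up in a single pass.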
PhyloSupport_Scaler
Description:
To recalibrate bootstrap or posterior-probability support values in a phylogenetic tree, rescaling them from the [1,100] range to [0,1] or vice versa, for computational compatibility and various analytical needs
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--scale_to Input '1' to scale support values from 1-100 to 0-1, or '100' to scale from 0-1 to 1-100
Usage:
PhyloTracer PhyloSupport_Scaler --input_GF_list GF_ID2path.imap --scale_to 1
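A minimal sketch of the rescaling, assuming that support values appear as internal-node labels immediately after a closing parenthesis in the Newick text and that rescaling is a plain division or multiplication by 100. Both are assumptions, and the regex-based `rescale_newick` helper is hypothetical rather than the module's own code.

```python
import re

# Support label right after ")", e.g. the "95" in "(A:0.1,B:0.2)95:0.3"
SUPPORT_LABEL = re.compile(r"\)(\d+(?:\.\d+)?)")


def scale_support(value: float, scale_to: int) -> float:
    """scale_to=1: 1-100 -> 0-1; scale_to=100: 0-1 -> 0-100 (assumed rule)."""
    if scale_to == 1:
        return value / 100.0
    if scale_to == 100:
        return value * 100.0
    raise ValueError("scale_to must be 1 or 100")


def rescale_newick(newick: str, scale_to: int) -> str:
    """Rewrite every internal-node support label in a Newick string."""
    return SUPPORT_LABEL.sub(
        lambda m: ")" + format(scale_support(float(m.group(1)), scale_to), "g"),
        newick,
    )


print(rescale_newick("((A:0.1,B:0.2)95:0.3,C:0.4);", 1))  # ((A:0.1,B:0.2)0.95:0.3,C:0.4);
```

Branch lengths are untouched because they follow a colon rather than a closing parenthesis.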
BranchLength_NumericConverter
Description:
To convert branch length values of a phylogenetic tree from string to numerical format
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
Optional parameter:
--decimal_place Number of decimal places to keep in branch length values (default = 10)
Usage:
PhyloTracer BranchLength_NumericConverter --input_GF_list GF_ID2path.imap [--decimal_place 10]
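One way to picture this conversion is as rewriting branch lengths such as 1e-05 (scientific notation) as fixed-point decimals with --decimal_place digits. That reading is an assumption, and the `format_branch_lengths` helper below is a hypothetical sketch, not the module's implementation.

```python
import re

# A branch length follows ":" and may use scientific notation, e.g. ":1e-05"
BRANCH_LENGTH = re.compile(r":([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def format_branch_lengths(newick: str, decimal_place: int = 10) -> str:
    """Rewrite every branch length as a fixed-point number with decimal_place digits."""
    return BRANCH_LENGTH.sub(
        lambda m: ":" + f"{float(m.group(1)):.{decimal_place}f}",
        newick,
    )


print(format_branch_lengths("(A:1e-05,B:0.2)95:3.5E-3;", 6))
# (A:0.000010,B:0.200000)95:0.003500;
```

Support labels such as the "95" above are left alone because only numbers preceded by a colon are matched.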
Phylo_Rooter
Description:
To root gene trees accurately, improving downstream evolutionary genomic analyses
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
--input_gene_length File with information corresponding to gene lengths
--input_sps_tree A species tree file with Newick format
Usage:
PhyloTracer Phylo_Rooter --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap --input_gene_length gene2length.imap --input_sps_tree sptree.nwk
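To make the two-stage selection described under Features concrete, the Python sketch below applies the Stage-1 weights, min-max normalization, the max(20, ceil(0.8 · N)) shortlist, and the ascending (RF, score) sort. Following the quoted ascending sort, it assumes that a lower Stage-1 score is better. The candidate dictionaries and the `rf_func` callback are hypothetical stand-ins for PhyloTracer's internal data, and computing deep, var, GD, species_overlap, or a real Robinson-Foulds distance is out of scope.

```python
import math
from typing import Callable, Dict, List, Tuple

# Stage-1 weights as listed for Phylo_Rooter (species_overlap is subtracted)
WEIGHTS = {
    "multi": {"deep": 0.30, "var": 0.10, "GD": 0.50, "species_overlap": 0.10},
    "single": {"deep": 0.70, "var": 0.30, "GD": 0.00, "species_overlap": 0.00},
}


def min_max(values: List[float]) -> List[float]:
    """Min-max normalize to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def stage1_scores(candidates: List[Dict], copy_type: str = "multi") -> List[float]:
    w = WEIGHTS[copy_type]
    norm = {metric: min_max([c[metric] for c in candidates]) for metric in w}
    return [
        w["deep"] * norm["deep"][i]
        + w["var"] * norm["var"][i]
        + w["GD"] * norm["GD"][i]
        - w["species_overlap"] * norm["species_overlap"][i]
        for i in range(len(candidates))
    ]


def select_root(candidates: List[Dict], rf_func: Callable[[Dict], float],
                copy_type: str = "multi") -> Tuple[str, float, float]:
    scores = stage1_scores(candidates, copy_type)
    # Stage 1: shortlist max(20, ceil(0.8 * N)) candidates (lower score = better, assumed)
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i])
    shortlist = ranked[: max(20, math.ceil(0.8 * len(candidates)))]
    # Stage 2: compute RF only for the shortlist, then sort by (RF, score) ascending
    rf = {i: rf_func(candidates[i]) for i in shortlist}
    best = min(shortlist, key=lambda i: (rf[i], scores[i]))
    return candidates[best]["root"], scores[best], rf[best]


# Toy candidates; the "rf" field stands in for a real RF computation
candidates = [
    {"root": "A", "deep": 1, "var": 0.0, "GD": 2, "species_overlap": 0, "rf": 4},
    {"root": "B", "deep": 3, "var": 0.5, "GD": 5, "species_overlap": 1, "rf": 2},
    {"root": "C", "deep": 2, "var": 0.2, "GD": 3, "species_overlap": 0, "rf": 2},
]
print(select_root(candidates, rf_func=lambda c: c["rf"])[0])  # C
```

Here B and C tie on RF, so the Stage-1 score breaks the tie in favor of C.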
OrthoFilter_LB
Description: Prunes phylogenomic noise from both single-copy and multi-copy gene family trees by removing tips with abnormally long branches. This module helps eliminate potential artifacts such as Long Branch Attraction (LBA).
Required parameters (CLI): --input_GF_list, --input_imap, --absolute_branch_length, --relative_branch_length
1. Root Relative Branch Ratio (RRBR)
Concept: Measures the deviation of a specific gene's branch length relative to the global average of the entire gene family tree.
- Purpose: Detects outlier sequences evolving significantly faster or slower than the family norm.
- Formula:
$$ \text{RRBR} = \frac{\text{Branch Length} - \text{Average Branch Length}}{\text{Average Branch Length}} $$
2. Sister Relative Branch Ratio (SRBR)
Concept: Measures the evolutionary distance of a gene relative to its nearest neighbor (sister branch).
- Purpose: Identifies local branch length asymmetry. A gene significantly longer than its "sister" is a high-risk candidate for phylogenetic noise, even in fast-evolving families.
- Formula:
$$ \text{SRBR} = \frac{\text{Branch Length} - \text{Sister Branch Length}}{\text{Sister Branch Length}} $$
Where:
- Branch Length: The branch length of the specified gene.
- Average Branch Length: The arithmetic mean of all branch lengths in the gene family tree.
- Sister Branch Length: The branch length of the nearest "neighbor" or "sister" gene.
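The two ratios translate directly into code. The sketch below computes them for each tip and flags a tip when either ratio exceeds its cutoff. Pairing RRBR with --absolute_branch_length and SRBR with --relative_branch_length, the "either" rule, and averaging only over the supplied tip branches are all simplifying assumptions made for illustration, not documented behavior.

```python
from typing import Dict, List, Tuple


def rrbr(branch_length: float, all_lengths: List[float]) -> float:
    """Root Relative Branch Ratio: deviation from the mean branch length."""
    average = sum(all_lengths) / len(all_lengths)
    if average == 0:
        return 0.0  # degenerate tree with all-zero branch lengths
    return (branch_length - average) / average


def srbr(branch_length: float, sister_length: float) -> float:
    """Sister Relative Branch Ratio: deviation from the sister branch length."""
    if sister_length == 0:
        return float("inf") if branch_length > 0 else 0.0
    return (branch_length - sister_length) / sister_length


def flag_long_branches(tips: Dict[str, Tuple[float, float]],
                       rrbr_cutoff: float = 5.0,
                       srbr_cutoff: float = 2.5) -> List[str]:
    """tips maps tip name -> (branch length, sister branch length).

    Simplification: the mean is taken over the supplied tip branches only.
    """
    lengths = [bl for bl, _ in tips.values()]
    return [
        name for name, (bl, sister) in tips.items()
        if rrbr(bl, lengths) > rrbr_cutoff or srbr(bl, sister) > srbr_cutoff
    ]


tips = {"A": (0.10, 0.10), "B": (0.12, 0.10), "C": (1.50, 0.10)}
print(flag_long_branches(tips))  # ['C']
```

In this toy family, C is not extreme relative to the mean (RRBR ≈ 1.6) but is fifteen times longer than its sister (SRBR = 14), which illustrates why the sister-based ratio catches local asymmetry.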
Description:
To prune phylogenomic noise from both single-copy and multi-copy gene family trees by removing tips with abnormally long branches
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
--absolute_branch_length Absolute branch length multiplier (default = 5)
--relative_branch_length Relative branch length multiplier (default = 2.5)
Optional parameter:
--visual Visualize the results of gene family trees before and after removing long branches
Usage:
PhyloTracer OrthoFilter_LB --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap --absolute_branch_length 5 --relative_branch_length 2.5 [--visual]
OrthoFilter_Mono
Required parameters (CLI): --input_GF_list, --input_taxa, --input_imap, --input_sps_tree
Scoring logic (current implementation concept):
1. Dominant Lineage Purity
Concept: Measures how strongly a lineage is dominated by target taxa labels.
- Formula:
$$ \text{Purity}=\frac{N_{\text{target}}}{N_{\text{dominant tips}}} $$
Where:
- $N_{\text{target}}$ = number of target taxa tips
- $N_{\text{dominant tips}}$ = total tips in dominant lineage
2. Phylogenetic Distance Score
Concept: Alien lineages mapped deeper and farther from the target lineage in the species tree are more likely to be removed.
- Formula:
$$ \text{PhyloDist}=\text{Depth}(\text{MRCA}(\text{alien}))-\text{Depth}(\text{MRCA}(\text{target}\cup\text{alien})) $$
3. Alien Coverage Score
Concept: Alien lineages occupying fewer tips within a dominant lineage are more likely to be noise.
- Formula:
$$ \text{AlienCov}=\frac{N_{\text{alien}}}{N_{\text{dominant tips}}} $$
4. Alien Depth-Variation Score
Concept: Alien lineages inserted more deeply relative to the dominant lineage root are more likely to be removed.
- Formula:
$$ \text{AlienDepth}=\text{Depth}(\text{alien})-\text{Depth}\left(\text{MRCA}(\text{dom})\right) $$
5. Combined Ranking Score
Concept: Candidates are ranked by a multiplicative score using normalized components.
- Formula:
$$ \text{Combined}=\text{Norm}(\text{PhyloDist})\times\text{Norm}(\text{AlienDepth})\times\left(-\log_{10}\left(\text{AlienCov} + 10^{-4}\right)\right) $$
6. Removal Stopping Rules
Concept: The iterative pruning stops as soon as the dominant-lineage purity reaches purity_cutoff, or when the removal cap max_remove is reached (including cases where removing the next candidate would exceed the cap).
- Formula:
$$ \text{max\_remove}=\max\left(\text{max\_remove\_fraction}\times N_{\text{dominant tips}},\,1\right) $$
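Putting the scoring and stopping rules together, one pruning pass might look like the hedged Python sketch below. Each candidate is a hypothetical dictionary of precomputed PhyloDist, AlienDepth, AlienCov, and tip count. Norm(·) is assumed to be min-max normalization, and removing an alien lineage is assumed to shrink the dominant lineage by its tip count while the target count stays fixed.

```python
import math
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Assumed Norm(): min-max scaling to [0, 1] (constant input -> zeros)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def combined_scores(cands: List[Dict]) -> List[float]:
    """Combined = Norm(PhyloDist) * Norm(AlienDepth) * -log10(AlienCov + 1e-4)."""
    phylo = min_max([c["phylo_dist"] for c in cands])
    depth = min_max([c["alien_depth"] for c in cands])
    return [p * d * -math.log10(c["alien_cov"] + 1e-4)
            for p, d, c in zip(phylo, depth, cands)]


def prune_aliens(n_target: int, n_dominant: int, cands: List[Dict],
                 purity_cutoff: float = 0.95,
                 max_remove_fraction: float = 0.5) -> List[str]:
    """Remove the highest-scoring alien lineages until a stopping rule fires."""
    max_remove = max(max_remove_fraction * n_dominant, 1)
    scores = combined_scores(cands)
    order = sorted(range(len(cands)), key=lambda i: scores[i], reverse=True)
    removed: List[str] = []
    n_removed = 0
    for i in order:
        if n_target / (n_dominant - n_removed) >= purity_cutoff:
            break  # purity target reached
        if n_removed + cands[i]["n_tips"] > max_remove:
            break  # the next removal would exceed the cap
        removed.append(cands[i]["name"])
        n_removed += cands[i]["n_tips"]
    return removed


aliens = [
    {"name": "X", "phylo_dist": 3, "alien_depth": 4, "alien_cov": 0.1, "n_tips": 1},
    {"name": "Y", "phylo_dist": 1, "alien_depth": 1, "alien_cov": 0.1, "n_tips": 1},
    {"name": "Z", "phylo_dist": 2, "alien_depth": 3, "alien_cov": 0.1, "n_tips": 1},
]
print(prune_aliens(7, 10, aliens, purity_cutoff=0.85))  # ['X', 'Z']
```

Starting from purity 7/10, removing X and Z raises it to 7/8 = 0.875, which already meets the 0.85 cutoff, so Y is kept.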
Description:
To prune phylogenomic noise from both single-copy and multi-copy gene family trees. It removes outliers and paralogs based on predefined taxonomic constraints (e.g., ensuring that members of taxa such as families or orders form monophyletic groups). Caution: groupings should be selected with care, prioritizing well-established relationships unless a specific objective requires otherwise.
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_taxa File with taxonomic information for species
--input_imap File with classification information of species corresponding to genes
--input_sps_tree Species tree file in Newick format
Optional parameter:
--purity_cutoff Target purity for dominant lineage, default = 0.95
--max_remove_fraction Maximum fraction of removable tips in dominant lineage, default = 0.5
--visual Visualize the results of gene family trees before and after removing outliers and paralogs
Usage:
PhyloTracer OrthoFilter_Mono --input_GF_list GF_ID2path.imap --input_taxa gene2clade.imap --input_imap gene2sps.imap --input_sps_tree sptree.nwk [--purity_cutoff 0.95 --max_remove_fraction 0.5 --visual]
TreeTopology_Summarizer
Description:
To enumerate and visualize the frequencies of both absolute and relative topologies for single-copy gene trees or predefined clades of interest
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
Usage:
PhyloTracer TreeTopology_Summarizer --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap
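To illustrate what counting absolute topologies involves, the hypothetical sketch below maps gene IDs to species, writes each rooted tree as a canonical string (children sorted so that equivalent topologies compare equal), and tallies the strings with `collections.Counter`. To keep the example parser-free, trees are given as nested tuples. Relative topologies are not covered.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def canonical(tree: Tree, gene2sps: Dict[str, str]) -> str:
    """Species-labelled, child-order-independent string for a rooted topology."""
    if isinstance(tree, str):
        return gene2sps[tree]
    parts = sorted(canonical(child, gene2sps) for child in tree)
    return "(" + ",".join(parts) + ")"


def count_topologies(trees: Iterable[Tree], gene2sps: Dict[str, str]) -> Counter:
    return Counter(canonical(t, gene2sps) for t in trees)


gene2sps = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B",
            "c1": "C", "c2": "C", "c3": "C"}
trees = [(("a1", "b1"), "c1"), ("c2", ("b2", "a2")), (("a3", "c3"), "b3")]
print(count_topologies(trees, gene2sps).most_common())
# [('((A,B),C)', 2), ('((A,C),B)', 1)]
```

Sorting the child strings at every level is what makes `("c2", ("b2", "a2"))` and `(("a1", "b1"), "c1")` count as the same topology.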
Tree_Visualizer
Description:
To mark tips of gene trees with provided tags, identify GD nodes, and integrate gene duplication results onto the species tree
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
Optional parameter:
--keep_branch Whether to preserve branch length information: 1 (yes) or 0 (no)
--tree_style Tree style: 'r' (rectangular) or 'c' (circular), default = r
--gene_categories File with taxonomic information for species
--gene_family File with family classification information corresponding to genes
--input_sps_tree Species tree file in Newick format (required with --gene_family)
--gene_expression Gene expression matrix file (.csv/.xls/.xlsx), genes as row index
--visual_gd Visualize GD nodes of gene family trees
Usage:
PhyloTracer Tree_Visualizer --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap [--gene_categories gene2order.imap gene2taxa.imap gene2clade.imap --keep_branch {1,0} --tree_style {r,c} --gene_family gene2family.imap --input_sps_tree sptree.nwk --gene_expression gene2expression.csv --visual_gd]
GD_Detector
Description:
To identify gene duplication events by reconciling gene trees with a species tree
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
--gd_support Minimum support of GD node candidates (range: 0-100)
--subclade_support Minimum support for child subclades of GD nodes (range: 0-100)
--dup_species_proportion Minimum overlap ratio of duplicated species between two GD child clades (range: 0-1)
--dup_species_num Minimum number of overlapping duplicated species under a GD node (>=1)
--input_sps_tree A species tree file with Newick format
--deepvar Maximum tolerated depth variance in species-tree mapping (>=0)
Usage:
PhyloTracer GD_Detector --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap --gd_support 50 --subclade_support 50 --dup_species_proportion 0 --dup_species_num 2 --input_sps_tree sptree.nwk --deepvar 1
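The thresholds above combine into a per-node filter. The sketch below is a simplified, hypothetical reading of that filter: it uses species overlap between the two child clades as the duplication signal and interprets --dup_species_proportion as shared species divided by all species under the node (an assumption). It omits the --deepvar species-tree mapping check entirely.

```python
from typing import Iterable


def is_gd_candidate(node_support: float,
                    left_species: Iterable[str], right_species: Iterable[str],
                    left_support: float, right_support: float,
                    gd_support: float = 50, subclade_support: float = 50,
                    dup_species_proportion: float = 0.0,
                    dup_species_num: int = 2) -> bool:
    """Return True if a gene-tree node passes these simplified GD filters."""
    left, right = set(left_species), set(right_species)
    shared = left & right          # species present in both child clades
    union = left | right
    proportion = len(shared) / len(union) if union else 0.0  # assumed definition
    return (node_support >= gd_support
            and min(left_support, right_support) >= subclade_support
            and len(shared) >= dup_species_num
            and proportion >= dup_species_proportion)


print(is_gd_candidate(95, ["A", "B", "C"], ["A", "B"], 80, 70))  # True
```

The default arguments mirror the example run (50, 50, 0, 2); a node whose child clades share no species is never treated as a duplication.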
GD_Visualizer
Description:
To visualize gene duplication detection results and integrate findings onto the species tree
Required parameter:
--input_sps_tree A numbered species tree file with Newick format
--gd_result Result file of GD_Detector
--input_imap File with classification information of species corresponding to genes
Usage:
PhyloTracer GD_Visualizer --input_sps_tree sptree.nwk --gd_result gd_result.txt --input_imap gene2sps.imap
GD_Loss_Tracker
Description:
To analyze and summarize gene loss events following gene duplications across nodes and tips in the species tree
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_sps_tree A species tree file with Newick format
--input_imap File with classification information of species corresponding to genes
Usage:
PhyloTracer GD_Loss_Tracker --input_GF_list GF_ID2path.imap --input_sps_tree sptree.nwk --input_imap gene2sps.imap
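For intuition about what is being tallied, consider one GD mapped to a species-tree node: each species below that node is expected to carry a copy in both child clades of the duplication. The hypothetical sketch below classifies each species as retaining 2, 1, or 0 of the duplicated copies. It does not reproduce how PhyloTracer actually encodes loss patterns in gd_loss_summary.txt.

```python
from typing import Dict, Iterable, Set


def retained_copies(species_under_node: Iterable[str],
                    left_clade_species: Set[str],
                    right_clade_species: Set[str]) -> Dict[str, int]:
    """Count how many of the two post-GD copies each species still has (0, 1 or 2)."""
    return {
        sp: int(sp in left_clade_species) + int(sp in right_clade_species)
        for sp in species_under_node
    }


pattern = retained_copies(["A", "B", "C"], {"A", "B"}, {"A"})
print(pattern)  # {'A': 2, 'B': 1, 'C': 0}
losses = sum(2 - n for n in pattern.values())  # copies lost after the GD
print(losses)  # 3
```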
GD_Loss_Visualizer
Description:
To visualize the summary of gene loss events following gene duplications in the context of the species tree
Required parameter:
--gd_loss_result GD-loss count summary file produced by GD_Loss_Tracker (e.g., gd_loss_count_summary.txt)
--input_sps_tree A numbered species tree file with Newick format
Usage:
PhyloTracer GD_Loss_Visualizer --input_sps_tree numbered_species_tree.nwk --gd_loss_result gd_loss_count_summary.txt
Ortho_Retriever
Description:
To infer single-copy putative orthologs by splitting paralogs from large-scale gene family trees for multiple species
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with classification information of species corresponding to genes
--input_gene_length File with information corresponding to gene lengths
Usage:
PhyloTracer Ortho_Retriever --input_GF_list GF_ID2path.imap --input_imap gene2sps.imap --input_gene_length gene2length.imap
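The hypothetical sketch below illustrates the general idea with a deliberately simple rule. It walks the tree top-down, splits wherever the two child clades share a species (treated as a duplication), and within each resulting group keeps the longest gene per species, which is one plausible use of --input_gene_length. This is a simplified stand-in, not Ortho_Retriever's actual algorithm. Duplications nested below a non-duplication node are resolved only by the longest-copy rule.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [gene for child in tree for gene in leaves(child)]


def split_paralogs(tree: Tree, gene2sps: Dict[str, str]) -> List[List[str]]:
    """Recursively split at nodes whose two child clades share a species."""
    if isinstance(tree, str):
        return [[tree]]
    left, right = tree
    left_sps = {gene2sps[g] for g in leaves(left)}
    right_sps = {gene2sps[g] for g in leaves(right)}
    if left_sps & right_sps:  # species overlap -> treat as a duplication node
        return split_paralogs(left, gene2sps) + split_paralogs(right, gene2sps)
    return [leaves(tree)]


def longest_per_species(group: List[str], gene2sps: Dict[str, str],
                        gene2len: Dict[str, int]) -> List[str]:
    """Reduce a group to one gene per species, keeping the longest sequence."""
    best: Dict[str, str] = {}
    for gene in group:
        sp = gene2sps[gene]
        if sp not in best or gene2len[gene] > gene2len[best[sp]]:
            best[sp] = gene
    return sorted(best.values())


gene2sps = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
gene2len = {"a1": 300, "a2": 450, "b1": 500, "b2": 200, "c1": 350}
tree = (("a1", "b1"), (("a2", "b2"), "c1"))
print([longest_per_species(g, gene2sps, gene2len) for g in split_paralogs(tree, gene2sps)])
# [['a1', 'b1'], ['a2', 'b2', 'c1']]
```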
Hybrid_Tracer
Description:
To detect hybridization signals from GD-derived gene sets and run HyDe testing on species-tree mapped GD nodes
Required parameter:
--input_GF_list File containing paths to gene tree files, one per line
--input_Seq_GF_list File containing paths to sequence alignment files corresponding to the gene trees
--input_imap File with classification information of species corresponding to genes
--input_sps_tree A species tree file with Newick format
Optional parameter:
--mrca_node Restrict analysis to the MRCA of two species. Format: SpeciesA,SpeciesB (comma-separated, no spaces)
--split_groups Number of groups into which GD events are split for HyDe processing (default = 1)
Usage:
PhyloTracer Hybrid_Tracer --input_GF_list GF_ID2path.imap --input_Seq_GF_list Seq_GF_ID2path.imap --input_sps_tree sptree.nwk --input_imap gene2sps.imap [--mrca_node SpeciesA,SpeciesB --split_groups 2]
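The two modes described under Features can be sketched as plain data operations. In the sketch below, `concatenate_alignments` builds a supermatrix from several per-node alignments, which are assumed to be dictionaries keyed by species, with gaps filling in for species absent from an alignment. `integrate_gamma` combines per-group γ estimates by a simple or weighted mean. Both helpers and the choice of weights (e.g., number of sites per group) are hypothetical illustrations; the HyDe test itself is not reimplemented.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def concatenate_alignments(alignments: Sequence[Dict[str, str]],
                           species: Sequence[str]) -> Dict[str, str]:
    """Ungrouped-mode sketch: join alignments per species, padding with gaps."""
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # all rows in one alignment share a length
        for sp in species:
            parts[sp].append(aln.get(sp, "-" * length))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def integrate_gamma(groups: Sequence[Tuple[float, float]],
                    weighted: bool = True) -> Optional[float]:
    """Grouped-mode sketch: (gamma, weight) pairs -> one node-level gamma."""
    if not groups:
        return None
    if not weighted:
        return sum(g for g, _ in groups) / len(groups)
    total = sum(w for _, w in groups)
    return sum(g * w for g, w in groups) / total


print(concatenate_alignments([{"A": "AC", "B": "AG"}, {"A": "TTT"}], ["A", "B"]))
# {'A': 'ACTTT', 'B': 'AG---'}
print(integrate_gamma([(0.2, 100), (0.4, 300)]))
```

With weights of 100 and 300 sites, the weighted mean leans toward the larger group's γ of 0.4.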
Hybrid_Visualizer
Description:
To visualize hybridization signals, highlighting support from gene tree topologies and D-statistic signals
Required parameter:
--hyde_out HyDe result file produced by Hybrid_Tracer
--input_sps_tree A species tree file with Newick format
Optional parameter:
--node Node mode: stacks the heatmaps of each monophyletic clade; a square in the combined heatmap is highlighted only if it is highlighted in every individual heatmap
Usage:
PhyloTracer Hybrid_Visualizer --hyde_out hyde.out --input_sps_tree sptree.nwk [--node]
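The --node overlay rule amounts to a cell-wise logical AND across heatmaps. Here is a minimal sketch, assuming each heatmap is a same-shaped grid of truthy/falsy values (truthy = highlighted). The `overlay_heatmaps` helper is hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]


def overlay_heatmaps(heatmaps: Sequence[Grid]) -> List[List[bool]]:
    """A cell is highlighted only if it is highlighted in every heatmap."""
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [[all(bool(h[r][c]) for h in heatmaps) for c in range(cols)]
            for r in range(rows)]


print(overlay_heatmaps([[[1, 0], [1, 1]], [[1, 1], [0, 1]]]))
# [[True, False], [False, True]]
```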
HaploFinder
Description:
To detect haplotype-level GD signals and support split-mode FASTA partitioning
Required parameter:
--mode Run mode: haplofinder or split
Mode = haplofinder required:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with gene-to-species mapping
--input_sps_tree Species tree file in Newick format
--species_a Name of species A
--species_b Name of species B
--species_a_gff GFF file of species A
--species_b_gff GFF file of species B
--species_a_lens Lens file of species A
--species_b_lens Lens file of species B
Optional in haplofinder mode:
--gd_support Minimum GD support threshold (range: 0-100, default = 50)
--pair_support Minimum support threshold for gene pairs (range: 0-100, default = 50)
--visual_chr_a Chromosome subset file for species A visualization
--visual_chr_b Chromosome subset file for species B visualization
--size Dot size for plot output (default = 0.0005)
Mode = split required:
--input_GF_list File containing paths to gene tree files, one per line
--input_imap File with gene-to-species mapping
--input_fasta Input FASTA file
--cluster_file Cluster metadata file (currently required by CLI checks)
--hyb_sps Hybrid species name
--parental_sps Parental species names (space-separated in quotes)
--species_b_gff GFF file used for subgenome assignment validation
Usage:
PhyloTracer HaploFinder --mode haplofinder --input_GF_list GF.list --input_imap gene2sps.imap --input_sps_tree sptree.nwk --species_a A --species_b B --species_a_gff A.gff --species_b_gff B.gff --species_a_lens A.lens --species_b_lens B.lens --gd_support 50 --pair_support 50 [--visual_chr_a chr_a.txt --visual_chr_b chr_b.txt --size 0.0001]
PhyloTracer HaploFinder --mode split --input_GF_list GF.list --input_imap gene2sps.imap --input_fasta proteins.fa --cluster_file cluster.txt --hyb_sps Hybrid --parental_sps "P1 P2" --species_b_gff B.gff
Bug Reports
You can report bugs or request features through our GitHub Issues page. If you have any questions, suggestions, or issues, please do not hesitate to contact us.
Contributing
If you're interested in contributing code or reporting bugs, we welcome your ideas and contributions to improve PhyloTracer! Please see the Contribution Guidelines.
Version History
Check the Changelog for details on different versions and updates.
License
PhyloTracer is licensed under the MIT LICENSE.
Download files
File details
Details for the file phylotracer-1.0.2.tar.gz.
File metadata
- Download URL: phylotracer-1.0.2.tar.gz
- Upload date:
- Size: 127.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 596ea3741cbdae1de5a42276eaf3e529f884528deb4dc50f134cc44bb5e01c59 |
| MD5 | 059d6bf33bed7a90062825d49590f7dc |
| BLAKE2b-256 | ee9702ec79b85b517c6fcbfdfad4fcf3308e3294de76fce33dc3eca9342fc44c |
File details
Details for the file phylotracer-1.0.2-py3-none-any.whl.
File metadata
- Download URL: phylotracer-1.0.2-py3-none-any.whl
- Upload date:
- Size: 111.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dcc4e36790572ba5ab3c316ce6c0148b301718482c122108f79722f7cc4d4f7 |
| MD5 | a3cddb42c0e3c455a23ed80d54d3eb02 |
| BLAKE2b-256 | 819e3481787312a9a3d3af47572361f94693dfada84a9d91f8e6c4a7f5493387 |