tree-based orthology inference
PhyloPyPruner
PhyloPyPruner is a tree-based orthology inference program that refines the orthology inference made by graph-based (or phenetic) approaches.
PhyloPyPruner is under active development. I would appreciate it if you try this software on your own data and leave feedback.
To get a list of options, run the software either without any parameters or with the -h or --help flag. For more details, please see the Wiki.
Features
- Remove short sequences
- Remove sequences with long branches
- Collapse weakly supported nodes into polytomies
- Five different paralogy pruning algorithms
- Measure and remove OTUs with frequent paralogs
- Identify problematic OTUs using taxon jackknifing
- Exclude certain OTUs
- Specify taxonomic groups and see how often they form a phylogenetic group
- Mask monophylies by choosing the longest sequence or using pairwise distance
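As an illustration of the first feature, removing short sequences can be sketched as below. This is a minimal sketch and not PhyloPyPruner's own code: the function name, the `min_len` parameter, the second sequence description and the choice to ignore gap characters when measuring length are all assumptions made for this example.

```python
def remove_short_sequences(alignment, min_len):
    """Return the entries of `alignment` with at least `min_len` residues.

    `alignment` maps sequence descriptions to aligned sequences.
    Assumption: gap characters ("-") do not count towards the length.
    """
    kept = {}
    for description, sequence in alignment.items():
        ungapped_length = len(sequence.replace("-", ""))
        if ungapped_length >= min_len:
            kept[description] = sequence
    return kept


# Toy alignment; the second description is hypothetical.
alignment = {
    "Meiomenia_swedmarki|Contig00001_Hsp90": "MKT--LLVA",
    "Meiomenia_swedmarki|Contig00002_Hsp90": "M-------A",
}
filtered = remove_short_sequences(alignment, min_len=5)
```

Here only the first sequence survives, because the second has just two ungapped positions.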
Input data
Either provide a single multiple sequence alignment (MSA) and a Newick tree by using the --msa and --tree flags:
./phylopypruner --msa 16s.fas --tree 16s.tre
or provide the path to an input directory containing multiple trees and alignments by typing --dir path.
FASTA descriptions and Newick names must match and must be in one of the following formats: OTU|ID or OTU@ID, where OTU is the operational taxonomic unit (usually the species) and ID is a unique annotation or sequence identifier. For example: >Meiomenia_swedmarki|Contig00001_Hsp90.
Sequence descriptions and tree names must not deviate from each other. Sequence data must consist of valid IUPAC nucleotide or amino acid characters.
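The naming rules above can be checked before running the program. The following is a minimal sketch of such a check, not part of PhyloPyPruner itself: the helper names `split_name` and `mismatched_names` are hypothetical, and the sketch assumes that the OTU part contains neither "|" nor "@".

```python
import re

# OTU, then "|" or "@", then ID; neither part may be empty.
# Assumption: the OTU itself contains neither separator character.
NAME_PATTERN = re.compile(r"^(?P<otu>[^|@]+)[|@](?P<id>.+)$")


def split_name(name):
    """Split a FASTA description or Newick name into (OTU, ID)."""
    match = NAME_PATTERN.match(name.lstrip(">"))
    if match is None:
        raise ValueError(f"not in OTU|ID or OTU@ID format: {name!r}")
    return match.group("otu"), match.group("id")


def mismatched_names(fasta_descriptions, tree_names):
    """Return (names only in the FASTA file, names only in the tree)."""
    fasta = {description.lstrip(">") for description in fasta_descriptions}
    tree = set(tree_names)
    return sorted(fasta - tree), sorted(tree - fasta)
```

For the example description above, `split_name(">Meiomenia_swedmarki|Contig00001_Hsp90")` returns the OTU `Meiomenia_swedmarki` and the ID `Contig00001_Hsp90`. Two empty lists from `mismatched_names` mean that the alignment and the tree agree.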
To process multiple files, provide the path to the directory in which these files reside:
./phylopypruner --dir <path>
The program automatically looks for trees and alignments with the same name and runs once for each of these pairs.
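To preview which files would be paired, the matching by name can be approximated as follows. This is a rough sketch under stated assumptions, not the program's actual lookup logic: the function name `pair_inputs` is hypothetical, the file extensions listed are guesses, and "same name" is taken to mean the same file name without its extension.

```python
from pathlib import Path

# Assumed extensions; the program may recognise a different set.
ALIGNMENT_EXTENSIONS = {".fas", ".fasta", ".fa"}
TREE_EXTENSIONS = {".tre", ".tree", ".nwk", ".newick"}


def pair_inputs(directory):
    """Return (alignment, tree) path pairs that share a file stem."""
    alignments, trees = {}, {}
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in ALIGNMENT_EXTENSIONS:
            alignments[path.stem] = path
        elif suffix in TREE_EXTENSIONS:
            trees[path.stem] = path
    # Keep only the names for which both an alignment and a tree exist.
    shared = sorted(alignments.keys() & trees.keys())
    return [(alignments[name], trees[name]) for name in shared]
```

In a directory containing 16s.fas, 16s.tre and 18s.fas, this sketch returns a single pair for 16s and skips 18s.fas, since that alignment has no tree.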
Output files
The following files are generated after running this program:
- <timestamp>_<orthologs>/... – output alignments
- <timestamp>_ppp_summary.log – summary statistics for all alignments
- <timestamp>_ppp_run.log – detailed report of each performed action
- <timestamp>_ppp_ortho_stats.csv – statistics for output alignments
- <timestamp>_ppp_paralog_freq.csv – paralogy frequency data
- <timestamp>_ppp_paralog_freq.png – paralogy frequency plot*
* – only produced if Matplotlib is installed
© Kocot Lab 2018