
tree-based orthology inference


PhyloPyPruner

PhyloPyPruner is a tree-based orthology inference program for refining orthology inferences made by graph-based approaches. In addition to implementing previously published paralogy pruning algorithms used in PhyloTreePruner, UPhO, Agalma and Phylogenomic Dataset Reconstruction, this software provides tools for identifying and removing operational taxonomic units (OTUs) that display contamination-like issues.

PhyloPyPruner is currently under active development, and I would appreciate it if you tried this software on your own data and left feedback.

See the Wiki for more details.

Feature list

  • Remove short sequences
  • Remove relatively long branches
  • Collapse weakly supported nodes into polytomies
  • Prune paralogs using one of five methods
  • Measure paralogy frequency
  • Remove OTUs with relatively high paralogy frequency
  • Mask monophylies by keeping the longest sequence or the sequence with the shortest pairwise distance
  • Exclude individual OTUs entirely
  • Root trees using outgroup or midpoint rooting
  • Remove OTUs whose sequences display relatively high pairwise distances
  • Measure impact of individual OTUs using taxon jackknifing
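Masking a monophyly means keeping a single sequence from a clade made up only of sequences from the same OTU. The following is a minimal sketch of the "keep the longest sequence" rule, assuming that "longest" means the number of non-gap characters in the aligned sequence and that `-`, `?` and `.` are the gap and missing-data symbols. The function names are hypothetical; this is not PhyloPyPruner's own implementation.

```python
# Assumed gap and missing-data symbols (hypothetical choice).
GAP_CHARS = set("-?.")


def ungapped_length(seq):
    """Count the characters in an aligned sequence that are not gaps."""
    return sum(1 for ch in seq if ch not in GAP_CHARS)


def mask_keep_longest(alignment, clade):
    """Pick one sequence to keep from a monophyletic same-OTU clade.

    alignment: dict mapping sequence name -> aligned sequence
    clade: iterable of sequence names forming the monophyly
    Returns (name_to_keep, list_of_names_to_discard).
    """
    names = list(clade)
    if not names:
        raise ValueError("clade must contain at least one sequence")
    # max() returns the first maximal element, so ties keep input order.
    keep = max(names, key=lambda name: ungapped_length(alignment[name]))
    discard = [name for name in names if name != keep]
    return keep, discard


alignment = {
    "Meiomenia_swedmarki|Contig00001": "ATG--CGTA",
    "Meiomenia_swedmarki|Contig00002": "ATGCCCGTA",
    "Meiomenia_swedmarki|Contig00003": "AT-------",
}
keep, discard = mask_keep_longest(alignment, alignment.keys())
print(keep)  # Meiomenia_swedmarki|Contig00002
```

The alternative rule in the list above (keeping the sequence with the shortest pairwise distance) would swap `ungapped_length` for a distance-based key.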

Installation

This software runs under both Python 3 and Python 2.7. It has no external dependencies, but the plotting library Matplotlib can optionally be installed to generate paralogy frequency plots.
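Because Matplotlib is optional, you may want to check beforehand whether the current interpreter will be able to produce the plot. A small Python 3 sketch (it relies on `importlib.util.find_spec`, which does not exist in Python 2.7); the helper name is hypothetical, and this is not necessarily how PhyloPyPruner itself detects Matplotlib:

```python
import importlib.util


def matplotlib_available():
    """Return True if Matplotlib can be imported by this interpreter."""
    # find_spec locates a module without importing it.
    return importlib.util.find_spec("matplotlib") is not None


if matplotlib_available():
    print("Matplotlib found: the paralogy frequency plot can be produced")
else:
    print("Matplotlib not found: the .png plot will be skipped")
```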

You can install PhyloPyPruner using pip.

pip install --user phylopypruner

Usage

Once installed, execute this software like so:

python -m phylopypruner

To get a list of options, either run the software without any arguments or use the -h or --help flag.

Either provide a single multiple sequence alignment (MSA) and a Newick tree by using the --msa and --tree flags:

python -m phylopypruner --msa 16s.fas --tree 16s.tre

or provide the path to an input directory containing multiple trees and alignments with --dir <path>.
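If you drive PhyloPyPruner from a Python pipeline, one option is to build the command line yourself and call it through `subprocess`. A minimal Python 3 sketch that uses only the `--msa`, `--tree`, `--dir` and `--output` flags described in this README; the helper names are hypothetical, and treating `--dir` as mutually exclusive with `--msa`/`--tree` is an assumption:

```python
import subprocess
import sys


def build_command(msa=None, tree=None, directory=None, output=None):
    """Return an argument list for `python -m phylopypruner`.

    Pass either msa and tree (one alignment/tree pair) or directory.
    Treating the two modes as mutually exclusive is an assumption.
    """
    if directory is not None:
        if msa is not None or tree is not None:
            raise ValueError("use either --dir or --msa/--tree, not both")
        args = ["--dir", directory]
    elif msa is not None and tree is not None:
        args = ["--msa", msa, "--tree", tree]
    else:
        raise ValueError("provide both msa and tree, or a directory")
    if output is not None:
        args += ["--output", output]
    # sys.executable runs PhyloPyPruner with the current interpreter.
    return [sys.executable, "-m", "phylopypruner"] + args


def run_phylopypruner(**kwargs):
    """Run PhyloPyPruner; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)
```

For example, `run_phylopypruner(msa="16s.fas", tree="16s.tre")` corresponds to the single-pair command above.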

FASTA descriptions and Newick tip names must match exactly and must be in one of the following formats: OTU|ID or OTU@ID, where OTU is the operational taxonomic unit (usually the species) and ID is a unique annotation or sequence identifier. For example: >Meiomenia_swedmarki|Contig00001_Hsp90.
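A minimal sketch of how such names decompose, assuming the name is split at the first `|` or `@` (so the OTU itself must not contain either character, while the ID may). `split_name` is a hypothetical helper, not part of PhyloPyPruner:

```python
import re

# OTU is everything before the first | or @; ID is the remainder.
NAME_PATTERN = re.compile(r"^([^|@]+)[|@](.+)$")


def split_name(name):
    """Split a FASTA description or Newick tip name into (OTU, ID)."""
    # Drop surrounding whitespace and a leading FASTA '>' if present.
    name = name.strip().lstrip(">")
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError("expected 'OTU|ID' or 'OTU@ID', got %r" % name)
    return match.group(1), match.group(2)


print(split_name(">Meiomenia_swedmarki|Contig00001_Hsp90"))
# ('Meiomenia_swedmarki', 'Contig00001_Hsp90')
```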

Sequence data must be valid IUPAC nucleotide or amino acid sequences.
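A minimal validation sketch, assuming the IUPAC one-letter codes for nucleotides and amino acids, plus `-`, `?`, `.` and `*` as permitted gap, missing-data and stop symbols. The names are hypothetical, and PhyloPyPruner's own checks may differ:

```python
# IUPAC one-letter codes for nucleotides, including ambiguity codes.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
# IUPAC one-letter codes for amino acids: the 20 standard residues
# plus B, Z, X, J, U and O (together, all 26 letters).
AMINO_ACID_CODES = set("ACDEFGHIKLMNPQRSTVWYBZXJUO")
# Assumed non-residue symbols: gap, missing data and stop.
OTHER_SYMBOLS = set("-?.*")


def sequence_type(seq):
    """Classify seq as 'nucleotide' or 'amino acid'.

    Raises ValueError if seq contains characters outside both alphabets.
    """
    residues = set(seq.upper()) - OTHER_SYMBOLS
    if residues <= NUCLEOTIDE_CODES:
        return "nucleotide"
    if residues <= AMINO_ACID_CODES:
        return "amino acid"
    invalid = "".join(sorted(residues - AMINO_ACID_CODES))
    raise ValueError("invalid characters: %r" % invalid)
```

Because the nucleotide codes are a subset of the amino acid codes, a short protein fragment made only of letters such as A, C, G and T would be reported as nucleotide; deciding the type once per alignment rather than per sequence avoids most such cases.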

To process multiple files, provide the path to the directory in which they reside.

python -m phylopypruner --dir <path>

The program automatically looks for trees and alignments that share the same name and runs once for each such pair.
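The pairing step can be pictured roughly as follows. This is a minimal sketch, assuming that a pair means an alignment and a tree whose file names are identical apart from the extension, and that the extensions listed below identify each file type; PhyloPyPruner's actual matching rules and accepted extensions may differ, and `pair_files` is a hypothetical helper:

```python
import os

# Assumed file extensions (hypothetical; adjust to match your data).
ALIGNMENT_EXTS = {".fa", ".fas", ".fasta", ".faa", ".fna"}
TREE_EXTS = {".tre", ".tree", ".nwk", ".newick"}


def pair_files(directory):
    """Pair alignments and trees that share a file name stem.

    Returns a list of (alignment_path, tree_path) tuples sorted by stem.
    """
    alignments = {}
    trees = {}
    for entry in os.listdir(directory):
        stem, ext = os.path.splitext(entry)
        path = os.path.join(directory, entry)
        if ext.lower() in ALIGNMENT_EXTS:
            alignments[stem] = path
        elif ext.lower() in TREE_EXTS:
            trees[stem] = path
    # Only stems present among both alignments and trees form a pair.
    shared = sorted(set(alignments) & set(trees))
    return [(alignments[stem], trees[stem]) for stem in shared]
```

Files without a partner (an alignment with no tree, or vice versa) are simply left out of the result in this sketch.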

Output files

The following files are generated after running this program.

  • <timestamp>_<orthologs>/... – output alignments
  • <timestamp>_ppp_summary.log – summary statistics for all alignments
  • <timestamp>_ppp_run.log – detailed report of each performed action
  • <timestamp>_ppp_ortho_stats.csv – statistics for output alignments
  • <timestamp>_ppp_paralog_freq.csv – paralogy frequency data
  • <timestamp>_ppp_paralog_freq.png – paralogy frequency plot*

If no output directory has been specified by the --output flag, then output files will be located within the same directory as the input alignment files.

* – only produced if Matplotlib is installed
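Since output file names are prefixed with a timestamp, several runs can leave reports side by side. A minimal sketch for picking up the most recent report of each kind, relying only on the file name suffixes listed above and on file modification times (the timestamp format is not documented here, so it is not parsed); `latest_outputs` is a hypothetical helper:

```python
import os

# Suffixes of the report files listed above.
OUTPUT_SUFFIXES = (
    "_ppp_summary.log",
    "_ppp_run.log",
    "_ppp_ortho_stats.csv",
    "_ppp_paralog_freq.csv",
    "_ppp_paralog_freq.png",
)


def latest_outputs(directory):
    """Map each report suffix to the newest matching file in directory.

    "Newest" is judged by modification time, so the timestamp format
    in the file names does not need to be known.
    """
    newest = {}
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        for suffix in OUTPUT_SUFFIXES:
            if entry.endswith(suffix):
                mtime = os.path.getmtime(path)
                if suffix not in newest or mtime > newest[suffix][0]:
                    newest[suffix] = (mtime, path)
    return {suffix: path for suffix, (mtime, path) in newest.items()}
```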

© Kocot Lab 2018

Release files

Version 0.1.5

  • phylopypruner-0.1.5.tar.gz – source distribution, 28.3 kB
  • phylopypruner-0.1.5-py3-none-any.whl – built distribution (Python 3 wheel), 36.1 kB

File hashes for phylopypruner-0.1.5.tar.gz

  • SHA256: 7984d1a7ef226e11b14d5ac71ed4e1923900424b82e6926caef2fa6822fe55e3
  • MD5: c0df25c5b59d562a06c4098e19a1929b
  • BLAKE2b-256: b42ea312f7306c30ac82fd1e0ed95ca720b09f9414547956e1f42102df1cc12f

File hashes for phylopypruner-0.1.5-py3-none-any.whl

  • SHA256: 8082ca15c692d7e72d7ce26d7f02ef8aa89ae7c08adf5716c5c3c6e9090a5ca9
  • MD5: e547e846330780ff833b6298e9297374
  • BLAKE2b-256: 41825dc99b7694f061465244135d78cc1366e08ea92aec48c3123c50dc8aba32
