Embeds phylogenetic tree distances and produces representations

Project description

Phylogeny Embedding & Approximate Representation

Goldman Group - European Bioinformatics Institute

PEAR can:

  1. Compute the distance matrix given a set of phylogenetic trees;
  2. Embed and represent the distance matrix in 2D or 3D.

See also the autogenerated documentation and the PyPI page.

PEAR usage

PEAR is both a command-line program and a Python library. It can be installed with python -m pip install pear_ebi or downloaded from GitHub. PEAR is currently compatible with Linux and macOS.

PEAR as a python library

Once installed, PEAR can be used to load Newick trees in Python and represent them in embedded spaces. We recommend using it in Jupyter Notebook or JupyterLab, as these tools allow more interaction with the graphs: the user can use widgets to modify several parameters of the plots. For specific uses and applications, see the examples.

PEAR as a program

Run pear_ebi --help to see the complete list of arguments and flags.

Simple usage

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF

This command calculates the unweighted Robinson-Foulds distances between the trees in the file "beast_run1.trees", which contains 1001 phylogenetic trees.

The flag -m indicates the method used to compute the dissimilarity between phylogenetic trees. In this case, HashRF is used.
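
To make the metric concrete, here is a minimal pure-Python sketch of the unweighted Robinson-Foulds distance. It is an illustration only, not the HashRF algorithm and not PEAR's own code: the function names (parse_newick, bipartitions, rf_distance, distance_matrix) are hypothetical, the parser assumes simple Newick strings without quoted labels or comments, and all trees are assumed to share the same leaf set.

```python
import re

STRUCTURAL = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a simple Newick string into nested tuples; leaves are label strings."""
    text = re.sub(r"\s+", "", text)  # whitespace is not supported inside labels
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip ")"
            # Skip an optional internal label / branch length after ")"
            if pos < len(tokens) and tokens[pos] not in STRUCTURAL:
                pos += 1
            return tuple(children)
        label = tokens[pos].split(":")[0]  # drop the branch length, if any
        pos += 1
        return label

    return parse_node()

def bipartitions(tree):
    """Return the set of non-trivial leaf splits of the tree, viewed as unrooted."""
    leaves = set()
    clades = []

    def collect(node):
        if isinstance(node, str):
            leaves.add(node)
            return frozenset([node])
        clade = frozenset().union(*(collect(child) for child in node))
        clades.append(clade)
        return clade

    collect(tree)
    all_leaves = frozenset(leaves)
    ref = min(all_leaves)  # fixed reference leaf used to canonicalise each split
    splits = set()
    for clade in clades:
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            # Store the side that does not contain the reference leaf
            splits.add(other if ref in clade else clade)
    return splits

def rf_distance(newick_a, newick_b):
    """Unweighted RF distance: number of splits found in exactly one tree."""
    return len(bipartitions(parse_newick(newick_a)) ^ bipartitions(parse_newick(newick_b)))

def distance_matrix(newicks):
    """Pairwise RF distances for a list of Newick strings."""
    splits = [bipartitions(parse_newick(s)) for s in newicks]
    return [[len(a ^ b) for b in splits] for a in splits]

t1 = "((A,B),(C,D),E);"
t2 = "((A,C),(B,D),E);"
t3 = "(((A,B),E),(C,D));"  # same topology as t1, written with a different root
print(rf_distance(t1, t2))  # 4
print(rf_distance(t1, t3))  # 0
```

Comparing split sets directly costs time proportional to the tree size for every pair of trees, which is fine for a toy input; dedicated tools exist because real tree sets are much larger.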

To embed these distances in a lower-dimensional space, we can use PCoA (MDS) or t-SNE:

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pca 2

This embeds the distance matrix in 2 dimensions. Using the flag -quality, one can assess the correlation between the distances in the original N-dimensional space and those in the embedding.
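
As a rough illustration of what embedding and quality assessment involve, the sketch below implements classical MDS (PCoA) with NumPy and then correlates the original distances with the embedded ones. It is written under assumptions and is not PEAR's implementation: the function names pcoa and embedding_quality are hypothetical, and using the Pearson correlation as the quality measure is an assumption, since the text above does not say which correlation PEAR reports.

```python
import numpy as np

def pcoa(dist, n_components=2):
    """Classical MDS (PCoA) of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)   # clamp negative eigenvalues to 0
    return eigvecs[:, order] * np.sqrt(top_vals)

def embedding_quality(dist, coords):
    """Pearson correlation between original and embedded pairwise distances."""
    d = np.asarray(dist, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    embedded = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(d.shape[0], k=1)        # each pair counted once
    return float(np.corrcoef(d[upper], embedded[upper])[0, 1])

# Toy input: distances between four points that already lie in a plane
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
toy_dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
coords = pcoa(toy_dist, n_components=2)
print(coords.shape)                         # (4, 2)
print(embedding_quality(toy_dist, coords))  # close to 1.0 for this toy input
```

Tree distances such as Robinson-Foulds are generally not Euclidean, so on real data some eigenvalues can be negative and the correlation falls below 1; that loss is what a quality check is meant to expose.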

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pca 2 --plot

The flag --plot tells PEAR to plot the embeddings and show them. If an embedding method is specified, the plots are produced anyway. Plotting does not require the number of dimensions to be specified: embeddings in 2 dimensions are plotted in 2D, while in any other case PEAR plots in both 2D and 3D.

One can specify any number of files containing trees. Moreover, it is possible to specify a single directory using -dir, and possibly a pattern using -pattern, in order to select multiple files.
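
In Python terms, combining a directory with a pattern amounts to a glob-style file selection. The sketch below shows the idea with the standard library only; select_tree_files is a hypothetical helper, and whether PEAR interprets its pattern with glob syntax is an assumption.

```python
import tempfile
from pathlib import Path

def select_tree_files(directory, pattern="*"):
    """Return the sorted files in `directory` whose names match a glob pattern."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())

# Demonstration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run1.trees", "run2.trees", "notes.txt"):
        (Path(tmp) / name).write_text("")
    selected = [p.name for p in select_tree_files(tmp, "*.trees")]

print(selected)  # ['run1.trees', 'run2.trees']
```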

It's possible to compute the distance matrix and re-use it in subsequent runs of PEAR by specifying the distance matrix file with the flag -d. Additionally, it's possible to define the name of the output file (-o).

If any additional metadata is available, this may be specified by indicating a .csv file containing a dataframe of compatible shape.
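
One plausible reading of "compatible shape" is one metadata row per tree. The sketch below checks that condition with the standard library; this reading is an assumption, and the helper load_metadata and the column names chain and step are made up for illustration.

```python
import csv
import io

def load_metadata(csv_text, n_trees):
    """Read CSV metadata and check that it has exactly one row per tree."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != n_trees:
        raise ValueError(f"expected {n_trees} metadata rows, found {len(rows)}")
    return rows

# Hypothetical metadata for a set of three trees
example_csv = "chain,step\nrun1,0\nrun1,1000\nrun1,2000\n"
rows = load_metadata(example_csv, n_trees=3)
print(rows[1]["step"])  # 1000
```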

Config file

A standard TOML config file can be used for specific embeddings of multiple sets of trees. Example TOML files are provided in the examples folder.

Using the config file allows one to use all the features of PEAR, including additional embedding methods and plot designs. The config file can also be used to specify lists of indexes of interesting trees in the sets, in order to highlight them in the final plots.

Interactive mode

pear_ebi -i launches the program in interactive mode. Once the program starts, it guides you through its usage with an intuitive interface.

Tutorials and Examples

Follow this link for a complete set of basic and advanced guides and tutorials on using PEAR on the command line and as a Python library.


Licensing

This project is released under the terms of the MIT Open Source License. View LICENSE.txt for more information.
