Embeds phylogenetic tree distances and produces representations

Phylogeny Embedding & Approximate Representation

Goldman Group - European Bioinformatics Institute

PEAR can:

  1. Compute the distance matrix given a set of phylogenetic trees;
  2. Embed and represent the distance matrix in 2D or 3D.

See also the autogenerated documentation and the PyPI page.

PEAR usage

PEAR is both a Python program and a library. It can be installed with python -m pip install pear_ebi or downloaded from GitHub. PEAR is currently compatible only with Linux.

PEAR as a python library

Once installed, PEAR can be used to load Newick trees in Python and represent them in embedded spaces. We recommend using it in Jupyter Notebook or JupyterLab, as these tools allow for more interaction with the graphs. On these platforms, the user can interact with widgets that modify several parameters of the plots. For specific uses and applications, see the examples.

PEAR as a program

Run pear_ebi --help to see the complete list of arguments and flags.

Simple usage

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF

This command calculates the unweighted Robinson-Foulds distances between the trees in the file "beast_run1.trees", which contains 1001 phylogenetic trees.

The flag -m indicates the method used to compute the dissimilarity between phylogenetic trees. In this case, HashRF is used.
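To make the quantity being computed concrete, here is a minimal sketch of the unweighted Robinson-Foulds distance in plain Python. It is not how HashRF or PEAR implement it: the nested-tuple tree representation and the function names (leaf_sets, splits, robinson_foulds) are assumptions chosen for illustration, and both trees are assumed to share the same set of leaves.

```python
def leaf_sets(tree):
    """Return (leaves, clades) for a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        # A single leaf has no internal clades.
        return frozenset([tree]), set()
    leaves, clades = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)
    return leaves, clades


def splits(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted."""
    leaves, clades = leaf_sets(tree)
    reference = min(leaves)
    result = set()
    for clade in clades:
        # Canonical form: keep the side that does not contain the reference leaf.
        side = leaves - clade if reference in clade else clade
        # Splits that isolate fewer than two leaves are present in every tree.
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Unweighted RF distance: number of splits found in only one of the trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical toy trees on five leaves (illustration only).
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as t1

print(robinson_foulds(t1, t2))
print(robinson_foulds(t1, t3))
```

Applying robinson_foulds to every pair of trees in a set yields the kind of distance matrix that the embedding steps operate on. Dedicated tools such as HashRF exist because this pairwise set comparison becomes slow for large tree sets.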

To embed these distances in a lower-dimensional space, we can use PCoA (MDS) or t-SNE:

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2

This embeds the distance matrix in 2 dimensions. Using the flag -quality one can assess the correlation between the distances in the N-dimensional space and in the embedding.
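The embedding step and the correlation check can be sketched with NumPy as follows. This is a generic classical MDS (PCoA) written for illustration, not PEAR's own code. The function names are hypothetical, and using the Pearson correlation as the quality measure is an assumption, since the text above does not say which correlation PEAR reports.

```python
import numpy as np


def pcoa(dist, n_components=2):
    """Classical MDS / PCoA of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def pairwise_distances(coords):
    """Euclidean distances between all pairs of rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embedding_correlation(dist, coords):
    """Pearson correlation between original and embedded distances (assumed measure)."""
    d = np.asarray(dist, dtype=float)
    upper = np.triu_indices(d.shape[0], k=1)
    embedded = pairwise_distances(coords)
    return float(np.corrcoef(d[upper], embedded[upper])[0, 1])


# Toy distance matrix built from four points in the plane (illustration only).
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
toy_dist = pairwise_distances(points)

coords = pcoa(toy_dist, n_components=2)
print(coords.shape)
print(embedding_correlation(toy_dist, coords))
```

For tree distances such as Robinson-Foulds the matrix is generally not Euclidean, so a low-dimensional embedding loses information and the correlation falls below 1. That loss is what a quality check of this kind is meant to expose.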

pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2 --plot

The flag --plot tells PEAR to plot the embeddings and show them. If an embedding method is specified, the plots are produced anyway. Plotting does not require the number of dimensions to be given: embeddings computed in 2 dimensions are plotted in 2D, while in any other case PEAR plots in both 2 and 3 dimensions.

One can specify any number of files containing trees. Moreover, it is possible to specify a single directory using --dir, and possibly a pattern using --pattern, in order to select multiple files.
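The idea of selecting several tree files from one directory with a pattern can be sketched in Python as below. This only illustrates the concept: the helper select_tree_files is hypothetical, and treating the pattern as a shell-style glob is an assumption, as the exact matching rules used by --pattern are not described here.

```python
from pathlib import Path


def select_tree_files(directory, pattern="*"):
    """Return the regular files in `directory` whose names match a glob pattern."""
    # Assumption: the pattern is a shell-style glob such as "*.trees".
    matches = Path(directory).glob(pattern)
    return sorted(path for path in matches if path.is_file())


if __name__ == "__main__":
    # Directory taken from the usage examples above; adjust to your own data.
    for path in select_tree_files("examples_tree_sets/beast_trees", "*.trees"):
        print(path)
```

Sorting the result keeps the order of the tree sets stable across runs, which makes it easier to match them to plots or metadata afterwards.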

Tree Set

It's possible to compute the distance matrix and re-use it in subsequent runs of PEAR by specifying the distance matrix file with the flag -d. Additionally, it's possible to define the name of the output file (-o).

If any additional metadata is available, it can be supplied as a .csv file containing a dataframe of compatible shape.

Config file

A standard TOML config file can be used for specific embeddings of multiple sets of trees. Examples of TOML files are provided in the examples folder.

Using the config file allows one to use all the features of PEAR, including additional embedding methods and plot designs. The config file can also be used to specify lists of indexes of interesting trees in the sets, in order to highlight them in the final plots.

Interactive mode

pear_ebi -i : this command launches the program in interactive mode. Once the program starts, it guides you through its usage with an intuitive interface.

Tutorials and Examples

Follow this link for a complete set of basic and advanced guides and tutorials on using PEAR on the command line and as a Python library.


Licensing

This project is released under the terms of the MIT Open Source License. View LICENSE.txt for more information.
