Run and compare algorithms for phylogenetic reconciliation and super-reconciliation

superrec2

superrec2 is a software package enabling researchers to run and compare algorithms for phylogenetic reconciliation and super-reconciliation.

Installation

superrec2 can be installed through pip. Python ⩾3.11 is required.

$ pip install superrec2

Installing from Git (for development)

You first need to install Hatch, which is used by superrec2 to manage virtual environments and to build and publish packages.

Then clone the repository and use Hatch to install dependencies and start a development shell.

$ git clone https://github.com/UdeM-LBIT/superrec2
$ cd superrec2
$ hatch shell

The following commands are useful for development. Before committing to the main branch, make sure that the unit tests pass and that the code has been reformatted and linted.

Command                 Task
hatch run dev:test      Run all unit tests
hatch run dev:lint      Check the code using Ruff
hatch run dev:format    Reformat the code using Black
hatch build             Build distributable packages
hatch publish           Publish distributable packages to PyPI

Usage

Preparing the input

To submit a reconciliation problem to one of the algorithms of this package, the first step is to prepare an input file containing the desired set of species, genes (or syntenies), and their phylogenetic trees. Input files are JSON objects containing the following keys:

  • object_tree: Newick string specifying the gene or synteny tree (if ancestral nodes are unnamed, they will be automatically named O# with indices increasing in pre-order),
  • species_tree: Newick string specifying the species tree (if ancestral nodes are unnamed, they will be automatically named S# with indices increasing in pre-order),
  • leaf_object_species: dictionary associating each leaf of the gene (or synteny) tree to its corresponding leaf species in the species tree,
  • leaf_syntenies (optional, only for super-reconciliations): dictionary associating each leaf of the synteny tree to its corresponding synteny, specified as an array of genes.
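The pre-order naming of unnamed ancestral nodes mentioned above can be illustrated with a short Python sketch. This is not the code used by superrec2: the function name_internal_nodes is hypothetical, the parser only handles plain Newick strings (names only, no branch lengths or comments), and it assumes that the index advances only when a name is actually assigned.

```python
import re

DELIMITERS = {"(", ")", ",", ";"}


def name_internal_nodes(newick: str, prefix: str) -> str:
    """Name unnamed internal nodes prefix0, prefix1, ... in pre-order."""
    # Split the string into delimiters and node names.
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    pos = 0

    def parse() -> dict:
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(parse())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse())
            pos += 1  # consume ")"
        name = None
        if pos < len(tokens) and tokens[pos] not in DELIMITERS:
            name = tokens[pos].strip()
            pos += 1
        return {"name": name, "children": children}

    root = parse()
    counter = 0

    def label(node: dict) -> None:
        # Pre-order: visit the node itself before its children.
        nonlocal counter
        if node["children"] and node["name"] is None:
            node["name"] = f"{prefix}{counter}"
            counter += 1
        for child in node["children"]:
            label(child)

    def write(node: dict) -> str:
        name = node["name"] or ""
        if node["children"]:
            return "(" + ",".join(write(c) for c in node["children"]) + ")" + name
        return name

    label(root)
    return write(root) + ";"


print(name_internal_nodes("((x_1,x_2),y_1);", "O"))
print(name_internal_nodes("(X,Y);", "S"))
```

The root is visited first and therefore receives index 0; the subtree containing x_1 and x_2 is visited next and receives index 1.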

Below is an example input for some fictional species and syntenies:

{
    "object_tree": "((x_1,x_2),y_1);",
    "species_tree": "(X,Y);",
    "leaf_object_species": {
        "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "leaf_syntenies": {
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    }
}
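Before submitting an input like the one above, it can help to check that its keys are mutually consistent. The following Python sketch is an illustration only and is not part of superrec2: the helpers leaf_names and check_input are hypothetical, and the leaf extraction assumes plain Newick strings with at least two leaves and no branch lengths or comments.

```python
import json
import re


def leaf_names(newick: str) -> set[str]:
    """Leaf labels of a plain Newick string (names only)."""
    # A leaf label directly follows "(" or ","; internal labels follow ")".
    return {name.strip() for name in re.findall(r"[(,]([^(),;]+)", newick)}


def check_input(data: dict) -> list[str]:
    """Return a list of problems found in the input; empty if none."""
    problems = []
    objects = leaf_names(data["object_tree"])
    species = leaf_names(data["species_tree"])
    mapping = data["leaf_object_species"]
    for leaf in sorted(objects - set(mapping)):
        problems.append(f"leaf {leaf!r} has no species")
    for leaf, host in sorted(mapping.items()):
        if host not in species:
            problems.append(f"leaf {leaf!r} maps to unknown species {host!r}")
    # leaf_syntenies is optional (only used for super-reconciliations).
    syntenies = data.get("leaf_syntenies")
    if syntenies is not None:
        for leaf in sorted(objects - set(syntenies)):
            problems.append(f"leaf {leaf!r} has no synteny")
    return problems


example = {
    "object_tree": "((x_1,x_2),y_1);",
    "species_tree": "(X,Y);",
    "leaf_object_species": {"x_1": "X", "x_2": "X", "y_1": "Y"},
    "leaf_syntenies": {
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"],
    },
}

problems = check_input(example)
if problems:
    print("\n".join(problems))
else:
    # Serialized text, ready to be written to an input file.
    print(json.dumps(example, indent=4))
```

Returning a list of messages rather than raising on the first problem makes it possible to report every inconsistency in a single pass.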

A complete input example including some Class-1 CRISPR-Cas systems is also available. (This is the input used for the RECOMB-CG 2022 publication.)

Running reconciliation algorithms

To run a reconciliation algorithm on a given input file, use the superrec2 reconcile command. The basic usage of the command is as follows:

$ superrec2 reconcile --input data/example.in.json --output example.out.json superdtl
Multifurcation resolutions: 100%|##########################################| 1/1 [00:00<00:00, 285.91it/s]
Minimum cost: 2

The --input flag specifies the path to the input file, and the --output flag specifies where to write the results. The last argument is used to select the reconciliation algorithm, in this example superdtl (run superrec2 reconcile --help to see a list of available algorithms). The program prints out the minimum cost of a solution and writes one of the solutions to the output file, which now contains the following object (edited for readability):

{
    "input": {
        "object_tree": "((x_1,x_2)O1,y_1)O0;",
        "species_tree": "(X,Y)S0;",
        "leaf_object_species": { (repeated from above) },
        "costs": {
            "SPECIATION": 0, "DUPLICATION": 1, "HORIZONTAL_TRANSFER": 1,
            "FULL_LOSS": 1, "SEGMENTAL_LOSS": 1
        },
        "leaf_syntenies": { (repeated from above) }
    },
    "object_species": {
        "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "syntenies": {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    },
    "ordered": false
}

Of particular interest are the object_species key, which contains the computed reconciliation (mapping of synteny tree nodes to species tree nodes), and the syntenies key, which contains the labeling of synteny tree nodes with syntenic content.
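As a minimal sketch of how these two keys might be consumed from Python, the hypothetical helper below lists, for each node of the synteny tree, the species it is mapped to and its syntenic content. It relies only on the object_species and syntenies keys shown above; the dictionary is copied from the example output so that the snippet runs on its own.

```python
def summarize(result: dict) -> list[str]:
    """One line per object-tree node: its species and its synteny."""
    syntenies = result.get("syntenies", {})
    lines = []
    for node, species in result["object_species"].items():
        genes = syntenies.get(node)
        content = " ".join(genes) if genes is not None else "(no synteny)"
        lines.append(f"{node} -> {species}: {content}")
    return lines


# Relevant part of the example output, copied from above.
result = {
    "object_species": {
        "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y",
    },
    "syntenies": {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"],
    },
}

for line in summarize(result):
    print(line)
```

To work from an actual output file, the dictionary can instead be obtained with json.load from the standard library.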

Additional options are available to generate all possible solutions and set the individual event costs (for compatible algorithms); please run superrec2 reconcile --help for details.
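When several inputs or algorithms need to be compared, the command can be driven from a script. The sketch below is only one possible way to do so: the helpers reconcile_command and run_reconcile are hypothetical, they use only the --input and --output flags and the algorithm argument described above, and they assume that the superrec2 executable is available on the PATH.

```python
import shlex
import subprocess


def reconcile_command(input_path: str, output_path: str, algorithm: str) -> list[str]:
    """Build the argument list for a superrec2 reconcile call."""
    return [
        "superrec2", "reconcile",
        "--input", input_path,
        "--output", output_path,
        algorithm,
    ]


def run_reconcile(input_path: str, output_path: str, algorithm: str) -> str:
    """Run the command and return whatever it printed on standard output."""
    completed = subprocess.run(
        reconcile_command(input_path, output_path, algorithm),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return completed.stdout


command = reconcile_command("data/example.in.json", "example.out.json", "superdtl")
print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with unusual file names.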

Generating reconciliation diagrams

From a solution

The superrec2 draw command can be used to visualize and inspect solutions generated by reconciliation algorithms. The basic usage of the command is as follows:

$ superrec2 draw --input example.out.json --output example.out.pdf

This generates the following diagram, representing the reconciliation result:

[Figure: reconciliation diagram generated from the output above]

From manual input

The superrec2 draw program can also be used to plot any reconciliation, not just one generated by a reconciliation algorithm. To that end, you need to create a JSON-formatted description of the reconciliation you are interested in plotting, for example:

{
    "input": {
        "object_tree": "((x_1,x_2)O1,y_1)O0;",
        "species_tree": "(X,Y)S0;"
    },
    "object_species": {
        "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
    },
    "syntenies": {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"]
    }
}

Passing this file to superrec2 draw generates the same diagram as in the previous section. Some keys of the previous JSON document (leaf_object_species, costs, leaf_syntenies, and ordered) are omitted here, because this document only describes a drawing and does not result from a reconciliation computation.
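Such a description can also be assembled programmatically. The Python sketch below is an illustration under stated assumptions, not part of superrec2: the helpers node_names and drawing_input are hypothetical, the trees are assumed to be plain Newick strings with every node named, and the check that every node is mapped mirrors the example above rather than a documented requirement.

```python
import json
import re


def node_names(newick: str) -> list[str]:
    """All node labels of a plain Newick string, in order of appearance."""
    return [name.strip() for name in re.findall(r"[^(),;]+", newick)]


def drawing_input(object_tree: str, species_tree: str,
                  object_species: dict, syntenies: dict) -> dict:
    """Assemble a description with the same keys as the example above."""
    species_nodes = set(node_names(species_tree))
    for node in node_names(object_tree):
        if node not in object_species:
            raise ValueError(f"object node {node!r} is not mapped to a species")
        if object_species[node] not in species_nodes:
            raise ValueError(f"object node {node!r} maps to an unknown species")
    return {
        "input": {"object_tree": object_tree, "species_tree": species_tree},
        "object_species": dict(object_species),
        "syntenies": dict(syntenies),
    }


description = drawing_input(
    "((x_1,x_2)O1,y_1)O0;",
    "(X,Y)S0;",
    {"O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"},
    {
        "O0": ["g1", "g2", "g3", "g4"],
        "O1": ["g1", "g2", "g3", "g4"],
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"],
    },
)
print(json.dumps(description, indent=4))
```

The printed text can be saved to a file and passed to superrec2 draw through the --input flag.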

Adding color

If you need to distinguish parts of the object tree in the generated diagram, you can add color to a subtree by specifying the color attribute on its root node.

 {
     "input": {
-        "object_tree": "((x_1,x_2)O1,y_1)O0;",
+        "object_tree": "((x_1,x_2)O1[&&NHX:color=0000FF],y_1)O0;",
         "species_tree": "(X,Y)S0;"
     },
     "object_species": {
         "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
     },
     "syntenies": {
         "O0": ["g1", "g2", "g3", "g4"],
         "O1": ["g1", "g2", "g3", "g4"],
         "x_1": ["g1", "g2", "g3"],
         "x_2": ["g1", "g3", "g4"],
         "y_1": ["g1", "g2", "g3", "g4"]
     }
 }

The input above colors in blue (0000FF) the subtree rooted at O1, which undergoes a duplication event.
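The edit shown in the diff can be automated with a small helper. The Python sketch below is hypothetical and not part of superrec2: add_color assumes a plain Newick string in which the target node is named exactly once and carries no existing annotation, and it reproduces the [&&NHX:color=...] syntax used above.

```python
import re


def add_color(newick: str, node: str, color: str) -> str:
    """Attach an NHX color annotation to the node named `node`."""
    # The name must directly follow a delimiter and be followed by one,
    # so that "O1" does not match inside a longer name such as "XO1".
    pattern = rf"(?<=[(),]){re.escape(node)}(?=[(),;:\[])"
    annotated, count = re.subn(
        pattern,
        lambda match: f"{match.group(0)}[&&NHX:color={color}]",
        newick,
    )
    if count != 1:
        raise ValueError(f"expected exactly one node named {node!r}, found {count}")
    return annotated


print(add_color("((x_1,x_2)O1,y_1)O0;", "O1", "0000FF"))
```

Since the color attribute applies to a whole subtree, annotating an ancestral node such as O1 is enough to color its descendants x_1 and x_2 as well.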

[Figure: reconciliation diagram generated from the input above, with the subtree rooted at O1 colored in blue]

References

  • M. Goodman, J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda, “Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences,” Systematic Biology, vol. 28, no. 2, 1979-06, doi: 10.1093/sysbio/28.2.132.
  • A. Tofigh, M. Hallett, and J. Lagergren, “Simultaneous identification of duplications and lateral gene transfers,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, 2011-03, doi: 10.1109/tcbb.2010.14.
  • M. S. Bansal, E. J. Alm, and M. Kellis, “Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss,” Bioinformatics, vol. 28, no. 12, 2012-06, doi: 10.1093/bioinformatics/bts225.
  • M. Delabre, N. El-Mabrouk, K. T. Huber, M. Lafond, V. Moulton, E. Noutahi, and M. S. Castellanos, “Evolution through segmental duplications and losses: a super-reconciliation approach,” Algorithms for Molecular Biology, vol. 15, Art. no. 12, 2020-05, doi: 10.1186/s13015-020-00171-4.
  • Y. Anselmetti, M. Delabre, and N. El-Mabrouk, “Reconciliation with Segmental Duplication, Transfer, Loss and Gain,” RECOMB-CG 2022, Lecture Notes in Computer Science, vol. 13234, 2022-06, doi: 10.1007/978-3-031-06220-9_8.
