Run and compare algorithms for phylogenetic reconciliation and super-reconciliation
superrec2
superrec2 is a software package enabling researchers to run and compare algorithms for phylogenetic reconciliation and super-reconciliation.
Installation
superrec2 can be installed through pip.
Python ⩾3.11 is required.
$ pip install superrec2
Installing from Git (for development)
You first need to install Hatch, which is used by superrec2 to manage virtual environments and to build and publish packages.
Then clone the repository and use Hatch to install dependencies and start a development shell.
$ git clone https://github.com/UdeM-LBIT/superrec2
$ cd superrec2
$ hatch shell
The following commands are useful for development. You should make sure to pass unit tests and to reformat and lint the code before committing to the main branch.
Command | Task |
---|---|
hatch run dev:test | Run all unit tests |
hatch run dev:lint | Check the code using Ruff |
hatch run dev:format | Reformat the code using Black |
hatch build | Build distributable packages |
hatch publish | Publish distributable packages to PyPI |
Usage
Preparing the input
To submit a reconciliation problem to one of the algorithms of this package, the first step is to prepare an input file containing the desired set of species, genes (or syntenies), and their phylogenetic trees. Input files are JSON objects containing the following keys:
- object_tree: Newick string specifying the gene or synteny tree (if ancestral nodes are unnamed, they will be automatically named O# with indices increasing in pre-order),
- species_tree: Newick string specifying the species tree (if ancestral nodes are unnamed, they will be automatically named S# with indices increasing in pre-order),
- leaf_object_species: dictionary associating each leaf of the gene (or synteny) tree to its corresponding leaf species in the species tree,
- leaf_syntenies (optional, only for super-reconciliations): dictionary associating each leaf of the synteny tree to its corresponding synteny, specified as an array of genes.
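The automatic naming of ancestral nodes can be illustrated with a short Python sketch. This is not the code used by superrec2: the function name `name_internal_nodes` is hypothetical, it only handles plain Newick strings (no branch lengths or comments), and it assumes that already-named internal nodes still consume an index.

```python
def name_internal_nodes(newick: str, prefix: str) -> str:
    """Label unnamed internal nodes as prefix + index, numbering in pre-order."""
    text = newick.strip().rstrip(";")
    pos = 0      # current reading position in `text`
    counter = 0  # next index to hand out

    def read_label() -> str:
        # Read characters up to the next structural delimiter.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    def parse() -> str:
        nonlocal pos, counter
        if pos < len(text) and text[pos] == "(":
            # Reserve the index before visiting the children (pre-order).
            index = counter
            counter += 1
            pos += 1  # skip "("
            children = [parse()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse())
            pos += 1  # skip ")"
            label = read_label() or f"{prefix}{index}"
            return "(" + ",".join(children) + ")" + label
        return read_label()  # leaf: keep its name unchanged

    return parse() + ";"


print(name_internal_nodes("((x_1,x_2),y_1);", "O"))
print(name_internal_nodes("(X,Y);", "S"))
```

The root is visited first and receives index 0, and each internal node is numbered before its descendants.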
Below is an example input for some fictional species and syntenies:
{
"object_tree": "((x_1,x_2),y_1);",
"species_tree": "(X,Y);",
"leaf_object_species": {
"x_1": "X", "x_2": "X", "y_1": "Y"
},
"leaf_syntenies": {
"x_1": ["g1", "g2", "g3"],
"x_2": ["g1", "g3", "g4"],
"y_1": ["g1", "g2", "g3", "g4"]
}
}
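If input files are generated by a script, a few consistency checks before writing the JSON can catch mislabeled leaves early. The following Python sketch builds the example above; the helper names `newick_labels` and `build_input` are hypothetical and not part of superrec2, and the label extraction assumes plain Newick strings without branch lengths.

```python
import json
import re


def newick_labels(newick):
    """Return every node label found in a plain Newick string."""
    return set(re.findall(r"[^(),;\s]+", newick))


def build_input(object_tree, species_tree, leaf_object_species, leaf_syntenies=None):
    """Assemble an input object after checking that its keys are consistent."""
    object_labels = newick_labels(object_tree)
    species_labels = newick_labels(species_tree)
    for leaf, species in leaf_object_species.items():
        if leaf not in object_labels:
            raise ValueError(f"{leaf!r} does not appear in object_tree")
        if species not in species_labels:
            raise ValueError(f"{species!r} does not appear in species_tree")
    data = {
        "object_tree": object_tree,
        "species_tree": species_tree,
        "leaf_object_species": leaf_object_species,
    }
    if leaf_syntenies is not None:
        # Every mapped leaf needs a synteny, and vice versa.
        if set(leaf_syntenies) != set(leaf_object_species):
            raise ValueError("leaf_syntenies and leaf_object_species keys differ")
        data["leaf_syntenies"] = leaf_syntenies
    return data


example = build_input(
    "((x_1,x_2),y_1);",
    "(X,Y);",
    {"x_1": "X", "x_2": "X", "y_1": "Y"},
    {
        "x_1": ["g1", "g2", "g3"],
        "x_2": ["g1", "g3", "g4"],
        "y_1": ["g1", "g2", "g3", "g4"],
    },
)
text = json.dumps(example, indent=2)  # write this string to the input file
print(text)
```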
A complete input example including some Class-1 CRISPR-Cas systems is also available. (This is the input used for the RECOMB-CG 2022 publication.)
Running reconciliation algorithms
To run a reconciliation algorithm on a given input file, use the superrec2 reconcile command.
The basic usage of the command is as follows:
$ superrec2 reconcile --input data/example.in.json --output example.out.json superdtl
Multifurcation resolutions: 100%|##########################################| 1/1 [00:00<00:00, 285.91it/s]
Minimum cost: 2
The --input flag specifies the path to the input file, and the --output flag specifies where to write the results.
The last argument selects the reconciliation algorithm, in this example superdtl (run superrec2 reconcile --help to see a list of available algorithms).
The program prints out the minimum cost of a solution and writes one of the solutions to the output file, which now contains the following object (edited for readability):
{
"input": {
"object_tree": "((x_1,x_2)O1,y_1)O0;",
"species_tree": "(X,Y)S0;",
"leaf_object_species": { (repeated from above) },
"costs": {
"SPECIATION": 0, "DUPLICATION": 1, "HORIZONTAL_TRANSFER": 1,
"FULL_LOSS": 1, "SEGMENTAL_LOSS": 1
},
"leaf_syntenies": { (repeated from above) }
},
"object_species": {
"O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
},
"syntenies": {
"O0": ["g1", "g2", "g3", "g4"],
"O1": ["g1", "g2", "g3", "g4"],
"x_1": ["g1", "g2", "g3"],
"x_2": ["g1", "g3", "g4"],
"y_1": ["g1", "g2", "g3", "g4"]
},
"ordered": false
}
Of particular interest are the object_species key, which contains the computed reconciliation (mapping of synteny tree nodes to species tree nodes), and the syntenies key, which contains the labeling of synteny tree nodes with syntenic content.
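As an illustration of how these two keys can be consumed, the Python sketch below groups synteny tree nodes by the species they are mapped to, and lists the genes of the root synteny that are absent from each leaf. The solution is embedded as a trimmed string for self-containment; in practice it would be read from the output file with `json.load`. The root name `O0` is taken from the example above and is an assumption for other inputs.

```python
import json

# Trimmed copy of the example output shown above.
solution = json.loads("""
{
  "input": {
    "leaf_object_species": {"x_1": "X", "x_2": "X", "y_1": "Y"}
  },
  "object_species": {
    "O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
  },
  "syntenies": {
    "O0": ["g1", "g2", "g3", "g4"],
    "O1": ["g1", "g2", "g3", "g4"],
    "x_1": ["g1", "g2", "g3"],
    "x_2": ["g1", "g3", "g4"],
    "y_1": ["g1", "g2", "g3", "g4"]
  }
}
""")

# Invert the reconciliation: which object nodes live in each species node?
by_species = {}
for node, species in solution["object_species"].items():
    by_species.setdefault(species, []).append(node)

# Genes present at the root (assumed to be named "O0") but absent from each leaf.
root_genes = set(solution["syntenies"]["O0"])
missing = {
    leaf: sorted(root_genes - set(solution["syntenies"][leaf]))
    for leaf in solution["input"]["leaf_object_species"]
}

print(by_species)
print(missing)
```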
Additional options are available to generate all possible solutions and set the individual event costs (for compatible algorithms); please run superrec2 reconcile --help for details.
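To run the command over several input files from a script, it can be wrapped with Python's standard `subprocess` module. This sketch only uses the flags shown in the basic usage above; the function names `reconcile_command` and `run_reconcile` are hypothetical, and the superrec2 executable is assumed to be on the PATH.

```python
import subprocess


def reconcile_command(input_path, output_path, algorithm):
    """Build the argument list for one `superrec2 reconcile` invocation."""
    return [
        "superrec2", "reconcile",
        "--input", str(input_path),
        "--output", str(output_path),
        algorithm,
    ]


def run_reconcile(input_path, output_path, algorithm="superdtl"):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        reconcile_command(input_path, output_path, algorithm),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout


print(reconcile_command("data/example.in.json", "example.out.json", "superdtl"))
```

Calling `run_reconcile("data/example.in.json", "example.out.json")` would then execute the same command as in the basic usage example.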
Generating reconciliation diagrams
From a solution
The superrec2 draw command can be used to visualize and inspect solutions generated by reconciliation algorithms.
The basic usage of the command is as follows:
$ superrec2 draw --input example.out.json --output example.out.pdf
This generates the following diagram, representing the reconciliation result:
From manual input
The superrec2 draw program can also be used to plot any reconciliation, not just one generated by a reconciliation algorithm.
To that end, you need to create a JSON-formatted description of the reconciliation you are interested in plotting, for example:
{
"input": {
"object_tree": "((x_1,x_2)O1,y_1)O0;",
"species_tree": "(X,Y)S0;"
},
"object_species": {
"O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
},
"syntenies": {
"O0": ["g1", "g2", "g3", "g4"],
"O1": ["g1", "g2", "g3", "g4"],
"x_1": ["g1", "g2", "g3"],
"x_2": ["g1", "g3", "g4"],
"y_1": ["g1", "g2", "g3", "g4"]
}
}
This will generate the same diagram as in the previous section. Notice that some parts are omitted compared to the previous JSON document: this is because this one is only for drawing and does not result from a reconciliation computation.
Adding color
If you need to distinguish parts of the object tree in the generated diagram, you can add color to a subtree by specifying the color attribute on its root node.
{
"input": {
- "object_tree": "((x_1,x_2)O1,y_1)O0;",
+ "object_tree": "((x_1,x_2)O1[&&NHX:color=0000FF],y_1)O0;",
"species_tree": "(X,Y)S0;"
},
"object_species": {
"O0": "S0", "O1": "X", "x_1": "X", "x_2": "X", "y_1": "Y"
},
"syntenies": {
"O0": ["g1", "g2", "g3", "g4"],
"O1": ["g1", "g2", "g3", "g4"],
"x_1": ["g1", "g2", "g3"],
"x_2": ["g1", "g3", "g4"],
"y_1": ["g1", "g2", "g3", "g4"]
}
}
The input above colors in blue the subtree rooted at O1, which undergoes a duplication event.
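When several subtrees need to be colored, the annotation can be added programmatically. The Python sketch below inserts the same `[&&NHX:color=...]` tag after a named node; the helper `color_subtree` is hypothetical, and it assumes a plain Newick string in which the node name appears exactly once and carries no existing annotation.

```python
import re


def color_subtree(newick: str, node: str, color: str) -> str:
    """Append an NHX color attribute to the node named `node`."""
    # Match the label only when it is delimited by Newick punctuation,
    # so that "O1" does not match inside a longer name such as "O10".
    pattern = r"(?<![^(),])" + re.escape(node) + r"(?=[,);])"
    result, count = re.subn(
        pattern,
        lambda match: f"{match.group(0)}[&&NHX:color={color}]",
        newick,
    )
    if count != 1:
        raise ValueError(f"expected exactly one node named {node!r}, found {count}")
    return result


print(color_subtree("((x_1,x_2)O1,y_1)O0;", "O1", "0000FF"))
```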
References
- M. Goodman, J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda, “Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences,” Systematic Biology, vol. 28, Art. no. 2, 1979-06, doi: 10.1093/sysbio/28.2.132.
- A. Tofigh, M. Hallett, and J. Lagergren, “Simultaneous identification of duplications and lateral gene transfers,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, Art. no. 2, 2011-03, doi: 10.1109/tcbb.2010.14.
- M. S. Bansal, E. J. Alm, and M. Kellis, “Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss,” Bioinformatics, vol. 28, Art. no. 12, 2012-06, doi: 10.1093/bioinformatics/bts225.
- M. Delabre, N. El-Mabrouk, K. T. Huber, M. Lafond, V. Moulton, E. Noutahi, and M. S. Castellanos, “Evolution through segmental duplications and losses: a super-reconciliation approach,” Algorithms for Molecular Biology, vol. 15, Art. no. 12, 2020-05, doi: 10.1186/s13015-020-00171-4.
- Y. Anselmetti, M. Delabre, and N. El-Mabrouk, “Reconciliation with Segmental Duplication, Transfer, Loss and Gain,” RECOMB-CG 2022, Lecture Notes in Computer Science, vol. 13234, 2022-06, doi: 10.1007/978-3-031-06220-9_8.