OMAmo - orthology-based model organism selection
OMAMO is a tool that suggests the best model organism for studying a biological process, based on the orthologous relationships between each candidate species and human.
The user can propose several species as candidate model organisms; the algorithm ranks them for a given biological process (searched by GO term or GO ID) and reports the result as a data frame.
The following Python packages are needed: numpy, matplotlib and pandas (pickle is also used, but it is part of the Python standard library). In addition, you need to install pyOMA.
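Before going further, it can help to confirm that the dependencies are importable. The snippet below is a minimal sketch, not part of OMAMO: the helper name missing_modules is hypothetical, and the import name pyoma for pyOMA is an assumption based on the pyoma.browser.db module path that appears in the error messages further down.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: pyOMA is imported as "pyoma" (as in pyoma.browser.db).
required = ["numpy", "matplotlib", "pandas", "pyoma"]
missing = missing_modules(required)
print("missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing would need to be installed before running omamo.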
Firstly, download the OMA dataset:
wget https://omabrowser.org/All/OmaServer.h5 -O data/OmaServer.h5 #caution: 94GB
Secondly, use the file data/oma-species.txt to find the five-letter UniProt code for each species of interest. For example, consider three species: Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe. Their UniProt codes are DICDI, NEUCR and SCHPO, respectively.
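The lookup in the species file can also be scripted. The following is a rough sketch under stated assumptions: it assumes data/oma-species.txt is tab-separated, that comment lines start with "#", and that the five-letter code is the first column. The function name find_species_codes and the sample rows are made up for illustration; check them against the actual file layout.

```python
def find_species_codes(lines, name_fragment):
    """Return the first-column codes of rows whose other columns mention `name_fragment`.

    Assumes tab-separated rows with the species code in column 1 and
    '#' marking comment lines (an assumption about the file layout).
    """
    fragment = name_fragment.lower()
    codes = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if any(fragment in field.lower() for field in fields[1:]):
            codes.append(fields[0])
    return codes


# Hypothetical sample rows; the real file may have more columns.
sample = [
    "# code<TAB>scientific name",
    "DICDI\tDictyostelium discoideum",
    "NEUCR\tNeurospora crassa",
    "SCHPO\tSchizosaccharomyces pombe",
]
print(find_species_codes(sample, "neurospora"))

# With the real file, one would pass an open file handle instead:
# with open("data/oma-species.txt") as handle:
#     print(find_species_codes(handle, "Dictyostelium"))
```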
Install omamo from the git checkout:
pip install <path_to_omamo.git>
Once the package is installed, you should be able to run omamo as a command. Run omamo -h to see the available options:
usage: omamo [-h] --db DB [--query QUERY] [--ic IC] [--h5-out H5_OUT]
             [--tsv-out TSV_OUT] --models MODELS [MODELS ...]

Run omamo for a set of model organisms

optional arguments:
  -h, --help            show this help message and exit
  --db DB               Path to the HDF5 database
  --query QUERY         Name of the Query species, defaults to HUMAN
  --ic IC               Path to the information content file (tsv format)
  --h5-out H5_OUT       Path to the HDF5 output file. If omitted, not stored
                        in this format
  --tsv-out TSV_OUT     Path to the TSV output file. If omitted, not stored
                        in this format
  --models MODELS [MODELS ...]
                        List of model species, or a path to a txt file with
                        the model species
To create the omamo data for Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe, we would run omamo with the following parameters:
omamo --db OmaServer.h5 --query HUMAN --tsv-out omamo_output_df.csv --models DICDI NEUCR SCHPO
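When omamo is driven from a script, the same invocation can be assembled programmatically. This is only a sketch: the helper build_omamo_command is hypothetical and merely arranges the flags listed in the help text (--db, --query, --ic, --h5-out, --tsv-out, --models); it does not run anything by itself.

```python
import shlex


def build_omamo_command(db, models, query="HUMAN", tsv_out=None, h5_out=None, ic=None):
    """Assemble an omamo argument list from the documented command-line flags."""
    if not models:
        raise ValueError("at least one model species (or a species list file) is required")
    cmd = ["omamo", "--db", str(db), "--query", query]
    if ic is not None:
        cmd += ["--ic", str(ic)]
    if h5_out is not None:
        cmd += ["--h5-out", str(h5_out)]
    if tsv_out is not None:
        cmd += ["--tsv-out", str(tsv_out)]
    cmd += ["--models", *models]
    return cmd


cmd = build_omamo_command(
    "OmaServer.h5",
    ["DICDI", "NEUCR", "SCHPO"],
    tsv_out="omamo_output_df.csv",
)
print(shlex.join(cmd))

# To actually launch the run (requires omamo on PATH and the database file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if a path contains spaces.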
You might see errors such as OSError: ``OmaServer.h5.idx`` does not exist and pyoma.browser.db.DBConsistencyError: Suffix index for protein sequences is not available; both can be ignored.
Finally, the output data frame is available as the TSV file omamo_output_df.csv. For example, for the GO ID GO:0000472, "endonucleolytic cleavage to generate mature 5'-end of SSU-rRNA", OMAMO provides the following ranking of potential model organisms:
head -n 1 omamo_output_df.csv > ranked_organisms.csv
awk '$1 == 472' omamo_output_df.csv >> ranked_organisms.csv
cat ranked_organisms.csv
GOnr  Species  QuerySpeciesGenes  ModelSpeciesGenes   NrOrthologs  FuncSim_Mean  FuncSim_Std  Score
472   DICDI    NOP9;TBL3;ABT1     Q551Y5;Q7KWS8;esf2  3            0.9095        0.1567       2.7286
472   NEUCR    NOP9;TBL3          nop9;pod-5          2            1.0000        0.0000       2.0000
472   SCHPO    NOP9;TBL3          nop9;utp13          2            1.0000        0.0000       2.0000
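The same filtering step can be done in Python. The sketch below uses only the standard library and assumes the output file is tab-delimited with the header shown above, and that the GOnr column holds the GO ID as a plain integer (472 for GO:0000472), as the awk command implies. The function name rank_models is hypothetical; the sample rows are the ones listed above.

```python
import csv
import io


def rank_models(handle, go_number):
    """Return rows for one GO number, sorted by Score from highest to lowest."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [row for row in reader if int(row["GOnr"]) == go_number]
    # list.sort is stable, so species with equal scores keep their file order.
    rows.sort(key=lambda row: float(row["Score"]), reverse=True)
    return rows


header = ["GOnr", "Species", "QuerySpeciesGenes", "ModelSpeciesGenes",
          "NrOrthologs", "FuncSim_Mean", "FuncSim_Std", "Score"]
records = [
    ["472", "DICDI", "NOP9;TBL3;ABT1", "Q551Y5;Q7KWS8;esf2", "3", "0.9095", "0.1567", "2.7286"],
    ["472", "NEUCR", "NOP9;TBL3", "nop9;pod-5", "2", "1.0000", "0.0000", "2.0000"],
    ["472", "SCHPO", "NOP9;TBL3", "nop9;utp13", "2", "1.0000", "0.0000", "2.0000"],
]
sample = "\n".join("\t".join(fields) for fields in [header, *records])

for row in rank_models(io.StringIO(sample), 472):
    print(row["Species"], row["NrOrthologs"], row["Score"])

# With the real output file:
# with open("omamo_output_df.csv", newline="") as handle:
#     ranked = rank_models(handle, 472)
```

Since pandas is already a dependency, pandas.read_csv with sep="\t" would be an equally reasonable choice; the csv module is used here only to keep the sketch dependency-free.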
You can also visit the OMAMO website, where you can browse biological processes to study in 50 unicellular species.
- store ic values in hdf5 database
- Overhaul and creating pip package
- Initial release
Alina Nicheperovich, Adrian M Altenhoff, Christophe Dessimoz, Sina Majidian, "OMAMO: orthology-based model organism selection", submitted to Bioinformatics journal, preprint.
OMAMO is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
OMAMO is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with OMAMO. If not, see http://www.gnu.org/licenses/.