OMAMO: orthology-based model organism selection
OMAMO is a tool that suggests the best model organism for studying a biological process, based on the orthologous relationships between a candidate species and human.
The user can supply several species as potential model organisms, and the algorithm ranks them. The output for a given biological process (searched by GO term or GO ID) is produced as a data frame.
Dependencies
The following Python packages are needed: numpy, matplotlib and pandas (pickle is also used, but it ships with the Python standard library). In addition, you need to install pyOMA.
Pipeline
Firstly, download the OMA dataset:
wget https://omabrowser.org/All/OmaServer.h5 -O data/OmaServer.h5 #caution: 94GB
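Because the download is large, it can help to check free disk space first. The following is a minimal sketch using only the Python standard library; the helper name has_room is hypothetical, and the 94 GB figure is taken from the comment on the wget command.

```python
import shutil

# Size quoted in the wget comment (94 GB), expressed in bytes.
REQUIRED_BYTES = 94 * 10**9


def has_room(directory: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `directory` has enough free space."""
    free = shutil.disk_usage(directory).free
    return free >= required_bytes


# Check the current directory; point this at the download target if it differs.
print("enough space for OmaServer.h5:", has_room("."))
```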
Secondly, use the file data/oma-species.txt to find the five-letter UniProt code for each species of interest. For example, consider the three species Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe. Their UniProt codes are DICDI, NEUCR and SCHPO, respectively.
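The lookup can also be scripted. The sketch below assumes that oma-species.txt is tab-separated, that comment lines start with "#", and that the five-letter code is the first field of each row; these layout details are assumptions, so check them against your copy of the file. The function name find_species_codes and the two-column sample rows are illustrative only.

```python
from typing import Dict, Iterable, Sequence


def find_species_codes(lines: Iterable[str], names: Sequence[str]) -> Dict[str, str]:
    """Map each requested scientific name to the code in the first column.

    Assumes tab-separated rows with the five-letter code first and the
    scientific name somewhere in the remaining fields.
    """
    codes: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        for name in names:
            if name in codes:
                continue  # keep the first match for each name
            if any(name.lower() in f.lower() for f in fields[1:]):
                codes[name] = fields[0]
    return codes


# Illustrative rows only; the real file has more columns and many more species.
sample = [
    "# code\tscientific name\n",
    "DICDI\tDictyostelium discoideum\n",
    "NEUCR\tNeurospora crassa\n",
    "SCHPO\tSchizosaccharomyces pombe\n",
]

wanted = ["Dictyostelium discoideum", "Neurospora crassa", "Schizosaccharomyces pombe"]
print(find_species_codes(sample, wanted))
```

To use it on the real file, pass an open file handle instead of the sample list, for example open("data/oma-species.txt").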
Install omamo from the git checkout:
pip install <path_to_omamo.git>
Once the package is installed, you should be able to run omamo as a command. Run omamo -h to see the available options:
usage: omamo [-h] --db DB [--query QUERY] [--ic IC] [--h5-out H5_OUT] [--tsv-out TSV_OUT] --models MODELS [MODELS ...]
Run omamo for a set of model organisms
optional arguments:
-h, --help show this help message and exit
--db DB Path to the HDF5 database
--query QUERY Name of the Query species, defaults to HUMAN
--ic IC Path to the information content file (tsv format)
--h5-out H5_OUT Path to the HDF5 output file. If omitted, not stored in this format
--tsv-out TSV_OUT Path to the TSV output file. If omitted, not stored in this format
--models MODELS [MODELS ...]
List of model species, or a path to a txt file with the model species
To create the omamo data for Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe, we would run omamo with the following parameters:
omamo --db OmaServer.h5 --query HUMAN --tsv-out omamo_output_df.csv --models DICDI NEUCR SCHPO
You might see the errors OSError: ``OmaServer.h5.idx`` does not exist and pyoma.browser.db.DBConsistencyError: Suffix index for protein sequences is not available. Both can be ignored.
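For scripted runs, the same call can be assembled from Python. This is a minimal sketch that uses only flags listed in the help output above; the function name build_omamo_command is hypothetical, and the line that launches omamo is left commented out because it needs the installed package and the downloaded database.

```python
import subprocess
from typing import List, Optional, Sequence


def build_omamo_command(
    db: str,
    models: Sequence[str],
    query: str = "HUMAN",
    tsv_out: Optional[str] = None,
) -> List[str]:
    """Build the argument list for an omamo run from the documented flags."""
    if not models:
        raise ValueError("at least one model species is required")
    cmd = ["omamo", "--db", db, "--query", query]
    if tsv_out is not None:
        cmd += ["--tsv-out", tsv_out]
    cmd += ["--models", *models]
    return cmd


cmd = build_omamo_command(
    "OmaServer.h5",
    ["DICDI", "NEUCR", "SCHPO"],
    tsv_out="omamo_output_df.csv",
)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run omamo
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.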
Finally, the output data frame is written to the TSV file omamo_output_df.csv. For example, for the GO ID GO:0000472, "endonucleolytic cleavage to generate mature 5'-end of SSU-rRNA", OMAMO provides the following ranking for potential model organisms:
head -n 1 omamo_output_df.csv > ranked_organisms.csv
awk '$1 == 472' omamo_output_df.csv >> ranked_organisms.csv
cat ranked_organisms.csv
GOnr Species QuerySpeciesGenes ModelSpeciesGenes NrOrthologs FuncSim_Mean FuncSim_Std Score
472 DICDI NOP9;TBL3;ABT1 Q551Y5;Q7KWS8;esf2 3 0.9095 0.1567 2.7286
472 NEUCR NOP9;TBL3 nop9;pod-5 2 1.0000 0.0000 2.0000
472 SCHPO NOP9;TBL3 nop9;utp13 2 1.0000 0.0000 2.0000
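The same filtering can be done in Python. The sketch below uses the standard-library csv module and the three rows shown above as inline sample data. It assumes the output is tab-separated, that GOnr holds the numeric part of the GO ID, and that a higher Score means a better rank; the function names are hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple

# The rows shown above, as tab-separated text.
SAMPLE_TSV = (
    "GOnr\tSpecies\tQuerySpeciesGenes\tModelSpeciesGenes\tNrOrthologs\t"
    "FuncSim_Mean\tFuncSim_Std\tScore\n"
    "472\tDICDI\tNOP9;TBL3;ABT1\tQ551Y5;Q7KWS8;esf2\t3\t0.9095\t0.1567\t2.7286\n"
    "472\tNEUCR\tNOP9;TBL3\tnop9;pod-5\t2\t1.0000\t0.0000\t2.0000\n"
    "472\tSCHPO\tNOP9;TBL3\tnop9;utp13\t2\t1.0000\t0.0000\t2.0000\n"
)


def go_id_to_nr(go_id: str) -> int:
    """Turn 'GO:0000472' (or 'GO0000472') into the integer 472."""
    return int(go_id.upper().replace("GO", "").replace(":", ""))


def rank_species(handle: TextIO, go_nr: int) -> List[Tuple[str, float]]:
    """Return (species, score) pairs for one GO number, best score first."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [row for row in reader if int(row["GOnr"]) == go_nr]
    # sort() is stable, so species with equal scores keep their file order
    rows.sort(key=lambda row: float(row["Score"]), reverse=True)
    return [(row["Species"], float(row["Score"])) for row in rows]


ranking = rank_species(io.StringIO(SAMPLE_TSV), go_id_to_nr("GO:0000472"))
for species, score in ranking:
    print(species, score)
```

For the real output, replace the io.StringIO object with open("omamo_output_df.csv", newline=""). Loading the file with pandas.read_csv and sep="\t" would work equally well.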
OMAMO Website
You can also visit the OMAMO website, where you can browse biological processes to study in 50 unicellular species.
Change log
Version 0.2.1
- store ic values in hdf5 database
Version 0.2.0
- Overhaul and creating pip package
Version 0.0.1
- Initial release
Citation
Alina Nicheperovich, Adrian M Altenhoff, Christophe Dessimoz, Sina Majidian, "OMAMO: orthology-based model organism selection", submitted to Bioinformatics journal, preprint.
License
OMAMO is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
OMAMO is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with OMAMO. If not, see http://www.gnu.org/licenses/.