DiShIn: Semantic Similarity Measures using Disjunctive Shared Information
This software package provides the basic functions to start using semantic similarity measures directly from an RDF or OWL file.
A web tool using this package is available at: http://labs.fc.ul.pt/dishin/
Package documentation: https://dishin.readthedocs.io/en/latest/
INSTALLATION
Either clone this repository or install from PyPI:
pip install ssmpy
Reference:
- F. Couto and A. Lamurias, “Semantic similarity definition,” in Encyclopedia of Bioinformatics and Computational Biology (S. Ranganathan, K. Nakai, C. Schönbach, and M. Gribskov, eds.), vol. 1, pp. 870–876, Oxford: Elsevier, 2019 [https://doi.org/10.1016/B978-0-12-809633-8.20401-9] [https://www.researchgate.net/publication/323219905_Semantic_Similarity_Definition]
USAGE:
You can use DiShIn as a command line tool with the dishin.py script of this repository:
python dishin.py <semanticbase>.db <term1> <term2>
python dishin.py <semanticbase>.[owl|rdf] <semanticbase>.db <name_prefix> <relation> <annotation_file>
or use the python functions directly:
>>> import ssmpy
You can find more usage examples at https://dishin.readthedocs.io/en/latest/other_examples.html.
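The two command-line forms shown earlier in this section can also be driven from a Python script. The following is a minimal sketch, assuming dishin.py sits in the current working directory; the helper names similarity_command, create_command and run are hypothetical and are not part of ssmpy.

```python
import subprocess
import sys


def similarity_command(semantic_base, term1, term2, script="dishin.py"):
    """Build the argv list for: python dishin.py <semanticbase>.db <term1> <term2>."""
    return [sys.executable, script, semantic_base, term1, term2]


def create_command(ontology, semantic_base, name_prefix, relation,
                   annotation_file="", script="dishin.py"):
    """Build the argv list for creating a semantic base from an owl or rdf file.

    An empty annotation_file mirrors the '' argument used when no
    annotation file is supplied.
    """
    return [sys.executable, script, ontology, semantic_base,
            name_prefix, relation, annotation_file]


def run(command):
    """Run a command built by the helpers above and return its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run(similarity_command("metals.db", "copper", "gold")) would return the same text that the command prints in a terminal, provided metals.db already exists. Passing the arguments as a list avoids shell quoting issues with the # character in the relation URL.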
Metals Example
To create the semantic base file (metals.db) from the metals.owl file:
python dishin.py metals.owl metals.db https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl# http://www.w3.org/2000/01/rdf-schema#subClassOf metals.txt
The metals.txt file contains a list of occurrences. For example, the following contents have one occurrence of each term, except gold and silver, which have two occurrences each.
gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
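As a quick sanity check of the occurrence list above, the terms can be tallied in Python. This is only an illustrative sketch: the list is embedded as a string here rather than read from metals.txt, and splitting on whitespace works whether the terms are separated by spaces or by newlines.

```python
from collections import Counter

# Terms copied from the occurrence list above, one entry per occurrence.
occurrences = """
gold silver gold silver copper
platinum palladium metal coinage precious
""".split()

# Count how many times each term occurs.
counts = Counter(occurrences)

for term, n in sorted(counts.items()):
    print(term, n)
```

The tally shows two occurrences for gold and silver and one for every other term.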
Now to calculate the similarity between copper and gold execute:
python dishin.py metals.db copper gold
Output:
Resnik DiShIn intrinsic 0.2938933324510595
Resnik MICA intrinsic 0.587786664902119
Lin DiShIn intrinsic 0.19539774554219633
Lin MICA intrinsic 0.39079549108439265
JC DiShIn intrinsic 0.41316029085112316
JC MICA intrinsic 0.5456783339686456
Resnik DiShIn extrinsic 0.22599256187152864
Resnik MICA extrinsic 0.45198512374305727
Lin DiShIn extrinsic 0.1504595366201814
Lin MICA extrinsic 0.3009190732403628
JC DiShIn extrinsic 0.3918424740632774
JC MICA extrinsic 0.47617668319259754
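The intrinsic values above can be cross-checked against each other. The sketch below assumes the usual textbook definitions: Resnik is the shared information content (IC), Lin is 2 * shared / (IC(t1) + IC(t2)), and the Jiang-Conrath (JC) similarity is the inverse of the distance IC(t1) + IC(t2) - 2 * shared. These formulas are inferred from the printed numbers and are not taken from the ssmpy source code.

```python
import math

# Intrinsic values copied from the output above (copper vs. gold).
resnik_mica = 0.587786664902119
resnik_dishin = 0.2938933324510595
lin_mica = 0.39079549108439265
lin_dishin = 0.19539774554219633
jc_mica = 0.5456783339686456
jc_dishin = 0.41316029085112316

# Assumed: Lin = 2 * shared / (IC(t1) + IC(t2)), so the IC sum of the
# two terms can be recovered from the Resnik and Lin MICA values.
ic_sum = 2 * resnik_mica / lin_mica


def lin(shared_ic):
    """Lin similarity for a given shared IC (assumed definition)."""
    return 2 * shared_ic / ic_sum


def jiang_conrath(shared_ic):
    """Jiang-Conrath similarity as the inverse of the distance (assumed definition)."""
    return 1 / (ic_sum - 2 * shared_ic)


# Each comparison prints True if the reported value matches the formula.
print(math.isclose(lin(resnik_dishin), lin_dishin, rel_tol=1e-9))
print(math.isclose(jiang_conrath(resnik_mica), jc_mica, rel_tol=1e-9))
print(math.isclose(jiang_conrath(resnik_dishin), jc_dishin, rel_tol=1e-9))
```

Under these assumptions, the only difference between the MICA and DiShIn rows is the shared IC plugged into each formula.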
Using the Python functions directly (first download metals.owl and metals.txt from this repository):
>>> ssmpy.create_semantic_base("metals.owl", "metals.db", "https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl#", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "metals.txt")
>>> ssmpy.semantic_base("metals.db")
>>> e1 = ssmpy.get_id("copper")
>>> e2 = ssmpy.get_id("gold")
>>> ssmpy.ssm_resnik(e1, e2)
Gene Ontology (GO) and UniProt proteins Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz
gunzip -N go201907.db.gz
Now to calculate the similarity between maltose biosynthetic process and maltose catabolic process execute:
python dishin.py go.db GO_0000023 GO_0000025
Output:
Resnik DiShIn intrinsic 3.7851272458782113
Resnik MICA intrinsic 8.911024218626034
Lin DiShIn intrinsic 0.4088671082942098
Lin MICA intrinsic 0.9625633347404052
JC DiShIn intrinsic 0.09136641197816901
JC MICA intrinsic 1.442695040888967
Resnik DiShIn extrinsic 4.273448119532465
Resnik MICA extrinsic 10.354796690276364
Lin DiShIn extrinsic 0.3919119421698985
Lin MICA extrinsic 0.9496239027945961
JC DiShIn extrinsic 0.0754073347935026
JC MICA extrinsic 0.9102392266268364
Now to calculate the similarity between proteins Q12345 and Q12346 execute:
python dishin.py go.db Q12345 Q12346
Output:
Resnik DiShIn intrinsic 1.3730675314939769
Resnik MICA intrinsic 1.653493583942882
Lin DiShIn intrinsic 0.16453282374961184
Lin MICA intrinsic 0.19975479444590458
JC DiShIn intrinsic 0.081825490673384
JC MICA intrinsic 0.09503231097236876
Resnik DiShIn extrinsic 0.9309878004221438
Resnik MICA extrinsic 1.143670161919403
Lin DiShIn extrinsic 0.15280642004118333
Lin MICA extrinsic 0.19273825637513847
JC DiShIn extrinsic 0.1013441951183969
JC MICA extrinsic 0.11970943511723715
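When the command-line output needs further processing, it can be turned into a dictionary. The following is a minimal sketch that assumes the output keeps the layout shown above (measure, method, IC type, value, separated by whitespace); parse_dishin_output is a hypothetical helper, not an ssmpy function.

```python
import re

# One record: measure, method, IC type, and a floating-point value.
RECORD = re.compile(
    r"(Resnik|Lin|JC)\s+(DiShIn|MICA)\s+(intrinsic|extrinsic)\s+"
    r"([0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)"
)


def parse_dishin_output(text):
    """Map (measure, method, ic_type) tuples to their float values."""
    return {
        (measure, method, ic_type): float(value)
        for measure, method, ic_type, value in RECORD.findall(text)
    }


# A few records copied from the Q12345 / Q12346 output above.
sample = """
Resnik DiShIn intrinsic 1.3730675314939769
Resnik MICA intrinsic 1.653493583942882
Lin MICA extrinsic 0.19273825637513847
"""

scores = parse_dishin_output(sample)
print(scores[("Resnik", "MICA", "intrinsic")])
```

Because the pattern matches on whitespace rather than on line breaks, it also copes with records separated by tabs or run together on a single line.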
To create an updated version of the database, download the ontology and annotations:
wget http://purl.obolibrary.org/obo/go.owl
wget http://geneontology.org/gene-associations/goa_uniprot_all_noiea.gaf.gz
gunzip goa_uniprot_all_noiea.gaf.gz
And then create the new database:
python dishin.py go.owl go.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf goa_uniprot_all_noiea.gaf
Chemical Entities of Biological Interest (ChEBI) Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/chebi201907.db.gz
gunzip -N chebi201907.db.gz
Now to calculate the similarity between aripiprazole and bithionol execute:
python dishin.py chebi.db CHEBI_31236 CHEBI_3131
Output:
Resnik DiShIn intrinsic 1.3532341094444025
Resnik MICA intrinsic 5.3808132551673
Lin DiShIn intrinsic 0.12372266288871554
Lin MICA intrinsic 0.49195371280548356
JC DiShIn intrinsic 0.05216806727627202
JC MICA intrinsic 0.08997939012118301
To create an updated version of the database, download the ontology:
wget ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi_lite.owl
And then create the new database:
python dishin.py chebi_lite.owl chebi.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''
Human Phenotype (HP) Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/hp201907.db.gz
gunzip -N hp201907.db.gz
Now to calculate the similarity between Optic nerve coloboma and Optic nerve dysplasia execute:
python dishin.py hp.db HP_0000588 HP_0001093
Output:
Resnik DiShIn intrinsic 4.514739038358012
Resnik MICA intrinsic 5.917583373691076
Lin DiShIn intrinsic 0.5079590611976912
Lin MICA intrinsic 0.665794870870856
JC DiShIn intrinsic 0.11433121677975834
JC MICA intrinsic 0.16832667824491762
To create an updated version of the database, download the ontology:
wget http://purl.obolibrary.org/obo/hp.owl
And then create the new database:
python dishin.py hp.owl hp.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''
Human Disease Ontology (HDO) Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/doid201907.db.gz
gunzip -N doid201907.db.gz
Now to calculate the similarity between Asthma and Lung cancer execute:
python dishin.py doid.db DOID_2841 DOID_1324
Output:
Resnik DiShIn intrinsic 2.316903156622129
Resnik MICA intrinsic 3.730767546816189
Lin DiShIn intrinsic 0.40974430023007496
Lin MICA intrinsic 0.6597862035890811
JC DiShIn intrinsic 0.14980794775373127
JC MICA intrinsic 0.2599100799712222
To create an updated version of the database, download the ontology:
wget http://purl.obolibrary.org/obo/doid.owl
And then create the new database:
python dishin.py doid.owl doid.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''
Radiology Lexicon (RadLex) Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/radlex201907.db.gz
gunzip -N radlex201907.db.gz
Now to calculate the similarity between nervous system of right upper limb and nervous system of left upper limb execute:
python dishin.py radlex.db RID16139 RID16140
Output:
Resnik MICA intrinsic 9.363855135365721
Lin MICA intrinsic 0.9310781524369027
JC MICA intrinsic 0.7213475204444816
To create an updated version of the database, download the RDF/XML version from http://bioportal.bioontology.org/ontologies/RADLEX and save it as radlex.rdf.
And then create the new database:
python dishin.py radlex.rdf radlex.db http://radlex.org/RID/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''
WordNet Example
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/wordnet201907.db.gz
gunzip wordnet201907.db.gz
Now to calculate the similarity between the nouns ambulance and motorcycle execute:
python dishin.py wordnet.db ambulance-noun-1 motorcycle-noun-1
Output:
Resnik MICA intrinsic 6.331085809208157
Lin MICA intrinsic 0.6792379292396559
JC MICA intrinsic 0.1672363673134892
To create an updated version of the database, download the ontology:
wget http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-hyponym.rdf
And then create the new database:
python dishin.py wordnet-hyponym.rdf wordnet.db http://www.w3.org/2006/03/wn/wn20/instances/synset- http://www.w3.org/2006/03/wn/wn20/schema/hyponymOf ''
Source Code
- semanticbase.py: provides a function to produce the semantic base as a SQLite database
- ssm.py: provides the functions to calculate semantic similarity based on the SQLite database
- annotations.py: provides the functions to get the annotations for the given proteins
- dishin.py: executes the functions according to the input given