Skip to main content

Basic functions to start using semantic similarity measures directly from a rdf or owl file.

Project description

DiShIn: Semantic Similarity Measures using Disjunctive Shared Information

This software package provides the basic functions to start using semantic similarity measures directly from a rdf or owl file.

A web tool using this package is available at: http://labs.fc.ul.pt/dishin/

Package documentation: https://dishin.readthedocs.io/en/latest/

INSTALLATION

Either clone this repository or install from pypi:

pip install ssmpy

=======

Reference:

  • F. Couto and A. Lamurias, “Semantic similarity definition,” in Encyclopedia of Bioinformatics and Computational Biology (S. Ranganathan, K. Nakai, C. Schönbach, and M. Gribskov, eds.), vol. 1, pp. 870–876, Oxford: Elsevier, 2019 [https://doi.org/10.1016/B978-0-12-809633-8.20401-9] [https://www.researchgate.net/publication/323219905_Semantic_Similarity_Definition]

USAGE:

You can use DiShIn as a command line tool with the dishin.py script of this repository:

python dishin.py <semanticbase>.db <term1> <term2>
python dishin.py <semanticbase>.[owl|rdf] <semanticbase>.db <name_prefix> <relation> <annotation_file>

or use the python functions directly:

>>> import ssmpy

You can find more usage examples at https://dishin.readthedocs.io/en/latest/other_examples.html.

Metals Example

To create the semantic base file (metals.db) from the metals.owl file:

python dishin.py metals.owl metals.db https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl# http://www.w3.org/2000/01/rdf-schema#subClassOf metals.txt

The metals.txt contains the a list of occurrences. For example, the following contents has one occurrence for each term, except gold and silver with two occurrences.

gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious

Now to calculate the similarity between copper and gold execute:

python dishin.py metals.db copper gold

Output:

Resnik     DiShIn    intrinsic          0.2938933324510595
Resnik     MICA      intrinsic          0.587786664902119
Lin        DiShIn    intrinsic          0.19539774554219633
Lin        MICA      intrinsic          0.39079549108439265
JC         DiShIn    intrinsic          0.41316029085112316
JC         MICA      intrinsic          0.5456783339686456
Resnik     DiShIn    extrinsic          0.22599256187152864
Resnik     MICA      extrinsic          0.45198512374305727
Lin        DiShIn    extrinsic          0.1504595366201814
Lin        MICA      extrinsic          0.3009190732403628
JC         DiShIn    extrinsic          0.3918424740632774
JC         MICA      extrinsic          0.47617668319259754

Using the python function directly (first download metals.db and metals.txt from this repository):

>>> ssmpy.create_semantic_base("metals.owl", "metals.db", "https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl#", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "metals.txt")
>>> ssmpy.semantic_base("metals.db")
>>> e1 = ssmpy.get_id("copper")
>>> e2 = ssmpy.get_id("gold")
>>> ssmpy.ssm_resnik (e1,e2)

Gene Ontology (GO) and UniProt proteins Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz
gunzip -N go201907.db.gz

Now to calculate the similarity between maltose biosynthetic process and maltose catabolic process execute:

python dishin.py go.db GO_0000023 GO_0000025

Output:

Resnik     DiShIn    intrinsic          3.7851272458782113
Resnik     MICA      intrinsic          8.911024218626034
Lin        DiShIn    intrinsic          0.4088671082942098
Lin        MICA      intrinsic          0.9625633347404052
JC         DiShIn    intrinsic          0.09136641197816901
JC         MICA      intrinsic          1.442695040888967
Resnik     DiShIn    extrinsic          4.273448119532465
Resnik     MICA      extrinsic          10.354796690276364
Lin        DiShIn    extrinsic          0.3919119421698985
Lin        MICA      extrinsic          0.9496239027945961
JC         DiShIn    extrinsic          0.0754073347935026
JC         MICA      extrinsic          0.9102392266268364

Now to calculate the similarity between proteins Q12345 and Q12346 execute:

python dishin.py go.db Q12345 Q12346

Output:

Resnik     DiShIn    intrinsic          1.3730675314939769
Resnik     MICA      intrinsic          1.653493583942882
Lin        DiShIn    intrinsic          0.16453282374961184
Lin        MICA      intrinsic          0.19975479444590458
JC         DiShIn    intrinsic          0.081825490673384
JC         MICA      intrinsic          0.09503231097236876
Resnik     DiShIn    extrinsic          0.9309878004221438
Resnik     MICA      extrinsic          1.143670161919403
Lin        DiShIn    extrinsic          0.15280642004118333
Lin        MICA      extrinsic          0.19273825637513847
JC         DiShIn    extrinsic          0.1013441951183969
JC         MICA      extrinsic          0.11970943511723715

To create an updated version of the database, download the ontology and annotations:

wget http://purl.obolibrary.org/obo/go.owl
wget http://geneontology.org/gene-associations/goa_uniprot_all_noiea.gaf.gz
gunzip goa_uniprot_all_noiea.gaf.gz 

And then create the new database:

python dishin.py go.owl go.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf goa_uniprot_all_noiea.gaf

Chemical Entities of Biological Interest (ChEBI) Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/chebi201907.db.gz
gunzip -N chebi201907.db.gz

Now to calculate the similarity between aripiprazole and bithionol execute:

python dishin.py chebi.db CHEBI_31236 CHEBI_3131

Output:

Resnik     DiShIn    intrinsic          1.3532341094444025
Resnik     MICA      intrinsic          5.3808132551673
Lin        DiShIn    intrinsic          0.12372266288871554
Lin        MICA      intrinsic          0.49195371280548356
JC         DiShIn    intrinsic          0.05216806727627202
JC         MICA      intrinsic          0.08997939012118301

To create an updated version of the database, download the ontology:

wget ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi_lite.owl

And then create the new database:

python dishin.py chebi_lite.owl chebi.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''

Human Phenotype (HP) Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/hp201907.db.gz
gunzip -N hp201907.db.gz

Now to calculate the similarity between Optic nerve coloboma and Optic nerve dysplasia execute:

python dishin.py hp.db HP_0000588 HP_0001093

Output:

Resnik     DiShIn    intrinsic          4.514739038358012
Resnik     MICA      intrinsic          5.917583373691076
Lin        DiShIn    intrinsic          0.5079590611976912
Lin        MICA      intrinsic          0.665794870870856
JC         DiShIn    intrinsic          0.11433121677975834
JC         MICA      intrinsic          0.16832667824491762

To create an updated version of the database, download the ontology:

wget http://purl.obolibrary.org/obo/hp.owl

And then create the new database:

python dishin.py hp.owl hp.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''

Human Disease Ontology (HDO) Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/doid201907.db.gz
gunzip -N doid201907.db.gz

Now to calculate the similarity between Asthma and Lung cancer execute:

python dishin.py doid.db DOID_2841 DOID_1324

Output:

Resnik     DiShIn    intrinsic          2.316903156622129
Resnik     MICA      intrinsic          3.730767546816189
Lin        DiShIn    intrinsic          0.40974430023007496
Lin        MICA      intrinsic          0.6597862035890811
JC         DiShIn    intrinsic          0.14980794775373127
JC         MICA      intrinsic          0.2599100799712222

To create an updated version of the database, download the ontology:

wget http://purl.obolibrary.org/obo/doid.owl

And then create the new database:

python dishin.py doid.owl doid.db http://purl.obolibrary.org/obo/ http://www.w3.org/2000/01/rdf-schema#subClassOf ''

Radiology Lexicon (RadLex) Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/radlex201907.db.gz
gunzip -N radlex201907.db.gz

Now to calculate the similarity between nervous system of right upper limb and nervous system of left upper limb execute:

python dishin.py radlex.db RID16139 RID16140

Output:

Resnik   MICA    intrinsic      9.363855135365721
Lin      MICA    intrinsic      0.9310781524369027
JC       MICA    intrinsic      0.7213475204444816

To create an updated version of the database, download the RDF/XML version from http://bioportal.bioontology.org/ontologies/RADLEX and save it as radlex.rdf

And then create the new database:

python dishin.py radlex.rdf radlex.db http://radlex.org/RID/ http://www.w3.org/2000/01/rdf-schema#subClassOf '' 

WordNet Example

Download the lastest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/wordnet201907.db.gz
gunzip wordnet201907.db.gz

Now to calculate the similarity between the nouns ambulance and motorcycle execute:

python dishin.py wordnet.db ambulance-noun-1 motorcycle-noun-1

Output:

Resnik   MICA    intrinsic      6.331085809208157
Lin      MICA    intrinsic      0.6792379292396559
JC       MICA    intrinsic      0.1672363673134892

To create an updated version of the database, download the ontology:

wget http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-hyponym.rdf

And then create the new database:

python dishin.py wordnet-hyponym.rdf wordnet.db http://www.w3.org/2006/03/wn/wn20/instances/synset- http://www.w3.org/2006/03/wn/wn20/schema/hyponymOf ''

Source Code

  • semanticbase.py : provides a function to produce the semantic-base as a SQLite database

  • ssm.py : provides the functions to calculate semantic similarity based on the SQLite database

  • annotations.py : provides the functions to get the annotations for the given proteins

  • dishin.py : executes the functions according to the input given

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ssmpy-0.2.3.tar.gz (13.5 kB view hashes)

Uploaded Source

Built Distribution

ssmpy-0.2.3-py3-none-any.whl (19.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page