Semantic similarity framework for knowledge graphs
Sematch is an integrated framework for the development, evaluation, and application of semantic similarity for Knowledge Graphs (KGs). Semantic similarity is a useful metric for many applications that need to quantify how close two concepts are in meaning. Sematch focuses on knowledge-based semantic similarity, which relies on structural knowledge in a taxonomy (e.g. depth, path length, least common subsumer) and on statistical information content (corpus-IC and graph-IC).
Many applications share a common pipeline for semantic similarity analysis. In textual applications, word similarity is measured on WordNet with various semantic similarity metrics. Text similarity at the sentence level is then measured by composing word similarities, while document-level text can be analysed by identifying important sentences based on text similarity, e.g. with TextRank. This process starts from measuring concept similarity (word meanings or senses, which are denoted as synsets in WordNet). This is normally referred to as knowledge-based semantic similarity, in contrast to its counterpart, corpus-based approaches that rely on word co-occurrence (e.g. Pointwise Mutual Information) or distributional semantics (e.g. Explicit Semantic Analysis, Word2Vec, GloVe). Concept similarity is normally quantified using structural knowledge (e.g. path length, depth, least common subsumer) and statistical knowledge (information content). KG-based applications have the same need to compute concept similarity (e.g. http://dbpedia.org/class/yago/Actor109765278) and entity similarity (e.g. http://dbpedia.org/resource/Madrid). Since an entity is usually described by several concepts, entity similarity, like word similarity, can be quantified from the similarity of those concepts.
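The pipeline described above (concept similarity, then word similarity, then text or entity similarity) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sematch's API: the toy taxonomy, the `SENSES` and `ENTITY_TYPES` mappings, and all function names are hypothetical; concept similarity uses the simple Path metric; and sentences and entities are compared with a symmetric best-match average, which is one common composition scheme and not necessarily the one Sematch uses.

```python
# Hypothetical toy taxonomy (child -> parent); real systems would use WordNet
# synsets or KG classes such as DBpedia ontology classes or YAGO types.
PARENT = {
    "animal": "entity", "person": "entity", "place": "entity",
    "dog": "animal", "cat": "animal",
    "actor": "person", "singer": "person",
    "city": "place",
}

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def concept_sim(a, b):
    """Path similarity: 1 / (1 + number of edges between a and b via their LCS)."""
    shared = set(ancestors(b))
    s = next(x for x in ancestors(a) if x in shared)  # least common subsumer
    return 1.0 / (1 + ancestors(a).index(s) + ancestors(b).index(s))

# Hypothetical word -> senses mapping, standing in for WordNet synsets.
SENSES = {"dog": ["dog"], "cat": ["cat"], "star": ["actor", "singer"], "actor": ["actor"]}

def word_sim(w1, w2):
    """Word similarity: the best concept similarity over all sense pairs."""
    return max(concept_sim(a, b) for a in SENSES[w1] for b in SENSES[w2])

def best_match_avg(xs, ys, sim):
    """Match each item to its most similar counterpart in the other set,
    average per direction, then average the two directions."""
    fwd = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    bwd = sum(max(sim(y, x) for x in xs) for y in ys) / len(ys)
    return (fwd + bwd) / 2

def text_sim(words1, words2):
    """Sentence similarity composed from word similarities."""
    return best_match_avg(words1, words2, word_sim)

# Hypothetical entities, each described by several concepts (its KG types).
ENTITY_TYPES = {"e1": ["actor", "singer"], "e2": ["actor"], "e3": ["city"]}

def entity_sim(e1, e2):
    """Entity similarity composed from the similarity of their concepts."""
    return best_match_avg(ENTITY_TYPES[e1], ENTITY_TYPES[e2], concept_sim)

print(word_sim("star", "actor"))  # 1.0, because "star" has an actor sense
print(text_sim(["dog", "star"], ["cat", "actor"]))
print(entity_sim("e1", "e2"), entity_sim("e1", "e3"))
```

Because word and entity similarity share the same composition helper, swapping `concept_sim` for any other concept-level metric changes every level of the pipeline at once, which mirrors the idea that word, text, and entity similarity all build on concept similarity.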
Moreover, concepts in KGs are usually organized into hierarchical taxonomies, such as the DBpedia ontology classes and Wikipedia categories. Quantifying concept similarity in WordNet or DBpedia therefore relies on similar semantic information (e.g. path length, depth, least common subsumer, information content) and the same semantic similarity metrics (e.g. Path, Wu & Palmer, Li, Resnik, Lin, Jiang & Conrath, and WPath). Consequently, Sematch provides an integrated framework to develop and evaluate semantic similarity metrics for concepts, words, and entities, and for their applications.
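To make the metrics named above concrete, here is a self-contained sketch over a hypothetical six-concept taxonomy. The formulas follow the commonly cited definitions: Path, Wu & Palmer, Li (with α = 0.2 and β = 0.6), Resnik, Lin, Jiang & Conrath (as the inverse of the IC distance), and WPath as 1 / (1 + length · k^IC(LCS)). Several choices here are assumptions for illustration only: the taxonomy, the depth convention (root = 1), an intrinsic IC computed from leaf counts (a stand-in for Sematch's corpus-IC or graph-IC), and k = 0.8. The function names are hypothetical and are not Sematch's API.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet
# or the DBpedia ontology class hierarchy.
PARENT = {
    "animal": "entity", "person": "entity",
    "dog": "animal", "cat": "animal",
    "actor": "person", "singer": "person",
}
CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)

def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

ROOT = ancestors("dog")[-1]

def depth(c):
    """Depth counted in nodes, so the root has depth 1 (conventions vary)."""
    return len(ancestors(c))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)

def path_length(a, b):
    """Edges on the shortest path between a and b, through their LCS."""
    s = lcs(a, b)
    return ancestors(a).index(s) + ancestors(b).index(s)

def leaves(c):
    """Number of leaf concepts under c (c itself if it is a leaf)."""
    kids = CHILDREN.get(c, [])
    return 1 if not kids else sum(leaves(k) for k in kids)

def ic(c):
    """Intrinsic IC stand-in: -log p(c) with p(c) = leaves(c) / leaves(root)."""
    return math.log(leaves(ROOT) / leaves(c))

def path_sim(a, b):
    return 1.0 / (1 + path_length(a, b))

def wup_sim(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def li_sim(a, b, alpha=0.2, beta=0.6):
    return math.exp(-alpha * path_length(a, b)) * math.tanh(beta * depth(lcs(a, b)))

def res_sim(a, b):
    return ic(lcs(a, b))

def lin_sim(a, b):
    denom = ic(a) + ic(b)
    return 1.0 if denom == 0 else 2.0 * ic(lcs(a, b)) / denom

def jcn_sim(a, b):
    dist = ic(a) + ic(b) - 2 * ic(lcs(a, b))
    return math.inf if dist == 0 else 1.0 / dist  # identical concepts: no distance

def wpath_sim(a, b, k=0.8):
    """WPath: path length weighted by k ** IC(LCS); k=0.8 is an assumed value."""
    return 1.0 / (1 + path_length(a, b) * k ** ic(lcs(a, b)))

METRICS = {"path": path_sim, "wup": wup_sim, "li": li_sim, "res": res_sim,
           "lin": lin_sim, "jcn": jcn_sim, "wpath": wpath_sim}

for name, fn in METRICS.items():
    print(f"{name:6s} dog~cat={fn('dog', 'cat'):.3f}  dog~actor={fn('dog', 'actor'):.3f}")
```

Under these assumptions every metric ranks dog~cat above dog~actor, since dog and cat share a deeper, more informative subsumer (animal) than dog and actor (the root). Resnik and Lin return 0 whenever the LCS is the root, because the root's IC is 0; WPath then falls back to plain Path similarity.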
|Filename|Size|File type|Python version|
|---|---|---|---|
|sematch-1.0.1.tar.gz|1.7 MB|Source|None|