Skip to main content

Demo library

Project description

Semantic similarity computation with different metrics

DescriptionInstallationUsageLicense


Description

TaxoVec is a semantic similarity library for Python which implements the state-of-the-art semantic similarity metrics like Resnik, JCN and HSS.

Requirements

  • Python 3.6 or later
  • NLTK
  • NumPy
  • Pandas

Installation

There are several ways to install TaxoVec, the recommended method is to use pip(the Python package manager) in the following way:

pip install TaxoVec==0.1.0

Usage

Using Wikipedia copus for calculating the Information content:

from TaxoVec.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')

6.169410755220327

Calculating Information Conent from a given corpus:

from TaxoVec.calculate_IC import calculate_IC
from TaxoVec.functions import semantic_similarity

calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)

Semantic similarity functions

The function semantic_similarity(word1, word2, kind, ic) has these options for the argument kind:

  • hss -> HSS
  • wup -> WUP
  • lcs -> LC
  • path_sim -> Shortest Path
  • resnik -> Resnik
  • jcn -> Jiang-Conrath
  • lin -> Lin
  • seco -> Seco

Benchmark

HSS (ours) HSS (ours) WUP WUP LC LC Shortest Path Shortest Path Resnik Resnik Jiang-Conrath Jiang-Conrath Lin Lin Seco Seco
Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman Pearson Spearman
MEN 0.41 0.33 0.36 0.33 0.14 0.05 0.07 0.03 0.05 0.03 -0.05 -0.04 0.05 0.04 -0.01 0.03
MC30 0.74 0.69 0.74 0.73 0.33 0.21 0.22 0.3 0.13 0.03 -0.06 -0.01 0.05 0.01 0.13 -0.09
WSS 0.68 0.65 0.58 0.59 0.36 0.23 0.16 0.1 0.02 -0.03 0.04 0.06 0.03 0.06 -0.01 -0.04
Simlex999 0.4 0.38 0.45 0.43 0.26 0.15 0.2 0.16 -0.04 -0.04 0.12 0.14 0.12 0.14 -0.02 -0.08
MT287 0.46 0.31 0.4 0.28 0.26 0.12 0.11 0.11 0.03 0.04 0.18 0.16 0.22 0.17 0 -0.06
MT771 0.44 0.4 0.43 0.49 0.06 0.02 0.1 0.13 0 -0.01 0 0 0 0 -0.05 -0.03
Time per pair (s) 0.0007 0.0007 0.008 0.008 0.0055 0.0055 0.0064 0.0064 0.5586 0.5586 0.551 0.551 0.5866 0.5866 0.0013 0.0013

License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

TaxoVec-0.1.1.tar.gz (37.4 MB view hashes)

Uploaded Source

Built Distribution

TaxoVec-0.1.1-py2-none-any.whl (38.0 MB view hashes)

Uploaded Python 2

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page