Demo library
Semantic similarity computation with different metrics
Description • Installation • Usage • License
Description
TaxoVec is a semantic similarity library for Python that implements state-of-the-art semantic similarity metrics such as Resnik, Jiang-Conrath (JCN), and HSS.
Requirements
- Python 3.6 or later
- NLTK
- NumPy
- Pandas
Installation
There are several ways to install TaxoVec; the recommended method is to use pip (the Python package manager):
pip install TaxoVec==0.1.0
Usage
Using the Wikipedia corpus to calculate the information content:
from TaxoVec.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')
6.169410755220327
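To see how the metrics differ on the same word pair, the call above can be repeated for several values of `kind`. The helper below is a minimal sketch and not part of TaxoVec: `compare_metrics` is a hypothetical name, and it takes the similarity function as a parameter so that it can be tried without the package installed.

```python
from typing import Callable, Dict, Iterable


def compare_metrics(
    word1: str,
    word2: str,
    kinds: Iterable[str],
    similarity: Callable[[str, str, str], float],
) -> Dict[str, float]:
    """Score one word pair under several metrics and return {kind: score}."""
    return {kind: similarity(word1, word2, kind) for kind in kinds}


# Intended use with TaxoVec (requires the package to be installed):
#   from TaxoVec.functions import semantic_similarity
#   compare_metrics('cat', 'dog', ['hss', 'wup', 'resnik'], semantic_similarity)
```

Passing the function in, rather than importing it inside the helper, also makes it easy to substitute a stub when testing.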
Calculating information content from a given corpus:
from TaxoVec.calculate_IC import calculate_IC
from TaxoVec.functions import semantic_similarity
calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)
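For readers unfamiliar with the term, information content is usually defined as IC(c) = -log p(c), where p(c) is the probability of meeting concept c or any of its descendants in the corpus, and Resnik similarity is the IC of the most informative common ancestor of two concepts. The sketch below illustrates that standard definition on a made-up toy taxonomy. It does not reproduce how `calculate_IC` works internally or the format of the file it saves; `PARENT`, `information_content` and `resnik` are hypothetical names.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "cat": "animal",
    "dog": "animal",
    "tree": "plant",
}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def information_content(word_counts: Dict[str, int]) -> Dict[str, float]:
    """IC(c) = -log p(c); each count also contributes to every ancestor.

    Every key of word_counts must be a concept in PARENT.
    """
    totals = {concept: 0 for concept in PARENT}
    for word, count in word_counts.items():
        for node in ancestors(word):
            totals[node] += count
    corpus_size = sum(word_counts.values())
    # Concepts never seen in the corpus have no defined IC and are skipped.
    return {c: -math.log(n / corpus_size) for c, n in totals.items() if n > 0}


def resnik(word1: str, word2: str, ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = set(ancestors(word1)) & set(ancestors(word2))
    return max(ic[c] for c in common if c in ic)


ic = information_content({"cat": 2, "dog": 2, "tree": 4})
print(resnik("cat", "dog", ic))   # IC of "animal", i.e. log(2)
print(resnik("cat", "tree", ic))  # only the root is shared, so the IC is zero
```

Because the counts come from the corpus, the same word pair can score differently under different corpora, which is why a custom IC file can be passed to `semantic_similarity`.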
Semantic similarity functions
The function semantic_similarity(word1, word2, kind, ic) accepts the following values for the kind argument (ic is the optional path to an information content file, as in the example above):
- hss -> HSS
- wup -> WUP
- lcs -> LC
- path_sim -> Shortest Path
- resnik -> Resnik
- jcn -> Jiang-Conrath
- lin -> Lin
- seco -> Seco
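Since `kind` is a plain string, a typo is easy to make. The snippet below is a small sketch of a lookup table built from the list above, with a check that can be run before calling `semantic_similarity`. `METRIC_NAMES` and `check_kind` are hypothetical helpers, not part of TaxoVec, and how TaxoVec itself reacts to an unknown `kind` is not documented here.

```python
# Values of `kind` and the metric each one selects, copied from the list above.
METRIC_NAMES = {
    "hss": "HSS",
    "wup": "WUP",
    "lcs": "LC",
    "path_sim": "Shortest Path",
    "resnik": "Resnik",
    "jcn": "Jiang-Conrath",
    "lin": "Lin",
    "seco": "Seco",
}


def check_kind(kind: str) -> str:
    """Return the metric name for `kind`, or raise ValueError if it is unknown."""
    try:
        return METRIC_NAMES[kind]
    except KeyError:
        valid = ", ".join(sorted(METRIC_NAMES))
        raise ValueError(f"unknown kind {kind!r}; expected one of: {valid}") from None


print(check_kind("jcn"))  # Jiang-Conrath
```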
Benchmark
Dataset | HSS (ours) Pearson | HSS (ours) Spearman | WUP Pearson | WUP Spearman | LC Pearson | LC Spearman | Shortest Path Pearson | Shortest Path Spearman | Resnik Pearson | Resnik Spearman | Jiang-Conrath Pearson | Jiang-Conrath Spearman | Lin Pearson | Lin Spearman | Seco Pearson | Seco Spearman |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MEN | 0.41 | 0.33 | 0.36 | 0.33 | 0.14 | 0.05 | 0.07 | 0.03 | 0.05 | 0.03 | -0.05 | -0.04 | 0.05 | 0.04 | -0.01 | 0.03 |
MC30 | 0.74 | 0.69 | 0.74 | 0.73 | 0.33 | 0.21 | 0.22 | 0.3 | 0.13 | 0.03 | -0.06 | -0.01 | 0.05 | 0.01 | 0.13 | -0.09 |
WSS | 0.68 | 0.65 | 0.58 | 0.59 | 0.36 | 0.23 | 0.16 | 0.1 | 0.02 | -0.03 | 0.04 | 0.06 | 0.03 | 0.06 | -0.01 | -0.04 |
Simlex999 | 0.4 | 0.38 | 0.45 | 0.43 | 0.26 | 0.15 | 0.2 | 0.16 | -0.04 | -0.04 | 0.12 | 0.14 | 0.12 | 0.14 | -0.02 | -0.08 |
MT287 | 0.46 | 0.31 | 0.4 | 0.28 | 0.26 | 0.12 | 0.11 | 0.11 | 0.03 | 0.04 | 0.18 | 0.16 | 0.22 | 0.17 | 0 | -0.06 |
MT771 | 0.44 | 0.4 | 0.43 | 0.49 | 0.06 | 0.02 | 0.1 | 0.13 | 0 | -0.01 | 0 | 0 | 0 | 0 | -0.05 | -0.03 |
Time per pair (s) | 0.0007 | 0.0007 | 0.008 | 0.008 | 0.0055 | 0.0055 | 0.0064 | 0.0064 | 0.5586 | 0.5586 | 0.551 | 0.551 | 0.5866 | 0.5866 | 0.0013 | 0.0013 |
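The Pearson and Spearman columns are correlation coefficients. Assuming they compare each metric's scores with the human similarity ratings that ship with each dataset (the table does not say so explicitly), they could be computed along the following lines. This is a standard-library sketch, not the script used to produce the numbers above, and the sample values are invented.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: human ratings and metric scores for five word pairs.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(human, scores), spearman(human, scores))
```

Pearson measures linear agreement, while Spearman only looks at the ordering of the pairs, so a metric can rank pairs perfectly and still fall short of a Pearson value of 1.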
License