Semantic similarity computation with different state-of-the-art metrics
Description
TaxoSS is a Python library for semantic similarity. It implements state-of-the-art semantic similarity metrics such as Resnik, Jiang-Conrath (JCN), and HSS.
Requirements
- Python 3.6 or later
- NLTK
- NumPy
- Pandas
Installation
TaxoSS can be installed with pip (the Python package manager):
pip install taxoss
Usage
Semantic similarity functions
You can compute the semantic similarity in the following way:
from TaxoSS.functions import semantic_similarity
semantic_similarity('brother', 'sister', 'hss')
3.353513521371089
The function semantic_similarity(word1, word2, kind, ic) accepts the following values for the argument kind:
- hss -> HSS (default)
- wup -> WUP
- lcs -> LC
- path_sim -> Shortest Path
- resnik -> Resnik
- jcn -> Jiang-Conrath
- lin -> Lin
- seco -> Seco
For the argument ic, see the following section.
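To compare several metrics on the same word pair, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of TaxoSS: the helper name compare_metrics and the KINDS tuple are hypothetical, and the similarity function is passed in as a parameter so that the helper does not depend on how the library is installed.

```python
from typing import Callable, Dict, Iterable

# The values of `kind` listed in this README.
KINDS = ("hss", "wup", "lcs", "path_sim", "resnik", "jcn", "lin", "seco")


def compare_metrics(
    word1: str,
    word2: str,
    similarity_fn: Callable[[str, str, str], float],
    kinds: Iterable[str] = KINDS,
) -> Dict[str, float]:
    """Call similarity_fn(word1, word2, kind) once per kind.

    Hypothetical helper: `similarity_fn` is expected to have the same
    positional signature as semantic_similarity(word1, word2, kind).
    """
    scores = {}
    for kind in kinds:
        scores[kind] = similarity_fn(word1, word2, kind)
    return scores
```

With TaxoSS installed, calling compare_metrics('brother', 'sister', semantic_similarity), where semantic_similarity is imported from TaxoSS.functions, should return one score per metric. Note that the scores of different metrics are not on a common scale, so they are not directly comparable with each other.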
Information Content
Using a Wikipedia corpus to calculate the Information Content (the default for the argument ic):
from TaxoSS.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')
6.169410755220327
Calculating the Information Content from a given corpus:
from TaxoSS.calculate_IC import calculate_IC
from TaxoSS.functions import semantic_similarity
calculate_IC(path_to_corpus, path_to_save_IC_file)
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)
Here path_to_save_IC_file is a path inside the TaxoSS package directory of your virtual environment, e.g. venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv.
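For readers unfamiliar with the term, Information Content is commonly defined as IC(c) = -log p(c), where p(c) is the probability of encountering c in a corpus. The sketch below illustrates that definition on a toy token list. It is an assumption-laden illustration, not the implementation of calculate_IC: the function name information_content is hypothetical, and the sketch ignores the taxonomy (taxonomy-based IC usually also counts occurrences of a concept's descendants).

```python
import math
from collections import Counter
from typing import Dict, Iterable


def information_content(tokens: Iterable[str]) -> Dict[str, float]:
    """Map each token to IC = -log(p), with p its relative frequency.

    Hypothetical helper for illustration only; it works on raw tokens
    and does not use a taxonomy.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: -math.log(count / total) for word, count in counts.items()}


# Toy corpus (made up for this example): 9 tokens, "the" occurs 3 times.
corpus = "the cat sat on the mat the dog sat".split()
ic = information_content(corpus)

# Frequent tokens carry less information than rare ones.
print(ic["the"] < ic["cat"])  # True
```

The rarer a word is in the corpus, the higher its Information Content, which is why IC-based metrics such as Resnik depend on the corpus used.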
Benchmark
Dataset | HSS (ours) Pearson | HSS (ours) Spearman | WUP Pearson | WUP Spearman | LC Pearson | LC Spearman | Shortest Path Pearson | Shortest Path Spearman | Resnik Pearson | Resnik Spearman | Jiang-Conrath Pearson | Jiang-Conrath Spearman | Lin Pearson | Lin Spearman | Seco Pearson | Seco Spearman |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MEN | 0.41 | 0.33 | 0.36 | 0.33 | 0.14 | 0.05 | 0.07 | 0.03 | 0.05 | 0.03 | -0.05 | -0.04 | 0.05 | 0.04 | -0.01 | 0.03 |
MC30 | 0.74 | 0.69 | 0.74 | 0.73 | 0.33 | 0.21 | 0.22 | 0.3 | 0.13 | 0.03 | -0.06 | -0.01 | 0.05 | 0.01 | 0.13 | -0.09 |
WSS | 0.68 | 0.65 | 0.58 | 0.59 | 0.36 | 0.23 | 0.16 | 0.1 | 0.02 | -0.03 | 0.04 | 0.06 | 0.03 | 0.06 | -0.01 | -0.04 |
Simlex999 | 0.4 | 0.38 | 0.45 | 0.43 | 0.26 | 0.15 | 0.2 | 0.16 | -0.04 | -0.04 | 0.12 | 0.14 | 0.12 | 0.14 | -0.02 | -0.08 |
MT287 | 0.46 | 0.31 | 0.4 | 0.28 | 0.26 | 0.12 | 0.11 | 0.11 | 0.03 | 0.04 | 0.18 | 0.16 | 0.22 | 0.17 | 0 | -0.06 |
MT771 | 0.44 | 0.4 | 0.43 | 0.49 | 0.06 | 0.02 | 0.1 | 0.13 | 0 | -0.01 | 0 | 0 | 0 | 0 | -0.05 | -0.03 |
Time per pair (s) | 0.0007 | 0.0007 | 0.008 | 0.008 | 0.0055 | 0.0055 | 0.0064 | 0.0064 | 0.5586 | 0.5586 | 0.551 | 0.551 | 0.5866 | 0.5866 | 0.0013 | 0.0013 |
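The table reports Pearson and Spearman coefficients. The README does not state how they were computed; a reasonable assumption is that each value correlates a metric's scores with the human ratings of the dataset. Under that assumption, the two coefficients can be sketched in plain Python as follows. The function names are hypothetical, and the sample numbers are made up for illustration, not taken from any benchmark.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: metric scores vs. human ratings for four word pairs.
metric_scores = [1.0, 2.0, 3.0, 4.0]
human_ratings = [1.0, 4.0, 9.0, 16.0]
r_pearson = pearson(metric_scores, human_ratings)
r_spearman = spearman(metric_scores, human_ratings)
```

In this toy example the ratings grow monotonically but not linearly with the scores, so the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1. This is the reason for reporting both: Spearman only measures agreement in ranking.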