Embedding Evaluator
EmbeddingEvaluator is a tool that provides metrics for evaluating and comparing embedding models.
The current version only supports evaluating embeddings of the following type:
- FastText
It evaluates embeddings with the following two metrics:
- Analogy
- Outlier Detection
Installation
The EmbeddingEvaluator can be installed from PyPI:
```shell
pip install embedding-evaluator
```
Usage
Analogy Metrics
To evaluate embeddings with the analogy metric, the user needs a file with the following structure, one analogy per row:
Word 1 | Word 2 | Word 3 | Word 4 |
---|---|---|---|
1st Pair 1st Word | 1st Pair 2nd Word | 2nd Pair 1st Word | 2nd Pair 2nd Word |
Man | King | Woman | Queen |
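The README does not say how an analogy row is scored. A common approach, the vector-offset method (3CosAdd) discussed by Levy and Goldberg (2014), predicts the fourth word as the vocabulary word that maximizes cos(x, Word 2) − cos(x, Word 1) + cos(x, Word 3). The sketch below assumes that method and uses a tiny hand-made dictionary of vectors in place of a loaded FastText model. The names `cosine`, `solve_analogy`, `analogy_accuracy` and `toy_emb` are hypothetical, not part of the package, whose actual scoring may differ (for example, it could use 3CosMul).

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(emb: Dict[str, List[float]], w1: str, w2: str, w3: str) -> str:
    """Return the word w4 that best completes w1 : w2 :: w3 : w4 (3CosAdd).

    The three query words are excluded from the candidates, as is common practice.
    """
    best_word, best_score = "", -math.inf
    for cand, vec in emb.items():
        if cand in (w1, w2, w3):
            continue
        score = cosine(vec, emb[w2]) - cosine(vec, emb[w1]) + cosine(vec, emb[w3])
        if score > best_score:
            best_word, best_score = cand, score
    return best_word


def analogy_accuracy(emb: Dict[str, List[float]], rows: List[List[str]]) -> float:
    """Fraction of rows [w1, w2, w3, w4] whose fourth word is recovered exactly."""
    hits = sum(solve_analogy(emb, w1, w2, w3) == w4 for w1, w2, w3, w4 in rows)
    return hits / len(rows)


# Toy 3-d vectors standing in for a trained model (dimensions: royalty, male, female).
toy_emb = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "girl": [0.1, 0.0, 1.0],
    "apple": [0.2, 0.3, 0.3],
}
print(solve_analogy(toy_emb, "man", "king", "woman"))  # queen
```

Excluding the three query words matters: without it, the nearest vector to the offset is often one of the inputs themselves.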
Outlier Detection
To evaluate embeddings with the outlier detection metric, the user needs a file with the following contents:
- Eight words that are semantically very similar and all connected to each other by a clear, well-known relation (the cluster).
- Two words that are very similar to the words in the cluster.
- Two words that are similar and related to the words in the cluster.
- Two words that are related to, but not similar to, the words in the cluster.
- Two words that are unrelated to and not similar to the words in the cluster.
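The outlier metric comes from Camacho-Collados and Navigli (2016): each word is scored by how compact (mutually similar) the remaining words are once it is removed, and a good embedding ranks the true outlier highest. The sketch below is a minimal reading of that compactness-score idea, assuming cosine similarity and evaluating each outlier separately against the cluster. It reports the outlier position percentage (OPP) as a fraction rather than a percentage. All names are hypothetical, and the package's implementation may differ in details such as tie handling.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compactness(emb: Dict[str, List[float]], words: List[str], removed: str) -> float:
    """Mean pairwise cosine similarity of `words` once `removed` is left out."""
    rest = [w for w in words if w != removed]
    pairs = list(itertools.permutations(rest, 2))
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)


def outlier_position(emb: Dict[str, List[float]], cluster: List[str], outlier: str) -> int:
    """Position 0..len(cluster) of `outlier`; len(cluster) means it was ranked top."""
    words = cluster + [outlier]
    scores = {w: compactness(emb, words, w) for w in words}
    # Count cluster words whose removal leaves a strictly less compact set.
    return sum(scores[outlier] > scores[w] for w in cluster)


def outlier_metrics(emb: Dict[str, List[float]], cluster: List[str],
                    outliers: List[str]) -> Tuple[float, float]:
    """Return (OPP as a fraction, accuracy) over the outliers of one cluster."""
    n = len(cluster)
    positions = [outlier_position(emb, cluster, o) for o in outliers]
    opp = sum(p / n for p in positions) / len(positions)
    accuracy = sum(p == n for p in positions) / len(positions)
    return opp, accuracy


# Toy 2-d vectors: four animals point roughly along x, two vehicles along y.
toy_emb = {
    "cat": [1.0, 0.1], "dog": [1.0, 0.2], "cow": [1.0, 0.15], "horse": [1.0, 0.05],
    "car": [0.0, 1.0], "bus": [0.1, 1.0],
}
print(outlier_metrics(toy_emb, ["cat", "dog", "cow", "horse"], ["car", "bus"]))  # (1.0, 1.0)
```

With a real file, the cluster would hold the eight cluster words and the outlier list the remaining eight words described above.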
Initialize the EmbeddingEvaluator
The EmbeddingEvaluator takes two parameters as input:
- Input Metrics: a dictionary mapping each metric name ('analogy', 'outlier') to a list of paths to its evaluation files.
Example:
```python
input_metric = {'analogy': ['file_1', 'file_2'],
                'outlier': ['file_1']}
```
- Input Models: a dictionary mapping each model name to the path of its model file.
Example:
```python
input_model = {'model_1': 'path_1',
               'model_2': 'path_2'}
```
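Both dictionaries are plain path mappings, so a typo may only surface when the evaluator tries to open a file. The hypothetical helper `validate_inputs` below (not part of the package) checks the two dictionaries up front, assuming that 'analogy' and 'outlier' are the only metric keys and that each model is stored as a single file, such as a FastText .bin model.

```python
import os
from typing import Dict, List

# Assumption: the two metric keys documented in this README.
SUPPORTED_METRICS = {"analogy", "outlier"}


def validate_inputs(input_metric: Dict[str, List[str]],
                    input_model: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the config dicts (empty if none)."""
    problems = []
    for metric, paths in input_metric.items():
        if metric not in SUPPORTED_METRICS:
            problems.append(f"unknown metric {metric!r}")
        if not isinstance(paths, list):
            problems.append(f"{metric!r} must map to a list of file paths")
            continue
        for path in paths:
            if not os.path.isfile(path):
                problems.append(f"{metric!r} file not found: {path}")
    for name, path in input_model.items():
        if not os.path.isfile(path):
            problems.append(f"model {name!r} not found: {path}")
    return problems


input_metric = {'analogy': ['file_1', 'file_2'],
                'outlier': ['file_1']}
input_model = {'model_1': 'path_1',
               'model_2': 'path_2'}
# Print each problem; with the placeholder paths above, every file is reported missing.
for problem in validate_inputs(input_metric, input_model):
    print(problem)
```

Returning a list instead of raising on the first error lets all configuration mistakes be fixed in one pass.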
Initialize the class:
```python
emb_evaluator = EmbeddingMetrics(input_metric, input_model)
```
Summarize a model's metrics
To summarize the metrics of a single model:
```python
emb_evaluator.summary_metrics('model_1')
```
Compare models' metrics
To compare the metrics of two or more models:
```python
emb_evaluator.compare_models(['model_1', 'model_2'])
```
References
- Levy, O. and Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations.
- Camacho-Collados, J. and Navigli, R. (2016). Find the Word That Does Not Belong: A Framework for an Intrinsic Evaluation of Word Vector Representations.
File details
Details for the file embedding-evaluator-0.0.1.tar.gz.
File metadata
- Download URL: embedding-evaluator-0.0.1.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | ce6b2cb3eb116e1c2ca042ec114c6e2a47a3e4b70946c0df00e01d0520c2c599 |
MD5 | bb812d16ec5f44635c83ab1b1de6e228 |
BLAKE2b-256 | dc6f5b775a86832cd7e25ba5f04cac8d9746713cd89a72df35b8520fbe5790f8 |
File details
Details for the file embedding_evaluator-0.0.1-py3-none-any.whl.
File metadata
- Download URL: embedding_evaluator-0.0.1-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 66ff2b11a3fd85cee08d89af39aefd6b70e72e07e4ec4c51b296b3ae06e623d0 |
MD5 | 2bf804f613c35d87e88170131a9c92d5 |
BLAKE2b-256 | 062e42013b11d6e1bd27103089163525571d1797b82beb01b00e861a92d016b1 |