Embedding Evaluator
EmbeddingEvaluator is a tool to provide metrics for evaluating different embedding models.
The current version supports evaluation of only the following embedding type:
- FastText
It evaluates the embeddings based on the two following metrics:
- Analogy
- Outlier Detection
Installation
The EmbeddingEvaluator can be installed from PyPI:
pip install embeddingevaluator
Usage
Analogy Metrics
To use the EmbeddingEvaluator to measure different embeddings based on analogy metrics, the user needs a file with the following configuration:

| Word 1 | Word 2 | Word 3 | Word 4 |
|---|---|---|---|
| 1st pair, 1st word | 1st pair, 2nd word | 2nd pair, 1st word | 2nd pair, 2nd word |
| Man | King | Woman | Queen |
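The analogy metric follows the vector-offset idea of Levy and Goldberg (2014): the fourth word should be the nearest neighbour of vec(Word 2) - vec(Word 1) + vec(Word 3). A minimal sketch of that idea with toy NumPy vectors (the vectors and the `solve_analogy` helper are illustrative, not the library's internal API):

```python
import numpy as np

# Toy word vectors; a real evaluation would load FastText embeddings.
vectors = {
    "man":   np.array([1.0, 0.0, 0.1]),
    "king":  np.array([1.0, 1.0, 0.1]),
    "woman": np.array([0.0, 0.0, 0.1]),
    "queen": np.array([0.0, 1.0, 0.1]),
    "apple": np.array([0.5, -1.0, 0.0]),  # distractor word
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c, vocab):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(solve_analogy("man", "king", "woman", vectors))  # → queen
```

A model scores well on this metric when the predicted word matches the fourth word of the evaluation file's row, as in the "Man : King :: Woman : Queen" example above.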
Outlier Detection
To use the EmbeddingEvaluator to measure different embeddings based on outlier detection metrics, the user needs a file with the following configuration:
- Eight words that are semantically very similar and are all connected to each other by a clear, well-known relation (the cluster).
- Two words that are very similar to the ones in the cluster.
- Two words that are similar and related to the ones in the cluster.
- Two words that are related, but not similar, to the ones in the cluster.
- Two words that are unrelated and not similar to the ones in the cluster.
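Outlier detection scores each candidate by its similarity to the rest of the group: the true outlier should rank lowest (this is the compactness-score idea from Camacho-Collados and Navigli, 2016). A toy sketch of that scoring, with made-up vectors and a hypothetical `detect_outlier` helper:

```python
import numpy as np

# Toy vectors: a small fruit cluster plus one outlier ("car").
vectors = {
    "apple":  np.array([1.0, 0.9, 0.0]),
    "banana": np.array([0.9, 1.0, 0.0]),
    "pear":   np.array([1.0, 1.0, 0.1]),
    "car":    np.array([0.0, 0.1, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_outlier(words, vocab):
    """Return the word with the lowest average cosine similarity to the others."""
    def avg_sim(w):
        sims = [cosine(vocab[w], vocab[o]) for o in words if o != w]
        return sum(sims) / len(sims)
    return min(words, key=avg_sim)

print(detect_outlier(["apple", "banana", "pear", "car"], vectors))  # → car
```

A model scores well on this metric when, for each group in the evaluation file, the word it ranks as the outlier is the intended one.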
Initialize the EmbeddingEvaluator
The EmbeddingEvaluator takes two parameters as input:
- Input Metrics:
A dictionary mapping each metric name to the list of paths of its evaluation files.
Example:
input_metric = {'analogy': ['file_1', 'file_2'],
'outlier': ['file_1']}
- Input Models: A dictionary with the model names and the paths to the models.
Example:
input_model = {'model_1': 'path_1',
'model_2': 'path_2'}
Initialize the class:
emb_evaluator = EmbeddingMetrics(input_metric, input_model)
Summarize a model's metrics
To summarize the metrics of a single model:
emb_evaluator.summary_metrics('model_1')
Compare models' metrics
To compare the metrics of two or more models:
emb_evaluator.compare_models(['model_1', 'model_2'])
References
Levy, O. and Goldberg, Y.: Linguistic Regularities in Sparse and Explicit Word Representations (2014)

Camacho-Collados, J. and Navigli, R.: Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations (2016)