Embedding Evaluator

EmbeddingEvaluator is a tool to provide metrics for evaluating different embedding models.

The current version supports evaluation only of embeddings of the following type:

  • FastText

It evaluates the embeddings based on the following two metrics:

  • Analogy
  • Outlier Detection

Installation

The EmbeddingEvaluator can be installed from PyPI:

pip install embeddingevaluator

Usage

Analogy Metrics

To use the EmbeddingEvaluator to evaluate different embeddings based on the analogy metric, the user needs a file in the following format:

Word 1              Word 2              Word 3              Word 4
1st pair, 1st word  1st pair, 2nd word  2nd pair, 1st word  2nd pair, 2nd word
Man                 King                Woman               Queen
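
Each row encodes an analogy of the form "Word 1 is to Word 2 as Word 3 is to Word 4". Analogy evaluation of this kind (Levy and Goldberg, 2014) is commonly scored with the vector-offset (3CosAdd) method: the model is correct when the vector closest to `v2 - v1 + v3` is `v4`. Below is a minimal sketch of that idea using toy vectors; the function name, vectors, and scoring are illustrative assumptions, not the package's actual implementation:

```python
import numpy as np

# Toy vectors chosen by hand to illustrate the offset method;
# a real evaluation would use trained FastText vectors instead.
vectors = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "king":  np.array([1.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "apple": np.array([1.0, 0.0, 1.0]),  # distractor
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(w1, w2, w3, vectors):
    """Return the word whose vector is closest to v2 - v1 + v3,
    excluding the three input words (3CosAdd)."""
    target = vectors[w2] - vectors[w1] + vectors[w3]
    candidates = {w: v for w, v in vectors.items() if w not in (w1, w2, w3)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(solve_analogy("man", "king", "woman", vectors))  # queen
```

The accuracy of a model on an analogy file would then be the fraction of rows for which the predicted word equals Word 4.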

Outlier Detection

To use the EmbeddingEvaluator to evaluate different embeddings based on the outlier detection metric, the user needs a file with the following configuration:

  • Eight words which are semantically very similar and all connected to each other by a clear, well-known relation (the cluster).
  • Two words which are very similar to the ones in the cluster.
  • Two words which are similar and related to the ones in the cluster.
  • Two words which are related, but not similar to the ones in the cluster.
  • Two words which are unrelated and not similar to the ones in the cluster.
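
In the outlier detection framework (Camacho-Collados and Navigli, 2016), the outlier is the word least coherent with the rest of the set, identified via a compactness score. The sketch below uses a simplified variant of that score (mean pairwise cosine similarity) with toy vectors; the names and vectors are illustrative assumptions, not the package's API:

```python
import numpy as np

# Toy vectors: a tight "royalty" cluster plus one unrelated word.
vectors = {
    "king":   np.array([1.0, 1.0, 0.1]),
    "queen":  np.array([0.9, 1.1, 0.0]),
    "prince": np.array([1.1, 0.9, 0.2]),
    "banana": np.array([0.0, 0.1, 1.0]),  # the outlier
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_outlier(words, vectors):
    """Predict the outlier as the word with the lowest mean cosine
    similarity to the other words in the set (a simplification of the
    compactness score)."""
    def mean_sim(w):
        others = [x for x in words if x != w]
        return sum(cosine(vectors[w], vectors[x]) for x in others) / len(others)
    return min(words, key=mean_sim)

print(detect_outlier(list(vectors), vectors))  # banana
```

A model scores well on this metric when, for each test set, the true outlier receives the lowest compactness score among all words in the set.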

Initialize the EmbeddingEvaluator

The EmbeddingEvaluator takes two parameters as input:

  • Input Metrics:
    A dictionary mapping each metric name ('analogy', 'outlier') to a list of paths to evaluation files.

Example:

input_metric = {'analogy': ['file_1', 'file_2'],
                'outlier': ['file_1']}
  • Input Models: A dictionary mapping model names to the paths of the model files.

Example:

input_model = {'model_1': 'path_1', 
               'model_2': 'path_2'}

Initialize the class:

emb_evaluator = EmbeddingMetrics(input_metric, input_model)

Summarize a model's metrics

To summarize the metrics of a single model:

emb_evaluator.summary_metrics('model_1') 

Compare models' metrics

To compare the metrics of two or more models:

emb_evaluator.compare_models(['model_1', 'model_2']) 

References

Levy, O. and Goldberg, Y.: Linguistic Regularities in Sparse and Explicit Word Representations (2014)

Camacho-Collados, J. and Navigli, R.: Find the Word that Does Not Belong: A Framework for an Intrinsic Evaluation of Word Vector Representations (2016)
