Embedding Evaluator

EmbeddingEvaluator is a tool to provide metrics for evaluating different embedding models.

The current version supports only embeddings of the following type:

  • FastText

It evaluates the embeddings on the following two metrics:

  • Analogy
  • Outlier Detection

Installation

The EmbeddingEvaluator can be installed from PyPI:

pip install embedding-evaluator

Usage

Analogy Metrics

To measure embeddings on analogy metrics, the EmbeddingEvaluator needs a file with the following layout:

Word 1            | Word 2            | Word 3            | Word 4
1st Pair 1st Word | 1st Pair 2nd Word | 2nd Pair 1st Word | 2nd Pair 2nd Word
Men               | King              | Women             | Queen
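To illustrate what a row in such a file tests, here is a minimal sketch of the common vector-offset approach to analogies: the vector for Word 2 minus Word 1 plus Word 3 should land closest to Word 4. This is not the EmbeddingEvaluator's own implementation; the function names and the toy three-dimensional vectors are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, word1, word2, word3):
    """Return the word closest to word2 - word1 + word3 (hypothetical helper)."""
    target = [
        b - a + c
        for a, b, c in zip(vectors[word1], vectors[word2], vectors[word3])
    ]
    # The three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in {word1, word2, word3}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors, invented for this example (not taken from a real model).
toy_vectors = {
    "men": [1.0, 0.0, 0.1],
    "king": [1.0, 1.0, 0.1],
    "women": [0.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, -1.0, 0.2],
}

predicted = solve_analogy(toy_vectors, "men", "king", "women")
print(predicted)
```

A row counts as correct when the predicted word equals Word 4, so an accuracy score is the fraction of rows answered correctly.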

Outlier Detection

To measure embeddings on outlier detection metrics, the EmbeddingEvaluator needs a file with the following content:

  • Eight words which are semantically very similar and are all connected with each other by a clear well-known relation. (Cluster)
  • Two words which are very similar to the ones in the cluster.
  • Two words which are similar and related to the ones in the cluster.
  • Two words which are related, but not similar to the ones in the cluster.
  • Two words which are unrelated and not similar to the ones in the cluster.
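The idea behind the outlier detection task can be sketched as follows: given a set of words, the outlier is the one that is least similar, on average, to the rest. This is a simplified illustration, not the EmbeddingEvaluator's own code; the helper names and the toy two-dimensional vectors are assumptions, and a three-word cluster stands in for the eight-word cluster described above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def detect_outlier(vectors, words):
    """Return the word with the lowest average similarity to the others."""

    def average_similarity(word):
        others = [w for w in words if w != word]
        total = sum(cosine(vectors[word], vectors[o]) for o in others)
        return total / len(others)

    return min(words, key=average_similarity)


# Toy vectors, invented for this example (not taken from a real model).
toy_vectors = {
    "apple": [1.0, 0.1],
    "banana": [0.9, 0.2],
    "cherry": [1.0, 0.0],
    "car": [0.0, 1.0],
}

outlier = detect_outlier(toy_vectors, ["apple", "banana", "cherry", "car"])
print(outlier)
```

An embedding does well on this task when the word it flags matches the outlier that was deliberately added to the cluster.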

Initialize the EmbeddingEvaluator

The EmbeddingEvaluator takes the following input parameters:

  • Input Metrics: A dictionary that maps each metric name to a list of paths to its evaluation files.

Example:

input_metric = {'analogy': ['file_1', 'file_2'],
                'outlier': ['file_1']}

  • Input Models: A dictionary with the model names and the paths to the models.

Example:

input_model = {'model_1': 'path_1', 
               'model_2': 'path_2'}

Initialize the class:

emb_evaluator = EmbeddingMetrics(input_metric, input_model)
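Since both dictionaries hold file paths, it can be worth checking that every path exists before constructing the evaluator. The sketch below is one way to do that with the standard library; `missing_paths` is a hypothetical helper, not part of the package, and the dictionary values are the same placeholders used above.

```python
from pathlib import Path


def missing_paths(input_metric, input_model):
    """Collect every configured path that does not exist on disk."""
    # input_metric maps a metric name to a list of evaluation file paths.
    metric_paths = [p for paths in input_metric.values() for p in paths]
    # input_model maps a model name to a single model path.
    model_paths = list(input_model.values())
    return [p for p in metric_paths + model_paths if not Path(p).exists()]


input_metric = {'analogy': ['file_1', 'file_2'],
                'outlier': ['file_1']}
input_model = {'model_1': 'path_1',
               'model_2': 'path_2'}

# Placeholder paths are reported as missing unless such files exist locally.
for path in missing_paths(input_metric, input_model):
    print(f"Missing: {path}")
```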

Summarize a model's metrics

To summarize the metrics of a single model:

emb_evaluator.summary_metrics('model_1') 

Compare models' metrics

To compare the metrics of two or more models:

emb_evaluator.compare_models(['model_1', 'model_2']) 

References

Levy, O. and Goldberg, Y.: Linguistic Regularities in Sparse and Explicit Word Representations (2014)

Camacho-Collados, J. and Navigli, R.: Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations (2016)
