vec2best: A Unified Framework for Intrinsic Evaluation of Word-Embedding Algorithms


Description

vec2best is a Python library that provides a framework for evaluating word embeddings, trained with various methods and hyper-parameters, on a range of tasks from the literature. For each model, the tool yields a holistic evaluation metric called the $PCE$ (Principal Component Evaluation).

vec2best implements the state-of-the-art intrinsic evaluation tasks of word similarity, word analogy, concept categorization, and outlier detection over the benchmarks in the following table.

Task               Evaluation            Metric             Benchmark
Similarity         Spearman correlation  Cosine similarity  WS353, RG65, RW, MEN, MTurk287, SimLex999, MC30, MTurk771, YP130, Verb143, SimVerb3500, SemEval17, WS353REL, WS353SIM
Analogy            Accuracy              3CosAdd, 3CosMul   Google, MSR
Analogy            Spearman correlation  3CosAdd            SemEval2012
Categorization     Purity                Clustering         AP, BLESS, BM (battig), ESSLI 1a, ESSLI 2b, ESSLI 2c
Outlier detection  Accuracy              Compactness score  8-8-8, WordSim500
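
To make the first rows of the table concrete, the following is a minimal sketch of a similarity evaluation (Spearman correlation between cosine similarities and human ratings) and of a 3CosAdd analogy query in its vector-offset form. It uses only the standard library; the toy embeddings, the ratings, and the function names are made up for illustration and are not part of the vec2best API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def three_cos_add(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by maximising cos(x, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy 2-dimensional embeddings (made up for illustration).
emb = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.2],
    "queen": [3.0, 1.2],
    "apple": [-1.0, 2.0],
}

# Hypothetical human similarity ratings for three word pairs.
pairs = [("man", "woman", 8.0), ("king", "queen", 9.0), ("man", "apple", 1.0)]
model_scores = [cosine(emb[w1], emb[w2]) for w1, w2, _ in pairs]
human_scores = [rating for _, _, rating in pairs]

similarity_score = spearman(model_scores, human_scores)
analogy_answer = three_cos_add(emb, "man", "woman", "king")
```

In this toy setting the model ranks the three pairs in the same order as the ratings, and the analogy query excludes the three question words from the candidates, as is customary for this task.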

Requirements

  • Python 3.6
  • scikit-learn
  • six
  • word-embeddings-benchmarks

The package also relies on modified versions of the following repositories for outlier detection:

Installation

vec2best can be installed through pip (the Python package manager) in the following way:

pip install vec2best

Usage

To compute the $PCE$, call the function compute_pce(path_to_model). The only parameter you need to set is the path to the folder in which you saved the embedding models (in .vec or .txt format) that you want to evaluate.
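
As a rough sketch of preparing such a folder, the snippet below writes a toy model file. It assumes the common word2vec/fastText text layout (an optional "<vocabulary size> <dimension>" header followed by one "word v1 v2 ..." line per word); the exact layout vec2best expects is not stated here. The helper write_text_embeddings and the folder and file names are hypothetical.

```python
import os
import tempfile


def write_text_embeddings(path, vectors, header=True):
    """Write {word: [floats]} as one 'word v1 v2 ...' line per word.

    With header=True the first line is '<vocab_size> <dimension>', as in the
    word2vec/fastText text format; whether vec2best needs it is an assumption.
    """
    dim = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf-8") as handle:
        if header:
            handle.write("{} {}\n".format(len(vectors), dim))
        for word, vec in vectors.items():
            values = " ".join("{:.4f}".format(x) for x in vec)
            handle.write(word + " " + values + "\n")


# Hypothetical folder and file names, used only for this illustration.
model_dir = os.path.join(tempfile.mkdtemp(), "example_models")
os.makedirs(model_dir)
toy = {"cat": [0.1, 0.2, 0.3], "dog": [0.2, 0.1, 0.4]}
write_text_embeddings(os.path.join(model_dir, "toy_model.vec"), toy)

# Read the file back to inspect what was written.
with open(os.path.join(model_dir, "toy_model.vec"), encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

The resulting model_dir is the kind of path that would be passed as path_to_model; a two-word vocabulary is of course far too small for any of the benchmarks.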

The function compute_pce(path_to_model) has seven further parameters (categorization=True, similarity=True, analogy=True, outlier_detection=True, pce_min=True, pce_max=True, pce_mean=True), all set to True by default, so the default output consists of the evaluation of the models over the four tasks together with $PCE^{MIN}$, $PCE^{MAX}$, and $PCE^{MEAN}$. By setting some of these parameters to False, the $PCE$ can be computed over a subset of the tasks, or the evaluation can be restricted to one or two of the three types of $PCE$.
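
When the set of tasks is chosen at run time, the boolean flags can be assembled as a dictionary and unpacked into the call. The sketch below assumes only the parameter names listed above; build_flags is a hypothetical helper, not part of vec2best.

```python
TASKS = ("categorization", "similarity", "analogy", "outlier_detection")
PCE_TYPES = ("pce_min", "pce_max", "pce_mean")


def build_flags(tasks=TASKS, pce_types=PCE_TYPES):
    """Return keyword arguments enabling only the requested tasks and PCE types."""
    unknown = (set(tasks) - set(TASKS)) | (set(pce_types) - set(PCE_TYPES))
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    flags = {name: name in tasks for name in TASKS}
    flags.update({name: name in pce_types for name in PCE_TYPES})
    return flags


# Evaluate categorization and similarity only, and keep only the minimum PCE.
flags = build_flags(tasks=("categorization", "similarity"), pce_types=("pce_min",))
# The result could then be unpacked as: compute_pce(path_to_model, **flags)
```

Rejecting unknown names early avoids silently running the full evaluation because of a misspelled task.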

The results are saved in the folder results/pce. The on-screen output shows the percentage of explained variance of the first principal component and the top 3 models according to the chosen $PCE$.

See the following example:

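from vec2best import compute_pce

# Folder containing the embedding models (.vec or .txt) to evaluate
path_to_model = 'data/example_models'

# Evaluate categorization and similarity only, and compute only PCE min
compute_pce(path_to_model, analogy=False, outlier_detection=False,
            pce_max=False, pce_mean=False)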

The output will look like:

PCE min - percentage of explained variance: 0.95
                                 categorization    similarity    PCE_min
example_models/ft_0_5_50_5.vec   0.38              0.29          1.00
example_models/glove_5_50_5.vec  0.41              0.25          0.94
example_models/wv2_model_11.vec  0.24              0.17          0.34
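
To illustrate what the explained variance of the first principal component measures, the sketch below computes it for the categorization and similarity columns of the table above, using power iteration on the covariance matrix. This is a generic PCA calculation under stated assumptions (columns centred but not rescaled, sample covariance), not vec2best's own code, and the function name is hypothetical.

```python
import math


def first_pc_explained_variance(rows, iterations=200):
    """Share of total variance captured by the first principal component.

    rows: one list of task scores per model. Columns are centred but not
    rescaled; whether vec2best standardises them is not stated in the text.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the task columns.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))
    # Power iteration converges to the eigenvector with the largest eigenvalue
    # (it needs a start vector that is not orthogonal to that eigenvector).
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the top eigenvalue.
    top = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return top / total


# Categorization and similarity scores of the three models in the table above.
scores = [[0.38, 0.29], [0.41, 0.25], [0.24, 0.17]]
ratio = first_pc_explained_variance(scores)
```

A ratio close to 1 means that a single direction summarises most of the variation between models across tasks, which is what makes a one-number score such as the $PCE$ informative.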
