
A Python library for training and evaluating incremental word embeddings.


Word Embeddings Benchmarks



Updated WEB version. Original repository: https://github.com/kudkudak/word-embeddings-benchmarks

The Word Embedding Benchmark (web) package focuses on providing methods for easily evaluating and reporting results on common benchmarks (analogy, similarity and categorization).

The research goal of the package is to help drive research in word embeddings through easily accessible, reproducible results (as there are many contradictory results in the literature right now). This should also help answer the question of whether we should devise new methods for evaluating word embeddings.

To evaluate your embedding (converted to word2vec format or a Python dict pickle) on all fast-running benchmarks, execute ./scripts/eval_on_all.py <path-to-file>.

See here for results on the embeddings available in the package.
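As a rough illustration of the "Python dict pickle" form mentioned above, the sketch below stores a toy embedding as a plain dict mapping each word to its vector and round-trips it through pickle. The exact layout that eval_on_all.py expects is not specified here, so the word-to-vector dict, the helper names save_embedding and load_embedding, and the file name are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical toy embedding: a plain dict mapping each word to its vector.
# Assumption: the evaluation script accepts this word -> vector layout.
embedding = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def save_embedding(emb, path):
    """Serialize a word -> vector dict to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(emb, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_embedding(path):
    """Load a pickled word -> vector dict back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Write the toy embedding to a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), "toy_embedding.pkl")
save_embedding(embedding, path)
restored = load_embedding(path)
```

If the script accepts this layout, the resulting file would be the <path-to-file> argument.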

Warnings and Disclaimers:

  • The analogy test does not normalize word embeddings internally.

  • The package is currently under development, and we expect an official release within the next few months. The main issue you might hit at the moment is rather long embedding loading times (especially if you use fetchers).
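Because the analogy test does not normalize embeddings internally, it may be worth scaling vectors to unit length before evaluation. The sketch below is not the package's implementation: it is a minimal pure-Python version of the common additive analogy rule (pick the word d whose dot product with b - a + c is largest), with hypothetical helpers normalize_embedding and solve_analogy and a made-up toy embedding, showing how one long vector can dominate the raw scores.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def normalize_embedding(emb):
    """Return a copy of a word -> vector dict with unit-length vectors."""
    return {word: normalize(vec) for word, vec in emb.items()}


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by maximizing dot(d, b - a + c).

    The raw dot product is used on purpose, so the result depends on
    whether the caller normalized the vectors beforehand.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    best_word, best_score = None, -math.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # the query words themselves are not valid answers
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up toy vectors; "apple" has a much larger norm than the others.
raw = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [2.0, 0.0, 2.0],
    "queen": [2.0, 2.0, 2.0],
    "apple": [9.0, 0.0, 1.0],
}

answer_raw = solve_analogy(raw, "man", "woman", "king")
answer_unit = solve_analogy(normalize_embedding(raw), "man", "woman", "king")
```

On this toy data the unnormalized run selects "apple" purely because of its length, while the normalized run selects "queen".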

Please also refer to our recent publication on evaluation methods: https://arxiv.org/abs/1702.02170.

Features:

  • scikit-learn API and conventions

  • 18 popular datasets

  • 11 word embeddings (word2vec, HPCA, morphoRNNLM, GloVe, LexVec, ConceptNet, HDC/PDC and others)

  • methods to solve analogy, similarity and categorization tasks
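To make the similarity task concrete: such benchmarks are commonly scored by the Spearman rank correlation between the cosine similarity of each word pair and its human rating. The sketch below is a self-contained pure-Python version of that idea under this assumption; it does not use or reproduce the package's API, and the function name similarity_score, the toy vectors and the ratings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def similarity_score(emb, pairs, ratings):
    """Spearman correlation between model similarities and human ratings."""
    predicted = [cosine(emb[w1], emb[w2]) for w1, w2 in pairs]
    return pearson(ranks(predicted), ranks(ratings))


# Made-up embedding and ratings, for illustration only.
toy_embedding = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
pairs = [("cat", "dog"), ("cat", "car"), ("dog", "car")]
ratings = [9.0, 2.0, 3.0]

score = similarity_score(toy_embedding, pairs, ratings)
```

Here the model orders the three pairs exactly as the ratings do, so the correlation is 1.0. Real benchmarks contain hundreds or thousands of pairs and also require a policy for out-of-vocabulary words, which this sketch omits.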

Included datasets:

  • TR9856

  • WordRep

  • Google Analogy

  • MSR Analogy

  • SemEval2012

  • AP

  • BLESS

  • Battig

  • ESSLI (2b, 2a, 1c)

  • WS353

  • MTurk

  • RG65

  • RW

  • SimLex999

  • MEN

Note: the embeddings are not currently hosted on a proper server. If the download is too slow, consider downloading the embeddings manually from the original sources referenced in the docstrings.

Dependencies


Please see requirements.txt.

Install


This package uses setuptools. You can install it by running:

python setup.py install

If you have problems during installation, you may first need to install the dependencies:

pip install -r requirements.txt

If you already have the dependencies listed in requirements.txt installed, you can install into your home directory with:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build

sudo python setup.py install

You can also install it in development mode with:

python setup.py develop

Examples


See the examples folder.

License


The code is licensed under MIT; however, the embeddings distributed with the package might be under different licenses. If you are unsure, please reach out to the authors (references are included in the docstrings).

