Methods for evaluating low-resource word embedding models trained with gensim

Project description

gensim-evaluations

This library provides methods for evaluating word embedding models loaded with gensim. Currently, it implements two methods designed specifically for the evaluation of low-resource models. The code allows users to automatically create custom test sets in any of the 581 languages supported by Wikidata and then to evaluate on them using the OddOneOut and Topk methods proposed in this paper.

Basic Usage

Installation

Install from PyPI:

$ pip install gensim-evaluations

Loading a model

These methods are designed to evaluate embedding models loaded through gensim. As an example, we'll first load the well-known pre-trained word2vec model from Mikolov et al.

import gensim.downloader as api

model = api.load('word2vec-google-news-300')

A complete list of pre-trained models available through gensim can be found here. Of course, you can always use gensim to train and load your own model.

Generating custom language-specific test sets

In addition to a model, OddOneOut and Topk require a custom test set of categories, where each category contains a list of words belonging to it. We can easily generate a custom test set by selecting a few relevant items from Wikidata. For example, we might choose:

  • Tibetan Buddhist Monastery - Q54074585
  • News Agency - Q192283
  • Algorithm - Q8366
  • Theorem - Q65943
  • Mathematical Concept - Q24034552
  • Human Biblical Figure - Q20643955
  • Capital - Q5119
  • Country - Q6256
  • Mythical Character - Q4271324
  • Emotion - Q9415
  • Negative Emotion - Q60539481

As you can see, each of these classes has an associated code in the Wikidata knowledge base. These classes are related to other items in the knowledge base through certain properties. One of the most important is the instance of property (P31), which links items that are particular examples of a class to that class.

For example, Wikinews is an instance of Q192283, and Chuzang is an instance of Q54074585.
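Lookups like this can be expressed in SPARQL against the Wikidata Query Service. The helper below is only an illustration of the idea, not the query the library actually issues:

```python
def instances_query(class_qid, lang):
    """Build a SPARQL query for labels of all instances (P31) of a class."""
    return f"""
    SELECT ?itemLabel WHERE {{
      ?item wdt:P31 wd:{class_qid} .
      ?item rdfs:label ?itemLabel .
      FILTER(LANG(?itemLabel) = "{lang}")
    }}
    """

# e.g. all English labels of news agencies (Q192283);
# paste the result into https://query.wikidata.org to run it.
print(instances_query("Q192283", "en"))
```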

Following this basic idea, we can generate test set(s) composed of all words in Wikidata belonging to these categories in any language(s) supported by the project.

from lreval.wikiqueries import generate_test_set

categories = ['Q54074585', 'Q192283', 'Q8366', 'Q65943', 'Q24034552',
              'Q20643955', 'Q5119', 'Q6256', 'Q4271324', 'Q9415', 'Q60539481']

langs = ['en', 'la']
generate_test_set(items=categories, languages=langs, filename='test_set')

All that the generate_test_set function requires is a list of Wikidata items to be used as categories and a list of language codes. The test set(s) will be saved as .txt file(s) at the location specified by the filename parameter, with the appropriate language code automatically appended to the corresponding filename.

SQID and the Wikidata Query Service are useful starting points for determining which languages and categories are available.

It should also be noted that category sizes vary widely. In particular, a broad category such as human (Q5) contains 8,255,736 instances, which is too large to work with as a single category. It is advised that you either use SQID to filter down to categories with a reasonable number of entries, or test your query on the Wikidata Query Service to make sure it runs before using it as a category.
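One quick sanity check is a COUNT query on the Wikidata Query Service before committing to a category. A sketch of building such a query (again, an illustration rather than anything the library provides):

```python
def count_query(class_qid):
    """Build a SPARQL query that counts the instances (P31) of a class."""
    return (
        "SELECT (COUNT(?item) AS ?n) WHERE { "
        f"?item wdt:P31 wd:{class_qid} . "
        "}"
    )

# Paste the result into https://query.wikidata.org to see how many
# instances a candidate category has, e.g. for human (Q5):
print(count_query("Q5"))
```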

Evaluation Using Topk and OddOneOut

We can now evaluate the word2vec model (which we loaded earlier) on these newly generated test sets using both Topk and OddOneOut:

from lreval.evaluation import OddOneOut, Topk

topk_result = Topk(cat_file='test_set_en.txt', model=model, k=3, allow_oov=True)
odd_out_result = OddOneOut(cat_file='test_set_en.txt', model=model, k_in=3,
                           allow_oov=True, sample_size=1000)

print('topk_result=', topk_result)
print('odd_out_result=', odd_out_result)
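Conceptually, the OddOneOut task asks the model to pick, from a handful of in-category words plus one intruder, the word least similar to the rest. A toy sketch of that selection rule using plain numpy (this illustrates the idea only; it is not the library's implementation):

```python
import numpy as np

def odd_one_out(words, vectors):
    """Return the word whose vector is least similar, on average, to the others."""
    avg_sims = []
    for i, v in enumerate(vectors):
        others = [u for j, u in enumerate(vectors) if j != i]
        # Cosine similarity between v and every other vector.
        sims = [np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
                for u in others]
        avg_sims.append(np.mean(sims))
    return words[int(np.argmin(avg_sims))]

# Toy 2-d vectors: three clustered words and one pointing elsewhere.
words = ["cat", "dog", "mouse", "theorem"]
vectors = [np.array([1.0, 0.1]), np.array([0.9, 0.2]),
           np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(odd_one_out(words, vectors))  # "theorem" stands out
```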

The Topk and OddOneOut functions both return a 5-tuple containing:

  1. overall accuracy
  2. category accuracy (float accuracy for each category)
  3. list of skipped categories
  4. overall raw score (total number of correct comparisons)
  5. category raw score (number of correct comparisons for each category)
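Given that return format, the tuple can be unpacked and inspected per category. A sketch using a made-up result tuple (the numbers and variable names below are hypothetical, not from an actual run):

```python
# Suppose topk_result is the 5-tuple returned by Topk; a made-up example:
topk_result = (
    0.72,                              # 1. overall accuracy
    {"Q9415": 0.80, "Q6256": 0.64},    # 2. per-category accuracy
    ["Q54074585"],                     # 3. skipped categories
    720,                               # 4. overall raw score
    {"Q9415": 400, "Q6256": 320},      # 5. per-category raw score
)

accuracy, cat_accuracy, skipped, raw, cat_raw = topk_result

print(f"overall accuracy: {accuracy:.2%}")
for cat, acc in sorted(cat_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {cat}: {acc:.2%} ({cat_raw[cat]} correct)")
if skipped:
    print("skipped categories:", skipped)
```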

Contact

Feel free to reach out to n8stringham@gmail.com with any questions.


