Methods for evaluating low-resource word embedding models trained with gensim

Project description

gensim-evaluations

This library provides methods for evaluating word embedding models loaded with gensim. Currently, it implements two methods designed specifically for the evaluation of low-resource models. The code allows users to automatically create custom test sets in any of the 581 languages supported by Wikidata and then to evaluate on them using the OddOneOut and Topk methods proposed in this paper.

Basic Usage

Installation

Install from PyPI

$ pip install gensim-evaluations

Loading a model

These methods are designed for evaluating embedding models loaded through gensim. As an example, we'll first load the well-known pre-trained word2vec model from Mikolov et al.

import gensim.downloader as api
from gensim.models import Word2Vec

model = api.load('word2vec-google-news-300')

A complete list of pre-trained models available through gensim can be found here. Of course, you can always use gensim to train and load your own model.
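If you go that route, here is a minimal sketch of training a small model of your own; corpus is just a placeholder for your own tokenized sentences, and since the pre-trained model above is loaded as a KeyedVectors object, the trained model's word vectors (own_model.wv) are the natural thing to pass to the evaluation functions later on.

from gensim.models import Word2Vec

# A sketch only: 'corpus' stands in for your own tokenized sentences
# (a list of token lists).
corpus = [
    ['tibetan', 'buddhist', 'monastery'],
    ['news', 'agency', 'report'],
]
own_model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

# The pre-trained model above is a KeyedVectors object, so pass the
# trained model's word vectors in its place.
word_vectors = own_model.wv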

Generating custom language-specific test sets

In addition to a model, OddOneOut and Topk require a custom test set of categories, each containing a list of words belonging to it. We can easily generate such a test set by selecting a few relevant items from Wikidata. For example, we might choose:

  • Tibetan Buddhist Monastery - Q54074585
  • News Agency - Q192283
  • Algorithm - Q8366
  • Theorem - Q65943
  • Mathematical Concept - Q24034552
  • Human Biblical Figure - Q20643955
  • Capital - Q5119
  • Country - Q6256
  • Mythical Character - Q4271324
  • Emotion - Q9415
  • Negative Emotion - Q60539481

As you can see, each of these classes has an associated code in the Wikidata knowledge base. These classes are related to other items in the knowledge base through certain properties. One of the most important is the instance of property (P31), which links an item that is a particular example of a class to that class.

For example, Wikinews is an instance of Q192283 (News Agency), and Chuzang is an instance of Q54074585 (Tibetan Buddhist Monastery).
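To make this concrete, here is a minimal sketch (not part of gensim-evaluations) of the kind of query that collects category members through P31 using the public Wikidata Query Service; the queries that wikiqueries actually issues may differ.

import requests

# Sketch: fetch English labels of items that are instances of News Agency
# (Q192283) from the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = 'https://query.wikidata.org/sparql'
query = """
SELECT ?label WHERE {
  ?item wdt:P31 wd:Q192283 .      # items that are instances of News Agency
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")     # keep only the English labels
}
LIMIT 20
"""
resp = requests.get(ENDPOINT,
                    params={'query': query, 'format': 'json'},
                    headers={'User-Agent': 'p31-example/0.1'})
resp.raise_for_status()
for row in resp.json()['results']['bindings']:
    print(row['label']['value'])  # Wikinews should appear among the results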

Following this basic idea, we can generate test set(s) composed of all words in Wikidata belonging to these categories in any language(s) supported by the project.

from gensim_evaluations import wikiqueries

categories = ['Q54074585', 'Q192283', 'Q8366', 'Q65943', 'Q24034552', 'Q20643955',
              'Q5119', 'Q6256', 'Q4271324', 'Q9415', 'Q60539481']

langs = ['en', 'la']
wikiqueries.generate_test_set(items=categories, languages=langs, filename='test_set')

All that the generate_test_set function requires is a list of Wikidata items to be used as categories and a list of language codes. The test set(s) will be saved as .txt file(s) at the location specified by the filename parameter, with the appropriate language code automatically appended to each filename.
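For example, with filename='test_set' and langs=['en', 'la'] as above, the files used in the evaluation step below should be:

# Language codes are appended to the base filename.
expected_files = ['test_set_{}.txt'.format(lang) for lang in ['en', 'la']]
print(expected_files)  # ['test_set_en.txt', 'test_set_la.txt']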

Here are some useful links to help determine the languages and categories available.

It should also be noted that category sizes vary widely. In particular, a broad category such as human (Q5) contains 8,255,736 instances, which is too large to work with as a single category. It is advised that you either use SQID to filter down to categories with a reasonable number of entries, or test your query on the Wikidata Query Service to make sure it runs before using it as a category.
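As a quick sanity check before committing to a category, a sketch like the following (again, not part of gensim-evaluations) counts a category's instances through the same Wikidata Query Service endpoint:

import requests

# Sketch: count how many instances a candidate category has before using it.
ENDPOINT = 'https://query.wikidata.org/sparql'

def count_instances(qid):
    query = 'SELECT (COUNT(?item) AS ?n) WHERE {{ ?item wdt:P31 wd:{} . }}'.format(qid)
    resp = requests.get(ENDPOINT,
                        params={'query': query, 'format': 'json'},
                        headers={'User-Agent': 'category-size-check/0.1'})
    resp.raise_for_status()
    return int(resp.json()['results']['bindings'][0]['n']['value'])

print(count_instances('Q9415'))   # Emotion: small enough to use as a category
# print(count_instances('Q5'))    # human: millions of instances, far too broad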

Evaluation Using Topk and OddOneOut

We can now evaluate the word2vec model (which we loaded earlier) on these newly generated test sets using both Topk and OddOneOut.

from gensim_evaluations import methods

topk_result = methods.Topk(cat_file='test_set_en.txt', model=model, k=3, allow_oov=True)
odd_out_result = methods.OddOneOut(cat_file='test_set_en.txt', model=model, k_in=3, allow_oov=True, sample_size=1000)

print('topk_result=', topk_result)
print('odd_out_result=', odd_out_result)

The Topk and OddOneOut functions both return a 5-tuple containing (see the unpacking sketch after this list):

  1. overall accuracy
  2. category accuracy (float accuracy for each category)
  3. list of skipped categories
  4. overall raw score (total number of correct comparisons)
  5. category raw score (number of correct comparisons for each category)
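For example, the Topk result above can be unpacked in that order (OddOneOut returns the same shape):

# Unpack the 5-tuple returned by Topk in the order listed above.
accuracy, cat_accuracy, skipped, raw_score, cat_raw_score = topk_result

print('overall accuracy:', accuracy)
print('skipped categories:', skipped)
print('overall raw score:', raw_score)
# cat_accuracy and cat_raw_score hold the per-category breakdowns; their
# exact container type is not documented here, so print them as-is.
print('per-category accuracy:', cat_accuracy)
print('per-category raw score:', cat_raw_score)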

Contact

Feel free to reach out to n8stringham@gmail.com with any questions.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-evaluations-0.0.4.tar.gz (7.2 kB, Source)

Built Distribution

gensim_evaluations-0.0.4-py3-none-any.whl (17.0 kB, Python 3)

File details

Details for the file gensim-evaluations-0.0.4.tar.gz.

File metadata

  • Download URL: gensim-evaluations-0.0.4.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for gensim-evaluations-0.0.4.tar.gz
  • SHA256: d2273a81dc7cd97c74ee3dc1657afe71ec6101fbcd98c722189833def2ef90f6
  • MD5: 5ddcfb9cacd82f7b4d376ea963aa5c7d
  • BLAKE2b-256: 740706a96a8159478683bb8d2e692e72d534945116a6c3c3ad3bc9432702f872

See more details on using hashes here.

File details

Details for the file gensim_evaluations-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: gensim_evaluations-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for gensim_evaluations-0.0.4-py3-none-any.whl
  • SHA256: 85a6e94d69a9f262d12b508b2793537e52349ab5f7e196f112f59d773ef31a67
  • MD5: d6eabf430148ead92f5b3be318509f48
  • BLAKE2b-256: ba45408ec1f8d9e5969f617c8b5708ecc96011c76ec9000b2983650935352067

See more details on using hashes here.
