Methods for evaluating low-resource word embedding models trained with gensim

These details have not been verified by PyPI

Project links

Homepage

Project description

gensim-evaluations

This library provides methods for evaluating word embedding models loaded with gensim. It currently implements two methods, OddOneOut and Topk, designed specifically for evaluating low-resource models. Users can automatically generate custom test sets in any of the 581 languages supported by Wikidata and then evaluate a model on them with the OddOneOut and Topk methods proposed in this paper.

Basic Usage

Installation

Install from PyPI

$ pip install gensim-evaluations

Loading a model

These methods are designed for evaluating embedding models loaded through gensim. As an example, we'll first load the well-known pre-trained word2vec model from Mikolov et al.

import gensim.downloader as api

# Downloads the pre-trained vectors on first use (a large download) and
# returns them as a gensim KeyedVectors object.
model = api.load('word2vec-google-news-300')

A complete list of pre-trained models available through gensim can be found here. Of course, you can always use gensim to train and load your own model.

Generating custom language-specific test sets

In addition to a model, OddOneOut and Topk require a custom test set of categories, where each category is a list of words that belong to it. We can easily generate such a test set by selecting a few relevant items from Wikidata. For example, we might choose:

Tibetan Buddhist Monastery - Q54074585
News Agency - Q192283
Algorithm - Q8366
Theorem - Q65943
Mathematical Concept - Q24034552
Human Biblical Figure - Q20643955
Capital - Q5119
Country - Q6256
Mythical Character - Q4271324
Emotion - Q9415
Negative Emotion - Q60539481

As you can see, each of these classes has an associated identifier in the Wikidata knowledge base. Classes are related to other items in the knowledge base through properties. One of the most important is the "instance of" property (P31), which links an item that is a particular example of a class to that class.

For example, Wikinews is an instance of Q192283 and Chuzang is an instance of Q54074585.
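
To make the P31 relationship concrete, here is a minimal sketch of the kind of SPARQL query that lists the labels of all instances of a class in one language. The helper name `build_instance_query` is hypothetical and is not part of gensim-evaluations; the library's own queries may differ. The query only uses prefixes (`wd:`, `wdt:`, `rdfs:`) that the Wikidata Query Service predefines.

```python
# Hypothetical helper (an assumption, not the library's API): builds a SPARQL
# query that selects every item that is an "instance of" (P31) a given class,
# together with its label in one language.
def build_instance_query(class_id: str, lang: str) -> str:
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  ?item wdt:P31 wd:{class_id} .\n"      # item is an instance of the class
        "  ?item rdfs:label ?label .\n"           # fetch the item's labels
        f'  FILTER(LANG(?label) = "{lang}")\n'    # keep only the requested language
        "}"
    )


# Example: English labels for instances of "News Agency" (Q192283).
query = build_instance_query("Q192283", "en")
print(query)
```

The printed query can be pasted into the Wikidata Query Service to inspect what a category contains before using it in a test set.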

Following this basic idea, we can generate test set(s) composed of all words in Wikidata belonging to these categories in any language(s) supported by the project.

from gensim_evaluations import wikiqueries

categories = ['Q54074585','Q192283','Q8366','Q65943','Q24034552','Q20643955',
           'Q5119','Q6256','Q4271324','Q9415','Q60539481']

langs = ['en','la']
wikiqueries.generate_test_set(items=categories,languages=langs,filename='test_set')

The generate_test_set function requires only a list of Wikidata items to use as categories and a list of language codes. Each test set is saved as a .txt file at the location given by the filename parameter, with the corresponding language code automatically appended to the filename.
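
As a quick illustration of the naming scheme, the sketch below lists the files one would expect from the call above. The pattern `<filename>_<lang>.txt` is an assumption inferred from the `test_set_en.txt` path used in the evaluation example, and `expected_test_set_paths` is a hypothetical helper, not a function provided by the library.

```python
# Hypothetical helper: predicts the output paths, assuming the library appends
# "_<language code>.txt" to the filename argument (inferred, not confirmed).
def expected_test_set_paths(filename, languages):
    return [f"{filename}_{lang}.txt" for lang in languages]


paths = expected_test_set_paths("test_set", ["en", "la"])
print(paths)  # prints ['test_set_en.txt', 'test_set_la.txt']
```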

Here are some useful links to help determine the languages and categories available.

List of languages supported by Wikidata - https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
SQID Browser for all items in Wikidata containing the Instance of (P31) property - https://sqid.toolforge.org/#/browse?type=classes

Note that category sizes vary widely. A broad category such as human (Q5) contains 8,255,736 instances, which is too large to work with as a single category. We advise that you either use SQID to narrow your choice to categories with a reasonable number of entries, or test your query on the Wikidata Query Service to make sure it runs before using the item as a category.
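
One way to check a category's size up front is a counting query. The sketch below only builds the query string; `build_count_query` is a hypothetical helper and not part of gensim-evaluations. For very large classes, the count itself may be slow or time out on the Wikidata Query Service.

```python
# Hypothetical helper: builds a SPARQL query that counts how many items are
# an "instance of" (P31) the given class.
def build_count_query(class_id: str) -> str:
    return (
        "SELECT (COUNT(?item) AS ?count) "
        f"WHERE {{ ?item wdt:P31 wd:{class_id} . }}"
    )


count_query = build_count_query("Q5")
print(count_query)
```

Running the printed query in the Wikidata Query Service returns a single number, which can then be compared against whatever category size is practical for your model.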

Evaluation Using Topk and OddOneOut

We can now evaluate the word2vec model loaded earlier on the newly generated test sets using both Topk and OddOneOut.

from gensim_evaluations import methods

topk_result = methods.Topk(cat_file='test_set_en.txt',model=model, k=3, allow_oov=True)
odd_out_result = methods.OddOneOut(cat_file='test_set_en.txt',model=model, k_in=3, allow_oov=True, sample_size=1000)

print('topk_result=', topk_result)
print('odd_out_result=', odd_out_result)

The Topk and OddOneOut functions both return a 5-tuple containing:

overall accuracy
category accuracy (float accuracy for each category)
list of skipped categories
overall raw score (total number of correct comparisons)
category raw score (number of correct comparisons for each category)
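
Because the result is a plain tuple, it can help to give the five positions names. The sketch below wraps a result in a `NamedTuple` whose field order follows the list above. `EvalResult` is a hypothetical convenience type, the tuple values are placeholders rather than real scores, and representing the per-category entries as dicts is an assumption.

```python
from typing import Any, NamedTuple


# Hypothetical wrapper: names the five positions of the returned tuple.
class EvalResult(NamedTuple):
    overall_accuracy: Any
    category_accuracy: Any
    skipped_categories: Any
    overall_raw_score: Any
    category_raw_score: Any


# Placeholder values for illustration only, not output from a real model.
# In practice this tuple would come from methods.Topk or methods.OddOneOut.
placeholder = (0.5, {"Q5119": 0.5}, [], 10, {"Q5119": 10})

result = EvalResult(*placeholder)
print(result.overall_accuracy)
print(result.skipped_categories)
```

Fields can still be read by position (for example `result[0]`), so the wrapper does not change how existing code indexes the tuple.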

Contact

Feel free to reach out to n8stringham@gmail.com with any questions.

Release history

0.1.2 - Aug 16, 2021
0.1.1 - Aug 16, 2021
0.1.0 - Aug 16, 2021
0.0.7 - Mar 5, 2021
0.0.6 - Mar 4, 2021
0.0.5 - Mar 4, 2021
0.0.4 - Oct 2, 2020 (this version)
0.0.3 - Oct 2, 2020

Download files

Source Distribution

gensim-evaluations-0.0.4.tar.gz (7.2 kB)

Uploaded Oct 2, 2020 Source

Built Distribution

gensim_evaluations-0.0.4-py3-none-any.whl (17.0 kB)

Uploaded Oct 2, 2020 Python 3

File details

Details for the file gensim-evaluations-0.0.4.tar.gz.

File metadata

Download URL: gensim-evaluations-0.0.4.tar.gz
Upload date: Oct 2, 2020
Size: 7.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/0.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for gensim-evaluations-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`d2273a81dc7cd97c74ee3dc1657afe71ec6101fbcd98c722189833def2ef90f6`
MD5	`5ddcfb9cacd82f7b4d376ea963aa5c7d`
BLAKE2b-256	`740706a96a8159478683bb8d2e692e72d534945116a6c3c3ad3bc9432702f872`

File details

Details for the file gensim_evaluations-0.0.4-py3-none-any.whl.

File metadata

Download URL: gensim_evaluations-0.0.4-py3-none-any.whl
Upload date: Oct 2, 2020
Size: 17.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/0.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for gensim_evaluations-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`85a6e94d69a9f262d12b508b2793537e52349ab5f7e196f112f59d773ef31a67`
MD5	`d6eabf430148ead92f5b3be318509f48`
BLAKE2b-256	`ba45408ec1f8d9e5969f617c8b5708ecc96011c76ec9000b2983650935352067`