
A library for working with semantics in multiple languages.

Project description

Semeval (Semantic Evaluation)

Semeval is a Python package for modelling various aspects of meaning in languages.

Installation

The package can be installed with pip: pip install -U semeval

Download models

The following command downloads word vector embeddings for English and Finnish:

python3 -m semeval.download -l eng fin -m embeddings

We currently support eng, fin, kpv, myv, mdf, rus, liv and sms.

Relatedness models can be downloaded using the command:

python3 -m semeval.download -l eng fin -m relatedness

Currently, only English and Finnish are supported in the relatedness model.
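If you want to trigger the download from a script, the command line above can be assembled as an argument list. This is only a sketch: build_download_command and SUPPORTED_LANGS are hypothetical names introduced here, not part of the package, and the language set is simply the list given above for embeddings.

```python
import sys

# Language codes listed above for embeddings (relatedness supports only eng and fin).
SUPPORTED_LANGS = {"eng", "fin", "kpv", "myv", "mdf", "rus", "liv", "sms"}


def build_download_command(langs, model):
    """Hypothetical helper: build the argv list for `python -m semeval.download`."""
    unknown = [lang for lang in langs if lang not in SUPPORTED_LANGS]
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    # sys.executable is the interpreter running this script.
    return [sys.executable, "-m", "semeval.download", "-l", *langs, "-m", model]


cmd = build_download_command(["eng", "fin"], "embeddings")
print(" ".join(cmd[1:]))  # -m semeval.download -l eng fin -m embeddings
```

The resulting list can be passed to subprocess.run(cmd, check=True) to perform the actual download.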

Usage

Word embeddings

Load embeddings of a language by running:

from semeval import Embeddings
e = Embeddings("eng")

After this, you can find related words:

e.theme(['shoe', 'clothes']) #outputs a word describing all input words
>> ('clothing', 0.8700831532478333)

e.neighbours('hi') #outputs the nearest neighbors
>> [('hey', 0.6976003050804138), ('jeez', 0.6230848431587219), ('ya', 0.6213312149047852), ('hello', 0.6144036650657654) ...]

e.analogy('man', 'king', 'woman') #analogous words
>> [('monarch', 0.6457855105400085), ('regnant', 0.6354122161865234)...]

e.most_similar(positive=['hello', 'world'], negative=['king'], topn=10)
>> [('Tamana', 1.1119599342346191), ('Ibaraki', 1.087041974067688), ('Makuhari', 1.0793628692626953),... ]
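To give an intuition for the analogy call above, here is a minimal sketch of the common vector-offset approach (b - a + c, ranked by cosine similarity). It assumes nothing about how Semeval implements analogy internally; the toy vectors and the toy_analogy function are made up for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for this example: (male, female, royal).
VECTORS = {
    "man": [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "prince": [1.0, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_analogy(a, b, c, vectors=VECTORS):
    """Rank words by similarity to b - a + c, skipping the three input words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


result = toy_analogy("man", "king", "woman")
print(result[0][0])  # queen
```

The argument order mirrors the call shown above (man is to king as woman is to ...), but the scores produced by real embeddings will of course differ from this toy data.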

Get vector representations

e.centroid(['hi', 'hello']) #the centroid vector of the input words
>> [0. 0.10486 -0.23701501 0.1293 0.052645 -0.027155...]

e.to_vector('this is a great api !'.split(' ')) # a vector representing the sentence
>> [1.1630000e-02 -2.2303334e-02  8.4316663e-02 -2.1063333e-02...]

e.vector("king")
>> [0.05574 -0.16716 0.10282 -0.10851 0.08783 -0.09499 0.16031...]
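As a rough illustration of what a centroid vector is, the sketch below computes the element-wise mean of a few vectors. This assumes the usual definition of a centroid; the library may apply additional steps such as normalization, and the function name toy_centroid is hypothetical.

```python
def toy_centroid(vectors):
    """Element-wise arithmetic mean of equal-length vectors (illustrative only)."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(components) / count for components in zip(*vectors)]


print(toy_centroid([[0.0, 2.0], [2.0, 4.0]]))  # [1.0, 3.0]
```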

Get similar words in another language

e2 = Embeddings("fin")
e.project('king', e2)
>>[('Ahasveros', 0.5551087856292725), ('kuningas', 0.5522325038909912), ('kuninkas', 0.5242758393287659)..]

Word similarity

e.similarity('hi', 'bye')
>> 0.28805155

Vocabulary

e.vocabulary()
>> {'astrologically': <Vocab object>, 'spinto': <Vocab object>, 'NortelNet': <Vocab object>...}

Server-mode

Server mode is useful in some cases, such as debugging or avoiding the wait for multiple models to load. To start the word embeddings server, run the following command in the terminal:

python -m semeval.server --service embeddings

Once the server has loaded, the service is accessible through the EmbeddingsAPI class. Note that the language(s) must be passed with every call; otherwise the server cannot know which model to use. Here is an example of accessing the service from Python:

from semeval import EmbeddingsAPI

api = EmbeddingsAPI()
api.theme(words=['shoe', 'clothes'], lang='eng')
api.neighbours(word='hi', threshold=0.4, lang='eng')
api.analogy('man', 'king', 'woman', topn=10, lang='fin')
api.centroid(words=['hi', 'hello'], lang='eng')
api.to_vector(tokens='this is a great api !'.split(' '), lang='eng')
api.align(word='king', lang1='eng', lang2='fin')
api.similarity(w1='hi', w2='bye', lang='eng')
api.most_similar(positive=['hello', 'world'], negative=['king'], topn=10, lang='eng')
api.vector(word='king', lang='eng')
api.vocabulary(lang='eng')
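Since every call needs a lang argument, one possible convenience is a thin wrapper that fills it in for you. The sketch below is not part of Semeval: LangBound is a hypothetical class, and FakeAPI is a stand-in used only so the example runs without a server. It would not cover calls such as align, which take lang1 and lang2 instead of lang.

```python
from functools import partial


class LangBound:
    """Hypothetical wrapper: forwards method calls and adds a fixed `lang` keyword."""

    def __init__(self, api, lang):
        self._api = api
        self._lang = lang

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        return partial(getattr(self._api, name), lang=self._lang)


class FakeAPI:
    """Stand-in for an API client, used here only to demonstrate the wrapper."""

    def similarity(self, w1, w2, lang):
        return (w1, w2, lang)


eng = LangBound(FakeAPI(), "eng")
print(eng.similarity(w1="hi", w2="bye"))  # ('hi', 'bye', 'eng')
```

With a real client, the same wrapper could be created as LangBound(EmbeddingsAPI(), "eng"), assuming the methods accept lang as a keyword argument as in the calls above.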

Relatedness

Load the relatedness model:

from semeval import Relatedness
m = Relatedness(lang='eng')

Get the 5 words most related to the word car:

m.get_sorted_rel('car')[:5]
>> [('park', 0.20570222), ('insurance', 0.085593514), ('parking', 0.06783264), ('hire', 0.036158927), ('cheap', 0.028782822)]

Get relatedness score between two words:

m.get_rel('car')['insurance']
>> 0.085593514
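The two calls above suggest that get_rel returns a mapping from words to scores and get_sorted_rel returns the same pairs in descending score order. Under that assumption (not confirmed by the library's documentation here), the relationship can be sketched with a plain dictionary; top_related is a hypothetical helper, and the scores are the ones printed above.

```python
# Scores copied from the output above, deliberately in unsorted order.
scores = {
    "insurance": 0.085593514,
    "park": 0.20570222,
    "cheap": 0.028782822,
    "parking": 0.06783264,
    "hire": 0.036158927,
}


def top_related(word_scores, k):
    """Return the k highest-scoring (word, score) pairs, best first."""
    ranked = sorted(word_scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


top = top_related(scores, 3)
print(top)  # [('park', 0.20570222), ('insurance', 0.085593514), ('parking', 0.06783264)]
```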

Get the top 10 interpretations for the metaphor "Alcohol is a Crutch". NOTE: unlike in the original paper, no filtering (e.g., POS filtering) is applied in this function. Read the paper for further details and post-processing steps that improve the results.

m.interpret('alcohol', 'crutch')[:10]
>> [('use', 0), ('cane', 0), ('smoking', 1), ('psychological', 2), ('dependence', 2), ('emotional', 3), ('cigarette', 3), ('drug', 4), ('bandage', 4), ('week', 5)]

Get metaphoricity scores for the tenor computer, the vehicle creative and the expression 'The algorithm for painting':

m.metaphoricity('computer', 'creative', ['the', 'algorithm', 'for', 'painting'], 300) # 0 to select all
>> (3.2977940826411467e-07, 0.000510800164192915, 0.00025556497180058955) # (magnitude score, difference score, avg if both positive)

Business solutions

When your NLP needs grow out of what UralicNLP can provide, we have your back! Rootroo offers consulting related to a variety of NLP tasks. We have a strong academic background in the state-of-the-art AI solutions for every NLP need. Just contact us, we won't bite.
