A library for working with semantics in multiple languages..
Project description
Semeval (Semantic Evaluation)
Semeval is a Python package for modelling various aspects of meaning in languages.
Installation
The package can be installed via PIP: pip install -U semeval
Download models
The following command downloads word vector embeddings for English and Finnish
python3 -m semeval.download -l eng fin -m embeddings
We currently suppor eng, fin, kpv, myv, mdf, rus, liv and sms.
Relatedness models can be downloaded using the command:
python3 -m semeval.download -l eng fin -m relatedness
Currently, only English and Finnish are supported in the relatedness model.
Usage
Word embeddings
Load embeddings of a language by running:
from semeval import Embeddings
e = Embeddings("eng")
After this, you can find related words
e.theme(['shoe', 'clothes']) #outputs a word describing all input words
>> ('clothing', 0.8700831532478333)
e.neighbours('hi') #outputs the nearest neighbors
>> [('hey', 0.6976003050804138), ('jeez', 0.6230848431587219), ('ya', 0.6213312149047852), ('hello', 0.6144036650657654) ...]
e.analogy('man', 'king', 'woman') #analogous words
>> [('monarch', 0.6457855105400085), ('regnant', 0.6354122161865234)...]
e.most_similar(positive=['hello', 'world'], negative=['king'], topn=10)
>> [('Tamana', 1.1119599342346191), ('Ibaraki', 1.087041974067688), ('Makuhari', 1.0793628692626953),... ]
Get vector representations
e.centroid(['hi', 'hello']) #the centroid vector of the input words
>> [0. 0.10486 -0.23701501 0.1293 0.052645 -0.027155...]
e.to_vector('this is a great api !'.split(' ')) # a vector representing the sentence
>> [1.1630000e-02 -2.2303334e-02 8.4316663e-02 -2.1063333e-02...]
e.vector("king")
>> [0.05574 -0.16716 0.10282 -0.10851 0.08783 -0.09499 0.16031...]
Get similar words in another language
e2 = Embeddings("fin")
e.project('king', e2)
>>[('Ahasveros', 0.5551087856292725), ('kuningas', 0.5522325038909912), ('kuninkas', 0.5242758393287659)..]
Word similarity
e.similarity('hi', 'bye')
>> 0.28805155
Vocabulary
e.vocabulary()
>> {'astrologically': <Vocab object>, 'spinto': <Vocab object>, 'NortelNet': <Vocab object>...}
Server-mode
Server-mode is optimal for some cases such debugging or not wanting to wait for multiple models to load. To start word
embeddings server, run the below command in the terminal: python -m semeval.server --service embeddings
Once the server is loaded, the service is accessible through EmbeddingsAPI class.
Note that the language/s must be passed every call, otherwise the server cannot know which model to use.
Here is an example of accessing the service from Python.
from semeval import EmbeddingsAPI
api = EmbeddingsAPI()
api.theme(words=['shoe', 'clothes'], lang='eng')
api.neighbours(word='hi', threshold=0.4, lang='eng')
api.analogy('man', 'king', 'woman', topn=10, lang='fin')
api.centroid(words=['hi', 'hello'], lang='eng')
api.to_vector(tokens='this is a great api !'.split(' '), lang='eng')
api.align(word='king', lang1='eng', lang2='fin')
api.similarity(w1='hi', w2='bye', lang='eng')
api.most_similar(positive=['hello', 'world'], negative=['king'], topn=10, lang='eng')
api.vector(word='king', lang='eng')
api.vocabulary(lang='eng')
Relatedness
Load the relatedness model:
from semeval import Relatedness
m = Relatedness(lang='eng')
Get most 5 related word to the word car:
m.get_sorted_rel('car')[:5]
>> [('park', 0.20570222), ('insurance', 0.085593514), ('parking', 0.06783264), ('hire', 0.036158927), ('cheap', 0.028782822)]
Get relatedness score between two words:
m.get_rel('car')['insurance']
>> 0.085593514
Top 10 interpretations for the metaphor "Alcohol is a Crutch". NOTE: unlike the original paper, there is no filtering (e.g., POS filtering) applied in this function. Read the paper for further details and post-processing steps to improve the results.
m.interpret('alcohol', 'crutch')[:10]
>> [('use', 0), ('cane', 0), ('smoking', 1), ('psychological', 2), ('dependence', 2), ('emotional', 3), ('cigarette', 3), ('drug', 4), ('bandage', 4), ('week', 5)]
Metaphoricity scores for the tenor computer, vehicle creative and expression 'The algorithm for painting.'
m.metaphoricity('computer', 'creative', ['the', 'algorithm', 'for', 'painting'], 300) # 0 to select all
>> (3.2977940826411467e-07, 0.000510800164192915, 0.00025556497180058955) # (magnitude score, difference score, avg if both positive)
Business solutions
When your NLP needs grow out of what UralicNLP can provide, we have your back! Rootroo offers consulting related to a variety of NLP tasks. We have a strong academic background in the state-of-the-art AI solutions for every NLP need. Just contact us, we won't bite.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file semeval-1.0.3.tar.gz.
File metadata
- Download URL: semeval-1.0.3.tar.gz
- Upload date:
- Size: 17.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
417eee133b3ef4ba04e967d2ea65f7776ebd8f24c781d76e1fe95f66a4257b56
|
|
| MD5 |
d4586b43bb262416a11c325b00343601
|
|
| BLAKE2b-256 |
9c5bc1675e8922f27d28bca0b7f1ba771d109f3810146115640193188969c5ca
|
File details
Details for the file semeval-1.0.3-py3-none-any.whl.
File metadata
- Download URL: semeval-1.0.3-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b3b9e98ee15d5e1d00f6205e7e81575ca1d42ae7ffbf9d45f3be5d3e4020d7f7
|
|
| MD5 |
9c6e09665b591e777c701bee4a9ca430
|
|
| BLAKE2b-256 |
36dbfe5cfcd5904ac575bbe6c1ef5fa1eb6156799fe9a7dcdea659430c0b5b46
|