spaCy WordNet
spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet Domains with spaCy.
The component combines the NLTK WordNet interface with WordNet Domains to allow users to:
- Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word "bank".
- Get and filter synsets by domain. For example, getting synonyms of the verb "withdraw" in the financial domain.
Getting started
The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:
Prerequisites
- Python 3.X
- spaCy
You also need to install the following NLTK wordnet data:
python -m nltk.downloader wordnet
python -m nltk.downloader omw
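If you prefer to fetch the data from Python instead of the command line, the following is a minimal sketch. `ensure_corpora` and `ensure_spacy_wordnet_data` are hypothetical helper names, not part of spacy-wordnet; the sketch assumes `nltk.data.find` raises `LookupError` for a missing resource and that `nltk.download` accepts the same corpus names as the commands above.

```python
def ensure_corpora(names, find, download):
    """Download every corpus in `names` that `find` cannot locate.

    `find` must raise LookupError for a missing resource (as nltk.data.find does)
    and `download` must fetch a corpus by name (as nltk.download does).
    Returns the list of corpora that had to be downloaded.
    """
    downloaded = []
    for name in names:
        try:
            find("corpora/" + name)
        except LookupError:
            download(name)
            downloaded.append(name)
    return downloaded


def ensure_spacy_wordnet_data():
    # Hypothetical convenience wrapper: NLTK is imported lazily so that
    # ensure_corpora stays usable (and testable) without NLTK installed.
    import nltk
    return ensure_corpora(["wordnet", "omw"], nltk.data.find, nltk.download)
```

Calling `ensure_spacy_wordnet_data()` once before loading the pipeline should leave both corpora in place; it returns an empty list when nothing had to be downloaded.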
Install
pip install spacy-wordnet
Supported languages
Almost all Open Multi Wordnet languages are supported.
Usage
Once you have chosen a language from the supported ones above, you need to download a spaCy model for it manually. Check the list of available models for each language for spaCy 2.x or spaCy 3.x.
English example
Download example model:
python -m spacy download en_core_web_sm
Run:
import spacy
from spacy_wordnet.wordnet_annotator import WordnetAnnotator
# Load a spaCy model
nlp = spacy.load('en_core_web_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger')
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after='tagger')
token = nlp('prices')[0]
# The wordnet extension links each spaCy token to the NLTK WordNet interface,
# giving access to its synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()
# Tokens are also tagged automatically with WordNet domains
token._.wordnet.wordnet_domains()
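Since the extension exposes the NLTK WordNet interface, the objects returned by `synsets()` should behave like NLTK synsets. As a small sketch under that assumption, the hypothetical helper `describe_synsets` below pairs each synset's name with its gloss using NLTK's `name()` and `definition()` accessors.

```python
def describe_synsets(synsets):
    """Return (name, definition) pairs for NLTK-style synset objects.

    Assumes each synset offers name() and definition(), as NLTK's Synset does.
    """
    return [(synset.name(), synset.definition()) for synset in synsets]
```

With the pipeline above, `describe_synsets(token._.wordnet.synsets())` would list the senses found for the token.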
spaCy WordNet lets you find synonyms by domain of interest, for example economy:
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')
# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))
# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
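The loop above joins a `set`, so the order of variants inside each group can differ between runs. The sketch below wraps the same logic in a reusable function and sorts the variants for a stable result. `enrich` is a hypothetical helper name; it relies only on the `wordnet_synsets_for_domain` and `lemma_names` calls already used above.

```python
def enrich(tokens, domains):
    """Replace each token that has synsets in `domains` by its sorted variants."""
    parts = []
    for token in tokens:
        synsets = token._.wordnet.wordnet_synsets_for_domain(domains)
        if not synsets:
            # No synset in the requested domains: keep the original text
            parts.append(token.text)
            continue
        # Deduplicate lemma names across synsets and sort them for stable output
        variants = sorted({lemma for s in synsets for lemma in s.lemma_names()})
        parts.append("({})".format("|".join(variants)))
    return " ".join(parts)
```

For the sentence above, `enrich(sentence, economy_domains)` should yield the same groups of variants, listed alphabetically within each group.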
Portuguese example
Download example model:
python -m spacy download pt_core_news_sm
Run:
import spacy
from spacy_wordnet.wordnet_annotator import WordnetAnnotator
# Load a spaCy model
nlp = spacy.load('pt_core_news_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)
# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))
# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
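Note that the example passes the two-letter spaCy code (`nlp.lang`, here `pt`) to the component but the three-letter Open Multilingual Wordnet code (`por`) to `lemma_names`. A minimal sketch of a lookup between the two follows. `OMW_CODES` and `omw_code` are hypothetical names for illustration, not part of spacy-wordnet, and the table covers only a handful of languages.

```python
# Illustrative subset only: spaCy language code -> Open Multilingual Wordnet code.
# This table is an assumption made for the example, not spacy-wordnet's API.
OMW_CODES = {
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "it": "ita",
    "pt": "por",
}


def omw_code(spacy_lang):
    """Return the three-letter wordnet code for a two-letter spaCy language code."""
    try:
        return OMW_CODES[spacy_lang]
    except KeyError:
        raise ValueError(
            "no Open Multilingual Wordnet code known for %r" % spacy_lang
        ) from None
```

In the loop above, `s.lemma_names(omw_code(nlp.lang))` would then replace the hard-coded `'por'`.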