
Project description

spaCy WordNet

spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet domains with spaCy.

The component combines the NLTK WordNet interface with WordNet domains to allow users to:

  • Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word bank.
  • Get and filter synsets by domain. For example, getting synonyms of the verb withdraw in the financial domain.

Getting started

The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:

Prerequisites

  • Python 3.x
  • spaCy

You also need to download the following NLTK WordNet data:

python -m nltk.downloader wordnet
python -m nltk.downloader omw
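
If you prefer to fetch the corpora from inside a Python session (handy in notebooks), the equivalent calls are below; note that on newer NLTK releases the multilingual data is published as "omw-1.4" rather than "omw":

import nltk

# Download the WordNet corpus and the Open Multilingual Wordnet data
nltk.download('wordnet')
nltk.download('omw')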

Install

pip install spacy-wordnet
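
You will also need a spaCy model for the language you plan to process, for example (the model name assumes spaCy 3.x naming):

python -m spacy download en_core_web_sm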

Supported languages

We currently support Spanish, English and Portuguese, and we welcome contributions to add and test other languages supported by both spaCy and NLTK.

Usage

English example

import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator 

# Load a spaCy model (supported languages are "es", "en" and "pt";
# spaCy 3.x needs the full model name, e.g. "en_core_web_sm")
nlp = spacy.load('en_core_web_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
token = nlp('prices')[0]

# The wordnet extension links the spaCy token to the NLTK WordNet interface,
# giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# The token is also automatically tagged with WordNet domains
token._.wordnet.wordnet_domains()
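
To get a feel for what the annotator attaches, you can simply print these attributes. The exact synsets and domains depend on the WordNet data version you downloaded, so the output shown in the comments is only indicative:

token = nlp('prices')[0]
print(token._.wordnet.synsets())          # e.g. [Synset('monetary_value.n.01'), Synset('price.n.02'), ...]
print(token._.wordnet.lemmas())           # the Lemma objects behind those synsets
print(token._.wordnet.wordnet_domains())  # e.g. ['economy', ...]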

spaCy WordNet lets you find synonyms filtered by a domain of interest, for example economy:

economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
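
The same pattern generalizes to any domain list. As a minimal sketch (the enrich helper below is ours, not part of the library's API), the loop can be wrapped into a reusable function:

def enrich(doc, domains):
    """Replace each token with its in-domain synonym variants, when any exist."""
    out = []
    for token in doc:
        synsets = token._.wordnet.wordnet_synsets_for_domain(domains)
        if synsets:
            lemmas = {lemma for s in synsets for lemma in s.lemma_names()}
            out.append('({})'.format('|'.join(lemmas)))
        else:
            out.append(token.text)
    return ' '.join(out)

print(enrich(nlp('I want to withdraw 5,000 euros'), ['finance', 'banking']))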
    

Portuguese example

import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator 

# Load a spaCy model (you need to download the Portuguese model first;
# spaCy 3.x needs the full model name, e.g. "pt_core_news_sm")
nlp = spacy.load('pt_core_news_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
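
The 'por' argument to lemma_names() is an ISO 639-2 language code resolved through the Open Multilingual Wordnet data downloaded earlier. A quick way to check which variants exist for a given synset in other languages (Spanish, 'spa', is used here purely as an illustration) is to query NLTK directly:

from nltk.corpus import wordnet as wn

# Multilingual lemmas come from the Open Multilingual Wordnet ('omw') data
synset = wn.synsets('bank')[0]
print(synset.lemma_names('por'))  # Portuguese lemmas, if available for this synset
print(synset.lemma_names('spa'))  # Spanish lemmas, if available for this synset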

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spacy-wordnet-0.0.5b2.tar.gz (649.9 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

spacy_wordnet-0.0.5b2-py2.py3-none-any.whl (650.7 kB)


File details

Details for the file spacy-wordnet-0.0.5b2.tar.gz.

File metadata

  • Download URL: spacy-wordnet-0.0.5b2.tar.gz
  • Upload date:
  • Size: 649.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for spacy-wordnet-0.0.5b2.tar.gz
  • SHA256: b484c9fb8992129d797ab08dafd97c404e54812e3619b10907442f481b2d125c
  • MD5: 21db8fa9e61cdd560326e4115d817232
  • BLAKE2b-256: 9856199085da41da4879a613c738105d35d26797af21e337266eddbbf6f671e1

See more details on using hashes here.
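
To verify a downloaded file against the hashes above, one option is a small standard-library sketch like the following (the file name assumes the source distribution listed above):

import hashlib

def sha256_of(path):
    """Hash the file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of('spacy-wordnet-0.0.5b2.tar.gz'))
# Compare the printed value with the SHA256 entry above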

File details

Details for the file spacy_wordnet-0.0.5b2-py2.py3-none-any.whl.

File metadata

  • Download URL: spacy_wordnet-0.0.5b2-py2.py3-none-any.whl
  • Upload date:
  • Size: 650.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.5

File hashes

Hashes for spacy_wordnet-0.0.5b2-py2.py3-none-any.whl
  • SHA256: 4db898963b4abf4340be1315973c18351657120dcac6f51a4ac47582523fdc0f
  • MD5: 8cf022fb08af9606ad1631d2c11c7e1a
  • BLAKE2b-256: 3734325b40722174ff96c780a85f1df90799153c40dd76bd0b666a6e2328218d

See more details on using hashes here.
