Word Sense Disambiguation wrapper

In natural language processing, word sense disambiguation (WSD) is the problem of determining which "sense" (meaning) of a word is activated by its use in a particular context, a process that appears to be largely unconscious in people.

This is a simple library that wraps two WSD methods: NLTK and Babelfy.

Requirements

Install the dependencies with:

pip3 install xmltodict
pip3 install nltk
pip3 install pywsd

The NLTK library requires additional configuration; see this link for more details.

Methods

The wsdNLTK method calls the function pywsd.disambiguate, which returns a mapping between the words of the input text and their WordNet Synsets. Each result also carries the start and end character offsets of the word in the text.

wsd = WrapperWSD()
wsd.wsdNLTK(u'My sister has a dog. She loves him.')
#output: [('sister', Synset('sister.n.02'), 3, 9), ('dog', Synset('pawl.n.01'), 16, 19), ('loves', Synset('sleep_together.v.01'), 25, 30)]
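
The last two numbers in each tuple are character offsets into the input text. As a minimal sketch of how such offsets can be recovered (this is not necessarily how the library computes them, and `locate_tokens` is a hypothetical helper), the text can be scanned from left to right:

```python
def locate_tokens(text, tokens):
    """Return (token, start, end) for each token, scanning left to right.

    Hypothetical helper for illustration only; tokens that cannot be
    found after the current position are skipped.
    """
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            continue  # token does not occur after the cursor
        end = start + len(token)  # end offset is exclusive
        spans.append((token, start, end))
        cursor = end
    return spans


text = 'My sister has a dog. She loves him.'
spans = locate_tokens(text, ['sister', 'dog', 'loves'])
print(spans)
# [('sister', 3, 9), ('dog', 16, 19), ('loves', 25, 30)]
```

With exclusive end offsets, `text[start:end]` gives back the word itself.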

Instead of returning the WordNet Synsets, the method wsdNLTK_offset returns a mapping between the words of the input text and their WordNet offsets.

wsd.wsdNLTK_offset(u'My sister has a dog. She loves him.')
#output (sample taken from a different input sentence): [('president', 597265, 21, 30), ('USA', 8394922, 38, 41), ('best', 67379, 54, 58)]
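
WordNet offsets are often written as zero-padded eight-digit strings followed by a part-of-speech letter. The sketch below converts between the integer form shown above and that string form. `format_offset` and `parse_offset` are hypothetical helpers, not part of this library, and the part of speech `'n'` is an assumption because the tuples above do not carry it.

```python
def format_offset(offset, pos):
    """Render an integer offset as an 'NNNNNNNN-p' identifier."""
    return f"{offset:08d}-{pos}"


def parse_offset(synset_id):
    """Split an 'NNNNNNNN-p' identifier into (integer offset, pos)."""
    digits, pos = synset_id.split("-")
    return int(digits), pos


# 'n' (noun) is assumed here for illustration
print(format_offset(597265, 'n'))   # 00597265-n
print(parse_offset('00597265-n'))   # (597265, 'n')
```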

A mapping between WordNet and Wikipedia was proposed in [Miller et al] and is available for download here. The next example shows some of its key-value pairs, where each key is a WordNet offset.

wd2wiki = {
 1740: 'https://en.wikipedia.org/wiki/Madison_Square_Garden,_L.P.',
 2137: 'https://en.wikipedia.org/wiki/Abstraction',
 2452: 'https://en.wikipedia.org/wiki/Object_(philosophy)',
 2684: 'https://en.wikipedia.org/wiki/Computer_file',
 3553: 'https://en.wikipedia.org/wiki/Unit_of_alcohol',
 ...
 }

We use this mapping to link words to Wikipedia entities in those cases where a correspondence exists.

wsd.wsdNLTK_links(u'My sister has a dog. She loves him.')
#output (sample taken from a different input sentence): [{'start': 38, 'end': 41, 'label': 'USA', 'link': 'United_States_Army'}]
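
The linking step can be pictured as a dictionary lookup over the offset tuples. The following is a minimal sketch under assumptions, not the library's actual code: `link_offsets` is a hypothetical helper, the single `wd2wiki` entry is made up for illustration rather than copied from the real mapping file, and the `link` value is assumed to be the last path segment of the Wikipedia URL.

```python
def link_offsets(annotations, wd2wiki):
    """Keep only annotations whose WordNet offset has a Wikipedia page.

    annotations: iterable of (label, offset, start, end) tuples.
    wd2wiki: dict from integer WordNet offset to Wikipedia URL.
    """
    links = []
    for label, offset, start, end in annotations:
        url = wd2wiki.get(offset)
        if url is None:
            continue  # no correspondence for this sense
        links.append({
            'start': start,
            'end': end,
            'label': label,
            'link': url.rsplit('/', 1)[-1],  # page title only
        })
    return links


# Illustrative entry only; assumed, not taken from the real mapping
wd2wiki = {8394922: 'https://en.wikipedia.org/wiki/United_States_Army'}
annotations = [('president', 597265, 21, 30),
               ('USA', 8394922, 38, 41),
               ('best', 67379, 54, 58)]
links = link_offsets(annotations, wd2wiki)
print(links)
# [{'start': 38, 'end': 41, 'label': 'USA', 'link': 'United_States_Army'}]
```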

The library also wraps Babelfy, which targets BabelSynsets instead of WordNet Synsets:

wsd.wsdBabelfy(u'My sister has a dog. She loves him.')
#output: [('sister', 'bn:00071838n', 3, 9), ('dog', 'bn:00015267n', 16, 19), ('loves', 'bn:00090504v', 25, 30)]
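
Judging from the sample output above, each BabelSynset identifier consists of the prefix `bn:`, eight digits, and a part-of-speech letter. The sketch below splits an identifier into those parts. `parse_babel_id` is a hypothetical helper, and the pattern is an assumption inferred from the sample, not a statement of the official identifier grammar.

```python
import re

# Pattern inferred from identifiers such as 'bn:00071838n' (assumption)
BABEL_ID = re.compile(r'^bn:(\d{8})([a-z])$')


def parse_babel_id(synset_id):
    """Split a BabelSynset identifier into (number, part of speech)."""
    match = BABEL_ID.match(synset_id)
    if match is None:
        raise ValueError(f"unexpected BabelSynset id: {synset_id!r}")
    return int(match.group(1)), match.group(2)


print(parse_babel_id('bn:00071838n'))  # (71838, 'n')
print(parse_babel_id('bn:00090504v'))  # (90504, 'v')
```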

Combining the output with Entity Linking

You can use the nifwrapper library in order to merge the WSD outputs with Entity Linking annotations.

from wrapperWSD import WrapperWSD
from nifwrapper import *


#---- Obtaining disambiguation
wsd = WrapperWSD()
corefWSD = wsd.wsdNLTK_links(u'My sister has a dog. She loves him.')
print("corefWSD:",corefWSD)
#output: [('sister', Synset('sister.n.02'), 3, 9), ('dog', Synset('pawl.n.01'), 16, 19), ('loves', Synset('sleep_together.v.01'), 25, 30)]


#---- Obtaining Entity Linking results
# inline NIF corpus creation
wrp = NIFWrapper()
doc = NIFDocument("https://example.org/doc1")
#--
# "My sister has a dog." is 20 characters long, so the exclusive end index is 20
sent = NIFSentence("https://example.org/doc1#char=0,20")
sent.addAttribute("nif:beginIndex","0","xsd:nonNegativeInteger")
sent.addAttribute("nif:endIndex","20","xsd:nonNegativeInteger")
sent.addAttribute("nif:isString","My sister has a dog.","xsd:string")
sent.addAttribute("nif:broaderContext",["https://example.org/doc1"],"URI LIST")


#-- 
a1 = NIFAnnotation("https://example.org/doc1#char=3,9", "3", "9", ["https://en.wikipedia.org/wiki/Sibling"], ["dbo:FamilyRelations"])
a1.addAttribute("nif:anchorOf","sister","xsd:string")
sent.pushAnnotation(a1)
doc.pushSentence(sent)

#--
sent2 = NIFSentence("https://example.org/doc1#char=21,35")
sent2.addAttribute("nif:isString","She loves him.","xsd:string")
sent2.addAttribute("nif:broaderContext",["https://example.org/doc1"],"URI LIST")
sent2.addAttribute("nif:beginIndex","21","xsd:nonNegativeInteger")
sent2.addAttribute("nif:endIndex","35","xsd:nonNegativeInteger")
doc.pushSentence(sent2)
#--
wrp.pushDocument(doc)

#---- Combining EL annotations with WSD results
wrp.extendsDocWithWSD(corefWSD, doc.uri)
print(wrp.toString())
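
The NIF URIs in this example encode character offsets as `#char=begin,end`. Writing these by hand is error-prone, so a small helper can derive them from the text. This is a sketch under assumptions: `char_fragment` is a hypothetical function that is not part of nifwrapper, and it treats the end index as exclusive, so that `text[begin:end]` yields the fragment.

```python
def char_fragment(doc_uri, text, fragment, search_from=0):
    """Return (uri, begin, end) for the first match of fragment in text.

    Hypothetical helper; the end index is exclusive.
    """
    begin = text.find(fragment, search_from)
    if begin == -1:
        raise ValueError(f"{fragment!r} not found in text")
    end = begin + len(fragment)
    return f"{doc_uri}#char={begin},{end}", begin, end


text = 'My sister has a dog. She loves him.'
uri, begin, end = char_fragment('https://example.org/doc1', text, 'She loves him.')
print(uri, begin, end)
# https://example.org/doc1#char=21,35 21 35

print(char_fragment('https://example.org/doc1', text, 'sister')[0])
# https://example.org/doc1#char=3,9
```

The returned `begin` and `end` can be converted with `str()` and passed as the `nif:beginIndex` and `nif:endIndex` values.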

Reference

[Miller et al] WordNet–Wikipedia–Wiktionary: Construction of a Three-way Alignment. Tristan Miller and Iryna Gurevych. 2014 https://pdfs.semanticscholar.org/90cd/22a9cd59dc1fc21f4ec36e9c7d95085f7fb6.pdf
