Spacy-GlossBert

A spaCy pipeline component for word sense disambiguation using the GlossBERT model.

Overview

This package provides a spaCy component that performs Word Sense Disambiguation (WSD) using the GlossBERT model. GlossBERT is a BERT-based model that disambiguates a word by pairing the sentence it appears in with each candidate WordNet sense definition (gloss) and scoring how well the two match.

Installation

pip install spacy-glossbert

You'll also need to download the spaCy English model used in the examples:

python -m spacy download en_core_web_sm

Usage

Basic Usage

import spacy
from spacy_glossbert import has_glossbert_wsd, get_synset_info

# Load spaCy with the GlossBERT component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("glossbert_wsd", last=True)

# Process a text
doc = nlp("He went to the bank to deposit money.")

# Check if the document has been processed with GlossBERT
if has_glossbert_wsd(doc):
    # Print disambiguated senses
    for token in doc:
        synset = token._.glossbert_synset
        if synset:
            print(f"{token.text}: {token.pos_} -- {synset.name()} - {synset.definition()}")

    # Alternative: Get all sense information as a list of dictionaries
    senses = get_synset_info(doc)
    for sense in senses:
        print(f"{sense['text']}: {sense['synset']} - {sense['definition']}")

Visualization

The package includes visualization utilities using spaCy's displaCy:

from spacy_glossbert import visualize_wsd

# Visualize the disambiguated senses
visualize_wsd(doc)

Configuration Options

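# Configure the component when adding it to the pipeline.
# A pipeline accepts a given component name only once, so use this call
# in place of the plain add_pipe() call from Basic Usage.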
nlp.add_pipe(
    "glossbert_wsd",
    config={
        "pos_filter": ["NOUN", "VERB", "ADJ"],  # Part-of-speech tags to process
        "supervision": True,  # Highlight the target word in context
        "model_name": "kanishka/GlossBERT"  # HuggingFace model name/path
    }
)
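
To illustrate what the supervision option's "highlight the target word in context" could mean, here is a minimal sketch. It does not reproduce the package's code: the helper name highlight_target and the choice of double quotes as the marker are assumptions made for illustration only.

```python
from typing import List


def highlight_target(tokens: List[str], target_index: int, marker: str = '"') -> str:
    """Join tokens into one context string, wrapping the target token in `marker`.

    Hypothetical helper: the marker actually used by the package is not
    documented here, so the double quote is only an assumed default.
    """
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    marked = [
        f"{marker}{tok}{marker}" if i == target_index else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(marked)


tokens = ["He", "went", "to", "the", "bank", "to", "deposit", "money", "."]
print(highlight_target(tokens, 4))
```

Marking the target this way tells the scoring model which word in the sentence the candidate definition is meant to describe, which matters when the same sentence contains several ambiguous words.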

How It Works

The GlossBERT component:

  1. Identifies tokens with POS tags specified in pos_filter
  2. Retrieves WordNet synsets (possible senses) for each token
  3. For each candidate sense, creates an input of the form: "context [SEP] definition"
  4. Scores each sense using the GlossBERT model
  5. Assigns the highest-scoring sense to each token
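
Steps 3 to 5 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the TOY_GLOSSES inventory stands in for WordNet, its sense identifiers are made up, and overlap_score is a simple word-overlap stand-in for the GlossBERT model, not the scoring the package performs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mini sense inventory standing in for WordNet.
# The identifiers and glosses are made up for this sketch.
TOY_GLOSSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank_river": "sloping land beside a body of water such as a river",
        "bank_finance": "a financial institution that accepts deposits of money and makes loans",
    },
}


def build_pairs(context: str, lemma: str) -> List[Tuple[str, str]]:
    """Step 3: build one "context [SEP] definition" input per candidate sense."""
    senses = TOY_GLOSSES.get(lemma, {})
    return [(sense_id, f"{context} [SEP] {gloss}") for sense_id, gloss in senses.items()]


def overlap_score(pair_text: str) -> float:
    """Step 4 stand-in: count words shared by the context and the gloss.

    The real component would ask the GlossBERT model for a score instead.
    """
    context, gloss = pair_text.split(" [SEP] ")
    context_words = set(context.lower().strip(".").split())
    gloss_words = set(gloss.lower().split())
    return float(len(context_words & gloss_words))


def disambiguate(
    context: str,
    lemma: str,
    scorer: Callable[[str], float] = overlap_score,
) -> Optional[str]:
    """Step 5: return the identifier of the highest-scoring sense, or None."""
    pairs = build_pairs(context, lemma)
    if not pairs:
        return None
    best_sense, _ = max(pairs, key=lambda pair: scorer(pair[1]))
    return best_sense


print(disambiguate("He went to the bank to deposit money.", "bank"))
```

Because the scorer is passed in as a function, the same loop works with any scoring model that maps a "context [SEP] definition" string to a number.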

License

This project is licensed under the GNU General Public License v2.0 (GPLv2).

Credits

Author

Igor Morgado <morgado.igor@gmail.com>
