Spacy-GlossBert

A spaCy pipeline component for word sense disambiguation using the GlossBERT model.

Overview

This package provides a spaCy component that performs Word Sense Disambiguation (WSD) using the GlossBERT model. GlossBERT fine-tunes BERT on context-gloss pairs: the sentence containing the target word is paired with each candidate WordNet sense definition (gloss), and the model scores how well each gloss fits the context.

Installation

pip install spacy-glossbert

The examples below also need the spaCy English model:

python -m spacy download en_core_web_sm

Usage

Basic Usage

import spacy
from spacy_glossbert import has_glossbert_wsd, get_synset_info

# Load spaCy with the GlossBERT component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("glossbert_wsd", last=True)

# Process a text
doc = nlp("He went to the bank to deposit money.")

# Check if the document has been processed with GlossBERT
if has_glossbert_wsd(doc):
    # Print disambiguated senses
    for token in doc:
        synset = token._.glossbert_synset
        if synset:
            print(f"{token.text}: {token.pos_} -- {synset.name()} - {synset.definition()}")

    # Alternative: Get all sense information as a list of dictionaries
    senses = get_synset_info(doc)
    for sense in senses:
        print(f"{sense['text']}: {sense['synset']} - {sense['definition']}")
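The list returned by get_synset_info can be post-processed with ordinary Python. As a minimal sketch, assuming only the three keys shown above (text, synset, definition), the hypothetical helper index_senses groups the records by lower-cased surface form. The sample records are hand-written placeholders, not real model output, and the helper is not part of the package.

```python
from collections import defaultdict


def index_senses(senses):
    """Group sense records by lower-cased surface form.

    `senses` is assumed to be a list of dicts with the keys
    'text', 'synset' and 'definition', as in the example above.
    """
    index = defaultdict(list)
    for sense in senses:
        index[sense["text"].lower()].append((sense["synset"], sense["definition"]))
    return dict(index)


# Hand-written placeholder records in the same shape (illustrative values only).
sample = [
    {"text": "bank", "synset": "example.n.01", "definition": "an illustrative gloss"},
    {"text": "Bank", "synset": "example.n.02", "definition": "another illustrative gloss"},
    {"text": "money", "synset": "example.n.03", "definition": "a third illustrative gloss"},
]

index = index_senses(sample)
for word in sorted(index):
    print(word, len(index[word]))
```

With real output, the same helper could be called as index_senses(get_synset_info(doc)).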

Visualization

The package includes visualization utilities using spaCy's displaCy:

from spacy_glossbert import visualize_wsd

# Visualize the disambiguated senses
visualize_wsd(doc)

Configuration Options

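# Configure the component when adding it (add it only once per pipeline:
# spaCy raises an error if a component with the same name already exists)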
nlp.add_pipe(
    "glossbert_wsd",
    config={
        "pos_filter": ["NOUN", "VERB", "ADJ"],  # Part-of-speech tags to process
        "supervision": True,  # Highlight the target word in context
        "model_name": "kanishka/GlossBERT"  # HuggingFace model name/path
    }
)
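To make the first two options concrete, here is a small sketch in plain Python of what a POS filter and a target-word highlight could look like. It is an illustration under assumptions: select_targets and mark_target are hypothetical helpers, and wrapping the target word in double quotes is one plausible reading of "highlight"; the package's actual marking scheme is not documented here.

```python
def select_targets(tagged_tokens, pos_filter=("NOUN", "VERB", "ADJ")):
    """Return the indices of tokens whose coarse POS tag is in pos_filter."""
    return [i for i, (_, pos) in enumerate(tagged_tokens) if pos in pos_filter]


def mark_target(tagged_tokens, index, supervision=True):
    """Rebuild the context string, optionally quoting the target word.

    The double-quote marking is an assumption made for this sketch.
    """
    words = [text for text, _ in tagged_tokens]
    if supervision:
        words[index] = f'"{words[index]}"'
    return " ".join(words)


# Hand-tagged example sentence using Universal POS tags (as in spaCy's token.pos_).
tagged = [
    ("He", "PRON"), ("went", "VERB"), ("to", "ADP"), ("the", "DET"),
    ("bank", "NOUN"), ("to", "PART"), ("deposit", "VERB"),
    ("money", "NOUN"), (".", "PUNCT"),
]

for i in select_targets(tagged):
    print(mark_target(tagged, i))
```

Each printed line is the same sentence with a different target word marked, which is the kind of context string a per-token disambiguation step would consume.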

How It Works

The GlossBERT component:

1. Identifies tokens with POS tags specified in pos_filter
2. Retrieves WordNet synsets (possible senses) for each token
3. For each candidate sense, creates an input of the form: "context [SEP] definition"
4. Scores each sense using the GlossBERT model
5. Assigns the highest-scoring sense to each token
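The steps above can be sketched in plain Python. This is not the package's implementation: the two-sense inventory stands in for a WordNet lookup, the sense names and glosses are invented for the example, and toy_score is a crude word-overlap stand-in for the GlossBERT classifier so the sketch runs without downloading a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    name: str        # hypothetical sense identifier (not a real WordNet name)
    definition: str  # the gloss for this sense


# Hypothetical inventory standing in for a WordNet synset lookup.
TOY_INVENTORY = {
    "bank": [
        Sense("bank#river", "sloping land beside a body of water such as a river"),
        Sense("bank#finance", "a financial institution that accepts deposits and lends money"),
    ],
}


def build_pairs(context, senses):
    """Create one 'context [SEP] definition' input per candidate sense."""
    return [f"{context} [SEP] {sense.definition}" for sense in senses]


def toy_score(pair):
    """Stand-in scorer: number of distinct words shared by context and gloss."""
    context, gloss = pair.split(" [SEP] ")

    def words(text):
        return {w.strip(".,").lower() for w in text.split()}

    return len(words(context) & words(gloss))


def disambiguate(context, lemma, inventory=TOY_INVENTORY, score=toy_score):
    """Return the highest-scoring sense for lemma, or None if it has no senses."""
    senses = inventory.get(lemma, [])
    if not senses:
        return None
    scores = [score(pair) for pair in build_pairs(context, senses)]
    best = max(range(len(senses)), key=scores.__getitem__)
    return senses[best]


best = disambiguate("He went to the bank to deposit money.", "bank")
print(best.name, "-", best.definition)
```

In the real component the score would come from the GlossBERT model rather than word overlap; the pair construction and the final argmax are the parts this sketch is meant to illustrate.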

License

This project is licensed under the GNU General Public License v2.0 (GPLv2).

Credits

GlossBERT model: Huang et al. (2019)
spaCy: Explosion AI
WordNet: Princeton University

Author

Igor Morgado <morgado.igor@gmail.com>

Version

0.1.0 (Mar 27, 2025)
