Spacy-GlossBert
A spaCy pipeline component for word sense disambiguation using the GlossBERT model.
Overview
This package provides a spaCy component that performs Word Sense Disambiguation (WSD) using the GlossBERT model. GlossBERT leverages BERT's contextual embeddings to disambiguate word senses by comparing the context with WordNet sense definitions (glosses).
Installation
pip install spacy-glossbert
You'll also need a spaCy English model; the examples below use en_core_web_sm:
python -m spacy download en_core_web_sm
Usage
Basic Usage
import spacy
from spacy_glossbert import has_glossbert_wsd, get_synset_info

# Load spaCy with the GlossBERT component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("glossbert_wsd", last=True)

# Process a text
doc = nlp("He went to the bank to deposit money.")

# Check if the document has been processed with GlossBERT
if has_glossbert_wsd(doc):
    # Print disambiguated senses
    for token in doc:
        synset = token._.glossbert_synset
        if synset:
            print(f"{token.text}: {token.pos_} -- {synset.name()} - {synset.definition()}")

# Alternative: Get all sense information as a list of dictionaries
senses = get_synset_info(doc)
for sense in senses:
    print(f"{sense['text']}: {sense['synset']} - {sense['definition']}")
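The list-of-dictionaries form is handy for post-processing. As a minimal sketch, the helper below groups token texts by the sense assigned to them. It relies only on the 'text', 'synset', and 'definition' keys shown above; the function name group_by_synset and the sample records are hypothetical placeholders, not output from the package.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_synset(senses: List[Dict[str, Any]]) -> Dict[Any, List[str]]:
    """Map each assigned sense to the token texts that received it."""
    grouped = defaultdict(list)
    for sense in senses:
        grouped[sense["synset"]].append(sense["text"])
    return dict(grouped)


# Hand-written placeholder records with the same keys as get_synset_info(doc);
# the values are illustrative only, not real model output.
sample = [
    {"text": "bank", "synset": "sense_a", "definition": "placeholder definition A"},
    {"text": "banks", "synset": "sense_a", "definition": "placeholder definition A"},
    {"text": "money", "synset": "sense_b", "definition": "placeholder definition B"},
]

print(group_by_synset(sample))
```

With real output, the call would be group_by_synset(get_synset_info(doc)), assuming the 'synset' values are hashable.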
Visualization
The package includes visualization utilities using spaCy's displaCy:
from spacy_glossbert import visualize_wsd
# Visualize the disambiguated senses
visualize_wsd(doc)
Configuration Options
# Configure the component
nlp.add_pipe(
    "glossbert_wsd",
    config={
        "pos_filter": ["NOUN", "VERB", "ADJ"],  # Part-of-speech tags to process
        "supervision": True,                    # Highlight the target word in context
        "model_name": "kanishka/GlossBERT"      # HuggingFace model name/path
    }
)
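How the supervision option marks the target word is not spelled out here. One plausible reading, in the spirit of the weak-supervision signal described in the GlossBERT paper, is that the target word is wrapped in quotation marks before the context is paired with a definition. The sketch below illustrates that idea only; mark_target and build_input are hypothetical names, and the package's actual formatting may differ.

```python
from typing import List


def mark_target(tokens: List[str], target_index: int, marker: str = '"') -> str:
    """Join tokens into a context string with the target token wrapped in markers."""
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    marked = [
        f"{marker}{tok}{marker}" if i == target_index else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(marked)


def build_input(tokens: List[str], target_index: int, definition: str,
                supervision: bool = True) -> str:
    """Build a "context [SEP] definition" string, optionally highlighting the target."""
    if supervision:
        context = mark_target(tokens, target_index)
    else:
        context = " ".join(tokens)
    return f"{context} [SEP] {definition}"


tokens = ["He", "went", "to", "the", "bank"]
example = build_input(tokens, 4, "a financial institution")
print(example)
```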
How It Works
The GlossBERT component:
- Identifies tokens with POS tags specified in pos_filter
- Retrieves WordNet synsets (possible senses) for each token
- For each candidate sense, creates an input of the form: "context [SEP] definition"
- Scores each sense using the GlossBERT model
- Assigns the highest-scoring sense to each token
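The steps above can be sketched without any model download. In this minimal illustration, a toy sense inventory stands in for WordNet and a word-overlap count stands in for the GlossBERT score. The sense IDs, glosses, and function names are all assumptions made for the example; only the "context [SEP] definition" pairing and the pick-the-highest-score step come from the description above.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy sense inventory standing in for WordNet; IDs and glosses are made up.
TOY_GLOSSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank_river": "sloping land beside a body of water such as a river",
        "bank_finance": "a financial institution that accepts deposits of money",
    }
}


def build_pairs(context: str, glosses: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (sense_id, "context [SEP] definition") for every candidate sense."""
    return [(sense_id, f"{context} [SEP] {gloss}") for sense_id, gloss in glosses.items()]


def _words(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,").lower() for w in text.split()}


def overlap_score(pair_text: str) -> float:
    """Stand-in scorer: count distinct words shared by context and gloss.

    A real pipeline would pass the pair to the GlossBERT classifier instead.
    """
    context, gloss = pair_text.split(" [SEP] ", 1)
    return float(len(_words(context) & _words(gloss)))


def pick_sense(context: str, glosses: Dict[str, str],
               score: Callable[[str], float] = overlap_score) -> str:
    """Score every context-gloss pair and return the highest-scoring sense ID."""
    pairs = build_pairs(context, glosses)
    best_id, _ = max(pairs, key=lambda pair: score(pair[1]))
    return best_id


best = pick_sense("He went to the bank to deposit money.", TOY_GLOSSES["bank"])
print(best)  # bank_finance (the only gloss sharing a word, "money", with the context)
```

Keeping the scorer as a parameter makes it easy to swap the toy overlap count for a model-backed function with the same signature.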
License
This project is licensed under the GNU General Public License v2.0 (GPLv2).
Credits
- GlossBERT model: Huang et al.
- spaCy: Explosion AI
- WordNet: Princeton University
Author
Igor Morgado morgado.igor@gmail.com