
A SpaCy wrapper for the GLiNER model for enhanced Named Entity Recognition capabilities

Project description

GLiNER SpaCy Wrapper

Introduction

This project is a wrapper that integrates GLiNER, a Named Entity Recognition (NER) model, with the SpaCy Natural Language Processing (NLP) library. GLiNER is a generalist NER model: unlike models trained on a fixed set of entity types, it recognizes whatever entity labels you supply at inference time. The wrapper packages GLiNER as a SpaCy pipeline component, so you can add it to a SpaCy pipeline and read its predictions from the resulting Doc.

GLiNER requires Python 3.7 to 3.10 to work properly.
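If you want a script to warn early when it runs on an unsupported interpreter, a check like the following is enough. This is a minimal sketch: `python_version_supported` is a hypothetical helper, and the 3.7 to 3.10 bounds are taken from the requirement stated above.

```python
import sys

# Bounds taken from the stated requirement (Python 3.7 to 3.10, inclusive).
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def python_version_supported(version_info=sys.version_info):
    """Hypothetical helper: True if major.minor lies within the documented range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if not python_version_supported():
        major, minor = sys.version_info[:2]
        print(f"Warning: Python {major}.{minor} is outside the documented 3.7-3.10 range.")
```

Comparing `(major, minor)` tuples keeps patch releases such as 3.10.14 inside the range.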

Features

  • Integrates GLiNER with SpaCy for advanced NER tasks.
  • Customizable chunk size for processing large texts.
  • User-specified entity labels, such as 'person' and 'organization'.
  • Configurable output style ('ent' or 'span') for the recognized entities.
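The chunk-size feature exists because long texts are processed in pieces and the entity offsets found in each piece must be mapped back onto the full text. The sketch below shows one plausible way to do this, assuming `chunk_size` counts characters and chunks are cut at fixed offsets; the wrapper's actual chunking strategy may differ. `predict_in_chunks` and `toy_predict` are hypothetical names, and `toy_predict` merely stands in for a real GLiNER call.

```python
from typing import Callable, List, Tuple

# A predicted entity: (start_char, end_char, label, score); end is exclusive.
Entity = Tuple[int, int, str, float]


def predict_in_chunks(
    text: str,
    chunk_size: int,
    predict: Callable[[str], List[Entity]],
) -> List[Entity]:
    """Run `predict` on consecutive fixed-width character chunks of `text`
    and shift each entity's offsets back to positions in the full text."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    entities: List[Entity] = []
    for offset in range(0, len(text), chunk_size):
        chunk = text[offset:offset + chunk_size]
        for start, end, label, score in predict(chunk):
            # Chunk-relative offsets become document-relative offsets.
            entities.append((start + offset, end + offset, label, score))
    return entities


def toy_predict(chunk: str) -> List[Entity]:
    """Hypothetical stand-in for a model: finds the word 'Microsoft'."""
    i = chunk.find("Microsoft")
    return [] if i == -1 else [(i, i + len("Microsoft"), "organization", 1.0)]


text = "This is a text about Bill Gates and Microsoft."
print(predict_in_chunks(text, 30, toy_predict))  # [(36, 45, 'organization', 1.0)]
print(predict_in_chunks(text, 20, toy_predict))  # []
```

With `chunk_size=20`, "Microsoft" straddles the boundary between two chunks and is missed entirely. This is why a real implementation would more likely cut at sentence or token boundaries, or overlap adjacent chunks.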

Installation

Install the library with pip:

pip install gliner-spacy

Usage

To use this wrapper in your SpaCy pipeline, follow these steps:

  1. Import SpaCy.
  2. Create a SpaCy Language instance.
  3. Add the gliner_spacy component to the SpaCy pipeline.
  4. Process text using the pipeline.

Example code:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy")
text = "This is a text about Bill Gates and Microsoft."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)

Expected Output

Bill Gates person
Microsoft organization

Example with a Custom Configuration

import spacy

custom_spacy_config = {
    "gliner_model": "urchade/gliner_multi",
    "chunk_size": 250,
    "labels": ["people", "company"],
    "style": "ent",
}
nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy", config=custom_spacy_config)

text = "This is a text about Bill Gates and Microsoft."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_, ent._.score)

# Output
# Bill Gates people 0.9967108964920044
# Microsoft company 0.9966742992401123
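Each entity carries a confidence score in `ent._.score`, and the `threshold` parameter listed under Configuration decides which candidate predictions are kept. The following sketch illustrates that effect on plain tuples. It is not the wrapper's code: `apply_threshold` is a hypothetical helper, the third prediction is invented for illustration, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from typing import List, Tuple

# A prediction as (text, label, score), mirroring the printed output above.
Prediction = Tuple[str, str, float]


def apply_threshold(predictions: List[Prediction], threshold: float) -> List[Prediction]:
    """Keep only predictions whose score is at least `threshold` (inclusive: an assumption)."""
    return [p for p in predictions if p[2] >= threshold]


predictions = [
    ("Bill Gates", "people", 0.9967108964920044),
    ("Microsoft", "company", 0.9966742992401123),
    ("Seattle", "company", 0.42),  # hypothetical low-confidence prediction
]

print(apply_threshold(predictions, 0.5))  # keeps the first two predictions
```

Raising the threshold trades recall for precision: fewer spans survive, but those that do are ones the model was more confident about.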

Configuration

You can override the wrapper's default configuration by passing a config dictionary to nlp.add_pipe, as in the example above. The configurable parameters are:

  • gliner_model: The GLiNER model to be used.
  • chunk_size: Size of the text chunk to be processed at once.
  • labels: The entity labels to be recognized.
  • style: The style of output for the entities (either 'ent' or 'span').
  • threshold: The confidence threshold the GLiNER model uses to decide whether a candidate span is kept as an entity.
  • map_location: The device on which to run the model: 'cpu' or 'cuda'.
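Because the config is a plain dictionary, a typo in a key or value is easy to make. A small client-side check against the parameters listed above can catch such mistakes before the dictionary is passed to nlp.add_pipe. This is a minimal sketch: `validate_config` is a hypothetical helper, the 0 to 1 range for `threshold` assumes scores are probabilities, and accepting only the exact strings 'cpu' and 'cuda' is stricter than PyTorch, which also accepts device strings such as 'cuda:0'.

```python
from typing import Any, Dict

# Keys and allowed values follow the Configuration list above.
ALLOWED_KEYS = {"gliner_model", "chunk_size", "labels", "style", "threshold", "map_location"}
ALLOWED_STYLES = {"ent", "span"}
ALLOWED_DEVICES = {"cpu", "cuda"}


def validate_config(config: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError on unknown keys or out-of-range values."""
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    if "style" in config and config["style"] not in ALLOWED_STYLES:
        raise ValueError("style must be 'ent' or 'span'")
    if "map_location" in config and config["map_location"] not in ALLOWED_DEVICES:
        raise ValueError("map_location must be 'cpu' or 'cuda'")
    chunk_size = config.get("chunk_size", 1)
    if not isinstance(chunk_size, int) or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Assumption: scores are probabilities, so a meaningful threshold lies in [0, 1].
    if "threshold" in config and not 0.0 <= config["threshold"] <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if "labels" in config and not all(isinstance(label, str) for label in config["labels"]):
        raise ValueError("labels must be a list of strings")


custom_spacy_config = {
    "gliner_model": "urchade/gliner_multi",
    "chunk_size": 250,
    "labels": ["people", "company"],
    "style": "ent",
    "threshold": 0.5,
    "map_location": "cpu",
}
validate_config(custom_spacy_config)  # passes silently
```

This check runs entirely on the caller's side and makes no assumption about how the wrapper itself reacts to invalid values.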

Contributing

Contributions to this project are welcome. Please ensure that your code adheres to the project's coding standards and include tests for new features.


Download files

Source Distribution

gliner-spacy-0.0.10.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

gliner_spacy-0.0.10-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file gliner-spacy-0.0.10.tar.gz.

File metadata

  • Download URL: gliner-spacy-0.0.10.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for gliner-spacy-0.0.10.tar.gz
  • SHA256: 4ae1a7aea3d81872ea2ac5640d318bd4923c7d4eb1af719f1adecb64514ee46a
  • MD5: 4dce0be4655b06bab0f312317c3f0281
  • BLAKE2b-256: 314c19f9f2abb3aae6a8df1f860d9565cff3d31b86682bdb0c4e6bc2e156085c

File details

Details for the file gliner_spacy-0.0.10-py3-none-any.whl.

File hashes

Hashes for gliner_spacy-0.0.10-py3-none-any.whl
  • SHA256: 88532eaf43baa744ae807983cb4276ec678c5f5bee75e0ba40d33da1a05bcb10
  • MD5: 9b6475a069d700e6b1d5aca0fd38d65e
  • BLAKE2b-256: 3a1219d20532e91b35e2529f6cb3dd3ff927cacce679c86d51bac073d295fb72
