
A spaCy wrapper for the GLiNER model for Named Entity Recognition

Project description

GLiNER spaCy Wrapper

Introduction

This project is a wrapper that integrates GLiNER, a Named Entity Recognition (NER) model, with the spaCy Natural Language Processing (NLP) library. GLiNER (Generalist and Lightweight Model for Named Entity Recognition) uses a bidirectional transformer encoder to recognize entities of arbitrary types. You supply the entity labels as plain text at inference time, so you are not limited to a fixed set of types learned during training. The wrapper exposes GLiNER as a spaCy pipeline component, which makes its predictions available on the resulting Doc like any other spaCy entities.

For GLiNER to work properly, use Python 3.7 to 3.10.
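A script can fail fast on an interpreter outside that range by checking the Python version at startup. Here is a minimal check. The helper `is_supported_python` is hypothetical and is not part of gliner-spacy. It hard-codes only the 3.7 to 3.10 range stated above.

```python
import sys

# Version range stated in this README (inclusive on both ends).
SUPPORTED_MIN = (3, 7)
SUPPORTED_MAX = (3, 10)


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies in the stated range.

    `version` defaults to the running interpreter's sys.version_info.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    if not is_supported_python():
        v = sys.version_info
        print(f"Warning: Python {v.major}.{v.minor} is outside the 3.7-3.10 "
              "range stated for GLiNER.")
```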

Features

  • Integrates GLiNER with spaCy as a pipeline component.
  • Customizable chunk size for processing large texts.
  • User-defined entity labels, such as 'person' and 'organization'.
  • Configurable output style for the recognized entities.
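This README does not describe how the wrapper splits a long text into chunks. The sketch below assumes the simplest strategy: fixed-size, non-overlapping character windows. It also shows one detail that any chunking approach needs. Offsets found inside a chunk must be shifted by the chunk's starting position so that they point into the full text. `chunk_text` and `remap_entities` are hypothetical helpers and are not part of gliner-spacy. The entity passed to `remap_entities` is a made-up prediction.

```python
def chunk_text(text, chunk_size):
    """Split text into consecutive, non-overlapping character chunks.

    Returns a list of (offset, chunk) pairs, where offset is the position
    of the chunk's first character in the original text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [(i, text[i:i + chunk_size]) for i in range(0, len(text), chunk_size)]


def remap_entities(offset, entities):
    """Convert chunk-local (start, end, label) character offsets to document offsets."""
    return [(start + offset, end + offset, label) for start, end, label in entities]


text = "Bill Gates founded Microsoft."
for offset, chunk in chunk_text(text, 19):
    print(offset, repr(chunk))
# 0 'Bill Gates founded '
# 19 'Microsoft.'

# Hypothetical prediction: "Microsoft" at local offsets 0-9 of the second chunk.
start, end, label = remap_entities(19, [(0, 9, "organization")])[0]
print(text[start:end], label)  # Microsoft organization
```

With `chunk_size=10`, the same sentence splits into `'Bill Gates'`, `' founded M'` and `'icrosoft.'`, which cuts "Microsoft" across two chunks. Under this naive strategy, an entity that straddles a boundary cannot be found whole. A larger chunk size makes such splits less frequent.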

Installation

Install the library with pip:

pip install gliner-spacy

Usage

To use this wrapper in your spaCy pipeline, follow these steps:

  1. Import spaCy.
  2. Create a spaCy Language object, for example with spacy.blank("en").
  3. Add the gliner_spacy component to the pipeline with nlp.add_pipe.
  4. Process text by calling the pipeline on it.

Example code:

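import spacy

# Start from an empty English pipeline
nlp = spacy.blank("en")
# Add the GLiNER component with its default configuration
nlp.add_pipe("gliner_spacy")
text = "This is a text about Bill Gates and Microsoft."
doc = nlp(text)

# Entities predicted by GLiNER are available in doc.ents
for ent in doc.ents:
    print(ent.text, ent.label_)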

Expected Output

Bill Gates person
Microsoft organization
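spaCy does not allow overlapping spans in doc.ents. Any overlapping candidates the model produces, such as "Bill" inside "Bill Gates", must therefore be reduced to one before they reach a flat result like the one above. This README does not say how the wrapper resolves such overlaps. One common rule is the one that spaCy's `spacy.util.filter_spans` applies: keep the longest span, and break ties in favor of the earlier one. The sketch below implements that rule on plain (start, end, label) character offsets. `filter_overlapping` is a hypothetical name, and the candidate spans are made up for illustration.

```python
def filter_overlapping(spans):
    """Return a non-overlapping subset of (start, end, label) spans.

    Longer spans win; among spans of equal length, the earlier one wins.
    End offsets are exclusive, so (0, 4) and (4, 8) do not overlap.
    """
    # Sort longest first; for equal lengths, smaller start first.
    ranked = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    for start, end, label in ranked:
        # Keep the span only if it is disjoint from every span kept so far.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    # Return the survivors in document order.
    return sorted(kept)


text = "This is a text about Bill Gates and Microsoft."
# Hypothetical candidates, including "Bill" nested inside "Bill Gates".
candidates = [(21, 25, "person"), (21, 31, "person"), (36, 45, "organization")]
for start, end, label in filter_overlapping(candidates):
    print(text[start:end], label)
# Bill Gates person
# Microsoft organization
```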

Example with a custom configuration

import spacy

custom_spacy_config = {
    "gliner_model": "urchade/gliner_multi",
    "chunk_size": 250,
    "labels": ["people", "company"],
    "style": "ent",
}
nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy", config=custom_spacy_config)

text = "This is a text about Bill Gates and Microsoft."
doc = nlp(text)

# ent._.score holds the model's confidence score for each entity
for ent in doc.ents:
    print(ent.text, ent.label_, ent._.score)

# Output
# Bill Gates people 0.9967108964920044
# Microsoft company 0.9966742992401123
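A misspelled key or an unsupported value in a dict like custom_spacy_config is easy to miss, so you may want a pre-flight check before calling nlp.add_pipe. spaCy may already reject some of these mistakes when it creates the component. The sketch below is a hypothetical helper: `validate_gliner_config` is not part of gliner-spacy. It checks the parameter names and the 'ent'/'span' style values listed in the Configuration section of this README. It also assumes that threshold is a confidence cutoff between 0 and 1.

```python
# Parameter names listed in this README's Configuration section.
KNOWN_KEYS = {
    "gliner_model", "chunk_size", "labels", "style", "threshold",
    "map_location", "load_onnx_model", "onnx_model_file",
}


def validate_gliner_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"unknown key: {key}" for key in sorted(set(config) - KNOWN_KEYS)]
    if "chunk_size" in config:
        size = config["chunk_size"]
        if not isinstance(size, int) or size <= 0:
            problems.append("chunk_size must be a positive integer")
    if "labels" in config:
        labels = config["labels"]
        if not labels or not all(isinstance(label, str) for label in labels):
            problems.append("labels must be a non-empty list of strings")
    if config.get("style", "ent") not in ("ent", "span"):
        problems.append("style must be 'ent' or 'span'")
    if "threshold" in config:
        # Assumption: threshold is a probability-like cutoff in [0, 1].
        value = config["threshold"]
        if not isinstance(value, (int, float)) or not 0 <= value <= 1:
            problems.append("threshold must be a number between 0 and 1")
    return problems


custom_spacy_config = {"gliner_model": "urchade/gliner_multi", "chunk_size": 250,
                       "labels": ["people", "company"], "style": "ent"}
print(validate_gliner_config(custom_spacy_config))  # []
print(validate_gliner_config({"chunk_size": 0, "style": "entity", "lables": ["x"]}))
# ['unknown key: lables', 'chunk_size must be a positive integer', "style must be 'ent' or 'span'"]
```

In a script, you would raise an error (for example `ValueError("; ".join(problems))`) when the returned list is non-empty, and only then pass the dict to nlp.add_pipe.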

Example with an ONNX model

import spacy

custom_spacy_config = {
    "gliner_model": "onnx-community/gliner_base",
    "chunk_size": 250,
    "labels": ["people", "company"],
    "style": "ent",
    "load_onnx_model": True,
    "onnx_model_file": "onnx/model.onnx",
}
nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy", config=custom_spacy_config)

text = "This is a text about Bill Gates and Microsoft."
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_, ent._.score)

# Output
# Bill Gates people 0.9937531352043152
# Microsoft company 0.994135856628418

Configuration

You can override the wrapper's default configuration by passing a config dictionary to nlp.add_pipe, as in the examples above. The configurable parameters are:

  • gliner_model: The GLiNER model to load, given as a Hugging Face model ID such as "urchade/gliner_multi".
  • chunk_size: The size of each text chunk the model processes at once; longer texts are split into chunks of this size.
  • labels: The entity labels to recognize.
  • style: How the entities are output: 'ent' (stored in doc.ents, as in the examples above) or 'span'.
  • threshold: The confidence threshold of the GLiNER model; predictions scoring below it are not returned as entities.
  • map_location: The device on which to run the model: 'cpu' or 'cuda'.
  • load_onnx_model: Whether the specified gliner_model is an ONNX model (False by default).
  • onnx_model_file: The path to the ONNX file within the Hugging Face repository. Defaults to model.onnx.
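Conceptually, threshold acts as a cutoff on scored predictions. The sketch below pictures it as a simple filter. It is only an illustration and does not reproduce where or how GLiNER applies the cutoff internally. The dict format with "text", "label" and "score" keys is an assumption, and so is the inclusive comparison. `apply_threshold` is a hypothetical helper. The first two scores are the ones printed in the custom-configuration example. The third prediction is made up.

```python
def apply_threshold(predictions, threshold):
    """Keep predictions whose score meets the threshold.

    Each prediction is assumed to be a dict with "text", "label" and
    "score" keys. The comparison is inclusive here by choice.
    """
    return [p for p in predictions if p["score"] >= threshold]


predictions = [
    {"text": "Bill Gates", "label": "people", "score": 0.9967108964920044},
    {"text": "Microsoft", "label": "company", "score": 0.9966742992401123},
    {"text": "text", "label": "company", "score": 0.31},  # made-up low score
]
for threshold in (0.5, 0.9967):
    kept = apply_threshold(predictions, threshold)
    print(threshold, [p["text"] for p in kept])
# 0.5 ['Bill Gates', 'Microsoft']
# 0.9967 ['Bill Gates']
```

Raising the threshold trades recall for precision: fewer spans are returned, but the survivors are the ones the model is most confident about.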

Contributing

Contributions to this project are welcome. Please ensure that your code adheres to the project's coding standards and include tests for new features.

Download files

Download the file for your platform.

Source Distribution

gliner_spacy-0.0.11.tar.gz (6.3 kB)

Uploaded Source

Built Distribution


gliner_spacy-0.0.11-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file gliner_spacy-0.0.11.tar.gz.

File metadata

  • Download URL: gliner_spacy-0.0.11.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for gliner_spacy-0.0.11.tar.gz
Algorithm Hash digest
SHA256 80d07933c7a2c2e457b0e0274f69962425603c7dbeaac5838c010bfaf3178091
MD5 8115f26e23a5885f034f4c1fe436da0b
BLAKE2b-256 001bc737e988cddedc00aadd90bd6ca9cae2a1b974e3a3d4ea7ca64dcf7670d2


File details

Details for the file gliner_spacy-0.0.11-py3-none-any.whl.

File metadata

  • Download URL: gliner_spacy-0.0.11-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for gliner_spacy-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 b44836c4b4a895307aaaf15694b1c2cd100d2365eed510ca49062227be85d000
MD5 93a438de3f835fb3cd51321e6bde477f
BLAKE2b-256 757d3942a58b5d3be6021f93552fb170d18ecf2079bc73927c7af36524938900

