A BERT-based inference module for negation detection (cue and scope); negation focus and event detection are planned for the near future

Project description

dneg

dneg is a Python package for detecting negation cues and their scopes in text using fine-tuned BERT models. It provides a pipeline that processes batched text input, identifies negation cues (e.g., "not", "n't"), and determines the scope of each negation within a sentence. Inference is built on the Hugging Face Transformers library, PyTorch Geometric, and PyTorch.

Features

  • Negation Cue Detection: Identifies negation cues (e.g., "not", "n't") using the CueBertInference or CueBertInferenceGAT class.
  • Negation Scope Detection: Determines the scope of negation in text using the ScopeBertInference or ScopeBertInferenceGAT class.
  • Pipeline Processing: Combines cue and scope detection in a single pipeline for streamlined processing.
  • Batch Processing: Supports batched inputs for efficient inference.
  • GPU Support: Utilizes CUDA for accelerated inference on compatible hardware.
  • Planned: negation event and focus detection components will be added to the pipeline.
  • 🌟✴️🌟 German Language Support: The pipeline now supports negation detection in German as well as English.
  • 🌟✴️🌟 Multilingual Support: The pipeline now supports negation detection in 10 additional languages: German, Italian, Spanish, French, Dutch, Chinese, Japanese, Russian, Hindi, and Arabic.
  • We trained around 300 models, all of which can be accessed via this package (they are also available on Hugging Face).
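Batch processing means that Pipeline.run() receives a list of tokenized sentences at once, as in the Usage examples. For a large corpus, feeding the pipeline fixed-size chunks keeps memory use bounded. The helpers below are a hypothetical sketch, not part of dneg; they only assume that run() accepts a list of token lists, and they yield whatever run() returns for each chunk without interpreting it. The default chunk size of 32 is arbitrary.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_in_batches(pipe, sentences, size=32):
    """Call pipe.run() on one chunk of tokenized sentences at a time.

    `pipe` is any object with a run(batch) method, e.g. a dneg Pipeline;
    the per-chunk results are yielded unchanged.
    """
    for batch in batched(sentences, size):
        yield pipe.run(batch)
```

On Python 3.12 and later, itertools.batched offers the same chunking (yielding tuples instead of lists); the hand-written version keeps the sketch compatible with older interpreters.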

Prerequisites

  • Python 3.6 or higher
  • PyTorch
  • PyTorch Geometric
  • scikit-learn
  • UDPipe
  • spaCy
  • Hugging Face Transformers
  • CUDA-enabled GPU (optional, for faster inference)

Install via PyPI

pip install dneg

Install Dependencies

Ensure the core dependencies are installed (see Prerequisites for the full list):

pip install torch transformers
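The command above installs only two of the prerequisites. One way to check which of the others are importable is importlib.util.find_spec, which locates a module without importing it (for a dotted name it does import the parent package). The import names below are assumptions: in particular, UDPipe is assumed to come from the ufal.udpipe bindings, and dneg may rely on a different binding.

```python
import importlib.util

# Display name -> import name. These import names are assumptions, not
# taken from dneg's requirements.txt.
REQUIRED = {
    "PyTorch": "torch",
    "PyTorch Geometric": "torch_geometric",
    "scikit-learn": "sklearn",
    "UDPipe": "ufal.udpipe",
    "spaCy": "spacy",
    "Hugging Face Transformers": "transformers",
}


def missing_modules(required=REQUIRED):
    """Return the display names whose module cannot be located."""
    missing = []
    for name, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # ImportError: a parent package (e.g. "ufal") is absent;
            # ValueError: the module is loaded but has no __spec__.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling print(missing_modules()) lists the prerequisites that still need to be installed; an empty list means every module was found.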

Usage

Basic Example

The following example demonstrates how to use the Pipeline class to detect negation cues and scopes in a batch of sentences.

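from dneg import Pipeline
# Build a pipeline with the default arguments of from_language()
pipe = Pipeline.from_language()
# Each sentence is a list of tokens; split(" ") keeps punctuation attached (e.g. "testing,")
batch_tokens = [
    "This is not an example for testing, it is also not an example for multi negation testing and i never ate spinach .".split(" "),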
    ['In', 'contrast', 'to', 'anti-CD3/IL-2-activated', 'LN', 'cells', ',', 'adoptive', 'transfer', 'of',
     'freshly', 'isolated', 'tumor-draining', 'LN', 'T', 'cells', 'has', 'no', 'therapeutic', 'activity',
     '.'],
]
res = pipe.run(batch_tokens)
Pipeline.pretty_print(res)

Results in (one row per token and one label column per detected negation; C = cue, S = scope, X = neither):

This                           S     X     X    
is                             S     X     X    
not                            C     X     X    
an                             S     X     X    
example                        X     X     X    
for                            S     X     X    
testing,                       X     X     X    
it                             X     S     X    
is                             X     S     X    
also                           X     S     X    
not                            X     C     X    
an                             X     S     X    
example                        X     X     X    
for                            X     S     X    
multi                          X     S     X    
negation                       X     X     X    
testing                        X     X     X    
and                            X     X     X    
i                              X     S     S    
never                          X     X     C    
ate                            X     X     S    
spinach                        X     X     S    
.                              X     X     X    

In                             X    
contrast                       X    
to                             X    
anti-CD3/IL-2-activated        X    
LN                             X    
cells                          X    
,                              X    
adoptive                       X    
transfer                       X    
of                             X    
freshly                        X    
isolated                       X    
tumor-draining                 X    
LN                             X    
T                              X    
cells                          X    
has                            X    
no                             C    
therapeutic                    X    
activity                       X    
.                              X  
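Each column after the token column appears to correspond to one detected negation: the first sentence has three negations ("not", "not", "never") and three columns, the second has one ("no") and one column. Under that reading, the hypothetical helper below (not part of dneg) collects the cue and scope tokens of each negation. It starts from plain label lists, because the structure returned by pipe.run() is not documented here; extracting the label columns from res is left open.

```python
def negation_spans(tokens, columns):
    """Collect the cue and scope tokens of every negation in one sentence.

    tokens:  the sentence as a list of tokens
    columns: one label list per negation, each as long as `tokens`, using
             "C" (cue), "S" (scope) and "X" (neither) as in the output above
    """
    spans = []
    for labels in columns:
        if len(labels) != len(tokens):
            raise ValueError("each label column needs one label per token")
        cue = [tok for tok, lab in zip(tokens, labels) if lab == "C"]
        scope = [tok for tok, lab in zip(tokens, labels) if lab == "S"]
        spans.append({"cue": cue, "scope": scope})
    return spans
```

Applied to the third column of the first sentence, it returns {"cue": ["never"], "scope": ["i", "ate", "spinach"]}. Scopes need not be contiguous (in the first column, "example" is labeled X between scope tokens), so the sketch returns token lists rather than start/end offsets.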

Advanced Usage

To use specific or custom models, initialize the pipeline with explicit components and model paths. The example below loads the German cue and scope models:

from dneg import Pipeline, CueBertInference, ScopeBertInference

# Hugging Face model IDs of the German cue and scope models
mcue_path = "D-NEG/cue-de-sfu"
mscope_path = "D-NEG/scope-de-sfu"

# Initialize pipeline with custom components
pipe = Pipeline(
    components=[CueBertInference, ScopeBertInference],
    model_paths=[mcue_path, mscope_path]
)

# Define input
batch_tokens = [
    "Das ist nicht ein Testsatz .".split(" ")
]

# Run inference
results = pipe.run(batch_tokens)

# Print results
Pipeline.pretty_print(results)
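Both examples build their input with .split(" "), which only separates punctuation that is already surrounded by spaces; in the first example, "testing," stays a single token. For raw text, a small regex tokenizer can produce the list-of-tokens input instead. This is a hypothetical helper, not part of dneg, and its rules (splitting off "n't" and punctuation, keeping hyphen and slash compounds whole) are an assumption about what suits the models, since their training tokenization is not documented here. Languages written without spaces, such as Chinese and Japanese, need a proper tokenizer (for example spaCy) instead.

```python
import re

# Hypothetical tokenizer (not part of dneg). Alternatives are tried left to right:
#   n't\b          the clitic "n't" on its own
#   \w+(?=n't\b)   the stem before "n't" ("do" in "don't")
#   [\w/-]+        words, keeping compounds such as "anti-CD3/IL-2-activated"
#   [^\w\s]        any other single punctuation character
_TOKEN_RE = re.compile(r"n't\b|\w+(?=n't\b)|[\w/-]+|[^\w\s]", re.IGNORECASE)


def tokenize(sentence):
    """Split a raw sentence into a list of tokens."""
    return _TOKEN_RE.findall(sentence)


def tokenize_batch(sentences):
    """Tokenize each sentence, giving a batch shaped like batch_tokens above."""
    return [tokenize(s) for s in sentences]
```

For example, tokenize("I don't know, it isn't.") gives ["I", "do", "n't", "know", ",", "it", "is", "n't", "."], and tokenize_batch(["Das ist nicht ein Testsatz."]) gives the same token list as the German example above.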

Package Structure

  • CueBertInference: Detects negation cues (labeled as "C" for cues, "X" otherwise).
  • ScopeBertInference: Identifies the scope of negation (labeled as "S" for scope, "X" otherwise).
  • CueBertInferenceGAT: Detects negation cues (labeled as "C" for cues, "X" otherwise) with an additional syntax-aware graph attention network (GAT).
  • ScopeBertInferenceGAT: Identifies the scope of negation (labeled as "S" for scope, "X" otherwise) with an additional syntax-aware graph attention network (GAT).
  • Pipeline: Combines CueBertInference and ScopeBertInference for end-to-end negation detection.
  • Special Tokens:
    • [CUE]: Marks negation cues.
    • [SCO]: Marks negation scope.
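The GAT variants are described as syntax-aware, and UDPipe and PyTorch Geometric appear among the prerequisites, which suggests a graph attention network running over a dependency parse. How dneg actually builds that graph is not documented here, so the following is only a sketch under that assumption: it turns the HEAD column of a CoNLL-U parse (the format UDPipe produces) into a two-row edge list in the [2, num_edges] layout that PyTorch Geometric uses for edge_index. The function name is hypothetical.

```python
def heads_to_edge_index(heads):
    """Turn CoNLL-U HEAD values into a [sources, targets] edge list.

    heads[i] is the 1-based head of token i + 1, with 0 marking the root,
    as in the HEAD column of CoNLL-U output.
    """
    n = len(heads)
    sources, targets = [], []
    for child, head in enumerate(heads):
        if not 0 <= head <= n:
            raise ValueError(f"head {head} of token {child + 1} is out of range")
        if head == 0:
            continue  # the root has no governor, so it contributes no edge
        parent = head - 1  # 0-based node index of the governor
        # Store each dependency in both directions so attention can flow
        # from head to dependent and back.
        sources += [child, parent]
        targets += [parent, child]
    return [sources, targets]
```

Wrapping the result as torch.tensor(edge_index, dtype=torch.long) gives a tensor that torch_geometric.nn.GATConv accepts; GATConv adds self-loops by default, so the sketch does not. With an assumed parse of "i never ate spinach ." in which every other token depends on "ate" (heads [3, 3, 0, 3, 3]), it returns [[0, 2, 1, 2, 3, 2, 4, 2], [2, 0, 2, 1, 2, 3, 2, 4]].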

Requirements

See requirements.txt for a full list of dependencies. Key dependencies include:

  • torch>=1.9.0
  • transformers>=4.9.0

License

This project is licensed under the MIT License - see the LICENSE file for details.


Download files

Source Distribution

dneg-0.1.1.tar.gz (20.0 kB)

Built Distribution

dneg-0.1.1-py3-none-any.whl (20.4 kB)

File details

Details for the file dneg-0.1.1.tar.gz.

File metadata

  • Download URL: dneg-0.1.1.tar.gz
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for dneg-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f487468324e3ad2631f99237da3d06d16d5780e8490fcb48201fd3b609ec8b4d
MD5 4f3383ba8d05d5ff6426a358ba57edc8
BLAKE2b-256 facecd0158be7c2704ab5a53c6e77952856622bab26f273af87ac369ab105b1e

File details

Details for the file dneg-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: dneg-0.1.1-py3-none-any.whl
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for dneg-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bcbc425cc21034c44f25393bf1a77bd9838e7c5ef33d0789506aaf26c9eadb8d
MD5 87ffa057523a9924588230a98d833203
BLAKE2b-256 f110743eee0d6d39dfec7bb2c925b64dc9c44419ad62467e13a4cbaa68ccb816
