
A spaCy pipeline component for negating concepts in text (NegEx).


negspacy: negation for spaCy


spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029

What's new

Version 1.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of negspacy.

If your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See archived readme.

Installation and usage

Install the library.

pip install negspacy

Import library and spaCy.

import spacy
from negspacy.negation import Negex

Load a spaCy language model and add the negspacy pipeline object. Filtering on entity types is optional.

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

View negations.

doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
    print(e.text, e._.negex)

# Steve Jobs True
# Apple False
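
Downstream code often needs the affirmed and negated entities separately. The following is a minimal sketch of that step. The helper name `split_by_negation` and the `fake_ent` stand-in are hypothetical and not part of negspacy; the stand-in only mimics the `text` and `_.negex` attributes so the snippet runs without a model. With a real pipeline you would pass `doc.ents` instead.

```python
from types import SimpleNamespace


def split_by_negation(ents):
    """Return (affirmed, negated) lists of entity texts.

    Works on any objects exposing `.text` and `._.negex`,
    which is what the loop above reads from each entity.
    """
    affirmed, negated = [], []
    for ent in ents:
        (negated if ent._.negex else affirmed).append(ent.text)
    return affirmed, negated


def fake_ent(text, negex):
    """Hypothetical stand-in for a spaCy entity (illustration only)."""
    return SimpleNamespace(text=text, _=SimpleNamespace(negex=negex))


# Mirrors the values shown in the example output above.
ents = [fake_ent("Steve Jobs", True), fake_ent("Apple", False)]
affirmed, negated = split_by_negation(ents)
print(affirmed)  # ['Apple']
print(negated)   # ['Steve Jobs']
```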

Consider pairing with scispacy to find UMLS concepts in text and process negations.

NegEx Patterns

  • pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
  • preceding_negations - negation phrases that precede an entity
  • following_negations - negation phrases that follow an entity
  • termination - phrases that cut a sentence into parts for purposes of negation detection (e.g., "but")
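
To make the roles of the four pattern types concrete, here is a toy, pure-Python sketch of a NegEx-style check. It is a simplification under stated assumptions, not negspacy's implementation: it works on lowercased strings with whole-word regex matching instead of spaCy tokens, and it only looks at the first occurrence of the entity. The function name `is_negated` and the `patterns` dictionary are illustrative.

```python
import re


def _find(phrase, text):
    """Return the spans of whole-word matches of `phrase` in `text`."""
    return [m.span() for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text)]


def is_negated(sentence, entity, patterns):
    """Toy NegEx-style check for the first occurrence of `entity`."""
    text = sentence.lower()
    start = text.find(entity.lower())
    if start == -1:
        raise ValueError("entity not found in sentence")
    end = start + len(entity)

    # 1. Blank out pseudo-negations (same length, so offsets stay valid).
    for phrase in patterns["pseudo_negations"]:
        text = re.sub(
            r"\b" + re.escape(phrase) + r"\b",
            lambda m: " " * len(m.group()),
            text,
        )

    # 2. Limit the scope to the nearest termination phrase on each side.
    left, right = 0, len(text)
    for phrase in patterns["termination"]:
        for s, e in _find(phrase, text):
            if e <= start:
                left = max(left, e)
            elif s >= end:
                right = min(right, s)

    # 3. Look for a trigger before or after the entity, inside the scope.
    before, after = text[left:start], text[end:right]
    if any(_find(p, before) for p in patterns["preceding_negations"]):
        return True
    return any(_find(p, after) for p in patterns["following_negations"])


patterns = {
    "pseudo_negations": ["might not"],
    "preceding_negations": ["not"],
    "following_negations": ["declined"],
    "termination": ["but", "however"],
}
sentence = "She does not like Steve Jobs but likes Apple products."
print(is_negated(sentence, "Steve Jobs", patterns))  # True
print(is_negated(sentence, "Apple", patterns))       # False
```

In this sketch "but" ends the scope of "not", so "Apple" is left unnegated, and "might not" is masked before any trigger is searched for.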

Termsets

Designate the termset to use; en_clinical is used by default.

  • en = phrases for general English-language text
  • en_clinical (default) = adds phrases specific to the clinical domain to general English
  • en_clinical_sensitive = adds additional phrases to help rule out historical and possibly irrelevant entities

To set:

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

# Build the general-English termset instead of the default en_clinical.
ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns()
    }
)

Additional Functionality

Change patterns or view patterns in use

Replace all patterns with your own set

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset":{
            "pseudo_negations": ["might not"],
            "preceding_negations": ["not"],
            "following_negations":["declined"],
            "termination": ["but","however"]
        }
    }
)
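
When replacing every pattern, it can help to check the dictionary before handing it to the pipeline. The sketch below assumes that a full replacement should supply exactly the four pattern groups listed under NegEx Patterns, each as a list of non-empty strings. The helper `check_termset` is hypothetical and not part of negspacy.

```python
REQUIRED_KEYS = (
    "pseudo_negations",
    "preceding_negations",
    "following_negations",
    "termination",
)


def check_termset(patterns):
    """Return `patterns` unchanged, or raise ValueError if it looks malformed."""
    missing = [k for k in REQUIRED_KEYS if k not in patterns]
    extra = [k for k in patterns if k not in REQUIRED_KEYS]
    if missing or extra:
        raise ValueError(f"missing keys: {missing}; unexpected keys: {extra}")
    for key, phrases in patterns.items():
        # Each group must be a list of non-empty strings.
        if not isinstance(phrases, list) or not all(
            isinstance(p, str) and p.strip() for p in phrases
        ):
            raise ValueError(f"{key} must be a list of non-empty strings")
    return patterns


custom = check_termset(
    {
        "pseudo_negations": ["might not"],
        "preceding_negations": ["not"],
        "following_negations": ["declined"],
        "termination": ["but", "however"],
    }
)
print(sorted(custom))
```

The validated dictionary can then be passed as the "neg_termset" value in the config shown above.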

Add and remove individual patterns on the fly from built-in termsets

from negspacy.termsets import termset
ts = termset("en")
ts.add_patterns(
    {
        "pseudo_negations": ["my favorite pattern"],
        "termination": ["these are", "great patterns", "but"],
        "preceding_negations": ["wow a negation"],
        "following_negations": ["extra negation"],
    }
)
# OR
ts.remove_patterns(
    {
        "termination": ["these are", "great patterns"],
        "pseudo_negations": ["my favorite pattern"],
        "preceding_negations": ["denied", "wow a negation"],
        "following_negations": ["unlikely", "extra negation"],
    }
)

View patterns in use

from negspacy.termsets import termset
ts = termset("en_clinical")
print(ts.get_patterns())

Negations with Spans

Span Groups can be negated by providing a list of span keys to the span_keys argument.

Load a spaCy language model that adds spans to the Doc object.

nlp = spacy.load("your_span_cat_model")
# 'sc' is the default SpanGroup spans_key
nlp.add_pipe("negex", config={"span_keys":["sc"]})

View negations.

doc = nlp("Analysis showed no sign of Human TR Beta 1 mRNA")

for span in doc.spans["sc"]:
    # label_ is the string label; label is its integer hash
    print(span.text, span.label_, span._.negex)

# Human TR Beta 1 PROTEIN True
# Human TR Beta 1 mRNA RNA True

Negations in noun chunks

Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example:

nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache

This would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a chunk_prefix:

nlp = spacy.load("en_core_sci_sm")
# The default en_clinical termset is used; only chunk_prefix is customized.
nlp.add_pipe(
    "negex",
    config={
        "chunk_prefix": ["no"],
    },
    last=True,
)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True
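
The idea behind chunk_prefix can be sketched in plain Python: an entity counts as negated when its own text begins with one of the configured prefixes. This is an illustration only, not negspacy's code. It compares whitespace-separated words, and the case-insensitive comparison is an assumption of this sketch. The function name `has_negating_prefix` is hypothetical.

```python
def has_negating_prefix(entity_text, chunk_prefix):
    """Return True if the entity's leading words equal one of the prefixes."""
    tokens = entity_text.lower().split()
    for prefix in chunk_prefix:
        prefix_tokens = prefix.lower().split()
        # Compare whole words so "no" does not match "nose bleed".
        if prefix_tokens and tokens[: len(prefix_tokens)] == prefix_tokens:
            return True
    return False


print(has_negating_prefix("no headache", ["no"]))  # True
print(has_negating_prefix("headache", ["no"]))     # False
print(has_negating_prefix("nose bleed", ["no"]))   # False
```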

Contributing

contributing

Authors

  • Jeno Pizarro

License

license

Other libraries

This library is featured in the spaCy Universe. Check it out for other useful libraries and inspiration.

If you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results) take a look at extractacy.
