A spaCy pipeline object for negation.
negspacy: negation for spaCy
spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.
NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029
Installation and usage
Install the library.
pip install negspacy
Import library and spaCy.
import spacy
from negspacy.negation import Negex
Load a spaCy language model. Add the negspacy pipeline object. Filtering on entity types is optional.
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON","ORG"])
nlp.add_pipe(negex, last=True)
View negations.
doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)
Steve Jobs True
Apple False
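After the pipeline has run, each entity carries a boolean `e._.negex`, so downstream code can separate affirmed entities from negated ones. The sketch below is only an illustration: `split_by_negation` is a hypothetical helper, not part of negspacy, and the stand-in objects mimic just the two attributes used above (`text` and `_.negex`) so that the example runs without a language model.

```python
from types import SimpleNamespace


def split_by_negation(ents):
    """Return (affirmed, negated) lists of entity texts based on ent._.negex."""
    affirmed, negated = [], []
    for ent in ents:
        (negated if ent._.negex else affirmed).append(ent.text)
    return affirmed, negated


# Stand-in entities (assumption: real code would pass doc.ents instead).
fake_ents = [
    SimpleNamespace(text="Steve Jobs", _=SimpleNamespace(negex=True)),
    SimpleNamespace(text="Apple", _=SimpleNamespace(negex=False)),
]

affirmed, negated = split_by_negation(fake_ents)
print(affirmed, negated)
```

With a real pipeline, passing `doc.ents` in place of `fake_ents` should work the same way, provided the Negex component has been added.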
Consider pairing with scispacy to find UMLS concepts in text and process negations.
NegEx Patterns
- psuedo_negations - phrases that are false triggers, ambiguous negations, or double negatives
- preceding_negations - negation phrases that precede an entity
- following_negations - negation phrases that follow an entity
- termination - phrases that cut a sentence in parts, for purposes of negation detection (e.g., "but")
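To make the roles of these pattern types concrete, here is a deliberately simplified sketch of the idea in plain Python. It is not negspacy's implementation: `simple_negex` is a hypothetical function, it handles single-word triggers only, and it leaves out pseudo-negations entirely. Termination words split the sentence into scopes, and a target is flagged when a preceding trigger appears before it, or a following trigger after it, inside the same scope.

```python
def simple_negex(tokens, targets, preceding, following, termination):
    """Return {target_token: bool} using single-word triggers only."""
    lowered = [t.lower() for t in tokens]

    # Split the sentence into scopes at termination words.
    scopes, start = [], 0
    for i, tok in enumerate(lowered):
        if tok in termination:
            scopes.append((start, i))
            start = i + 1
    scopes.append((start, len(lowered)))

    result = {}
    for begin, end in scopes:
        for i in range(begin, end):
            if lowered[i] not in targets:
                continue
            before = lowered[begin:i]      # tokens earlier in the same scope
            after = lowered[i + 1:end]     # tokens later in the same scope
            result[tokens[i]] = (
                any(t in preceding for t in before)
                or any(t in following for t in after)
            )
    return result


tokens = "She does not like Steve but likes Apple".split()
flags = simple_negex(
    tokens,
    targets={"steve", "apple"},
    preceding={"not", "no"},
    following={"unlikely"},
    termination={"but"},
)
print(flags)  # {'Steve': True, 'Apple': False}
```

Because "but" ends the first scope, the trigger "not" does not reach "Apple", which mirrors the behaviour shown in the usage example.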
Termsets
Designate the termset to use; en_clinical is used by default.
negex = Negex(nlp, language = "en_clinical")
- en = phrases for general English language text
- en_clinical (DEFAULT) = adds phrases specific to the clinical domain to general English
- en_clinical_sensitive = adds additional phrases to help rule out historical and possibly irrelevant entities
Additional Functionality
Use own patterns or view patterns in use
Use own patterns
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, termination=["but", "however", "nevertheless", "except"])
View patterns in use
patterns_dict = negex.get_patterns()
Negations in noun chunks
Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example when using scispacy:
nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)
# no headache
This would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a chunk_prefix:
nlp = spacy.load("en_core_sci_sm")
negex = Negex(nlp, language = "en_clinical", chunk_prefix = ["no"])
nlp.add_pipe(negex)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)
# no headache True
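Conceptually, a chunk prefix check only needs to look at the start of the entity text itself. The following is a rough sketch of that idea under the assumption that a prefix is a single word; `starts_with_chunk_prefix` is a hypothetical name and this is not how negspacy necessarily implements it internally.

```python
def starts_with_chunk_prefix(entity_text, chunk_prefix):
    """True if the entity's first token is one of the chunk_prefix words."""
    words = entity_text.lower().split()
    prefixes = {p.lower() for p in chunk_prefix}
    return bool(words) and words[0] in prefixes


print(starts_with_chunk_prefix("no headache", ["no"]))  # True
print(starts_with_chunk_prefix("headache", ["no"]))     # False
```

Comparing whole tokens rather than raw string prefixes avoids false matches on words such as "nothing" or "normal".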
Contributing
Authors
- Jeno Pizarro
License
API Documentation