Document section detector using spaCy for clinical NLP
Clinical Sectionizer
This package provides a spaCy pipeline component for tagging section titles in clinical documents. The sectionizer takes a list of patterns for section titles and searches a doc for matches. When a section is found, it generates three outputs:
section_name: The normalized name of the section, a string
section_header: The span of the doc containing the header, a Span
section: The entire span of the doc containing the section, a Span
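The relationship between the three outputs can be illustrated with a minimal pure-Python sketch. It is not the package's implementation: it assumes regular expressions on raw text instead of spaCy token patterns, and the function name `split_sections` is hypothetical. Each section is taken to run from the start of its header to the start of the next header; any text before the first header is ignored in this sketch.

```python
import re


def split_sections(text, patterns):
    """Return (section_name, section_header, section) triples.

    `patterns` maps a section name to a regular expression for its header.
    Each section runs from the start of its header to the start of the next.
    """
    # Collect every header match as (start, end, name), then order by position.
    matches = []
    for name, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), name))
    matches.sort()

    sections = []
    for i, (start, end, name) in enumerate(matches):
        # The section ends where the next header begins, or at the end of the text.
        next_start = matches[i + 1][0] if i + 1 < len(matches) else len(text)
        sections.append((name, text[start:end], text[start:next_start]))
    return sections


text = (
    "Family History:\nDiabetes\n"
    "Past Medical History:\nPneumonia\n"
    "Assessment and Plan:\nAtrial fibrillation. There is no evidence of pneumonia.\n"
)
patterns = {
    "family_history": r"Family History:",
    "past_medical_history": r"(?:Past )?Medical History:",
    "assessment_and_plan": r"Assessment and Plan:",
}
sections = split_sections(text, patterns)
for name, header, section in sections:
    print(name, repr(header), repr(section))
```

In the real component, the header and section are spaCy Span objects rather than strings, so they keep their token offsets within the doc.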
Calling sectionizer(doc) adds the following extensions to spaCy objects:
Doc._.sections: A list of 3-tuples of (name, header, section)
Token._.section: The span of the entire section in which the token occurs
Token._.section_header: The span of the header of the section in which the token occurs
Token._.section_name: The name of the section, as defined by the matching pattern
Span attributes section, section_header, and section_name, each taken from the first token in the span
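The idea that every token knows its section, and that a span takes its section from its first token, can be sketched in plain Python. This is an illustration under stated assumptions, not the package's code: sections are represented as hypothetical (name, start, end) token ranges, and `build_token_lookup` and `span_section_name` are made-up helper names.

```python
def build_token_lookup(n_tokens, sections):
    """Map each token index to the name of the section covering it.

    `sections` is a list of (section_name, start, end) token ranges,
    with `end` exclusive. Tokens outside every section map to None.
    """
    lookup = [None] * n_tokens
    for name, start, end in sections:
        for i in range(start, end):
            lookup[i] = name
    return lookup


def span_section_name(lookup, span_start):
    """A span inherits the section of its first token."""
    return lookup[span_start]


tokens = ["Family", "History", ":", "Diabetes",
          "Past", "Medical", "History", ":", "Pneumonia"]
sections = [("family_history", 0, 4), ("past_medical_history", 4, 9)]
lookup = build_token_lookup(len(tokens), sections)

print(lookup[3])                     # token "Diabetes" -> family_history
print(span_section_name(lookup, 5))  # span starting at "Medical" -> past_medical_history
```

Resolving a span through its first token keeps the lookup cheap, at the cost of ignoring the rare span that crosses a section boundary.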
Example
>>> text = """Family History:
Diabetes
Past Medical History:
Pneumonia
Assessment and Plan:
Atrial fibrillation. There is no evidence of pneumonia.
"""
>>> import spacy
>>> nlp = spacy.load(...) # Load a model which will match clinical concepts
>>> from clinical_sectionizer import Sectionizer
>>> sectionizer = Sectionizer(nlp)  # Create the component; it is added to the pipeline below
>>> section_patterns = [
    {"section_name": "family_history", "pattern": "Family History:"},
    {"section_name": "past_medical_history",
     "pattern": [
         {"LOWER": "past", "OP": "?"},  # "Past" is optional
         {"LOWER": "medical"},
         {"LOWER": "history"},
         {"LOWER": ":"},
     ]
    },
    {"section_name": "assessment_and_plan", "pattern": "Assessment and Plan:"},
]
>>> sectionizer.add(section_patterns)
>>> nlp.add_pipe(sectionizer)  # spaCy 2.x style: pass the component instance
>>> doc = nlp(text)
>>> print(doc.ents)
(Diabetes, Pneumonia, Atrial fibrillation, pneumonia)
>>> for (section_name, section_header, section) in doc._.sections:
        print(section_name, section_header, section, sep="\n")
        print("---"*5)
family_history
Family History:
Family History:
Diabetes
---------------
past_medical_history
Past Medical History:
Past Medical History:
Pneumonia
---------------
assessment_and_plan
Assessment and Plan:
Assessment and Plan:
Atrial fibrillation. There is no evidence of pneumonia.
---------------
>>> for ent in doc.ents:
        print(ent, ent._.section_name)
Diabetes family_history
Pneumonia past_medical_history
Atrial fibrillation assessment_and_plan
pneumonia assessment_and_plan
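A common next step is to group the extracted entities by the section they came from. The sketch below works on plain (entity text, section name) pairs copied from the output above, so it runs without spaCy; the helper name `group_entities_by_section` is hypothetical. With a processed doc, the pairs could presumably be built as `(ent.text, ent._.section_name) for ent in doc.ents`.

```python
from collections import defaultdict


def group_entities_by_section(pairs):
    """Group entity texts under their section name, preserving input order."""
    grouped = defaultdict(list)
    for ent_text, section_name in pairs:
        grouped[section_name].append(ent_text)
    return dict(grouped)


# Pairs taken from the example output above.
pairs = [
    ("Diabetes", "family_history"),
    ("Pneumonia", "past_medical_history"),
    ("Atrial fibrillation", "assessment_and_plan"),
    ("pneumonia", "assessment_and_plan"),
]
grouped = group_entities_by_section(pairs)
for section_name, ents in grouped.items():
    print(section_name, ents)
```

Grouping this way makes it easy to treat, for example, a mention under family_history differently from the same concept under assessment_and_plan.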
With cycontext installed, you can also use its visualizer to display section headers alongside any extracted entities and, optionally, cycontext modifiers in an NER-style visualization.
from cycontext.viz import visualize_ent
visualize_ent(doc, sections=True, context=False)