medspacy

Library for clinical NLP with spaCy.

medspaCy is currently in beta.

Overview

medspaCy is a library of tools for performing clinical NLP and text processing tasks with the popular spaCy framework. The medspacy package brings together a number of other packages, each of which implements functionality specific to clinical text processing, such as sentence segmentation, contextual analysis and attribute assertion, and section detection.

medspacy is modularized so that each component can be used independently. All of medspacy is designed to be used as part of a spaCy processing pipeline. Each of the following modules is available as part of medspacy:

  • medspacy.preprocess: Destructive preprocessing for modifying clinical text before processing
  • medspacy.sentence_splitter: Clinical sentence segmentation
  • medspacy.ner: Utilities for extracting concepts from clinical text
  • medspacy.context: Implementation of the ConText algorithm for detecting semantic modifiers and attributes of entities, including negation and uncertainty
  • medspacy.section_detection: Clinical section detection and segmentation
  • medspacy.postprocess: Flexible framework for modifying and removing extracted entities
  • medspacy.io: Utilities for converting processed texts to structured data and interacting with databases
  • medspacy.visualization: Utilities for visualizing concepts and relationships extracted from text
  • SpacyQuickUMLS: UMLS concept extraction compatible with spaCy and medspacy, implemented by our fork of QuickUMLS. More detail on this component, including how to use it and how to generate UMLS resources beyond the small UMLS sample, can be found in this notebook.

Future work could include I/O, relation extraction, and pre-trained clinical models.

Latest release 1.3.0 (11/21/2024)

What's new in 1.3.0:

    • Optimized database I/O to write concepts into SQLite in batches. This has been tested on MariaDB (not yet integrated into the main branch).
    • Reconfigured requirements.txt and updated medspacy dependencies to support later versions of spaCy, up to 3.8.2.
    • Due to deprecated and incompatible configurations, we have stopped supporting Python 3.6 and 3.7.
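
The batched-write idea mentioned above can be illustrated with the standard library alone. The sketch below is not medspacy's own I/O code: the `concepts` table, its columns, and the helper names `batched` and `write_concepts` are hypothetical, chosen only to show rows being written to SQLite in fixed-size batches rather than one at a time.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row layout: (doc_id, entity text, label).
Row = Tuple[int, str, str]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_concepts(conn: sqlite3.Connection, rows: Iterable[Row],
                   batch_size: int = 500) -> int:
    """Insert rows batch by batch, committing once per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concepts "
        "(doc_id INTEGER, text TEXT, label TEXT)"
    )
    written = 0
    for batch in batched(rows, batch_size):
        # One executemany call per batch instead of one INSERT per row.
        conn.executemany("INSERT INTO concepts VALUES (?, ?, ?)", batch)
        conn.commit()
        written += len(batch)
    return written


conn = sqlite3.connect(":memory:")
rows = [(i, "pneumonia", "PROBLEM") for i in range(1200)]
total = write_concepts(conn, rows, batch_size=500)
stored = conn.execute("SELECT COUNT(*) FROM concepts").fetchone()[0]
print(total, stored)
```

Committing once per batch keeps transaction overhead low while bounding how much work is lost if a write fails partway through.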

As of 10/2/2021 (version 0.2.0.0), medspaCy supports spaCy v3.

Language support

As of May 2024, medspacy has been restructured to allow distributing rules and resources in languages besides English.

Please note that some languages are effectively "empty" from the perspective of medspacy. We've had requests to support some of these languages, but we have no rules for them since the medspacy team works primarily in English. The table below summarizes what is available in each language, since coverage varies. As of this writing, the languages are sorted by how many rules they currently have and how mature those rules are. If you have rules or would like to develop rules in another language, please consider contributing. We can help get your work integrated.

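Language        | ConText rules | Section rules | QuickUMLS sample (and unit test)
----------------|---------------|---------------|---------------------------------
English (en)    | Yes           | Yes           | Yes
French (fr)     | Yes           | Very few      | Yes
Dutch (nl)      | Yes           | No            | No
Spanish (es)    | No            | Very few      | Yes
Polish (pl)     | No            | No            | No
Portuguese (pt) | No            | No            | Yes
Italian (it)    | No            | No            | Yes
German (de)     | No            | No            | No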

Regarding the rules that are available in these languages, a few citations to mention:

Usage

Installation

You can install medspacy using setup.py:

python setup.py install

Or with pip:

pip install medspacy

To install a previous version which uses spaCy 2:

pip install medspacy==0.1.0.2
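
Since the right medspacy release depends on the spaCy major version, it can help to check what is already installed first. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `major_version` are illustrative and not part of medspacy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Extract the leading integer from a version such as '3.8.2'."""
    return int(version_string.split(".")[0])


# Report what is currently installed in this environment.
for name in ("medspacy", "spacy"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")

# An environment pinned to spaCy 2 needs the older medspacy release.
spacy_version = installed_version("spacy")
if spacy_version is not None and major_version(spacy_version) < 3:
    print("spaCy 2 detected: use medspacy==0.1.0.2")
```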

Requirements

The following packages are required and installed when medspacy is installed:

If you download other models, you can use them by providing the model itself or model name to medspacy.load(model_name):

import spacy
import medspacy

# Option 1: Load default
nlp = medspacy.load()

# Option 2: Load from existing model
nlp = spacy.load("en_core_web_sm", disable={"ner"})
nlp = medspacy.load(nlp)

# Option 3: Load from model name
nlp = medspacy.load("en_core_web_sm", disable={"ner"})

Basic Usage

Here is a simple example showing how to implement and visualize a simple rule-based pipeline using medspacy:

import medspacy
from medspacy.ner import TargetRule
from medspacy.visualization import visualize_ent

# Load medspacy model
nlp = medspacy.load()
print(nlp.pipe_names)

text = """
Past Medical History:
1. Atrial fibrillation
2. Type II Diabetes Mellitus

Assessment and Plan:
There is no evidence of pneumonia. Continue warfarin for Afib. Follow up for management of type 2 DM.
"""

# Add rules for target concept extraction
target_matcher = nlp.get_pipe("medspacy_target_matcher")
target_rules = [
    TargetRule("atrial fibrillation", "PROBLEM"),
    TargetRule("atrial fibrillation", "PROBLEM", pattern=[{"LOWER": "afib"}]),
    TargetRule("pneumonia", "PROBLEM"),
    TargetRule("Type II Diabetes Mellitus", "PROBLEM", 
              pattern=[
                  {"LOWER": "type"},
                  {"LOWER": {"IN": ["2", "ii", "two"]}},
                  {"LOWER": {"IN": ["dm", "diabetes"]}},
                  {"LOWER": "mellitus", "OP": "?"}
              ]),
    TargetRule("warfarin", "MEDICATION")
]
target_matcher.add(target_rules)

doc = nlp(text)
visualize_ent(doc)

For more detailed examples and explanations of each component, see the notebooks folder.

Citing medspaCy

If you use medspaCy in your work, consider citing our paper! It was presented at the AMIA Annual Symposium 2021, and a preprint is available on arXiv.

H. Eyre, A.B. Chapman, K.S. Peterson, J. Shi, P.R. Alba, M.M. Jones, T.L. Box, S.L. DuVall, O.V. Patterson,
Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python,
AMIA Annu. Symp. Proc. 2021 (2021) 438-447.
http://arxiv.org/abs/2106.07799

@Article{medspacy,
   Author="Eyre, H.  and Chapman, A. B.  and Peterson, K. S.  and Shi, J.  and Alba, P. R.  and Jones, M. M.  and Box, T. L.  and DuVall, S. L.  and Patterson, O. V. ",
   Title="{{L}aunching into clinical space with medspa{C}y: a new clinical text processing toolkit in {P}ython}",
   Journal="AMIA Annu Symp Proc",
   Year="2021",
   Volume="2021",
   Pages="438--447"
}

Made with medspaCy

Here are some links to projects or tutorials which use medspaCy. If you have a project that uses medspaCy and would like it listed here, let us know!
