target_matcher

A wrapper class for extended rule-based matching in spaCy.

Overview

This package offers utilities for extended rule-based matching in spaCy pipelines. Its main classes are TargetMatcher and TargetRule. Like other spaCy rule-based matching components, TargetMatcher matches spans of text in a spaCy Doc. It offers the following functionality:

  • Patterns are defined using the TargetRule class. This class has the following attributes:
    • literal: An exact span of text defining the term. If pattern is None, this is the phrase matched in the doc.
    • category: The label assigned to any matched spans.
    • pattern (opt): An optional spaCy pattern. If provided, it is used to match a span, and literal serves as a normalized version of the phrase.
    • attributes (opt): An optional dictionary of custom attributes to set on Span._. For example, if {"is_negated": True} is provided, span._.is_negated will evaluate to True on the resulting span.
    • on_match (opt): An optional callback function for the spaCy matchers.
  • The rule that matched a Span is stored in span._.target_rule. This lets you see which specific rule produced a match, which is useful for debugging and data aggregation/analysis.
  • By default, matching spans are added to doc.ents; setting add_ents to False makes the matcher return tuples of (span, category) instead.
  • ConceptTagger is a wrapper class around TargetMatcher that assigns token-level labels based on the category attribute of each match.
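To make the relationship between literal, pattern, category, and attributes concrete, here is a minimal pure-Python sketch. It is not the package's implementation and does not use spaCy: RuleSketch and match_rules are hypothetical names, and pattern is simplified to a list of lowercase tokens rather than a real spaCy pattern. It only illustrates the fallback from pattern to literal and how two rules can share one normalized term.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class RuleSketch:
    """Hypothetical stand-in mirroring the TargetRule attributes listed above."""
    literal: str
    category: str
    # Simplification (assumption): a list of lowercase tokens, not a spaCy pattern.
    pattern: Optional[List[str]] = None
    attributes: Dict[str, Any] = field(default_factory=dict)

    def tokens(self) -> List[str]:
        # Use the explicit pattern if given; otherwise match the literal itself.
        if self.pattern is not None:
            return self.pattern
        return self.literal.lower().split()


def match_rules(text: str, rules: List[RuleSketch]) -> List[Tuple[str, RuleSketch]]:
    """Return (matched text, rule) pairs in the order they occur in the text."""
    words = text.split()
    lowered = [w.lower() for w in words]
    matches = []
    for start in range(len(words)):
        for rule in rules:
            toks = rule.tokens()
            if lowered[start:start + len(toks)] == toks:
                matches.append((" ".join(words[start:start + len(toks)]), rule))
    return matches


rules = [
    RuleSketch("Type II Diabetes Mellitus", "PROBLEM",
               attributes={"icd10_code": "E11.9"}),
    RuleSketch("Type II Diabetes Mellitus", "PROBLEM",
               pattern=["type", "2", "dm"],
               attributes={"icd10_code": "E11.9"}),
]

matches = match_rules(
    "Patient has Type 2 DM and a history of type II diabetes mellitus", rules
)
for span_text, rule in matches:
    # Keeping the rule next to the match plays the role of span._.target_rule.
    print(span_text, rule.literal, rule.category, rule.attributes["icd10_code"], sep="\t")
```

Both surface forms resolve to the same literal and the same attribute values, which is the normalization behavior the bullets describe. The real TargetMatcher works on spaCy tokens and spans, so punctuation and tokenization are handled differently than in this whitespace-split sketch.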

Basic Usage

Installation

You can install target_matcher using pip:

$ pip install target_matcher

Or clone this repository and install target_matcher using the setup.py script:

$ python setup.py install

Once you've installed the package and spaCy, make sure you have a spaCy language model installed (see https://spacy.io/usage/models):

$ python -m spacy download en_core_web_sm

Example

In the example below, we'll use TargetMatcher to extract two different forms of "Type II Diabetes" and show how both can be mapped to the same normalized ("literal") term and ICD-10 code:

from target_matcher import TargetMatcher, TargetRule
import spacy
from spacy.tokens import Span

# Register a new custom attribute to store ICD-10 diagnosis codes
Span.set_extension("icd10_code", default="")

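# Create a blank English pipeline and add the matcher as a component.
# Note: passing the component object directly is the spaCy v2-style call.
nlp = spacy.blank("en")
target_matcher = TargetMatcher(nlp)
nlp.add_pipe(target_matcher)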

rules = [
    TargetRule(literal="Type II Diabetes Mellitus", category="PROBLEM",
               attributes={"icd10_code": "E11.9"}),
    TargetRule(literal="Type II Diabetes Mellitus", category="PROBLEM",
               pattern=[{"LOWER": "type"}, {"LOWER": {"IN": ["two", "ii", "2"]}}, {"LOWER": "dm"}],
               attributes={"icd10_code": "E11.9"}),
]
target_matcher.add(rules)

text = """
DIAGNOSIS: Type II Diabetes Mellitus
The patient presents today for management of Type 2 DM.
"""

doc = nlp(text)

# Even though different rules matched the two ents, they share the same
# 'literal' value, and both are assigned "E11.9" as the ICD-10 code.
for ent in doc.ents:
    print(ent, ent._.target_rule.literal, ent._.icd10_code, sep="\t")

>>> Type II Diabetes Mellitus	Type II Diabetes Mellitus	E11.9
    Type 2 DM	Type II Diabetes Mellitus	E11.9
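Because every match carries its normalized literal, different surface forms can be counted as one concept. The sketch below illustrates that aggregation step with the standard library only. The rows list is a stand-in for values you would collect from doc.ents (the text, ent._.target_rule.literal, and ent._.icd10_code), copied from the output printed above; the variable names are illustrative and not part of the package.

```python
from collections import Counter

# Stand-in for rows collected from doc.ents as
# (ent.text, ent._.target_rule.literal, ent._.icd10_code),
# copied from the example output above.
rows = [
    ("Type II Diabetes Mellitus", "Type II Diabetes Mellitus", "E11.9"),
    ("Type 2 DM", "Type II Diabetes Mellitus", "E11.9"),
]

# Count mentions per normalized term and code, ignoring the surface form.
counts = Counter((literal, code) for _, literal, code in rows)

for (literal, code), n in counts.items():
    print(literal, code, n, sep="\t")
```

Grouping on the literal rather than on the matched text is what lets "Type 2 DM" and "Type II Diabetes Mellitus" land in the same bucket.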
