target_matcher
A wrapper class for extended rule-based matching in spaCy.
Overview
This package offers utilities for extended rule-based matching in spaCy pipelines. The main classes in this package
are TargetMatcher and TargetRule. Like other spaCy rule-based matching components, the TargetMatcher
matches spans of text in a spaCy Doc. It offers the following functionality:
- Patterns are defined using the TargetRule class. This class has the following attributes:
  - literal: An exact span of text defining the term. If pattern is None, this will be the phrase used to match in the doc.
  - category: The label which will be assigned to any matched spans
  - pattern (opt): An optional spaCy pattern. If this argument is provided, it will be used to match a span, and literal can be used as a normalized version of the phrase
  - attributes (opt): An optional dictionary of attributes to set for the Span._. For example, if {"is_negated": True} is provided, then the resulting span._.is_negated will evaluate to True
  - on_match (opt): Optional callback functions for the spaCy matchers
- The original rule which matched a Span will be added to span._.target_rule. This allows you to see which specific rule picked up a match, which is useful for debugging and data aggregation/analysis
- By default, matching spans will be added to doc.ents, but by setting add_ents to False, it will instead return tuples of (span, category)
- The ConceptTagger is a wrapper class around TargetMatcher which will assign token-level labels based on the category attribute for all matches
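To make the relationship between literal, pattern, and attributes concrete, here is a minimal pure-Python sketch of the idea. It is not the package's implementation and does not use spaCy: SketchRule and match are hypothetical names, and each pattern element is simplified to a set of allowed lowercase tokens rather than a real spaCy token pattern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class SketchRule:
    """Hypothetical stand-in for a rule; NOT the package's TargetRule."""
    literal: str
    category: str
    # Simplified pattern: one set of allowed lowercase tokens per position.
    # None means "match the literal itself", mirroring the description above.
    pattern: Optional[List[Set[str]]] = None
    attributes: Dict[str, Any] = field(default_factory=dict)

    def token_sets(self) -> List[Set[str]]:
        if self.pattern is not None:
            return self.pattern
        # Fall back to the literal, compared case-insensitively.
        return [{tok} for tok in self.literal.lower().split()]


def match(tokens: List[str], rules: List[SketchRule]) -> List[Tuple[int, int, SketchRule]]:
    """Return (start, end, rule) for every window of tokens a rule matches."""
    lowered = [t.lower() for t in tokens]
    found = []
    for rule in rules:
        sets = rule.token_sets()
        n = len(sets)
        for start in range(len(lowered) - n + 1):
            window = lowered[start:start + n]
            if all(tok in allowed for tok, allowed in zip(window, sets)):
                # Keep the rule with the match so it can be inspected later.
                found.append((start, start + n, rule))
    return sorted(found, key=lambda m: m[0])


rules = [
    SketchRule("Type II Diabetes Mellitus", "PROBLEM",
               attributes={"icd10_code": "E11.9"}),
    SketchRule("Type II Diabetes Mellitus", "PROBLEM",
               pattern=[{"type"}, {"two", "ii", "2"}, {"dm"}],
               attributes={"icd10_code": "E11.9"}),
]
tokens = "The patient has Type 2 DM , previously documented as type ii diabetes mellitus".split()
matches = match(tokens, rules)
for start, end, rule in matches:
    print(" ".join(tokens[start:end]), rule.literal, rule.category, rule.attributes, sep="\t")
```

Both matches carry the same literal and attributes even though they were found by different rules, which is the normalization behavior the attribute list describes.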
Basic Usage
Installation
You can install target_matcher using pip:
pip install target_matcher
Or clone this repository and install target_matcher using the setup.py script:
$ python setup.py install
Once you've installed the package and spaCy, make sure you have a spaCy language model installed (see https://spacy.io/usage/models):
$ python -m spacy download en_core_web_sm
Example
In the example below, we'll use target matcher to extract two different forms of "Type II Diabetes" and show how they can be mapped to the same normalized ("literal") term and ICD-10 code:
from target_matcher import TargetMatcher, TargetRule
import spacy
from spacy.tokens import Span
# Register a new custom attribute to store ICD-10 diagnosis codes
Span.set_extension("icd10_code", default="")
nlp = spacy.blank("en")
target_matcher = TargetMatcher(nlp)
nlp.add_pipe(target_matcher)
rules = [
    TargetRule(literal="Type II Diabetes Mellitus", category="PROBLEM",
               attributes={"icd10_code": "E11.9"}),
    TargetRule(literal="Type II Diabetes Mellitus", category="PROBLEM",
               pattern=[{"LOWER": "type"}, {"LOWER": {"IN": ["two", "ii", "2"]}}, {"LOWER": "dm"}],
               attributes={"icd10_code": "E11.9"}),
]
target_matcher.add(rules)
text = """
DIAGNOSIS: Type II Diabetes Mellitus
The patient presents today for management of Type 2 DM.
"""
doc = nlp(text)
# Even though different rules were used to match the ents,
# they have the same 'literal' value, and both are assigned "E11.9"
# as an icd10 code
for ent in doc.ents:
    print(ent, ent._.target_rule.literal, ent._.icd10_code, sep="\t")
>>> Type II Diabetes Mellitus Type II Diabetes Mellitus E11.9
Type 2 DM Type II Diabetes Mellitus E11.9
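Because every match keeps its normalized literal, results like the output above can be grouped by term. The following is a small sketch of that kind of aggregation using only the standard library; the rows list is hypothetical data shaped like the printed output, not something the package returns in this form.

```python
from collections import Counter, defaultdict

# Hypothetical (matched_text, literal, icd10_code) rows, copied from the
# printed output of the example above.
rows = [
    ("Type II Diabetes Mellitus", "Type II Diabetes Mellitus", "E11.9"),
    ("Type 2 DM", "Type II Diabetes Mellitus", "E11.9"),
]

# Count mentions per normalized term.
counts = Counter(literal for _, literal, _ in rows)

# Collect the distinct surface forms seen for each normalized term.
variants = defaultdict(set)
for text, literal, _ in rows:
    variants[literal].add(text)

for literal, n in counts.items():
    print(literal, n, sorted(variants[literal]), sep="\t")
```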