A spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results)

Project description

spaCy 3.0 support coming soon

extractacy - pattern extraction and named entity linking for spaCy

Installation and usage

Install the library.

pip install extractacy

Import spaCy and the library.

import spacy
# Imported so that the "valext" component added to the pipeline below is available
from extractacy.extract import ValueExtractor

Load a spaCy language model and set up an EntityRuler for the example.

nlp = spacy.load("en_core_web_sm")
# Set up entity ruler
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temperature"}]},
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temp"}]},
    {
        "label": "DISCHARGE_DATE",
        "pattern": [{"LOWER": "discharge"}, {"LOWER": "date"}],
    },
]
ruler.add_patterns(patterns)

Define which entities you would like to link patterns to. Each entity needs three things:

  1. patterns: the patterns to search for (list). These use spaCy's token matching syntax.
  2. n: the number of tokens to search around the named entity (an int, or "sent" for the sentence containing the entity)
  3. direction: which side of the entity to search ("right", "left", or "both")

# Define ent_patterns for value extraction
ent_patterns = {
    "DISCHARGE_DATE": {
        # Dates such as 11/15/2008 or 11/5/2008
        "patterns": [[{"SHAPE": "dd/dd/dddd"}], [{"SHAPE": "dd/d/dddd"}]],
        "n": 2,
        "direction": "right",
    },
    "TEMP_READING": {
        # A number followed by a temperature unit (common misspellings included)
        "patterns": [
            [
                {"LIKE_NUM": True},
                {"LOWER": {"IN": [
                    "f", "c", "fahrenheit", "farenheit",
                    "celsius", "celcius", "centigrade", "degrees",
                ]}},
            ]
        ],
        "n": "sent",
        "direction": "both",
    },
}
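
To make the roles of "n" and "direction" more concrete, here is a minimal pure-Python sketch of how a search window around an entity could be computed. It is an illustration only, not extractacy's implementation: the function name search_window, its parameters, and the choices to exclude the entity's own tokens and to leave an integer window unclamped by sentence boundaries are all assumptions made for this example.

```python
from typing import List, Tuple, Union


def search_window(
    tokens: List[str],
    ent_start: int,
    ent_end: int,
    n: Union[int, str],
    direction: str,
    sent_bounds: Tuple[int, int],
) -> List[str]:
    """Return the tokens that would be searched around an entity.

    Hypothetical helper: ent_start/ent_end are token offsets of the entity
    (end exclusive) and sent_bounds is the (start, end) token range of the
    sentence that contains it.
    """
    if n == "sent":
        # Search the whole sentence that contains the entity.
        left_edge, right_edge = sent_bounds
    else:
        # Search a fixed number of tokens on each side (assumption: the
        # window is clipped to the document, not to the sentence).
        left_edge = max(ent_start - n, 0)
        right_edge = min(ent_end + n, len(tokens))

    left = tokens[left_edge:ent_start]
    right = tokens[ent_end:right_edge]

    if direction == "left":
        return left
    if direction == "right":
        return right
    if direction == "both":
        return left + right
    raise ValueError(f"unknown direction: {direction!r}")


# "Discharge Date" covers tokens 0-1; look two tokens to the right.
date_tokens = ["Discharge", "Date", ":", "11/15/2008", "."]
date_window = search_window(date_tokens, 0, 2, 2, "right", (0, 5))
print(date_window)  # [':', '11/15/2008']

# "temp" is token 2; search the whole sentence on both sides.
temp_tokens = ["Patient", "had", "temp", "reading", "of", "102.6", "degrees", "."]
temp_window = search_window(temp_tokens, 2, 3, "sent", "both", (0, 8))
print(temp_window)
```

In the first case the colon counts as a token, which is why a window of 2 is needed to reach the date.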

Add ValueExtractor to the spaCy processing pipeline and run it on some text.

nlp.add_pipe("valext", config={"ent_patterns":ent_patterns}, last=True)

doc = nlp("Discharge Date: 11/15/2008. Patient had temp reading of 102.6 degrees.")
for e in doc.ents:
    if e._.value_extract:
        print(e.text, e.label_, e._.value_extract)

# Expected output (the entity text is the span matched by the EntityRuler pattern):
# Discharge Date DISCHARGE_DATE 11/15/2008
# temp TEMP_READING 102.6 degrees
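
If a date is not picked up, it can help to check which shape string the token would have, since the DISCHARGE_DATE patterns above only cover "dd/dd/dddd" and "dd/d/dddd". The sketch below approximates spaCy's word-shape rule in plain Python (digits become "d", lowercase letters "x", uppercase letters "X", other characters are kept, and runs of the same shape character are capped at four). The function name approx_shape is hypothetical; in a real pipeline the token's shape_ attribute is the authoritative value.

```python
def approx_shape(text: str) -> str:
    """Approximate the shape string of a single token (illustrative helper)."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            shape_char = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            shape_char = "d"
        else:
            shape_char = ch
        # Count how long the current run of identical shape characters is.
        run = run + 1 if shape_char == last else 1
        last = shape_char
        # Keep at most four repeated shape characters in a row.
        if run <= 4:
            shape.append(shape_char)
    return "".join(shape)


covered = {"dd/dd/dddd", "dd/d/dddd"}  # shapes used in ent_patterns above

for date in ["11/15/2008", "11/5/2008", "1/15/2008", "1/5/2008"]:
    shape = approx_shape(date)
    print(date, shape, "covered" if shape in covered else "not covered")
```

Under this approximation, dates with a single-digit month such as 1/15/2008 or 1/5/2008 have shapes "d/dd/dddd" and "d/d/dddd", so extra patterns would be needed to extract them.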

Contributing

contributing

Authors

  • Jeno Pizarro

License

license
