extractacy - value extraction and linking for spaCy

A spaCy pipeline component for extracting values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results).

Installation and usage

Install the library.

pip install extractacy

Import library and spaCy.

import spacy
from spacy.pipeline import EntityRuler
from extractacy.extract import ValueExtractor

Load a spaCy language model. Set up an EntityRuler for the example. Define the entities and value extraction patterns, then add them to the nlp pipeline.

nlp = spacy.load("en_core_web_sm")
# Set up entity ruler
ruler = EntityRuler(nlp)
patterns = [
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temperature"}]},
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temp"}]},
    {
        "label": "DISCHARGE_DATE",
        "pattern": [{"LOWER": "discharge"}, {"LOWER": "date"}],
    },
]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler, last=True)

# Define ent_patterns for value extraction
ent_patterns = {
    # Two alternative single-token patterns: dd/dd/dddd or dd/d/dddd
    "DISCHARGE_DATE": {
        "patterns": [[{"SHAPE": "dd/dd/dddd"}], [{"SHAPE": "dd/d/dddd"}]],
        "n": 2,
        "direction": "right",
    },
    # A number followed by a unit, anywhere in the entity's sentence
    "TEMP_READING": {
        "patterns": [
            [
                {"LIKE_NUM": True},
                {"LOWER": {"IN": ["f", "c", "fahrenheit", "celsius", "centigrade", "degrees"]}},
            ]
        ],
        "n": "sent",
        "direction": "both",
    },
}

valext = ValueExtractor(nlp, ent_patterns)
nlp.add_pipe(valext, last=True)

doc = nlp("Discharge Date: 11/15/2008. Patient had temp reading of 102.6 degrees.")
for e in doc.ents:
    if e._.value_extract:
        print(e.text, e.label_, e._.value_extract)
## Discharge Date DISCHARGE_DATE 11/15/2008
## temp TEMP_READING 102.6 degrees
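
The DISCHARGE_DATE patterns above match on spaCy's SHAPE token attribute, which maps digits to "d", lowercase letters to "x", uppercase letters to "X", and keeps other characters as they are, truncating runs of the same symbol after four. The sketch below is a plain-Python approximation of that mapping, written only to show why a date such as 11/15/2008 has the shape dd/dd/dddd. The function name `approx_shape` is hypothetical and is not part of spaCy or extractacy.

```python
def approx_shape(text: str) -> str:
    """Approximate the SHAPE attribute spaCy assigns to a token's text.

    Assumption: this mirrors spaCy's documented behaviour (d/x/X mapping,
    runs of one symbol capped at four); it is not spaCy's own code.
    """
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            sym = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch  # punctuation and symbols are kept verbatim
        if sym == last:
            run += 1
        else:
            run = 1
            last = sym
        if run <= 4:  # drop anything beyond four repeats of the same symbol
            shape.append(sym)
    return "".join(shape)


print(approx_shape("11/15/2008"))  # dd/dd/dddd
print(approx_shape("11/5/2008"))   # dd/d/dddd
print(approx_shape("102.6"))       # ddd.d
```

A date with a one-digit month, such as 1/15/2008, would have the shape d/dd/dddd and would need its own pattern.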

Value Extraction patterns

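The extractor returns every pattern match found within n tokens of the entity of interest, or within the same sentence as the entity. Patterns are written in spaCy's token-based matching syntax, and "patterns" is a list of such patterns.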

{"ENTITY_NAME": {"patterns": [[{"LOWER": "awesome"}, {"LOWER": "pattern"}]], "n": 5, "direction": "right"}}
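
Since each entry under "patterns" is itself a list of token dictionaries (as in the usage example above), a flat list of dictionaries is an easy mistake to make. The following is a minimal sketch of a sanity check that could be run on a configuration before handing it to ValueExtractor. The helper `validate_ent_patterns` is hypothetical and not part of extractacy, and the accepted "direction" values are an assumption: "right" and "both" appear in this README, "left" is assumed by symmetry.

```python
def validate_ent_patterns(ent_patterns: dict) -> list:
    """Return a list of problems found; an empty list means the config looks well-formed."""
    allowed_directions = {"left", "right", "both"}  # "left" is an assumption
    problems = []
    for label, spec in ent_patterns.items():
        patterns = spec.get("patterns")
        if not isinstance(patterns, list) or not patterns:
            problems.append(f"{label}: 'patterns' must be a non-empty list")
        else:
            for i, pattern in enumerate(patterns):
                # Each pattern is a list of token dicts, e.g. [{"LOWER": "awesome"}]
                if not isinstance(pattern, list) or not all(
                    isinstance(tok, dict) for tok in pattern
                ):
                    problems.append(f"{label}: patterns[{i}] must be a list of token dicts")
        n = spec.get("n")
        is_positive_int = isinstance(n, int) and not isinstance(n, bool) and n > 0
        if n != "sent" and not is_positive_int:
            problems.append(f"{label}: 'n' must be a positive integer or 'sent'")
        if spec.get("direction") not in allowed_directions:
            problems.append(f"{label}: 'direction' must be one of {sorted(allowed_directions)}")
    return problems


good = {
    "ENTITY_NAME": {
        "patterns": [[{"LOWER": "awesome"}, {"LOWER": "pattern"}]],
        "n": 5,
        "direction": "right",
    }
}
flat = {
    "ENTITY_NAME": {
        "patterns": [{"LOWER": "awesome"}, {"LOWER": "pattern"}],
        "n": 5,
        "direction": "right",
    }
}
print(validate_ent_patterns(good))       # []
print(len(validate_ent_patterns(flat)))  # 2
```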

Set "n": "sent" to search the entity's whole sentence instead of a fixed window of n tokens.
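
To make the roles of "n" and "direction" concrete, here is a rough sketch of how a search window around an entity might be computed from token offsets. It is an illustration under stated assumptions, not extractacy's actual implementation: the function `search_window`, its parameters, and the choice to let "both" span the entity itself are all hypothetical.

```python
from typing import Tuple, Union


def search_window(
    ent: Tuple[int, int],
    sent: Tuple[int, int],
    doc_len: int,
    n: Union[int, str],
    direction: str,
) -> Tuple[int, int]:
    """Return the (start, end) token span to search, end exclusive.

    ent and sent are (start, end) token offsets of the entity and of the
    sentence that contains it.
    """
    if direction not in ("left", "right", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    ent_start, ent_end = ent
    if n == "sent":
        # Sentence mode: the window is bounded by the sentence
        low, high = sent
    else:
        # Token mode: n tokens either side, clamped to the document
        low = max(ent_start - n, 0)
        high = min(ent_end + n, doc_len)
    if direction == "left":
        return (low, ent_start)
    if direction == "right":
        return (ent_end, high)
    return (low, high)


tokens = ["Discharge", "Date", ":", "11/15/2008", ".", "Patient", "had",
          "temp", "reading", "of", "102.6", "degrees", "."]

# "Discharge Date" is tokens 0-1; look 2 tokens to the right
start, end = search_window((0, 2), (0, 5), len(tokens), 2, "right")
print(tokens[start:end])  # [':', '11/15/2008']

# "temp" is token 7; search its whole sentence in both directions
start, end = search_window((7, 8), (5, 13), len(tokens), "sent", "both")
print(tokens[start:end])
```

With n set to 1 in the first call, the window would contain only the colon and the date would be missed, which is why the usage example sets n to 2.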

Contributing

contributing

Authors

  • Jeno Pizarro

License

license
