extractacy - value extraction and linking for spaCy
A spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results).
Installation and usage
Install the library.
pip install extractacy
Import library and spaCy.
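import spacy
from spacy.pipeline import EntityRuler
from extractacy.extract import ValueExtractor
Load a spaCy language model. Set up an EntityRuler for the example. Define the entities and value extraction patterns and add them to the nlp pipeline. (The example uses the spaCy 2.x API, in which add_pipe accepts a component object.)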
nlp = spacy.load("en_core_web_sm")
# Set up entity ruler
ruler = EntityRuler(nlp)
patterns = [
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temperature"}]},
    {"label": "TEMP_READING", "pattern": [{"LOWER": "temp"}]},
    {
        "label": "DISCHARGE_DATE",
        "pattern": [{"LOWER": "discharge"}, {"LOWER": "date"}],
    },
]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler, last=True)
# Define ent_patterns for value extraction
ent_patterns = {
    "DISCHARGE_DATE": {
        # Two alternative single-token date shapes
        "patterns": [[{"SHAPE": "dd/dd/dddd"}], [{"SHAPE": "dd/d/dddd"}]],
        "n": 2,
        "direction": "right",
    },
    "TEMP_READING": {
        "patterns": [
            [
                {"LIKE_NUM": True},
                {"LOWER": {"IN": ["f", "c", "fahrenheit", "celsius", "centigrade", "degrees"]}},
            ]
        ],
        "n": "sent",
        "direction": "both",
    },
}
valext = ValueExtractor(nlp, ent_patterns)
nlp.add_pipe(valext, last=True)
doc = nlp("Discharge Date: 11/15/2008. Patient had temp reading of 102.6 degrees.")
for e in doc.ents:
    if e._.value_extract:
        print(e.text, e.label_, e._.value_extract)
## Discharge Date DISCHARGE_DATE 11/15/2008
## temp reading TEMP_READING 102.6 degrees
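The ent_patterns dictionary above nests several levels deep, so a small shape check before building the pipeline can catch mistakes early. The following is a minimal sketch, not part of extractacy: the helper name check_ent_patterns is hypothetical, it assumes all three keys ("patterns", "n", "direction") are required, and it assumes "left" is accepted alongside the "right" and "both" values shown in the example.

```python
from typing import Any, Dict

# "right" and "both" appear in the example above; "left" is an assumption.
ALLOWED_DIRECTIONS = {"left", "right", "both"}


def check_ent_patterns(ent_patterns: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical helper: raise ValueError if an entry has an unexpected shape."""
    for label, spec in ent_patterns.items():
        patterns = spec.get("patterns")
        # "patterns" is a list of patterns; each pattern is a list of token dicts.
        well_formed = isinstance(patterns, list) and all(
            isinstance(p, list) and all(isinstance(tok, dict) for tok in p)
            for p in patterns
        )
        if not well_formed:
            raise ValueError(f"{label}: 'patterns' must be a list of token-pattern lists")

        n = spec.get("n")
        # "n" is either the string "sent" or a positive token count.
        is_count = isinstance(n, int) and not isinstance(n, bool) and n > 0
        if not (n == "sent" or is_count):
            raise ValueError(f"{label}: 'n' must be a positive integer or 'sent'")

        if spec.get("direction") not in ALLOWED_DIRECTIONS:
            raise ValueError(f"{label}: 'direction' must be one of {sorted(ALLOWED_DIRECTIONS)}")


# Passes silently for a well-formed entry.
check_ent_patterns(
    {"TEMP_READING": {"patterns": [[{"LIKE_NUM": True}]], "n": "sent", "direction": "both"}}
)
```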
Value Extraction patterns
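Returns every match of the given patterns found within n tokens of the entity of interest, or within the same sentence as the entity. Patterns are written in spaCy's token matching syntax. Each entry in "patterns" is itself a list of token descriptions:
{"ENTITY_NAME": {"patterns": [[{"LOWER": "awesome"}, {"LOWER": "pattern"}]], "n": 5, "direction": "right"}}
Use "n": "sent" to search within the entity's sentence rather than within n tokens.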
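To make the roles of "n" and "direction" concrete, the sketch below computes which token ranges would be searched around an entity. It is an illustration only, not extractacy's implementation: the function search_windows and its parameters are hypothetical, the "left" direction is assumed by symmetry with "right", and clipping the n-token window at the document boundary (rather than at the sentence boundary) is also an assumption.

```python
from typing import List, Tuple, Union


def search_windows(
    ent_start: int,
    ent_end: int,
    n: Union[int, str],
    direction: str,
    sent_bounds: Tuple[int, int],
    doc_len: int,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return half-open (start, end) token ranges to search.

    ent_start/ent_end are the entity's token offsets (end exclusive),
    sent_bounds are the token offsets of the sentence containing the entity.
    """
    if n == "sent":
        # Search the rest of the sentence on either side of the entity.
        sent_start, sent_end = sent_bounds
        left = (sent_start, ent_start)
        right = (ent_end, sent_end)
    else:
        # Search at most n tokens on either side, clipped to the document.
        left = (max(ent_start - n, 0), ent_start)
        right = (ent_end, min(ent_end + n, doc_len))

    if direction == "left":
        return [left]
    if direction == "right":
        return [right]
    return [left, right]  # "both"


# Hand-tokenized stand-in for the first sentence of the example document.
tokens = ["Discharge", "Date", ":", "11/15/2008", "."]
# Entity "Discharge Date" covers tokens 0..2; look 2 tokens to the right.
windows = search_windows(0, 2, 2, "right", (0, 5), len(tokens))
print([tokens[start:end] for start, end in windows])
```

With "n": 2 and "direction": "right", the window covers only the colon and the date token, which is why a small n suits values that immediately follow their label, while "sent" suits values that may appear anywhere in the sentence.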
Contributing
Authors
- Jeno Pizarro
License