A spaCy custom component to extract structural information from text using the SpanRuler and regex patterns.
Span Extructure
You might think the name is misspelled, but it ain't. It is a play on words combining spaCy's Span, extract and structure. span_extructure is a spaCy component that builds on SpanRuler and regular expressions to extract structured information such as dates or amounts with currencies and multipliers.
Installation
pip install span_extructure
Usage
import spacy
nlp = spacy.blank("en")
# Optionally add config if varying from default values
config = {
    "overwrite": False,  # default: False
    "rules": [
        {
            "patterns": [[{"SHAPE": "dd.dd.dddd"}]],
            "extruct": r"(?P<day>[0-3]\d).(?P<month>0[1-9]|1[0-2]).(?P<year>20[0-5]\d|19\d\d)",
            "label": "DATE",
        }
    ]
}
nlp.add_pipe("span_extructure", config=config)
doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.extructure`")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e._.extructure}")
# Output: 21.04.1986	DATE	{'day': '21', 'month': '04', 'year': '1986'}
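The keys of the extracted dictionary correspond to the named groups in the "extruct" expression. The sketch below reproduces that mapping with Python's standard `re` module only, so the expression can be tried out before it goes into a rule. It assumes the component behaves like a plain regex match followed by `groupdict()`; how span_extructure applies the expression internally is not confirmed here. The helper `named_groups` and the amount expression are hypothetical and not part of the package.

```python
import re

# Same expression as the "extruct" value in the usage example above.
DATE_EXTRUCT = r"(?P<day>[0-3]\d).(?P<month>0[1-9]|1[0-2]).(?P<year>20[0-5]\d|19\d\d)"

# Hypothetical expression for amounts such as "EUR 2.5 million";
# an illustration only, not taken from the package.
AMOUNT_EXTRUCT = (
    r"(?P<currency>[A-Z]{3})\s"
    r"(?P<value>\d+(?:\.\d+)?)\s"
    r"(?P<multiplier>thousand|million|billion)"
)


def named_groups(expression: str, text: str) -> dict:
    """Return the named capture groups of the first match, or {} if none."""
    match = re.search(expression, text)
    return match.groupdict() if match else {}


date_fields = named_groups(DATE_EXTRUCT, "21.04.1986")
amount_fields = named_groups(AMOUNT_EXTRUCT, "EUR 2.5 million")

print(date_fields)
print(amount_fields)
```

Note that the dots in the date expression are unescaped, so on their own they match any separator character. The `SHAPE` token pattern already restricts the span to period-separated digits; writing `\.` instead would make the expression strict by itself.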
File details
Details for the file span-extructure-0.1.1.tar.gz.
File metadata
- Download URL: span-extructure-0.1.1.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.7 Linux/5.15.0-1020-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b730a58fe0b4936c22f7d8f8b8cdd50aa66a4cd41283a561302b45e6828c30fc |
| MD5 | 36b8c5076d33bdfc88251aebb4f2648c |
| BLAKE2b-256 | 542b2b8e092b7f028ff8b3d1c26ddcaa86f3c9469b58736e0f1099a297f21ec5 |
File details
Details for the file span_extructure-0.1.1-py3-none-any.whl.
File metadata
- Download URL: span_extructure-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.7 Linux/5.15.0-1020-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74ae557d2a39b76ab3710a4b39765a4237a5874fee23e804a61352e5f6e558fd |
| MD5 | cbdb63185683aa8b664cdb35a9c46e85 |
| BLAKE2b-256 | 06cba3303408cc31fb49eefc4ae0b98c8373404478e0e7d0ed26a99db7f5ed00 |