
A spaCy custom component to extract structural information from text using the SpanRuler and regex patterns.


Span Extructure


You might think the name is misspelled, but it ain't. It is a wordplay on spaCy's Span, extract and structure. span_extructure is a spaCy component that builds upon SpanRuler and regex to extract structured information, e.g. dates or amounts with currencies and multipliers.
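To make "structured information" concrete, here is a minimal sketch that uses only Python's standard `re` module to pull an amount with currency and multiplier out of a string via named groups. The `AMOUNT` pattern and the `extract_amount` helper are hypothetical illustrations and are not part of span_extructure.

```python
import re

# Hypothetical pattern (not shipped with span_extructure):
# a currency symbol, a number, and an optional multiplier suffix.
AMOUNT = re.compile(
    r"(?P<currency>[$€£])\s?(?P<value>\d+(?:\.\d+)?)\s?(?P<multiplier>[kKmMbB])?"
)


def extract_amount(text):
    """Return the named groups of the first amount in `text`, or None."""
    match = AMOUNT.search(text)
    return match.groupdict() if match else None


print(extract_amount("Revenue was $3.5M last year"))
# {'currency': '$', 'value': '3.5', 'multiplier': 'M'}
```

Groups that do not take part in a match, such as a missing multiplier, come back as `None` in the dictionary.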

Installation

pip install span_extructure

Usage

import spacy

nlp = spacy.blank("en")

# Optionally add config if varying from default values
config = {
    "overwrite": False,       # default: False
    "rules": [
        {
            "patterns": [[{"SHAPE": "dd.dd.dddd"}]],
            # Dots are escaped so they match a literal "." rather than any character
            "extruct": r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)",
            "label": "DATE",
        }
    ]
}
nlp.add_pipe("span_extructure", config=config)

doc = nlp("This date 21.04.1986 will be a DATE entity while the structured information will be extracted to `Span._.extructure`")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e._.extructure}")
# Output:
# 21.04.1986      DATE    {'day': '21', 'month': '04', 'year': '1986'}

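The dictionary in the output above has the same shape as what Python's `re` module returns from `groupdict()` for a regex with named groups. The sketch below reproduces that step with the standard library only. It assumes, without confirmation from the package source, that the component applies the `extruct` regex to the matched span text in a similar way. The `extruct` and `to_date` helpers are hypothetical names used for illustration.

```python
import re
from datetime import date

# Same date regex as in the usage example, with literal dots escaped.
DATE = re.compile(
    r"(?P<day>[0-3]\d)\.(?P<month>0[1-9]|1[0-2])\.(?P<year>20[0-5]\d|19\d\d)"
)


def extruct(text):
    """Return the named groups if the whole text matches, else None."""
    match = DATE.fullmatch(text)
    return match.groupdict() if match else None


def to_date(fields):
    """Convert extracted fields to a datetime.date, or None if invalid."""
    if fields is None:
        return None
    try:
        return date(int(fields["year"]), int(fields["month"]), int(fields["day"]))
    except ValueError:
        # The regex accepts impossible days such as 39, so validate here.
        return None


fields = extruct("21.04.1986")
print(fields)           # {'day': '21', 'month': '04', 'year': '1986'}
print(to_date(fields))  # 1986-04-21
```

The regex only checks the shape of the digits, so a calendar check like the one in `to_date` is a reasonable follow-up step before relying on the extracted values.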