Skip to main content

A spaCy extension for enhanced date and number entity recognition and extraction as structured data.

Project description

Date spaCy

date spacy logo

Date spaCy is a collection of custom spaCy pipeline component that enables you to easily identify date entities in a text and fetch the parsed date values using spaCy's token extensions. It uses RegEx to find dates and then uses the dateparser library to convert those dates into structured datetime data. One current limitation is that if no year is given, it presumes it is the current year. The dateparser output is stored in a custom entity extension: ._.date.

This lightweight approach can be added to an existing spaCy pipeline or to a blank model. If using in an existing spaCy pipeline, be sure to add it before the NER model.

Installation

To install date_spacy, simply run:

pip install date-spacy

Usage

Adding the Component to your spaCy Pipeline

First, you'll need to import the find_dates component and add it to your spaCy pipeline:

import spacy
from date_spacy import find_dates

# Load your desired spaCy model
nlp = spacy.blank('en')

# Add the component to the pipeline
nlp.add_pipe('find_dates')

Processing Text with the Pipeline

After adding the component, you can process text as usual:

doc = nlp("""The event is scheduled for 25th August 2023.
          We also have a meeting on 10 September and another one on the twelfth of October and a
          final one on January fourth.""")

Accessing the Parsed Dates

You can iterate over the entities in the doc and access the special date extension:

for ent in doc.ents:
    if ent.label_ == "DATE":
        print(f"Text: {ent.text} -> Parsed Date: {ent._.date}")

This will output:

Text: 25th August 2023 -> Parsed Date: 2023-08-25 00:00:00
Text: 10 September -> Parsed Date: 2023-09-10 00:00:00
Text: twelfth of October -> Parsed Date: 2023-10-12 00:00:00
Text: January fourth -> Parsed Date: 2023-01-04 00:00:00

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

date_spacy-0.0.1.tar.gz (3.7 kB view hashes)

Uploaded Source

Built Distribution

date_spacy-0.0.1-py3-none-any.whl (3.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page