
A spaCy extension for enhanced number entity recognition and extraction as structured data.


Number spaCy


Number spaCy is a custom spaCy pipeline component that improves the identification of number entities in text and exposes their parsed numeric values through spaCy's custom extension attributes. It uses regular expressions to find numbers written as words and then uses the word2number library to convert those words into numeric values. Each parsed value is stored in a custom entity (Span) extension: ._.number.
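The regex-then-convert approach described above can be illustrated with a standard-library-only sketch. It does not reproduce the package's code: the hypothetical words_to_int function is a deliberately simplified stand-in for word2number (English words up to the thousands, no "and", no decimals), and the hypothetical extract_number_words function pairs each regex match with its parsed value.

```python
import re

# Simplified vocabulary (hypothetical); word2number itself covers more, e.g. millions.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = ("hundred", "thousand")

# Longest words first so that "seventeen" is tried before "seven".
_ALTERNATION = "|".join(sorted([*UNITS, *TENS, *SCALES], key=len, reverse=True))
# A run of number words separated by spaces or hyphens, e.g. "twenty-two".
NUMBER_RE = re.compile(rf"\b(?:{_ALTERNATION})(?:[\s-]+(?:{_ALTERNATION}))*\b", re.IGNORECASE)


def words_to_int(phrase: str) -> int:
    """Hypothetical stand-in for word2number: "twenty-two" -> 22."""
    total, current = 0, 0
    for word in re.split(r"[\s-]+", phrase.lower()):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
    return total + current


def extract_number_words(text: str) -> list[tuple[str, int]]:
    """Return (matched text, parsed value) for each spelled-out number."""
    return [(m.group(), words_to_int(m.group())) for m in NUMBER_RE.finditer(text)]


print(extract_number_words("I have three apples. She gave me twenty-two more."))
# [('three', 3), ('twenty-two', 22)]
```

Sorting the alternation longest-first matters because Python's regex alternation takes the first alternative that lets the whole pattern succeed, so shorter prefixes such as "six" could otherwise be tried before "sixteen".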

This lightweight component can be added to an existing spaCy pipeline or to a blank model. When adding it to an existing pipeline, insert it before the NER component.
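With a trained pipeline, this placement is expressed through the before argument of spaCy's nlp.add_pipe, for example nlp.add_pipe("find_numbers", before="ner"). The sketch below demonstrates that ordering mechanism without needing a trained model or this package: a hypothetical no-op component named number_stub stands in for find_numbers, and an untrained "ner" component stands in for the NER of a trained pipeline.

```python
import spacy
from spacy.language import Language


# Hypothetical no-op stand-in for find_numbers, used only to show ordering.
@Language.component("number_stub")
def number_stub(doc):
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("ner")  # untrained; stands in for the "ner" of a trained pipeline
nlp.add_pipe("number_stub", before="ner")  # insert ahead of the existing ner

print(nlp.pipe_names)  # ['number_stub', 'ner']
```

add_pipe also accepts after, first and last, but before="ner" states the requirement most directly.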

Installation

To install Number spaCy, execute:

pip install number-spacy

Usage

Integrating the Component into your spaCy Pipeline

Begin by importing the find_numbers component and then integrating it into your spaCy pipeline:

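import spacy
# Importing the module makes the 'find_numbers' factory available to spaCy
from number_spacy import find_numbers

# Start from a blank English pipeline (or load a trained one with spacy.load)
nlp = spacy.blank('en')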

# Integrate the component into the pipeline
nlp.add_pipe('find_numbers')

Text Processing with the Pipeline

Once the component has been added, process text as usual:

doc = nlp("I have three apples. She gave me twenty-two more, and now I have twenty-five apples in total.")

Retrieving the Parsed Numbers

Loop through the entities in the doc and read the number extension of each NUMBER entity:

for ent in doc.ents:
    if ent.label_ == "NUMBER":
        print(f"Text: {ent.text} -> Parsed Number: {ent._.number}")

This should output:

Text: three -> Parsed Number: 3
Text: twenty-two -> Parsed Number: 22
Text: twenty-five -> Parsed Number: 25
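The ._.number value read in the loop above is a spaCy custom extension attribute on Span. The following is a minimal sketch of how a component could populate it, assuming it sets doc.ents and registers a Span extension named number; it is not the package's implementation. The hypothetical number_sketch component replaces the regex and word2number steps with a hard-coded lookup table so that only the extension mechanics remain.

```python
import re

import spacy
from spacy.language import Language
from spacy.tokens import Span

# Register the attribute read as ent._.number; force=True allows re-running.
Span.set_extension("number", default=None, force=True)

# Hypothetical lookup table standing in for word2number's conversion.
LOOKUP = {"three": 3, "twenty-two": 22, "twenty-five": 25}
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, LOOKUP)) + r")\b")


@Language.component("number_sketch")
def number_sketch(doc):
    spans = []
    for match in PATTERN.finditer(doc.text):
        # char_span returns None if the match does not align with token boundaries.
        span = doc.char_span(match.start(), match.end(), label="NUMBER")
        if span is not None:
            span._.number = LOOKUP[match.group()]
            spans.append(span)
    doc.ents = spans  # replaces any entities set earlier in the pipeline
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("number_sketch")

doc = nlp("I have three apples. She gave me twenty-two more, "
          "and now I have twenty-five apples in total.")
for ent in doc.ents:
    print(f"Text: {ent.text} -> Parsed Number: {ent._.number}")
```

Because Span extension values are stored on the Doc and keyed by the span's character offsets and label, a value set on the span before assigning doc.ents can be read back from the matching entity.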



Download files


Source Distribution

number_spacy-0.0.1.tar.gz (3.3 kB)

File details

Details for the file number_spacy-0.0.1.tar.gz.

File metadata

  • Download URL: number_spacy-0.0.1.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for number_spacy-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        c4c27e7f7ed093223bc48abf24b13a4dc3444b2639860cd3267d754cdb36f372
MD5           faa29ce93f2b3db2d3113d44ae6b2c6a
BLAKE2b-256   dbaea3ec63882ffb200376d6c32269885f14e4fae6470fe3aac6c22030e25e88
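To check a downloaded archive against the SHA256 digest listed above, a short sketch using Python's hashlib is enough. The function names sha256_of and verify are hypothetical, and the file path in the usage line assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for number_spacy-0.0.1.tar.gz.
EXPECTED_SHA256 = "c4c27e7f7ed093223bc48abf24b13a4dc3444b2639860cd3267d754cdb36f372"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the archive has been downloaded to the current directory):
# verify(Path("number_spacy-0.0.1.tar.gz"))
```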

