Python module for converting natural language numbers into ints and floats.

Project description

A Python module to convert natural language numerics into ints and floats. This is a port of the Ruby gem numerizer.

Numerizer has been tested on Python 3.9, 3.10 and 3.11.

Installation

The numerizer library can be installed from PyPI as follows:

$ pip install numerizer

Usage

>>> from numerizer import numerize
>>> numerize('forty two')
'42'
>>> numerize('forty-two')
'42'
>>> numerize('four hundred and sixty two')
'462'
>>> numerize('one fifty')
'150'
>>> numerize('twelve hundred')
'1200'
>>> numerize('twenty one thousand four hundred and seventy three')
'21473'
>>> numerize('one million two hundred and fifty thousand and seven')
'1250007'
>>> numerize('one billion and one')
'1000000001'
>>> numerize('nine and three quarters')
'9.75'
>>> numerize('platform nine and three quarters')
'platform 9.75'
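
The outputs shown above are strings, so code that needs real ints or floats has to convert them. One way to do that with the standard library is sketched below. Note that extract_numbers is a hypothetical helper, not part of numerizer, and the inputs are hard-coded strings of the form numerize returns.

```python
import re
from fractions import Fraction

# Hypothetical helper (not part of numerizer): pulls numeric values out of
# a string that has already been numerized.
# Order matters: try "a/b" first, then decimals, then plain integers.
_NUMBER = re.compile(r'\d+/\d+|\d+\.\d+|\d+')


def extract_numbers(numerized_text):
    """Return the numbers found in an already-numerized string.

    Plain digits become int, decimals become float, 'a/b' becomes Fraction.
    """
    values = []
    for match in _NUMBER.finditer(numerized_text):
        token = match.group()
        if '/' in token:
            values.append(Fraction(token))
        elif '.' in token:
            values.append(float(token))
        else:
            values.append(int(token))
    return values


print(extract_numbers('42'))             # [42]
print(extract_numbers('platform 9.75'))  # [9.75]
print(extract_numbers('the next 1/4'))   # [Fraction(1, 4)]
```

In practice the argument would be the result of a numerize call, for example extract_numbers(numerize('forty two')). The sketch ignores signs, thousands separators and zero denominators.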

Using the spaCy extension

Since version 0.2, numerizer is available as a spaCy extension.

Any named entities of a quantitative nature within a spaCy document can be numerized as follows:

>>> from spacy import load
>>> import numerizer  # importing numerizer registers the spaCy extension
>>> nlp = load('en_core_web_sm')  # or load any other spaCy model
>>> doc = nlp('The projected revenue for the next quarter is over two million dollars.')
>>> doc._.numerize()
{the next quarter: 'the next 1/4', over two million dollars: 'over 2000000 dollars'}

To numerize only certain entity types, pass the labels argument to the extension function:

>>> doc._.numerize(labels=['MONEY'])  # only numerize entities of type 'MONEY'
{over two million dollars: 'over 2000000 dollars'}

The extension is available for tokens and spans as well.

>>> two_million = doc[-4:-2]  # span corresponding to "two million"
>>> two_million._.numerize()
'2000000'
>>> quarter = doc[6]  # token corresponding to "quarter"
>>> quarter._.numerized
'1/4'

Extras

For R users, a wrapper library has been developed by @amrrs.

