Python module for converting natural language numbers into ints and floats.
Project description
A Python module to convert natural language numerics into ints and floats. This is a port of the Ruby gem numerizer.
Numerizer has been tested on Python 3.9, 3.10 and 3.11.
Installation
The numerizer library can be installed from PyPI as follows:
$ pip install numerizer
Usage
>>> from numerizer import numerize
>>> numerize('forty two')
'42'
>>> numerize('forty-two')
'42'
>>> numerize('four hundred and sixty two')
'462'
>>> numerize('one fifty')
'150'
>>> numerize('twelve hundred')
'1200'
>>> numerize('twenty one thousand four hundred and seventy three')
'21473'
>>> numerize('one million two hundred and fifty thousand and seven')
'1250007'
>>> numerize('one billion and one')
'1000000001'
>>> numerize('nine and three quarters')
'9.75'
>>> numerize('platform nine and three quarters')
'platform 9.75'
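The outputs shown above are strings. When a native Python number is needed, the result can be converted afterwards. The following is a minimal sketch, not part of numerizer: the helper name to_number is hypothetical, and it assumes the input is a purely numeric string such as '42', '9.75' or '1/4' (mixed text such as 'platform 9.75' raises ValueError).

```python
from fractions import Fraction


def to_number(numerized: str):
    """Convert a purely numeric string ('42', '9.75', '1/4') to int or float.

    Hypothetical helper for illustration; it is not part of numerizer.
    """
    text = numerized.strip()
    try:
        # Whole numbers stay ints.
        return int(text)
    except ValueError:
        pass
    # Fraction parses both decimal strings ('9.75') and ratios ('1/4');
    # anything else raises ValueError.
    return float(Fraction(text))


# The strings below are copied from the outputs shown above.
print(to_number('42'))
print(to_number('1250007'))
print(to_number('9.75'))
```

In practice the argument would be the return value of numerize(...), for example to_number(numerize('forty two')).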
Using the spaCy extension
Since version 0.2, numerizer is available as a spaCy extension.
Any named entities of a quantitative nature within a spaCy document can be numerized as follows:
>>> from spacy import load
>>> nlp = load('en_core_web_sm') # or load any other spaCy model
>>> doc = nlp('The projected revenue for the next quarter is over two million dollars.')
>>> doc._.numerize()
{the next quarter: 'the next 1/4', over two million dollars: 'over 2000000 dollars'}
To numerize only certain entity types, pass the labels argument to the extension function:
>>> doc._.numerize(labels=['MONEY']) # only numerize entities of type 'MONEY'
{over two million dollars: 'over 2000000 dollars'}
The extension is also available on spans (as the numerize() method) and on tokens (as the numerized attribute):
>>> two_million = doc[-4:-2] # span corresponding to "two million"
>>> two_million._.numerize()
'2000000'
>>> quarter = doc[6] # token corresponding to "quarter"
>>> quarter._.numerized
'1/4'
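Numerized entities are returned as text that may still contain surrounding words, such as 'over 2000000 dollars'. One way to pull out just the numeric parts is a regular expression. This is a sketch under stated assumptions: the helper extract_numbers and the pattern NUMBER_RE are hypothetical, and the sample dictionary uses plain strings as stand-ins for the spaCy spans that doc._.numerize() returns as keys.

```python
import re

# Matches a fraction ('1/4'), a decimal ('9.75') or an integer ('2000000').
# Hypothetical pattern for illustration; it is not part of numerizer.
NUMBER_RE = re.compile(r'\d+/\d+|\d+\.\d+|\d+')


def extract_numbers(numerized_entities):
    """Map each entity's text to the numeric substrings in its numerized form."""
    return {
        str(entity): NUMBER_RE.findall(text)
        for entity, text in numerized_entities.items()
    }


# Stand-in for the result of doc._.numerize() shown above (keys as plain strings).
sample = {
    'the next quarter': 'the next 1/4',
    'over two million dollars': 'over 2000000 dollars',
}
print(extract_numbers(sample))
```

Calling str() on each key means the same helper should accept either plain strings or spaCy spans.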
Extras
For R users, a wrapper library has been developed by @amrrs. Try it out here.