Inverse of num2words2: convert spoken-form numbers back to numeric values across 100+ languages.
The inverse of num2words2.
Convert spoken-form numbers (“forty-two”, “trois cent quatre”, “二十三”) back into numeric values across 100+ locales.
- Hand-written English grammar parser: cardinals, ordinals, decimals, negatives, scale words up to centillion, year forms, “and” conjunctions, hyphenation, ASR-style outputs.
- Generic backend for every other locale supported by num2words2, derived automatically by reverse-mapping the forward conversion.
- Sentence-level mode that walks running text and replaces every word-number with its numeric form, preserving punctuation and surrounding text.
Installation
pip install words2num2
num2words2 is a runtime dependency for the generic multi-language backend.
Usage
>>> from words2num2 import words2num, words2num_sentence
>>> words2num("forty-two")
42
>>> words2num("one thousand two hundred thirty-four")
1234
>>> words2num("minus seven")
-7
>>> words2num("three point one four")
Decimal('3.14')
>>> words2num("nineteen ninety nine", to="year")
1999
>>> words2num("twenty-first", to="ordinal")
21
>>> words2num("quarante-deux", lang="fr")
42
>>> words2num("zweiundvierzig", lang="de")
42
>>> words2num("сорок два", lang="ru")
42
>>> words2num_sentence("I bought twenty-three apples and fourteen pears.")
'I bought 23 apples and 14 pears.'
Auto-parse: numbers + currencies + units + locales
auto_parse extracts a numeric value and its unit from a free-text expression. auto_parse_sentence walks running text and replaces every quantity in place. Supported features include configurable thousands and decimal separators per locale, currency symbols ($€£¥₹₽₩₺) and ISO codes (USD, EUR, GBP, …), scale shortcuts ($5m, $1.5b), SI and imperial units (length, mass, temperature, time, volume), percentages, and disambiguation hints.
>>> from words2num2 import auto_parse, auto_parse_sentence
>>> auto_parse("$12,345.00")
Quantity(value=12345.0, unit='USD', kind='currency', confidence=1.0)
>>> auto_parse("$5m").value
5000000
>>> auto_parse("5cm")
Quantity(value=5, unit='cm', kind='length', confidence=1.0)
>>> auto_parse("20°C").kind
'temperature'
>>> auto_parse("forty-two kg").value
42
# Configurable separators
>>> auto_parse("1.234,56", lang="de").value
1234.56
>>> auto_parse("1 234,56", lang="fr").value
1234.56
# Disambiguation
>>> auto_parse("5m", prefer={"m": "mile"}).unit_long
'mile'
# Sentence mode
>>> auto_parse_sentence("Pay $12.50 for 5kg of apples at -5°C.")
'Pay 12.5 USD for 5 kg of apples at -5 °C.'
>>> auto_parse_sentence("Pay $12.50 for 5kg.", expand=True)
'Pay 12.5 dollar for 5 kilogram.'
Command-line
$ words2num2 "forty-two"
42
$ words2num2 "trois cent quatre" --lang=fr
304
$ words2num2 "twenty-third" --to=ordinal
23
Supported locales
words2num2 mirrors num2words2’s locale list — 100+ entries including:
af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, ce, cs, cy, da, de, el, en, en_IN, en_NG, eo, es, es_CO, es_CR, es_GT, es_NI, es_VE, et, eu, fa, fi, fo, fr, fr_BE, fr_CH, fr_DZ, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, kz, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, pt_BR, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tet, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, wo, yi, yo, zh, zh_CN, zh_HK, zh_TW
Aliases: jp → ja, cn → zh_CN.
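As a rough illustration of how alias resolution like this can work, here is a minimal sketch. The function name resolve_locale and the SUPPORTED excerpt are hypothetical. The fallback from a regional variant to its base language is an assumption of this example; the README does not say whether the library does this.

```python
# Aliases as listed above; SUPPORTED is a short excerpt of the locale list.
ALIASES = {"jp": "ja", "cn": "zh_CN"}
SUPPORTED = {"en", "fr", "fr_BE", "ja", "zh", "zh_CN"}


def resolve_locale(code):
    """Map a user-supplied locale code to a supported one."""
    code = code.replace("-", "_")      # accept "fr-BE" as well as "fr_BE"
    code = ALIASES.get(code, code)
    if code in SUPPORTED:
        return code
    # Assumption for this sketch: fall back to the base language.
    base = code.split("_")[0]
    if base in SUPPORTED:
        return base
    raise ValueError(f"unsupported locale: {code!r}")
```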
API
- words2num(text, lang="en", to="cardinal")
Parse text and return an int, float, or Decimal. The to argument is one of cardinal, ordinal, ordinal_num, year, currency.
- words2num_sentence(text, lang="en", to="cardinal")
Walk a sentence, replacing every word-number with its numeric form. Returns a string. Aliases: convert_sentence, sentence_to_words.
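The sentence-walking idea can be sketched without the library: find runs of number words with a regular expression and substitute each run in place, leaving everything else untouched. This is a toy covering 0 to 99 in English only, not the words2num_sentence implementation. The name replace_numbers is hypothetical.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SMALL = {**UNITS, **TENS}

_tens = "|".join(TENS)
_ones = "|".join(w for w, v in UNITS.items() if 1 <= v <= 9)
_units = "|".join(sorted(UNITS, key=len, reverse=True))
# Either "<tens>[ -]<ones>" (e.g. twenty-three) or a single word below twenty.
NUMBER_RUN = re.compile(
    rf"\b(?:(?:{_tens})(?:[ -](?:{_ones}))?|{_units})\b", re.IGNORECASE)


def _to_digits(match):
    # Sum the parts of the matched run: "twenty-three" -> 20 + 3.
    parts = re.split(r"[ -]", match.group(0).lower())
    return str(sum(SMALL[p] for p in parts))


def replace_numbers(sentence):
    """Replace each number-word run with digits; other text is preserved."""
    return NUMBER_RUN.sub(_to_digits, sentence)
```

Word boundaries keep words such as "someone" intact. A toy like this still rewrites compounds such as "one-way", which a real parser would need to guard against.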
How it works
English (lang_EN) ships a hand-written recursive-descent parser that handles the full grammar — including ordinal/cardinal mixing, decimals, year mode, “a hundred” filler, and negative tokens.
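To give a feel for the grammar involved, here is a deliberately simplified sketch. It uses a single-pass accumulator rather than recursive descent, so it is not the lang_EN parser. It handles only cardinals up to the billions, with hyphens, “and”, “a hundred”, and a leading minus. The name parse_cardinal is hypothetical.

```python
SMALL = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
SMALL.update({w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)})
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_cardinal(text):
    """Parse an English cardinal such as 'one thousand two hundred thirty-four'."""
    tokens = text.lower().replace("-", " ").split()
    sign = 1
    if tokens and tokens[0] in ("minus", "negative"):
        sign = -1
        tokens = tokens[1:]
    if not tokens:
        raise ValueError("no number words found")
    total = 0    # completed groups (thousands, millions, ...)
    current = 0  # the group being built, always below 1000
    for token in tokens:
        if token == "and":          # filler: "one hundred and five"
            continue
        if token == "a":            # "a hundred" means "one hundred"
            current += 1
        elif token in SMALL:
            current += SMALL[token]
        elif token == "hundred":
            current = max(current, 1) * 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"unknown number word: {token!r}")
    return sign * (total + current)
```

The accumulator does not validate word order, so ill-formed input such as "one one" is accepted. Rejecting such input is one reason to use a proper grammar-driven parser.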
Every other locale uses Words2Num_Base, which lazily builds a {normalized_words: integer} table by calling num2words2 for each integer in a configurable range (default -1..10000). Inside that window, results are correct for every locale supported upstream, because each entry is generated by the forward conversion itself. Values outside the window raise Words2NumError until a hand-written parser is added for that locale.
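The reverse-mapping approach can be sketched in a few lines. This is an illustration under stated assumptions, not the Words2Num_Base source. ReverseLookup, normalize, and toy_to_words are hypothetical names, toy_to_words is a stand-in for the forward converter that covers -99..99 only, and ValueError stands in for Words2NumError.

```python
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()


def toy_to_words(n):
    """Stand-in forward converter (number -> words) for -99..99."""
    if n < 0:
        return "minus " + toy_to_words(-n)
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens - 2] + ("-" + ONES[ones] if ones else "")


def normalize(words):
    """Lowercase, treat hyphens as spaces, collapse whitespace."""
    return " ".join(words.lower().replace("-", " ").split())


class ReverseLookup:
    def __init__(self, to_words, low=-1, high=10_000):
        self._to_words = to_words
        self._low, self._high = low, high
        self._table = None  # built lazily on first parse

    def parse(self, text):
        if self._table is None:
            self._table = {
                normalize(self._to_words(n)): n
                for n in range(self._low, self._high + 1)
            }
        try:
            return self._table[normalize(text)]
        except KeyError:
            raise ValueError(f"not in lookup window: {text!r}") from None
```

For example, ReverseLookup(toy_to_words, low=-1, high=99).parse("Forty-Two") returns 42, while "one hundred" falls outside the window and raises. The table costs one forward call per integer in the range, which is why building it lazily is attractive.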
Hand-written parsers can be added incrementally per locale by overriding to_cardinal / to_ordinal in the corresponding lang_XX.py module — same pattern as num2words2.
Development
make install-dev
make test
make lint
make format
License
LGPL-2.1, mirroring num2words2.
File details
Details for the file words2num2-0.2.1.tar.gz.
File metadata
- Download URL: words2num2-0.2.1.tar.gz
- Upload date:
- Size: 44.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f31e3eda125d8d7867437cab79dfcfcfb904d27f97242ac7155936ccbbfd964 |
| MD5 | 134efda36bfb807568d8582bc06c0f67 |
| BLAKE2b-256 | 3d4fb7065042eb9f13a7dc99817feff23dfbee123a36b002832552158cdc67d9 |
Provenance
The following attestation bundles were made for words2num2-0.2.1.tar.gz:
Publisher: release.yml on jqueguiner/words2num2
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: words2num2-0.2.1.tar.gz
- Subject digest: 6f31e3eda125d8d7867437cab79dfcfcfb904d27f97242ac7155936ccbbfd964
- Sigstore transparency entry: 1422267324
- Sigstore integration time:
- Permalink: jqueguiner/words2num2@e5943fccda3f25b70e929fea911683d02f288628
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/jqueguiner
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e5943fccda3f25b70e929fea911683d02f288628
- Trigger Event: push
File details
Details for the file words2num2-0.2.1-py3-none-any.whl.
File metadata
- Download URL: words2num2-0.2.1-py3-none-any.whl
- Upload date:
- Size: 87.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 852c975667e1089e2d28550fa4c556b035701f83de1070342767462a578d0440 |
| MD5 | 64ac48043916ded5195841c1ef2c3e26 |
| BLAKE2b-256 | 03c95e0b8f0430b445558e0529604605d8c2cb1353a7799f4870e9f6567d63f1 |
Provenance
The following attestation bundles were made for words2num2-0.2.1-py3-none-any.whl:
Publisher: release.yml on jqueguiner/words2num2
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: words2num2-0.2.1-py3-none-any.whl
- Subject digest: 852c975667e1089e2d28550fa4c556b035701f83de1070342767462a578d0440
- Sigstore transparency entry: 1422267415
- Sigstore integration time:
- Permalink: jqueguiner/words2num2@e5943fccda3f25b70e929fea911683d02f288628
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/jqueguiner
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e5943fccda3f25b70e929fea911683d02f288628
- Trigger Event: push