words2num2

Inverse of num2words2: convert spoken-form numbers back to numeric values across 100+ languages.

These details have not been verified by PyPI

Project description

The inverse of num2words2.

words2num2 parses spoken-form numbers — "forty-two", "trois cent quatre", "二十三" — and returns numeric values. It mirrors num2words2’s locale list (100+ languages, 120 dispatch entries) and adds a free-text auto-parse mode that handles currencies, units, configurable thousands/decimal separators, and ASR/LLM-style mixed text.

The project is hosted on GitHub. Contributions are welcome.

Why this library

Existing inverse libraries are usually English-only, lack a sentence mode, and don’t compose with the locale defaults you already use for the forward direction. words2num2:

Accepts the same locale codes as num2words2 so the two libraries are drop-in inverses of each other.
Has a hand-written grammar parser for English and a generic reverse-lookup backend that auto-derives {words → number} tables from num2words2 for every other locale out of the box.
Walks free text via words2num_sentence / auto_parse_sentence — useful when post-processing ASR transcripts, LLM output, or user-typed forms that mix words and digits.
Handles currency symbols ($ € £ ¥ ₹ ₽ ₩ ₺), ISO codes (USD/EUR/...), scale shortcuts ($5m → 5,000,000), units (length / mass / temperature / time / volume / percent), and CLDR-style number formats per locale.
Pluralizes long-form units in expand mode (5 dollars / 1 dollar, 5 feet / 1 foot, 5 yen / 1 yen).

Installation

pip:

pip install words2num2

Arch Linux / Manjaro (AUR):

# With an AUR helper
yay -S python-words2num2
paru -S python-words2num2

# Or manually
git clone https://aur.archlinux.org/python-words2num2.git
cd python-words2num2
makepkg -si

From source:

git clone https://github.com/jqueguiner/words2num2
cd words2num2
pip install -e .

num2words2 is a runtime dependency for the generic multi-language backend and is installed automatically.

Quickstart

>>> from words2num2 import words2num, words2num_sentence
>>> words2num("forty-two")
42
>>> words2num("one thousand two hundred thirty-four")
1234
>>> words2num("minus seven")
-7
>>> words2num("three point one four")
Decimal('3.14')
>>> words2num("nineteen ninety nine", to="year")
1999
>>> words2num("twenty-first", to="ordinal")
21
>>> words2num("quarante-deux", lang="fr")
42
>>> words2num("zweiundvierzig", lang="de")
42
>>> words2num("сорок два", lang="ru")
42

>>> words2num_sentence("I bought twenty-three apples and fourteen pears.")
'I bought 23 apples and 14 pears.'
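
As a rough illustration of what a sentence walker does, the sketch below scans running text for runs of English number words and replaces each run with digits. It is not the library's implementation: `replace_numbers`, `small_value`, and the word tables are hypothetical names, and the value function only sums simple runs below 100.

```python
import re

# Word tables for 0..19 and the tens; enough for values below 100.
ONES = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
VALUES = {**ONES, **TENS}

# A run is one number word, optionally followed by more number words
# joined by a single space or hyphen.
WORD = "|".join(sorted(VALUES, key=len, reverse=True))
RUN = re.compile(rf"\b(?:{WORD})(?:[ -](?:{WORD}))*\b", re.IGNORECASE)


def small_value(run: str) -> int:
    """Sum the words of a run (assumes a simple value such as 'twenty-three')."""
    return sum(VALUES[w] for w in re.split(r"[ -]", run.lower()))


def replace_numbers(text: str) -> str:
    """Replace every run of number words in text with its digit form."""
    return RUN.sub(lambda m: str(small_value(m.group(0))), text)


print(replace_numbers("I bought twenty-three apples and fourteen pears."))
```

A sketch this small has obvious limits: it would also rewrite the "one" in "no one knows", and it has no notion of hundreds, scales, or other languages.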

Auto-parse mode

auto_parse extracts a numeric value plus its unit from any free-text expression. auto_parse_sentence walks running text and replaces every quantity in place. It supports configurable thousands/decimal separators per locale, currency symbols and ISO codes, scale shortcuts, SI/imperial units, percent, and disambiguation hints.

>>> from words2num2 import auto_parse, auto_parse_sentence

# Currencies
>>> auto_parse("$12,345.00")
Quantity(value=12345.0, unit='USD', kind='currency', confidence=1.0)
>>> auto_parse("$5m").value
5000000
>>> auto_parse("12,50 €", lang="de").value
12.5

# Units
>>> auto_parse("5cm")
Quantity(value=5, unit='cm', kind='length', confidence=1.0)
>>> auto_parse("20°C").kind
'temperature'
>>> auto_parse("forty-two kg").value
42

# Configurable separators
>>> auto_parse("1.234,56", lang="de").value
1234.56
>>> auto_parse("1 234,56", lang="fr").value
1234.56

# Disambiguation for ambiguous unit tokens
>>> auto_parse("5m", prefer={"m": "mile"}).unit_long
'mile'

# Sentence mode
>>> auto_parse_sentence("Pay $12.50 for 5kg of apples at -5°C.")
'Pay 12.5 USD for 5 kg of apples at -5 °C.'

# Expand mode renders the long unit form, with English plural rules
>>> auto_parse_sentence("Pay $12.50 for 5kg.", expand=True)
'Pay 12.5 dollars for 5 kilograms.'
>>> auto_parse_sentence("Pay $1.00 for 1kg.", expand=True)
'Pay 1 dollar for 1 kilogram.'
>>> auto_parse_sentence("5 ft and 1 ft.", expand=True)
'5 feet and 1 foot.'
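
To make the currency and scale-shortcut handling concrete, here is a minimal standalone sketch of how a string such as "$5m" could be split into a symbol, a number, and a scale suffix. The function `parse_money`, the regular expression, and the `k`/`b` suffixes are assumptions for illustration; only the `$5m` shortcut and the symbol list come from the description above.

```python
import re
from decimal import Decimal

# Symbol to ISO 4217 code; the symbols are the ones listed in this README.
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY",
           "₹": "INR", "₽": "RUB", "₩": "KRW", "₺": "TRY"}

# Scale suffixes. Only "m" is documented above; "k" and "b" are assumed.
SCALES = {"k": 10**3, "m": 10**6, "b": 10**9}

PATTERN = re.compile(r"^([$€£¥₹₽₩₺])\s*(\d+(?:\.\d+)?)\s*([kmb])?$", re.IGNORECASE)


def parse_money(text: str) -> tuple[Decimal, str]:
    """Return (value, ISO code) for inputs shaped like '$5m' or '€2.5k'."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised money expression: {text!r}")
    symbol, digits, suffix = match.groups()
    value = Decimal(digits)
    if suffix:
        value *= SCALES[suffix.lower()]
    return value, SYMBOLS[symbol]


print(parse_money("$5m"))
print(parse_money("€2.5k"))
```

The sketch deliberately ignores thousands separators, trailing symbols ("12,50 €"), and ISO codes, all of which auto_parse is described as handling.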

Configurable number formats

parse_number_string is the primitive used by auto_parse for digit-form numbers. You can call it directly with explicit separators or rely on per-locale CLDR-style defaults:

>>> from words2num2 import parse_number_string

>>> parse_number_string("12,345.67")                              # auto-detect
12345.67
>>> parse_number_string("12.345,67", lang="de")                   # German defaults
12345.67
>>> parse_number_string("1 234,56", lang="fr")                    # French defaults (NBSP)
1234.56
>>> parse_number_string("12'345.67", thousands_sep="'", decimal_sep=".")  # Swiss
12345.67
>>> parse_number_string("1_234.56", thousands_sep="_")            # programmer
1234.56

The locale defaults table covers 50+ locales: English/CJK use comma thousands and period decimal; French uses non-breaking-space + comma; Swiss French uses apostrophe + period; German/Spanish/Italian/Portuguese/Dutch/Romanian use period + comma; Russian/Scandinavian/Slavic use space + comma. See words2num2/formats.py for the full table.

Auto-detection heuristic (when no override and no locale match):

If both . and , appear, the rightmost one is the decimal.
If one separator appears multiple times, it is thousands.
If one separator appears once with exactly 3 trailing digits, it is thousands; otherwise it is decimal.
Spaces, NBSP, apostrophe, and underscore are always thousands.
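
The four rules above can be sketched in a few lines. This is an independent illustration, not the code in words2num2/formats.py: `guess_number` is a hypothetical name, and it returns a Decimal rather than the float shown in the examples above.

```python
from decimal import Decimal

# Characters always treated as thousands marks:
# space, no-break space, apostrophe, underscore.
ALWAYS_THOUSANDS = " \u00a0'_"


def guess_number(text: str) -> Decimal:
    """Parse a digit string by guessing which separator is the decimal."""
    s = text.strip()
    for ch in ALWAYS_THOUSANDS:
        s = s.replace(ch, "")

    dots, commas = s.count("."), s.count(",")
    if dots and commas:
        # Both appear: the rightmost one is the decimal separator.
        decimal_sep = "." if s.rfind(".") > s.rfind(",") else ","
    elif dots + commas > 1:
        # One separator repeated: it groups thousands, so no decimal part.
        decimal_sep = None
    elif dots + commas == 1:
        sep = "." if dots else ","
        trailing = s.rsplit(sep, 1)[1]
        # Exactly three trailing digits reads as thousands, else decimal.
        decimal_sep = None if len(trailing) == 3 else sep
    else:
        decimal_sep = None

    if decimal_sep is None:
        return Decimal(s.replace(".", "").replace(",", ""))
    thousands_sep = "," if decimal_sep == "." else "."
    return Decimal(s.replace(thousands_sep, "").replace(decimal_sep, "."))


for sample in ["12,345.67", "12.345,67", "1,234,567", "1,234", "12,5", "1 234,56"]:
    print(sample, "->", guess_number(sample))
```

The third rule is the genuinely ambiguous one: "1,234" is read as one thousand two hundred thirty-four, which is why passing lang or explicit separators is the safer choice when the source locale is known.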

Command line

$ words2num2 "forty-two"
42
$ words2num2 "trois cent quatre" --lang=fr
304
$ words2num2 "twenty-third" --to=ordinal
23

Supported locales

words2num2 mirrors num2words2’s locale list — 120 dispatch entries including:

af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, ce, cs, cy, da, de, el, en, en_IN, en_NG, eo, es, es_CO, es_CR, es_GT, es_NI, es_VE, et, eu, fa, fi, fo, fr, fr_BE, fr_CH, fr_DZ, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, kz, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, pt_BR, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tet, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, wo, yi, yo, zh, zh_CN, zh_HK, zh_TW

Aliases: jp → ja, cn → zh_CN.

Conversion types

The to= parameter accepts cardinal, ordinal, ordinal_num, year, and currency — same set as num2words2.

How it works

English (lang_EN) ships a hand-written recursive-descent parser that handles cardinals, ordinals, decimals, negatives, scale words to centillion, year mode, “and” connectors, and hyphenation.
Every other locale uses Words2Num_Base, which lazily builds a {normalized_words: integer} table by calling num2words2 for each integer in a configurable range (default: -1 to 10000). This guarantees correctness inside the lookup window for every locale supported upstream; the cost is that out-of-range values raise Words2NumError until a hand-written parser is added.
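
The reverse-lookup idea can be sketched without the library. In the snippet below, `toy_num2words` is a hypothetical stand-in for the forward converter (English, 0..99 only), and `normalize`, `build_reverse_table`, and `lookup` are illustrative names, not the Words2Num_Base API. The table is built eagerly here for brevity, whereas the backend is described as lazy.

```python
from typing import Callable, Dict

_ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def toy_num2words(n: int) -> str:
    """Stand-in forward converter for 0..99 (hypothetical, English only)."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]}-{_ONES[ones]}"


def normalize(words: str) -> str:
    """Lowercase, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(words.lower().replace("-", " ").split())


def build_reverse_table(forward: Callable[[int], str],
                        start: int, stop: int) -> Dict[str, int]:
    """Map the normalized word form of every integer in [start, stop] to it."""
    table: Dict[str, int] = {}
    for n in range(start, stop + 1):
        # setdefault keeps the first integer if two share a word form.
        table.setdefault(normalize(forward(n)), n)
    return table


def lookup(text: str, table: Dict[str, int]) -> int:
    """Resolve text through the table; fail outside the lookup window."""
    key = normalize(text)
    if key not in table:
        raise ValueError(f"cannot parse {text!r} within the lookup window")
    return table[key]


table = build_reverse_table(toy_num2words, 0, 99)
print(lookup("Forty-Two", table))
```

Swapping `toy_num2words` for a real forward converter bound to a locale gives the same behaviour for that locale, which is the appeal of the approach: no grammar has to be written per language.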

Hand-written grammar parsers can be added incrementally per locale by overriding to_cardinal / to_ordinal in the corresponding words2num2/lang_XX.py module — same pattern as num2words2.
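
For a sense of what a hand-written cardinal parser involves, here is a small standalone sketch. It uses a simple accumulate-and-flush loop, not the recursive-descent parser that lang_EN is described as shipping, and `parse_cardinal` is a hypothetical function rather than a to_cardinal override.

```python
ONES = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_cardinal(text: str) -> int:
    """Parse an English cardinal such as 'one thousand two hundred thirty-four'."""
    tokens = text.lower().replace("-", " ").split()
    sign = 1
    if tokens and tokens[0] in ("minus", "negative"):
        sign, tokens = -1, tokens[1:]
    if not tokens:
        raise ValueError("no number words found")

    total = 0    # sum of completed scale groups (thousands, millions, ...)
    current = 0  # value of the group being read, 0..999
    for tok in tokens:
        if tok == "and":
            continue  # connector, as in "three hundred and four"
        if tok in ONES:
            current += ONES[tok]
        elif tok in TENS:
            current += TENS[tok]
        elif tok == "hundred":
            current = max(current, 1) * 100
        elif tok in SCALES:
            # Flush the current group into the total at its scale.
            total += max(current, 1) * SCALES[tok]
            current = 0
        else:
            raise ValueError(f"unknown number word: {tok!r}")
    return sign * (total + current)


print(parse_cardinal("one thousand two hundred thirty-four"))
print(parse_cardinal("minus seven"))
```

The loop accepts some ungrammatical input (for example "two two" sums to 4), which is one reason a grammar-driven parser is the better fit for production use.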

Public API

Function / class	Purpose
words2num(text, lang, to)	Parse a single word-form number.
words2num_sentence(text, ...)	Replace every word-number in running text.
auto_parse(text, ...)	Parse a single quantity (number + unit).
auto_parse_sentence(text, ...)	Replace every quantity in running text.
parse_number_string(text, ...)	Digit-form parser with separators.
Quantity	Dataclass returned by auto_parse.
UNITS / CURRENCIES	Registries of recognized units and currencies.
NUMBER_FORMAT_DEFAULTS	Per-locale separator defaults.
CONVERTER_CLASSES	Per-locale converter registry.
Words2NumError	Raised when input cannot be parsed.

See REFERENCE.md for the full API reference with parameters, return types, and examples.

Development

git clone https://github.com/jqueguiner/words2num2
cd words2num2
make install-dev
make test          # pytest
make lint          # black + flake8 + isort
make format        # apply black + isort

Releasing

Every push of a tag matching v* triggers GitHub Actions to:

Build sdist + wheel.
Test-install the built package in a clean environment.
Generate release notes and create a GitHub Release.
Publish to PyPI via Trusted Publishing (no token in CI).

To cut a release:

git tag vX.Y.Z
git push origin vX.Y.Z

A manual fallback workflow (Publish to PyPI (manual)) is available via gh workflow run and uses PYPI_API_TOKEN / TEST_PYPI_API_TOKEN repo secrets.

Changelog

See CHANGELOG.md.

License

LGPL-2.1, mirroring num2words2. See COPYING.

Project details

Release history

0.3.0.dev2 (pre-release): May 2, 2026
0.3.0.dev1 (pre-release): May 1, 2026
0.2.3: May 1, 2026
0.2.2 (this version): May 1, 2026
0.2.1: May 1, 2026
0.2.0: May 1, 2026
0.1.1: May 1, 2026
0.1.0: May 1, 2026

Download files

Source Distribution

words2num2-0.2.2.tar.gz (64.0 kB)

Uploaded May 1, 2026 Source

Built Distribution

words2num2-0.2.2-py3-none-any.whl (89.7 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file words2num2-0.2.2.tar.gz.

File metadata

Download URL: words2num2-0.2.2.tar.gz
Upload date: May 1, 2026
Size: 64.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for words2num2-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`b8557179b5bb45b3e3fca21103e116b15ffd4e729001ada786817378b3f9ce8a`
MD5	`e1f7b89af9214e42b50f5bdfc4d05040`
BLAKE2b-256	`befa6fdee20ef4ac5fcd1c082d0d195e01f2ecbfa4b100e5fce9428a9a1e1938`

File details

Details for the file words2num2-0.2.2-py3-none-any.whl.

File metadata

Download URL: words2num2-0.2.2-py3-none-any.whl
Upload date: May 1, 2026
Size: 89.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for words2num2-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cbbccb75333de5862e1a14b6402d5215e5f195537105f18564b50694c5b6c027`
MD5	`0d8e2539962e7bf7e3c35970d56ed076`
BLAKE2b-256	`e49a9b5cdc9a2f0356b8b1a3a4c08f86cee8d9403ec1ea05bf9b8fd7dbbedc2c`
