A Python port of Facebook's Duckling: parse natural-language English and Arabic into structured numbers, dates, durations, money, and more.
Puckling
A Python port of Facebook Duckling, scoped to English and Arabic.
Puckling parses natural-language English and Arabic into structured values: numbers, ordinals, dates, durations, distances, temperatures, money, emails, URLs, phone numbers, and more.
The library has minimal dependencies (regex for PCRE-compatible Unicode patterns).
Installation
```sh
pip install puckling
```
Usage
```python
import datetime as dt

from puckling import (
    AmountOfMoneyValue,
    Context,
    Lang,
    Locale,
    Options,
    TimeValue,
    parse,
)

# Relative expressions such as "tomorrow" resolve against reference_time.
ctx = Context(
    reference_time=dt.datetime(2013, 2, 12, 4, 30, tzinfo=dt.UTC),
    locale=Locale(Lang.EN),
)

for entity in parse("I'll meet you tomorrow at 5pm for $50", ctx, Options()):
    match entity.value:
        case TimeValue() as tv:
            print(entity.body, "→", tv.start_datetime(), "to", tv.end_datetime())
        case AmountOfMoneyValue(value=amount, currency=currency):
            print(entity.body, "→", amount, currency)
```
Switch Locale(Lang.EN) to Locale(Lang.AR) for Arabic input.
entity.value is one of the typed *Value dataclasses (AmountOfMoneyValue,
DistanceValue, TimeValue, …) re-exported from puckling. Narrow with
isinstance, match, or by passing dims=("amount_of_money",) to filter the
parse to a single dimension. For TimeValue, start_datetime() and
end_datetime() cover the instant / closed-interval / open-interval cases
without an isinstance ladder; either may be None for an unbounded side.
Latent matches
Some inputs are only entities under a charitable reading. parse("on the 5th", …) returns just an ordinal by default. With with_latent=True it also
surfaces a time entity for "the 5th of the next month" — flagged
latent=True so callers can demote it:
```python
parse("on the 5th", ctx, Options())  # → [Ordinal(5)]
parse("on the 5th", ctx, Options(with_latent=True))  # → [Time(2013-03-05, latent=True)]
```
Supported dimensions
| Dimension | EN | AR | Notes |
|---|---|---|---|
| Numeral | :white_check_mark: | :white_check_mark: | Cardinals, decimals, Arabic-Indic digits |
| Ordinal | :white_check_mark: | :white_check_mark: | |
| Time | :white_check_mark: | :white_check_mark: | Dates, clock times, holidays, intervals |
| Duration | :white_check_mark: | :white_check_mark: | |
| Distance | :white_check_mark: | :white_check_mark: | |
| Temperature | :white_check_mark: | :white_check_mark: | |
| Quantity | :white_check_mark: | :white_check_mark: | |
| Volume | :white_check_mark: | :white_check_mark: | |
| AmountOfMoney | :white_check_mark: | :white_check_mark: | |
| Email | :white_check_mark: | :white_check_mark: | Locale-agnostic |
| URL | :white_check_mark: | :white_check_mark: | Locale-agnostic |
| PhoneNumber | :white_check_mark: | :white_check_mark: | |
| CreditCardNumber | :white_check_mark: | :white_check_mark: | Locale-agnostic |
Locale-agnostic dimensions (Email, URL, CreditCardNumber) match in both
Lang.EN and Lang.AR contexts.
Architecture
Puckling mirrors Duckling's parsing model in idiomatic, functional Python:
- Rules are pure data: Rule(name, pattern, prod).
- Patterns are tuples of RegexItem (regex over source text) and PredicateItem (predicates over existing tokens).
- Productions are pure functions tuple[Token, ...] → Token | None.
- The engine is a saturating fixed-point parser that applies rules iteratively until no new tokens appear.
- Resolution is context-aware (reference time, locale) and dimension-specific.
All public types are @dataclass(frozen=True, slots=True) — no mutation. Parsed entity values are structured runtime dataclasses; access fields directly. Cross-dimension references go through predicates (is_numeral, is_grain, …), never imports, so each rule file stays independent.
Engine budgets
The saturating fixed-point parser is bounded by three caps to prevent runaway parses on pathological compositional inputs:
| Options field | Default | Disable with |
|---|---|---|
| parse_timeout_ms | 2000 | None |
| max_tokens | 10000 | n/a |
| max_iterations | 50 | n/a |
When any cap is hit, the engine returns the tokens it has accumulated so far (a valid, possibly partial parse). For offline corpus runs where you want analysis without a wall-clock limit, pass Options(parse_timeout_ms=None); max_tokens and max_iterations still apply.
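The cap semantics can be sketched as a generic bounded fixed-point loop. This is an illustrative assumption about how such budgets might interact, not puckling's engine: step is a hypothetical function that derives tokens from the current set, the parameter names and defaults are borrowed from the table above, and the returned reason string is invented for the sketch. Whichever cap trips first ends the loop, and the tokens gathered so far are returned.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable


def bounded_fixed_point(
    step: Callable[[frozenset[Hashable]], set[Hashable]],
    *,
    parse_timeout_ms: int | None = 2000,
    max_tokens: int = 10_000,
    max_iterations: int = 50,
) -> tuple[frozenset[Hashable], str]:
    """Iterate `step` to a fixed point; stop early when any budget is exhausted.

    Returns the tokens accumulated so far and the reason the loop stopped.
    """
    deadline = (
        None if parse_timeout_ms is None else time.monotonic() + parse_timeout_ms / 1000
    )
    tokens: frozenset[Hashable] = frozenset()
    for _ in range(max_iterations):
        if deadline is not None and time.monotonic() >= deadline:
            return tokens, "timeout"
        new = step(tokens) - tokens
        if not new:
            return tokens, "fixed_point"
        if len(tokens) + len(new) > max_tokens:
            # Discard this round rather than exceed the cap; earlier tokens stay valid.
            return tokens, "max_tokens"
        tokens = tokens | new
    return tokens, "max_iterations"
```

For brevity this sketch checks the deadline only between rounds; an engine that must honor the timeout precisely would also need to check it inside a long round.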
Running scripts safely
Always wrap inline smoke tests in the timeout command so that a runaway parse is killed instead of hanging the calling shell:
```sh
timeout 5 uv run python -c "
from puckling import parse, Context, Locale, Lang, Options
import datetime as dt
ctx = Context(reference_time=dt.datetime.now(dt.UTC), locale=Locale(Lang.EN))
print(parse('tomorrow at 5pm', ctx, Options()))
"
```
The engine's budget should be enough by itself, but the shell-level timeout is belt-and-suspenders protection against any future engine path that bypasses the budget check.
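When a smoke test runs from Python (for example, a test harness or CI script) rather than an interactive shell, the standard library offers the same protection: subprocess.run with timeout= kills the child process once the limit expires. A minimal sketch; run_smoke_test is a hypothetical helper, and it assumes sys.executable is the project's environment interpreter (that is, the calling script was itself launched through uv run).

```python
from __future__ import annotations

import subprocess
import sys


def run_smoke_test(code: str, timeout_s: float = 5.0) -> str | None:
    """Run `code` in a fresh interpreter and return its stdout.

    Returns None if the child is still running after `timeout_s` seconds;
    subprocess.run kills and reaps the child in that case. A non-zero exit
    raises subprocess.CalledProcessError.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```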
Development
- Requires Python 3.13+.
- Requires uv for dev dependencies.

```sh
uv sync --all-extras
uv run pytest
```
Adding a dimension or locale
To port a Duckling rule file, add:
```text
src/puckling/dimensions/<dim>/<lang>/__init__.py
src/puckling/dimensions/<dim>/<lang>/rules.py   # exports RULES: tuple[Rule, ...]
src/puckling/dimensions/<dim>/<lang>/corpus.py  # exports CORPUS: tuple[Example, ...]
tests/dimensions/test_<dim>_<lang>.py
```
The registry auto-discovers any <dim>/<lang>/rules.py that exports RULES, so there is no central registration list to update.
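Auto-discovery of this kind can be built on the standard library alone. The sketch below is an assumption about the general approach, not puckling's registry code: it globs for <dim>/<lang>/rules.py under a directory and loads each file by path with importlib.util. A real package would more likely import the modules by dotted name (for example via pkgutil), which keeps relative imports inside rules.py working. discover_rules is a hypothetical name.

```python
from __future__ import annotations

import importlib.util
import sys
from pathlib import Path


def discover_rules(dimensions_dir: Path) -> dict[tuple[str, str], tuple]:
    """Map (dim, lang) to the RULES tuple of every <dim>/<lang>/rules.py found."""
    registry: dict[tuple[str, str], tuple] = {}
    for rules_file in sorted(dimensions_dir.glob("*/*/rules.py")):
        lang = rules_file.parent.name
        dim = rules_file.parent.parent.name
        module_name = f"_discovered_rules_{dim}_{lang}"
        spec = importlib.util.spec_from_file_location(module_name, rules_file)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        # Register before executing, as the importlib "import a source file
        # directly" recipe does, so code that looks itself up in sys.modules works.
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
        rules = getattr(module, "RULES", None)
        # Only files that actually export a RULES tuple are registered.
        if isinstance(rules, tuple):
            registry[(dim, lang)] = rules
    return registry
```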
License
Apache-2.0, mirroring upstream Duckling.
Download files
File details
Details for the file puckling-0.4.0.tar.gz.
File metadata
- Download URL: puckling-0.4.0.tar.gz
- Upload date:
- Size: 213.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8c2f7e2ac5b5bfa3484ae42fad6913aeb7d58db9dc0fbe26220a46a7f378d9c |
| MD5 | 8df627bd74c58b81c92ea5db8fa1513a |
| BLAKE2b-256 | 8820b507d77381cd13727ac55df99179cbe17c1f5cdfb5f771a5022e67ac0661 |
Provenance
The following attestation bundles were made for puckling-0.4.0.tar.gz:
Publisher: publish.yml on Mazyod/puckling
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: puckling-0.4.0.tar.gz
- Subject digest: a8c2f7e2ac5b5bfa3484ae42fad6913aeb7d58db9dc0fbe26220a46a7f378d9c
- Sigstore transparency entry: 1439707948
- Sigstore integration time:
- Permalink: Mazyod/puckling@c25a55ef5dae1014dc5d5eea54c2e0c7ece0016d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Mazyod
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c25a55ef5dae1014dc5d5eea54c2e0c7ece0016d
- Trigger Event: workflow_dispatch
File details
Details for the file puckling-0.4.0-py3-none-any.whl.
File metadata
- Download URL: puckling-0.4.0-py3-none-any.whl
- Upload date:
- Size: 193.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2048074517bd8569553a11c54bd6fa0e61e7979472b000339808d9cbc331fac4 |
| MD5 | 1afc155191a9ae002f759822c676a392 |
| BLAKE2b-256 | 11e824541511c339c82b4859147a3339ee4c8fe83938fce3a7b127907abb0531 |
Provenance
The following attestation bundles were made for puckling-0.4.0-py3-none-any.whl:
Publisher: publish.yml on Mazyod/puckling
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: puckling-0.4.0-py3-none-any.whl
- Subject digest: 2048074517bd8569553a11c54bd6fa0e61e7979472b000339808d9cbc331fac4
- Sigstore transparency entry: 1439708038
- Sigstore integration time:
- Permalink: Mazyod/puckling@c25a55ef5dae1014dc5d5eea54c2e0c7ece0016d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Mazyod
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c25a55ef5dae1014dc5d5eea54c2e0c7ece0016d
- Trigger Event: workflow_dispatch