
A Python port of Facebook's Duckling: parse natural-language English and Arabic into structured numbers, dates, durations, money, and more.


Puckling


A Python port of Facebook Duckling, scoped to English and Arabic.


Puckling parses natural-language English and Arabic into structured values: numbers, ordinals, dates, durations, distances, temperatures, money, emails, URLs, phone numbers, and more.

The library keeps dependencies minimal, relying on the regex package for PCRE-style Unicode patterns.

Installation

```bash
pip install puckling
```

Usage

```python
import datetime as dt

from puckling import (
    AmountOfMoneyValue,
    Context,
    Lang,
    Locale,
    Options,
    TimeValue,
    parse,
)

ctx = Context(
    reference_time=dt.datetime(2013, 2, 12, 4, 30, tzinfo=dt.UTC),
    locale=Locale(Lang.EN),
)

for entity in parse("I'll meet you tomorrow at 5pm for $50", ctx, Options()):
    match entity.value:
        case TimeValue() as tv:
            print(entity.body, "→", tv.start_datetime(), "to", tv.end_datetime())
        case AmountOfMoneyValue(value=amount, currency=currency):
            print(entity.body, "→", amount, currency)
```

Switch Locale(Lang.EN) to Locale(Lang.AR) for Arabic input.

entity.value is one of the typed *Value dataclasses (AmountOfMoneyValue, DistanceValue, TimeValue, …) re-exported from puckling. Narrow with isinstance, match, or by passing dims=("amount_of_money",) to filter the parse to a single dimension. For TimeValue, start_datetime() and end_datetime() cover the instant / closed-interval / open-interval cases without an isinstance ladder; either may be None for an unbounded side.

Latent matches

Some inputs are entities only under a charitable reading. By default, parse("on the 5th", …) returns just an ordinal. With with_latent=True the engine also considers the time reading "the 5th of the next month" and returns it flagged latent=True, so callers can demote it:

```python
parse("on the 5th", ctx, Options())                    # → [Ordinal(5)]
parse("on the 5th", ctx, Options(with_latent=True))    # → [Time(2013-03-05, latent=True)]
```

Supported dimensions

| Dimension | EN | AR | Notes |
|---|---|---|---|
| Numeral | ✅ | ✅ | Cardinals, decimals, Arabic-Indic digits |
| Ordinal | ✅ | ✅ | |
| Time | ✅ | ✅ | Dates, clock times, holidays, intervals |
| Duration | ✅ | ✅ | |
| Distance | ✅ | ✅ | |
| Temperature | ✅ | ✅ | |
| Quantity | ✅ | ✅ | |
| Volume | ✅ | ✅ | |
| AmountOfMoney | ✅ | ✅ | |
| Email | ✅ | ✅ | Locale-agnostic |
| URL | ✅ | ✅ | Locale-agnostic |
| PhoneNumber | ✅ | ✅ | |
| CreditCardNumber | ✅ | ✅ | Locale-agnostic |

Locale-agnostic dimensions (Email, URL, CreditCardNumber) match in both Lang.EN and Lang.AR contexts.

Architecture

Puckling mirrors Duckling's parsing model in idiomatic, functional Python:

  • Rules are pure data: Rule(name, pattern, prod).
  • Patterns are tuples of RegexItem (regex over source text) and PredicateItem (predicates over existing tokens).
  • Productions are pure functions tuple[Token, ...] → Token | None.
  • The engine is a saturating fixed-point parser that applies rules iteratively until no new tokens appear.
  • Resolution is context-aware (reference time, locale) and dimension-specific.

All public types are @dataclass(frozen=True, slots=True), so instances are immutable once constructed. Parsed entity values are structured runtime dataclasses; access their fields directly. Cross-dimension references go through predicates (is_numeral, is_grain, …) rather than imports, so each rule file stays independent.

Engine budgets

The saturating fixed-point parser is bounded by three caps to prevent runaway parses on pathological compositional inputs:

| Options field | Default | Disable with |
|---|---|---|
| parse_timeout_ms | 2000 | None |
| max_tokens | 10000 | n/a |
| max_iterations | 50 | n/a |

When any cap is hit, the engine returns the tokens it has accumulated so far (a valid, possibly partial parse). For offline corpus runs where you want unbounded analysis, pass Options(parse_timeout_ms=None).

Running scripts safely

Always wrap inline smoke tests in the timeout command, so a runaway parse is killed instead of hanging the calling shell:

```bash
timeout 5 uv run python -c "
from puckling import parse, Context, Locale, Lang, Options
import datetime as dt
ctx = Context(reference_time=dt.datetime.now(dt.UTC), locale=Locale(Lang.EN))
print(parse('tomorrow at 5pm', ctx, Options()))
"
```

The engine's budget should be enough by itself, but the shell-level timeout is a belt-and-suspenders guard against any future engine path that bypasses the budget check.

Development

  • Requires Python 3.13+.
  • Requires uv for dev dependencies.

```bash
uv sync --all-extras
uv run pytest
```

Adding a dimension or locale

To port a Duckling rule file, add:

```text
src/puckling/dimensions/<dim>/<lang>/__init__.py
src/puckling/dimensions/<dim>/<lang>/rules.py     # exports RULES: tuple[Rule, ...]
src/puckling/dimensions/<dim>/<lang>/corpus.py    # exports CORPUS: tuple[Example, ...]
tests/dimensions/test_<dim>_<lang>.py
```

The registry auto-discovers any <dim>/<lang>/rules.py that exports RULES, so there is no central registration list to update.
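
Auto-discovery of this kind can be done with pathlib and importlib. The sketch below walks a directory laid out as <dim>/<lang>/rules.py and collects each module's RULES tuple; it is an assumption about how such a registry could work, not Puckling's actual registry code, and it loads files by path rather than importing the installed puckling package.

```python
import importlib.util
from pathlib import Path


def discover_rules(root: Path) -> dict[tuple[str, str], tuple]:
    """Map (dim, lang) to the RULES tuple exported by root/<dim>/<lang>/rules.py."""
    registry: dict[tuple[str, str], tuple] = {}
    for rules_file in sorted(root.glob("*/*/rules.py")):
        lang_dir = rules_file.parent
        dim, lang = lang_dir.parent.name, lang_dir.name
        spec = importlib.util.spec_from_file_location(f"_rules_{dim}_{lang}", rules_file)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes rules.py in a fresh module
        rules = getattr(module, "RULES", None)
        if isinstance(rules, tuple):  # skip modules that do not export RULES
            registry[(dim, lang)] = rules
    return registry
```

Because discovery is driven by the directory layout, adding a new <dim>/<lang> folder is enough for it to be picked up.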

License

Apache-2.0, mirroring upstream Duckling.

Download files

Source Distribution

puckling-0.2.0.tar.gz (177.5 kB)

Uploaded Source

Built Distribution

puckling-0.2.0-py3-none-any.whl (183.0 kB)

Uploaded Python 3

File details

Details for the file puckling-0.2.0.tar.gz.

File metadata

  • Download URL: puckling-0.2.0.tar.gz
  • Size: 177.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for puckling-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc84fd76a163ae550c094791f0535271309cc9faa4bc9688f5e19221352c3318 |
| MD5 | dab83a102716e118962c4d3c47001e71 |
| BLAKE2b-256 | 3571673fb93103f2e7c6c10246fdfe4de26bd4e1bd1759fe7e5edcaa1bde02e6 |

Provenance

The following attestation bundles were made for puckling-0.2.0.tar.gz:

Publisher: publish.yml on Mazyod/puckling

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file puckling-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: puckling-0.2.0-py3-none-any.whl
  • Size: 183.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for puckling-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 770a7c8366b62745f3d0e209a7d7db7ea8fba35b1324567e6562369e6642bcc0 |
| MD5 | a8fb26ce2dab77167d5e51da381e280f |
| BLAKE2b-256 | 33bcec58ebdf0cadefec2e10096dc1d81af582bd45a37703ef2ae5707c6b0d69 |

Provenance

The following attestation bundles were made for puckling-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Mazyod/puckling

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
