
A Python port of Facebook's Duckling: parse natural-language English and Arabic into structured numbers, dates, durations, money, and more.


Puckling


A Python port of Facebook Duckling, scoped to English and Arabic.


Puckling parses natural-language English and Arabic into structured values: numbers, ordinals, dates, durations, distances, temperatures, money, emails, URLs, phone numbers, and more.

Dependencies are minimal: the regex package supplies PCRE-compatible Unicode patterns.

Installation

pip install puckling

Usage

import datetime as dt

from puckling import (
    AmountOfMoneyValue,
    Context,
    Lang,
    Locale,
    Options,
    TimeValue,
    parse,
)

ctx = Context(
    reference_time=dt.datetime(2013, 2, 12, 4, 30, tzinfo=dt.UTC),
    locale=Locale(Lang.EN),
)

for entity in parse("I'll meet you tomorrow at 5pm for $50", ctx, Options()):
    match entity.value:
        case TimeValue() as tv:
            print(entity.body, "→", tv.start_datetime(), "to", tv.end_datetime())
        case AmountOfMoneyValue(value=amount, currency=currency):
            print(entity.body, "→", amount, currency)

Switch Locale(Lang.EN) to Locale(Lang.AR) for Arabic input.

entity.value is one of the typed *Value dataclasses (AmountOfMoneyValue, DistanceValue, TimeValue, …) re-exported from puckling. Narrow with isinstance, match, or by passing dims=("amount_of_money",) to filter the parse to a single dimension. For TimeValue, start_datetime() and end_datetime() cover the instant / closed-interval / open-interval cases without an isinstance ladder; either may be None for an unbounded side.

Latent matches

Some inputs qualify as entities only under a charitable reading. parse("on the 5th", …) returns just an ordinal by default. With with_latent=True it also surfaces a time entity for "the 5th of the next month", flagged latent=True so callers can demote it:

parse("on the 5th", ctx, Options())                    # → [Ordinal(5)]
parse("on the 5th", ctx, Options(with_latent=True))    # → [Time(2013-03-05, latent=True)]
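One way a caller might demote latent matches is a stable sort that moves them behind the non-latent ones. This is a sketch, not puckling's API: demote_latent and is_latent are hypothetical helpers, and because the text above does not say whether the latent flag lives on the entity or on its value, the code checks both places. The entities are stand-ins built with SimpleNamespace so the example runs without the library.

```python
from types import SimpleNamespace


def is_latent(entity) -> bool:
    """Assumption: the flag is on the entity or on entity.value; check both."""
    on_entity = getattr(entity, "latent", False)
    on_value = getattr(getattr(entity, "value", None), "latent", False)
    return bool(on_entity or on_value)


def demote_latent(entities):
    """Return entities with non-latent ones first, keeping relative order."""
    # sorted() is stable and False sorts before True.
    return sorted(entities, key=is_latent)


# Stand-in entities, not real puckling objects.
entities = [
    SimpleNamespace(body="the 5th", latent=True),
    SimpleNamespace(body="5th", latent=False),
]
print([e.body for e in demote_latent(entities)])
```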

Supported dimensions

Dimension          EN   AR   Notes
Numeral            ✅   ✅   Cardinals, decimals, Arabic-Indic digits
Ordinal            ✅   ✅
Time               ✅   ✅   Dates, clock times, holidays, intervals
Duration           ✅   ✅
Distance           ✅   ✅
Temperature        ✅   ✅
Quantity           ✅   ✅
Volume             ✅   ✅
AmountOfMoney      ✅   ✅
Email              ✅   ✅   Locale-agnostic
URL                ✅   ✅   Locale-agnostic
PhoneNumber        ✅   ✅
CreditCardNumber   ✅   ✅   Locale-agnostic

Locale-agnostic dimensions (Email, URL, CreditCardNumber) match in both Lang.EN and Lang.AR contexts.

Architecture

Puckling mirrors Duckling's parsing model in idiomatic, functional Python:

  • Rules are pure data: Rule(name, pattern, prod).
  • Patterns are tuples of RegexItem (regex over source text) and PredicateItem (predicates over existing tokens).
  • Productions are pure functions tuple[Token, ...] → Token | None.
  • The engine is a saturating fixed-point parser that applies rules iteratively until no new tokens appear.
  • Resolution is context-aware (reference time, locale) and dimension-specific.

All public types are @dataclass(frozen=True, slots=True) — no mutation. Parsed entity values are structured runtime dataclasses; access fields directly. Cross-dimension references go through predicates (is_numeral, is_grain, …), never imports, so each rule file stays independent.

Engine budgets

The saturating fixed-point parser is bounded by three caps to prevent runaway parses on pathological compositional inputs:

Options field      Default   Disable with
parse_timeout_ms   2000      None
max_tokens         10000     n/a
max_iterations     50        n/a

When any cap is hit, the engine returns the tokens it has accumulated so far (a valid, possibly partial parse). For offline corpus runs where you want unbounded analysis, pass Options(parse_timeout_ms=None).
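How three caps can bound a fixed-point loop while still returning a usable partial result is sketched below. The saturate function and its step callback are hypothetical; only the parameter names and defaults come from the table above, and where puckling actually checks its budgets is not stated here.

```python
import time


def saturate(seed, step, *, parse_timeout_ms=2000, max_tokens=10_000, max_iterations=50):
    """Grow a token set with step() until it stops changing or a cap is hit."""
    tokens = set(seed)
    # None disables the wall-clock budget, mirroring parse_timeout_ms=None.
    deadline = None if parse_timeout_ms is None else time.monotonic() + parse_timeout_ms / 1000
    for _ in range(max_iterations):
        if deadline is not None and time.monotonic() >= deadline:
            break  # time budget exhausted: return what we have
        new = set(step(tokens)) - tokens
        if not new:
            break  # fixed point reached
        tokens |= new
        if len(tokens) >= max_tokens:
            break  # token budget exhausted
    return tokens


def grow(tokens):
    """A pathological step that always invents one more token."""
    return {max(tokens) + 1}


print(len(saturate({0}, grow)))                  # stopped by max_iterations
print(len(saturate({0}, grow, max_tokens=10)))   # stopped by max_tokens
```

Every exit path returns the accumulated set, so a capped run yields a smaller result rather than an error.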

Running scripts safely

Inline smoke tests should always be wrapped in the shell's timeout command so a runaway parse can't outlive the calling shell:

timeout 5 uv run python -c "
from puckling import parse, Context, Locale, Lang, Options
import datetime as dt
ctx = Context(reference_time=dt.datetime.now(dt.UTC), locale=Locale(Lang.EN))
print(parse('tomorrow at 5pm', ctx, Options()))
"

The engine's own budget should be enough on its own, but the shell-level timeout is belt-and-suspenders against any future engine path that bypasses the budget check.

Development

  • Requires Python 3.13+.
  • Requires uv for dev dependencies.

uv sync --all-extras
uv run pytest

Adding a dimension or locale

To port a Duckling rule file, add:

src/puckling/dimensions/<dim>/<lang>/__init__.py
src/puckling/dimensions/<dim>/<lang>/rules.py     # exports RULES: tuple[Rule, ...]
src/puckling/dimensions/<dim>/<lang>/corpus.py    # exports CORPUS: tuple[Example, ...]
tests/dimensions/test_<dim>_<lang>.py

The registry auto-discovers any <dim>/<lang>/rules.py exporting RULES. No central registration list to update.
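Auto-discovery of this kind can be built on pkgutil and importlib from the standard library. The sketch below is an assumption about the general technique, not puckling's registry code: discover_rules and the demo_dimensions package are hypothetical, and the demo writes a throwaway package into a temporary directory (with an __init__.py at every level) so it runs without the library installed.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_rules(package_name: str) -> dict[tuple[str, str], tuple]:
    """Collect RULES from every <package>.<dim>.<lang>.rules module."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        parts = info.name.split(".")
        if parts[-1] != "rules" or len(parts) < 3:
            continue
        module = importlib.import_module(info.name)
        rules = getattr(module, "RULES", None)
        if rules is not None:
            # Key by (dimension, language), taken from the module path.
            found[(parts[-3], parts[-2])] = tuple(rules)
    return found


# Build a throwaway package tree: demo_dimensions/numeral/en/rules.py
with tempfile.TemporaryDirectory() as tmp:
    lang_dir = Path(tmp) / "demo_dimensions" / "numeral" / "en"
    lang_dir.mkdir(parents=True)
    for directory in (lang_dir, lang_dir.parent, lang_dir.parent.parent):
        (directory / "__init__.py").write_text("")
    (lang_dir / "rules.py").write_text("RULES = ('integer', 'decimal')\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        registry = discover_rules("demo_dimensions")
    finally:
        sys.path.remove(tmp)

print(registry)
```

With this approach, dropping a new <dim>/<lang>/rules.py into the tree is enough for it to be picked up on the next import, which is why no central list is needed.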

License

Apache-2.0, mirroring upstream Duckling.
