
A Python port of Facebook's Duckling: parse natural-language English and Arabic into structured numbers, dates, durations, money, and more.

Project description

Puckling


A Python port of Facebook Duckling, scoped to English and Arabic.


Puckling parses natural-language English and Arabic into structured values: numbers, ordinals, dates, durations, distances, temperatures, money, emails, URLs, phone numbers, and more.

Dependencies are minimal: the library relies on the third-party regex package for PCRE-compatible Unicode patterns.

Installation

pip install puckling

Usage

import datetime as dt

from puckling import (
    AmountOfMoneyValue,
    Context,
    Lang,
    Locale,
    Options,
    TimeValue,
    parse,
)

ctx = Context(
    reference_time=dt.datetime(2013, 2, 12, 4, 30, tzinfo=dt.UTC),
    locale=Locale(Lang.EN),
)

for entity in parse("I'll meet you tomorrow at 5pm for $50", ctx, Options()):
    match entity.value:
        case TimeValue() as tv:
            print(entity.body, "→", tv.start_datetime(), "to", tv.end_datetime())
        case AmountOfMoneyValue(value=amount, currency=currency):
            print(entity.body, "→", amount, currency)

Switch Locale(Lang.EN) to Locale(Lang.AR) for Arabic input.

entity.value is one of the typed *Value dataclasses (AmountOfMoneyValue, DistanceValue, TimeValue, …) re-exported from puckling. Narrow it with isinstance or match, or pass dims=("amount_of_money",) to restrict the parse to a single dimension. For TimeValue, start_datetime() and end_datetime() handle the instant, closed-interval, and open-interval cases without an isinstance ladder; either may return None when that side of the interval is unbounded.

Latent matches

Some inputs are entities only under a charitable reading. By default, parse("on the 5th", …) returns just an ordinal. With with_latent=True it also surfaces a time entity for "the 5th of the next month", flagged latent=True so callers can demote it:

parse("on the 5th", ctx, Options())                    # → [Ordinal(5)]
parse("on the 5th", ctx, Options(with_latent=True))    # → [Time(2013-03-05, latent=True)]
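
Demoting latent results can be as simple as a stable sort on the flag. The sketch below uses a hypothetical stand-in Entity dataclass, since exactly where puckling stores the latent flag is not shown here; adapt the attribute access to the real entity type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a parsed entity; only `latent` matters here."""
    body: str
    dim: str
    latent: bool = False


def demote_latent(entities: list[Entity]) -> list[Entity]:
    """Confident entities first, latent ones last.

    sorted() is stable, so the original order is kept within each group
    (False sorts before True).
    """
    return sorted(entities, key=lambda e: e.latent)


def drop_latent(entities: list[Entity]) -> list[Entity]:
    """Keep only the entities that were not flagged latent."""
    return [e for e in entities if not e.latent]
```

Sorting rather than filtering keeps the latent reading available as a fallback when nothing confident was found.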

Supported dimensions

Dimension         EN  AR  Notes
Numeral           ✅  ✅  Cardinals, decimals, Arabic-Indic digits
Ordinal           ✅  ✅
Time              ✅  ✅  Dates, clock times, holidays, intervals
Duration          ✅  ✅
Distance          ✅  ✅
Temperature       ✅  ✅
Quantity          ✅  ✅
Volume            ✅  ✅
AmountOfMoney     ✅  ✅
Email             ✅  ✅  Locale-agnostic
URL               ✅  ✅  Locale-agnostic
PhoneNumber       ✅  ✅
CreditCardNumber  ✅  ✅  Locale-agnostic

Locale-agnostic dimensions (Email, URL, CreditCardNumber) match the same way in both Lang.EN and Lang.AR contexts.

Architecture

Puckling mirrors Duckling's parsing model in idiomatic, functional Python:

  • Rules are pure data: Rule(name, pattern, prod).
  • Patterns are tuples of RegexItem (regex over source text) and PredicateItem (predicates over existing tokens).
  • Productions are pure functions tuple[Token, ...] → Token | None.
  • The engine is a saturating fixed-point parser that applies rules iteratively until no new tokens appear.
  • Resolution is context-aware (reference time, locale) and dimension-specific.

All public types are @dataclass(frozen=True, slots=True), so nothing is mutated after construction. Parsed entity values are structured runtime dataclasses whose fields you access directly. Cross-dimension references go through predicates (is_numeral, is_grain, …) rather than imports, so each rule file stays independent.

Engine budgets

The saturating fixed-point parser is bounded by three caps to prevent runaway parses on pathological compositional inputs:

Options field      Default   Disable with
parse_timeout_ms   2000      None
max_tokens         10000     n/a
max_iterations     50        n/a

When any cap is hit, the engine returns the tokens it has accumulated so far (a valid, possibly partial parse). For offline corpus runs where you want unbounded analysis, pass Options(parse_timeout_ms=None).

Running scripts safely

Always wrap inline smoke tests in the timeout command so a runaway parse cannot hang the calling shell:

timeout 5 uv run python -c "
from puckling import parse, Context, Locale, Lang, Options
import datetime as dt
ctx = Context(reference_time=dt.datetime.now(dt.UTC), locale=Locale(Lang.EN))
print(parse('tomorrow at 5pm', ctx, Options()))
"

The engine's budget should suffice by itself; the shell-level timeout is belt-and-suspenders against any future engine path that bypasses the budget check.

Development

  • Requires Python 3.13+.
  • Requires uv for dev dependencies.

uv sync --all-extras
uv run pytest

Adding a dimension or locale

To port a Duckling rule file, add:

src/puckling/dimensions/<dim>/<lang>/__init__.py
src/puckling/dimensions/<dim>/<lang>/rules.py     # exports RULES: tuple[Rule, ...]
src/puckling/dimensions/<dim>/<lang>/corpus.py    # exports CORPUS: tuple[Example, ...]
tests/dimensions/test_<dim>_<lang>.py

The registry auto-discovers any <dim>/<lang>/rules.py that exports RULES, so there is no central registration list to update.
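
One way such auto-discovery can work is a glob over the dimensions directory plus importlib. This is a sketch under assumptions, not puckling's registry: discover_rules and the module names it assigns are hypothetical. Loading by file path also breaks relative imports inside rules.py, so the real registry may well import modules through the package instead.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def _load(path: Path, name: str) -> ModuleType:
    """Import a single Python file as a module (not added to sys.modules)."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover_rules(dimensions_root: Path) -> dict[tuple[str, str], tuple]:
    """Hypothetical: map (dim, lang) -> RULES for each <dim>/<lang>/rules.py."""
    registry: dict[tuple[str, str], tuple] = {}
    for path in sorted(dimensions_root.glob("*/*/rules.py")):
        dim, lang = path.parent.parent.name, path.parent.name
        module = _load(path, f"_discovered_{dim}_{lang}_rules")
        rules = getattr(module, "RULES", None)
        if isinstance(rules, tuple):  # skip modules that don't export RULES
            registry[(dim, lang)] = rules
    return registry
```

Because discovery keys off the directory layout alone, adding the files listed above is the whole registration step.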

License

Apache-2.0, mirroring upstream Duckling.

Download files

Source Distribution

puckling-0.1.2.tar.gz (161.1 kB)

Uploaded Source

Built Distribution

puckling-0.1.2-py3-none-any.whl (181.5 kB)

Uploaded Python 3

File details

Details for the file puckling-0.1.2.tar.gz.

File metadata

  • Download URL: puckling-0.1.2.tar.gz
  • Upload date:
  • Size: 161.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for puckling-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8e4422e08b77f23eed871d29a73dd2ba109dc3523167a09d4cb84eb447c6a09b
MD5 550e04d629021741fdc98fb21ea4ea77
BLAKE2b-256 397cc54e0ca4139c3d449e5323b89cd617a59209bf988ac28bbcdd87a2ff037d
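
To check a downloaded file against the SHA256 digest listed above, hashing it in chunks with the standard library is enough (pip's --require-hashes mode performs an equivalent check at install time). sha256_of and matches_digest are illustrative helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, call matches_digest(Path("puckling-0.1.2.tar.gz"), "8e4422e08b77f23eed871d29a73dd2ba109dc3523167a09d4cb84eb447c6a09b").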

Provenance

The following attestation bundles were made for puckling-0.1.2.tar.gz:

Publisher: publish.yml on Mazyod/puckling

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file puckling-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: puckling-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 181.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for puckling-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8a1faf449864c69a7e4cee6ed975ab5715322f4df83e70f7a729ade845836a36
MD5 7a37dea1f3cc02327ed3c429981d9af2
BLAKE2b-256 1fcab1db82db5b2eb3a9cf561ac48af0c334ff1b16b19534625a4ddca96a3ad3

Provenance

The following attestation bundles were made for puckling-0.1.2-py3-none-any.whl:

Publisher: publish.yml on Mazyod/puckling

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
