
Compiler front-end generator: grammar source to canonical JSON tables plus C/C++/Python/Lua drivers


uplox

Compiler front-end generator. Consumes a grammar specification and emits:

  1. A canonical JSON bundle describing the lexer DFA, parser tables, AST schema, and hook points.
  2. Driver skeletons in C, C++, Python, and Lua that load (or embed) those tables and run the front-end.

The JSON bundle is the contract. Backends are independent and replaceable. uplox itself never emits target-language code from grammar source directly — it always goes through the JSON.
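Because the bundle is the contract between the generator and every backend, it helps if its bytes depend only on its content, so that two builds of the same grammar diff cleanly. The sketch below shows one common way to get a deterministic JSON encoding in Python: sort object keys and fix the separators. It assumes that "canonical" means exactly that. The real serializer in tables/ may apply different or additional rules, and `canonical_dumps` / `bundle_digest` are hypothetical helpers, not uplox's API.

```python
import hashlib
import json
from typing import Any


def canonical_dumps(bundle: Any) -> str:
    """Serialize a bundle deterministically (hypothetical helper, not uplox's API).

    Assumed rules: object keys sorted, no insignificant whitespace,
    non-ASCII characters kept as UTF-8 rather than ASCII-escaped.
    """
    return json.dumps(bundle, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def bundle_digest(bundle: Any) -> str:
    """SHA-256 of the canonical form: equal content gives an equal digest."""
    return hashlib.sha256(canonical_dumps(bundle).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same content, different key insertion order (the section names are made up).
    a = {"lexer": {"start": 0}, "parser": {"states": 5}}
    b = {"parser": {"states": 5}, "lexer": {"start": 0}}
    print(canonical_dumps(a))
    print(bundle_digest(a) == bundle_digest(b))
```

Sorting keys is what makes insertion order irrelevant; the compact separators remove the other common source of byte-level noise.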

Status

v2.0.0 is a breaking change to the .uplox grammar source format:

  • token literals use single quotes ('(' not "(");
  • every non-terminal reference, on both the LHS and the RHS, is written <name>;
  • a new %keyword_prefix / %keywords shortcut collapses KW = "KW" boilerplate into a whitespace-separated list.

v1 grammars do not parse under the new reader; all bundled examples and the self-host grammar were ported in this release. See docs/grammar_format.md for the current syntax and CHANGELOG.md for the full 1.x → 2.0.0 history.
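To make the %keywords shortcut concrete: instead of one KW = "KW" definition per keyword, the grammar lists the keywords once. The sketch below models that expansion. It assumes the %keyword_prefix value is prepended to each token name and that each literal is the keyword text itself; the authoritative rules are in docs/grammar_format.md. The function name `expand_keywords` and the duplicate check are hypothetical choices made for this sketch.

```python
from typing import List, Tuple


def expand_keywords(words: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Expand a whitespace-separated keyword list into (token_name, literal) pairs.

    Hypothetical model of %keyword_prefix / %keywords: each keyword becomes one
    token whose literal is the keyword and whose name is prefix + keyword.
    """
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for word in words.split():  # split() with no argument handles any whitespace, including newlines
        if word in seen:
            # Sketch-only policy: reject duplicates instead of defining a token twice.
            raise ValueError(f"duplicate keyword: {word!r}")
        seen.add(word)
        pairs.append((prefix + word, word))
    return pairs


if __name__ == "__main__":
    for name, literal in expand_keywords("IF THEN ELSE DO END", prefix="KW_"):
        print(name, "->", literal)
```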

Goals

  • Replace the hand-written front-ends in the uc* compiler family (uc80, uc386, uplm80, uada80, ucow, …) with a single generator.
  • Re-entrant by construction: every backend keeps all parser state in a uplox_<grammar>_ctx struct/object and prefixes generated symbols with the grammar name, so two front-ends produced by uplox can link into the same binary without symbol collisions.
  • Pluggable hooks (pre-shift / pre-reduce / post-reduce / on-error) for context-sensitive concerns like name resolution, scoped symbol tables, and error recovery — without leaking those concerns into the grammar.
  • Lexer feedback for grammars that need it (the C typedef-name hack and friends) via a runtime token-filter callback paired with the TypedefTracker helper.
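The re-entrancy and hook goals can be illustrated together. Below is a minimal table-driven shift-reduce loop in which all mutable state (state stack, value stack, installed hooks) lives on a per-parse context object. The ACTION/GOTO tables are a hand-built SLR(1) table for the toy grammar E → E '+' n | n, not uplox output, and the names `CalcCtx`, `parse`, and the hook keys only mirror the hook points listed above; they are not the generated API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hand-built SLR(1) tables for the toy grammar (not uplox output):
#   0: S' -> E     1: E -> E '+' n     2: E -> n
# ACTION[state][terminal] is ("s", next_state), ("r", production) or ("acc", 0).
ACTION: Dict[int, Dict[str, Tuple[str, int]]] = {
    0: {"n": ("s", 2)},
    1: {"+": ("s", 3), "$": ("acc", 0)},
    2: {"+": ("r", 2), "$": ("r", 2)},
    3: {"n": ("s", 4)},
    4: {"+": ("r", 1), "$": ("r", 1)},
}
GOTO: Dict[int, Dict[str, int]] = {0: {"E": 1}}
PRODS: Dict[int, Tuple[str, int]] = {1: ("E", 3), 2: ("E", 1)}  # production -> (lhs, rhs length)


@dataclass
class CalcCtx:
    """All mutable parse state (hypothetical analogue of uplox_<grammar>_ctx)."""
    states: List[int] = field(default_factory=lambda: [0])
    values: List[int] = field(default_factory=list)
    hooks: Dict[str, Callable[..., None]] = field(default_factory=dict)

    def fire(self, name: str, *args) -> None:
        hook = self.hooks.get(name)
        if hook is not None:
            hook(*args)


def parse(ctx: CalcCtx, tokens: List[Tuple[str, int]]) -> Optional[int]:
    """Shift-reduce loop over (kind, value) tokens terminated by ("$", 0)."""
    pos = 0
    while True:
        kind, value = tokens[pos]
        act = ACTION[ctx.states[-1]].get(kind)
        if act is None:
            ctx.fire("on_error", kind, pos)
            return None
        op, arg = act
        if op == "s":
            ctx.fire("pre_shift", kind, value)
            ctx.states.append(arg)
            ctx.values.append(value)
            pos += 1
        elif op == "r":
            lhs, n = PRODS[arg]
            ctx.fire("pre_reduce", arg)
            rhs = ctx.values[-n:]
            del ctx.states[-n:]
            del ctx.values[-n:]
            # Semantic actions: production 1 adds its operands, production 2 passes n through.
            result = rhs[0] + rhs[2] if arg == 1 else rhs[0]
            ctx.states.append(GOTO[ctx.states[-1]][lhs])
            ctx.values.append(result)
            ctx.fire("post_reduce", arg, result)
        else:  # "acc"
            return ctx.values[-1]


if __name__ == "__main__":
    trace: List[Tuple[int, int]] = []
    ctx = CalcCtx(hooks={"post_reduce": lambda prod, val: trace.append((prod, val))})
    tokens = [("n", 1), ("+", 0), ("n", 2), ("+", 0), ("n", 4), ("$", 0)]
    print(parse(ctx, tokens), trace)
```

Because the tables are read-only module constants and every mutable field lives on `CalcCtx`, separate contexts never observe each other's state. The other half of the goal, prefixing generated symbols with the grammar name, has no analogue in this Python sketch.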

Module layout

src/uplox/
  spec/        grammar source reader, IR
  lex/         regex → NFA → DFA construction, minimization
  parse/       LR(1) item-set construction, table builder
    glr/       GLR extension
  ast/         node schema, default builder actions
  hooks/       pluggable parse-time callbacks (TypedefTracker, ScopedNameTable, …)
  tables/      JSON schema, canonical serializer
  gen/
    c/         C backend (re-entrant, context-struct based)
    cpp/       C++ backend (class per grammar, namespaced)
    py/        Python backend (embeds bundle, reuses runtime)
    lua/       Lua backend (single 5.3+ module per grammar)
  cli/         `uplox` command
tests/
examples/      .uplox grammars (calc, plm_subset, plm_full, c_subset, uplox_self, …)
docs/          DSL spec, per-backend notes
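The NFA → DFA step in lex/ is, in the textbook formulation, subset construction: each DFA state is the ε-closure of a set of NFA states. The sketch below runs that classic algorithm on a small hand-written ε-NFA for (a|b)*abb. It omits the regex → NFA step and minimization, and its dictionary-based layout is an assumption for illustration, not uplox's internal IR.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

EPS = ""  # label used for epsilon moves

# Hand-written ε-NFA for (a|b)*abb: NFA[(state, symbol)] = set of target states.
NFA: Dict[Tuple[int, str], Set[int]] = {
    (0, EPS): {1},
    (1, "a"): {1}, (1, "b"): {1},  # the (a|b)* loop
    (1, EPS): {2},
    (2, "a"): {3}, (3, "b"): {4}, (4, "b"): {5},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, {5}, ("a", "b")


def eps_closure(states: Set[int]) -> FrozenSet[int]:
    """All NFA states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def subset_construction():
    """Return (dfa, start, accepting); DFA states are numbered in discovery order."""
    start = eps_closure({NFA_START})
    ids: Dict[FrozenSet[int], int] = {start: 0}
    dfa: Dict[Tuple[int, str], int] = {}
    work = deque([start])
    while work:
        cur = work.popleft()
        for sym in ALPHABET:
            moved = {t for s in cur for t in NFA.get((s, sym), ())}
            if not moved:
                continue  # no entry means "go to the dead state"
            nxt = eps_closure(moved)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            dfa[(ids[cur], sym)] = ids[nxt]
    accepting = {i for subset, i in ids.items() if subset & NFA_ACCEPT}
    return dfa, 0, accepting


def dfa_accepts(dfa: Dict[Tuple[int, str], int], start: int, accepting: Set[int], text: str) -> bool:
    state = start
    for ch in text:
        if (state, ch) not in dfa:
            return False
        state = dfa[(state, ch)]
    return state in accepting
```

For this NFA the construction yields five DFA states, only one of them accepting; a minimization pass would then merge any states with identical future behavior.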

Phasing

All nine phases of the original plan landed in v1.0.0. The v1.1.0 polish round closed the cross-language items that the v1.0.0 release notes listed as deferred (token-filter ABI, post-reduce hook, GLR symmetry, self-host); v1.2.0 closed the post-1.0 list of deferred c_subset grammar features and lifted the action-body carve-out from the self-host grammar. v1.3.0 and v1.4.0 brought %balanced= and parse-error diagnostics to parity across backends, and v2.0.0 reworked the grammar DSL.

Phase  Deliverable                                                             Shipped
1      Repo skeleton, pyproject, CI, license, grammar source format frozen     1.0.0
2      Lexer pipeline: regex → NFA → DFA → JSON, Python driver, tests          1.0.0
3      LR(1) parser builder, conflict reporting, JSON, Python driver           1.0.0
4      AST schema + default builder + hooks framework                          1.0.0
5      Port the smallest uc* front-end as the first real grammar (plm_subset)  1.0.0
6      GLR extension                                                           1.0.0
7      C backend (re-entrant); the highest-value target                        1.0.0
8      C++ and Lua backends                                                    1.0.0
9      Port remaining uc* front-ends; declare v1                               1.0.0
-      C / C++ / Lua: token-filter ABI + post-reduce hook                      1.1.0
-      Self-host bootstrap (uplox_self.uplox)                                  1.1.0
-      LR(1) build perf (~30% on c_subset)                                     1.1.0
-      c_subset: variadic, multi-line #define, bit-fields                      1.2.0
-      c_subset: designated initializers, compound literals                    1.2.0
-      c_subset: full abstract declarators, _Generic, sizeof(type)             1.2.0
-      Lexer %balanced= for non-regular tokens; action bodies in uplox_self    1.2.0
-      %balanced= parity in JSON bundle + C / C++ / Lua / emitted-Python       1.3.0
-      Parse-error diagnostics (expected-token list, EOI rendering) parity     1.4.0
-      DSL rework: <name> non-terminals, '…' literals, %keywords               2.0.0

Install

pip install uplox        # PyPI distribution name

The PyPI distribution is uplox because the bare uplox name on PyPI is already taken by an unrelated matplotlib helper; the GitHub repo (avwohl/uplox) was renamed to match. The Python module, the CLI binary, and the .uplox grammar format all keep the original uplox name — pip install uplox installs a top-level import uplox and a uplox command.

Quick start

uplox version                                  # uplox 2.0.0 (schema 1)
uplox build  examples/calc.uplox -o calc.json   # grammar -> bundle
uplox check  examples/calc.uplox                # build + report conflicts
uplox parse  calc.json input.txt               # parse a file
uplox emit   calc.json --target=c   --out=gen  # emit C   driver
uplox emit   calc.json --target=cpp --out=gen  # emit C++ driver
uplox emit   calc.json --target=lua --out=gen  # emit Lua driver
uplox emit   calc.json --target=py  --out=gen  # emit Python driver

uplox build accepts --lex-only to omit the parse table when a host only needs the lexer; uplox parse accepts --glr to use the GLR runtime when the bundle preserves conflicts.

Lexer feedback (typedef-name and friends)

Some grammars need the parser to influence what the lexer returns. The canonical example is C's typedef foo; followed by foo x;: foo lexes as a different terminal in the second statement (a typedef name) than in the first (an ordinary identifier). uplox handles this with a token filter callback that the host installs alongside its post-reduce hook:

  • The post-reduce hook fires after every reduction; the host updates whatever state it needs (a typedef set, a scope stack, …).
  • The token filter receives every freshly fetched lookahead (and re-runs after every reduce) and returns the (possibly rewritten) terminal kind.

Both callbacks are exposed in all four backends. The Python runtime also ships uplox.hooks.TypedefTracker, a turnkey helper that implements the full classical hack on top of these primitives.
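A minimal model of how the two callbacks cooperate: the post-reduce hook records a name when a typedef declaration is reduced, and the token filter rewrites IDENT to TYPE_NAME for any recorded name. This is not uplox.hooks.TypedefTracker and does not use its API; the class name, the production name typedef_declaration, the terminal names, and the callback signatures are all assumptions for illustration. A real tracker must also handle scopes (a local declaration can shadow a typedef name), which this sketch ignores.

```python
from typing import Set


class ToyTypedefSet:
    """Hypothetical stand-in: tracks typedef names and reclassifies identifiers."""

    def __init__(self) -> None:
        self.names: Set[str] = set()

    def post_reduce(self, production: str, declared_name: str) -> None:
        # Assumed hook shape: called after each reduction with the production
        # name and, for declarations, the declared identifier.
        if production == "typedef_declaration":
            self.names.add(declared_name)

    def token_filter(self, kind: str, text: str) -> str:
        # Called on every freshly fetched lookahead (and again after each reduce);
        # returns the possibly rewritten terminal kind.
        if kind == "IDENT" and text in self.names:
            return "TYPE_NAME"
        return kind


if __name__ == "__main__":
    t = ToyTypedefSet()
    print(t.token_filter("IDENT", "foo"))  # not yet a typedef name
    t.post_reduce("typedef_declaration", "foo")
    print(t.token_filter("IDENT", "foo"))  # now reclassified
```

Re-running the filter after every reduce matters because an LR parser typically reduces a declaration ending in ';' only after it has already fetched the following token as lookahead, so that token was classified before its name was registered.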

Bundled example grammars

  • calc — arithmetic expressions, the smoke test.
  • ambig_expr — classically ambiguous, exercises GLR.
  • scoped — block-scoped IDs, exercises hooks.
  • plm_subset — the first real grammar (Phase 5). Calc-style PL/M with declarations, IF/THEN/ELSE, DO blocks.
  • plm_full — extends plm_subset to the constructs Digital Research's BDOS uses (LITERALLY, INITIAL, AT, BASED, iterative DO TO/BY, DO CASE, structures). Parses the first 162 lines of the real bdos.plm source end-to-end.
  • c_subset — a meaningful subset of C: function definitions, full statement repertoire (if/else/while/for/do-while/switch/break/continue/goto/return), expressions with C precedence, struct/union/enum, casts, the typedef-name lexer hack, and a preprocessor skip. Parses 21 of uc80's example C programs (vendored as fixtures).
  • uplox_self — the .uplox DSL described in itself. Parses every other example grammar in this list, including its own definition — action bodies and all (the lexer's %balanced= extension matches { ... } runs by counting nested braces).
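Brace-balanced runs such as action bodies are not a regular language, which is why they need an extension rather than a plain regex. A depth-counting scanner of the kind the %balanced= description implies might look like the sketch below. The function name is hypothetical, and unlike a production lexer for action bodies it does not skip braces inside string or character literals or comments.

```python
from typing import Optional


def match_balanced(src: str, start: int, open_ch: str = "{", close_ch: str = "}") -> Optional[int]:
    """Return the index just past the close that balances src[start], or None.

    Counts nesting depth only; braces inside quotes or comments are NOT
    treated specially in this sketch.
    """
    if start >= len(src) or src[start] != open_ch:
        return None
    depth = 0
    for i in range(start, len(src)):
        ch = src[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # reached end of input with braces still open


if __name__ == "__main__":
    text = "{ a { b } c } rest"
    end = match_balanced(text, 0)
    print(repr(text[:end]))  # the whole body, nested braces included
```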

Documentation

License

GPL-3.0-or-later. See LICENSE.



Download files


Source Distribution

uplox-3.0.0.tar.gz (133.2 kB)

Uploaded Source

Built Distribution


uplox-3.0.0-py3-none-any.whl (104.0 kB)

Uploaded Python 3

File details

Details for the file uplox-3.0.0.tar.gz.

File metadata

  • Download URL: uplox-3.0.0.tar.gz
  • Upload date:
  • Size: 133.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uplox-3.0.0.tar.gz
Algorithm Hash digest
SHA256 18112da6d6d8a0119a938abce5a2ba4a406eb5659459d63bcf9e3e30bc9a1618
MD5 d65c121291f790c85af821a9dd7aee49
BLAKE2b-256 b5e5fee59b187b36adcbbe92241a693bcdc1c19d701bb5a658a82d048e244311


Provenance

The following attestation bundles were made for uplox-3.0.0.tar.gz:

Publisher: publish.yml on avwohl/uplox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file uplox-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: uplox-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 104.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for uplox-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9688d332566a041da596078eb3a54e8107e33d4907b1366ce62631e066d23464
MD5 f1d789ab6eb8748dfd204cc0eb641b2a
BLAKE2b-256 7b78700f085d8575c8c408ad40bf9d16790d4b24626f67aa56038ccb32e30d7c


Provenance

The following attestation bundles were made for uplox-3.0.0-py3-none-any.whl:

Publisher: publish.yml on avwohl/uplox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
