Compiler front-end generator: grammar source to canonical JSON tables plus C/C++/Python/Lua drivers
uplox
Compiler front-end generator. Consumes a grammar specification and emits:
- A canonical JSON bundle describing the lexer DFA, parser tables, AST schema, and hook points.
- Driver skeletons in C, C++, Python, and Lua that load (or embed) those tables and run the front-end.
The JSON bundle is the contract. Backends are independent and replaceable. uplox itself never emits target-language code from grammar source directly — it always goes through the JSON.
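As a rough illustration of why a canonical serialization makes the bundle usable as a contract, the sketch below serializes a dictionary deterministically. It assumes "canonical" means sorted keys and fixed separators; the bundle keys shown (`lexer`, `parser`, `schema`) are hypothetical placeholders and not the real uplox schema, and `canonical_dumps` is not part of uplox.

```python
import json


def canonical_dumps(bundle: dict) -> str:
    """Serialize with sorted keys and fixed separators.

    Two bundles with equal content then produce identical text,
    regardless of the order in which keys were inserted.
    """
    return json.dumps(bundle, sort_keys=True, separators=(",", ":"))


# Hypothetical bundle shapes, for illustration only.
a = {"lexer": {"states": 3}, "parser": {"actions": []}, "schema": 1}
b = {"schema": 1, "parser": {"actions": []}, "lexer": {"states": 3}}

# Same content, different insertion order, identical serialized form.
assert canonical_dumps(a) == canonical_dumps(b)
```

With a deterministic form like this, a backend can compare or hash bundles byte-for-byte instead of parsing them first.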
Status
v2.0.0 — breaking change to the .uplox grammar source format:
single-quote token literals ('(' not "("), <name> for every
non-terminal reference (LHS and RHS), and a new %keyword_prefix /
%keywords shortcut that collapses KW = "KW" boilerplate into a
whitespace-separated list. v1 grammars do not parse under the new
reader; all bundled examples and the self-host grammar were ported in
this release. See docs/grammar_format.md
for the current syntax and CHANGELOG.md for the full
1.x → 2.0.0 history.
Goals
- Replace the hand-written front-ends in the uc* compiler family (uc80, uc386, uplm80, uada80, ucow, …) with a single generator.
- Re-entrant by construction: every backend keeps all parser state in a uplox_<grammar>_ctx struct/object and prefixes generated symbols by grammar name. Two front-ends produced by uplox can link into the same binary without symbol collisions.
- Pluggable hooks (pre-shift / pre-reduce / post-reduce / on-error) for context-sensitive concerns like name resolution, scoped symbol tables, and error recovery, without leaking those concerns into the grammar.
- Lexer feedback for grammars that need it (the C typedef-name hack and friends) via a runtime token-filter callback paired with the TypedefTracker helper.
Module layout
src/uplox/
    spec/      grammar source reader, IR
    lex/       regex → NFA → DFA construction, minimization
    parse/     LR(1) item-set construction, table builder
    glr/       GLR extension
    ast/       node schema, default builder actions
    hooks/     pluggable parse-time callbacks (TypedefTracker, ScopedNameTable, …)
    tables/    JSON schema, canonical serializer
    gen/
        c/     C backend (re-entrant, context-struct based)
        cpp/   C++ backend (class per grammar, namespaced)
        py/    Python backend (embeds bundle, reuses runtime)
        lua/   Lua backend (single 5.3+ module per grammar)
    cli/       `uplox` command
tests/
examples/      .uplox grammars (calc, plm_subset, plm_full, c_subset, uplox_self, …)
docs/          DSL spec, per-backend notes
Phasing
All nine phases of the original plan landed in v1.0.0. The v1.1.0 polish round closed the items the v1.0.0 release notes called out as deferred at the cross-language level (token-filter ABI, post-reduce hook, GLR symmetry, self-host). v1.2.0 closed the post-1.0 deferred-grammar list for c_subset and lifted the action-body carve-out from the self-host grammar.
| Phase | Deliverable | Shipped |
|---|---|---|
| 1 | Repo skeleton, pyproject, CI, license, grammar source format frozen | 1.0.0 |
| 2 | Lexer pipeline: regex → NFA → DFA → JSON, Python driver, tests | 1.0.0 |
| 3 | LR(1) parser builder, conflict reporting, JSON, Python driver | 1.0.0 |
| 4 | AST schema + default builder + hooks framework | 1.0.0 |
| 5 | Port the smallest uc* front-end as the first real grammar (plm_subset) | 1.0.0 |
| 6 | GLR extension | 1.0.0 |
| 7 | C backend (re-entrant) — the highest-value target | 1.0.0 |
| 8 | C++ and Lua backends | 1.0.0 |
| 9 | Port remaining uc* front-ends; declare v1 | 1.0.0 |
| — | C / C++ / Lua: token-filter ABI + post-reduce hook | 1.1.0 |
| — | Self-host bootstrap (uplox_self.uplox) | 1.1.0 |
| — | LR(1) build perf (~30% on c_subset) | 1.1.0 |
| — | c_subset: variadic, multi-line #define, bit-fields | 1.2.0 |
| — | c_subset: designated initializers, compound literals | 1.2.0 |
| — | c_subset: full abstract declarators, _Generic, sizeof(type) | 1.2.0 |
| — | Lexer %balanced= for non-regular tokens; action bodies in uplox_self | 1.2.0 |
| — | %balanced= parity in JSON bundle + C / C++ / Lua / emitted-Python | 1.3.0 |
| — | Parse-error diagnostics (expected-token list, EOI rendering) parity | 1.4.0 |
| — | DSL rework: <name> non-terminals, '…' literals, %keywords | 2.0.0 |
Install
pip install uplox # PyPI distribution name
The PyPI distribution is uplox because the bare uplox name on PyPI
is already taken by an unrelated matplotlib helper; the GitHub repo
(avwohl/uplox) was renamed to
match. The Python module, the CLI binary, and the .uplox grammar
format all keep the original uplox name — pip install uplox
installs a top-level import uplox and a uplox command.
Quick start
uplox version # uplox 2.0.0 (schema 1)
uplox build examples/calc.uplox -o calc.json # grammar -> bundle
uplox check examples/calc.uplox # build + report conflicts
uplox parse calc.json input.txt # parse a file
uplox emit calc.json --target=c --out=gen # emit C driver
uplox emit calc.json --target=cpp --out=gen # emit C++ driver
uplox emit calc.json --target=lua --out=gen # emit Lua driver
uplox emit calc.json --target=py --out=gen # emit Python driver
uplox build accepts --lex-only to omit the parse table when a host
only needs the lexer; uplox parse accepts --glr to use the GLR
runtime when the bundle preserves conflicts.
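For scripted builds, the commands above can be assembled programmatically. The sketch below only constructs argument lists from the subcommands and flags shown in this section; the helper name `pipeline_commands` is an assumption made for illustration and is not part of uplox.

```python
import shlex

# Targets listed in the quick start above.
TARGETS = ("c", "cpp", "lua", "py")


def pipeline_commands(grammar: str, bundle: str, out_dir: str, targets=TARGETS):
    """Return argv lists for: check, build, then one emit per target."""
    cmds = [
        ["uplox", "check", grammar],                # build + report conflicts
        ["uplox", "build", grammar, "-o", bundle],  # grammar -> bundle
    ]
    for target in targets:
        cmds.append(
            ["uplox", "emit", bundle, f"--target={target}", f"--out={out_dir}"]
        )
    return cmds


if __name__ == "__main__":
    # Print the commands; to run them, pass each list to
    # subprocess.run(cmd, check=True) instead.
    for cmd in pipeline_commands("examples/calc.uplox", "calc.json", "gen"):
        print(shlex.join(cmd))
```

Keeping each command as a list rather than a shell string avoids quoting problems when grammar paths contain spaces.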
Lexer feedback (typedef-name and friends)
Some grammars need the parser to influence what the lexer returns —
the canonical example is C's typedef foo; followed by foo x;,
where foo lexes as a different terminal in the second position than
the first. uplox handles this with a token filter callback the host
installs alongside its post-reduce hook:
- The post-reduce hook fires after every reduction; the host updates whatever state it needs (a typedef set, a scope stack, …).
- The token filter receives every freshly fetched lookahead (and re-runs after every reduce) and returns the (possibly rewritten) terminal kind.
Both callbacks are exposed in all four backends. The Python runtime
also ships uplox.hooks.TypedefTracker, a turnkey helper that
implements the full classical hack on top of these primitives.
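The interplay of the two callbacks can be sketched in a few lines of standalone Python. This is a toy model and not the uplox API: the names `TypedefState`, `post_reduce`, `token_filter`, the rule name `typedef_decl`, and the terminal kinds `IDENT` and `TYPE_NAME` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TypedefState:
    """Host-owned state shared by both callbacks."""
    names: set = field(default_factory=set)


def post_reduce(state: TypedefState, rule: str, children: list) -> None:
    """Hypothetical post-reduce hook: remember names declared by a typedef."""
    if rule == "typedef_decl":
        # Assume the declared name is the last child of the reduction.
        state.names.add(children[-1])


def token_filter(state: TypedefState, kind: str, text: str) -> str:
    """Hypothetical token filter: rewrite known typedef names to TYPE_NAME."""
    if kind == "IDENT" and text in state.names:
        return "TYPE_NAME"
    return kind


state = TypedefState()
# Before the typedef is reduced, foo is an ordinary identifier.
assert token_filter(state, "IDENT", "foo") == "IDENT"
# The typedef declaration reduces; the hook records the new type name.
post_reduce(state, "typedef_decl", ["typedef", "int", "foo"])
# The same lookahead, filtered again after the reduce, now changes kind.
assert token_filter(state, "IDENT", "foo") == "TYPE_NAME"
```

The last two steps show why the filter is re-run after every reduce: a lookahead fetched before the reduction may need a different terminal kind once the hook has updated the state.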
Bundled example grammars
- calc: arithmetic expressions, the smoke test.
- ambig_expr: classically ambiguous, exercises GLR.
- scoped: block-scoped IDs, exercises hooks.
- plm_subset: the first real grammar (Phase 5). Calc-style PL/M with declarations, IF/THEN/ELSE, DO blocks.
- plm_full: extends plm_subset to the constructs Digital Research's BDOS uses (LITERALLY, INITIAL, AT, BASED, iterative DO TO/BY, DO CASE, structures). Parses the first 162 lines of the real bdos.plm source end-to-end.
- c_subset: a meaningful subset of C: function definitions, full statement repertoire (if/else/while/for/do-while/switch/break/continue/goto/return), expressions with C precedence, struct/union/enum, casts, the typedef-name lexer hack, and a preprocessor skip. Parses 21 of uc80's example C programs (vendored as fixtures).
- uplox_self: the .uplox DSL described in itself. Parses every other example grammar in this list, including its own definition, action bodies and all (the lexer's %balanced= extension matches { ... } runs by counting nested braces).
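The brace counting mentioned for uplox_self is what takes a token beyond what a regular expression can match. A minimal sketch of the idea follows; it is not the uplox implementation, and the function name `match_balanced` is an assumption. This toy version also ignores braces inside string literals or comments, which a real lexer may need to handle.

```python
def match_balanced(text: str, start: int, open_ch: str = "{", close_ch: str = "}") -> int:
    """Return the index just past the delimiter that closes text[start].

    Raises ValueError if text[start] is not an opening delimiter or
    if the run never returns to depth zero.
    """
    if start >= len(text) or text[start] != open_ch:
        raise ValueError("expected opening delimiter at start")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1  # one past the matching close
    raise ValueError("unbalanced delimiters")


src = "x { a { b } c } y"
begin = src.index("{")
end = match_balanced(src, begin)
assert src[begin:end] == "{ a { b } c }"
```

The depth counter is the only state required, so the scan runs in a single pass over the input.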
Documentation
- docs/grammar_format.md: the .uplox DSL spec.
- docs/c_backend.md: generated C API.
- docs/cpp_backend.md: generated C++ API.
- docs/lua_backend.md: generated Lua module.
- docs/py_backend.md: generated Python module.
- CHANGELOG.md: versioned release notes.
License
GPL-3.0-or-later. See LICENSE.