Deterministic semantic compiler core: SIR → RLang → ProofBundle with stable hashes.
Semantic Compiler
A compiler that bridges natural language descriptions to verifiable code through a semantic intermediate representation (SIR).
SIR v0.1 Scope and Version Boundary
This section defines the current stable scope of SIR v0.1, clarifies what is intentionally out-of-scope, explains the compiler maturity level, and outlines the roadmap for future versions.
Part A — What SIR v0.1 Supports (Current Stable Scope)
SIR v0.1 provides a formally defined, deterministic compilation pipeline for scalar integer decision rules. The current stable scope includes:
Input Semantics:
- Single scalar integer input (Int)
- Pipeline signature: pipeline main(Int) -> Int
Conditional Logic:
- Comparison operators: gt, ge, lt, le, eq, neq on scalar input
- Boolean combiners: all(), any(), not() with nested combinations
- Pure predicate expressions operating on the scalar input value
Output Semantics:
- Scalar constant integer outputs
- Deterministic value assignment via SetOutput steps
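The scope above can be illustrated with a small reference evaluator. This is a minimal sketch and not the compiler's own implementation: the names `eval_condition` and `run_pipeline` are hypothetical, the `{"combiner": ..., "terms": [...]}` shape is borrowed from the predicate examples elsewhere in this README, and treating `not` as a combiner with exactly one term is an assumption.

```python
import operator

# Comparison operators listed for SIR v0.1, mapped to Python equivalents.
COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "neq": operator.ne,
}


def eval_condition(cond, value):
    """Evaluate a predicate tree against the single scalar Int input."""
    if "op" in cond:
        # Shape: {"op": "gt", "args": ["value", 10]}
        _input_name, literal = cond["args"]
        return COMPARATORS[cond["op"]](value, literal)
    terms = [eval_condition(term, value) for term in cond["terms"]]
    combiner = cond["combiner"]
    if combiner == "all":
        return all(terms)
    if combiner == "any":
        return any(terms)
    if combiner == "not":
        # Assumption: "not" wraps exactly one term.
        (only,) = terms
        return not only
    raise ValueError(f"unsupported combiner: {combiner!r}")


def run_pipeline(sir, value):
    """Return the constant of the first SetOutput step that is reached."""
    for step in sir["steps"]:
        if step["type"] == "Decision":
            taken = eval_condition(step["condition"], value)
            for inner in step["then_steps" if taken else "else_steps"]:
                if inner["type"] == "SetOutput":
                    return inner["value"]
        elif step["type"] == "SetOutput":
            return step["value"]
    return None


example = {
    "type": "DecisionPipeline",
    "name": "main",
    "input_name": "value",
    "steps": [
        {
            "type": "Decision",
            "condition": {
                "combiner": "all",
                "terms": [
                    {"op": "gt", "args": ["value", 10]},
                    {"combiner": "not", "terms": [{"op": "eq", "args": ["value", 42]}]},
                ],
            },
            "then_steps": [{"type": "SetOutput", "value": 1}],
            "else_steps": [{"type": "SetOutput", "value": 0}],
        }
    ],
}

print(run_pipeline(example, 11))  # 1: 11 > 10 and 11 != 42
print(run_pipeline(example, 42))  # 0: the not(eq 42) term is false
```

The evaluator is pure: it reads only the SIR dict and the input value, which is the property the determinism guarantees rely on.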
Deterministic Pipeline Guarantees: The compilation pipeline enforces strict determinism at every stage:
Natural Language → SIR (scalar) → RLang → IR → Runtime → Proof → HMASTER
For any valid SIR v0.1 pipeline and input value, the following invariants hold:
- Same input → same SIR representation
- Same SIR → same RLang source code (byte-for-byte identical)
- Same RLang → same canonical IR
- Same IR → same execution trace (TRP)
- Same execution → same proof bundle (HMASTER, HRICH)
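One common way to get "same structure → same hash" is to hash a canonical serialization. The sketch below illustrates the idea only; the canonical form and hash function actually used for HMASTER and HRICH are not specified in this README, so the sorted-key JSON encoding, SHA-256, and the helper names `canonical_bytes` and `stable_hash` are all assumptions.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialize to JSON with sorted keys and fixed separators."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def stable_hash(obj):
    """SHA-256 hex digest of the canonical serialization (hypothetical helper)."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


a = {"op": "gt", "args": ["value", 10]}
b = {"args": ["value", 10], "op": "gt"}  # same content, different key order

# Key order does not affect the canonical bytes, so the digests match.
assert canonical_bytes(a) == canonical_bytes(b)
assert stable_hash(a) == stable_hash(b)
```

Because each stage's output is a pure function of the previous stage's canonical form, equality at one stage carries through to every later stage.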
Everything within this scope is fully supported, deterministic, and proof-producing. The compiler guarantees cryptographic verification compatibility with the RLang backend.
Part B — What is Out-of-Scope for SIR v0.1
The following language features are intentionally unimplemented in SIR v0.1. These are not bugs, but planned extensions for future versions:
Record Types and Multi-Field Inputs:
- Record types (e.g., {income: Int, age: Int})
- Field access operations (e.g., transaction.amount, user.age)
- Multi-field decision rules (e.g., "if income > 50000 and age < 30")
Output Type Limitations:
- String or enum outputs
- Record-type outputs
- Negative literal outputs (e.g., -1, -50)
Control Flow Limitations:
- Complex control flow beyond scalar Boolean guards
- Loops, pattern matching, or other advanced constructs
Error Messages Explained:
When the compiler encounters out-of-scope constructs, it produces clear error messages:
- "Attempted field access on non-record type Int": Occurs when SIR contains field references (e.g., "income", "age") but the pipeline input type is Int rather than a record type. This is correct behavior: SIR v0.1 enforces scalar-only semantics.
- "ParseError: Expected '(' ...": Occurs when SIR contains unsupported operators or syntax that cannot be lowered to RLang v0.2.x. The compiler correctly rejects constructs outside the v0.1 boundary.
These errors indicate that the compiler is enforcing version boundaries correctly, not that the system is broken.
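A pre-flight check can catch out-of-scope constructs before compilation is attempted. The following is a rough sketch under stated assumptions: `find_out_of_scope` and its helpers are hypothetical and not part of the package, and the messages it returns are illustrative rather than the compiler's actual error text.

```python
ALLOWED_OPS = {"gt", "ge", "lt", "le", "eq", "neq"}
ALLOWED_COMBINERS = {"all", "any", "not"}


def check_condition(cond, input_name, problems):
    """Collect scope violations found in a predicate tree."""
    if "op" in cond:
        if cond["op"] not in ALLOWED_OPS:
            problems.append(f"unsupported operator: {cond['op']}")
        ref = cond["args"][0]
        if ref != input_name:
            # v0.1 has a single scalar input, so any other name is a field reference.
            problems.append(f"field reference {ref!r} on scalar Int input")
    elif cond.get("combiner") in ALLOWED_COMBINERS:
        for term in cond["terms"]:
            check_condition(term, input_name, problems)
    else:
        problems.append(f"unsupported condition: {cond!r}")


def check_steps(steps, input_name, problems):
    """Collect scope violations found in a list of steps."""
    for step in steps:
        if step["type"] == "Decision":
            check_condition(step["condition"], input_name, problems)
            check_steps(step["then_steps"], input_name, problems)
            check_steps(step["else_steps"], input_name, problems)
        elif step["type"] == "SetOutput":
            value = step["value"]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"non-integer output: {value!r}")
            elif value < 0:
                problems.append(f"negative literal output: {value}")
        else:
            problems.append(f"unsupported step type: {step['type']}")


def find_out_of_scope(sir):
    """Return a list of human-readable v0.1 scope violations (empty if none)."""
    problems = []
    check_steps(sir["steps"], sir["input_name"], problems)
    return problems


rejected = {
    "type": "DecisionPipeline",
    "name": "main",
    "input_name": "value",
    "steps": [
        {
            "type": "Decision",
            "condition": {"op": "gt", "args": ["income", 50000]},
            "then_steps": [{"type": "SetOutput", "value": "young-high"}],
            "else_steps": [{"type": "SetOutput", "value": -1}],
        }
    ],
}

for problem in find_out_of_scope(rejected):
    print(problem)
```

Each reported problem corresponds to one of the out-of-scope categories listed in Part B: field access, non-integer outputs, and negative literal outputs.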
Part C — Compiler Maturity Level
The SIR v0.1 compiler is production-ready within its defined scope. The following guarantees are verified and stable:
Determinism:
- Compilation is deterministic: identical SIR produces identical RLang source code
- Execution is deterministic: identical inputs produce identical outputs
- Proof generation is deterministic: identical executions produce identical proof bundles
IR Canonicalization:
- IR canonicalization is stable and produces byte-for-byte identical JSON for identical programs
- Hash stability is guaranteed: same program → same HMASTER hash
TRP Logging:
- Trace of Reasoning Process (TRP) recording is stable and deterministic
- Branch-level and step-level traces are complete and canonical
Proof System:
- Proof bundles (HMASTER, H_IR, HRICH) are correct and match runtime execution
- Cryptographic verification is compatible with BoR (Blockchain of Reasoning) standards
Test Coverage:
- Determinism tests pass: same input → same output across multiple runs
- Hash stability tests pass: same program → same HMASTER across compiler versions
- Integration tests verify end-to-end pipeline correctness
Version Stack:
SIR v0.1 → RLang v0.2.x → IR v0.2.x → BoR Proof → HMASTER
The current version behaves like a formally defined v0.1 language with stable semantics, deterministic compilation, and cryptographic proof generation.
Part D — Roadmap to SIR v0.2 (Records)
SIR v0.2 will extend the language with record types and multi-field decision rules. This is a natural semantic extension, not a patch or workaround.
Planned Extensions:
SIR v0.2:
- Record input types: {field1: Type1, field2: Type2, ...}
- Field access in predicates: {"op": "gt", "args": ["income", 50000]}
- Multi-field Boolean combinations: {"combiner": "all", "terms": [pred1, pred2]} where predicates reference different fields
RLang v0.3:
- Record type declarations: Record { income: Int, age: Int }
- Field access expressions: __value.income, __value.age
- Record literal construction for outputs
Pipeline Lowering:
- SIR→RLang lowering will become multi-field aware
- Field references will be correctly mapped to RLang record access
- Type inference will propagate record types through the pipeline
Semantic Boundary: The semantic boundary between SIR v0.1 and v0.2 is clean: v0.1 handles scalar-only semantics, v0.2 adds structured data. This extension preserves all v0.1 guarantees while enabling richer decision rules.
Multi-field rules like "if income > 50000 and age < 30 then return 'young-high' else return 'other'" will become valid SIR v0.2 programs that compile to RLang v0.3 and produce deterministic proof bundles.
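To make the planned semantics concrete, here is a small sketch of how the rule above might evaluate against a record input. It is not an implementation of SIR v0.2, which does not exist yet: `eval_record_condition` and `classify` are hypothetical names, and the predicate shapes are taken from the planned extensions listed above.

```python
import operator

COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "neq": operator.ne,
}


def eval_record_condition(cond, record):
    """Evaluate a predicate tree whose leaves reference fields of a record."""
    if "op" in cond:
        field, literal = cond["args"]
        # The first argument names a record field instead of the scalar input.
        return COMPARATORS[cond["op"]](record[field], literal)
    terms = [eval_record_condition(term, record) for term in cond["terms"]]
    if cond["combiner"] == "all":
        return all(terms)
    if cond["combiner"] == "any":
        return any(terms)
    raise ValueError(f"unsupported combiner: {cond['combiner']!r}")


# "if income > 50000 and age < 30 then return 'young-high' else return 'other'"
rule = {
    "combiner": "all",
    "terms": [
        {"op": "gt", "args": ["income", 50000]},
        {"op": "lt", "args": ["age", 30]},
    ],
}


def classify(record):
    return "young-high" if eval_record_condition(rule, record) else "other"


print(classify({"income": 60000, "age": 25}))  # young-high
print(classify({"income": 60000, "age": 35}))  # other
```

The only change relative to scalar evaluation is the leaf lookup, which is why the v0.1 guarantees can carry over unchanged.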
Part E — Negative Literal Naming Fix (Future Enhancement)
Negative literal outputs (e.g., -1, -50) currently fail during RLang code generation due to function name generation constraints.
Root Cause:
The RLang emitter generates function names using the pattern ret_<value>. For negative values, this produces invalid identifiers like ret_-1, which violates RLang syntax rules (function names cannot contain hyphens in this position).
Planned Fix: The RLang emitter will be updated to map negative literals to valid function names:
- -1 → ret_neg_1
- -50 → ret_neg_50
- Positive values remain unchanged: 1 → ret_1
This is a pure naming fix in the code generation layer. It does not affect SIR semantics, IR structure, or proof generation. The fix will be implemented in the SIR→RLang lowering pipeline without breaking existing functionality.
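The mapping is simple enough to sketch directly. The function name `output_function_name` is hypothetical, and the real emitter may structure this differently; only the ret_<value> and ret_neg_<value> patterns come from the description above.

```python
def output_function_name(value):
    """Map an integer output literal to an identifier-safe function name."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an integer literal, got {value!r}")
    if value < 0:
        # Replace the leading hyphen with a "neg_" marker.
        return f"ret_neg_{-value}"
    return f"ret_{value}"


for literal in (-1, -50, 1):
    print(literal, "->", output_function_name(literal))
```

Since ret_neg_N is produced only for negative inputs and ret_N only for non-negative ones, distinct literals always map to distinct names.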
Part F — Semantic Boundary Diagram
The semantic boundary defines where natural language descriptions are converted into structured, verifiable code:
┌─────────────────────────────────────────────────────────────┐
│  Natural Language                                           │
│  "If value > 10 then return 1 else return 0"                │
└───────────────────────┬─────────────────────────────────────┘
                        │ LLM Bridge
                        ↓
┌─────────────────────────────────────────────────────────────┐
│  SIR v0.1 (Semantic Boundary)                               │
│  • Scalar Int input                                         │
│  • Comparison operators (gt, lt, ge, le, eq, neq)           │
│  • Boolean combiners (all, any, not)                        │
│  • Scalar Int output                                        │
└───────────────────────┬─────────────────────────────────────┘
                        │ Deterministic Compilation
                        ↓
┌─────────────────────────────────────────────────────────────┐
│  RLang v0.2.x                                               │
│  • Canonical source code                                    │
│  • Deterministic formatting                                 │
└───────────────────────┬─────────────────────────────────────┘
                        │ RLang Compiler (PyPI)
                        ↓
┌─────────────────────────────────────────────────────────────┐
│  IR v0.2.x                                                  │
│  • Canonical intermediate representation                    │
│  • Hash-stable serialization                                │
└───────────────────────┬─────────────────────────────────────┘
                        │ Runtime Execution
                        ↓
┌─────────────────────────────────────────────────────────────┐
│  Proof Bundle                                               │
│  • HMASTER (program IR hash)                                │
│  • HRICH (execution trace hash)                             │
│  • TRP (Trace of Reasoning Process)                         │
└─────────────────────────────────────────────────────────────┘
The semantic boundary (SIR v0.1) enforces strict type safety and deterministic semantics. Natural language that describes constructs outside this boundary (e.g., multi-field rules, string outputs) will be rejected with clear error messages until SIR v0.2 extends the boundary.
Architecture
TEXT (LLM output)
↓
SIR (Semantic IR) <-- validated, structured, semantic JSON/dict
↓
RLang source <-- compiled deterministically
↓
RLang Compiler (PyPI) <-- already published
↓
IR + Proof
Installation
Installing from PyPI
pip install semantic-compiler-core
Development Installation
- Install dependencies:
pip install -e .
- Create a .env file based on .env.example:
cp .env.example .env
- Add your OpenAI API key to .env:
OPENAI_API_KEY="your-actual-key-here"
Usage
Natural Language → SIR (requires LLM)
from semantic_compiler.llm_bridge import text_to_sir
# Convert natural language to SIR
sir = text_to_sir("If the user's age is greater than 18, approve the request.")
Quick Start (Deterministic Core)
The deterministic core library (semantic-compiler-core) provides a clean API for SIR → RLang → ProofBundle compilation without any LLM dependencies:
from semantic_compiler_core import compile_sir_to_proof

# A single decision rule: output 1 if value > 10, otherwise 0.
sir = {
    "type": "DecisionPipeline",
    "name": "main",
    "input_name": "value",
    "steps": [
        {
            "type": "Decision",
            "condition": {"op": "gt", "args": ["value", 10]},
            "then_steps": [{"type": "SetOutput", "value": 1}],
            "else_steps": [{"type": "SetOutput", "value": 0}],
        }
    ],
}

# input_value=10 does not satisfy value > 10, so the else branch is taken.
bundle = compile_sir_to_proof(sir, input_value=10)
print(bundle["hashes"]["HMASTER"])
This uses ONLY the deterministic path: SIR → RLang → ProofBundle → hashes.
No LLM calls are made in this flow.
Project Structure
- semantic_compiler/ - Main package
  - init_llm.py - LLM initialization using langchain-openai
  - llm_bridge.py - Natural language → SIR conversion
  - sir_model.py - SIR dataclasses (to be implemented)
  - sir_validator.py - SIR validation (to be implemented)
  - sir_to_rlang.py - SIR → RLang compiler (to be implemented)
  - rlang_runtime.py - Wrapper around PyPI rlang-compiler
- examples/ - Example inputs and outputs
- tests/ - Test suite
Development Status
This is the initial project skeleton. Next steps:
- Implement full SIR v0.1 dataclasses
- Complete SIR validation
- Implement SIR → RLang compiler