Deterministic RLang compiler with cryptographic proof generation for BoR (Blockchain of Reasoning)
RLang Compiler — Deterministic Reasoning Pipeline with Cryptographic Proof Generation
A first-principles compiler that translates RLang source code into executable reasoning pipelines with cryptographic proof generation compatible with the BoR (Blockchain of Reasoning) system. This compiler provides bit-for-bit deterministic execution suitable for trustless verification and cryptographic auditing.
Installation: pip install rlang-compiler
Documentation: See docs/compiler_physics.md for formal specification
Playbook: See docs/compiler_expansion_playbook.md for extension guidelines
Quick Onboarding Guide (Start Here)
What RLang Is
RLang is a deterministic domain-specific language (DSL) designed for building verifiable reasoning pipelines. The compiler translates RLang source code into a canonical intermediate representation (IR) that serves as the "physics layer" for deterministic execution. Every program execution produces a cryptographically verifiable proof bundle compatible with the BoR (Blockchain of Reasoning) system, enabling trustless verification of computation results.
The compiler enforces three non-negotiable invariants: deterministic semantics (same input always produces same output), deterministic proof shape (same execution always produces same trace), and single-source specification (canonical representation ensures hash stability). These invariants are analogous to physical laws—they cannot be violated without breaking fundamental guarantees.
Installation and Setup
Install via PyPI:
pip install rlang-compiler
Install for local development:
git clone https://github.com/your-org/Compiler_implementation.git
cd Compiler_implementation
pip install -e .[dev,test]
./run_all.sh
Minimal Working Example
Create a file examples/basic.rlang:
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int {
inc
}
Compile and inspect output:
rlangc examples/basic.rlang --out out/basic.json
The output JSON contains the canonical IR representation of your program.
Proof Generation and Verification
Generate a proof bundle:
./verify_bundle.sh
Verify with BoR CLI:
borp verify-bundle --bundle out/rich_proof_bundle.json
These commands compile an RLang program, execute it with a provided input, generate a cryptographic proof bundle containing execution traces (TRP), and verify the bundle's integrity using BoR-compatible hashing (HMASTER, HRICH).
Python API Quickstart
from rlang.bor import run_program_with_proof
source = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""
bundle = run_program_with_proof(
source=source,
input_value=10,
fn_registry={"inc": lambda x: x + 1}
)
print("Output:", bundle.output_value) # 11
Determinism Demonstration (10-second test)
from rlang.bor import run_program_with_proof
import hashlib
import json
src = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""
def compute_hash():
    b = run_program_with_proof(src, 42, fn_registry={"inc": lambda x: x + 1})
    j = json.dumps(b.to_dict(), sort_keys=True)
    return hashlib.sha256(j.encode()).hexdigest()
h1 = compute_hash()
h2 = compute_hash()
assert h1 == h2 # Always true: deterministic execution
print("Determinism verified:", h1 == h2)
This works because RLang execution is purely functional and deterministic—same program and input always produce identical proof bundles, enabling cryptographic verification.
End-to-End Compiler & Proof Flow
RLang Source
|
v
[Parser] → [Resolver] → [Type Checker]
|
v
[IR Lowering] → [Canonical JSON]
|
v
[Execution Engine] → [Proof Bundle] → [HRICH]
Or as a Mermaid diagram:
flowchart LR
A[RLang Source] --> B[Parser]
B --> C[Resolver]
C --> D[Type Checker]
D --> E[IR Lowering]
E --> F[Canonical JSON]
F --> G[Execution & Proof Generation]
G --> H[HRICH Verification]
Where to Go Next
- Language Specification: See docs/language.md for complete RLang syntax and semantics
- Compiler Architecture: See Architecture Overview for component classification and modification rules
- Proof System Documentation: See docs/proof-system.md for BoR integration details
- Developer Workflows: See Extension Guidelines for adding new language features
- Tests and Golden Files: See Testing & Verification for test suite structure
Table of Contents
- First Principles: The Three Non-Negotiable Invariants
- Architecture Overview
- Language Semantics (Formal)
- IR Specification: The Physics Layer
- Canonicalization Specification
- Execution Semantics
- Proof System Architecture
- The Untouchable Core (Frozen Physics)
- Expandable Surfaces (Safe to Extend)
- Quick Start
- API Reference
- Testing & Verification
- Extension Guidelines
1. First Principles: The Three Non-Negotiable Invariants
The RLang compiler is built on three non-negotiable invariants that define the "physics layer" of deterministic computation. These invariants are analogous to physical laws—they cannot be violated without breaking fundamental guarantees.
Invariant 1: Deterministic Semantics Invariant
Formal Definition:
For any RLang program P and input value x, there exists a unique output value y such that:
Eval(P, x) = y
This must hold regardless of:
- Execution environment (OS, hardware, Python version)
- Execution time (today vs. tomorrow)
- Execution order (if multiple valid orders exist, they must be equivalent)
- Random number generators (none allowed)
- External state (none allowed)
Mathematical Properties:
- Functionality: ∀P, x. ∃!y. Eval(P, x) = y
- Idempotency: Eval(P, x) = Eval(P, x) (always)
- Compositionality: Eval(P₁; P₂, x) = Eval(P₂, Eval(P₁, x))
Violation Examples:
FORBIDDEN: Using time.time() in function registry
FORBIDDEN: Reading from /dev/urandom
FORBIDDEN: Non-deterministic iteration order
FORBIDDEN: Floating-point operations that vary by platform
ALLOWED: Pure mathematical operations
ALLOWED: Deterministic string operations
ALLOWED: Fixed-order list operations
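As a rough illustration of the difference between allowed and forbidden registry functions, the sketch below runs a function several times on the same input and compares results. The helper `is_repeatable` and the function `leaky_inc` are hypothetical names introduced here and are not part of the compiler. A check like this can only catch obvious nondeterminism; it cannot prove purity.

```python
from typing import Any, Callable


def is_repeatable(fn: Callable[[Any], Any], value: Any, runs: int = 5) -> bool:
    """Return True if fn(value) gives the same result on every run.

    Smoke test only: it can expose hidden state, but it cannot prove purity.
    """
    first = fn(value)
    return all(fn(value) == first for _ in range(runs - 1))


def inc(x: int) -> int:
    """Pure: the result depends only on the argument."""
    return x + 1


# Hidden mutable state stands in for time.time(), /dev/urandom, and similar.
_calls = {"n": 0}


def leaky_inc(x: int) -> int:
    """Impure: the result changes with every call."""
    _calls["n"] += 1
    return x + _calls["n"]


print(is_repeatable(inc, 41))        # True
print(is_repeatable(leaky_inc, 41))  # False
```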
Invariant 2: Deterministic Proof Shape Invariant
Formal Definition:
For any RLang program P and input value x, there exists a unique execution trace trace such that:
TRP(P, x) = trace
The trace must be:
- Complete: Every step execution is recorded
- Ordered: Steps appear in execution order
- Deterministic: Same execution → same trace
- Canonical: Trace structure is stable across serializations
Trace Structure (TRP v1):
trace = {
    "steps": [
        {
            "index": int,        # 0-based step index
            "step_name": str,    # Function name
            "template_id": str,  # Template reference
            "input": Any,        # Input snapshot
            "output": Any        # Output snapshot
        },
        ...
    ],
    "branches": [
        {
            "index": int,        # IF step index
            "path": "then" | "else",
            "condition_value": bool
        },
        ...
    ]
}
Hash Invariants:
Hash(canonical(P)) = H_IR # Program IR hash
Hash(trace) = HRICH # Execution trace hash
Hash(H_IR | HRICH) = HMASTER # Master hash
Violation Examples:
FORBIDDEN: Recording steps in non-deterministic order
FORBIDDEN: Including timestamps in trace
FORBIDDEN: Non-deterministic trace serialization
FORBIDDEN: Omitting steps from trace
ALLOWED: Recording all steps in execution order
ALLOWED: Canonical JSON serialization
ALLOWED: Deterministic branch recording
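The three hash invariants above can be sketched with the standard library alone. This follows the equations of this section literally and is not the BoR implementation: the IR and trace values are hypothetical and heavily simplified, the `template_id` format is made up, and reading `H_IR | HRICH` as two hex digests joined by `|` is an assumption.

```python
import hashlib
import json
from typing import Any


def canonical(obj: Any) -> str:
    """Compact JSON with sorted keys, as in the canonicalization rules."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_chain(program_ir: Any, trace: Any) -> dict:
    h_ir = sha256_hex(canonical(program_ir))
    h_rich = sha256_hex(canonical(trace))
    # Assumption: "H_IR | HRICH" means the two hex digests joined by "|".
    h_master = sha256_hex(h_ir + "|" + h_rich)
    return {"H_IR": h_ir, "HRICH": h_rich, "HMASTER": h_master}


# Hypothetical, heavily simplified IR and trace for a one-step pipeline.
program_ir = {"pipeline": "main", "steps": ["inc"]}
trace = {
    "steps": [
        {"index": 0, "step_name": "inc", "template_id": "fn:inc", "input": 10, "output": 11}
    ],
    "branches": [],
}

honest = hash_chain(program_ir, trace)

# Tampering with a recorded output changes HRICH and therefore HMASTER.
tampered_trace = json.loads(canonical(trace))
tampered_trace["steps"][0]["output"] = 12
tampered = hash_chain(program_ir, tampered_trace)

print(honest["H_IR"] == tampered["H_IR"])        # True: the program is unchanged
print(honest["HRICH"] == tampered["HRICH"])      # False
print(honest["HMASTER"] == tampered["HMASTER"])  # False
```

Because every input to the chain is canonical JSON, recomputing the chain on another machine should give the same three digests, which is what makes the tamper check meaningful.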
Invariant 3: Single-Source Specification Invariant
Formal Definition:
For any RLang program P, there exists a unique canonical representation canonical(P) such that:
canonical(P₁) = canonical(P₂) ⟺ P₁ ≡ P₂
Where ≡ denotes semantic equivalence.
Canonical Representation Rules:
- Key Ordering: All dictionary keys must be sorted alphabetically
- Value Normalization: Floats normalized, integers preferred where possible
- Structure Stability: Same structure → same JSON string
- Encoding Stability: UTF-8, no BOM, consistent line endings
Hash Stability:
Hash(canonical(P)) = H_IR
This hash must be stable across:
- Different compiler versions (if semantics unchanged)
- Different platforms
- Different Python versions
- Different serialization libraries
Violation Examples:
FORBIDDEN: Non-deterministic key ordering
FORBIDDEN: Platform-dependent float formatting
FORBIDDEN: Non-canonical JSON serialization
FORBIDDEN: Including compiler metadata in canonical form
ALLOWED: Alphabetically sorted keys
ALLOWED: Normalized float representation
ALLOWED: Consistent JSON formatting
2. Architecture Overview
Compilation Pipeline
┌─────────┐
│ Source │
│ Code │
└────┬────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FRONTEND (EXTENSION-SAFE) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Lexer │───▶│ Parser │───▶│ Resolver │ │
│ │ │ │ │ │ │ │
│ │ PLUGGABLE│ │ PLUGGABLE│ │ PLUGGABLE│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Type Checker │ │
│ │ │ │
│ │ EXTENSION- │ │
│ │ SAFE │ │
│ └──────────────┘ │
└──────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MIDDLE-END (SAFE BUT STRICT) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ Lowering │ │
│ │ │ │
│ │ MUST REMAIN │ │
│ │ DETERMINISTIC│ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ IR │ │
│ │ │ │
│ │ PHYSICS │ │
│ │ LAYER │ │
│ └──────┬───────┘ │
└─────────────────────────┼─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ BACKEND (VERY SENSITIVE) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Canonicalizer│───▶│ Executor │ │
│ │ │ │ │ │
│ │ FIXED │ │ MUST REMAIN │ │
│ │ │ │ DETERMINISTIC│ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Canonical │ │ Proof Trace │ │
│ │ JSON │ │ (TRP v1) │ │
│ │ │ │ │ │
│ │ FIXED │ │ FIXED │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └──────────┬─────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Hashing │ │
│ │ │ │
│ │ HMASTER/ │ │
│ │ HRICH │ │
│ │ │ │
│ │ FIXED │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Component Classification
| Component | Classification | Rationale |
|---|---|---|
| Lexer | PLUGGABLE | Tokenization is syntax-level; can extend for new keywords/symbols |
| Parser | PLUGGABLE | AST construction is syntax-level; can add new AST nodes |
| Resolver | PLUGGABLE | Symbol resolution is syntax-level; can extend symbol table |
| Type Checker | EXTENSION-SAFE | Type checking must remain deterministic but can add new types |
| Lowering | MUST REMAIN DETERMINISTIC | IR generation must preserve semantics deterministically |
| IR | PHYSICS LAYER | IR structure defines execution model; changes break proofs |
| Canonicalizer | FIXED | Canonical JSON rules cannot change without breaking hashes |
| Executor | MUST REMAIN DETERMINISTIC | Execution semantics must remain deterministic |
| Proof System | FIXED | TRP structure is frozen; extensions via versioning |
| Hashing | FIXED | Hash algorithms and structure are frozen |
3. Language Semantics (Formal)
Type System
Primitive Types
RLang defines five primitive types:
- Int: 64-bit signed integers (Python int, unbounded)
- Float: IEEE 754 double-precision floating-point (Python float)
- String: UTF-8 encoded strings (Python str)
- Bool: Boolean values true/false (Python bool)
- Unit: Unit type (Python None)
Type Semantics:
Type ::= Int | Float | String | Bool | Unit
Type Equivalence:
Two types T₁ and T₂ are equivalent (T₁ ≡ T₂) if:
- Both are primitive and have the same name, OR
- Both are generic with same name and equivalent type arguments
Type Aliases
Type aliases provide semantic meaning:
type UserId = Int;
type Email = String;
Semantics:
type_alias ::= type IDENTIFIER = TypeExpr;
Type aliases are transparent during type checking—they resolve to their underlying types.
Expressions
Literal Expressions
Literal ::= INTEGER | FLOAT | STRING | BOOLEAN
Evaluation:
Eval(42) = 42
Eval(3.14) = 3.14
Eval("hello") = "hello"
Eval(true) = True
Eval(false) = False
Identifier Expressions
Identifier ::= IDENTIFIER
Special Identifiers:
__value: Current pipeline value (runtime context)
Evaluation:
Eval(__value, ctx) = ctx.current_value
Binary Operations
BinaryOp ::= Expr OP Expr
OP ::= + | - | * | / | > | < | >= | <= | == | !=
Arithmetic Operations:
Eval(e₁ + e₂, ctx) = Eval(e₁, ctx) + Eval(e₂, ctx)
Eval(e₁ - e₂, ctx) = Eval(e₁, ctx) - Eval(e₂, ctx)
Eval(e₁ * e₂, ctx) = Eval(e₁, ctx) * Eval(e₂, ctx)
Eval(e₁ / e₂, ctx) = Eval(e₁, ctx) / Eval(e₂, ctx) [if Eval(e₂, ctx) ≠ 0]
Comparison Operations:
Eval(e₁ > e₂, ctx) = Eval(e₁, ctx) > Eval(e₂, ctx)
Eval(e₁ < e₂, ctx) = Eval(e₁, ctx) < Eval(e₂, ctx)
Eval(e₁ >= e₂, ctx) = Eval(e₁, ctx) >= Eval(e₂, ctx)
Eval(e₁ <= e₂, ctx) = Eval(e₁, ctx) <= Eval(e₂, ctx)
Eval(e₁ == e₂, ctx) = Eval(e₁, ctx) == Eval(e₂, ctx)
Eval(e₁ != e₂, ctx) = Eval(e₁, ctx) != Eval(e₂, ctx)
Type Rules:
- Arithmetic: Int + Int → Int, Float + Float → Float, Int + Float → Float
- Comparison: T × T → Bool (for comparable types)
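The type rules for binary operations can be read as a small lookup function. The sketch below is an illustration only, not the compiler's type checker. It assumes that all four arithmetic operators follow the rule shown for `+`, that `Float + Int` behaves like `Int + Float`, and that any two identical types are comparable; none of these points is stated in the rules above. The name `binary_result_type` is hypothetical.

```python
ARITHMETIC_OPS = {"+", "-", "*", "/"}
COMPARISON_OPS = {">", "<", ">=", "<=", "==", "!="}
NUMERIC_TYPES = {"Int", "Float"}


def binary_result_type(op: str, left: str, right: str) -> str:
    """Return the result type of `left op right`, or raise TypeError."""
    if op in ARITHMETIC_OPS:
        if left not in NUMERIC_TYPES or right not in NUMERIC_TYPES:
            raise TypeError(f"{op} is not defined for {left} and {right}")
        # Int stays Int only when both operands are Int; otherwise widen to Float.
        return "Int" if left == right == "Int" else "Float"
    if op in COMPARISON_OPS:
        # Assumption: T x T -> Bool requires both sides to have the same type.
        if left != right:
            raise TypeError(f"cannot compare {left} with {right}")
        return "Bool"
    raise ValueError(f"unknown operator: {op}")


print(binary_result_type("+", "Int", "Int"))      # Int
print(binary_result_type("+", "Int", "Float"))    # Float
print(binary_result_type(">", "Int", "Int"))      # Bool
```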
Function Calls
Call ::= IDENTIFIER ( Expr₁, ..., Exprₙ )
Evaluation:
Eval(f(e₁, ..., eₙ), ctx) = fn_registry[f](Eval(e₁, ctx), ..., Eval(eₙ, ctx))
Type Rules:
f : T₁ × ... × Tₙ → T
e₁ : T₁, ..., eₙ : Tₙ
─────────────────────────
f(e₁, ..., eₙ) : T
Conditional Expressions (v0.2+)
IfExpr ::= if ( Expr ) { Steps } [ else { Steps } ]
Evaluation:
Eval(if (c) { s₁ } else { s₂ }, ctx) =
if Eval(c, ctx) then Eval(s₁, ctx) else Eval(s₂, ctx)
Type Rules:
c : Bool
s₁ : T
s₂ : T
─────────────────────────
if (c) { s₁ } else { s₂ } : T
Determinism Requirement:
The condition c must be a pure expression—no side effects, no randomness, no time-dependent operations.
Pipeline Semantics
Pipeline Definition
Pipeline ::= pipeline IDENTIFIER ( Type ) -> Type { Steps }
Steps ::= Step₁ -> Step₂ -> ... -> Stepₙ
Evaluation:
Eval(pipeline main(T_in) -> T_out { s₁ -> ... -> sₙ }, x) =
Eval(sₙ, Eval(sₙ₋₁, ..., Eval(s₁, x)...))
Composition:
Eval(s₁ -> s₂, x) = Eval(s₂, Eval(s₁, x))
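The evaluation rule is a left fold over the steps: each step receives the output of the previous one. A minimal sketch, assuming a pipeline is just a list of function names looked up in a registry (the helper `run_pipeline` and the sample registry are hypothetical):

```python
from functools import reduce
from typing import Any, Callable, Dict, Sequence


def run_pipeline(steps: Sequence[str], fn_registry: Dict[str, Callable], x: Any) -> Any:
    """Apply each named step in order, feeding each output into the next step."""
    return reduce(lambda value, name: fn_registry[name](value), steps, x)


fn_registry = {"inc": lambda v: v + 1, "double": lambda v: v * 2}

# inc -> double applied to 10: inc gives 11, double gives 22.
print(run_pipeline(["inc", "double"], fn_registry, 10))  # 22

# Composition law: running s1 -> s2 equals running s2 on the result of s1.
whole = run_pipeline(["inc", "double"], fn_registry, 10)
split = run_pipeline(["double"], fn_registry, run_pipeline(["inc"], fn_registry, 10))
print(whole == split)  # True
```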
Step Semantics
Function Step:
Eval(f, x) = fn_registry[f](x)
Conditional Step:
Eval(if (c) { s₁ } else { s₂ }, x) =
if Eval(c, x) then Eval(s₁, x) else Eval(s₂, x)
Deterministic Requirements
No Randomness
FORBIDDEN:
- Random number generation
- Non-deterministic algorithms
- Probabilistic data structures
No I/O
FORBIDDEN:
- File system access
- Network operations
- Standard input/output
- Environment variables (except compile-time)
No Time Dependence
FORBIDDEN:
- Timestamps
- System time
- Date/time operations
Fixed Evaluation Order
REQUIRED:
- Left-to-right evaluation
- Sequential pipeline execution
- Deterministic branch selection
4. IR Specification: The Physics Layer
The Intermediate Representation (IR) is the physics layer of RLang. It defines:
- What can be executed: Only IR nodes can appear in execution traces
- How execution proceeds: IR structure determines execution order
- What is provable: Only IR-level operations generate proof records
IR Invariants:
- Purity: Every IR node is pure (no side effects)
- Determinism: IR evaluation is deterministic
- Canonicalizability: Every IR node can be serialized to canonical JSON
- Completeness: All semantic constructs must lower to IR
Current IR Node Types (v0.2.2)
IRExpr
Base class for all expressions in IR.
@dataclass(frozen=True)
class IRExpr:
    kind: str  # "literal" | "identifier" | "binary_op" | "call" | "boolean_and" | "boolean_or" | "boolean_not" | "record" | "field_access" | "list"
    # ... fields depend on kind
Kinds:
- literal: Literal values
- identifier: Variable references (e.g., __value)
- binary_op: Binary operations (+, -, *, /, >, <, etc.)
- call: Function calls
- boolean_and: Boolean AND (&&)
- boolean_or: Boolean OR (||)
- boolean_not: Boolean NOT (!)
- record: Record construction { field1: expr1, ... }
- field_access: Field access obj.field
- list: List construction [expr1, expr2, ...]
IRIf
Conditional execution node.
@dataclass(frozen=True)
class IRIf:
    condition: IRExpr
    then_steps: list[PipelineStepIR]
    else_steps: list[PipelineStepIR]
Semantics:
- Condition must evaluate to Bool
- Both branches must produce same output type
- Execution is deterministic based on condition value
PipelineStepIR
Single step in a pipeline.
@dataclass(frozen=True)
class PipelineStepIR:
    index: int
    name: str
    template_id: str
    arg_types: list[str]
    input_type: str | None
    output_type: str | None
PipelineIR
Complete pipeline definition.
@dataclass(frozen=True)
class PipelineIR:
    id: str
    name: str
    input_type: str | None
    output_type: str | None
    steps: list[PipelineStepIR | IRIf]
Rules for Adding New IR Nodes
Every new IR node MUST:
- Be Pure: No side effects, no hidden state
- Be Deterministic: Same inputs → same outputs
- Be Canonicalizable: Implement to_dict() with sorted keys
- Have Fixed Evaluation Order: No non-deterministic iteration
- Preserve Type Information: Include type annotations
Example: Adding IRRecord (v0.3)
@dataclass(frozen=True)
class IRRecord:
    """IR representation of a record construction."""
    fields: dict[str, IRExpr]  # Field name → expression

    def to_dict(self) -> dict[str, Any]:
        """Canonical dictionary representation."""
        return {
            "fields": {
                k: v.to_dict()
                for k, v in sorted(self.fields.items())  # Sorted!
            },
            "kind": "record"
        }
Key Point: Record fields must be sorted alphabetically to ensure canonical representation.
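To see why the sorting matters, the sketch below builds the same record twice with fields inserted in a different order and compares the serialized output. `IRLiteral` is a hypothetical stand-in for a literal expression node so that the example runs on its own; the real IR classes may look different.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class IRLiteral:
    """Hypothetical minimal literal node, used only for this example."""
    value: Any

    def to_dict(self) -> Dict[str, Any]:
        return {"kind": "literal", "value": self.value}


@dataclass(frozen=True)
class IRRecord:
    fields: Dict[str, IRLiteral]

    def to_dict(self) -> Dict[str, Any]:
        return {
            "fields": {k: v.to_dict() for k, v in sorted(self.fields.items())},
            "kind": "record",
        }


def dumps(node: IRRecord) -> str:
    return json.dumps(node.to_dict(), sort_keys=True, separators=(",", ":"))


# Same record, fields written in a different order in the source.
a = IRRecord({"name": IRLiteral("Ada"), "age": IRLiteral(30)})
b = IRRecord({"age": IRLiteral(30), "name": IRLiteral("Ada")})

print(dumps(a) == dumps(b))            # True
print(list(a.to_dict()["fields"]))     # ['age', 'name']
```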
5. Canonicalization Specification
Canonical JSON is the stable serialization format that ensures:
- Same data structure → same JSON string
- Same JSON string → same hash
- Deterministic across platforms and Python versions
Key Ordering Rule
RULE: All dictionary keys must be sorted alphabetically.
Implementation:
import json
from typing import Any
def canonical_dumps(obj: Any) -> str:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
Example:
{"b": 2, "a": 1} → '{"a":1,"b":2}'
Why This Matters:
Non-deterministic key ordering breaks hash stability:
# WRONG
{"b": 2, "a": 1} → hash₁
{"a": 1, "b": 2} → hash₂ # Different hash!
# CORRECT
{"b": 2, "a": 1} → '{"a":1,"b":2}' → hash
{"a": 1, "b": 2} → '{"a":1,"b":2}' → hash # Same hash!
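The effect of the key ordering rule can be checked directly with the standard library. In this sketch, `canonical_dumps` mirrors the implementation shown above, while `sha256_hex` is a helper name introduced only for the example.

```python
import hashlib
import json
from typing import Any


def canonical_dumps(obj: Any) -> str:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"b": 2, "a": 1}
b = {"a": 1, "b": 2}

# Plain json.dumps keeps insertion order, so equal dicts can serialize differently.
print(json.dumps(a))  # {"b": 2, "a": 1}
print(json.dumps(b))  # {"a": 1, "b": 2}
print(sha256_hex(json.dumps(a)) == sha256_hex(json.dumps(b)))  # False

# Canonical serialization removes the dependence on insertion order.
print(canonical_dumps(a))  # {"a":1,"b":2}
print(sha256_hex(canonical_dumps(a)) == sha256_hex(canonical_dumps(b)))  # True
```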
Float Normalization Rule
RULE: Floats must be normalized to ensure platform-independent representation.
Implementation:
def _normalize_floats(obj: Any) -> Any:
    if isinstance(obj, float):
        if obj.is_integer():
            return int(obj)  # 3.0 → 3
        return round(obj, 10)  # Round to 10 decimal places
    elif isinstance(obj, dict):
        return {k: _normalize_floats(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [_normalize_floats(item) for item in obj]
    return obj
Whitespace Rule
RULE: Minimal whitespace (compact JSON) unless indentation is explicitly requested.
Implementation:
# Compact (default)
json.dumps(obj, separators=(",", ":")) # No spaces
# Pretty (for debugging)
json.dumps(obj, indent=2) # 2-space indentation
Encoding Rule
RULE: UTF-8 encoding, no BOM, consistent line endings.
Implementation:
canonical_json.encode("utf-8")
What Breaks Determinism
FORBIDDEN:
- Non-deterministic key ordering
- Platform-dependent float representation
- Non-canonical JSON serialization
- Including metadata in canonical form
- Non-deterministic whitespace
REQUIRED:
- Alphabetically sorted keys
- Normalized floats
- Canonical JSON serialization
- Pure data structures only
- Consistent encoding
6. Execution Semantics
RLang execution is purely functional and deterministic:
- No mutable state
- No side effects
- No I/O operations
- No randomness
Function Application
Semantics:
Apply(f, x) = fn_registry[f](x)
Requirements:
- fn_registry[f] must be a pure function
- No side effects allowed
- Deterministic output for same input
Step Execution
Sequential Execution:
Execute([s₁, ..., sₙ], x₀) =
let x₁ = Execute(s₁, x₀) in
let x₂ = Execute(s₂, x₁) in
...
let xₙ = Execute(sₙ, xₙ₋₁) in
xₙ
Trace Recording:
Each step execution produces a StepExecutionRecord:
StepExecutionRecord(
index=i,
step_name=name,
template_id=template_id,
input_snapshot=xᵢ,
output_snapshot=xᵢ₊₁
)
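Sequential execution and trace recording can be sketched together in a few lines. This is an illustration of the semantics above, not the compiler's executor: the `execute` helper is hypothetical, steps are plain function names, and the `template_id` format (`fn:<name>`) is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class StepExecutionRecord:
    index: int
    step_name: str
    template_id: str
    input_snapshot: Any
    output_snapshot: Any


def execute(
    steps: Sequence[str], fn_registry: Dict[str, Callable], x0: Any
) -> Tuple[Any, List[StepExecutionRecord]]:
    """Run the steps in order and record one entry per executed step."""
    records: List[StepExecutionRecord] = []
    value = x0
    for i, name in enumerate(steps):
        output = fn_registry[name](value)
        # Assumption: "fn:<name>" is a made-up template_id format.
        records.append(StepExecutionRecord(i, name, f"fn:{name}", value, output))
        value = output
    return value, records


fn_registry = {"inc": lambda v: v + 1, "double": lambda v: v * 2}
result, records = execute(["inc", "double"], fn_registry, 10)

print(result)  # 22
for r in records:
    print(r.index, r.step_name, r.input_snapshot, "->", r.output_snapshot)
# 0 inc 10 -> 11
# 1 double 11 -> 22
```

Each record's output snapshot is the next record's input snapshot, which is the property a verifier can check when replaying a trace.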
Conditional Execution
Branch Selection:
Execute(IRIf(condition=c, then_steps=t, else_steps=e), x) =
if Eval(c, x) then
Execute(t, x)
else
Execute(e, x)
Branch Recording:
Each conditional execution produces a BranchExecutionRecord:
BranchExecutionRecord(
index=i,
path="then" | "else",
condition_value=bool
)
Determinism:
Same condition value → same branch path → same execution trace.
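Branch selection and branch recording can be sketched in the same style. The helper `execute_if` is hypothetical and simplifies each branch to a single function. The condition and functions follow the `__value > 10` example used elsewhere in this document, with `half` assumed to be integer division by 2.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class BranchExecutionRecord:
    index: int
    path: str              # "then" or "else"
    condition_value: bool


def execute_if(
    index: int,
    condition: Callable[[Any], bool],
    then_fn: Callable[[Any], Any],
    else_fn: Callable[[Any], Any],
    x: Any,
) -> Tuple[Any, BranchExecutionRecord]:
    """Evaluate the condition once, run the chosen branch, record the decision."""
    cond = bool(condition(x))
    path = "then" if cond else "else"
    result = then_fn(x) if cond else else_fn(x)
    return result, BranchExecutionRecord(index, path, cond)


def double(v: int) -> int:
    return v * 2


def half(v: int) -> int:
    return v // 2  # Assumption: integer halving


print(execute_if(0, lambda v: v > 10, double, half, 12))
# (24, BranchExecutionRecord(index=0, path='then', condition_value=True))
print(execute_if(0, lambda v: v > 10, double, half, 8))
# (4, BranchExecutionRecord(index=0, path='else', condition_value=False))
```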
7. Proof System Architecture
TRP v1 (Current)
TRP (Trace of Reasoning Process) is the execution trace format.
Structure
PipelineProofBundle(
version: str,
language: str,
entry_pipeline: str | None,
program_ir: PrimaryProgramIR,
input_value: Any,
output_value: Any,
steps: List[StepExecutionRecord],
branches: List[BranchExecutionRecord]
)
Step Records
StepExecutionRecord(
index: int, # 0-based step index
step_name: str, # Function name
template_id: str, # Template reference
input_snapshot: Any, # Input value
output_snapshot: Any # Output value
)
Branch Records
BranchExecutionRecord(
index: int, # IF step index
path: str, # "then" | "else"
condition_value: bool # Condition evaluation result
)
Hashing Model
HMASTER
Definition:
HMASTER = Hash(canonical(program_ir))
Computation:
import hashlib

def compute_HMASTER(program_ir: PrimaryProgramIR) -> str:
    canonical_json = program_ir.to_json()
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()
Invariant:
Same program IR → same HMASTER.
HRICH
Definition:
HRICH = Hash(canonical(proof_bundle))
Computation:
def compute_HRICH(proof_bundle: PipelineProofBundle) -> str:
    # Convert to rich bundle format
    rich_bundle = {
        "primary": {
            "master": HMASTER,
            "steps": [step.to_dict() for step in proof_bundle.steps],
            "branches": [branch.to_dict() for branch in proof_bundle.branches]
        },
        "H_RICH": None  # Computed below
    }
    # Compute subproof hashes
    subproof_hashes = compute_subproof_hashes(subproofs)
    # Compute HRICH from subproof hashes
    HRICH = compute_HRICH_from_subproof_hashes(subproof_hashes)
    return HRICH
Subproof Hashes:
subproof_hashes = {
"DIP": Hash(DIP_subproof),
"DP": Hash(DP_subproof),
"PEP": Hash(PEP_subproof),
"PoPI": Hash(PoPI_subproof),
"CCP": Hash(CCP_subproof),
"CMIP": Hash(CMIP_subproof),
"PP": Hash(PP_subproof),
"TRP": Hash(TRP_subproof)
}
HRICH Computation:
HRICH = SHA256(
    "|".join(sorted(subproof_hashes.values()))
)
Invariant:
Same execution trace → same HRICH.
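The HRICH formula can be written out as runnable Python. This sketch follows the formula as stated here (SHA-256 over the sorted subproof hashes joined with `|`); whether the BoR tooling uses exactly this byte layout is not confirmed by this document. The subproof hashes below are placeholders derived from the subproof names, not real proof data, and `hrich_from_subproof_hashes` is a hypothetical name.

```python
import hashlib
from typing import Dict

SUBPROOF_NAMES = ("DIP", "DP", "PEP", "PoPI", "CCP", "CMIP", "PP", "TRP")


def hrich_from_subproof_hashes(subproof_hashes: Dict[str, str]) -> str:
    """SHA-256 over the sorted hash values joined with '|'."""
    joined = "|".join(sorted(subproof_hashes.values()))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Placeholder hashes: stand-ins for Hash(<name>_subproof).
subproof_hashes = {
    name: hashlib.sha256(name.encode("utf-8")).hexdigest() for name in SUBPROOF_NAMES
}
hrich = hrich_from_subproof_hashes(subproof_hashes)

# Sorting the values makes the result independent of dictionary order.
reordered = dict(reversed(list(subproof_hashes.items())))
print(hrich_from_subproof_hashes(reordered) == hrich)  # True

# Changing any single subproof hash changes HRICH.
tampered = dict(subproof_hashes, TRP=hashlib.sha256(b"tampered").hexdigest())
print(hrich_from_subproof_hashes(tampered) == hrich)   # False
```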
8. The Untouchable Core (Frozen Physics)
Modifying any of the following components breaks the determinism guarantees, so they MUST NEVER BE MODIFIED:
| Component | Frozen? | Why? |
|---|---|---|
| Canonical JSON Rules | YES | Breaks HMASTER stability |
| Hash Algorithms | YES | Breaks verification |
| TRP Structure Rules | YES | Breaks proof compatibility |
| Branch Decision Semantics | YES | Breaks determinism |
| Deterministic Data Structures | YES | Breaks execution determinism |
| No Non-Deterministic Iteration | YES | Breaks execution determinism |
| No Mutation in IR | YES | Breaks purity |
Partially Frozen Components
These components can be extended but must preserve determinism:
| Component | Frozen? | Why? |
|---|---|---|
| AST → IR Lowering | PARTIAL | Must remain deterministic |
| Type System | PARTIAL | Can add types, but rules must be deterministic |
| Executor | PARTIAL | Semantics must remain deterministic |
| Parser | NO | Extensions allowed (new syntax) |
| Resolver | NO | Extensions allowed (new symbols) |
Modification Rules
Canonical JSON
NEVER CHANGE:
- Key sorting algorithm
- Float normalization rules
- JSON encoding (UTF-8)
- Whitespace rules
ALLOWED:
- Adding new fields to existing structures (if canonicalized correctly)
Hash Algorithms
NEVER CHANGE:
- SHA-256 algorithm
- Hash computation order
- Subproof hash structure
ALLOWED:
- Adding new hash types (with new names)
- Extending hash inputs (additive only)
TRP Structure
NEVER CHANGE:
- Step record structure (v1)
- Branch record structure (v1)
- Record field names
ALLOWED:
- Adding new record types (TRP v2)
- Extending existing records (additive fields)
9. Expandable Surfaces (Safe to Extend)
Frontend Extensions
Lexer
Safe to Add:
- New keywords
- New operators
- New literal types
- New comment styles
Parser
Safe to Add:
- New AST nodes
- New expression forms
- New statement types
Resolver
Safe to Add:
- New symbol kinds
- New scoping rules
- New name resolution strategies
Middle-End Extensions
Type System
Safe to Add:
- New primitive types
- New generic types
- New type constructors
Lowering
Safe to Add:
- New AST → IR lowering rules
- New IR node types (following IR invariants)
Backend Extensions
Executor
Safe to Add:
- New execution strategies
- New optimization passes
- New proof recording formats
Proof System
Safe to Add:
- New proof record types (TRP v2)
- New subproof types
- New verification strategies
Extension Guidelines
Before Adding:
- Verify determinism (same input → same output)
- Verify canonicalizability (can serialize to JSON)
- Verify purity (no side effects)
- Add tests (determinism tests required)
- Update documentation
After Adding:
- Run full test suite
- Verify hash stability
- Update golden files
- Document extension
10. Quick Start
Installation
pip install rlang-compiler
Basic Usage
Example 1: Simple Pipeline
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
Compile:
rlangc examples/simple.rlang --out out/simple.json
Example 2: Conditional Execution
fn double(x: Int) -> Int;
fn half(x: Int) -> Int;
pipeline main(Int) -> Int {
if (__value > 10) {
double
} else {
half
}
}
Example 3: Proof Generation
from rlang.bor import run_program_with_proof, RLangBoRCrypto
source = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""
bundle = run_program_with_proof(
source=source,
input_value=10,
fn_registry={"inc": lambda x: x + 1}
)
crypto = RLangBoRCrypto(bundle)
rich = crypto.to_rich_bundle()
print("HMASTER:", rich.rich["primary"]["master"])
print("HRICH:", rich.rich["H_RICH"])
Verification
# Generate proof bundle
python verify_proof_bundle.py
# Verify with BoR CLI
borp verify-bundle --bundle out/rich_proof_bundle.json
11. API Reference
Core Compiler API
from rlang import compile_source_to_ir, compile_source_to_json
# Compile to IR
result = compile_source_to_ir(
source="fn inc(x: Int) -> Int; pipeline main(Int) -> Int { inc }",
version="v0",
language="rlang"
)
# Compile to JSON
json_str = compile_source_to_json(
source="fn inc(x: Int) -> Int; pipeline main(Int) -> Int { inc }"
)
Proof Generation API
from rlang.bor import run_program_with_proof, RLangBoRCrypto
# Generate proof bundle
bundle = run_program_with_proof(
source=source,
input_value=10,
fn_registry={"inc": lambda x: x + 1}
)
# Convert to rich bundle
crypto = RLangBoRCrypto(bundle)
rich_bundle = crypto.to_rich_bundle()
CLI Usage
# Compile to stdout
rlangc program.rlang
# Compile to file
rlangc program.rlang --out output.json
# Specify entry pipeline
rlangc program.rlang --entry main --out output.json
12. Testing & Verification
Test Suite
The compiler includes 190+ tests covering:
- Lexer (tokenization, comments, floats)
- Parser (AST construction, operator precedence)
- Type Checker (type inference, type aliases, control flow)
- IR (lowering, primary IR construction)
- Emitter (end-to-end compilation)
- CLI (command-line interface)
- BoR Integration (proof generation, crypto hashing, CLI compatibility)
- Determinism (SHA256 comparison, tamper detection)
Running Tests
# Run all tests
pytest -q --disable-warnings
# Run specific test file
pytest tests/test_parser.py -v
# Run with coverage
pytest --cov=rlang
Determinism Verification
# Run deterministic test suite
./next_tests.sh
# Verify proof bundles
./verify_bundle.sh
Release Audit
# Run comprehensive release audit
./scripts/run_release_audit.sh
The audit checks:
- Environment reset
- Static code consistency
- Full test suite
- Determinism tests
- Golden file verification
- Canonical JSON boundary audit
- IR shape stability
- TRP audit
- Hash boundary tests
- CLI verification
- Packaging readiness
13. Extension Guidelines
For detailed extension guidelines, see docs/compiler_expansion_playbook.md.
Quick Checklist
When adding a new feature:
- Update grammar in docs/compiler_physics.md
- Add lexer tokens
- Add parser AST nodes
- Add resolver logic
- Add type checking rules
- Add IR node (if needed)
- Add lowering rules
- Add execution logic
- Verify canonicalization
- Add proof recording (if needed)
- Add comprehensive tests
- Update golden files
- Update documentation
Test Matrix
For each new construct:
- Parser tests (basic, nested, empty, invalid, edge cases)
- Typechecker tests (valid, invalid, inference, nested, edge cases)
- Lowering tests (basic, nested, deterministic, edge cases)
- IR tests (structure, canonical, deterministic, edge cases)
- Executor tests (basic, proof, deterministic, edge cases)
- Determinism tests (IR, H_IR, TRP, HRICH, cross-platform)
- Canonical JSON tests (sorted keys, float normalization, stable representation)
- Proof stability tests (branching, loops, collections, pattern matching)
References
- Formal Specification: docs/compiler_physics.md (complete deterministic execution and proof architecture specification)
- Extension Playbook: docs/compiler_expansion_playbook.md (implementation checklists, test matrices, and modularization guide)
- Language Specification: docs/language.md (RLang language syntax and semantics)
- Proof System: docs/proof-system.md (BoR proof system integration)
Status
Compiler: Fully functional (190+ tests passing)
Control Flow: Deterministic if/else in pipelines with type-checked branches
Proof Generation: Complete and deterministic, including branch-aware TRP subproofs
BoR Integration: Verified with borp verify-bundle
Determinism: Bit-for-bit reproducible including branch traces
Security: Tamper detection working for both steps and branches
Version: 0.2.2 (published to PyPI)
License: MIT License
Author: Kushagra Bhatnagar
Last Updated: November 2025