Deterministic RLang compiler with cryptographic proof generation for BoR (Blockchain of Reasoning)

RLang Compiler — Deterministic Reasoning Pipeline with Cryptographic Proof Generation

A first-principles compiler that translates RLang source code into executable reasoning pipelines with cryptographic proof generation compatible with the BoR (Blockchain of Reasoning) system. This compiler provides bit-for-bit deterministic execution suitable for trustless verification and cryptographic auditing.

Installation: pip install rlang-compiler
Documentation: See docs/compiler_physics.md for formal specification
Playbook: See docs/compiler_expansion_playbook.md for extension guidelines


Quick Onboarding Guide (Start Here)

What RLang Is

RLang is a deterministic domain-specific language (DSL) designed for building verifiable reasoning pipelines. The compiler translates RLang source code into a canonical intermediate representation (IR) that serves as the "physics layer" for deterministic execution. Every program execution produces a cryptographically verifiable proof bundle compatible with the BoR (Blockchain of Reasoning) system, enabling trustless verification of computation results.

The compiler enforces three non-negotiable invariants: deterministic semantics (same input always produces same output), deterministic proof shape (same execution always produces same trace), and single-source specification (canonical representation ensures hash stability). These invariants are analogous to physical laws—they cannot be violated without breaking fundamental guarantees.

Installation and Setup

Install via PyPI:

pip install rlang-compiler

Install for local development:

git clone https://github.com/your-org/Compiler_implementation.git
cd Compiler_implementation
pip install -e .[dev,test]
./run_all.sh

Minimal Working Example

Create a file examples/basic.rlang:

fn inc(x: Int) -> Int;

pipeline main(Int) -> Int {
  inc
}

Compile and inspect output:

rlangc examples/basic.rlang --out out/basic.json

The output JSON contains the canonical IR representation of your program.

Proof Generation and Verification

Generate a proof bundle:

./verify_bundle.sh

Verify with BoR CLI:

borp verify-bundle --bundle out/rich_proof_bundle.json

These commands compile an RLang program, execute it with a provided input, generate a cryptographic proof bundle containing execution traces (TRP), and verify the bundle's integrity using BoR-compatible hashing (HMASTER, HRICH).

Python API Quickstart

from rlang.bor import run_program_with_proof

source = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""

bundle = run_program_with_proof(
    source=source,
    input_value=10,
    fn_registry={"inc": lambda x: x + 1}
)

print("Output:", bundle.output_value)  # 11

Determinism Demonstration (10-second test)

from rlang.bor import run_program_with_proof
import hashlib
import json

src = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""

def compute_hash():
    b = run_program_with_proof(src, 42, fn_registry={"inc": lambda x: x + 1})
    j = json.dumps(b.to_dict(), sort_keys=True)
    return hashlib.sha256(j.encode()).hexdigest()

h1 = compute_hash()
h2 = compute_hash()
assert h1 == h2  # Always true: deterministic execution
print("Determinism verified:", h1 == h2)

This works because RLang execution is purely functional and deterministic—same program and input always produce identical proof bundles, enabling cryptographic verification.

End-to-End Compiler & Proof Flow

RLang Source
    |
    v
[Parser] → [Resolver] → [Type Checker]
    |
    v
[IR Lowering] → [Canonical JSON]
    |
    v
[Execution Engine] → [Proof Bundle] → [HRICH]

Or as a Mermaid diagram:

flowchart LR
    A[RLang Source] --> B[Parser]
    B --> C[Resolver]
    C --> D[Type Checker]
    D --> E[IR Lowering]
    E --> F[Canonical JSON]
    F --> G[Execution & Proof Generation]
    G --> H[HRICH Verification]

Table of Contents

  1. First Principles: The Three Non-Negotiable Invariants
  2. Architecture Overview
  3. Language Semantics (Formal)
  4. IR Specification: The Physics Layer
  5. Canonicalization Specification
  6. Execution Semantics
  7. Proof System Architecture
  8. The Untouchable Core (Frozen Physics)
  9. Expandable Surfaces (Safe to Extend)
  10. Quick Start
  11. API Reference
  12. Testing & Verification
  13. Extension Guidelines

1. First Principles: The Three Non-Negotiable Invariants

The RLang compiler is built on three non-negotiable invariants that define the "physics layer" of deterministic computation. These invariants are analogous to physical laws—they cannot be violated without breaking fundamental guarantees.

Invariant 1: Deterministic Semantics Invariant

Formal Definition:

For any RLang program P and input value x, there exists a unique output value y such that:

Eval(P, x) = y

This must hold regardless of:

  • Execution environment (OS, hardware, Python version)
  • Execution time (today vs. tomorrow)
  • Execution order (if multiple valid orders exist, they must be equivalent)
  • Random number generators (none allowed)
  • External state (none allowed)

Mathematical Properties:

  • Functionality: ∀P, x. ∃!y. Eval(P, x) = y
  • Repeatability: any two evaluations of Eval(P, x) return the same y
  • Compositionality: Eval(P₁; P₂, x) = Eval(P₂, Eval(P₁, x))

Violation Examples:

FORBIDDEN: Using time.time() in function registry
FORBIDDEN: Reading from /dev/urandom
FORBIDDEN: Non-deterministic iteration order
FORBIDDEN: Floating-point operations that vary by platform

ALLOWED: Pure mathematical operations
ALLOWED: Deterministic string operations
ALLOWED: Fixed-order list operations
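
The difference between forbidden and allowed registry functions can be illustrated with a small probe. This is only a sketch: `looks_deterministic` is a hypothetical helper, not part of rlang-compiler, and hidden state stands in here for clocks or random sources. Passing the probe does not prove a function is pure, but failing it does show a violation of Invariant 1.

```python
import itertools
from typing import Any, Callable

def looks_deterministic(fn: Callable[[Any], Any], sample: Any, runs: int = 5) -> bool:
    """Hypothetical probe: call fn repeatedly on the same input and compare results."""
    first = fn(sample)
    return all(fn(sample) == first for _ in range(runs - 1))

# ALLOWED: a pure mathematical operation.
def inc(x: int) -> int:
    return x + 1

# FORBIDDEN: hidden state makes the output depend on call history.
_counter = itertools.count()

def leaky(x: int) -> int:
    return x + next(_counter)

print(looks_deterministic(inc, 10))    # True
print(looks_deterministic(leaky, 10))  # False
```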

Invariant 2: Deterministic Proof Shape Invariant

Formal Definition:

For any RLang program P and input value x, there exists a unique execution trace trace such that:

TRP(P, x) = trace

The trace must be:

  • Complete: Every step execution is recorded
  • Ordered: Steps appear in execution order
  • Deterministic: Same execution → same trace
  • Canonical: Trace structure is stable across serializations

Trace Structure (TRP v1):

trace = {
    "steps": [
        {
            "index": int,           # 0-based step index
            "step_name": str,        # Function name
            "template_id": str,     # Template reference
            "input": Any,           # Input snapshot
            "output": Any           # Output snapshot
        },
        ...
    ],
    "branches": [
        {
            "index": int,           # IF step index
            "path": "then" | "else",
            "condition_value": bool
        },
        ...
    ]
}

Hash Invariants:

Hash(canonical(P)) = H_IR          # Program IR hash
Hash(trace) = HRICH                 # Execution trace hash
Hash(H_IR | HRICH) = HMASTER        # Master hash

Violation Examples:

FORBIDDEN: Recording steps in non-deterministic order
FORBIDDEN: Including timestamps in trace
FORBIDDEN: Non-deterministic trace serialization
FORBIDDEN: Omitting steps from trace

ALLOWED: Recording all steps in execution order
ALLOWED: Canonical JSON serialization
ALLOWED: Deterministic branch recording

Invariant 3: Single-Source Specification Invariant

Formal Definition:

For any RLang program P, there exists a unique canonical representation canonical(P) such that:

canonical(P₁) = canonical(P₂) ⟺ P₁ ≡ P₂

Where ≡ denotes semantic equivalence.

Canonical Representation Rules:

  1. Key Ordering: All dictionary keys must be sorted alphabetically
  2. Value Normalization: Floats normalized, integers preferred where possible
  3. Structure Stability: Same structure → same JSON string
  4. Encoding Stability: UTF-8, no BOM, consistent line endings

Hash Stability:

Hash(canonical(P)) = H_IR

This hash must be stable across:

  • Different compiler versions (if semantics unchanged)
  • Different platforms
  • Different Python versions
  • Different serialization libraries

Violation Examples:

FORBIDDEN: Non-deterministic key ordering
FORBIDDEN: Platform-dependent float formatting
FORBIDDEN: Non-canonical JSON serialization
FORBIDDEN: Including compiler metadata in canonical form

ALLOWED: Alphabetically sorted keys
ALLOWED: Normalized float representation
ALLOWED: Consistent JSON formatting


2. Architecture Overview

Compilation Pipeline

┌─────────┐
│ Source  │
│  Code   │
└────┬────┘
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│                    FRONTEND (EXTENSION-SAFE)                │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐             │
│  │  Lexer   │───▶│  Parser  │───▶│ Resolver │             │
│  │          │    │          │    │          │             │
│  │ PLUGGABLE│    │ PLUGGABLE│    │ PLUGGABLE│             │
│  └──────────┘    └──────────┘    └──────────┘             │
│                                                              │
│                          │                                   │
│                          ▼                                   │
│                  ┌──────────────┐                            │
│                  │ Type Checker │                            │
│                  │              │                            │
│                  │ EXTENSION-   │                            │
│                  │ SAFE         │                            │
│                  └──────────────┘                            │
└──────────────────────────┬───────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              MIDDLE-END (SAFE BUT STRICT)                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│                  ┌──────────────┐                            │
│                  │   Lowering   │                            │
│                  │              │                            │
│                  │ MUST REMAIN  │                            │
│                  │ DETERMINISTIC│                            │
│                  └──────┬───────┘                            │
│                         │                                     │
│                         ▼                                     │
│                  ┌──────────────┐                            │
│                  │      IR      │                            │
│                  │              │                            │
│                  │   PHYSICS   │                            │
│                  │    LAYER    │                            │
│                  └──────┬───────┘                            │
└─────────────────────────┼─────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│              BACKEND (VERY SENSITIVE)                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐    ┌──────────────┐                      │
│  │ Canonicalizer│───▶│   Executor   │                      │
│  │              │    │              │                      │
│  │    FIXED     │    │ MUST REMAIN  │                      │
│  │              │    │ DETERMINISTIC│                      │
│  └──────┬───────┘    └──────┬───────┘                      │
│         │                    │                               │
│         ▼                    ▼                               │
│  ┌──────────────┐    ┌──────────────┐                      │
│  │   Canonical  │    │  Proof Trace │                      │
│  │     JSON     │    │   (TRP v1)   │                      │
│  │              │    │              │                      │
│  │    FIXED     │    │    FIXED     │                      │
│  └──────┬───────┘    └──────┬───────┘                      │
│         │                    │                               │
│         └──────────┬─────────┘                               │
│                    ▼                                         │
│            ┌──────────────┐                                  │
│            │   Hashing    │                                  │
│            │              │                                  │
│            │ HMASTER/     │                                  │
│            │ HRICH        │                                  │
│            │              │                                  │
│            │    FIXED     │                                  │
│            └──────────────┘                                  │
└─────────────────────────────────────────────────────────────┘

Component Classification

Component      Classification             Rationale
-------------  -------------------------  ----------------------------------------------------------------
Lexer          PLUGGABLE                  Tokenization is syntax-level; can extend for new keywords/symbols
Parser         PLUGGABLE                  AST construction is syntax-level; can add new AST nodes
Resolver       PLUGGABLE                  Symbol resolution is syntax-level; can extend symbol table
Type Checker   EXTENSION-SAFE             Type checking must remain deterministic but can add new types
Lowering       MUST REMAIN DETERMINISTIC  IR generation must preserve semantics deterministically
IR             PHYSICS LAYER              IR structure defines execution model; changes break proofs
Canonicalizer  FIXED                      Canonical JSON rules cannot change without breaking hashes
Executor       MUST REMAIN DETERMINISTIC  Execution semantics must remain deterministic
Proof System   FIXED                      TRP structure is frozen; extensions via versioning
Hashing        FIXED                      Hash algorithms and structure are frozen

3. Language Semantics (Formal)

Type System

Primitive Types

RLang defines five primitive types:

  • Int: signed integers, nominally 64-bit (backed by Python int, which is itself unbounded)
  • Float: IEEE 754 double-precision floating-point (Python float)
  • String: UTF-8 encoded strings (Python str)
  • Bool: Boolean values true / false (Python bool)
  • Unit: Unit type (Python None)

Type Semantics:

Type ::= Int | Float | String | Bool | Unit

Type Equivalence:

Two types T₁ and T₂ are equivalent (T₁ ≡ T₂) if:

  • Both are primitive and have the same name, OR
  • Both are generic with same name and equivalent type arguments

Type Aliases

Type aliases provide semantic meaning:

type UserId = Int;
type Email = String;

Semantics:

type_alias ::= type IDENTIFIER = TypeExpr;

Type aliases are transparent during type checking—they resolve to their underlying types.
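
A minimal sketch of what "transparent" could mean in practice: aliases are followed until a primitive type is reached, so two aliases of Int compare as equal types. The `resolve_type` helper and the `AccountId` alias are hypothetical and only illustrate the idea; the compiler's actual resolver may be structured differently.

```python
PRIMITIVES = {"Int", "Float", "String", "Bool", "Unit"}

def resolve_type(name: str, aliases: dict[str, str]) -> str:
    """Follow alias links until a primitive type is reached (hypothetical helper)."""
    seen = set()
    while name not in PRIMITIVES:
        if name in seen or name not in aliases:
            raise TypeError(f"unknown or cyclic type: {name}")
        seen.add(name)
        name = aliases[name]
    return name

# UserId and Email come from the example above; AccountId is made up for illustration.
aliases = {"UserId": "Int", "Email": "String", "AccountId": "UserId"}

print(resolve_type("UserId", aliases))     # Int
print(resolve_type("AccountId", aliases))  # Int
print(resolve_type("Email", aliases))      # String
```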

Expressions

Literal Expressions

Literal ::= INTEGER | FLOAT | STRING | BOOLEAN

Evaluation:

Eval(42) = 42
Eval(3.14) = 3.14
Eval("hello") = "hello"
Eval(true) = True
Eval(false) = False

Identifier Expressions

Identifier ::= IDENTIFIER

Special Identifiers:

  • __value: Current pipeline value (runtime context)

Evaluation:

Eval(__value, ctx) = ctx.current_value

Binary Operations

BinaryOp ::= Expr OP Expr
OP ::= + | - | * | / | > | < | >= | <= | == | !=

Arithmetic Operations:

Eval(e₁ + e₂, ctx) = Eval(e₁, ctx) + Eval(e₂, ctx)
Eval(e₁ - e₂, ctx) = Eval(e₁, ctx) - Eval(e₂, ctx)
Eval(e₁ * e₂, ctx) = Eval(e₁, ctx) * Eval(e₂, ctx)
Eval(e₁ / e₂, ctx) = Eval(e₁, ctx) / Eval(e₂, ctx)  [if Eval(e₂, ctx) ≠ 0]

Comparison Operations:

Eval(e₁ > e₂, ctx) = Eval(e₁, ctx) > Eval(e₂, ctx)
Eval(e₁ < e₂, ctx) = Eval(e₁, ctx) < Eval(e₂, ctx)
Eval(e₁ >= e₂, ctx) = Eval(e₁, ctx) >= Eval(e₂, ctx)
Eval(e₁ <= e₂, ctx) = Eval(e₁, ctx) <= Eval(e₂, ctx)
Eval(e₁ == e₂, ctx) = Eval(e₁, ctx) == Eval(e₂, ctx)
Eval(e₁ != e₂, ctx) = Eval(e₁, ctx) != Eval(e₂, ctx)

Type Rules:

  • Arithmetic: Int + Int → Int, Float + Float → Float, Int + Float → Float
  • Comparison: T × T → Bool (for comparable types)
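
The evaluation rules for literals, __value, and binary operations can be sketched as a small recursive evaluator. The tuple encoding `(op, left, right)` is an assumption made for this example and is not the RLang IR; mapping `/` to Python true division is also an assumption.

```python
import operator
from typing import Any

# Operator table mirroring the OP grammar above.
OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
    ">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

def eval_expr(expr: Any, current_value: Any) -> Any:
    """Evaluate a tuple-encoded expression (hypothetical encoding, not the RLang IR)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        lhs = eval_expr(left, current_value)   # left operand is evaluated first
        rhs = eval_expr(right, current_value)  # then the right operand
        if op == "/" and rhs == 0:
            raise ZeroDivisionError("division is only defined for a non-zero divisor")
        return OPS[op](lhs, rhs)
    if expr == "__value":
        return current_value  # Eval(__value, ctx) = ctx.current_value
    return expr  # anything else is treated as a literal

print(eval_expr((">", "__value", 10), 42))          # True
print(eval_expr(("+", ("*", "__value", 2), 1), 5))  # 11
print(eval_expr(("+", 1, 2.5), None))               # 3.5
```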

Function Calls

Call ::= IDENTIFIER ( Expr₁, ..., Exprₙ )

Evaluation:

Eval(f(e₁, ..., eₙ), ctx) = fn_registry[f](Eval(e₁, ctx), ..., Eval(eₙ, ctx))

Type Rules:

f : T₁ × ... × Tₙ → T
e₁ : T₁, ..., eₙ : Tₙ
─────────────────────────
f(e₁, ..., eₙ) : T

Conditional Expressions (v0.2+)

IfExpr ::= if ( Expr ) { Steps } [ else { Steps } ]

Evaluation:

Eval(if (c) { s₁ } else { s₂ }, ctx) = 
    if Eval(c, ctx) then Eval(s₁, ctx) else Eval(s₂, ctx)

Type Rules:

c : Bool
s₁ : T
s₂ : T
─────────────────────────
if (c) { s₁ } else { s₂ } : T

Determinism Requirement:

The condition c must be a pure expression—no side effects, no randomness, no time-dependent operations.

Pipeline Semantics

Pipeline Definition

Pipeline ::= pipeline IDENTIFIER ( Type ) -> Type { Steps }
Steps ::= Step₁ -> Step₂ -> ... -> Stepₙ

Evaluation:

Eval(pipeline main(T_in) -> T_out { s₁ -> ... -> sₙ }, x) =
    Eval(sₙ, Eval(sₙ₋₁, ..., Eval(s₁, x)...))

Composition:

Eval(s₁ -> s₂, x) = Eval(s₂, Eval(s₁, x))
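
The composition rule amounts to a left fold over the steps. The sketch below models each step as a plain Python callable, which is a simplification: `run_pipeline` is a hypothetical name and does not involve the compiler or the function registry.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_pipeline(steps: list[Step], x: Any) -> Any:
    """Apply steps left to right: Eval(s1 -> s2, x) = Eval(s2, Eval(s1, x))."""
    return reduce(lambda value, step: step(value), steps, x)

def inc(v: int) -> int:
    return v + 1

def double(v: int) -> int:
    return v * 2

print(run_pipeline([inc, double], 3))  # 8
print(run_pipeline([double, inc], 3))  # 7

# Splitting a pipeline in two does not change the result.
print(run_pipeline([double], run_pipeline([inc], 3)) == run_pipeline([inc, double], 3))  # True
```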

Step Semantics

Function Step:

Eval(f, x) = fn_registry[f](x)

Conditional Step:

Eval(if (c) { s₁ } else { s₂ }, x) =
    if Eval(c, x) then Eval(s₁, x) else Eval(s₂, x)

Deterministic Requirements

No Randomness

FORBIDDEN:

  • Random number generation
  • Non-deterministic algorithms
  • Probabilistic data structures

No I/O

FORBIDDEN:

  • File system access
  • Network operations
  • Standard input/output
  • Environment variables (except compile-time)

No Time Dependence

FORBIDDEN:

  • Timestamps
  • System time
  • Date/time operations

Fixed Evaluation Order

REQUIRED:

  • Left-to-right evaluation
  • Sequential pipeline execution
  • Deterministic branch selection

4. IR Specification: The Physics Layer

The Intermediate Representation (IR) is the physics layer of RLang. It defines:

  1. What can be executed: Only IR nodes can appear in execution traces
  2. How execution proceeds: IR structure determines execution order
  3. What is provable: Only IR-level operations generate proof records

IR Invariants:

  1. Purity: Every IR node is pure (no side effects)
  2. Determinism: IR evaluation is deterministic
  3. Canonicalizability: Every IR node can be serialized to canonical JSON
  4. Completeness: All semantic constructs must lower to IR

Current IR Node Types (v0.2.2)

IRExpr

Base class for all expressions in IR.

@dataclass(frozen=True)
class IRExpr:
    kind: str  # "literal" | "identifier" | "binary_op" | "call" | "boolean_and" | "boolean_or" | "boolean_not" | "record" | "field_access" | "list"
    # ... fields depend on kind

Kinds:

  1. literal: Literal values
  2. identifier: Variable references (e.g., __value)
  3. binary_op: Binary operations (+, -, *, /, >, <, etc.)
  4. call: Function calls
  5. boolean_and: Boolean AND (&&)
  6. boolean_or: Boolean OR (||)
  7. boolean_not: Boolean NOT (!)
  8. record: Record construction { field1: expr1, ... }
  9. field_access: Field access obj.field
  10. list: List construction [expr1, expr2, ...]

IRIf

Conditional execution node.

@dataclass(frozen=True)
class IRIf:
    condition: IRExpr
    then_steps: list[PipelineStepIR]
    else_steps: list[PipelineStepIR]

Semantics:

  • Condition must evaluate to Bool
  • Both branches must produce same output type
  • Execution is deterministic based on condition value

PipelineStepIR

Single step in a pipeline.

@dataclass(frozen=True)
class PipelineStepIR:
    index: int
    name: str
    template_id: str
    arg_types: list[str]
    input_type: str | None
    output_type: str | None

PipelineIR

Complete pipeline definition.

@dataclass(frozen=True)
class PipelineIR:
    id: str
    name: str
    input_type: str | None
    output_type: str | None
    steps: list[PipelineStepIR | IRIf]

Rules for Adding New IR Nodes

Every new IR node MUST:

  1. Be Pure: No side effects, no hidden state
  2. Be Deterministic: Same inputs → same outputs
  3. Be Canonicalizable: Implement to_dict() with sorted keys
  4. Have Fixed Evaluation Order: No non-deterministic iteration
  5. Preserve Type Information: Include type annotations

Example: Adding IRRecord (v0.3)

@dataclass(frozen=True)
class IRRecord:
    """IR representation of a record construction."""
    fields: dict[str, IRExpr]  # Field name → expression
    
    def to_dict(self) -> dict[str, Any]:
        """Canonical dictionary representation."""
        return {
            "fields": {
                k: v.to_dict() 
                for k, v in sorted(self.fields.items())  # Sorted!
            },
            "kind": "record"
        }

Key Point: Record fields must be sorted alphabetically to ensure canonical representation.


5. Canonicalization Specification

Canonical JSON is the stable serialization format that ensures:

  • Same data structure → same JSON string
  • Same JSON string → same hash
  • Deterministic across platforms and Python versions

Key Ordering Rule

RULE: All dictionary keys must be sorted alphabetically.

Implementation:

def canonical_dumps(obj: Any) -> str:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

Example:

{"b": 2, "a": 1}  →  '{"a":1,"b":2}'

Why This Matters:

Non-deterministic key ordering breaks hash stability:

# WRONG
{"b": 2, "a": 1}  →  hash
{"a": 1, "b": 2}  →  hash  # Different hash!

# CORRECT
{"b": 2, "a": 1}  →  '{"a":1,"b":2}'  →  hash
{"a": 1, "b": 2}  →  '{"a":1,"b":2}'  →  hash  # Same hash!
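
The effect of key ordering on hashes is easy to reproduce with the standard library alone. The sketch below uses the same `json.dumps` arguments as `canonical_dumps`; the `canonical_hash` helper is a name chosen for this example.

```python
import hashlib
import json

def canonical_hash(obj) -> str:
    """SHA-256 of the compact, key-sorted JSON form (illustrative helper)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"b": 2, "a": 1}
b = {"a": 1, "b": 2}

# Insertion order leaks into naive serialization...
print(json.dumps(a) == json.dumps(b))          # False
# ...but not into the canonical form.
print(canonical_hash(a) == canonical_hash(b))  # True
```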

Float Normalization Rule

RULE: Floats must be normalized to ensure platform-independent representation.

Implementation:

def _normalize_floats(obj: Any) -> Any:
    if isinstance(obj, float):
        if obj.is_integer():
            return int(obj)  # 3.0 → 3
        return round(obj, 10)  # Round to 10 decimal places
    elif isinstance(obj, dict):
        return {k: _normalize_floats(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [_normalize_floats(item) for item in obj]
    return obj
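
A few sample inputs show what this rule does. The function body is repeated under the name `normalize_floats` only so the snippet runs on its own; the inputs are arbitrary examples.

```python
from typing import Any

def normalize_floats(obj: Any) -> Any:
    # Same logic as _normalize_floats above, repeated so the snippet is self-contained.
    if isinstance(obj, float):
        if obj.is_integer():
            return int(obj)
        return round(obj, 10)
    if isinstance(obj, dict):
        return {k: normalize_floats(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_floats(item) for item in obj]
    return obj

print(normalize_floats(3.0))                 # 3
print(normalize_floats(0.1 + 0.2))           # 0.3
print(normalize_floats({"xs": [1.0, 2.5]}))  # {'xs': [1, 2.5]}
```

One consequence worth noting: whole-valued floats collapse to integers, so 3.0 and 3 serialize to the same canonical JSON.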

Whitespace Rule

RULE: Minimal whitespace (compact JSON) unless indentation is explicitly requested.

Implementation:

# Compact (default)
json.dumps(obj, separators=(",", ":"))  # No spaces

# Pretty (for debugging)
json.dumps(obj, indent=2)  # 2-space indentation

Encoding Rule

RULE: UTF-8 encoding, no BOM, consistent line endings.

Implementation:

canonical_json.encode("utf-8")

What Breaks Determinism

FORBIDDEN:

  1. Non-deterministic key ordering
  2. Platform-dependent float representation
  3. Non-canonical JSON serialization
  4. Including metadata in canonical form
  5. Non-deterministic whitespace

REQUIRED:

  1. Alphabetically sorted keys
  2. Normalized floats
  3. Canonical JSON serialization
  4. Pure data structures only
  5. Consistent encoding

6. Execution Semantics

RLang execution is purely functional and deterministic:

  • No mutable state
  • No side effects
  • No I/O operations
  • No randomness

Function Application

Semantics:

Apply(f, x) = fn_registry[f](x)

Requirements:

  1. fn_registry[f] must be a pure function
  2. No side effects allowed
  3. Deterministic output for same input

Step Execution

Sequential Execution:

Execute([s₁, ..., sₙ], x₀) =
    let x₁ = Execute(s₁, x₀) in
    let x₂ = Execute(s₂, x₁) in
    ...
    let xₙ = Execute(sₙ, xₙ₋₁) in
    xₙ

Trace Recording:

Each step execution produces a StepExecutionRecord:

StepExecutionRecord(
    index=i,
    step_name=name,
    template_id=template_id,
    input_snapshot=xᵢ,
    output_snapshot=xᵢ₊₁
)
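
Sequential execution with trace recording can be sketched as a simple loop. `StepRecord` is a simplified stand-in for StepExecutionRecord (it omits template_id), and `execute` is a hypothetical function, not the compiler's executor.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class StepRecord:  # simplified stand-in for StepExecutionRecord
    index: int
    step_name: str
    input_snapshot: Any
    output_snapshot: Any

def execute(step_names: list[str], fn_registry: dict[str, Callable], x: Any):
    """Run steps in order, recording one StepRecord per step."""
    records = []
    for i, name in enumerate(step_names):
        y = fn_registry[name](x)
        records.append(StepRecord(i, name, x, y))
        x = y  # the output of step i is the input of step i + 1
    return x, records

registry = {"inc": lambda v: v + 1, "double": lambda v: v * 2}
output, trace = execute(["inc", "double"], registry, 10)

print(output)  # 22
for record in trace:
    print(asdict(record))
```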

Conditional Execution

Branch Selection:

Execute(IRIf(condition=c, then_steps=t, else_steps=e), x) =
    if Eval(c, x) then
        Execute(t, x)
    else
        Execute(e, x)

Branch Recording:

Each conditional execution produces a BranchExecutionRecord:

BranchExecutionRecord(
    index=i,
    path="then" | "else",
    condition_value=bool
)

Determinism:

Same condition value → same branch path → same execution trace.
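
Branch selection and branch recording can be sketched together. In this example the condition and both branches are plain callables, the record is a dictionary rather than a BranchExecutionRecord, and `half` is assumed to be integer division; all of these are simplifications for illustration.

```python
from typing import Any, Callable

def execute_if(condition: Callable[[Any], bool],
               then_step: Callable[[Any], Any],
               else_step: Callable[[Any], Any],
               x: Any, index: int = 0):
    """Pick a branch from the condition value and return (output, branch record)."""
    condition_value = bool(condition(x))
    path = "then" if condition_value else "else"
    output = then_step(x) if condition_value else else_step(x)
    record = {"index": index, "path": path, "condition_value": condition_value}
    return output, record

def double(v: int) -> int:
    return v * 2

def half(v: int) -> int:
    return v // 2  # assumption: integer halving

print(execute_if(lambda v: v > 10, double, half, 42))
# (84, {'index': 0, 'path': 'then', 'condition_value': True})
print(execute_if(lambda v: v > 10, double, half, 4))
# (2, {'index': 0, 'path': 'else', 'condition_value': False})
```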


7. Proof System Architecture

TRP v1 (Current)

TRP (Trace of Reasoning Process) is the execution trace format.

Structure

PipelineProofBundle(
    version: str,
    language: str,
    entry_pipeline: str | None,
    program_ir: PrimaryProgramIR,
    input_value: Any,
    output_value: Any,
    steps: List[StepExecutionRecord],
    branches: List[BranchExecutionRecord]
)

Step Records

StepExecutionRecord(
    index: int,           # 0-based step index
    step_name: str,        # Function name
    template_id: str,      # Template reference
    input_snapshot: Any,   # Input value
    output_snapshot: Any   # Output value
)

Branch Records

BranchExecutionRecord(
    index: int,           # IF step index
    path: str,            # "then" | "else"
    condition_value: bool  # Condition evaluation result
)

Hashing Model

HMASTER

Definition:

HMASTER = Hash(canonical(program_ir))

Computation:

def compute_HMASTER(program_ir: PrimaryProgramIR) -> str:
    canonical_json = program_ir.to_json()
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()

Invariant:

Same program IR → same HMASTER.

HRICH

Definition:

HRICH = Hash(canonical(proof_bundle))

Computation:

def compute_HRICH(proof_bundle: PipelineProofBundle) -> str:
    # Convert to rich bundle format
    rich_bundle = {
        "primary": {
            "master": HMASTER,
            "steps": [step.to_dict() for step in proof_bundle.steps],
            "branches": [branch.to_dict() for branch in proof_bundle.branches]
        },
        "H_RICH": None  # Computed below
    }
    
    # Compute subproof hashes
    subproof_hashes = compute_subproof_hashes(subproofs)
    
    # Compute HRICH from subproof hashes
    HRICH = compute_HRICH_from_subproof_hashes(subproof_hashes)
    
    return HRICH

Subproof Hashes:

subproof_hashes = {
    "DIP": Hash(DIP_subproof),
    "DP": Hash(DP_subproof),
    "PEP": Hash(PEP_subproof),
    "PoPI": Hash(PoPI_subproof),
    "CCP": Hash(CCP_subproof),
    "CMIP": Hash(CMIP_subproof),
    "PP": Hash(PP_subproof),
    "TRP": Hash(TRP_subproof)
}

HRICH Computation:

HRICH = SHA256(
    "|".join(sorted(subproof_hashes.values()))
)

Invariant:

Same execution trace → same HRICH.
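
Read literally, the HRICH formula above can be sketched with hashlib. This is an interpretation of the formula as written, not the BoR implementation: the subproof contents below are placeholders, and `hrich_from_subproofs` is a name invented for this example.

```python
import hashlib

def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def hrich_from_subproofs(subproof_hashes: dict[str, str]) -> str:
    """SHA-256 over the sorted subproof hash values joined with '|'."""
    return sha256_hex("|".join(sorted(subproof_hashes.values())))

# Placeholder subproof contents; real subproofs come from the proof bundle.
names = ["DIP", "DP", "PEP", "PoPI", "CCP", "CMIP", "PP", "TRP"]
hashes = {name: sha256_hex(f"{name}-subproof") for name in names}

h1 = hrich_from_subproofs(hashes)
h2 = hrich_from_subproofs(dict(reversed(list(hashes.items()))))

print(h1 == h2)  # True: sorting removes any dependence on dict order
print(len(h1))   # 64
```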


8. The Untouchable Core (Frozen Physics)

These components must never be modified, because any change breaks the determinism guarantees:

Component                       Frozen?  Why?
------------------------------  -------  ------------------------------
Canonical JSON Rules            YES      Breaks HMASTER stability
Hash Algorithms                 YES      Breaks verification
TRP Structure Rules             YES      Breaks proof compatibility
Branch Decision Semantics       YES      Breaks determinism
Deterministic Data Structures   YES      Breaks execution determinism
No Non-Deterministic Iteration  YES      Breaks execution determinism
No Mutation in IR               YES      Breaks purity

Partially Frozen Components

These components can be extended but must preserve determinism:

Component          Frozen?  Why?
-----------------  -------  ---------------------------------------------
AST → IR Lowering  PARTIAL  Must remain deterministic
Type System        PARTIAL  Can add types, but rules must be deterministic
Executor           PARTIAL  Semantics must remain deterministic
Parser             NO       Extensions allowed (new syntax)
Resolver           NO       Extensions allowed (new symbols)

Modification Rules

Canonical JSON

NEVER CHANGE:

  • Key sorting algorithm
  • Float normalization rules
  • JSON encoding (UTF-8)
  • Whitespace rules

ALLOWED:

  • Adding new fields to existing structures (if canonicalized correctly)

Hash Algorithms

NEVER CHANGE:

  • SHA-256 algorithm
  • Hash computation order
  • Subproof hash structure

ALLOWED:

  • Adding new hash types (with new names)
  • Extending hash inputs (additive only)

TRP Structure

NEVER CHANGE:

  • Step record structure (v1)
  • Branch record structure (v1)
  • Record field names

ALLOWED:

  • Adding new record types (TRP v2)
  • Extending existing records (additive fields)

9. Expandable Surfaces (Safe to Extend)

Frontend Extensions

Lexer

Safe to Add:

  • New keywords
  • New operators
  • New literal types
  • New comment styles

Parser

Safe to Add:

  • New AST nodes
  • New expression forms
  • New statement types

Resolver

Safe to Add:

  • New symbol kinds
  • New scoping rules
  • New name resolution strategies

Middle-End Extensions

Type System

Safe to Add:

  • New primitive types
  • New generic types
  • New type constructors

Lowering

Safe to Add:

  • New AST → IR lowering rules
  • New IR node types (following IR invariants)

Backend Extensions

Executor

Safe to Add:

  • New execution strategies
  • New optimization passes
  • New proof recording formats

Proof System

Safe to Add:

  • New proof record types (TRP v2)
  • New subproof types
  • New verification strategies

Extension Guidelines

Before Adding:

  1. Verify determinism (same input → same output)
  2. Verify canonicalizability (can serialize to JSON)
  3. Verify purity (no side effects)
  4. Add tests (determinism tests required)
  5. Update documentation

After Adding:

  1. Run full test suite
  2. Verify hash stability
  3. Update golden files
  4. Document extension

10. Quick Start

Installation

pip install rlang-compiler

Basic Usage

Example 1: Simple Pipeline

fn inc(x: Int) -> Int;

pipeline main(Int) -> Int { inc }

Compile:

rlangc examples/simple.rlang --out out/simple.json

Example 2: Conditional Execution

fn double(x: Int) -> Int;
fn half(x: Int) -> Int;

pipeline main(Int) -> Int {
  if (__value > 10) {
    double
  } else {
    half
  }
}

Example 3: Proof Generation

from rlang.bor import run_program_with_proof, RLangBoRCrypto

source = """
fn inc(x: Int) -> Int;
pipeline main(Int) -> Int { inc }
"""

bundle = run_program_with_proof(
    source=source,
    input_value=10,
    fn_registry={"inc": lambda x: x + 1}
)

crypto = RLangBoRCrypto(bundle)
rich = crypto.to_rich_bundle()

print("HMASTER:", rich.rich["primary"]["master"])
print("HRICH:", rich.rich["H_RICH"])

Verification

# Generate proof bundle
python verify_proof_bundle.py

# Verify with BoR CLI
borp verify-bundle --bundle out/rich_proof_bundle.json

11. API Reference

Core Compiler API

from rlang import compile_source_to_ir, compile_source_to_json

# Compile to IR
result = compile_source_to_ir(
    source="fn inc(x: Int) -> Int; pipeline main(Int) -> Int { inc }",
    version="v0",
    language="rlang"
)

# Compile to JSON
json_str = compile_source_to_json(
    source="fn inc(x: Int) -> Int; pipeline main(Int) -> Int { inc }"
)

Proof Generation API

from rlang.bor import run_program_with_proof, RLangBoRCrypto

# Generate proof bundle
bundle = run_program_with_proof(
    source=source,
    input_value=10,
    fn_registry={"inc": lambda x: x + 1}
)

# Convert to rich bundle
crypto = RLangBoRCrypto(bundle)
rich_bundle = crypto.to_rich_bundle()

CLI Usage

# Compile to stdout
rlangc program.rlang

# Compile to file
rlangc program.rlang --out output.json

# Specify entry pipeline
rlangc program.rlang --entry main --out output.json

12. Testing & Verification

Test Suite

The compiler includes 190+ tests covering:

  • Lexer (tokenization, comments, floats)
  • Parser (AST construction, operator precedence)
  • Type Checker (type inference, type aliases, control flow)
  • IR (lowering, primary IR construction)
  • Emitter (end-to-end compilation)
  • CLI (command-line interface)
  • BoR Integration (proof generation, crypto hashing, CLI compatibility)
  • Determinism (SHA256 comparison, tamper detection)
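
As a rough idea of what a SHA256-comparison and tamper-detection test could look like, the sketch below hashes a hand-written dictionary standing in for a proof bundle. It does not use the project's test suite or the rlang API; the bundle fields and test names are assumptions for illustration.

```python
import copy
import hashlib
import json

def bundle_hash(bundle: dict) -> str:
    """SHA-256 of the compact, key-sorted JSON form of a bundle dictionary."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hand-written stand-in for a proof bundle dictionary.
bundle = {
    "input_value": 10,
    "output_value": 11,
    "steps": [{"index": 0, "step_name": "inc", "input": 10, "output": 11}],
    "branches": [],
}

def test_same_bundle_same_hash():
    assert bundle_hash(bundle) == bundle_hash(copy.deepcopy(bundle))

def test_tampered_step_changes_hash():
    tampered = copy.deepcopy(bundle)
    tampered["steps"][0]["output"] = 12  # simulate tampering with a step record
    assert bundle_hash(tampered) != bundle_hash(bundle)

test_same_bundle_same_hash()
test_tampered_step_changes_hash()
print("ok")
```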

Running Tests

# Run all tests
pytest -q --disable-warnings

# Run specific test file
pytest tests/test_parser.py -v

# Run with coverage
pytest --cov=rlang

Determinism Verification

# Run deterministic test suite
./next_tests.sh

# Verify proof bundles
./verify_bundle.sh

Release Audit

# Run comprehensive release audit
./scripts/run_release_audit.sh

The audit checks:

  • Environment reset
  • Static code consistency
  • Full test suite
  • Determinism tests
  • Golden file verification
  • Canonical JSON boundary audit
  • IR shape stability
  • TRP audit
  • Hash boundary tests
  • CLI verification
  • Packaging readiness

13. Extension Guidelines

For detailed extension guidelines, see docs/compiler_expansion_playbook.md.

Quick Checklist

When adding a new feature:

  1. Update grammar in docs/compiler_physics.md
  2. Add lexer tokens
  3. Add parser AST nodes
  4. Add resolver logic
  5. Add type checking rules
  6. Add IR node (if needed)
  7. Add lowering rules
  8. Add execution logic
  9. Verify canonicalization
  10. Add proof recording (if needed)
  11. Add comprehensive tests
  12. Update golden files
  13. Update documentation

Test Matrix

For each new construct:

  • Parser tests (basic, nested, empty, invalid, edge cases)
  • Typechecker tests (valid, invalid, inference, nested, edge cases)
  • Lowering tests (basic, nested, deterministic, edge cases)
  • IR tests (structure, canonical, deterministic, edge cases)
  • Executor tests (basic, proof, deterministic, edge cases)
  • Determinism tests (IR, H_IR, TRP, HRICH, cross-platform)
  • Canonical JSON tests (sorted keys, float normalization, stable representation)
  • Proof stability tests (branching, loops, collections, pattern matching)

Status

Compiler: Fully functional (190+ tests passing)
Control Flow: Deterministic if/else in pipelines with type-checked branches
Proof Generation: Complete and deterministic, including branch-aware TRP subproofs
BoR Integration: Verified with borp verify-bundle
Determinism: Bit-for-bit reproducible including branch traces
Security: Tamper detection working for both steps and branches
Version: 0.2.2 (published to PyPI)


License: MIT License
Author: Kushagra Bhatnagar
Last Updated: November 2025
