
Deterministic semantic compiler core: SIR → RLang → ProofBundle with stable hashes.

Project description

Semantic Compiler

A compiler that bridges natural language descriptions to verifiable code through a semantic intermediate representation (SIR).

SIR v0.1 Scope and Version Boundary

This section defines the current stable scope of SIR v0.1, clarifies what is intentionally out-of-scope, explains the compiler maturity level, and outlines the roadmap for future versions.

Part A — What SIR v0.1 Supports (Current Stable Scope)

SIR v0.1 provides a formally defined, deterministic compilation pipeline for scalar integer decision rules. The current stable scope includes:

Input Semantics:

  • Single scalar integer input (Int)
  • Pipeline signature: pipeline main(Int) -> Int

Conditional Logic:

  • Comparison operators: gt, ge, lt, le, eq, neq on scalar input
  • Boolean combiners: all(), any(), not() with nested combinations
  • Pure predicate expressions operating on the scalar input value

Output Semantics:

  • Scalar constant integer outputs
  • Deterministic value assignment via SetOutput steps
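To make this scope concrete, here is a minimal reference evaluator for SIR v0.1 decision pipelines, written against the dict shape used in the Quick Start example. It is a sketch, not the compiler's own code. The helper names (`evaluate_pipeline`, `eval_condition`, `run_steps`) are hypothetical, the combiner shape `{"combiner": ..., "terms": [...]}` is assumed from the Part D examples, and "the last SetOutput wins" is an assumed execution rule.

```python
from typing import Any, Callable, Dict, List, Optional

# The six SIR v0.1 comparison operators, applied to two ints.
COMPARATORS: Dict[str, Callable[[int, int], bool]] = {
    "gt": lambda a, b: a > b,
    "ge": lambda a, b: a >= b,
    "lt": lambda a, b: a < b,
    "le": lambda a, b: a <= b,
    "eq": lambda a, b: a == b,
    "neq": lambda a, b: a != b,
}


def resolve(arg: Any, input_name: str, value: int) -> int:
    """The input name stands for the scalar input; ints are literals."""
    if arg == input_name:
        return value
    if isinstance(arg, int) and not isinstance(arg, bool):
        return arg
    raise ValueError(f"not a scalar SIR v0.1 argument: {arg!r}")


def eval_condition(cond: Dict[str, Any], input_name: str, value: int) -> bool:
    """Evaluate a comparison or a (possibly nested) Boolean combiner."""
    if "op" in cond:
        left, right = (resolve(a, input_name, value) for a in cond["args"])
        return COMPARATORS[cond["op"]](left, right)
    # ASSUMPTION: combiners look like {"combiner": "all"|"any"|"not", "terms": [...]}
    results = [eval_condition(t, input_name, value) for t in cond["terms"]]
    if cond["combiner"] == "all":
        return all(results)
    if cond["combiner"] == "any":
        return any(results)
    if cond["combiner"] == "not":
        (single,) = results  # "not" takes exactly one term
        return not single
    raise ValueError(f"unknown combiner: {cond['combiner']!r}")


def run_steps(steps: List[Dict[str, Any]], input_name: str, value: int,
              output: Optional[int] = None) -> Optional[int]:
    """Execute steps in order. ASSUMPTION: the last SetOutput wins."""
    for step in steps:
        if step["type"] == "SetOutput":
            output = step["value"]
        elif step["type"] == "Decision":
            taken = eval_condition(step["condition"], input_name, value)
            branch = step["then_steps"] if taken else step["else_steps"]
            output = run_steps(branch, input_name, value, output)
        else:
            raise ValueError(f"unknown step type: {step['type']!r}")
    return output


def evaluate_pipeline(sir: Dict[str, Any], value: int) -> Optional[int]:
    """Reference semantics for `pipeline main(Int) -> Int`."""
    return run_steps(sir["steps"], sir["input_name"], value)
```

With the Quick Start SIR, `evaluate_pipeline(sir, 10)` returns 0 and `evaluate_pipeline(sir, 11)` returns 1, because `gt` is a strict comparison.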

Deterministic Pipeline Guarantees: From SIR onward, the compilation pipeline enforces strict determinism at every stage (the Natural Language → SIR step goes through the LLM bridge):

Natural Language → SIR (scalar) → RLang → IR → Runtime → Proof → HMASTER

For any valid SIR v0.1 pipeline and input value, the following invariants hold:

  • Same input → same SIR representation
  • Same SIR → same RLang source code (byte-for-byte identical)
  • Same RLang → same canonical IR
  • Same IR → same execution trace (TRP)
  • Same execution → same proof bundle (HMASTER, HRICH)
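Each invariant has the form "same X → same Y", which can be spot-checked by running one stage several times on the same input and comparing the outputs. The sketch below assumes a stage is a plain Python callable; `check_deterministic`, `render_condition`, and `leaky_render` are hypothetical names, and the two render functions are stand-ins (not RLang syntax) used only to show a passing and a failing stage.

```python
import itertools
from typing import Any, Callable, Dict


def check_deterministic(stage: Callable[[Any], Any], stage_input: Any, runs: int = 5) -> Any:
    """Run one pipeline stage `runs` times on the same input.

    Returns the common output, or raises AssertionError on the first run
    whose output differs from run 0.
    """
    first = stage(stage_input)
    for i in range(1, runs):
        out = stage(stage_input)
        if out != first:
            raise AssertionError(f"run {i} differs from run 0: {out!r} != {first!r}")
    return first


# Hypothetical stand-in stage: renders a comparison as text (NOT real RLang syntax).
def render_condition(cond: Dict[str, Any]) -> str:
    left, right = cond["args"]
    return f"{left} {cond['op']} {right}"


# Deliberately non-deterministic stand-in: output depends on hidden mutable state.
_calls = itertools.count()


def leaky_render(cond: Dict[str, Any]) -> str:
    return f"{render_condition(cond)}  # call {next(_calls)}"


cond = {"op": "gt", "args": ["value", 10]}
print(check_deterministic(render_condition, cond))  # value gt 10

try:
    check_deterministic(leaky_render, cond)
except AssertionError as err:
    print("caught:", err)
```

The leaky stage fails because its output depends on state outside its input (here a call counter). Timestamps, counters, and iteration over unordered collections are typical leaks of this kind that break byte-for-byte reproducibility.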

Everything within this scope is fully supported, deterministic, and proof-producing. The compiler guarantees cryptographic verification compatibility with the RLang backend.

Part B — What is Out-of-Scope for SIR v0.1

The following language features are intentionally unimplemented in SIR v0.1. These are not bugs, but planned extensions for future versions:

Record Types and Multi-Field Inputs:

  • Record types (e.g., {income: Int, age: Int})
  • Field access operations (e.g., transaction.amount, user.age)
  • Multi-field decision rules (e.g., "if income > 50000 and age < 30")

Output Type Limitations:

  • String or enum outputs
  • Record-type outputs
  • Negative literal outputs (e.g., -1, -50)

Control Flow Limitations:

  • Complex control flow beyond scalar Boolean guards
  • Loops, pattern matching, or other advanced constructs

Error Messages Explained:

When the compiler encounters out-of-scope constructs, it produces clear error messages:

  • "Attempted field access on non-record type Int" — Occurs when SIR contains field references (e.g., "income", "age") but the pipeline input type is Int rather than a record type. This is correct behavior: SIR v0.1 enforces scalar-only semantics.

  • "ParseError: Expected '(' ..." — Occurs when SIR contains unsupported operators or syntax that cannot be lowered to RLang v0.2.x. The compiler correctly rejects constructs outside the v0.1 boundary.

These errors indicate that the compiler is enforcing version boundaries correctly, not that the system is broken.
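A scope check of this kind can be sketched as a recursive walk over the SIR that rejects anything outside the v0.1 boundary before lowering. This is an illustrative validator, not the compiler's implementation: `check_sir_v01` and `ScopeError` are hypothetical names, the field-access message only mirrors the documented one, and treating every name other than `input_name` as a field reference is an assumption.

```python
from typing import Any, Dict, List

COMPARISON_OPS = {"gt", "ge", "lt", "le", "eq", "neq"}
COMBINERS = {"all", "any", "not"}


class ScopeError(ValueError):
    """A construct outside the SIR v0.1 boundary (illustrative)."""


def check_condition(cond: Dict[str, Any], input_name: str) -> None:
    if "combiner" in cond:  # ASSUMED combiner shape: {"combiner": ..., "terms": [...]}
        if cond["combiner"] not in COMBINERS:
            raise ScopeError(f"unsupported combiner {cond['combiner']!r}")
        for term in cond["terms"]:
            check_condition(term, input_name)
        return
    if cond.get("op") not in COMPARISON_OPS:
        raise ScopeError(f"unsupported operator {cond.get('op')!r}")
    for arg in cond["args"]:
        if isinstance(arg, str) and arg != input_name:
            # Any other name is treated as a field reference on the Int input.
            raise ScopeError(f"Attempted field access on non-record type Int: {arg!r}")
        if isinstance(arg, bool) or not isinstance(arg, (int, str)):
            raise ScopeError(f"unsupported literal {arg!r}")


def check_steps(steps: List[Dict[str, Any]], input_name: str) -> None:
    for step in steps:
        if step["type"] == "Decision":
            check_condition(step["condition"], input_name)
            check_steps(step["then_steps"], input_name)
            check_steps(step["else_steps"], input_name)
        elif step["type"] == "SetOutput":
            out = step["value"]
            if isinstance(out, bool) or not isinstance(out, int):
                raise ScopeError(f"only scalar Int outputs are in scope: {out!r}")
            if out < 0:
                raise ScopeError(f"negative literal outputs are not yet supported: {out}")
        else:
            raise ScopeError(f"unknown step type {step['type']!r}")


def check_sir_v01(sir: Dict[str, Any]) -> None:
    """Raise ScopeError for the first out-of-scope construct found."""
    check_steps(sir["steps"], sir["input_name"])
```

Rejecting early keeps the failure at the SIR level, where the cause (a field name, a string output) is still visible, instead of surfacing later as a parse error in generated RLang.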

Part C — Compiler Maturity Level

The SIR v0.1 compiler is production-ready within its defined scope. The following guarantees are verified and stable:

Determinism:

  • Compilation is deterministic: identical SIR produces identical RLang source code
  • Execution is deterministic: identical inputs produce identical outputs
  • Proof generation is deterministic: identical executions produce identical proof bundles

IR Canonicalization:

  • IR canonicalization is stable and produces byte-for-byte identical JSON for identical programs
  • Hash stability is guaranteed: same program → same HMASTER hash
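One common way to get byte-for-byte identical JSON, and therefore a stable hash, is to serialize with sorted keys and fixed separators before hashing. The exact canonicalization rules and hash function behind HMASTER are not documented here, so this sketch assumes SHA-256 over Python's `json.dumps(..., sort_keys=True, separators=(",", ":"))` output; `canonical_json` and `stable_hash` are hypothetical names.

```python
import hashlib
import json
from typing import Any


def canonical_json(obj: Any) -> bytes:
    """Sorted keys, no insignificant whitespace, ASCII-only output."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def stable_hash(obj: Any) -> str:
    """Hex SHA-256 over the canonical bytes (ASSUMED hash function)."""
    return hashlib.sha256(canonical_json(obj)).hexdigest()


a = {"op": "gt", "args": ["value", 10]}
b = {"args": ["value", 10], "op": "gt"}  # same content, different key order

print(canonical_json(a))                 # b'{"args":["value",10],"op":"gt"}'
print(stable_hash(a) == stable_hash(b))  # True
```

Sorting keys removes the dependence on dict insertion order, while list order is kept because it is meaningful (for example, the order of `args`). SIR v0.1 carries only integers and strings, which also sidesteps float formatting, a common source of canonicalization drift.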

TRP Logging:

  • Trace of Reasoning Process (TRP) recording is stable and deterministic
  • Branch-level and step-level traces are complete and canonical
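A branch-level trace can be sketched as an evaluator that appends one record per executed step and then hashes the canonical trace. The record fields (`step`, `kind`, `result`, `branch`, and so on), the path scheme, and the digest are hypothetical and are not the actual TRP or HRICH formats; the sketch also handles only plain comparisons, not combiners.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple

OPS = {
    "gt": lambda a, b: a > b, "ge": lambda a, b: a >= b,
    "lt": lambda a, b: a < b, "le": lambda a, b: a <= b,
    "eq": lambda a, b: a == b, "neq": lambda a, b: a != b,
}


def run_with_trace(sir: Dict[str, Any], value: int) -> Tuple[Optional[int], List[Dict[str, Any]]]:
    """Execute single-comparison Decisions and record one entry per step."""
    trace: List[Dict[str, Any]] = []
    output: Optional[int] = None

    def run(steps: List[Dict[str, Any]], path: str) -> None:
        nonlocal output
        for i, step in enumerate(steps):
            step_id = f"{path}/{i}"  # position in the step tree, e.g. "/0/else/0"
            if step["type"] == "SetOutput":
                output = step["value"]
                trace.append({"step": step_id, "kind": "SetOutput", "value": output})
            else:  # "Decision" with a plain comparison (no combiners in this sketch)
                cond = step["condition"]
                left, right = (value if a == sir["input_name"] else a for a in cond["args"])
                taken = OPS[cond["op"]](left, right)
                branch = "then" if taken else "else"
                trace.append({"step": step_id, "kind": "Decision", "op": cond["op"],
                              "args": [left, right], "result": taken, "branch": branch})
                run(step[f"{branch}_steps"], f"{step_id}/{branch}")

    run(sir["steps"], "")
    return output, trace


def trace_digest(trace: List[Dict[str, Any]]) -> str:
    """Hash the trace via canonical JSON (ASSUMED scheme, not the real HRICH)."""
    data = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

Because each record is keyed by its position in the step tree, two runs that take the same branches produce identical traces and digests, while a run that takes a different branch changes both.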

Proof System:

  • Proof bundles (HMASTER, H_IR, HRICH) are correct and match runtime execution
  • Cryptographic verification is compatible with BoR (Blockchain of Reasoning) standards

Test Coverage:

  • Determinism tests pass: same input → same output across multiple runs
  • Hash stability tests pass: same program → same HMASTER across compiler versions
  • Integration tests verify end-to-end pipeline correctness

Version Stack:

SIR v0.1 → RLang v0.2.x → IR v0.2.x → BoR Proof → HMASTER

The current version behaves like a formally defined v0.1 language with stable semantics, deterministic compilation, and cryptographic proof generation.

Part D — Roadmap to SIR v0.2 (Records)

SIR v0.2 will extend the language with record types and multi-field decision rules. This is a natural semantic extension, not a patch or workaround.

Planned Extensions:

SIR v0.2:

  • Record input types: {field1: Type1, field2: Type2, ...}
  • Field access in predicates: {"op": "gt", "args": ["income", 50000]}
  • Multi-field Boolean combinations: {"combiner": "all", "terms": [pred1, pred2]} where predicates reference different fields
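For the planned v0.2 predicates, field access amounts to looking names up in the record instead of comparing against a single scalar input. This sketch assumes bare string arguments name fields (as in the example `["income", 50000]` above) and that the input is a plain mapping; `eval_record_condition` and `resolve_field` are hypothetical names, and the error behavior is not taken from the project.

```python
from typing import Any, Callable, Dict, Mapping

OPS: Dict[str, Callable[[int, int], bool]] = {
    "gt": lambda a, b: a > b, "ge": lambda a, b: a >= b,
    "lt": lambda a, b: a < b, "le": lambda a, b: a <= b,
    "eq": lambda a, b: a == b, "neq": lambda a, b: a != b,
}


def resolve_field(arg: Any, record: Mapping[str, int]) -> int:
    """Strings name record fields; ints are literals (ASSUMED v0.2 rule)."""
    if isinstance(arg, str):
        if arg not in record:
            raise KeyError(f"unknown field {arg!r}")
        return record[arg]
    return arg


def eval_record_condition(cond: Dict[str, Any], record: Mapping[str, int]) -> bool:
    """Evaluate a comparison or combiner whose terms may reference different fields."""
    if "combiner" in cond:
        results = [eval_record_condition(t, record) for t in cond["terms"]]
        if cond["combiner"] == "all":
            return all(results)
        if cond["combiner"] == "any":
            return any(results)
        if cond["combiner"] == "not":
            (single,) = results
            return not single
        raise ValueError(f"unknown combiner {cond['combiner']!r}")
    left, right = (resolve_field(a, record) for a in cond["args"])
    return OPS[cond["op"]](left, right)


rule = {"combiner": "all", "terms": [
    {"op": "gt", "args": ["income", 50000]},
    {"op": "lt", "args": ["age", 30]},
]}
print(eval_record_condition(rule, {"income": 60000, "age": 25}))  # True
print(eval_record_condition(rule, {"income": 60000, "age": 35}))  # False
```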

RLang v0.3:

  • Record type declarations: Record { income: Int, age: Int }
  • Field access expressions: __value.income, __value.age
  • Record literal construction for outputs

Pipeline Lowering:

  • SIR→RLang lowering will become multi-field aware
  • Field references will be correctly mapped to RLang record access
  • Type inference will propagate record types through the pipeline

Semantic Boundary: The semantic boundary between SIR v0.1 and v0.2 is clean: v0.1 handles scalar-only semantics, v0.2 adds structured data. This extension preserves all v0.1 guarantees while enabling richer decision rules.

Multi-field rules like "if income > 50000 and age < 30 then return 'young-high' else return 'other'" will become valid SIR v0.2 programs that compile to RLang v0.3 and produce deterministic proof bundles.

Part E — Negative Literal Naming Fix (Future Enhancement)

Negative literal outputs (e.g., -1, -50) currently fail during RLang code generation due to function name generation constraints.

Root Cause: The RLang emitter generates function names using the pattern ret_<value>. For negative values, this produces invalid identifiers like ret_-1, which violates RLang syntax rules (function names cannot contain hyphens in this position).

Planned Fix: The RLang emitter will be updated to map negative literals to valid function names:

  • -1 → ret_neg_1
  • -50 → ret_neg_50
  • Positive values remain unchanged: 1 → ret_1

This is a pure naming fix in the code generation layer. It does not affect SIR semantics, IR structure, or proof generation. The fix will be implemented in the SIR→RLang lowering pipeline without breaking existing functionality.
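The planned naming rule fits in a few lines. `ret_function_name` is a hypothetical helper name, and the sketch assumes RLang identifiers accept letters, digits, and underscores, which this document implies but does not spell out.

```python
def ret_function_name(value: int) -> str:
    """Name of the emitted function that returns `value`.

    Planned scheme: 1 -> ret_1, -1 -> ret_neg_1, -50 -> ret_neg_50.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int literal, got {value!r}")
    if value < 0:
        return f"ret_neg_{-value}"  # avoid the invalid "ret_-1"
    return f"ret_{value}"


for v in (1, -1, -50):
    print(v, "->", ret_function_name(v))
```

The mapping stays collision-free: non-negative values never produce a name starting with `ret_neg_`, so distinct literals always get distinct function names.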

Part F — Semantic Boundary Diagram

The semantic boundary defines where natural language descriptions are converted into structured, verifiable code:

┌─────────────────────────────────────────────────────────────┐
│                    Natural Language                         │
│  "If value > 10 then return 1 else return 0"                │
└───────────────────────┬─────────────────────────────────────┘
                        │ LLM Bridge
                        ↓
┌─────────────────────────────────────────────────────────────┐
│              SIR v0.1 (Semantic Boundary)                   │
│  • Scalar Int input                                         │
│  • Comparison operators (gt, lt, ge, le, eq, neq)           │
│  • Boolean combiners (all, any, not)                        │
│  • Scalar Int output                                        │
└───────────────────────┬─────────────────────────────────────┘
                        │ Deterministic Compilation
                        ↓
┌─────────────────────────────────────────────────────────────┐
│                    RLang v0.2.x                             │
│  • Canonical source code                                    │
│  • Deterministic formatting                                 │
└───────────────────────┬─────────────────────────────────────┘
                        │ RLang Compiler (PyPI)
                        ↓
┌─────────────────────────────────────────────────────────────┐
│                    IR v0.2.x                                │
│  • Canonical intermediate representation                    │
│  • Hash-stable serialization                                │
└───────────────────────┬─────────────────────────────────────┘
                        │ Runtime Execution
                        ↓
┌─────────────────────────────────────────────────────────────┐
│                    Proof Bundle                             │
│  • HMASTER (program IR hash)                                │
│  • HRICH (execution trace hash)                             │
│  • TRP (Trace of Reasoning Process)                         │
└─────────────────────────────────────────────────────────────┘

The semantic boundary (SIR v0.1) enforces strict type safety and deterministic semantics. Natural language that describes constructs outside this boundary (e.g., multi-field rules, string outputs) will be rejected with clear error messages until SIR v0.2 extends the boundary.

Architecture

TEXT (LLM output)
      ↓
  SIR (Semantic IR)  <-- validated, structured, semantic JSON/dict
      ↓
  RLang source       <-- compiled deterministically
      ↓
RLang Compiler (PyPI) <-- already published
      ↓
   IR + Proof

Installation

Installing from PyPI

pip install semantic-compiler-core

Development Installation

  1. Install dependencies:
     pip install -e .
  2. Create a .env file based on .env.example:
     cp .env.example .env
  3. Add your OpenAI API key to .env:
     OPENAI_API_KEY="your-actual-key-here"

Usage

Natural Language → SIR (requires LLM)

from semantic_compiler.llm_bridge import text_to_sir

# Convert natural language to SIR (calls the LLM; needs OPENAI_API_KEY in .env)
sir = text_to_sir("If the user's age is greater than 18, approve the request.")

Quick Start (Deterministic Core)

The deterministic core library (semantic-compiler-core) provides a clean API for SIR → RLang → ProofBundle compilation without any LLM dependencies:

from semantic_compiler_core import compile_sir_to_proof

sir = {
    "type": "DecisionPipeline",
    "name": "main",
    "input_name": "value",
    "steps": [
        {
            "type": "Decision",
            "condition": {"op": "gt", "args": ["value", 10]},
            "then_steps": [{"type": "SetOutput", "value": 1}],
            "else_steps": [{"type": "SetOutput", "value": 0}],
        }
    ],
}

# value = 10 is not > 10, so the else branch runs and the output is 0
bundle = compile_sir_to_proof(sir, input_value=10)
print(bundle["hashes"]["HMASTER"])

This uses ONLY the deterministic path: SIR → RLang → ProofBundle → hashes.

No LLM calls are made in this flow.

Project Structure

  • semantic_compiler/ - Main package
    • init_llm.py - LLM initialization using langchain-openai
    • llm_bridge.py - Natural language → SIR conversion
    • sir_model.py - SIR dataclasses (to be implemented)
    • sir_validator.py - SIR validation (to be implemented)
    • sir_to_rlang.py - SIR → RLang compiler (to be implemented)
    • rlang_runtime.py - Wrapper around PyPI rlang-compiler
  • examples/ - Example inputs and outputs
  • tests/ - Test suite

Development Status

This is the initial project skeleton. Next steps:

  • Implement full SIR v0.1 dataclasses
  • Complete SIR validation
  • Implement SIR → RLang compiler


Download files


Source Distribution

semantic_compiler_core-0.1.0.tar.gz (26.8 kB)

Built Distribution


semantic_compiler_core-0.1.0-py3-none-any.whl (26.2 kB)

File details

Details for the file semantic_compiler_core-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_compiler_core-0.1.0.tar.gz
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for semantic_compiler_core-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d9e8cfaa6804d118c7b7c0bc6477893c0216c780f582049dbb2c02b82fd980d8
MD5 53ef2f9e50184f2334f9cc40d7e53e78
BLAKE2b-256 3c7448e3936ae9acf12f8fd4856a6471e685c981d4cd429cf24f0b4e574f1645


File details

Details for the file semantic_compiler_core-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for semantic_compiler_core-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f4caebaf1e6dea995f2f0d7468674e1e2467ce05ab902e1e45ad7d30805e57b
MD5 f61d538e64b306aa987b23484b69b189
BLAKE2b-256 a922e1d50e9371a8a697baedb29494c599670db126c29853b91caa8fe21b16b4

