Deterministic VM for LLM program execution

Execution governance runtime with deterministic receipts and replayable traces.
LLMs are signal generators. Execution authority belongs to the runtime.


What nano-vm produces

Most AI frameworks answer: how do you coordinate agents?
Almost nobody answers: what proof do you have that execution happened correctly?

nano-vm answers the second question by producing verifiable artifacts at every layer:

| Artifact | What it is | How it's produced |
| --- | --- | --- |
| Trace | Step-by-step execution record with SHA-256 Merkle chain | Runtime, one entry per step |
| ExecutionReceipt | Minimal decision state derived from Trace | TraceAnalyzer.receipt(), a deterministic projection |
| GovernanceEnvelope | Per-step policy hash + canonical state snapshot | Runtime, stored in SQLite WAL |

The core contract:

Receipt = f(Trace)

The receipt is a deterministic, recomputable projection of the trace. It never contains information that isn't already in the trace. It never requires the LLM to generate it.

Program
  ↓
Validator        ← static analysis before execution
  ↓
ExecutionVM      ← FSM transition authority
  ↓
Trace            ← authoritative execution record
  ↓
TraceAnalyzer    ← post-hoc interpretation
  ↓
Receipt          ← minimal state for continuation decisions
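
The Trace is described above as carrying a SHA-256 chain. A minimal sketch of how such a chain makes a record tamper-evident, assuming each entry is hashed together with the previous digest over canonical JSON. `chain_digests`, `verify_chain`, and the `GENESIS` value are hypothetical names for illustration, not nano-vm APIs, and the library's actual hashing scheme may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting digest for the first entry


def chain_digests(entries: list[dict]) -> list[str]:
    """Return one SHA-256 digest per entry, each covering the previous digest."""
    digests, prev = [], GENESIS
    for entry in entries:
        # Canonical JSON (sorted keys, fixed separators) keeps hashing deterministic.
        payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
        prev = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        digests.append(prev)
    return digests


def verify_chain(entries: list[dict], digests: list[str]) -> bool:
    """Recompute the chain and compare it with the stored digests."""
    return chain_digests(entries) == digests


steps = [
    {"step_id": "reserve", "status": "SUCCESS"},
    {"step_id": "capture", "status": "SUCCESS"},
]
stored = chain_digests(steps)

# Editing an early entry changes its digest and every digest after it.
tampered = [dict(steps[0], status="FAILED"), steps[1]]
print(verify_chain(steps, stored), verify_chain(tampered, stored))
```

Because each digest depends on its predecessor, a change to any step invalidates the rest of the chain, which is what lets a receipt be recomputed and checked later.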

Install

pip install llm-nano-vm
pip install llm-nano-vm[litellm]   # for LLM provider support

Quick Start — Tool Pipeline

LLMs are not required. nano-vm runs as a pure deterministic workflow engine.

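import asyncio

from nano_vm import ExecutionVM, Program

# Placeholder tools (assumed async, keyword-argument signature); swap in your own.
async def reserve_funds(**kwargs):
    return {"reserved": True}

async def capture_payment(**kwargs):
    return {"captured": True}

async def send_receipt(**kwargs):
    return {"sent": True}

program = Program.from_dict({
    "name": "payment_flow",
    "steps": [
        {"id": "reserve",  "type": "tool", "tool": "reserve_funds"},
        {"id": "capture",  "type": "tool", "tool": "capture_payment"},
        {"id": "receipt",  "type": "tool", "tool": "send_receipt"},
    ]
})

vm = ExecutionVM(tools={
    "reserve_funds":   reserve_funds,
    "capture_payment": capture_payment,
    "send_receipt":    send_receipt,
})

async def main():
    trace = await vm.run(program)   # vm.run is a coroutine, so it needs an event loop
    print(trace.status)  # SUCCESS

asyncio.run(main())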

The runtime still guarantees: deterministic ordering, replayable execution, trace visibility, transition enforcement, idempotent re-execution across restarts.


Quick Start — LLM Pipeline

from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
            "allowed_outputs": ["yes", "no"],   # runtime enum gate — not a prompt hint
        },
        {
            "id": "guardrail",
            "type": "condition",
            "condition": "'yes' in '$decision'",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {"id": "process_refund", "type": "tool", "tool": "issue_refund",    "is_terminal": True},
        {"id": "reject",         "type": "tool", "tool": "send_rejection",  "is_terminal": True},
    ],
})

vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": ..., "send_rejection": ...},
)

trace = await vm.run(program, context={"user_input": "I was charged twice"})
print(trace.status)           # SUCCESS
print(trace.total_cost_usd()) # e.g. 0.000034

The guardrail step cannot be skipped, reordered, or overridden by the model.


Suspend / Resume

Return "PENDING" from any tool to suspend execution:

async def initiate_payment(**kwargs) -> str:
    await register_webhook(kwargs["order_id"])
    return "PENDING"   # FSM → SUSPENDED, cursor persisted

FSM transition: RUNNING → SUSPENDED → RUNNING → SUCCESS

This enables: payment settlement, courier confirmation, approval systems, human-in-the-loop, webhook orchestration. The process can restart. The cursor survives.

trace = await vm.run(program, context={"order_id": "123"})
assert trace.status.name == "SUSPENDED"

trace = await vm.resume_with_program(
    program=program,
    trace_id=trace.trace_id,
    webhook_event={"type": "payment.confirmed", "order_id": "123"},
)
assert trace.status.name == "SUCCESS"

Note: "PENDING" is a reserved FSM sentinel. Use "REQUIRES_ACTION" or "AWAITING_3DS" for domain-specific states.


LLM Output Enforcement — allowed_outputs

Validates the model's raw output against an explicit enum before it enters the FSM context. This is a runtime gate, not a prompt hint.

{
    "id": "classify",
    "type": "llm",
    "prompt": "Classify the request. Reply ONLY with: refund / query / other",
    "output_key": "category",
    "allowed_outputs": ["refund", "query", "other"],
    "on_error": "skip",   # output → "refund" (first element) on mismatch
}

| on_error | On mismatch |
| --- | --- |
| fail (default) | VMError, trace.FAILED |
| skip | output replaced with allowed_outputs[0] |
| retry | retry up to max_retries; VMError if exhausted |
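
The fail and skip rows above can be sketched as a small gate function. This is an illustration under stated assumptions, not the library's code: `enforce_allowed_outputs` is a hypothetical helper, the local `VMError` class stands in for the runtime's error type, matching is assumed to be exact (no trimming or case folding), and the retry policy is left out because it needs a model call.

```python
class VMError(Exception):
    """Stand-in for the runtime error named in the table above."""


def enforce_allowed_outputs(raw: str, allowed: list[str], on_error: str = "fail") -> str:
    """Return a value that is guaranteed to be a member of `allowed`."""
    if not allowed:
        raise ValueError("allowed_outputs must not be empty")
    if raw in allowed:  # assumption: exact string match
        return raw
    if on_error == "skip":
        return allowed[0]  # mismatch replaced by the first allowed value
    raise VMError(f"output {raw!r} is not one of {allowed!r}")


categories = ["refund", "query", "other"]
print(enforce_allowed_outputs("query", categories))
print(enforce_allowed_outputs("I think it's a refund", categories, on_error="skip"))
```

The point of the gate is that downstream steps only ever see one of the declared values, whatever the model actually wrote.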

Per-step LLM timeout:

{
    "id": "classify",
    "type": "llm",
    "timeout_seconds": 10.0,
    "on_timeout": "fail",   # or "fallback" → allowed_outputs[0] or ''
}
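
A minimal sketch of the timeout behaviour described above, built on `asyncio.wait_for`. `call_with_timeout`, `slow_model`, and `fast_model` are hypothetical names for illustration; how the runtime surfaces a timeout under `on_timeout="fail"` is an assumption here (the sketch simply re-raises).

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_timeout(
    call: Callable[[], Awaitable[str]],
    timeout_seconds: float,
    on_timeout: str = "fail",
    allowed_outputs: Optional[list[str]] = None,
) -> str:
    """Await `call()`; on timeout either re-raise or return a fallback value."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            # Fallback is the first allowed output, or '' when no enum is set.
            return allowed_outputs[0] if allowed_outputs else ""
        raise


async def slow_model() -> str:
    await asyncio.sleep(0.2)  # simulates a provider that answers too late
    return "refund"


async def fast_model() -> str:
    return "query"


print(asyncio.run(call_with_timeout(fast_model, 1.0)))
print(asyncio.run(call_with_timeout(slow_model, 0.01, "fallback", ["other"])))
```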

ProgramValidator — Pre-flight Static Analysis

Validates a program before execution. Catches structural issues that would cause runtime failures.

from nano_vm import ProgramValidator

validator = ProgramValidator(program)
report = validator.validate()

print(report.is_valid())   # False if any ERROR-severity issue found
for issue in report.issues:
    print(issue.severity, issue.code, issue.message)

Checks performed:

| Check | Severity | Description |
| --- | --- | --- |
| missing_targets | ERROR | branch targets that reference non-existent steps |
| unreachable_steps | ERROR | steps unreachable from any execution path |
| cycle_detection | ERROR | cycles that would cause infinite loops |
| no_failure_terminal | WARNING | no reachable terminal step with a failure outcome |

is_valid() returns True only when no ERROR-severity issues exist. WARNING is informational — simple linear programs without failure terminals are valid.
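
The three ERROR checks are ordinary graph analysis. The sketch below shows one way to do them with a depth-first walk over plain step dicts. It is not ProgramValidator's implementation: `successors` and `check_program` are hypothetical helpers, and the edge rule (condition branches, then `next_step`, then fall-through to the next step in the list) is an assumption based on the step fields documented in this README.

```python
def successors(step: dict, index: int, steps: list[dict]) -> list[str]:
    """Assumed edge rule: branches for conditions, else next_step, else fall through."""
    if step.get("type") == "condition":
        return [step["then"], step["otherwise"]]
    if step.get("is_terminal"):
        return []
    if step.get("next_step"):
        return [step["next_step"]]
    return [steps[index + 1]["id"]] if index + 1 < len(steps) else []


def check_program(steps: list[dict]) -> list[tuple[str, str]]:
    """Return (check_code, detail) pairs; an empty list means no issue found."""
    ids = [s["id"] for s in steps]
    edges = {s["id"]: successors(s, i, steps) for i, s in enumerate(steps)}
    issues = [
        ("missing_targets", f"{src} -> {dst}")
        for src, dsts in edges.items()
        for dst in dsts
        if dst not in edges
    ]

    seen, on_path = set(), set()
    has_cycle = False

    def visit(node: str) -> None:
        nonlocal has_cycle
        seen.add(node)
        on_path.add(node)
        for nxt in edges.get(node, []):
            if nxt in on_path:  # back edge: the walk returned to its own path
                has_cycle = True
            elif nxt in edges and nxt not in seen:
                visit(nxt)
        on_path.discard(node)

    if ids:
        visit(ids[0])  # execution is assumed to start at the first step
    issues += [("unreachable_steps", i) for i in ids if i not in seen]
    if has_cycle:
        issues.append(("cycle_detection", "back edge found"))
    return issues


refund = [
    {"id": "analyze", "type": "llm"},
    {"id": "guardrail", "type": "condition", "then": "process_refund", "otherwise": "reject"},
    {"id": "process_refund", "type": "tool", "is_terminal": True},
    {"id": "reject", "type": "tool", "is_terminal": True},
]
print(check_program(refund))
```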


TraceAnalyzer — Post-hoc Execution Analysis

Analyzes a completed trace. Pure post-processing — no changes to the runtime state.

from nano_vm import TraceAnalyzer

analyzer = TraceAnalyzer(trace)
report = analyzer.report()     # TraceHealthReport — lazy, cached

print(report.rollback_density)          # 0.0 – 1.0
print(report.tool_churn_rate)           # 0.0 – 1.0
print(report.path_variance)             # 0.0 – 1.0
print(report.transition_entropy)        # bits
print(report.invariant_violation_rate)  # 0.0 – 1.0

Alert thresholds (informational — do not interrupt FSM):

| Metric | Alert threshold | Signal |
| --- | --- | --- |
| rollback_density | > 0.3 | excessive compensating transitions |
| tool_churn_rate | > 0.4 | unstable tool selection |
| path_variance | > 0.5 | non-deterministic branching |
| transition_entropy | > 2.5 bits | ~5+ unique transition pairs |
| invariant_violation_rate | > 0.2 | repeated constraint failures |

Alerts are warnings, not errors. The FSM is never interrupted by the analyzer.
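
To make the "bits" unit concrete, here is one plausible way to compute a transition entropy: the Shannon entropy of the distribution of consecutive step pairs. The exact definition TraceAnalyzer uses is not given in this README, so treat `transition_entropy` below as an assumed formula for illustration only.

```python
import math
from collections import Counter


def transition_entropy(step_ids: list[str]) -> float:
    """Shannon entropy, in bits, over consecutive (from, to) step pairs."""
    pairs = Counter(zip(step_ids, step_ids[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in pairs.values())


linear_run = ["a", "b", "c", "d", "e"]   # four distinct pairs, each seen once
print(transition_entropy(linear_run))     # 2.0
```

Under this definition, k equally frequent transition pairs give log2(k) bits, so a run that keeps repeating one transition scores 0 and a run that spreads over many different transitions scores higher.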


ExecutionReceipt — Receipt = f(Trace)

The receipt is a deterministic projection of the trace. It contains the minimal state needed for a continuation decision — whether to resume, replay, or escalate.

receipt = analyzer.receipt()   # lazy, cached; recomputable at any time

print(receipt.trace_id)
print(receipt.final_status)         # TraceStatus
print(receipt.resumable)            # bool
print(receipt.replayable)           # bool
print(receipt.failed_steps)         # int
print(receipt.retried_steps)        # int
print(receipt.rejected_transitions) # tuple[RejectedTransition, ...]
print(receipt.health)               # TraceHealthReport

RejectedTransition captures each failed step with its reason and timestamp:

for rt in receipt.rejected_transitions:
    print(rt.step_id, rt.reason, rt.timestamp)

Contract invariants:

  • Receipt ⊆ Trace — the receipt never contains information outside the trace
  • No LLM generation — the receipt is computed, never summarized
  • No operator-specific fields — the receipt is infrastructure, not application logic
  • Recomputable — given the same trace, receipt() always returns the same result
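
The invariants above amount to saying the receipt is a pure function of the trace. A minimal sketch of that idea, using plain dicts and a frozen dataclass: `ReceiptSketch` and `project_receipt` are hypothetical, the trace layout (`steps`, `status`, `attempts`) is assumed, and the rule for `resumable` is a guess rather than the library's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiptSketch:
    trace_id: str
    final_status: str
    resumable: bool
    failed_steps: int
    retried_steps: int


def project_receipt(trace: dict) -> ReceiptSketch:
    """Pure function of the trace: no clock, no randomness, no model call."""
    steps = trace["steps"]
    return ReceiptSketch(
        trace_id=trace["trace_id"],
        final_status=trace["status"],
        resumable=trace["status"] == "SUSPENDED",  # assumed rule
        failed_steps=sum(s["status"] == "FAILED" for s in steps),
        retried_steps=sum(s.get("attempts", 1) > 1 for s in steps),
    )


example_trace = {
    "trace_id": "t-1",
    "status": "FAILED",
    "steps": [
        {"step_id": "reserve", "status": "SUCCESS", "attempts": 2},
        {"step_id": "capture", "status": "FAILED"},
    ],
}
receipt_a = project_receipt(example_trace)
receipt_b = project_receipt(example_trace)
print(receipt_a == receipt_b)  # True: same trace, same receipt
```

Every field is read or counted from the trace, so nothing in the receipt can disagree with the record it summarizes.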

Evaluator Blindness by Design

Benchmark research across 10 frontier models found that every model changes behavior when it detects it's being monitored. The best model scored 84% on "acts the same whether watched or not."

The ProjectionLayer gives the LLM only a target=LLM projection of state. Governance metadata — GovernanceEnvelope, canonical_hash, policy_hash, retry counters — never reaches the prompt. The model cannot observe its own audit trail.

Evaluator blindness is structural, not configured.
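
One way to picture the projection: the prompt is rendered from a filtered copy of state, so governance fields cannot be interpolated even if a template asks for them. This sketch is not the ProjectionLayer itself. `project_for_llm`, `render_prompt`, and the key names in `GOVERNANCE_KEYS` are assumptions, and `string.Template` is used only because it shares the `$name` placeholder style.

```python
from string import Template

# Assumed key names for governance metadata; the real field names may differ.
GOVERNANCE_KEYS = frozenset(
    {"governance_envelope", "canonical_hash", "policy_hash", "retry_count"}
)


def project_for_llm(state: dict) -> dict:
    """Return a copy of state with governance metadata removed."""
    return {k: v for k, v in state.items() if k not in GOVERNANCE_KEYS}


def render_prompt(template: str, state: dict) -> str:
    """Fill $placeholders from the projected state only."""
    return Template(template).safe_substitute(project_for_llm(state))


state = {
    "user_input": "I was charged twice",
    "policy_hash": "abc123",
    "retry_count": 2,
}
prompt = render_prompt("Request: $user_input (policy $policy_hash)", state)
print(prompt)  # the $policy_hash placeholder stays unresolved
```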


FSM Transition Model

| Current state | Event | Next state |
| --- | --- | --- |
| RUNNING | tool success | RUNNING |
| RUNNING | tool returns "PENDING" | SUSPENDED |
| RUNNING | tool error (on_error=fail) | FAILED |
| RUNNING | tool error (on_error=skip) | RUNNING (output=None) |
| RUNNING | condition branch taken | RUNNING (jump to then/otherwise) |
| RUNNING | max_steps / max_tokens exceeded | BUDGET_EXCEEDED |
| RUNNING | max_stalled_steps exceeded | STALLED |
| RUNNING | no more steps | SUCCESS |
| SUSPENDED | resume_with_program() called | RUNNING (from cursor) |
| terminal | (any) | absorbing (no further transitions) |

Terminal states: SUCCESS, FAILED, BUDGET_EXCEEDED, STALLED.

Failure states are first-class outcomes. FAILED, NEED_HELP, INSUFFICIENT_DATA, POLICY_BLOCKED are legitimate terminal states equivalent to SUCCESS — not exceptions to be swallowed.
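
The transition table can be encoded directly as a lookup, which is a convenient way to reason about it or to test against it. The sketch below is illustrative only: the event names are invented labels for the table rows, and `transition` and `run_events` are hypothetical helpers rather than part of the nano-vm API.

```python
TERMINAL = {"SUCCESS", "FAILED", "BUDGET_EXCEEDED", "STALLED"}

# (current state, event) -> next state, mirroring the rows of the table.
TRANSITIONS = {
    ("RUNNING", "tool_success"): "RUNNING",
    ("RUNNING", "tool_pending"): "SUSPENDED",
    ("RUNNING", "tool_error_fail"): "FAILED",
    ("RUNNING", "tool_error_skip"): "RUNNING",
    ("RUNNING", "branch_taken"): "RUNNING",
    ("RUNNING", "budget_exceeded"): "BUDGET_EXCEEDED",
    ("RUNNING", "stalled"): "STALLED",
    ("RUNNING", "no_more_steps"): "SUCCESS",
    ("SUSPENDED", "resume"): "RUNNING",
}


def transition(state: str, event: str) -> str:
    """Apply one event; terminal states absorb every event."""
    if state in TERMINAL:
        return state
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None


def run_events(events: list[str], state: str = "RUNNING") -> str:
    for event in events:
        state = transition(state, event)
    return state


print(run_events(["tool_success", "tool_pending", "resume", "no_more_steps"]))  # SUCCESS
```

Anything not listed is rejected, which is the property that keeps a model's output from inventing a transition.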


Program DSL

Four step types:

| Type | Purpose |
| --- | --- |
| llm | call the model; result stored in output_key |
| tool | call a Python function; return "PENDING" to suspend |
| condition | branch on an expression; then / otherwise |
| parallel | run independent sub-steps concurrently via asyncio.gather |
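
For the parallel step type, a common way to combine `asyncio.gather` with a concurrency limit is a semaphore around each sub-step. The sketch below shows that pattern; it is an assumption about how a `max_concurrency` limit could be applied, and `run_parallel` and `demo` are hypothetical names, not the runtime's internals.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_parallel(
    calls: list[Callable[[], Awaitable]],
    max_concurrency: Optional[int] = None,
) -> list:
    """Run all calls concurrently; results come back in input order."""
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def guarded(call):
        if semaphore is None:
            return await call()
        async with semaphore:  # at most max_concurrency bodies run at once
            return await call()

    return await asyncio.gather(*(guarded(c) for c in calls))


async def demo(max_concurrency: Optional[int]):
    """Run five short jobs and report the peak number running at once."""
    active = 0
    peak = 0

    async def job(n: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return n * 2

    results = await run_parallel([lambda n=n: job(n) for n in range(5)], max_concurrency)
    return results, peak


print(asyncio.run(demo(2)))
```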

Step fields:

| Field | Default | Description |
| --- | --- | --- |
| on_error | fail | fail · skip · retry |
| max_retries | 3 | total attempts; backoff: 1s, 2s, 4s… cap 30s |
| max_concurrency | None | parallel blocks only |
| is_terminal | False | return SUCCESS after this step (leaf nodes) |
| next_step | None | jump to named step instead of returning SUCCESS |
| allowed_outputs | None | LLM-only: accepted output enum; ValidationError if empty |
| timeout_seconds | None | LLM-only: asyncio.wait_for timeout in seconds |
| on_timeout | 'fail' | 'fail' · 'fallback' (→ allowed_outputs[0] or '') |
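
The backoff schedule in the max_retries row (1s, 2s, 4s, capped at 30s) is a capped exponential. A small sketch of that arithmetic, assuming max_retries counts total attempts so that there is one fewer delay than attempts; `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays (seconds) slept between attempts: base * 2**i, never above cap."""
    # Assumption: max_retries is the total number of attempts.
    return [min(base * 2 ** i, cap) for i in range(max(max_retries - 1, 0))]


print(backoff_delays(3))  # [1.0, 2.0]
print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```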

Program budget options:

| Option | Default | Description |
| --- | --- | --- |
| max_steps | None | BUDGET_EXCEEDED if exceeded |
| max_stalled_steps | None | STALLED after N consecutive no-op fingerprints |
| max_tokens | None | BUDGET_EXCEEDED when total tokens exceed limit |
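
A sketch of what "consecutive no-op fingerprints" could mean in practice: hash the state after each step and count how many times in a row the hash does not change. `fingerprint` and `is_stalled` are hypothetical helpers, and hashing canonical JSON of the whole state is an assumption about what gets fingerprinted.

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """SHA-256 over canonical JSON of the state."""
    payload = json.dumps(state, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_stalled(states: list[dict], max_stalled_steps: int) -> bool:
    """True once max_stalled_steps consecutive steps leave the fingerprint unchanged."""
    stalled = 0
    previous = None
    for state in states:
        current = fingerprint(state)
        stalled = stalled + 1 if current == previous else 0
        if stalled >= max_stalled_steps:
            return True
        previous = current
    return False


no_progress = [{"n": 1}, {"n": 1}, {"n": 1}]
some_progress = [{"n": 1}, {"n": 2}, {"n": 2}]
print(is_stalled(no_progress, 2), is_stalled(some_progress, 2))  # True False
```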

Variable interpolation

| Syntax | Resolves to |
| --- | --- |
| $key | value from initial context (typed: int/dict/list preserved) |
| $step_id.output | output of a previous step |
| $step_id.output.field | field within a step's dict output |
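
A minimal sketch of how the three reference forms could be resolved. It assumes a single scope dict in which context values sit at the top level and each step's result is stored under `scope[step_id]["output"]`; that layout, and the helpers `lookup` and `interpolate`, are illustrative assumptions. A template that is exactly one reference returns the typed value, and anything else is rendered to a string.

```python
import re

# $name followed by optional .field segments
_REF = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def lookup(path: str, scope: dict):
    """Walk a dotted path through nested dicts."""
    value = scope
    for part in path.split("."):
        value = value[part]
    return value


def interpolate(template: str, scope: dict):
    whole = _REF.fullmatch(template)
    if whole:
        return lookup(whole.group(1), scope)  # single reference: type preserved
    return _REF.sub(lambda m: str(lookup(m.group(1), scope)), template)


scope = {"order_id": 123, "reserve": {"output": {"amount": 50}}}
print(interpolate("$order_id", scope))                                    # 123 (int)
print(interpolate("Order $order_id: $reserve.output.amount USD", scope))  # Order 123: 50 USD
```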

Condition expressions

⚠ ASTEngine replaces eval(). Conditions are parsed into a validated JSON AST and evaluated by a pure, sandboxed interpreter. No Python builtins are accessible.

Supported: ==, !=, >, <, in, not in, and, or, not, contains, dotted-path $var.field.

Not supported: method calls (.lower(), .strip()), arithmetic, and parenthesized grouping. Using an unsupported form raises ASTEvalError at parse time.

# ❌ WRONG — method call raises ASTEvalError
{"condition": "'yes' in '$decision'.lower()"}

# ✅ CORRECT
{"condition": "'yes' in '$decision'"}
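
To show the general technique of whitelist-based condition evaluation, here is a sketch built on Python's standard `ast` module. It is a stand-in, not ASTEngine: the README describes a JSON AST, whereas this version reuses Python's parser, the local `ASTEvalError` class only mirrors the name used above, and it does not reproduce every rule (for example, Python's parser accepts parentheses, and `contains` and `$var` paths are not handled).

```python
import ast
import operator


class ASTEvalError(Exception):
    """Stand-in for the error type named in the text."""


# Whitelist: comparisons, boolean logic, and literals. Everything else is rejected.
_ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Gt, ast.Lt, ast.In, ast.NotIn,
    ast.Constant,
)
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b,
}


def parse_condition(source: str) -> ast.Expression:
    """Parse and validate; calls, names, attributes, and arithmetic fail here."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError as exc:
        raise ASTEvalError(str(exc)) from exc
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ASTEvalError(f"unsupported syntax: {type(node).__name__}")
    return tree


def evaluate(node):
    """Interpret a validated tree without ever calling eval()."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp):  # only `not` survives validation
        return not evaluate(node.operand)
    if isinstance(node, ast.BoolOp):
        values = [evaluate(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = evaluate(node.left)
        for op, right_node in zip(node.ops, node.comparators):
            right = evaluate(right_node)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    raise ASTEvalError(f"unsupported node: {type(node).__name__}")


print(evaluate(parse_condition("'yes' in 'yes'")))  # True
```

Validation happens before evaluation, so a method call such as `'yes'.lower()` is rejected at parse time rather than partway through a run.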

MCP Integration

nano-vm pairs with nano-vm-mcp — an MCP gateway that exposes run_program, get_trace, list_programs, get_program, delete_program over stdio or SSE transport with bearer auth, SQLite WAL persistence, and GovernanceEnvelope audit trail.

Claude Code / MCP Client
        ↓
  nano-vm-mcp              ← decides how execution is allowed to proceed
        ↓
  ExecutionVM              ← transition authority
        ↓
  GovernanceEnvelope       ← proves it happened

GovernanceEnvelope

Each successful execution step produces a GovernanceEnvelope stored in SQLite WAL:

| Field | Description |
| --- | --- |
| execution_id | Session / trace identifier |
| step_id | Step index within the execution |
| policy_hash | SHA-256 of the active PolicySnapshot |
| canonical_snapshot_hash | Merkle/delta hash of CanonicalState |
| payload | Projected (sanitized) step output |

CapabilityRef and GDPR Tombstoning

Sensitive values are stored as CapabilityRef tokens (vault://secret/<id>). On a GDPR erasure event, the ref is tombstoned. All subsequent projections return [REDACTED_TOMBSTONE], preserving the hash chain. Forensic auditability survives erasure.
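
Why erasure does not break the hash chain can be shown in a few lines: records and their hashes hold only the reference token, never the secret, so deleting the secret leaves every stored hash unchanged. `VaultSketch` and `record_hash` are hypothetical illustrations of that idea, not the CapabilityRef implementation.

```python
import hashlib

TOMBSTONE = "[REDACTED_TOMBSTONE]"


class VaultSketch:
    """Toy secret store keyed by vault://secret/<id> references."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}
        self._tombstoned: set[str] = set()

    def put(self, secret_id: str, value: str) -> str:
        self._secrets[secret_id] = value
        return f"vault://secret/{secret_id}"

    def erase(self, ref: str) -> None:
        secret_id = ref.rsplit("/", 1)[-1]
        self._secrets.pop(secret_id, None)  # the value itself is deleted
        self._tombstoned.add(secret_id)     # the reference stays resolvable

    def project(self, ref: str) -> str:
        secret_id = ref.rsplit("/", 1)[-1]
        if secret_id in self._tombstoned:
            return TOMBSTONE
        return self._secrets[secret_id]


def record_hash(record: dict) -> str:
    """Hash a record that stores the ref token, not the secret value."""
    return hashlib.sha256(repr(sorted(record.items())).encode("utf-8")).hexdigest()


vault = VaultSketch()
ref = vault.put("42", "alice@example.com")
record = {"step": "notify", "email": ref}
hash_before = record_hash(record)

vault.erase(ref)
print(vault.project(ref))                   # [REDACTED_TOMBSTONE]
print(record_hash(record) == hash_before)   # True: the chain input never changed
```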


Observability

trace.trace_id              # UUID4 — stable for OTel propagation
trace.status                # TraceStatus.SUCCESS | FAILED | SUSPENDED | BUDGET_EXCEEDED | STALLED
trace.final_output
trace.total_tokens()        # O(1) incremental accumulator
trace.total_cost_usd()      # requires LiteLLMAdapter
trace.state_snapshots       # list[(step_index, sha256_hex)]

for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)

Testing — Deterministic by Design

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

vm = ExecutionVM(llm=MockLLMAdapter("yes"))   # always returns "yes"

# Per-call sequence
vm = ExecutionVM(llm=MockLLMAdapter(["SAFE", "yes"]))

# Per-prompt substring mapping
vm = ExecutionVM(llm=MockLLMAdapter({
    "Classify": "SAFE",
    "eligible": "yes",
    "__default__": "ok",
}))

trace = await vm.run(program, context={"user_input": "refund"})
assert trace.status == TraceStatus.SUCCESS

Same input → same step sequence. No API key required.

State determinism vs semantic determinism: nano-vm guarantees step execution order, no skipped steps, and a reproducible trace structure, regardless of LLM output. It does not guarantee that LLM text is identical across runs. When a test needs both, use MockLLMAdapter so the model output is fixed as well.


Performance

| Suite | Result |
| --- | --- |
| FSM invariant suite | 13/13 · 1,020,000 ops · 0 violations |
| Integration suite | 10/10 · 1,096,500 ops · 0 violations |
| 10k stress | 14,327 graphs/sec · 0.70 s/run |
| MoMo PoC v4 | 9/9 PASS |
| Stripe PoC v1 | 9/9 PASS |

| ID | Scenario | Mean TPS | p95 avg |
| --- | --- | --- | --- |
| BM-INT-01 | Refund pipeline | 2,300/s | 0.66 ms |
| BM-INT-02 | Double-execution guard | 2,400/s | 0.67 ms |
| BM-INT-03 | Budget enforcement | 1,100/s | 331 ms |
| BM-INT-04 | Parallel throughput | 436/s | 542 ms |
| BM-INT-05 | MCP store round-trip | 3,000/s | 0.42 ms |
| BM-INT-06 | GovernanceEnvelope | 1,300/s | 171 ms |
| BM-INT-07 | Crash consistency | 7/s | 233 ms |
| BM-INT-08 | Replay equivalence | 1,300/s | 1.30 ms |
| BM-INT-09 | Adversarial retries | 2,400/s | 0.64 ms |
| BM-INT-10 | Long-horizon | 30/s | 3,606 ms |

Environment: QEMU/KVM · Intel Xeon E5-2697A v4 · 2 cores · Python 3.12 · Mock adapter.


Comparison

LangChain CrewAI Temporal nano-vm
LLM-native
Deterministic execution
Replayable traces partial minimal
Deterministic receipt
Suspend/resume partial partial
Runtime guardrails partial
LLM output enforcement
Pre-flight validator
Evaluator blindness
Lightweight / embedded

vs Marvin / DSPy: those optimize what the LLM produces. nano-vm controls when and whether steps run — orthogonal concerns, composable.

vs Temporal / Cadence: Temporal solves durable execution for distributed systems. nano-vm solves governed execution for LLM workflows — embedded, no infrastructure, Python-native.


When to Use

Use nano-vm when:

  • workflow structure is known in advance
  • correctness and auditability matter (fintech, compliance, enterprise)
  • you need a reproducible trace for debugging or audit
  • guardrails must be enforced at the system level, not in the prompt
  • async orchestration with suspend/resume is required
  • you need a deterministic receipt to prove what happened

Do not use when:

  • workflow must be discovered fully at runtime
  • the task is open-ended creative reasoning
  • fully autonomous multi-agent coordination is required

Roadmap

Done:

  • Deterministic FSM runtime (v0.1)
  • parallel steps — asyncio.gather (v0.2.0)
  • retry policy + max_concurrency (v0.3.0)
  • Budget guards: max_steps, max_stalled_steps, max_tokens (v0.4.0)
  • state_snapshots — sha256 fingerprint per step (v0.4.0)
  • Planner — intent → Program in 1 LLM call (v0.5.0)
  • FSM invariant stress suite — 13/13 · 1,020,000 ops (v0.6.0)
  • suspend / resume: "PENDING" sentinel + CursorRepository (v0.7.0)
  • BudgetInterrupt + _emit_interrupt() hook (v0.7.0)
  • Trace.trace_id — UUID4, OTel-ready (v0.7.0)
  • erase() — GDPR tombstoning with hash-chain preservation (v0.7.0)
  • ASTEngine: eval() removed; sandboxed condition evaluator (v0.7.0)
  • Integration benchmark suite — 10/10 · 1,096,500 ops (v0.7.3)
  • Step.is_terminal, Step.next_step — branch semantics (v0.7.4)
  • ASTEngine METHOD_CALL guard — ASTEvalError at parse time (v0.7.5)
  • py.typed marker — PEP 561 (v0.7.4)
  • MCP server — nano-vm-mcp with GovernanceEnvelope, CapabilityRef, SSE + stdio
  • Step.allowed_outputs — LLM output validation against enum (v0.8.0)
  • Step.timeout_seconds + on_timeout — per-step LLM timeout (v0.8.0)
  • inspect.iscoroutinefunction — Python 3.14 deprecation fix (v0.8.2)
  • TraceAnalyzer — rollback density, tool churn rate, path variance, transition entropy, invariant violation rate (v0.8.3)
  • LLMAdapter.complete() -> str | tuple[str, dict | None] Protocol (v0.8.4)
  • ProgramValidator — missing targets, unreachable steps, cycle detection, no_failure_terminal (v0.8.5)
  • ExecutionReceipt + RejectedTransition: Receipt = f(Trace) contract (v0.8.5)

Upcoming:

  • OpenTelemetry span per FSM step
  • nano-vm-mcp: GovernedToolExecutor circuit breaker
  • vm.step() incremental execution endpoint
  • depends_on + TopologicalSorter — declarative dependency graph

Contact & Support

Author: @ale007xd on Telegram · @ale007xd on X

USDT (TON): UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9


License

MIT License.
