llm-nano-vm

Deterministic VM for LLM program execution

Deterministic execution runtime for stateful workflows.
Replayable. Observable. Enforcement-first.
LLM support is optional.

Temporal-lite for deterministic AI and business process execution.

What nano-vm Is

nano-vm is a deterministic execution runtime built around finite-state-machine semantics.

It orchestrates financial workflows, webhook-driven async processes, approval pipelines, event-driven automation, retry-safe orchestration, LLM pipelines, and governance-bound execution graphs.

The runtime — not the model, not the tool, not the prompt — controls state transitions.

Core invariant:

δ(S, E) → S'

Where S is current execution state, E is a validated event, S' is the next deterministic state.

Why nano-vm Exists

Most workflow engines optimize scheduling. Most AI frameworks optimize prompting. nano-vm optimizes execution correctness.

The system guarantees: deterministic transitions, replayable traces, exactly-once execution invariants, resumable async workflows, explicit governance boundaries, and runtime-level enforcement.

The FSM runtime is the source of truth. LLMs are optional.

Mental Model

events / webhooks / tools / LLMs
              ↓
        ExecutionVM
              ↓
        deterministic FSM
              ↓
        replayable trace

Formally:

nondeterminism ∈ signal generation
determinism    ∈ runtime execution

Core Execution Pipeline

E  = Signal(input)      → raw event
E' = Validator(E)       → validated event
A  = FSM(S, E')         → allowed transitions
a* = Policy(A, C)       → selected transition
S' = δ(S, a*)           → next state

Layer	Role	Deterministic
Signal	LLM / webhook / API / user input	❌
Validator	schema + policy validation	✅
FSM	transition authority	✅
Policy	transition selection	✅
Tool executor	side effects	enforced
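The five stages above can be sketched in plain Python. This is a toy illustration for a hypothetical order workflow, not nano-vm's implementation: the names `TRANSITIONS`, `validate`, `allowed`, `policy`, and `step` are invented here, and the choice to stay in the current state when no transition matches is an assumption.

```python
from typing import Optional

# Hypothetical transition table: state -> {event type -> next state}.
TRANSITIONS = {
    "NEW": {"pay": "PAID", "cancel": "CANCELLED"},
    "PAID": {"ship": "SHIPPED"},
    "SHIPPED": {},
    "CANCELLED": {},
}


def validate(raw: dict) -> Optional[str]:
    """Validator: accept only events whose `type` is a known string."""
    event_type = raw.get("type")
    known = {name for edges in TRANSITIONS.values() for name in edges}
    if isinstance(event_type, str) and event_type in known:
        return event_type
    return None


def allowed(state: str) -> dict:
    """FSM: the transitions permitted from the current state."""
    return TRANSITIONS[state]


def policy(candidates: dict, event_type: Optional[str]) -> Optional[str]:
    """Policy: pick the transition matching the validated event, if any."""
    if event_type is None:
        return None
    return candidates.get(event_type)


def step(state: str, raw_event: dict) -> str:
    """delta(S, a*): move to the selected state, or stay put if nothing is allowed."""
    chosen = policy(allowed(state), validate(raw_event))
    return chosen if chosen is not None else state
```

The signal (`raw_event`) can be anything, including model output; only the validated, FSM-permitted transition changes state.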

Using nano-vm Without LLMs

LLMs are not required. nano-vm can operate as a pure deterministic workflow engine.

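import asyncio

from nano_vm import ExecutionVM, Program

# reserve_funds, capture_payment and send_receipt are your own tool callables,
# defined elsewhere in your application.
program = Program.from_dict({
    "name": "payment_flow",
    "steps": [
        {"id": "reserve",  "type": "tool", "tool": "reserve_funds"},
        {"id": "capture",  "type": "tool", "tool": "capture_payment"},
        {"id": "receipt",  "type": "tool", "tool": "send_receipt"},
    ]
})

vm = ExecutionVM(tools={
    "reserve_funds":   reserve_funds,
    "capture_payment": capture_payment,
    "send_receipt":    send_receipt,
})


async def main() -> None:
    # `await` is only valid inside a coroutine; later snippets omit this wrapper for brevity.
    trace = await vm.run(program)
    print(trace.status)  # SUCCESS


asyncio.run(main())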

No LLM. The runtime still guarantees: deterministic ordering, replayable execution, trace visibility, transition enforcement, exactly-once semantics.

Suspend / Resume — Async Business Processes

Return the sentinel "PENDING" from any tool to suspend execution:

async def initiate_payment(**kwargs) -> str:
    await register_webhook(kwargs["order_id"])
    return "PENDING"   # FSM → SUSPENDED, cursor persisted

FSM transition: RUNNING → SUSPENDED → RUNNING → SUCCESS

This enables: payment settlement, courier confirmation, approval systems, human-in-the-loop, webhook orchestration.

from nano_vm.vm import ExecutionVM

vm = ExecutionVM(
    tools={"initiate_payment": initiate_payment, "finalize_order": finalize_order},
)

trace = await vm.run(program, context={"order_id": "123"})
assert trace.status.name == "SUSPENDED"

trace = await vm.resume_with_program(
    program=program,
    trace_id=trace.trace_id,
    webhook_event={"type": "payment.confirmed", "order_id": "123"},
)
assert trace.status.name == "SUCCESS"

Note: "PENDING" is a reserved FSM sentinel. Do not return it as a domain status. Use "REQUIRES_ACTION", "AWAITING_3DS", or any other string for domain-specific states.
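To make the suspend/resume flow concrete, here is a minimal stand-alone sketch of a cursor-based runner. It is not nano-vm's code: `run_from` and the demo tools are hypothetical, and it assumes that resuming continues with the step after the one that returned "PENDING".

```python
import asyncio
from typing import Awaitable, Callable

Tool = Callable[[dict], Awaitable[str]]


async def run_from(steps: list[Tool], context: dict, cursor: int = 0) -> tuple[str, int]:
    """Run steps from `cursor`; stop and report the next cursor when a tool returns "PENDING"."""
    for index in range(cursor, len(steps)):
        result = await steps[index](context)
        if result == "PENDING":
            # Assumption: the suspended step is not executed again on resume.
            return "SUSPENDED", index + 1
    return "SUCCESS", len(steps)


calls: list[str] = []  # records which tools actually ran


async def initiate_payment(context: dict) -> str:
    calls.append("initiate_payment")
    return "PENDING"


async def finalize_order(context: dict) -> str:
    calls.append("finalize_order")
    return "ok"


async def demo() -> list[tuple[str, int]]:
    steps = [initiate_payment, finalize_order]
    first = await run_from(steps, {"order_id": "123"})
    # Later, when the webhook arrives, continue from the persisted cursor.
    second = await run_from(steps, {"order_id": "123"}, cursor=first[1])
    return [first, second]
```

Because the cursor is the only thing carried between the two calls, each tool runs once even though the workflow spans two invocations.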

FSM Transition Model

ExecutionVM is a deterministic finite state machine.

Current state	Event	Next state
`RUNNING`	tool success	`RUNNING`
`RUNNING`	tool returns `"PENDING"`	`SUSPENDED`
`RUNNING`	tool error (`on_error=fail`)	`FAILED`
`RUNNING`	tool error (`on_error=skip`)	`RUNNING` (output=`None`)
`RUNNING`	condition branch taken	`RUNNING` (jump to `then`/`otherwise`)
`RUNNING`	`max_steps` / `max_tokens` exceeded	`BUDGET_EXCEEDED`
`RUNNING`	`max_stalled_steps` exceeded	`STALLED`
`RUNNING`	no more steps	`SUCCESS`
`SUSPENDED`	`resume_with_program()` called	`RUNNING` (from cursor)
terminal	—	absorbing (no further transitions)

Terminal states: SUCCESS, FAILED, BUDGET_EXCEEDED, STALLED.
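The table can be read as a lookup from (state, event) to next state. The sketch below encodes it that way; the event labels such as `tool_pending` are invented for illustration, and raising on an unlisted pair is an assumption rather than documented behaviour.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    BUDGET_EXCEEDED = "BUDGET_EXCEEDED"
    STALLED = "STALLED"


TERMINAL = {Status.SUCCESS, Status.FAILED, Status.BUDGET_EXCEEDED, Status.STALLED}

# (current status, event label) -> next status. Labels are hypothetical.
TABLE = {
    (Status.RUNNING, "tool_success"): Status.RUNNING,
    (Status.RUNNING, "tool_pending"): Status.SUSPENDED,
    (Status.RUNNING, "tool_error_fail"): Status.FAILED,
    (Status.RUNNING, "tool_error_skip"): Status.RUNNING,
    (Status.RUNNING, "branch_taken"): Status.RUNNING,
    (Status.RUNNING, "budget_exceeded"): Status.BUDGET_EXCEEDED,
    (Status.RUNNING, "stalled"): Status.STALLED,
    (Status.RUNNING, "no_more_steps"): Status.SUCCESS,
    (Status.SUSPENDED, "resume"): Status.RUNNING,
}


def delta(status: Status, event: str) -> Status:
    """Return the next status; terminal states absorb every event."""
    if status in TERMINAL:
        return status
    try:
        return TABLE[(status, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {status.name} on {event!r}") from None


def is_illegal(status: Status, event: str) -> bool:
    """Helper for the demo: True when the pair is not in the table."""
    try:
        delta(status, event)
    except ValueError:
        return True
    return False
```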

Install

pip install llm-nano-vm
pip install llm-nano-vm[litellm]   # for LLM provider support

Quick Start — LLM Pipeline (guardrail that never skips)

from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
            "allowed_outputs": ["yes", "no"],   # v0.8.0 — enum guard
        },
        {
            "id": "guardrail",
            "type": "condition",
            "condition": "'yes' in '$decision'",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {"id": "process_refund", "type": "tool", "tool": "issue_refund",    "is_terminal": True},
        {"id": "reject",         "type": "tool", "tool": "send_rejection",  "is_terminal": True},
    ],
})

vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": ..., "send_rejection": ...},
)

trace = await vm.run(program, context={"user_input": "I was charged twice"})
print(trace.status)           # SUCCESS
print(trace.total_cost_usd()) # e.g. 0.000034

The guardrail step cannot be skipped, reordered, or overridden by the model.

Program DSL

Four step types:

Type	Purpose
`llm`	call the model; result stored in `output_key`
`tool`	call a Python function; return `"PENDING"` to suspend
`condition`	branch on an expression; `then` / `otherwise`
`parallel`	run independent sub-steps concurrently via `asyncio.gather`

Step fields (v0.8.0):

Field	Default	Description
`on_error`	`fail`	`fail` · `skip` · `retry`
`max_retries`	`3`	total attempts; backoff: 1s, 2s, 4s… cap 30s
`max_concurrency`	`None`	parallel blocks only
`is_terminal`	`False`	return `SUCCESS` after this step (leaf nodes)
`next_step`	`None`	jump to named step instead of returning `SUCCESS`
`allowed_outputs`	`None`	LLM-only — accepted output enum; `ValidationError` if empty
`timeout_seconds`	`None`	LLM-only — `asyncio.wait_for` timeout in seconds
`on_timeout`	`fail`	`fail` · `fallback` (→ `allowed_outputs[0]` or `''`)
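The `max_retries` backoff (1s, 2s, 4s… cap 30s) reads as a capped exponential schedule. A minimal sketch follows; `backoff_delays` is a hypothetical helper, and it assumes the delay is applied between attempts, so N total attempts produce N-1 sleeps.

```python
def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays (in seconds) slept between attempts, doubling each time up to `cap`."""
    sleeps = max(max_retries - 1, 0)  # no sleep after the final attempt
    return [min(base * 2 ** i, cap) for i in range(sleeps)]
```

Under these assumptions the default `max_retries=3` waits 1 s and then 2 s, and the cap starts to apply from the sixth delay onward.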

Program budget options (v0.4.0+):

Option	Default	Description
`max_steps`	`None`	`BUDGET_EXCEEDED` if exceeded
`max_stalled_steps`	`None`	`STALLED` after N consecutive no-op fingerprints
`max_tokens`	`None`	`BUDGET_EXCEEDED` when total tokens exceed limit
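One way to picture these three guards is a check that runs against counters before each step. The sketch below is an illustration only: `budget_status` and `trailing_noops` are invented names, and it assumes a "no-op" means a step whose state fingerprint equals that of the step before it.

```python
from typing import Optional, Sequence


def trailing_noops(fingerprints: Sequence[str]) -> int:
    """Count trailing steps whose fingerprint equals the previous step's fingerprint."""
    count = 0
    for i in range(len(fingerprints) - 1, 0, -1):
        if fingerprints[i] != fingerprints[i - 1]:
            break
        count += 1
    return count


def budget_status(
    steps_done: int,
    total_tokens: int,
    fingerprints: Sequence[str],
    *,
    max_steps: Optional[int] = None,
    max_tokens: Optional[int] = None,
    max_stalled_steps: Optional[int] = None,
) -> Optional[str]:
    """Return the terminal status a guard would trigger, or None to keep running."""
    if max_steps is not None and steps_done > max_steps:
        return "BUDGET_EXCEEDED"
    if max_tokens is not None and total_tokens > max_tokens:
        return "BUDGET_EXCEEDED"
    if max_stalled_steps is not None and trailing_noops(fingerprints) >= max_stalled_steps:
        return "STALLED"
    return None
```

Leaving every option at `None` disables all guards, which matches the defaults in the table.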

Variable interpolation

Syntax	Resolves to
`$key`	value from initial context (typed — int/dict/list preserved)
`$step_id.output`	output of a previous step
`$step_id.output.field`	field within a step's dict output
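The three forms can be illustrated with a small resolver. This is a sketch, not the library's `_resolve`: `resolve` and `lookup` are hypothetical, and it assumes step outputs live in the same scope dict under `step_id -> {"output": ...}`. A template that is exactly one reference keeps its type; anything else is substituted as text.

```python
import re
from typing import Any

# "$name" optionally followed by ".segment" parts; a trailing "." is not consumed.
_REF = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def lookup(path: str, scope: dict) -> Any:
    """Walk a dotted path through nested dicts."""
    value: Any = scope
    for part in path.split("."):
        value = value[part]
    return value


def resolve(template: str, scope: dict) -> Any:
    """Resolve references; a lone reference returns the typed value, not a string."""
    whole = _REF.fullmatch(template)
    if whole:
        return lookup(whole.group(1), scope)
    return _REF.sub(lambda m: str(lookup(m.group(1), scope)), template)


# Hypothetical scope: initial context plus one completed step named "fetch".
scope = {"limit": 5, "city": "Hanoi", "fetch": {"output": {"temp": 31}}}
```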

Condition expressions — Security

⚠ ASTEngine replaces eval(). Conditions are parsed into a validated JSON AST and evaluated by a pure, sandboxed interpreter. No Python builtins are accessible.

Supported operators: ==, !=, >, <, in, not in, and, or, not, contains, dotted-path $var.field.

Not supported: method calls (.lower(), .strip(), etc.), arithmetic, parentheses grouping. Using an unsupported form raises ASTEvalError at parse time with an explicit message (v0.7.5+).

Rules for safe use:

Condition logic must be authored by you, not generated from untrusted input at runtime.
LLM output may appear as a value being tested ('yes' in '$decision'), never as the condition expression itself.
If you need case-insensitive matching, control the LLM output format via the prompt (Reply ONLY with: yes or no) rather than calling .lower() in the condition.
Use allowed_outputs (v0.8.0) to enforce exact output values at the step level before the condition is evaluated.

# ❌ WRONG — method call raises ASTEvalError (v0.7.5+)
{"condition": "'yes' in '$decision'.lower()"}

# ✅ CORRECT — pure value comparison
{"condition": "'yes' in '$decision'"}

# ✅ BEST (v0.8.0) — enum guard + condition
{
    "id": "analyze", "type": "llm", "prompt": "...",
    "allowed_outputs": ["yes", "no"],
    "on_error": "skip",          # safe default on unexpected output
}
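To show why a whitelisting evaluator is safer than eval(), here is a small sketch built on Python's standard `ast` module. It is a different technique from the JSON AST that ASTEngine uses, and its grammar is not identical (Python's parser accepts parentheses, for example). The names `evaluate`, `ConditionError`, and `is_rejected` are invented, and the input is assumed to be a condition after `$variable` interpolation.

```python
import ast
import operator

# Only these comparison operators are accepted.
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


class ConditionError(ValueError):
    """Raised for any syntax outside the whitelist."""


def _eval(node: ast.AST):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand)
    if isinstance(node, ast.BoolOp):
        if isinstance(node.op, ast.And):
            return all(_eval(v) for v in node.values)
        return any(_eval(v) for v in node.values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left)
        for op, right_node in zip(node.ops, node.comparators):
            fn = _COMPARE.get(type(op))
            if fn is None:
                raise ConditionError(f"unsupported operator: {type(op).__name__}")
            right = _eval(right_node)
            if not fn(left, right):
                return False
            left = right
        return True
    # Calls, attribute access, arithmetic, names, etc. all land here.
    raise ConditionError(f"unsupported syntax: {type(node).__name__}")


def evaluate(expression: str) -> bool:
    """Evaluate a condition using only whitelisted node types; never calls eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ConditionError(str(exc)) from exc
    return bool(_eval(tree.body))


def is_rejected(expression: str) -> bool:
    """Demo helper: True when the expression is refused."""
    try:
        evaluate(expression)
    except ConditionError:
        return True
    return False
```

Because every node type is checked explicitly, a method call such as `.lower()` or a builtin such as `__import__` is refused rather than executed.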

LLM Output Validation — `allowed_outputs` (v0.8.0)

Validates the model's raw output string against an explicit enum at the step level.

{
    "id": "classify",
    "type": "llm",
    "prompt": "Classify the request. Reply ONLY with: refund / query / other",
    "output_key": "category",
    "allowed_outputs": ["refund", "query", "other"],
    "on_error": "skip",          # output → "refund" (first element) on mismatch
}

Behaviour by on_error:

`on_error`	On mismatch
`fail` (default)	`VMError` → `trace.FAILED`
`skip`	output replaced with `allowed_outputs[0]`
`retry`	retry up to `max_retries`; `VMError` if exhausted

Constraints: non-empty list required; llm steps only.
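The three `on_error` behaviours can be sketched as a guard around a generator function. This is illustrative only: `run_with_guard` and `OutputMismatch` are hypothetical names (the library raises `VMError`), and the sketch assumes an exact string match with no whitespace or case normalization.

```python
from typing import Callable, Sequence


class OutputMismatch(Exception):
    """Stand-in for the error raised when no allowed value was produced."""


def run_with_guard(
    generate: Callable[[], str],
    allowed_outputs: Sequence[str],
    on_error: str = "fail",
    max_retries: int = 3,
) -> str:
    """Call `generate` and accept its output only if it is in `allowed_outputs`."""
    if not allowed_outputs:
        raise ValueError("allowed_outputs must be a non-empty list")
    attempts = max(max_retries, 1) if on_error == "retry" else 1
    last = ""
    for _ in range(attempts):
        last = generate()
        if last in allowed_outputs:
            return last
    if on_error == "skip":
        return allowed_outputs[0]  # fall back to the first allowed value
    raise OutputMismatch(f"{last!r} not in {list(allowed_outputs)}")


def mismatch_raised(generate: Callable[[], str], allowed: Sequence[str], on_error: str) -> bool:
    """Demo helper: True when the guard raises OutputMismatch."""
    try:
        run_with_guard(generate, allowed, on_error)
    except OutputMismatch:
        return True
    return False


CATEGORIES = ["refund", "query", "other"]
flaky = iter(["maybe", "refund"])  # first answer is invalid, second is valid
```

Since `skip` silently substitutes `allowed_outputs[0]`, ordering the list so the first element is the safest choice matters.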

LLM Step Timeout — `timeout_seconds` (v0.8.0)

Prevents a hung LLM call from stalling the entire FSM.

{
    "id": "analyze",
    "type": "llm",
    "prompt": "...",
    "allowed_outputs": ["approve", "reject"],
    "timeout_seconds": 5.0,
    "on_timeout": "fallback",    # → "approve" (allowed_outputs[0])
}

on_timeout=fail (default) transitions to FAILED. on_timeout=fallback substitutes allowed_outputs[0] if set, otherwise ''.
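The behaviour described above maps onto `asyncio.wait_for` fairly directly. The following is a stand-alone sketch, not the library's code: `call_with_timeout`, `slow_model`, and `fast_model` are hypothetical, and re-raising the timeout stands in for the transition to FAILED.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def call_with_timeout(
    call: Callable[[], Awaitable[str]],
    timeout_seconds: float,
    on_timeout: str = "fail",
    allowed_outputs: Optional[Sequence[str]] = None,
) -> str:
    """Await `call()` with a deadline; on timeout either re-raise or substitute a fallback."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            return allowed_outputs[0] if allowed_outputs else ""
        raise


async def slow_model() -> str:
    await asyncio.sleep(1.0)  # simulates a hung provider call
    return "reject"


async def fast_model() -> str:
    return "reject"


def timed_out(call: Callable[[], Awaitable[str]], timeout_seconds: float) -> bool:
    """Demo helper: True when the default on_timeout='fail' path raises."""
    try:
        asyncio.run(call_with_timeout(call, timeout_seconds))
    except asyncio.TimeoutError:
        return True
    return False
```

As with `on_error: skip`, the fallback is the first element of `allowed_outputs`, so a list like `["approve", "reject"]` approves on timeout; reorder it if rejecting is the safer default.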

Branch Semantics (v0.7.4)

Condition branch targets are terminal by default. Use next_step for inline continuation:

# Branch target is terminal — FSM returns SUCCESS after notify_success
{"id": "notify_success", "type": "tool", "tool": "send_email", "is_terminal": True}

# Branch target continues to poll_payment
{"id": "create_payment", "type": "tool", "tool": "create_payment_intent",
 "next_step": "poll_payment"}

Terminal leaf steps (notify_*, reject_*, alert_*) must be placed before any inline chain steps in the flat steps array and marked is_terminal: true.

Parallel Execution

{
    "id": "fetch",
    "type": "parallel",
    "max_concurrency": 5,
    "on_error": "skip",
    "parallel_steps": [
        {"id": "weather", "type": "tool", "tool": "get_weather", "args": {"city": "$city"}},
        {"id": "news",    "type": "tool", "tool": "get_news",    "args": {"topic": "$topic"}},
    ],
}

Wall-clock time is roughly that of the slowest sub-step, provided max_concurrency does not throttle the batch. Partial results: a failed sub-step with on_error: skip yields None instead of raising an exception.
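A minimal sketch of this pattern with `asyncio.gather` and a semaphore is shown below. `run_parallel`, `get_weather`, and `get_news` are hypothetical stand-ins, and the sketch hard-codes the `on_error: skip` behaviour.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def run_parallel(
    tasks: dict[str, Callable[[], Awaitable[Any]]],
    max_concurrency: Optional[int] = None,
) -> dict[str, Any]:
    """Run sub-steps concurrently; a failing sub-step yields None instead of raising."""
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def guarded(fn: Callable[[], Awaitable[Any]]) -> Any:
        try:
            if semaphore is None:
                return await fn()
            async with semaphore:  # at most max_concurrency sub-steps in flight
                return await fn()
        except Exception:
            return None  # skip semantics: swallow the error, keep the batch alive

    results = await asyncio.gather(*(guarded(fn) for fn in tasks.values()))
    return dict(zip(tasks.keys(), results))


async def get_weather() -> str:
    await asyncio.sleep(0.01)
    return "sunny"


async def get_news() -> str:
    raise RuntimeError("feed unavailable")
```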

Observability

trace.trace_id              # UUID4 — stable for OTel propagation
trace.status                # TraceStatus.SUCCESS | FAILED | SUSPENDED | BUDGET_EXCEEDED | STALLED
trace.final_output
trace.total_tokens()        # O(1) incremental accumulator
trace.total_cost_usd()      # requires LiteLLMAdapter
trace.state_snapshots       # list[(step_index, sha256_hex)]

for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)
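The `state_snapshots` entries pair a step index with a SHA-256 hex digest. One plausible way to derive such a fingerprint is to hash a canonical JSON encoding of the state; the sketch below assumes that approach, and `fingerprint` is a hypothetical helper rather than the library's function.

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not affect the hash."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs over the same logical states produce identical snapshot lists.
run_a = [(i, fingerprint(s)) for i, s in enumerate([{"a": 1, "b": 2}, {"a": 1, "b": 3}])]
run_b = [(i, fingerprint(s)) for i, s in enumerate([{"b": 2, "a": 1}, {"b": 3, "a": 1}])]
```

Comparing two snapshot lists element by element then pinpoints the first step at which a replay diverged.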

Budget Interrupts

from nano_vm.vm import ExecutionVM, InterruptType

class InstrumentedVM(ExecutionVM):
    async def _emit_interrupt(self, interrupt_type: InterruptType) -> None:
        await notify_operator(f"interrupt: {interrupt_type.value}")

Budget exhaustion (BudgetInterrupt) fires before the next step executes. The LLM cannot observe or influence it.

Testing — Deterministic by Design

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

vm = ExecutionVM(llm=MockLLMAdapter("yes"))   # always returns "yes"

# Per-call sequence
vm = ExecutionVM(llm=MockLLMAdapter(["SAFE", "yes"]))

# Per-prompt substring mapping
vm = ExecutionVM(llm=MockLLMAdapter({
    "Classify": "SAFE",
    "eligible": "yes",
    "__default__": "ok",
}))

trace = await vm.run(program, context={"user_input": "refund"})
assert trace.status == TraceStatus.SUCCESS

Same input → same step sequence. No API key required.

State Determinism vs Semantic Determinism

nano-vm guarantees State Determinism (step execution order, no skipping, reproducible trace structure) regardless of LLM output. It does not guarantee Semantic Determinism: LLM text may differ across runs even at temperature=0.0. When a test needs both, use MockLLMAdapter, which returns fixed outputs.

Planner (Optional)

from nano_vm import Planner

planner = Planner(llm=adapter, max_retries=2, temperature=0.0)
program = await planner.generate(
    "Fetch latest AI news, summarize, classify by topic",
    available_tools=["fetch_rss", "summarize", "classify"],
    context_keys=["user_id"],
)
trace = await vm.run(program)

Exactly 1 LLM call → validated Program. Planner output is probabilistic; execution remains deterministic. Review Planner-generated programs before deploying to production.

MCP Integration

nano-vm pairs with nano-vm-mcp — an MCP server that exposes run_program, get_trace, list_programs, get_program, delete_program over stdio or SSE transport with bearer auth and SQLite WAL persistence.

Architecture

MCP Client
  → nano-vm-mcp (Gateway)
      → GovernedRunProgramHandler   ← PolicySnapshot, CapabilityRef
          → llm-nano-vm (Kernel)    ← deterministic FSM, ASTEngine, ProjectionLayer
      → GovernanceEnvelope store    ← SQLite WAL, append-only audit log

GovernanceEnvelope

Each successful execution step produces a GovernanceEnvelope (frozen Pydantic model) stored in the governance_envelopes table:

Field	Description
`execution_id`	Session / trace identifier
`step_id`	Step index within the execution
`policy_hash`	SHA-256 of the active `PolicySnapshot`
`canonical_snapshot_hash`	Merkle/delta hash of `CanonicalState`
`payload`	Projected (sanitized) step output

CapabilityRef and GDPR Tombstoning

Sensitive values are stored as CapabilityRef tokens (vault://secret/<id>). On a GDPR erasure event, the ref is tombstoned (is_tombstone=True). All subsequent projections return [REDACTED_TOMBSTONE], preserving the hash chain.
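One way tombstoning can leave a hash chain intact is for the chain to hash the reference token rather than the secret itself. The sketch below illustrates that idea only; it is an assumption about the mechanism, and `Vault`, `chain_hash`, and this `CapabilityRef` dataclass are hypothetical stand-ins, not nano-vm-mcp classes.

```python
import hashlib
from dataclasses import dataclass

REDACTED = "[REDACTED_TOMBSTONE]"


@dataclass
class CapabilityRef:
    ref_id: str
    is_tombstone: bool = False

    @property
    def uri(self) -> str:
        return f"vault://secret/{self.ref_id}"


class Vault:
    """Keeps secrets out of the audit log; the log only ever sees the ref URI."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}
        self._refs: dict[str, CapabilityRef] = {}

    def put(self, ref_id: str, value: str) -> CapabilityRef:
        self._secrets[ref_id] = value
        self._refs[ref_id] = CapabilityRef(ref_id)
        return self._refs[ref_id]

    def erase(self, ref_id: str) -> None:
        self._secrets.pop(ref_id, None)  # the sensitive value is gone for good
        self._refs[ref_id].is_tombstone = True

    def project(self, ref_id: str) -> str:
        ref = self._refs[ref_id]
        return REDACTED if ref.is_tombstone else self._secrets[ref_id]


def chain_hash(previous_hash: str, ref_uri: str) -> str:
    """Hash the previous link together with the ref URI, never the secret value."""
    return hashlib.sha256(f"{previous_hash}|{ref_uri}".encode("utf-8")).hexdigest()


vault = Vault()
ref = vault.put("42", "4111-1111")
hash_before = chain_hash("genesis", ref.uri)
visible_before = vault.project("42")
vault.erase("42")  # GDPR erasure event
hash_after = chain_hash("genesis", ref.uri)
visible_after = vault.project("42")
```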

Custom Adapter

class MyAdapter:
    async def complete(self, messages: list[dict], **kwargs) -> str:
        ...  # call any LLM API

Built-in via [litellm] extra:

LiteLLMAdapter("groq/llama-3.3-70b-versatile")
LiteLLMAdapter("openrouter/llama-3.3-70b-instruct:free")
LiteLLMAdapter("ollama/llama3")
LiteLLMAdapter("openai/gpt-4o-mini")
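As a concrete instance of the `complete` protocol shown above, the sketch below implements an offline adapter that routes on keywords. `KeywordAdapter` is hypothetical, and it assumes each message is a dict with a `content` key in the common chat format; the source only specifies `messages: list[dict]`.

```python
import asyncio


class KeywordAdapter:
    """Offline adapter: returns a canned reply for the first keyword found in the prompt."""

    def __init__(self, rules: dict[str, str], default: str = "other") -> None:
        # Lower-case the keywords once so matching is case-insensitive.
        self._rules = {keyword.lower(): reply for keyword, reply in rules.items()}
        self._default = default

    async def complete(self, messages: list[dict], **kwargs) -> str:
        prompt = " ".join(str(m.get("content", "")) for m in messages).lower()
        for keyword, reply in self._rules.items():
            if keyword in prompt:
                return reply
        return self._default


adapter = KeywordAdapter({"charged twice": "refund", "opening hours": "query"})
```

Any object exposing an async `complete` with this shape should be usable in place of a network-backed adapter, for example `ExecutionVM(llm=adapter)`.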

Performance

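The VM adds little overhead of its own: with the mock adapter, the simple pipelines in the benchmarks below complete with a p95 under 1 ms (for example, 0.66 ms for the refund pipeline). In real deployments the bottleneck is the LLM API or external I/O.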

v0.8.0 stress suite (432/432 tests · 0 violations)

Suite	Result
MoMo PoC v4	9/9 PASS
Stripe PoC v1	9/9 PASS
FSM invariant suite (v0.6.0)	13/13 · 1,020,000 ops · 0 violations
Integration suite (v0.7.3)	10/10 · 1,096,500 ops · 0 violations
10k stress (v0.7.0)	14,327 graphs/sec · 0.70 s/run

Integration benchmark detail (v0.7.3)

Environment: QEMU/KVM · Intel Xeon E5-2697A v4 · 2 cores · Python 3.12 · Mock adapter.

ID	Scenario	Mean TPS	p95 latency (avg)
BM-INT-01	Refund pipeline	2,300/s	0.66 ms
BM-INT-02	Double-execution guard	2,400/s	0.67 ms
BM-INT-03	Budget enforcement	1,100/s	331 ms
BM-INT-04	Parallel throughput	436/s	542 ms
BM-INT-05	MCP store round-trip	3,000/s	0.42 ms
BM-INT-06	GovernanceEnvelope	1,300/s	171 ms
BM-INT-07	Crash consistency	7/s	233 ms
BM-INT-08	Replay equivalence	1,300/s	1.30 ms
BM-INT-09	Adversarial retries	2,400/s	0.64 ms
BM-INT-10	Long-horizon	30/s	3,606 ms

Reproduce:

python benchmarks/stress_test.py
python benchmarks/benchmark_v030.py
python benchmarks/benchmark_v040.py
python benchmarks/run_all.py
python benchmarks/benchmark_double.py
python benchmarks/benchmark_nano_vm.py
python benchmarks/benchmark_stress_060.py
python benchmarks/benchmark_integration.py

Comparison

	LangChain	CrewAI	Temporal	nano-vm
LLM-native	✅	✅	❌	✅
Deterministic FSM	❌	❌	✅	✅
Replayable traces	partial	minimal	✅	✅
Suspend/resume	partial	partial	✅	✅
Runtime guardrails	❌	❌	partial	✅
Lightweight	❌	❌	❌	✅
Business workflows	partial	❌	✅	✅
AI workflows	✅	✅	partial	✅

vs Marvin / DSPy: those optimize what the LLM produces (structured outputs, prompt tuning). nano-vm controls when and whether steps run — orthogonal concerns, composable.

When to Use

Use nano-vm when:

workflow structure is known in advance
correctness and auditability matter (fintech, compliance, enterprise)
you need a reproducible trace for debugging or logging
guardrails must be enforced at the system level, not in the prompt
async orchestration with suspend/resume is required

Do NOT use when:

workflow must be discovered fully at runtime
the task is open-ended creative reasoning
fully autonomous multi-agent coordination is required

Roadmap

Done:

Deterministic FSM runtime (v0.1)
parallel steps — asyncio.gather (v0.2.0)
retry policy + max_concurrency (v0.3.0)
Budget guards: max_steps, max_stalled_steps, max_tokens (v0.4.0)
state_snapshots — sha256 fingerprint per step (v0.4.0)
Planner — intent → Program in 1 LLM call (v0.5.0)
FSM invariant stress suite — 13/13 · 1,020,000 ops (v0.6.0)
suspend / resume — "PENDING" sentinel + CursorRepository (v0.7.0)
BudgetInterrupt + _emit_interrupt() hook (v0.7.0)
VaultStepResult + VaultStepMetadata — MCP-compatible DTOs (v0.7.0)
Trace.trace_id — UUID4, OTel-ready (v0.7.0)
erase() — GDPR tombstoning with hash-chain preservation (v0.7.0)
ASTEngine — eval() removed; sandboxed condition evaluator (v0.7.0)
Integration benchmark suite — 10/10 · 1,096,500 ops (v0.7.3)
Step.is_terminal, Step.next_step — branch semantics (v0.7.4)
$step_id.output / $step_id.output.field resolution fix (v0.7.4)
_resolve typed return + multi-segment dotted path (v0.7.4)
py.typed marker — PEP 561 (v0.7.4)
ASTEngine METHOD_CALL guard — ASTEvalError at parse time (v0.7.5)
MCP server — nano-vm-mcp with GovernanceEnvelope, CapabilityRef, SSE + stdio
Step.allowed_outputs — LLM output validation against enum (v0.8.0)
Step.timeout_seconds + on_timeout — per-step LLM timeout (v0.8.0)

Upcoming — DSL hardening (v0.8.x):

ProgramValidator — static analysis: unreachable steps, missing targets, cycle detection

Upcoming — execution graph (v0.8.x):

depends_on + TopologicalSorter — declarative dependency graph over parallel

Upcoming — observability (v0.8.x):

OpenTelemetry span per FSM step
Incremental counters in Trace: llm_calls, tool_calls, retries_total

Upcoming — gateway (v0.9.x):

nano-vm-mcp: StateContext SQLite persistence — close inter-session duplicate risk
nano-vm-mcp: idempotency_store — inter-session exactly-once guarantee
nano-vm-mcp: GovernedToolExecutor + circuit breaker
Blueprint registry — resume() without explicit program argument
replan_on_interrupt — Planner-driven continuation on budget interrupts

Contact & Support

Author: @ale007xd on Telegram · @ale007xd on X

USDT (TON): UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9

License

MIT License.

Release history

0.8.6

Jun 11, 2026

0.8.5

Jun 9, 2026

0.8.4

Jun 4, 2026

0.8.3

Jun 3, 2026

0.8.2

May 29, 2026

This version

0.8.0

May 22, 2026

0.7.5

May 18, 2026

0.7.4

May 16, 2026

0.7.3

May 14, 2026

0.6.0

May 3, 2026

0.5.0

Apr 30, 2026

0.4.0

Apr 28, 2026

0.3.0

Apr 28, 2026

0.2.0

Apr 27, 2026

0.1.4

Apr 26, 2026

Download files

Source Distribution

llm_nano_vm-0.8.0.tar.gz (1.2 MB)

Uploaded May 22, 2026 Source

Built Distribution

llm_nano_vm-0.8.0-py3-none-any.whl (40.1 kB)

Uploaded May 22, 2026 Python 3

File details

Details for the file llm_nano_vm-0.8.0.tar.gz.

File metadata

Download URL: llm_nano_vm-0.8.0.tar.gz
Upload date: May 22, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_nano_vm-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`d225bff5ede3cf78367e8a705354312137fbf18df2025d01b27bebc2a4ecdb02`
MD5	`2e69e47463721cb5b7370a489524f75f`
BLAKE2b-256	`b75393921b776fcca8c942e6f298b7719376de648dcceb034e16b0ce77f42f64`

File details

Details for the file llm_nano_vm-0.8.0-py3-none-any.whl.

File metadata

Download URL: llm_nano_vm-0.8.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 40.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llm_nano_vm-0.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c084868e5f92327b59ec67fa1d8545effa34e06c5dce64ddc3367a8847a24382`
MD5	`07a4b6e557aaa314d21ee5ff8f9c1e48`
BLAKE2b-256	`fca996d15cbbeed37416f17d74eb54fbfb8712206c14310471e58691bd3dffee`
