Deterministic VM for LLM program execution
Deterministic execution runtime for stateful workflows.
Replayable. Observable. Enforcement-first.
LLM support is optional.
Temporal-lite for deterministic AI and business process execution.
What nano-vm Is
nano-vm is a deterministic execution runtime built around finite-state-machine semantics.
It orchestrates financial workflows, webhook-driven async processes, approval pipelines, event-driven automation, retry-safe orchestration, LLM pipelines, and governance-bound execution graphs.
The runtime — not the model, not the tool, not the prompt — controls state transitions.
Core invariant:
δ(S, E) → S'
where S is the current execution state, E is a validated event, and S' is the next deterministic state.
Why nano-vm Exists
Most workflow engines optimize scheduling. Most AI frameworks optimize prompting. nano-vm optimizes execution correctness.
The system guarantees: deterministic transitions, replayable traces, exactly-once execution invariants, resumable async workflows, explicit governance boundaries, and runtime-level enforcement.
The FSM runtime is the source of truth. LLMs are optional.
Mental Model
```text
events / webhooks / tools / LLMs
        ↓
   ExecutionVM
        ↓
 deterministic FSM
        ↓
  replayable trace
```
Formally:
```text
nondeterminism ∈ signal generation
determinism    ∈ runtime execution
```
Core Execution Pipeline
```text
E  = Signal(input)   → raw event
E' = Validator(E)    → validated event
A  = FSM(S, E')      → allowed transitions
a* = Policy(A, C)    → selected transition
S' = δ(S, a*)        → next state
```
| Layer | Role | Deterministic |
|---|---|---|
| Signal | LLM / webhook / API / user input | ❌ |
| Validator | schema + policy validation | ✅ |
| FSM | transition authority | ✅ |
| Policy | transition selection | ✅ |
| Tool executor | side effects | enforced |
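A minimal sketch of the validated stages as plain Python, to make the split concrete. Everything here is hypothetical (the event shape, the `TRANSITIONS` table, the first-match policy) and is not nano-vm's internal code; signal generation is left out because it is the only nondeterministic stage. In this sketch an invalid or unknown event leaves the state unchanged, which is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical transition table: (state, event type) -> candidate next states.
TRANSITIONS: dict[tuple[str, str], list[str]] = {
    ("RUNNING", "tool.ok"): ["RUNNING"],
    ("RUNNING", "tool.pending"): ["SUSPENDED"],
    ("RUNNING", "tool.error"): ["FAILED", "RUNNING"],  # which one applies depends on on_error
    ("SUSPENDED", "webhook"): ["RUNNING"],
}

@dataclass(frozen=True)
class Event:
    type: str
    payload: dict

def validate(raw: dict) -> Optional[Event]:
    """Validator: reject anything that is not a well-formed event."""
    if not isinstance(raw.get("type"), str) or not isinstance(raw.get("payload", {}), dict):
        return None
    return Event(raw["type"], raw.get("payload", {}))

def allowed(state: str, event: Event) -> list[str]:
    """FSM: the only authority on which transitions exist."""
    return TRANSITIONS.get((state, event.type), [])

def policy(candidates: list[str], context: dict) -> Optional[str]:
    """Policy: deterministic choice among the allowed transitions."""
    if not candidates:
        return None
    if context.get("on_error") == "skip" and "RUNNING" in candidates:
        return "RUNNING"
    return candidates[0]

def step(state: str, raw_event: dict, context: dict) -> str:
    """delta(S, a*): invalid or unknown events leave the state unchanged."""
    event = validate(raw_event)
    if event is None:
        return state
    choice = policy(allowed(state, event), context)
    return choice if choice is not None else state
```

Because `validate`, `allowed`, `policy`, and `step` depend only on their arguments, replaying the same raw events from the same state reproduces the same sequence of states.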
Using nano-vm Without LLMs
LLMs are not required. nano-vm can operate as a pure deterministic workflow engine.
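```python
import asyncio
from nano_vm import ExecutionVM, Program

# Placeholder tools: replace the bodies with real payment-provider calls.
async def reserve_funds(**kwargs) -> str:
    return "reserved"
async def capture_payment(**kwargs) -> str:
    return "captured"
async def send_receipt(**kwargs) -> str:
    return "sent"

program = Program.from_dict({
    "name": "payment_flow",
    "steps": [
        {"id": "reserve", "type": "tool", "tool": "reserve_funds"},
        {"id": "capture", "type": "tool", "tool": "capture_payment"},
        {"id": "receipt", "type": "tool", "tool": "send_receipt"},
    ]
})

async def main() -> None:
    vm = ExecutionVM(tools={
        "reserve_funds": reserve_funds,
        "capture_payment": capture_payment,
        "send_receipt": send_receipt,
    })
    trace = await vm.run(program)
    print(trace.status)  # SUCCESS

asyncio.run(main())
```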
No LLM. The runtime still guarantees: deterministic ordering, replayable execution, trace visibility, transition enforcement, exactly-once semantics.
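One common way to obtain exactly-once effects under retries and replays is an idempotency ledger keyed by (trace_id, step_id): a step whose result is already recorded is not executed again, and the recorded output is returned instead. The sketch below illustrates that technique with an in-memory dict; `IdempotentExecutor` is a hypothetical name and this is not nano-vm's persistence layer.

```python
import asyncio
from typing import Any, Awaitable, Callable

class IdempotentExecutor:
    """Runs each (trace_id, step_id) side effect at most once; replays reuse the recorded output."""

    def __init__(self) -> None:
        self._ledger: dict[tuple[str, str], Any] = {}  # stand-in for a durable store
        self._locks: dict[tuple[str, str], asyncio.Lock] = {}

    async def run(self, trace_id: str, step_id: str,
                  tool: Callable[..., Awaitable[Any]], **kwargs: Any) -> Any:
        key = (trace_id, step_id)
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:  # concurrent duplicates wait instead of re-executing
            if key in self._ledger:
                return self._ledger[key]
            result = await tool(**kwargs)
            self._ledger[key] = result  # recorded only after the tool succeeded
            return result
```

A crash between the tool call and the ledger write would still re-run the tool on recovery, so in practice exactly-once effects also rely on the tools themselves being idempotent (for example, by passing a stable idempotency key to the payment provider).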
Suspend / Resume — Async Business Processes
Return the sentinel "PENDING" from any tool to suspend execution:
```python
async def initiate_payment(**kwargs) -> str:
    await register_webhook(kwargs["order_id"])  # your own webhook registration
    return "PENDING"  # FSM → SUSPENDED, cursor persisted
```
FSM transition: RUNNING → SUSPENDED → RUNNING → SUCCESS
This enables: payment settlement, courier confirmation, approval systems, human-in-the-loop, webhook orchestration.
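Mechanically, suspension amounts to persisting a cursor when a tool returns the sentinel and continuing from that cursor on resume. A minimal sketch with synchronous tools and a module-level dict as the cursor store; `run`, `resume`, and `CURSORS` are hypothetical names, not nano-vm's API (which exposes CursorRepository and resume_with_program). The sketch assumes execution continues with the step after the one that returned "PENDING" and passes the webhook payload to later steps through the context.

```python
from typing import Any, Callable

PENDING = "PENDING"
Tool = Callable[[dict], Any]
CURSORS: dict[str, int] = {}  # trace_id -> index of the next step; stand-in for a durable store

def run(trace_id: str, steps: list[Tool], context: dict, start: int = 0) -> str:
    for i in range(start, len(steps)):
        if steps[i](context) == PENDING:
            CURSORS[trace_id] = i + 1  # persist where execution continues
            return "SUSPENDED"
    return "SUCCESS"

def resume(trace_id: str, steps: list[Tool], context: dict, event: dict) -> str:
    start = CURSORS.pop(trace_id)  # KeyError if not suspended; pop makes resume one-shot
    return run(trace_id, steps, {**context, "webhook_event": event}, start=start)
```

Popping the cursor on resume means a duplicated webhook delivery cannot resume the same trace twice.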
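```python
import asyncio
from nano_vm import Program
from nano_vm.vm import ExecutionVM

async def finalize_order(**kwargs) -> str:
    return "finalized"  # placeholder domain step; runs after resume

program = Program.from_dict({"name": "order_flow", "steps": [
    {"id": "pay", "type": "tool", "tool": "initiate_payment"},
    {"id": "finalize", "type": "tool", "tool": "finalize_order"},
]})

async def main() -> None:
    vm = ExecutionVM(tools={"initiate_payment": initiate_payment, "finalize_order": finalize_order})
    trace = await vm.run(program, context={"order_id": "123"})
    assert trace.status.name == "SUSPENDED"
    trace = await vm.resume_with_program(  # later, e.g. inside the webhook handler
        program=program,
        trace_id=trace.trace_id,
        webhook_event={"type": "payment.confirmed", "order_id": "123"},
    )
    assert trace.status.name == "SUCCESS"

asyncio.run(main())
```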
Note: "PENDING" is a reserved FSM sentinel. Do not return it as a domain status; use "REQUIRES_ACTION", "AWAITING_3DS", or any other string for domain-specific states.
FSM Transition Model
ExecutionVM is a deterministic finite state machine.
| Current state | Event | Next state |
|---|---|---|
| RUNNING | tool success | RUNNING |
| RUNNING | tool returns "PENDING" | SUSPENDED |
| RUNNING | tool error (on_error=fail) | FAILED |
| RUNNING | tool error (on_error=skip) | RUNNING (output=None) |
| RUNNING | condition branch taken | RUNNING (jump to then/otherwise) |
| RUNNING | max_steps / max_tokens exceeded | BUDGET_EXCEEDED |
| RUNNING | max_stalled_steps exceeded | STALLED |
| RUNNING | no more steps | SUCCESS |
| SUSPENDED | resume_with_program() called | RUNNING (from cursor) |
| terminal | — | absorbing (no further transitions) |
Terminal states: SUCCESS, FAILED, BUDGET_EXCEEDED, STALLED.
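The table can be encoded directly as data, which makes the "absorbing terminal state" rule mechanical: any event delivered to a terminal state is a no-op, and any pair missing from the table is rejected. A sketch with hypothetical event names; nano-vm's own event vocabulary is not exposed in this README.

```python
from enum import Enum

class State(str, Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    BUDGET_EXCEEDED = "BUDGET_EXCEEDED"
    STALLED = "STALLED"

TERMINAL = {State.SUCCESS, State.FAILED, State.BUDGET_EXCEEDED, State.STALLED}

# (current state, event) -> next state, one entry per non-terminal row of the table.
DELTA: dict[tuple[State, str], State] = {
    (State.RUNNING, "tool_success"): State.RUNNING,
    (State.RUNNING, "tool_pending"): State.SUSPENDED,
    (State.RUNNING, "tool_error_fail"): State.FAILED,
    (State.RUNNING, "tool_error_skip"): State.RUNNING,
    (State.RUNNING, "branch_taken"): State.RUNNING,
    (State.RUNNING, "budget_exceeded"): State.BUDGET_EXCEEDED,
    (State.RUNNING, "stalled"): State.STALLED,
    (State.RUNNING, "no_more_steps"): State.SUCCESS,
    (State.SUSPENDED, "resume"): State.RUNNING,
}

def delta(state: State, event: str) -> State:
    if state in TERMINAL:
        return state  # absorbing: events after a terminal state change nothing
    try:
        return DELTA[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition from {state.value} on {event!r}") from None
```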
Install
```shell
pip install llm-nano-vm
pip install "llm-nano-vm[litellm]"  # for LLM provider support; quotes stop zsh from globbing the brackets
```
Quick Start — LLM Pipeline (guardrail that never skips)
```python
import asyncio

from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

# Placeholder tools: replace the bodies with your refund and notification logic.
async def issue_refund(**kwargs) -> str:
    return "refund issued"

async def send_rejection(**kwargs) -> str:
    return "rejection sent"

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
            "allowed_outputs": ["yes", "no"],  # v0.8.0: enum guard
        },
        {
            "id": "guardrail",
            "type": "condition",
            "condition": "'yes' in '$decision'",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {"id": "process_refund", "type": "tool", "tool": "issue_refund", "is_terminal": True},
        {"id": "reject", "type": "tool", "tool": "send_rejection", "is_terminal": True},
    ],
})

async def main() -> None:
    vm = ExecutionVM(
        llm=LiteLLMAdapter("openai/gpt-4o-mini"),  # LiteLLM reads OPENAI_API_KEY from the environment
        tools={"issue_refund": issue_refund, "send_rejection": send_rejection},
    )
    trace = await vm.run(program, context={"user_input": "I was charged twice"})
    print(trace.status)  # SUCCESS
    print(trace.total_cost_usd())  # e.g. 0.000034

asyncio.run(main())
```
The guardrail step cannot be skipped, reordered, or overridden by the model.
Program DSL
Four step types:
| Type | Purpose |
|---|---|
| llm | call the model; result stored in output_key |
| tool | call a Python function; return "PENDING" to suspend |
| condition | branch on an expression; then / otherwise |
| parallel | run independent sub-steps concurrently via asyncio.gather |
Step fields (v0.8.0):
| Field | Default | Description |
|---|---|---|
| on_error | fail | fail · skip · retry |
| max_retries | 3 | total attempts; backoff: 1s, 2s, 4s… cap 30s |
| max_concurrency | None | parallel blocks only |
| is_terminal | False | return SUCCESS after this step (leaf nodes) |
| next_step | None | jump to named step instead of returning SUCCESS |
| allowed_outputs | None | llm steps only: accepted output enum; ValidationError if empty |
| timeout_seconds | None | llm steps only: asyncio.wait_for timeout in seconds |
| on_timeout | 'fail' | 'fail' · 'fallback' (→ allowed_outputs[0] or '') |
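The max_retries row describes capped exponential backoff. Reading it literally (max_retries counts total attempts, waits double from 1 s and are capped at 30 s), the schedule and a retry loop can be sketched as follows; `backoff_delays`, `call_with_retry`, and the injectable `sleep` are illustrative assumptions, not nano-vm's API.

```python
import asyncio
from typing import Any, Awaitable, Callable

def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Waits between attempts: N total attempts need N - 1 waits (1s, 2s, 4s, ... capped)."""
    return [min(cap, base * 2 ** i) for i in range(max(0, max_retries - 1))]

async def call_with_retry(fn: Callable[[], Awaitable[Any]], max_retries: int = 3,
                          sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep) -> Any:
    if max_retries < 1:
        raise ValueError("max_retries counts total attempts and must be >= 1")
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries):
        try:
            return await fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # attempts exhausted: surface the last error
            await sleep(delays[attempt])
```

Injecting `sleep` keeps tests fast and deterministic: a fake that records the requested delays can replace `asyncio.sleep`.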
Program budget options (v0.4.0+):
| Option | Default | Description |
|---|---|---|
| max_steps | None | BUDGET_EXCEEDED if exceeded |
| max_stalled_steps | None | STALLED after N consecutive no-op fingerprints |
| max_tokens | None | BUDGET_EXCEEDED when total tokens exceed limit |
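max_stalled_steps counts consecutive steps that leave the state fingerprint unchanged. A sketch of that counter, assuming the fingerprint is a string such as the sha256 hex digest recorded in state_snapshots; `StallDetector` is a hypothetical name.

```python
from typing import Optional

class StallDetector:
    """Signals STALLED once `limit` consecutive steps leave the state fingerprint unchanged."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._last: Optional[str] = None
        self._repeats = 0

    def observe(self, fingerprint: str) -> bool:
        if fingerprint == self._last:
            self._repeats += 1  # no-op step: state did not change
        else:
            self._last, self._repeats = fingerprint, 0  # progress resets the counter
        return self._repeats >= self.limit
```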
Variable interpolation
| Syntax | Resolves to |
|---|---|
| $key | value from initial context (typed: int/dict/list preserved) |
| $step_id.output | output of a previous step |
| $step_id.output.field | field within a step's dict output |
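A resolver for these three forms can be sketched as below. It assumes, hypothetically, that a value consisting of exactly one reference returns the referenced object unchanged (so ints, dicts, and lists keep their type), while references embedded in longer text are substituted as strings. `resolve` and `lookup` are illustrative names, not nano-vm's `_resolve`.

```python
import re
from typing import Any

REF = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")

def lookup(path: str, context: dict, outputs: dict) -> Any:
    head, *rest = path.split(".")
    if rest and rest[0] == "output" and head in outputs:
        value: Any = outputs[head]  # $step_id.output[...]
        rest = rest[1:]
    else:
        value = context[head]  # $key[...]
    for key in rest:  # multi-segment dotted path into dict values
        value = value[key]
    return value

def resolve(template: str, context: dict, outputs: dict) -> Any:
    whole = REF.fullmatch(template)
    if whole:
        return lookup(whole.group(1), context, outputs)  # typed: no str() conversion
    return REF.sub(lambda m: str(lookup(m.group(1), context, outputs)), template)
```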
Condition expressions — Security
⚠ ASTEngine replaces eval(). Conditions are parsed into a validated JSON AST and evaluated by a pure, sandboxed interpreter. No Python builtins are accessible.
Supported operators: ==, !=, >, <, in, not in, and, or, not, contains, dotted-path $var.field.
Not supported: method calls (.lower(), .strip(), etc.), arithmetic, parentheses grouping. Using an unsupported form raises ASTEvalError at parse time with an explicit message (v0.7.5+).
Rules for safe use:
- Condition logic must be authored by you, not generated from untrusted input at runtime.
- LLM output may appear as a value being tested ('yes' in '$decision'), never as the condition expression itself.
- If you need case-insensitive matching, control the LLM output format via the prompt (Reply ONLY with: yes or no) rather than calling .lower() in the condition.
- Use allowed_outputs (v0.8.0) to enforce exact output values at the step level before the condition is evaluated.
```python
# ❌ WRONG: method call raises ASTEvalError (v0.7.5+)
{"condition": "'yes' in '$decision'.lower()"}

# ✅ CORRECT: pure value comparison
{"condition": "'yes' in '$decision'"}

# ✅ BEST (v0.8.0): enum guard on the LLM step, then the condition above
{
    "id": "analyze", "type": "llm", "prompt": "...",
    "output_key": "decision",
    "allowed_outputs": ["no", "yes"],  # first element doubles as the fallback
    "on_error": "skip",  # unexpected output → "no", the safe default
}
```
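Parse-time rejection of method calls can be illustrated with Python's own `ast` module: parse the already-interpolated expression, walk the tree, and refuse any node outside a small whitelist before anything is evaluated. This is a swapped-in technique for illustration only; nano-vm's ASTEngine builds its own JSON AST, and `check`, `evaluate`, and the local `ASTEvalError` are stand-ins. The sketch omits `contains` and `$var.field` paths.

```python
import ast
import operator

class ASTEvalError(ValueError):
    """Local stand-in for nano-vm's parse-time error."""

_CMP = {ast.Eq: operator.eq, ast.NotEq: operator.ne, ast.Gt: operator.gt, ast.Lt: operator.lt,
        ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b}
_ALLOWED = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
            ast.Compare, ast.Constant, *_CMP)

def check(expr: str) -> ast.Expression:
    """Reject calls, names, arithmetic, and anything else not whitelisted, before evaluation."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            raise ASTEvalError(f"method/function calls are not allowed: {expr!r}")
        if not isinstance(node, _ALLOWED):
            raise ASTEvalError(f"unsupported syntax {type(node).__name__} in {expr!r}")
    return tree

def _eval(node: ast.AST) -> object:
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp):  # only `not` survives check()
        return not _eval(node.operand)
    if isinstance(node, ast.BoolOp):
        values = [_eval(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):  # supports chains like a < b < c
        left = _eval(node.left)
        for op, right_node in zip(node.ops, node.comparators):
            right = _eval(right_node)
            if not _CMP[type(op)](left, right):
                return False
            left = right
        return True
    raise ASTEvalError(f"unexpected node {type(node).__name__}")

def evaluate(expr: str) -> bool:
    return bool(_eval(check(expr)))
```

In a substitute-then-parse design like this sketch, an LLM value containing a quote character could change the parsed expression, which is one more reason to pin values with allowed_outputs before the condition runs.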
LLM Output Validation — allowed_outputs (v0.8.0)
Validates the model's raw output string against an explicit enum at the step level.
```python
{
    "id": "classify",
    "type": "llm",
    "prompt": "Classify the request. Reply ONLY with: refund / query / other",
    "output_key": "category",
    "allowed_outputs": ["refund", "query", "other"],
    "on_error": "skip",  # output → "refund" (first element) on mismatch
}
```
Behaviour by on_error:
| on_error | On mismatch |
|---|---|
| fail (default) | VMError → trace.FAILED |
| skip | output replaced with allowed_outputs[0] |
| retry | retry up to max_retries; VMError if exhausted |
Constraints: non-empty list required; llm steps only.
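The three on_error behaviours can be sketched as a validation wrapper around a model call. Assumptions: exact string comparison against the raw output (no trimming or case folding), `complete` as any async callable returning the model text, backoff between retries omitted, and a local `VMError`; `guarded_llm_output` is a hypothetical name, not part of nano-vm.

```python
import asyncio
from typing import Awaitable, Callable

class VMError(RuntimeError):
    """Local stand-in for the VM's step error."""

async def guarded_llm_output(complete: Callable[[], Awaitable[str]], allowed: list[str],
                             on_error: str = "fail", max_retries: int = 3) -> str:
    if not allowed:
        raise ValueError("allowed_outputs must be a non-empty list")
    attempts = max(1, max_retries) if on_error == "retry" else 1
    for _ in range(attempts):
        raw = await complete()
        if raw in allowed:  # exact match against the enum
            return raw
    if on_error == "skip":
        return allowed[0]  # first element is the substitute value
    raise VMError(f"output {raw!r} not in {allowed}")
```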
LLM Step Timeout — timeout_seconds (v0.8.0)
Prevents a hung LLM call from stalling the entire FSM.
```python
{
    "id": "analyze",
    "type": "llm",
    "prompt": "...",
    "allowed_outputs": ["reject", "approve"],  # first element doubles as the fallback
    "timeout_seconds": 5.0,
    "on_timeout": "fallback",  # → "reject" (allowed_outputs[0])
}
```
on_timeout=fail (default) transitions the trace to FAILED. on_timeout=fallback substitutes allowed_outputs[0] if set, otherwise ''; list the safe choice first so that a timeout never selects the riskier branch.
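Since the Step fields table names `asyncio.wait_for`, the fail/fallback split can be sketched on top of it. `call_llm_step` and `StepTimeout` are hypothetical names; `wait_for` cancels the hung call when the timeout expires.

```python
import asyncio
from typing import Awaitable, Callable, Optional

class StepTimeout(RuntimeError):
    """Local stand-in: raised when on_timeout='fail' (the trace would go to FAILED)."""

async def call_llm_step(complete: Callable[[], Awaitable[str]], timeout_seconds: Optional[float],
                        on_timeout: str = "fail",
                        allowed_outputs: Optional[list[str]] = None) -> str:
    if timeout_seconds is None:
        return await complete()  # no timeout configured
    try:
        return await asyncio.wait_for(complete(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            return allowed_outputs[0] if allowed_outputs else ""
        raise StepTimeout(f"LLM step exceeded {timeout_seconds}s") from None
```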
Branch Semantics (v0.7.4)
Condition branch targets are terminal by default. Use next_step for inline continuation:
```python
# Branch target is terminal: the FSM returns SUCCESS after notify_success
{"id": "notify_success", "type": "tool", "tool": "send_email", "is_terminal": True}

# Branch target continues to poll_payment
{"id": "create_payment", "type": "tool", "tool": "create_payment_intent",
 "next_step": "poll_payment"}
```
Terminal leaf steps (notify_*, reject_*, alert_*) must be placed after the main
flow in the flat steps array and marked is_terminal: true. The FSM starts at index 0 and
reaches terminal steps only via an id-jump (then/otherwise/next_step), never by
sequential index.
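The ordering rule can be made concrete with a small walker over a flat steps array. It is a sketch under one reading of the rules above (steps reached by an id-jump end the run unless they declare next_step; steps in the main flow advance by index), not nano-vm's interpreter; `Step` and `walk` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    id: str
    is_terminal: bool = False
    next_step: Optional[str] = None
    then: Optional[str] = None  # set on condition steps only
    otherwise: Optional[str] = None

def walk(steps: list[Step], outcomes: dict[str, bool], max_steps: int = 100) -> list[str]:
    """Return the ids a run would visit, given each condition step's outcome."""
    index = {s.id: i for i, s in enumerate(steps)}
    visited: list[str] = []
    i, jumped = 0, False  # execution always starts at index 0
    while i < len(steps):
        if len(visited) >= max_steps:
            raise RuntimeError("max_steps exceeded")  # guards against next_step cycles
        step = steps[i]
        visited.append(step.id)
        if step.then is not None:  # condition: jump by id, never by index
            i, jumped = index[step.then if outcomes[step.id] else step.otherwise], True
        elif step.next_step is not None:  # explicit inline continuation
            i, jumped = index[step.next_step], True
        elif step.is_terminal or jumped:  # leaf step, or branch target without next_step
            break
        else:  # main flow: advance sequentially
            i += 1
    return visited
```

Without the stop after a branch target, the `process_refund` leaf in the Quick Start program would fall through to `reject` by index, which is exactly what placing leaves after the main flow and marking them terminal prevents.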
Parallel Execution
```python
{
    "id": "fetch",
    "type": "parallel",
    "max_concurrency": 5,
    "on_error": "skip",
    "parallel_steps": [
        {"id": "weather", "type": "tool", "tool": "get_weather", "args": {"city": "$city"}},
        {"id": "news", "type": "tool", "tool": "get_news", "args": {"topic": "$topic"}},
    ],
}
```
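Wall-clock time is roughly that of the slowest sub-step, as long as all sub-steps fit within max_concurrency. A failed sub-step with on_error: skip produces None as its result instead of raising, so the block returns partial results.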
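A sketch of how max_concurrency and on_error: skip can be layered over `asyncio.gather`: an `asyncio.Semaphore` bounds the number of sub-steps in flight, and a per-task wrapper turns failures into None. `run_parallel` is a hypothetical helper, not nano-vm's implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

async def run_parallel(tasks: dict[str, Callable[[], Awaitable[Any]]],
                       max_concurrency: Optional[int] = None,
                       on_error: str = "fail") -> dict[str, Any]:
    sem = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def guarded(fn: Callable[[], Awaitable[Any]]) -> Any:
        try:
            if sem is None:
                return await fn()
            async with sem:  # at most max_concurrency sub-steps in flight
                return await fn()
        except Exception:
            if on_error == "skip":
                return None  # failed sub-step yields None instead of raising
            raise

    results = await asyncio.gather(*(guarded(fn) for fn in tasks.values()))
    return dict(zip(tasks.keys(), results))  # gather preserves input order
```

With on_error="fail", the first exception propagates out of `gather`, but `gather` does not cancel the sibling sub-steps, so a production runtime may want to cancel them explicitly.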
Observability
```python
trace.trace_id          # UUID4: stable for OTel propagation
trace.status            # TraceStatus.SUCCESS | FAILED | SUSPENDED | BUDGET_EXCEEDED | STALLED
trace.final_output
trace.total_tokens()    # O(1) incremental accumulator
trace.total_cost_usd()  # requires LiteLLMAdapter
trace.state_snapshots   # list[(step_index, sha256_hex)]
for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)
```
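state_snapshots pairs each step index with a sha256 digest. A sketch of computing such fingerprints and using them to check replay equivalence, assuming the digest covers a canonical JSON encoding of the step state (sorted keys, compact separators); the canonicalization nano-vm actually uses is not documented here, and all three functions are hypothetical.

```python
import hashlib
import json
from typing import Any, Optional

def fingerprint(state: dict[str, Any]) -> str:
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def snapshots(states: list[dict[str, Any]]) -> list[tuple[int, str]]:
    return [(i, fingerprint(s)) for i, s in enumerate(states)]

def first_divergence(a: list[tuple[int, str]], b: list[tuple[int, str]]) -> Optional[int]:
    """Index of the first differing snapshot, or None if both runs are equivalent."""
    for (i, ha), (_, hb) in zip(a, b):
        if ha != hb:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))
```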
Budget Interrupts
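```python
from nano_vm.vm import ExecutionVM, InterruptType

async def notify_operator(message: str) -> None:
    print(message)  # placeholder: route to your alerting channel

class InstrumentedVM(ExecutionVM):
    async def _emit_interrupt(self, interrupt_type: InterruptType) -> None:
        await notify_operator(f"interrupt: {interrupt_type.value}")
```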
Budget exhaustion (BudgetInterrupt) fires before the next step executes.
The LLM cannot observe or influence it.
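The ordering guarantee (the budget is checked before the next step runs, in code the model never sees) can be sketched as a loop that consults the budget first and only then executes. `run_with_budget`, `on_interrupt`, and the "step returns its token count" convention are hypothetical; in nano-vm the hook is the `_emit_interrupt` override shown above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

StepFn = Callable[[], Awaitable[int]]  # hypothetical: a step returns the tokens it consumed

async def run_with_budget(steps: list[StepFn], max_steps: Optional[int], max_tokens: Optional[int],
                          on_interrupt: Callable[[str], Awaitable[None]]) -> str:
    executed = tokens = 0
    for step in steps:
        # Checked before executing: an exhausted budget stops the run without calling the step.
        if max_steps is not None and executed >= max_steps:
            await on_interrupt("max_steps")
            return "BUDGET_EXCEEDED"
        if max_tokens is not None and tokens > max_tokens:
            await on_interrupt("max_tokens")
            return "BUDGET_EXCEEDED"
        tokens += await step()
        executed += 1
    return "SUCCESS"
```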
Testing — Deterministic by Design
```python
import asyncio

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

# Minimal one-step program for the test
program = Program.from_dict({
    "name": "classify_only",
    "steps": [{"id": "classify", "type": "llm", "prompt": "Classify: $user_input", "output_key": "label"}],
})

vm = ExecutionVM(llm=MockLLMAdapter("yes"))  # always returns "yes"
# Per-call sequence
vm = ExecutionVM(llm=MockLLMAdapter(["SAFE", "yes"]))
# Per-prompt substring mapping
vm = ExecutionVM(llm=MockLLMAdapter({
    "Classify": "SAFE",
    "eligible": "yes",
    "__default__": "ok",
}))

trace = asyncio.run(vm.run(program, context={"user_input": "refund"}))
assert trace.status == TraceStatus.SUCCESS
```
Same input → same step sequence. No API key required.
State Determinism vs Semantic Determinism
nano-vm guarantees State Determinism (step execution order, no skipping, reproducible trace structure) regardless of LLM output. It does not guarantee Semantic Determinism: LLM text may differ across runs even at temperature=0.0. In tests, MockLLMAdapter gives you both.
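The three MockLLMAdapter modes can be understood through a small stand-in that follows the `complete(messages, **kwargs) -> str` shape from the Custom Adapter section. This is a hypothetical re-implementation for illustration, not the shipped class: it matches the last message's content by substring in insertion order, and raises when a reply sequence is exhausted (the real adapter's behaviour in that case is not documented here).

```python
import asyncio
from typing import Union

class FakeLLM:
    def __init__(self, replies: Union[str, list[str], dict[str, str]]) -> None:
        self.replies = replies
        self.calls = 0

    async def complete(self, messages: list[dict], **kwargs) -> str:
        self.calls += 1
        if isinstance(self.replies, str):  # fixed reply
            return self.replies
        if isinstance(self.replies, list):  # per-call sequence
            if self.calls > len(self.replies):
                raise IndexError("mock reply sequence exhausted")
            return self.replies[self.calls - 1]
        prompt = messages[-1]["content"] if messages else ""  # substring mapping
        for needle, reply in self.replies.items():
            if needle != "__default__" and needle in prompt:
                return reply
        if "__default__" in self.replies:
            return self.replies["__default__"]
        raise KeyError(f"no mock reply matches prompt {prompt[:40]!r}")
```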
Planner (Optional)
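```python
import asyncio

from nano_vm import ExecutionVM, Planner
from nano_vm.adapters import LiteLLMAdapter

async def main() -> None:
    adapter = LiteLLMAdapter("openai/gpt-4o-mini")
    planner = Planner(llm=adapter, max_retries=2, temperature=0.0)
    program = await planner.generate(
        "Fetch latest AI news, summarize, classify by topic",
        available_tools=["fetch_rss", "summarize", "classify"],
        context_keys=["user_id"],
    )
    vm = ExecutionVM(llm=adapter, tools={"fetch_rss": fetch_rss, "summarize": summarize, "classify": classify})  # your own async tools
    trace = await vm.run(program, context={"user_id": "u-1"})

asyncio.run(main())
```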
Exactly 1 LLM call → validated Program. Planner output is probabilistic; execution
remains deterministic. Review Planner-generated programs before deploying to production.
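Part of that review can be automated. A sketch of a pre-deployment check over a program in its dict form (the shape accepted by Program.from_dict): unique step ids, tool names limited to those offered to the Planner, and jump targets that exist. `check_program` is a hypothetical helper, not the ProgramValidator listed in the roadmap.

```python
def check_program(program: dict, available_tools: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    steps = program.get("steps", [])
    seen: set = set()
    for step in steps:
        sid = step.get("id")
        if sid in seen:
            problems.append(f"duplicate step id {sid!r}")
        seen.add(sid)

    def visit(step: dict) -> None:
        if step.get("type") == "tool" and step.get("tool") not in available_tools:
            problems.append(f"step {step.get('id')!r} uses unknown tool {step.get('tool')!r}")
        for key in ("then", "otherwise", "next_step"):
            target = step.get(key)
            if target is not None and target not in seen:
                problems.append(f"step {step.get('id')!r}: {key} -> missing step {target!r}")
        for sub in step.get("parallel_steps", []):
            visit(sub)  # sub-steps must use allowed tools too

    for step in steps:
        visit(step)
    return problems
```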
MCP Integration
nano-vm pairs with nano-vm-mcp — an MCP server
that exposes run_program, get_trace, list_programs, get_program, delete_program
over stdio or SSE transport with bearer auth and SQLite WAL persistence.
Architecture
```text
MCP Client
  → nano-vm-mcp (Gateway)
    → GovernedRunProgramHandler   ← PolicySnapshot, CapabilityRef
      → llm-nano-vm (Kernel)      ← deterministic FSM, ASTEngine, ProjectionLayer
        → GovernanceEnvelope store ← SQLite WAL, append-only audit log
```
GovernanceEnvelope
Each successful execution step produces a GovernanceEnvelope (frozen Pydantic model)
stored in the governance_envelopes table:
| Field | Description |
|---|---|
| execution_id | Session / trace identifier |
| step_id | Step index within the execution |
| policy_hash | SHA-256 of the active PolicySnapshot |
| canonical_snapshot_hash | Merkle/delta hash of CanonicalState |
| payload | Projected (sanitized) step output |
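A tamper-evident, append-only log of such envelopes is commonly built as a hash chain: each record's digest covers its own fields plus the previous digest, so altering a stored envelope breaks every later link. The sketch below uses a frozen dataclass instead of a Pydantic model, and its `prev_hash` field is an assumption; the table above does not say how nano-vm-mcp links envelopes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any

@dataclass(frozen=True)
class Envelope:
    execution_id: str
    step_id: int
    policy_hash: str
    canonical_snapshot_hash: str
    payload: dict[str, Any]
    prev_hash: str  # hypothetical chaining field

    def digest(self) -> str:
        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

GENESIS = "0" * 64

def append(log: list[Envelope], **fields: Any) -> Envelope:
    prev = log[-1].digest() if log else GENESIS
    env = Envelope(prev_hash=prev, **fields)
    log.append(env)
    return env

def verify(log: list[Envelope]) -> bool:
    prev = GENESIS
    for env in log:
        if env.prev_hash != prev:
            return False  # an earlier record was altered or removed
        prev = env.digest()
    return True
```

A chain only detects edits to records that have a successor; the latest digest has to be anchored somewhere else (a separate table, an external log) to protect the newest envelope as well.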
CapabilityRef and GDPR Tombstoning
Sensitive values are stored as CapabilityRef tokens (vault://secret/<id>).
On a GDPR erasure event, the ref is tombstoned (is_tombstone=True). All subsequent
projections return [REDACTED_TOMBSTONE], preserving the hash chain.
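Tombstoning can preserve the hash chain because hashed records hold the reference token, never the secret itself. A sketch under that assumption; `Vault`, `project`, and `record_hash` are hypothetical, and only the `vault://secret/<id>` token format and the `[REDACTED_TOMBSTONE]` marker come from the text above. How nano-vm projects a live ref for authorized readers is not described here, so the pre-erasure branch is illustrative.

```python
import hashlib
from dataclasses import dataclass, field

REDACTED = "[REDACTED_TOMBSTONE]"
PREFIX = "vault://secret/"

@dataclass
class Vault:
    secrets: dict[str, str] = field(default_factory=dict)
    tombstones: set[str] = field(default_factory=set)

    def store(self, secret_id: str, value: str) -> str:
        self.secrets[secret_id] = value
        return PREFIX + secret_id  # only this token enters hashed records

    def erase(self, ref: str) -> None:
        self.secrets.pop(ref.removeprefix(PREFIX), None)  # the value is gone for good
        self.tombstones.add(ref)

    def project(self, ref: str) -> str:
        if ref in self.tombstones:
            return REDACTED
        return self.secrets[ref.removeprefix(PREFIX)]  # illustrative authorized read

def record_hash(ref: str) -> str:
    return hashlib.sha256(ref.encode("utf-8")).hexdigest()
```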
Custom Adapter
```python
class MyAdapter:
    async def complete(self, messages: list[dict], **kwargs) -> str:
        ...  # call any LLM API and return the reply text
```
Built-in via [litellm] extra:
```python
LiteLLMAdapter("groq/llama-3.3-70b-versatile")
LiteLLMAdapter("openrouter/llama-3.3-70b-instruct:free")
LiteLLMAdapter("ollama/llama3")
LiteLLMAdapter("openai/gpt-4o-mini")
```
Performance
The VM itself adds little overhead; in practice, end-to-end latency is dominated by the LLM API or external I/O.
v0.8.2 stress suite (435/435 tests · 0 violations)
| Suite | Result |
|---|---|
| MoMo PoC v4 | 9/9 PASS |
| Stripe PoC v1 | 9/9 PASS |
| FSM invariant suite (v0.6.0) | 13/13 · 1,020,000 ops · 0 violations |
| Integration suite (v0.8.2) | 10/10 · 1,096,500 ops · 0 violations |
| 10k stress (v0.7.0) | 14,327 graphs/sec · 0.70 s/run |
Integration benchmark detail (v0.8.2)
Environment: WSL2 · Windows 11 · Python 3.12 · Mock adapter · 3 cycles × 5 runs × 10,000 items.
| ID | Scenario | Mean TPS | p95 avg | Extended |
|---|---|---|---|---|
| BM-INT-01 | Refund pipeline | 2,200/s | 123 ms | — |
| BM-INT-02 | Double-execution guard | 2,800/s | 69 ms | — |
| BM-INT-03 | Budget enforcement | 2,400/s | 97 ms | — |
| BM-INT-04 | Parallel throughput | 1,000/s | 196 ms | — |
| BM-INT-05 | MCP store round-trip | 11,000/s | 0.13 ms | — |
| BM-INT-06 | GovernanceEnvelope | 2,100/s | 108 ms | — |
| BM-INT-07 | Crash consistency | 11/s | 115 ms | crash_rate=100% hash_match=100% |
| BM-INT-08 | Replay equivalence | 1,300/s | 164 ms | hash_match=100% |
| BM-INT-09 | Adversarial retries | 2,600/s | 87 ms | dup=3000 ooo=1000 delayed=1000 |
| BM-INT-10 | Long-horizon | 95/s | 11,887 ms | peak RSS 76.5 MB · alloc 3.62 MB |
Reproduce:
```shell
python benchmarks/stress_test.py
python benchmarks/benchmark_v030.py
python benchmarks/benchmark_v040.py
python benchmarks/run_all.py
python benchmarks/benchmark_double.py
python benchmarks/benchmark_nano_vm.py
python benchmarks/benchmark_stress_060
python benchmarks/benchmark_integration.py
```
Comparison
| | LangChain | CrewAI | Temporal | nano-vm |
|---|---|---|---|---|
| LLM-native | ✅ | ✅ | ❌ | ✅ |
| Deterministic FSM | ❌ | ❌ | ✅ | ✅ |
| Replayable traces | partial | minimal | ✅ | ✅ |
| Suspend/resume | partial | partial | ✅ | ✅ |
| Runtime guardrails | ❌ | ❌ | partial | ✅ |
| Lightweight | ❌ | ❌ | ❌ | ✅ |
| Business workflows | partial | ❌ | ✅ | ✅ |
| AI workflows | ✅ | ✅ | partial | ✅ |
vs Marvin / DSPy: those optimize what the LLM produces (structured outputs, prompt tuning). nano-vm controls when and whether steps run — orthogonal concerns, composable.
When to Use
Use nano-vm when:
- workflow structure is known in advance
- correctness and auditability matter (fintech, compliance, enterprise)
- you need a reproducible trace for debugging or logging
- guardrails must be enforced at the system level, not in the prompt
- async orchestration with suspend/resume is required
Do NOT use when:
- workflow must be discovered fully at runtime
- the task is open-ended creative reasoning
- fully autonomous multi-agent coordination is required
Roadmap
Done:
- Deterministic FSM runtime (v0.1)
- parallel steps: asyncio.gather (v0.2.0)
- retry policy + max_concurrency (v0.3.0)
- Budget guards: max_steps, max_stalled_steps, max_tokens (v0.4.0)
- state_snapshots: sha256 fingerprint per step (v0.4.0)
- Planner: intent → Program in 1 LLM call (v0.5.0)
- FSM invariant stress suite: 13/13 · 1,020,000 ops (v0.6.0)
- suspend / resume: "PENDING" sentinel + CursorRepository (v0.7.0)
- BudgetInterrupt + _emit_interrupt() hook (v0.7.0)
- VaultStepResult + VaultStepMetadata: MCP-compatible DTOs (v0.7.0)
- Trace.trace_id: UUID4, OTel-ready (v0.7.0)
- erase(): GDPR tombstoning with hash-chain preservation (v0.7.0)
- ASTEngine: eval() removed; sandboxed condition evaluator (v0.7.0)
- Integration benchmark suite: 10/10 · 1,096,500 ops (v0.7.3)
- Step.is_terminal, Step.next_step: branch semantics (v0.7.4)
- $step_id.output / $step_id.output.field resolution fix (v0.7.4)
- _resolve typed return + multi-segment dotted path (v0.7.4)
- ASTEngine METHOD_CALL guard: ASTEvalError at parse time (v0.7.5)
- py.typed marker: PEP 561 (v0.7.4)
- MCP server: nano-vm-mcp with GovernanceEnvelope, CapabilityRef, SSE + stdio
- Step.allowed_outputs: LLM output validation against enum (v0.8.0)
- Step.timeout_seconds + on_timeout: per-step LLM timeout (v0.8.0)
- asyncio.iscoroutinefunction → inspect.iscoroutinefunction: Python 3.14+ compatibility (v0.8.2)

Upcoming (DSL hardening, v0.8.x):
- ProgramValidator: static analysis (unreachable steps, missing targets, cycle detection)

Upcoming (execution graph, v0.8.x):
- depends_on + TopologicalSorter: declarative dependency graph over parallel

Upcoming (observability, v0.8.x):
- OpenTelemetry span per FSM step
- Incremental counters in Trace: llm_calls, tool_calls, retries_total

Upcoming (gateway, v0.9.x):
- nano-vm-mcp: GovernedToolExecutor circuit breaker (degradation isolation)
- Blueprint registry: resume() without explicit program argument
- replan_on_interrupt: Planner-driven continuation on budget interrupts
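The planned depends_on + TopologicalSorter item maps onto the standard library's `graphlib.TopologicalSorter` (Python 3.9+): steps whose dependencies are all done can run together as one parallel wave, and a cycle is rejected before anything executes. A sketch of that batching with a hypothetical `{step_id: [dependencies]}` input; the eventual nano-vm DSL may look different.

```python
from graphlib import CycleError, TopologicalSorter

def batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves; every step in a wave could run concurrently."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Sorting each wave keeps the execution plan identical across runs, which matters here because `get_ready()` makes no ordering promise within a wave.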
Contact & Support
Author: @ale007xd on Telegram · @ale007xd on X
USDT (TON): UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9
License
File details
Details for the file llm_nano_vm-0.8.2.tar.gz.
File metadata
- Download URL: llm_nano_vm-0.8.2.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04ecbc69f4ba923f37b37d4ad33cda334b2794bdb9991067f0f2e7e9c99d7247 |
| MD5 | 953bf6838d9a4e68982f683e439b49e3 |
| BLAKE2b-256 | 089569139de96c3280bb712af12094339137b049302eaf7123cdf21592d2109f |
File details
Details for the file llm_nano_vm-0.8.2-py3-none-any.whl.
File metadata
- Download URL: llm_nano_vm-0.8.2-py3-none-any.whl
- Upload date:
- Size: 40.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c9ed8d5d10837fc1db4e0cbd3fba96de94127bc07e67bb84ab6a599e8b052fa |
| MD5 | a9eed5117fb5423ee7c46f54c4409b07 |
| BLAKE2b-256 | 718b33216c0a5581defcf6092c8d036dd5ac9de6cdc4dd65c79770110925c9cb |