Deterministic VM for LLM program execution
Deterministic parallel execution for LLM pipelines.
Use when your workflow structure is known and correctness is non-negotiable.
Guardrails enforced by the VM, not by the prompt.
LangChain = flexible but unpredictable · llm-nano-vm = predictable but still flexible
The Problem with LLM Agents
| | Prompting | LLM Agents | llm-nano-vm |
|---|---|---|---|
| Execution guarantee | ❌ none | ❌ at model's discretion | ✅ enforced by VM |
| Step skipping possible | ✅ yes | ✅ yes | ❌ never |
| Reproducible trace | ❌ | ❌ | ✅ |
| Debuggable | ❌ | hard | full trace |
| Cost/latency visibility | ❌ | partial | per-step |
"LangChain cannot guarantee execution order. llm-nano-vm can."
Mental Model
nondeterminism ∈ Planner (1 LLM call, optional)
determinism ∈ ExecutionVM (FSM)
- Planner — LLM converts user intent → Program DSL
- Program — declarative workflow you define and version
- ExecutionVM — finite state machine; runs the program step by step
- Trace — full execution log: status, cost, tokens, duration per step
The LLM is a stateless worker. Control stays in your code.
FSM Transition Table
ExecutionVM is a finite state machine. The full δ-function:
| Current state | Step type | Outcome | Next state |
|---|---|---|---|
| RUNNING | `llm` | success | RUNNING (advance to next step) |
| RUNNING | `llm` | all retries exhausted | FAILED |
| RUNNING | `tool` | success | RUNNING |
| RUNNING | `tool` | returns `"PENDING"` sentinel | SUSPENDED |
| RUNNING | `tool` | error, `on_error=fail` | FAILED |
| RUNNING | `tool` | error, `on_error=skip` | RUNNING (output=None) |
| RUNNING | `condition` | branch taken | RUNNING (jump to then/otherwise) |
| RUNNING | `condition` | no branch matches | FAILED |
| RUNNING | `parallel` | all sub-steps done | RUNNING |
| RUNNING | any | `max_steps` exceeded | BUDGET_EXCEEDED |
| RUNNING | any | `max_tokens` exceeded | BUDGET_EXCEEDED |
| RUNNING | any | `max_stalled_steps` exceeded | STALLED |
| RUNNING | — | no more steps | SUCCESS |
| SUSPENDED | — | `resume_with_program()` called | RUNNING (from cursor) |
| FAILED / SUCCESS / BUDGET_EXCEEDED / STALLED | — | — | terminal |
Terminal states are absorbing — once reached, no further step is executed.
SUSPENDED is resumable — cursor is persisted; execution continues from the suspended step.
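The transition table can be read as a plain lookup. The sketch below is illustrative only and is not the library's internal code: the `State` enum, the outcome labels (`"pending"`, `"error_skip"`, and so on) and the `delta` function are names invented here to mirror the rows above.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    BUDGET_EXCEEDED = "BUDGET_EXCEEDED"
    STALLED = "STALLED"


TERMINAL = {State.SUCCESS, State.FAILED, State.BUDGET_EXCEEDED, State.STALLED}

# (step type, outcome) -> next state while RUNNING.
# Outcome labels are hypothetical names for the table rows.
RUNNING_TRANSITIONS = {
    ("llm", "success"): State.RUNNING,
    ("llm", "retries_exhausted"): State.FAILED,
    ("tool", "success"): State.RUNNING,
    ("tool", "pending"): State.SUSPENDED,
    ("tool", "error_fail"): State.FAILED,
    ("tool", "error_skip"): State.RUNNING,
    ("condition", "branch_taken"): State.RUNNING,
    ("condition", "no_match"): State.FAILED,
    ("parallel", "all_done"): State.RUNNING,
}

# Outcomes that apply regardless of step type.
ANY_STEP = {
    "max_steps_exceeded": State.BUDGET_EXCEEDED,
    "max_tokens_exceeded": State.BUDGET_EXCEEDED,
    "max_stalled_steps_exceeded": State.STALLED,
    "no_more_steps": State.SUCCESS,
}


def delta(state, step_type, outcome):
    """Return the next state for one transition of the table."""
    if state in TERMINAL:
        return state  # absorbing: nothing moves a terminal state
    if state is State.SUSPENDED:
        return State.RUNNING if outcome == "resume" else State.SUSPENDED
    if outcome in ANY_STEP:
        return ANY_STEP[outcome]
    return RUNNING_TRANSITIONS[(step_type, outcome)]
```

Because every row is a dictionary entry, an unknown `(step type, outcome)` pair raises `KeyError` instead of silently continuing, which is one way to keep the machine total over the cases you have actually defined.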
Install
pip install llm-nano-vm
pip install llm-nano-vm[litellm] # for built-in provider support
Quick Start — Guardrail That Never Skips
from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter
program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
        },
        {
            "id": "guardrail",  # ALWAYS runs — VM enforces it
            "type": "condition",
            "condition": "'yes' in '$decision'.lower()",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {
            "id": "process_refund",
            "type": "tool",
            "tool": "issue_refund",
        },
        {
            "id": "reject",
            "type": "tool",
            "tool": "send_rejection",
        },
    ],
})
vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": ..., "send_rejection": ...},
)
trace = await vm.run(program, context={"user_input": "I was charged twice"})
print(trace.status) # SUCCESS
print(trace.final_output) # tool result
print(trace.total_cost_usd()) # e.g. 0.000034
The guardrail step cannot be skipped, reordered, or overridden by the model.
suspend / resume via Webhook (v0.6.0)
For async workflows — payment confirmations, courier events, external approvals:
from nano_vm import TraceStatus  # needed for the status assertions below
from nano_vm.vm import ExecutionVM, InMemoryCursorRepository
# Tool signals async wait via "PENDING" sentinel
async def initiate_payment(order_id: str) -> str:
    await register_webhook_handler(order_id)
    return "PENDING"  # VM suspends here, persists cursor
vm = ExecutionVM(
    llm=adapter,
    cursor_repo=InMemoryCursorRepository(),  # use SqliteCursorRepository in production
    tools={"initiate_payment": initiate_payment, ...},
)
trace = await vm.run(program, context={"order_id": "123"})
assert trace.status == TraceStatus.SUSPENDED
# When webhook fires:
trace = await vm.resume_with_program(
    program=program,
    trace_id=trace.trace_id,
    webhook_event={"type": "payment.confirmed", "order_id": "123"},
)
assert trace.status == TraceStatus.SUCCESS
InMemoryCursorRepository — tests and dry-run only.
Production: implement CursorRepository Protocol backed by infrastructure.db (SQLite WAL).
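For orientation, here is a minimal sketch of a SQLite-backed cursor store in WAL mode, using only the standard library. It is not the library's `CursorRepository` Protocol: the class name `SqliteCursorStore`, the `save` / `load` method names, the table layout and the `cursors.db` file name are all assumptions made for illustration. Check the actual Protocol before adapting it.

```python
import json
import os
import sqlite3
import tempfile


class SqliteCursorStore:
    """Hypothetical cursor store; method names are NOT the library's Protocol."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        # WAL lets readers proceed while a writer commits.
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cursors ("
            "trace_id TEXT PRIMARY KEY, "
            "step_index INTEGER NOT NULL, "
            "context TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, trace_id, step_index, context):
        # One transaction per save; a later save overwrites the earlier cursor.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO cursors (trace_id, step_index, context) "
                "VALUES (?, ?, ?)",
                (trace_id, step_index, json.dumps(context)),
            )

    def load(self, trace_id):
        row = self._conn.execute(
            "SELECT step_index, context FROM cursors WHERE trace_id = ?",
            (trace_id,),
        ).fetchone()
        if row is None:
            return None
        return row[0], json.loads(row[1])

    def close(self):
        self._conn.close()


def demo():
    """Persist a cursor, overwrite it, and read it back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        store = SqliteCursorStore(os.path.join(tmp, "cursors.db"))
        store.save("trace-1", 2, {"order_id": "123"})
        store.save("trace-1", 3, {"order_id": "123"})
        result = store.load("trace-1")
        store.close()
        return result
```

The point of the sketch is the shape of the contract: a suspended run is identified by its trace id, and everything needed to continue (step position plus context) must survive a process restart.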
BudgetInterrupt (v0.6.0)
Budget exhaustion is a system interrupt, not a control-flow condition. The LLM cannot observe or influence it.
from nano_vm.vm import ExecutionVM, InterruptType
class InstrumentedVM(ExecutionVM):
    async def _emit_interrupt(self, interrupt_type: InterruptType) -> None:
        await notify_operator(f"interrupt: {interrupt_type.value}")
vm = InstrumentedVM(llm=adapter)
Override _emit_interrupt() via subclass (standard inheritance, no magic).
Base implementation is a no-op hook — documented, not silent.
How the DSL Controls Agent Behavior
LLM decides: WHAT to say, how to reason, what content to produce
DSL decides: WHICH step runs next, WHEN to branch, WHEN to stop
The LLM has no knowledge of the program structure. It receives a prompt and returns a string — nothing more.
| | LLM | DSL (VM) |
|---|---|---|
| Produce content | ✅ free | — |
| Skip a step | ❌ impossible | enforces every step |
| Reorder steps | ❌ impossible | order fixed at definition |
| Branch on output | ❌ cannot | condition step evaluates |
| Decide workflow is done | ❌ impossible | VM controls termination |
Program DSL
Four step types:
| Type | Purpose |
|---|---|
| `llm` | call the model; result stored in `output_key` |
| `tool` | call a Python function; return `"PENDING"` to suspend |
| `condition` | branch on an expression; `then` / `otherwise` |
| `parallel` | run independent sub-steps concurrently via `asyncio.gather` |
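As a rough picture of what a `parallel` step with an optional concurrency cap involves, the sketch below combines `asyncio.gather` with an `asyncio.Semaphore`. The helper names `run_parallel` and `make_step` are hypothetical, and the library's own scheduling may differ.

```python
import asyncio


async def run_parallel(sub_steps, max_concurrency=None):
    """Run zero-argument coroutine functions concurrently.

    Results come back in the order the sub-steps were given,
    regardless of which one finishes first.
    """
    if max_concurrency is None:
        return await asyncio.gather(*(step() for step in sub_steps))

    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(step):
        # At most max_concurrency sub-steps hold the semaphore at once.
        async with semaphore:
            return await step()

    return await asyncio.gather(*(limited(step) for step in sub_steps))


def make_step(value, delay):
    """Build a toy sub-step that sleeps, then returns a value."""
    async def step():
        await asyncio.sleep(delay)
        return value
    return step
```

`asyncio.gather` preserves input order in its result list, which is what makes the output of a parallel block reproducible even though completion order is not.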
Step options:
| Option | Default | Description |
|---|---|---|
| `on_error` | `fail` | `fail` · `skip` · `retry` |
| `max_retries` | `3` | total attempts; exponential backoff: 1s, 2s, 4s… cap 30s |
| `max_concurrency` | `None` | `parallel` blocks only; `None` = no cap |
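The backoff schedule in the table (1s, 2s, 4s… capped at 30s) is a doubling sequence with a ceiling. A small sketch, assuming a base of 1 second and a zero-based retry index; `backoff_delay` is a hypothetical helper, not a library function:

```python
def backoff_delay(retry_index, base=1.0, cap=30.0):
    """Seconds to wait before retry number `retry_index` (0-based)."""
    return min(base * 2 ** retry_index, cap)


# First seven waits: doubling until the 30 s cap takes over.
schedule = [backoff_delay(i) for i in range(7)]
```

Here `schedule` evaluates to `[1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]`: the sixth wait would be 32 s uncapped, so the cap applies from there on.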
Program budget options:
| Option | Default | Description |
|---|---|---|
| `max_steps` | `None` | BUDGET_EXCEEDED if exceeded |
| `max_stalled_steps` | `None` | STALLED after N consecutive no-op steps |
| `max_tokens` | `None` | BUDGET_EXCEEDED when total tokens ≥ limit; O(1) per step |
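The O(1) claim follows from keeping running counters instead of re-summing the trace after every step. The sketch below shows that idea under stated assumptions: the `Budget` class, its `record` method and the `made_progress` flag are invented here, and the exact comparison operators (`>` for steps, `>=` for tokens and stalled steps) are one reading of the table rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_steps: Optional[int] = None
    max_tokens: Optional[int] = None
    max_stalled_steps: Optional[int] = None
    steps: int = 0      # running count of executed steps
    tokens: int = 0     # running token total (the accumulator)
    stalled: int = 0    # consecutive steps that made no progress

    def record(self, step_tokens, made_progress):
        """Update counters after one step; return an interrupt name or None."""
        self.steps += 1
        self.tokens += step_tokens
        self.stalled = 0 if made_progress else self.stalled + 1
        if self.max_steps is not None and self.steps > self.max_steps:
            return "BUDGET_EXCEEDED"
        if self.max_tokens is not None and self.tokens >= self.max_tokens:
            return "BUDGET_EXCEEDED"
        if self.max_stalled_steps is not None and self.stalled >= self.max_stalled_steps:
            return "STALLED"
        return None
```

Each call does a fixed number of additions and comparisons, so the cost of the check does not grow with the length of the run.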
Variable interpolation
| Syntax | Resolves to |
|---|---|
| `$key` | value from initial context |
| `$step_id.output` | output of a previous step |
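To make the two forms concrete, here is a minimal regex-based sketch of the substitution. It is an approximation written for this README, not the library's resolver: `interpolate`, its arguments, and the choice to leave unknown references untouched are all assumptions.

```python
import re

# $name, optionally followed by ".output"
_VARIABLE = re.compile(r"\$([A-Za-z_]\w*)(\.output\b)?")


def interpolate(template, context, step_outputs):
    """Replace $key and $step_id.output references in a template string."""
    def resolve(match):
        name, output_suffix = match.group(1), match.group(2)
        if output_suffix and name in step_outputs:
            return str(step_outputs[name])
        if name in context:
            # Plain $key; keep any literal ".output" text that followed it.
            return str(context[name]) + (output_suffix or "")
        return match.group(0)  # unknown reference: leave as written

    return _VARIABLE.sub(resolve, template)
```

For example, `interpolate("Request: $user_input", {"user_input": "I was charged twice"}, {})` yields `"Request: I was charged twice"`.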
⚠ Security note — condition expressions:
`condition` strings are evaluated via `eval()` with `__builtins__` cleared. This is a partial sandbox. Do not interpolate raw user input into condition expressions. LLM output used as a branching signal should only appear in context variables that your condition tests (`'yes' in '$decision'`), never as the condition expression itself.
Numeric context variables are injected directly, so no string coercion is needed for comparisons like `$value > 0.9`.
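The following sketch shows why clearing `__builtins__` is only a partial sandbox. It passes variables to `eval()` as names, which is a simplification chosen for this example and differs from the `$variable` syntax above; `evaluate_condition` and `reachable_classes` are hypothetical helpers.

```python
def evaluate_condition(expression, variables):
    """Evaluate a trusted expression with builtins removed."""
    return bool(eval(expression, {"__builtins__": {}}, dict(variables)))


def reachable_classes():
    """Show what an untrusted expression could still reach.

    Attribute access needs no builtins, so every subclass of `object`
    remains reachable from a plain tuple literal.
    """
    return eval("().__class__.__base__.__subclasses__()", {"__builtins__": {}}, {})
```

Calling `evaluate_condition("open('secrets.txt')", {})` raises `NameError` because `open` is gone, yet `reachable_classes()` still returns a non-empty list. That gap is the reason the expression itself must come from your program definition and never from user or model text.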
Testing — Deterministic by Design
from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter
vm = ExecutionVM(llm=MockLLMAdapter({"Classify": "SAFE", "__default__": "ok"}))
trace = await vm.run(program, context={"user_input": "refund"})
assert trace.status == TraceStatus.SUCCESS
assert [s.step_id for s in trace.steps] == ["classify", "route", "verify_eligibility", ...]
Same input → same step sequence. Always. No API key required.
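If it helps to see why a keyed mock makes runs repeatable, here is a toy stand-in. It assumes that keys are matched as substrings of the prompt and that `"__default__"` is the fallback; that matching rule is a guess about `MockLLMAdapter`, and `KeywordMockLLM` with its `complete` method is a hypothetical class, not the library's adapter interface.

```python
class KeywordMockLLM:
    """Toy mock: the reply depends only on the prompt text."""

    def __init__(self, responses):
        self._responses = dict(responses)

    def complete(self, prompt):
        # First key (in insertion order) found in the prompt wins.
        for key, reply in self._responses.items():
            if key != "__default__" and key in prompt:
                return reply
        return self._responses.get("__default__", "")


mock = KeywordMockLLM({"Classify": "SAFE", "__default__": "ok"})
```

With no network call and no sampling, the same prompt always maps to the same reply, so any variation in the step sequence must come from the program or the context.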
Observability
trace.status # SUCCESS | FAILED | BUDGET_EXCEEDED | STALLED | SUSPENDED
trace.trace_id # UUID4 — stable for OTel propagation (v0.6.0)
trace.final_output
trace.total_tokens() # O(1) — incremental accumulator
trace.total_cost_usd()
trace.state_snapshots # list[(step_index, sha256_hex)]
trace.error
for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)
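A `(step_index, sha256_hex)` pair is only comparable across runs if the state is serialized canonically before hashing. The sketch below shows one way to do that with sorted-key JSON. How the library serializes state is not documented here, so `snapshot` and its encoding are assumptions.

```python
import hashlib
import json


def snapshot(step_index, state):
    """Return (step_index, sha256 hex digest) of a JSON-serializable state."""
    # sort_keys and fixed separators make the byte string independent of
    # dict insertion order and whitespace.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return step_index, hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs that reach the same state at the same step produce the same digest, so comparing snapshot lists is a cheap way to find the first step where two traces diverge.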
Performance
VM overhead is small: about 1.80 ms per step in the v0.3.0 benchmark below. In production, the bottleneck is LLM API latency and DB I/O.
v0.6.0 — Stress test: 10 000 FSM graphs × 5 runs
System: Linux · x86_64 (2 cores) · Python 3.12
Test: 10 000 items × 5 deterministic runs, concurrency=200, Mock adapter
Run 1: 0.70 s 14 286 it/s 8973 OK / 1027 ERR
Run 2: 0.70 s 14 286 it/s 8973 OK / 1027 ERR
Run 3: 0.69 s 14 493 it/s 8973 OK / 1027 ERR
Run 4: 0.70 s 14 286 it/s 8973 OK / 1027 ERR
Run 5: 0.70 s 14 286 it/s 8973 OK / 1027 ERR
─────────────────────────────────────────────────
AVG: 0.70 s 14 327 it/s
Determinism: ✅ identical results across all 5 runs
Failure isolation: ✅ VMError caught per-coroutine, event loop unaffected
Error rate: 10.27% (1027 / 10 000), consistent with P(value > 0.9) = 0.1
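A quick check of how close 1027 errors in 10 000 items is to the expected rate of 0.1, assuming each item fails independently with that probability (an assumption about how the test data was generated, not something the benchmark output states):

```python
import math


def binomial_z(n, p, observed):
    """Distance of `observed` from the binomial mean, in standard deviations."""
    expected = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return (observed - expected) / sigma


z = binomial_z(10_000, 0.1, 1027)
```

The expected count is 1000 with a standard deviation of 30, so 1027 sits about 0.9 standard deviations above the mean, well within ordinary sampling variation.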
v0.5.0 — Double-execution safety
Raw stateless agent: ~20% double-executions / 1000 runs
FSM runtime (vm.run): 0 double-executions / 3000 runs
v0.4.0 — Budget mechanism overhead
BM5 max_steps=1000 ±9.5% (within noise — single int check)
BM7 max_tokens fixed in v0.5.0: O(1) via _token_accumulator
v0.3.0 — 20 parallel steps via OpenRouter
Total: 1.7574 s · 20 steps · 11.38 steps/sec · VM overhead ~1.80 ms/step
Planner (Optional)
from nano_vm import Planner
planner = Planner(llm=adapter, max_retries=2, temperature=0.0)
program = await planner.generate(
    "Fetch latest AI news, summarize, classify by topic",
    available_tools=["fetch_rss", "summarize", "classify"],
)
trace = await vm.run(program)
Exactly 1 LLM call. Outputs a validated Program. Determinism confirmed (BM11).
Comparison
| | LangChain | AutoGPT / CrewAI | Prefect / Airflow | llm-nano-vm |
|---|---|---|---|---|
| Execution order | flexible | model-driven | enforced | enforced |
| Guardrails | prompt-level | prompt-level | task-level | VM-level |
| Async suspend/resume | ❌ | ❌ | native | ✅ v0.6.0 |
| Parallel execution | manual | model-driven | native | scoped, deterministic |
| Trace | partial | minimal | job logs | full, per-step + sub-step |
| Overhead | heavy | heavy | heavy | near-zero |
| Best for | flexible pipelines | autonomous tasks | data/ETL | compliance-grade LLM workflows |
When to Use
Use llm-nano-vm when:
- workflow structure is known in advance
- correctness and auditability matter (fintech, compliance, enterprise)
- you need async suspend/resume for webhook-driven flows
- you want guardrails enforced at the system level, not in the prompt
Do NOT use when:
- workflow is unknown and must be discovered at runtime
- task is open-ended creative reasoning
- you need fully autonomous multi-agent coordination
Roadmap
- FSM execution engine (v0.1)
- `llm` / `tool` / `condition` step types
- LiteLLM adapter + cost tracking
- `parallel` steps — `asyncio.gather` (v0.2.0)
- `MockLLMAdapter` — deterministic testing (v0.2.0)
- `max_concurrency` + `retry` policy per sub-step (v0.3.0)
- `max_steps` / `max_stalled_steps` / `max_tokens` budget (v0.4.0)
- `state_snapshots` — sha256 per step (v0.4.0)
- `Planner` — intent → Program in 1 call (v0.5.0)
- `total_tokens()` O(1) via `_token_accumulator` (v0.5.0)
- Double-execution safety: 0/3000 FSM vs ~20% stateless (v0.5.0)
- `suspend` / `resume_with_program()` via `"PENDING"` sentinel (v0.6.0)
- `BudgetInterrupt` — isolated signal, `_emit_interrupt()` hook (v0.6.0)
- `VaultStepResult` + `VaultStepMetadata` — MCP-compatible contracts (v0.6.0)
- `Trace.trace_id` UUID4 — OTel propagation (v0.6.0)
- MCP server — `run_program`, `get_trace`, SQLite WAL, SSE + Bearer auth (nano-vm-mcp)
- `SqliteCursorRepository` — production `CursorRepository` implementation
- `resume()` — Blueprint registry lookup (P8 of nano-vm-vault)
- REST API — pay-per-run, API keys (nano-vm-server)
💼 llm-nano-vm Pro
- 🆓 Core (this repo) — MIT, fully open-source
- 💼 Pro layer — planned commercial extensions
Planned Pro features:
- 📊 Visual execution graph (Trace UI)
- 🌐 Distributed multi-node execution
- 🔄 Provider pools & smart routing
- 🔐 Access control & multi-user support
- 📈 Cost analytics dashboard
Contact & Support
Author: @ale007xd on Telegram · @ale007xd on X
UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9
License