llm-nano-vm

Deterministic VM for LLM program execution

Deterministic parallel execution for LLM pipelines.
Use when your workflow structure is known and correctness is non-negotiable.
Guardrails enforced by the VM, not by the prompt.

LangChain = flexible but unpredictable · llm-nano-vm = predictable but still flexible

The Problem with LLM Agents

	Prompting	LLM Agents	llm-nano-vm
Execution guarantee	❌ none	❌ at model's discretion	✅ enforced by VM
Step skipping possible	yes	yes	never
Reproducible trace	❌	❌	✅
Debuggable	❌	hard	full trace
Cost/latency visibility	❌	partial	per-step

"LangChain cannot guarantee execution order. llm-nano-vm can."

Mental Model

nondeterminism ∈ Planner (1 LLM call, optional)
determinism    ∈ ExecutionVM (FSM)

Planner — LLM converts user intent → Program DSL
Program — declarative workflow you define and version
ExecutionVM — finite state machine; runs the program step by step
Trace — full execution log: status, cost, tokens, duration per step

The LLM is a stateless worker. Control stays in your code.

FSM Transition Table

ExecutionVM is a finite state machine. The full δ-function:

Current state	Step type	Outcome	Next state
`RUNNING`	`llm`	success	`RUNNING` (advance to next step)
`RUNNING`	`llm`	all retries exhausted	`FAILED`
`RUNNING`	`tool`	success	`RUNNING`
`RUNNING`	`tool`	returns `"PENDING"` sentinel	`SUSPENDED`
`RUNNING`	`tool`	error, `on_error=fail`	`FAILED`
`RUNNING`	`tool`	error, `on_error=skip`	`RUNNING` (output=`None`)
`RUNNING`	`condition`	branch taken	`RUNNING` (jump to `then`/`otherwise`)
`RUNNING`	`condition`	no branch matches	`FAILED`
`RUNNING`	`parallel`	all sub-steps done	`RUNNING`
`RUNNING`	any	`max_steps` exceeded	`BUDGET_EXCEEDED`
`RUNNING`	any	`max_tokens` exceeded	`BUDGET_EXCEEDED`
`RUNNING`	any	`max_stalled_steps` exceeded	`STALLED`
`RUNNING`	—	no more steps	`SUCCESS`
`SUSPENDED`	—	`resume_with_program()` called	`RUNNING` (from cursor)
`FAILED` / `SUCCESS` / `BUDGET_EXCEEDED` / `STALLED`	—	—	terminal

Terminal states are absorbing — once reached, no further step is executed. SUSPENDED is resumable — cursor is persisted; execution continues from the suspended step.
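
The transition table can be read as a pure function from (state, event) to next state. The sketch below is illustrative only and is not the library's implementation: the `State` enum, the `step` and `run` functions, and the event names are all hypothetical, chosen to mirror the rows above.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    BUDGET_EXCEEDED = "BUDGET_EXCEEDED"
    STALLED = "STALLED"


# Absorbing states: once reached, every later event is ignored.
TERMINAL = {State.SUCCESS, State.FAILED, State.BUDGET_EXCEEDED, State.STALLED}

# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    (State.RUNNING, "step_ok"): State.RUNNING,
    (State.RUNNING, "tool_pending"): State.SUSPENDED,
    (State.RUNNING, "step_failed"): State.FAILED,
    (State.RUNNING, "budget_exceeded"): State.BUDGET_EXCEEDED,
    (State.RUNNING, "stalled"): State.STALLED,
    (State.RUNNING, "no_more_steps"): State.SUCCESS,
    (State.SUSPENDED, "resume"): State.RUNNING,
}


def step(state: State, event: str) -> State:
    """Return the next state; terminal states absorb every event."""
    if state in TERMINAL:
        return state
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


def run(events: list[str]) -> State:
    """Fold a sequence of events over the transition function."""
    state = State.RUNNING
    for event in events:
        state = step(state, event)
    return state
```

Because the table is plain data, every reachable state can be enumerated and checked in a unit test without calling a model.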

Install

pip install llm-nano-vm
pip install "llm-nano-vm[litellm]"   # for built-in provider support; quotes keep shells like zsh from expanding the brackets

Quick Start — Guardrail That Never Skips

from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
        },
        {
            "id": "guardrail",           # ALWAYS runs — VM enforces it
            "type": "condition",
            "condition": "'yes' in '$decision'.lower()",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {
            "id": "process_refund",
            "type": "tool",
            "tool": "issue_refund",
        },
        {
            "id": "reject",
            "type": "tool",
            "tool": "send_rejection",
        },
    ],
})

import asyncio

async def main() -> None:
    vm = ExecutionVM(
        llm=LiteLLMAdapter("openai/gpt-4o-mini"),
        tools={"issue_refund": ..., "send_rejection": ...},  # replace ... with your own callables
    )
    trace = await vm.run(program, context={"user_input": "I was charged twice"})

    print(trace.status)           # SUCCESS
    print(trace.final_output)     # tool result
    print(trace.total_cost_usd()) # e.g. 0.000034

asyncio.run(main())  # vm.run() is a coroutine, so it needs an event loop

The guardrail step cannot be skipped, reordered, or overridden by the model.

suspend / resume via Webhook (v0.6.0)

For async workflows — payment confirmations, courier events, external approvals:

from nano_vm import TraceStatus
from nano_vm.vm import ExecutionVM, InMemoryCursorRepository

# Tool signals async wait via "PENDING" sentinel
async def initiate_payment(order_id: str) -> str:
    await register_webhook_handler(order_id)  # your own helper, not part of nano_vm
    return "PENDING"   # VM suspends here, persists cursor

vm = ExecutionVM(
    llm=adapter,  # any LLM adapter, e.g. the LiteLLMAdapter from Quick Start
    cursor_repo=InMemoryCursorRepository(),  # use SqliteCursorRepository in production
    tools={"initiate_payment": initiate_payment, ...},
)

trace = await vm.run(program, context={"order_id": "123"})
assert trace.status == TraceStatus.SUSPENDED

# When webhook fires:
trace = await vm.resume_with_program(
    program=program,
    trace_id=trace.trace_id,
    webhook_event={"type": "payment.confirmed", "order_id": "123"},
)
assert trace.status == TraceStatus.SUCCESS

InMemoryCursorRepository is for tests and dry runs only. In production, implement the CursorRepository Protocol backed by infrastructure.db (SQLite in WAL mode).

BudgetInterrupt (v0.6.0)

Budget exhaustion is a system interrupt, not a control-flow condition. The LLM cannot observe or influence it.

from nano_vm.vm import ExecutionVM, InterruptType

class InstrumentedVM(ExecutionVM):
    async def _emit_interrupt(self, interrupt_type: InterruptType) -> None:
        await notify_operator(f"interrupt: {interrupt_type.value}")

vm = InstrumentedVM(llm=adapter)

Override _emit_interrupt() in a subclass (standard inheritance, no magic). The base implementation is a documented no-op hook, so nothing is emitted unless you override it.

How the DSL Controls Agent Behavior

LLM decides:  WHAT to say, how to reason, what content to produce
DSL decides:  WHICH step runs next, WHEN to branch, WHEN to stop

The LLM has no knowledge of the program structure. It receives a prompt and returns a string — nothing more.

	LLM	DSL (VM)
Produce content	✅ free	—
Skip a step	❌ impossible	enforces every step
Reorder steps	❌ impossible	order fixed at definition
Branch on output	❌ cannot	`condition` step evaluates
Decide workflow is done	❌ impossible	VM controls termination

Program DSL

Four step types:

Type	Purpose
`llm`	call the model; result stored in `output_key`
`tool`	call a Python function; return `"PENDING"` to suspend
`condition`	branch on an expression; `then` / `otherwise`
`parallel`	run independent sub-steps concurrently via `asyncio.gather`
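
As a rough illustration of what a `parallel` step with a concurrency cap involves, the sketch below combines `asyncio.gather` with an `asyncio.Semaphore`. It is not the library's code: `run_parallel`, `make_sub_step`, and the `stats` counter are hypothetical names, and the real VM may schedule sub-steps differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional

SubStep = Callable[[], Awaitable[str]]


async def run_parallel(sub_steps: list[SubStep], max_concurrency: Optional[int] = None) -> list[str]:
    """Run sub-steps concurrently; results keep the order of the input list."""
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def guarded(sub_step: SubStep) -> str:
        if semaphore is None:  # None means no cap
            return await sub_step()
        async with semaphore:  # at most max_concurrency sub-steps inside at once
            return await sub_step()

    return await asyncio.gather(*(guarded(s) for s in sub_steps))


# Demo: four sub-steps, at most two in flight at any time.
stats = {"in_flight": 0, "peak": 0}


def make_sub_step(name: str) -> SubStep:
    async def sub_step() -> str:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for an LLM or tool call
        stats["in_flight"] -= 1
        return name

    return sub_step


results = asyncio.run(run_parallel([make_sub_step(n) for n in "abcd"], max_concurrency=2))
```

`asyncio.gather` returns results in argument order regardless of completion order, which is what keeps the output of a parallel block reproducible.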

Step options:

Option	Default	Description
`on_error`	`fail`	`fail` · `skip` · `retry`
`max_retries`	`3`	total attempts; exponential backoff: 1s, 2s, 4s… cap 30s
`max_concurrency`	`None`	parallel blocks only; `None` = no cap
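
The backoff schedule quoted for `max_retries` (1s, 2s, 4s… cap 30s) corresponds to doubling a one-second base and clamping at 30 seconds. The helper below is a hypothetical sketch of that arithmetic only; `backoff_delays` is not a library function, and how many of these delays the VM actually waits for a given `max_retries` is not specified here.

```python
def backoff_delays(count: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """First `count` delays of an exponential backoff: base * 2**n, clamped to cap."""
    return [min(base * 2 ** n, cap) for n in range(count)]


delays = backoff_delays(7)
```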

Program budget options:

Option	Default	Description
`max_steps`	`None`	`BUDGET_EXCEEDED` if exceeded
`max_stalled_steps`	`None`	`STALLED` after N consecutive no-op steps
`max_tokens`	`None`	`BUDGET_EXCEEDED` when total tokens ≥ limit; O(1) per step
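
The O(1) claim for `max_tokens` follows from keeping a running total instead of re-summing the trace after each step. A minimal sketch of that idea, assuming a hypothetical `TokenBudget` class (the library's internal accumulator is named `_token_accumulator`, but its interface is not shown in this document):

```python
from typing import Optional


class TokenBudget:
    """Running token total with a constant-time limit check per step."""

    def __init__(self, max_tokens: Optional[int] = None) -> None:
        self.max_tokens = max_tokens
        self.total = 0

    def add(self, step_tokens: int) -> bool:
        """Record one step's usage; return True when total tokens >= limit."""
        self.total += step_tokens
        return self.max_tokens is not None and self.total >= self.max_tokens


budget = TokenBudget(max_tokens=100)
exceeded = [budget.add(n) for n in (40, 40, 40)]
```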

Variable interpolation

Syntax	Resolves to
`$key`	value from initial context
`$step_id.output`	output of a previous step
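
To make the two placeholder forms concrete, here is a small regex-based resolver. It is a sketch under assumptions, not the library's resolver: `interpolate`, its `context` and `outputs` arguments, and the choice to leave unknown placeholders untouched are all hypothetical.

```python
import re

# $name, optionally followed by .output
_VAR = re.compile(r"\$([A-Za-z_]\w*)(\.output\b)?")


def interpolate(template: str, context: dict, outputs: dict) -> str:
    """Replace $key from context and $step_id.output from earlier step outputs."""

    def replace(match: re.Match) -> str:
        name, is_output = match.group(1), match.group(2)
        if is_output and name in outputs:
            return str(outputs[name])
        if not is_output and name in context:
            return str(context[name])
        return match.group(0)  # unknown placeholder: leave as written

    return _VAR.sub(replace, template)


prompt = interpolate("Request: $user_input", {"user_input": "I was charged twice"}, {})
summary = interpolate("Model said: $analyze.output", {}, {"analyze": "yes"})
```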

⚠ Security note — condition expressions:
Condition strings are evaluated via eval() with __builtins__ cleared. This is only a partial sandbox: clearing builtins removes names such as open and __import__, but it does not stop attribute access on objects already in scope. Do not interpolate raw user input into condition expressions. LLM output used as a branching signal should appear only in context variables that your condition tests ('yes' in '$decision'), never as the condition expression itself.
Numeric context variables are injected directly — no string coercion needed for comparisons like $value > 0.9.
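
The following sketch shows why an `eval()` with cleared `__builtins__` counts as a partial sandbox. `restricted_eval` is a hypothetical stand-in, assuming context variables are passed as local names; the VM's actual evaluator may differ in detail, but the limitation demonstrated is a property of Python's `eval()` itself.

```python
def restricted_eval(expression: str, variables: dict):
    """Evaluate an expression with no builtins and only the given variables."""
    return eval(expression, {"__builtins__": {}}, dict(variables))


# Ordinary conditions work, including numeric comparisons without coercion.
text_match = restricted_eval("'yes' in decision.lower()", {"decision": "Yes, valid"})
numeric_match = restricted_eval("value > 0.9", {"value": 0.95})

# Builtin names are unavailable, so this raises NameError before any file is touched.
try:
    restricted_eval("open('secret.txt')", {})
    builtin_blocked = False
except NameError:
    builtin_blocked = True

# Attribute traversal is NOT blocked: any literal leads back to `object`
# and from there to every class loaded in the interpreter.
leaked_classes = restricted_eval("().__class__.__base__.__subclasses__()", {})
```

The last line is the reason untrusted text must never become the expression itself.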

Testing — Deterministic by Design

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

vm = ExecutionVM(llm=MockLLMAdapter({"Classify": "SAFE", "__default__": "ok"}))

trace = await vm.run(program, context={"user_input": "refund"})
assert trace.status == TraceStatus.SUCCESS
assert [s.step_id for s in trace.steps][:3] == ["classify", "route", "verify_eligibility"]  # later steps elided

Same input → same step sequence. Always. No API key required.

Observability

trace.status                # SUCCESS | FAILED | BUDGET_EXCEEDED | STALLED | SUSPENDED
trace.trace_id              # UUID4 — stable for OTel propagation (v0.6.0)
trace.final_output
trace.total_tokens()        # O(1) — incremental accumulator
trace.total_cost_usd()
trace.state_snapshots       # list[(step_index, sha256_hex)]
trace.error

for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)
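
One way to obtain a per-step sha256 like the entries in `state_snapshots` is to hash a canonical serialization of the state after each step. The sketch below assumes JSON with sorted keys as the canonical form; `snapshot` is a hypothetical helper, and the library's actual serialization is not documented here.

```python
import hashlib
import json


def snapshot(state: dict) -> str:
    """sha256 hex digest of a canonical JSON encoding of the state."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two runs that pass through the same states yield identical snapshot lists.
states = [{"decision": "yes"}, {"decision": "yes", "refund_id": 42}]
run_a = [(i, snapshot(s)) for i, s in enumerate(states)]
run_b = [(i, snapshot(dict(reversed(list(s.items()))))) for i, s in enumerate(states)]
```

Sorting keys makes the digest independent of dict insertion order, so comparing two traces reduces to comparing two lists of hex strings.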

Performance

VM overhead is small next to an LLM call: about 1.80 ms per step in the v0.3.0 measurement below. The production bottlenecks are LLM API latency and DB I/O.

v0.6.0 — Stress test: 10 000 FSM graphs × 5 runs

System: Linux · x86_64 (2 cores) · Python 3.12
Test:   10 000 items × 5 deterministic runs, concurrency=200, Mock adapter

  Run 1:  0.70 s  14 286 it/s   8973 OK / 1027 ERR
  Run 2:  0.70 s  14 286 it/s   8973 OK / 1027 ERR
  Run 3:  0.69 s  14 493 it/s   8973 OK / 1027 ERR
  Run 4:  0.70 s  14 286 it/s   8973 OK / 1027 ERR
  Run 5:  0.70 s  14 286 it/s   8973 OK / 1027 ERR
  ─────────────────────────────────────────────────
  AVG:    0.70 s  14 327 it/s

  Determinism:       ✅ identical results across all 5 runs
  Failure isolation: ✅ VMError caught per-coroutine, event loop unaffected
  Error rate:        10.27% (1027 / 10 000), consistent with P(value > 0.9) = 0.1 within sampling noise

v0.5.0 — Double-execution safety

  Raw stateless agent:   ~20% double-executions / 1000 runs
  FSM runtime (vm.run):  0 double-executions / 3000 runs

v0.4.0 — Budget mechanism overhead

  BM5  max_steps=1000   ±9.5%  (within noise — single int check)
  BM7  max_tokens       fixed in v0.5.0: O(1) via _token_accumulator

v0.3.0 — 20 parallel steps via OpenRouter

  Total: 1.7574 s · 20 steps · 11.38 steps/sec · VM overhead ~1.80 ms/step

Planner (Optional)

from nano_vm import Planner

planner = Planner(llm=adapter, max_retries=2, temperature=0.0)
program = await planner.generate(
    "Fetch latest AI news, summarize, classify by topic",
    available_tools=["fetch_rss", "summarize", "classify"],
)
trace = await vm.run(program)

Exactly 1 LLM call. Outputs a validated Program. Determinism confirmed (BM11).

Comparison

	LangChain	AutoGPT / CrewAI	Prefect / Airflow	llm-nano-vm
Execution order	flexible	model-driven	enforced	enforced
Guardrails	prompt-level	prompt-level	task-level	VM-level
Async suspend/resume	❌	❌	native	✅ v0.6.0
Parallel execution	manual	model-driven	native	scoped, deterministic
Trace	partial	minimal	job logs	full, per-step + sub-step
Overhead	heavy	heavy	heavy	near-zero
Best for	flexible pipelines	autonomous tasks	data/ETL	compliance-grade LLM workflows

When to Use

Use llm-nano-vm when:

workflow structure is known in advance
correctness and auditability matter (fintech, compliance, enterprise)
you need async suspend/resume for webhook-driven flows
you want guardrails enforced at the system level, not in the prompt

Do NOT use when:

workflow is unknown and must be discovered at runtime
task is open-ended creative reasoning
you need fully autonomous multi-agent coordination

Roadmap

FSM execution engine (v0.1)
llm / tool / condition step types
LiteLLM adapter + cost tracking
parallel steps — asyncio.gather (v0.2.0)
MockLLMAdapter — deterministic testing (v0.2.0)
max_concurrency + retry policy per sub-step (v0.3.0)
max_steps / max_stalled_steps / max_tokens budget (v0.4.0)
state_snapshots — sha256 per step (v0.4.0)
Planner — intent → Program in 1 call (v0.5.0)
total_tokens() O(1) via _token_accumulator (v0.5.0)
Double-execution safety: 0/3000 FSM vs ~20% stateless (v0.5.0)
suspend / resume_with_program() via "PENDING" sentinel (v0.6.0)
BudgetInterrupt — isolated signal, _emit_interrupt() hook (v0.6.0)
VaultStepResult + VaultStepMetadata — MCP-compatible contracts (v0.6.0)
Trace.trace_id UUID4 — OTel propagation (v0.6.0)
MCP server — run_program, get_trace, SQLite WAL, SSE + Bearer auth (nano-vm-mcp)
SqliteCursorRepository — production CursorRepository implementation
resume() — Blueprint registry lookup (P8 of nano-vm-vault)
REST API — pay-per-run, API keys (nano-vm-server)

💼 llm-nano-vm Pro

🆓 Core (this repo) — MIT, fully open-source
💼 Pro layer — planned commercial extensions

Planned Pro features:

📊 Visual execution graph (Trace UI)
🌐 Distributed multi-node execution
🔄 Provider pools & smart routing
🔐 Access control & multi-user support
📈 Cost analytics dashboard

Contact & Support

Author: @ale007xd on Telegram · @ale007xd on X


License

MIT

Release history

Version	Release date
0.8.6	Jun 11, 2026
0.8.5	Jun 9, 2026
0.8.4	Jun 4, 2026
0.8.3	Jun 3, 2026
0.8.2	May 29, 2026
0.8.0	May 22, 2026
0.7.5	May 18, 2026
0.7.4	May 16, 2026
0.7.3	May 14, 2026
0.6.0 (this version)	May 3, 2026
0.5.0	Apr 30, 2026
0.4.0	Apr 28, 2026
0.3.0	Apr 28, 2026
0.2.0	Apr 27, 2026
0.1.4	Apr 26, 2026

Download files

File	Type	Size	Uploaded
llm_nano_vm-0.6.0.tar.gz	Source	1.1 MB	May 3, 2026
llm_nano_vm-0.6.0-py3-none-any.whl	Wheel (Python 3)	27.8 kB	May 3, 2026

Both files were uploaded via twine/6.1.0 (CPython/3.13.12), without Trusted Publishing.

Hashes for llm_nano_vm-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`c4f75971b214b33090013eb12c93216a2d6f051dba4275d376e94bc6665a2d57`
MD5	`b855f8449199f3c79fe1d962c1875665`
BLAKE2b-256	`e7bab815b4e485afb61e185b9465e0d78e1082dbf1c6d148e26151765cb8a8a3`

Hashes for llm_nano_vm-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8049e8a8756424677881b73fd9006ed66a4a65cdab2fc5dcab1842c3916d9ff`
MD5	`c14610628ac333d1fd96c771526bf356`
BLAKE2b-256	`996545e379eac853971e9425780ce0448eaee96a8dc33988e6d6eb023359a1e4`
