
Deterministic VM for LLM program execution


Deterministic parallel execution for LLM pipelines.
Use when your workflow structure is known and correctness is non-negotiable.
Guardrails enforced by the VM, not by the prompt.

LangChain = flexible but unpredictable  ·  llm-nano-vm = predictable but still flexible


The Problem with LLM Agents

| | Prompting | LLM Agents | llm-nano-vm |
| --- | --- | --- | --- |
| Execution guarantee | ❌ none | ❌ at model's discretion | ✅ enforced by VM |
| Step skipping possible | ✅ yes | ✅ yes | ❌ never |
| Reproducible trace | ❌ | ❌ | ✅ |
| Debuggable | hard | hard | full trace |
| Cost/latency visibility | none | partial | per-step |

"LangChain cannot guarantee execution order. llm-nano-vm can."


Mental Model

nondeterminism ∈ Planner (1 LLM call, optional)
determinism    ∈ ExecutionVM (FSM)
  • Planner — LLM converts user intent → Program DSL
  • Program — declarative workflow you define and version
  • ExecutionVM — finite state machine; runs the program step by step
  • Trace — full execution log: status, cost, tokens, duration per step

The LLM is a stateless worker. Control stays in your code.


Install

pip install llm-nano-vm
pip install llm-nano-vm[litellm]   # for built-in provider support

Quick Start — Guardrail That Never Skips

from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
        },
        {
            "id": "guardrail",           # ALWAYS runs — VM enforces it
            "type": "condition",
            "condition": "'yes' in '$decision'.lower()",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {
            "id": "process_refund",
            "type": "tool",
            "tool": "issue_refund",
        },
        {
            "id": "reject",
            "type": "tool",
            "tool": "send_rejection",
        },
    ],
})

vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": ..., "send_rejection": ...},
)
trace = await vm.run(program, context={"user_input": "I was charged twice"})

print(trace.status)           # SUCCESS
print(trace.final_output)     # tool result
print(trace.total_cost_usd()) # e.g. 0.000034

The guardrail step cannot be skipped, reordered, or overridden by the model. That is the guarantee.


How the DSL Controls Agent Behavior

The separation of concerns is explicit:

LLM decides:  WHAT to say, how to reason, what content to produce
DSL decides:  WHICH step runs next, WHEN to branch, WHEN to stop

The LLM has no knowledge of the program structure. It receives a prompt and returns a string — nothing more. It cannot skip steps, reorder them, or decide the workflow is complete.

What the LLM can and cannot do

| | LLM | DSL (VM) |
| --- | --- | --- |
| Produce content | ✅ free | |
| Reason, hallucinate, be verbose | ✅ free | |
| Skip a step | ❌ impossible | enforces every step |
| Reorder steps | ❌ impossible | order fixed at definition |
| Branch on output | ❌ cannot | condition step evaluates |
| Decide workflow is done | ❌ impossible | VM controls termination |

Example — the LLM cannot jump ahead

program = Program.from_dict({
    "name": "refund_with_verification",
    "steps": [
        {
            "id": "classify",
            "type": "llm",
            "prompt": "Classify: $user_input. Reply: refund / info / escalate",
            "output_key": "category",
        },
        {
            "id": "route",
            "type": "condition",
            "condition": "'refund' in '$category'",
            "then": "verify_eligibility",
            "otherwise": "handle_other",
        },
        {
            "id": "verify_eligibility",  # LLM cannot skip this — VM enforces it
            "type": "llm",
            "prompt": "Is user eligible for refund? Order: $order_id. Reply yes/no",
            "output_key": "eligible",
        },
        {
            "id": "final_guard",         # runs on EVERY execution before money moves
            "type": "condition",
            "condition": "'yes' in '$eligible'",
            "then": "issue_refund",
            "otherwise": "reject",
        },
        {"id": "issue_refund", "type": "tool", "tool": "process_payment"},
        {"id": "reject",       "type": "tool", "tool": "send_rejection"},
        {"id": "handle_other", "type": "tool", "tool": "send_info"},
    ],
})

Even if classify returns "definitely a refund, just process it" — the VM still executes verify_eligibility and final_guard. The LLM's opinion about the flow is irrelevant. The DSL is law.

Proof: the trace

trace = await vm.run(program, context={"user_input": "I was charged twice", "order_id": "123"})

for step in trace.steps:
    print(f"{step.step_id:20} {step.status}{step.output}")

# classify              SUCCESS  →  refund
# route                 SUCCESS  →  verify_eligibility
# verify_eligibility    SUCCESS  →  yes
# final_guard           SUCCESS  →  issue_refund
# issue_refund          SUCCESS  →  Refund issued: $42.00

Every step is logged. No agent "decided" the flow. The DSL did.


End-to-End Flow

user_input
  → Planner (optional, 1 LLM call)
  → Program (DSL — JSON/dict/YAML)
  → ExecutionVM (deterministic FSM)
  → Trace (status · cost · tokens · duration)

Program DSL

Four step types:

| Type | Purpose |
| --- | --- |
| llm | call the model; result stored in output_key |
| tool | call a Python function |
| condition | branch on an expression; then / otherwise |
| parallel | run independent sub-steps concurrently via asyncio.gather |

Step options (v0.4.0):

| Option | Default | Description |
| --- | --- | --- |
| on_error | fail | fail · skip · retry |
| max_retries | 3 | total attempts (1 initial + N retries); exponential backoff: 1s, 2s, 4s… cap 30s |
| max_concurrency | None | parallel blocks only; None = no cap (all sub-steps at once) |
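
For instance, a tool step that retries transient failures could be declared like this (an illustrative sketch; the tool name is hypothetical and must be registered on the VM):

from nano_vm import Program

program = Program.from_dict({
    "name": "resilient_fetch",
    "steps": [
        {
            "id": "fetch_profile",
            "type": "tool",
            "tool": "get_user_profile",   # hypothetical tool, registered via ExecutionVM(tools={...})
            "on_error": "retry",          # retry on failure instead of failing the run
            "max_retries": 3,             # total attempts; backoff 1s, 2s, 4s... capped at 30s
            "output_key": "profile",
        },
    ],
})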

Program budget options (v0.4.0):

| Option | Default | Description |
| --- | --- | --- |
| max_steps | None | max total steps executed; BUDGET_EXCEEDED if exceeded before next step |
| max_stalled_steps | None | max consecutive no-op steps (same state fingerprint); STALLED if exceeded |
| max_tokens | None | max total tokens across all LLM steps; BUDGET_EXCEEDED if exceeded before next step |
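
A minimal sketch of a budgeted program, assuming (as the heading suggests) that budgets sit at the program level next to name and steps:

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

program = Program.from_dict({
    "name": "bounded_pipeline",
    "max_steps": 50,           # stop with BUDGET_EXCEEDED once 50 steps have executed
    "max_stalled_steps": 3,    # stop with STALLED after 3 identical state fingerprints in a row
    "max_tokens": 20_000,      # stop with BUDGET_EXCEEDED once total LLM tokens pass 20k
    "steps": [
        {"id": "ask", "type": "llm", "prompt": "Answer: $user_input", "output_key": "answer"},
    ],
})

vm = ExecutionVM(llm=MockLLMAdapter("ok"))
trace = await vm.run(program, context={"user_input": "hello"})
if trace.status in (TraceStatus.BUDGET_EXCEEDED, TraceStatus.STALLED):
    print("Budget tripped:", trace.error)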

Variable interpolation

| Syntax | Resolves to |
| --- | --- |
| $key | value from initial context |
| $step_id.output | output of a previous step |
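
Both forms together, in a minimal illustrative two-step program:

from nano_vm import Program

program = Program.from_dict({
    "name": "interp_demo",
    "steps": [
        # $user_input resolves from the context passed to vm.run(...)
        {"id": "draft",  "type": "llm", "prompt": "Answer briefly: $user_input"},
        # $draft.output resolves to the output of the "draft" step above
        {"id": "review", "type": "llm", "prompt": "Review this answer for accuracy: $draft.output"},
    ],
})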

Example — multi-step pipeline

{
  "name": "doc_pipeline",
  "steps": [
    { "id": "extract",   "type": "tool", "tool": "extract_text",   "output_key": "raw_text" },
    { "id": "summarize", "type": "llm",  "prompt": "Summarize: $raw_text", "output_key": "summary" },
    { "id": "check",     "type": "condition",
      "condition": "len('$summary') > 100",
      "then": "store", "otherwise": "flag" },
    { "id": "store",     "type": "tool", "tool": "save_to_db" },
    { "id": "flag",      "type": "tool", "tool": "flag_for_review" }
  ]
}

Example — parallel steps (v0.2.0+)

program = Program.from_dict({
    "name": "enrich",
    "steps": [
        {
            "id": "fetch",
            "type": "parallel",
            "output_key": "fetched",
            "max_concurrency": 5,        # cap concurrent sub-steps; default = None (no cap = all at once)
            "on_error": "skip",          # failed sub-step → output is None; others still complete
            "parallel_steps": [
                {"id": "weather", "type": "tool", "tool": "get_weather", "args": {"city": "$city"}},
                {"id": "news",    "type": "tool", "tool": "get_news",    "args": {"topic": "$topic"}},
            ],
        },
        {
            "id": "summarize",
            "type": "llm",
            # $weather.output is None if weather sub-step was skipped — handle in prompt
            "prompt": "Weather: $weather.output\nNews: $news.output\nSummarize. If a field is None, skip it.",
        },
    ],
})

fetch runs both tools concurrently via asyncio.gather. Wall-clock time = slowest single sub-step. Sequential execution resumes at summarize only after all sub-steps complete (or are skipped).

Partial result contract: if a sub-step fails with on_error: skip, its output is set to None in the execution context. Downstream steps receive None — not an absent key, not an exception. Design your prompts accordingly.
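
If the downstream consumer is a Python tool rather than a prompt, accept None explicitly. A hypothetical sketch, assuming interpolated args are passed to the tool as keyword arguments (mirroring the args syntax above):

# Hypothetical step wired after the "fetch" block:
# {"id": "report", "type": "tool", "tool": "build_report",
#  "args": {"weather": "$weather.output", "news": "$news.output"}}

def build_report(weather=None, news=None, **kwargs):
    """Sub-steps skipped via on_error: skip arrive here as None."""
    parts = []
    if weather is not None:
        parts.append(f"Weather: {weather}")
    if news is not None:
        parts.append(f"News: {news}")
    return "\n".join(parts) or "No data available."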


Testing — Deterministic by Design

MockLLMAdapter ships with the package for writing tests without a real LLM:

from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

# Always returns the same string
vm = ExecutionVM(llm=MockLLMAdapter("SAFE"))

# Per-call sequence
vm = ExecutionVM(llm=MockLLMAdapter(["SAFE", "yes"]))

# Per-prompt mapping (substring match on last user message)
vm = ExecutionVM(llm=MockLLMAdapter({
    "Classify": "SAFE",
    "eligible": "yes",
    "__default__": "ok",
}))

trace = await vm.run(program, context={"user_input": "refund", "order_id": "123"})
assert trace.status == TraceStatus.SUCCESS
assert [s.step_id for s in trace.steps] == ["classify", "route", "verify_eligibility", ...]

Same input → same step sequence. Always. Testable in CI without any API key.
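
A minimal pytest sketch built on the Quick Start program (assumes pytest-asyncio is installed and that plain callables work as tool stubs; both are illustrative assumptions, not package guarantees):

import pytest
from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

# Compact copy of the Quick Start customer_refund program.
PROGRAM = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {"id": "analyze", "type": "llm",
         "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
         "output_key": "decision"},
        {"id": "guardrail", "type": "condition",
         "condition": "'yes' in '$decision'.lower()",
         "then": "process_refund", "otherwise": "reject"},
        {"id": "process_refund", "type": "tool", "tool": "issue_refund"},
        {"id": "reject", "type": "tool", "tool": "send_rejection"},
    ],
})

@pytest.mark.asyncio
async def test_guardrail_always_runs():
    vm = ExecutionVM(
        llm=MockLLMAdapter("yes"),   # model always approves; the VM still evaluates the guardrail
        tools={
            "issue_refund": lambda **kwargs: "refund issued",      # hypothetical stub
            "send_rejection": lambda **kwargs: "rejection sent",   # hypothetical stub
        },
    )
    trace = await vm.run(PROGRAM, context={"user_input": "I was charged twice"})
    assert trace.status == TraceStatus.SUCCESS
    step_ids = [s.step_id for s in trace.steps]
    assert step_ids == ["analyze", "guardrail", "process_refund"]
    assert "reject" not in step_ids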


Observability

trace.status                # TraceStatus.SUCCESS | FAILED | BUDGET_EXCEEDED | STALLED
trace.final_output          # last step output
trace.total_tokens()        # sum across all steps
trace.total_cost_usd()      # sum across all steps (requires LiteLLMAdapter)
trace.state_snapshots       # list[(step_index, sha256_hex)] — one entry per executed step
trace.error                 # set on FAILED / BUDGET_EXCEEDED / STALLED

for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)

Parallel blocks expose sub-step hierarchy in the trace:

# fetch              SUCCESS   142ms  usage=None
#   ├─ weather       SUCCESS    98ms  usage=None
#   └─ news          SKIPPED   429ms  usage=None   ← rate-limited, skipped
# summarize          SUCCESS  1204ms  usage=TokenUsage(prompt=312, completion=87)

Each sub-step has its own status, duration_ms, and output. If on_error: skip was triggered, status=SKIPPED and output=None.


Planner (Optional)

from nano_vm import Planner

planner = Planner(llm=adapter)
program = await planner.generate("Fetch latest AI news, summarize, classify by topic")
trace = await vm.run(program)
  • exactly 1 LLM call
  • outputs a validated Program object
  • non-deterministic input → deterministic execution

Custom Adapter

Any object implementing the async complete() protocol works:

class MyAdapter:
    async def complete(self, messages: list[dict], **kwargs) -> str:
        ...  # call any LLM API
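
One possible concrete implementation on top of the official openai client (a sketch, not part of llm-nano-vm; assumes pip install openai and OPENAI_API_KEY in the environment):

from openai import AsyncOpenAI

class OpenAIAdapter:
    """Minimal adapter satisfying the async complete() protocol."""

    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = AsyncOpenAI()   # reads OPENAI_API_KEY from the environment
        self.model = model

    async def complete(self, messages: list[dict], **kwargs) -> str:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            **kwargs,
        )
        return response.choices[0].message.content or ""

# vm = ExecutionVM(llm=OpenAIAdapter())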

Built-in adapters via [litellm] extra:

LiteLLMAdapter("groq/llama-3.3-70b-versatile")
LiteLLMAdapter("openrouter/llama-3.3-70b-instruct:free")
LiteLLMAdapter("ollama/llama3")
LiteLLMAdapter("openai/gpt-4o-mini")

Performance

The VM itself introduces near-zero overhead. Your bottleneck is the LLM API.

| Metric | Adapter | Value |
| --- | --- | --- |
| VM throughput | Mock (no network) | ~535 programs/sec |
| VM latency per step | Mock (no network) | ~1.80 ms |
| Parallel steps (20) | OpenRouter (network) | 1.7574 s → 11.38 steps/sec |
| max_steps check overhead | Mock | < 1% vs baseline |
| fingerprint (STALLED) overhead | Mock | < 1% vs baseline |
| max_tokens check overhead | Mock | < 1% vs baseline |
| Test suite | | 46+ tests passing (v0.4.0) |

Note: Mock throughput measures pure VM overhead with no I/O. Real end-to-end latency is dominated by LLM API response time. Parallel steps execute via asyncio.gather — wall-clock time equals the slowest single step, not the sum. v0.3.0 result matches v0.2.0 baseline — no regression from max_concurrency / retry additions. v0.4.0 budget checks (max_steps, max_stalled_steps, max_tokens) add < 1% overhead when not triggered. State snapshots (sha256 per step) add ~0.01 ms/step — negligible vs LLM API latency.


Benchmark

Real execution: 20 parallel steps via OpenRouter on a 2-core Linux VPS.

System: Linux 6.8.0-110-generic  ·  x86_64 (2 cores)  ·  Python 3.12.3
Test:   1 run × 20 parallel steps (StepType.PARALLEL, asyncio.gather)

v0.3.0 result:
  Total Execution Time : 1.7574 s
  Parallel Steps       : 20
  Effective Throughput : 11.38 steps/sec
  VM Overhead (Core)   : ~1.80 ms/step
  Trace Status         : SUCCESS — all constraints enforced, 0 steps skipped

llm-nano-vm v0.3.0 benchmark — 20 parallel steps, OpenRouter, 2-core VPS

No regression vs v0.2.0 — max_concurrency / retry path adds zero overhead when not triggered.

v0.4.0 budget mechanisms (max_steps, max_stalled_steps, max_tokens) add < 1% overhead when not triggered. Benchmarked via BM5–BM7 in benchmark_v040.py.

Reproduce locally:

pip install llm-nano-vm[litellm]
python benchmarks/stress_test.py       # sequential baseline
python benchmarks/benchmark_v030.py   # v0.3.0 suite: retry overhead, concurrency scaling
python benchmarks/benchmark_v040.py   # v0.4.0 suite: budget/guard overhead (BM5–BM7)

When to Use

Use llm-nano-vm when:

  • the workflow structure is known in advance
  • correctness and auditability matter (fintech, compliance, enterprise)
  • you need a reproducible trace for debugging or logging
  • you want guardrails enforced at the system level, not in the prompt

Do NOT use when:

  • the workflow is unknown and must be discovered at runtime
  • the task is open-ended creative reasoning
  • you need fully autonomous multi-agent coordination

Comparison

| | LangChain | AutoGPT / CrewAI | Prefect / Airflow | llm-nano-vm |
| --- | --- | --- | --- | --- |
| Layer | orchestration | reasoning / autonomy | workflow scheduler | execution guarantees |
| Execution order | flexible | model-driven | enforced | enforced |
| Guardrails | prompt-level | prompt-level | task-level | VM-level |
| Parallel execution | manual | model-driven | native | scoped, deterministic |
| Trace | partial | minimal | job logs | full, per-step + sub-step |
| LLM-native | yes | yes | no | yes |
| Overhead | heavy | heavy | heavy | near-zero (stdlib only) |
| Best for | flexible pipelines | autonomous tasks | data/ETL pipelines | compliance-grade LLM workflows |

vs Marvin / DSPy: those optimize what the LLM produces (structured outputs, prompt tuning). llm-nano-vm controls when and whether steps run — orthogonal concerns, composable.


Roadmap

  • FSM execution engine (v0.1)
  • llm / tool / condition step types
  • LiteLLM adapter + cost tracking
  • Published to PyPI as llm-nano-vm
  • parallel steps — asyncio.gather for independent sub-steps (v0.2.0)
  • MockLLMAdapter — deterministic testing without API keys (v0.2.0)
  • max_concurrency — cap concurrent sub-steps per parallel block (v0.3.0)
  • retry policy per sub-step — exponential backoff, max_attempts (v0.3.0)
  • max_steps budget — BUDGET_EXCEEDED after N steps (v0.4.0)
  • max_stalled_steps — STALLED on N consecutive no-op state fingerprints (v0.4.0)
  • max_tokens budget — BUDGET_EXCEEDED when token count exceeds limit (v0.4.0)
  • state_snapshots — sha256 fingerprint per step in Trace (v0.4.0)
  • MCP server — run_program, get_trace, list_programs (nano-vm-mcp)
  • REST API — pay-per-run, API keys (nano-vm-server)

💼 llm-nano-vm Pro

  • 🆓 Core (this repo) — MIT, fully open-source
  • 💼 Pro layer — planned commercial extensions

Planned Pro features:

  • 📊 Visual execution graph (Trace UI)
  • 🌐 Distributed multi-node execution
  • 🔄 Provider pools & smart routing
  • 🔐 Access control & multi-user support
  • 📈 Cost analytics dashboard

Contact & Support

Author: @ale007xd on Telegram · @ale007xd on X

☕ Support the project

Buy Me a Coffee USDT (TON)

Direct wallet — USDT (TON):

UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9

License

This project is licensed under the MIT License.
