Deterministic VM for LLM program execution
Project description
Deterministic parallel execution for LLM pipelines.
Use when your workflow structure is known and correctness is non-negotiable.
Guardrails enforced by the VM, not by the prompt.
LangChain = flexible but unpredictable · llm-nano-vm = predictable but still flexible
The Problem with LLM Agents
| | Prompting | LLM Agents | llm-nano-vm |
|---|---|---|---|
| Execution guarantee | ❌ none | ❌ at model's discretion | ✅ enforced by VM |
| Step skipping possible | ✅ yes | ✅ yes | ❌ never |
| Reproducible trace | ❌ | ❌ | ✅ |
| Debuggable | ❌ | hard | full trace |
| Cost/latency visibility | ❌ | partial | per-step |
"LangChain cannot guarantee execution order. llm-nano-vm can."
Mental Model
nondeterminism ∈ Planner (1 LLM call, optional)
determinism ∈ ExecutionVM (FSM)
- Planner — LLM converts user intent → Program DSL
- Program — declarative workflow you define and version
- ExecutionVM — finite state machine; runs the program step by step
- Trace — full execution log: status, cost, tokens, duration per step
The LLM is a stateless worker. Control stays in your code.
Install
```bash
pip install llm-nano-vm
pip install llm-nano-vm[litellm]   # for built-in provider support
```
Quick Start — Guardrail That Never Skips
```python
from nano_vm import ExecutionVM, Program
from nano_vm.adapters import LiteLLMAdapter

program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Is this a valid refund request? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
        },
        {
            "id": "guardrail",  # ALWAYS runs — VM enforces it
            "type": "condition",
            "condition": "'yes' in '$decision'.lower()",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {
            "id": "process_refund",
            "type": "tool",
            "tool": "issue_refund",
        },
        {
            "id": "reject",
            "type": "tool",
            "tool": "send_rejection",
        },
    ],
})

vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": ..., "send_rejection": ...},
)

trace = await vm.run(program, context={"user_input": "I was charged twice"})
print(trace.status)            # SUCCESS
print(trace.final_output)      # tool result
print(trace.total_cost_usd())  # e.g. 0.000034
```
The guardrail step cannot be skipped, reordered, or overridden by the model.
That is the guarantee.
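The `tools` mapping above is left as `...`. A minimal sketch of what it might look like, assuming tools are plain Python callables registered by name; the exact call signature (sync vs. async, how arguments are passed) is an assumption, not a documented contract:

```python
from nano_vm import ExecutionVM
from nano_vm.adapters import LiteLLMAdapter

# Hypothetical tool implementations; the signature is assumed, check the package docs.
async def issue_refund(**kwargs) -> str:
    # Call your payment provider here.
    return "Refund issued: $42.00"

async def send_rejection(**kwargs) -> str:
    return "Refund request rejected."

vm = ExecutionVM(
    llm=LiteLLMAdapter("openai/gpt-4o-mini"),
    tools={"issue_refund": issue_refund, "send_rejection": send_rejection},
)
```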
How the DSL Controls Agent Behavior
The separation of concerns is explicit:
LLM decides: WHAT to say, how to reason, what content to produce
DSL decides: WHICH step runs next, WHEN to branch, WHEN to stop
The LLM has no knowledge of the program structure. It receives a prompt and returns a string — nothing more. It cannot skip steps, reorder them, or decide the workflow is complete.
What the LLM can and cannot do
| | LLM | DSL (VM) |
|---|---|---|
| Produce content | ✅ free | — |
| Reason, hallucinate, be verbose | ✅ free | — |
| Skip a step | ❌ impossible | enforces every step |
| Reorder steps | ❌ impossible | order fixed at definition |
| Branch on output | ❌ cannot | condition step evaluates |
| Decide workflow is done | ❌ impossible | VM controls termination |
Example — the LLM cannot jump ahead
```python
program = Program.from_dict({
    "name": "refund_with_verification",
    "steps": [
        {
            "id": "classify",
            "type": "llm",
            "prompt": "Classify: $user_input. Reply: refund / info / escalate",
            "output_key": "category",
        },
        {
            "id": "route",
            "type": "condition",
            "condition": "'refund' in '$category'",
            "then": "verify_eligibility",
            "otherwise": "handle_other",
        },
        {
            "id": "verify_eligibility",  # LLM cannot skip this — VM enforces it
            "type": "llm",
            "prompt": "Is user eligible for refund? Order: $order_id. Reply yes/no",
            "output_key": "eligible",
        },
        {
            "id": "final_guard",  # runs on EVERY execution before money moves
            "type": "condition",
            "condition": "'yes' in '$eligible'",
            "then": "issue_refund",
            "otherwise": "reject",
        },
        {"id": "issue_refund", "type": "tool", "tool": "process_payment"},
        {"id": "reject", "type": "tool", "tool": "send_rejection"},
        {"id": "handle_other", "type": "tool", "tool": "send_info"},
    ],
})
```
Even if classify returns "definitely a refund, just process it" —
the VM still executes verify_eligibility and final_guard.
The LLM's opinion about the flow is irrelevant. The DSL is law.
Proof: the trace
```python
trace = await vm.run(program, context={"user_input": "I was charged twice", "order_id": "123"})

for step in trace.steps:
    print(f"{step.step_id:20} {step.status} → {step.output}")

# classify             SUCCESS → refund
# route                SUCCESS → verify_eligibility
# verify_eligibility   SUCCESS → yes
# final_guard          SUCCESS → issue_refund
# issue_refund         SUCCESS → Refund issued: $42.00
```
Every step is logged. No agent "decided" the flow. The DSL did.
End-to-End Flow
```
user_input
  → Planner (optional, 1 LLM call)
  → Program (DSL — JSON/dict/YAML)
  → ExecutionVM (deterministic FSM)
  → Trace (status · cost · tokens · duration)
```
Program DSL
Four step types:
| Type | Purpose |
|---|---|
| `llm` | call the model; result stored in `output_key` |
| `tool` | call a Python function |
| `condition` | branch on an expression; `then` / `otherwise` |
| `parallel` | run independent sub-steps concurrently via `asyncio.gather` |
Step options (v0.4.0):
| Option | Default | Description |
|---|---|---|
| `on_error` | `fail` | `fail` · `skip` · `retry` |
| `max_retries` | `3` | total attempts (1 initial + N retries); exponential backoff: 1s, 2s, 4s… cap 30s |
| `max_concurrency` | `None` | parallel blocks only; `None` = no cap (all sub-steps at once) |
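For illustration, a minimal sketch of how these options might be set on an individual step; the field names come from the table above, while the step itself (`fetch_profile`, `get_profile`) is hypothetical:

```python
{
    "id": "fetch_profile",   # hypothetical step, not part of the examples above
    "type": "tool",
    "tool": "get_profile",
    "on_error": "retry",     # retry on failure with exponential backoff: 1s, 2s, 4s… cap 30s
    "max_retries": 3,        # up to 3 retries after the initial attempt
}
```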
Program budget options (v0.4.0):
| Option | Default | Description |
|---|---|---|
| `max_steps` | `None` | max total steps executed; `BUDGET_EXCEEDED` if exceeded before the next step |
| `max_stalled_steps` | `None` | max consecutive no-op steps (same state fingerprint); `STALLED` if exceeded |
| `max_tokens` | `None` | max total tokens across all LLM steps; `BUDGET_EXCEEDED` if exceeded before the next step |
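A minimal sketch of where these budgets might be declared, assuming they sit at the top level of the program dict alongside `name` and `steps`; the placement is inferred from the table above rather than quoted from the package docs:

```python
program = Program.from_dict({
    "name": "bounded_pipeline",      # hypothetical program
    "max_steps": 50,                 # BUDGET_EXCEEDED once 50 steps have executed
    "max_stalled_steps": 5,          # STALLED after 5 consecutive no-op steps
    "max_tokens": 20_000,            # BUDGET_EXCEEDED once total LLM tokens exceed 20k
    "steps": [
        # ... step definitions as in the examples in this section
    ],
})
```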
Variable interpolation
| Syntax | Resolves to |
|---|---|
| `$key` | value from the initial context |
| `$step_id.output` | output of a previous step |
Example — multi-step pipeline
```json
{
  "name": "doc_pipeline",
  "steps": [
    { "id": "extract", "type": "tool", "tool": "extract_text", "output_key": "raw_text" },
    { "id": "summarize", "type": "llm", "prompt": "Summarize: $raw_text", "output_key": "summary" },
    { "id": "check", "type": "condition",
      "condition": "len('$summary') > 100",
      "then": "store", "otherwise": "flag" },
    { "id": "store", "type": "tool", "tool": "save_to_db" },
    { "id": "flag", "type": "tool", "tool": "flag_for_review" }
  ]
}
```
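Because the DSL is plain data, the same program can live in a versioned JSON file and be loaded at startup. A minimal sketch, using only `Program.from_dict` as shown throughout this README (the file path is illustrative):

```python
import json

from nano_vm import Program

with open("programs/doc_pipeline.json") as f:
    program = Program.from_dict(json.load(f))
```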
Example — parallel steps (v0.2.0+)
```python
program = Program.from_dict({
    "name": "enrich",
    "steps": [
        {
            "id": "fetch",
            "type": "parallel",
            "output_key": "fetched",
            "max_concurrency": 5,   # cap concurrent sub-steps; default = None (no cap = all at once)
            "on_error": "skip",     # failed sub-step → output is None; others still complete
            "parallel_steps": [
                {"id": "weather", "type": "tool", "tool": "get_weather", "args": {"city": "$city"}},
                {"id": "news", "type": "tool", "tool": "get_news", "args": {"topic": "$topic"}},
            ],
        },
        {
            "id": "summarize",
            "type": "llm",
            # $weather.output is None if the weather sub-step was skipped — handle it in the prompt
            "prompt": "Weather: $weather.output\nNews: $news.output\nSummarize. If a field is None, skip it.",
        },
    ],
})
```
`fetch` runs both tools concurrently via `asyncio.gather`. Wall-clock time = slowest single sub-step.
Sequential execution resumes at `summarize` only after all sub-steps complete (or are skipped).
Partial result contract: if a sub-step fails with `on_error: skip`, its output is set to `None` in the execution context. Downstream steps receive `None` — not an absent key, not an exception. Design your prompts accordingly.
Testing — Deterministic by Design
MockLLMAdapter ships with the package for writing tests without a real LLM:
```python
from nano_vm import ExecutionVM, Program, TraceStatus
from nano_vm.adapters import MockLLMAdapter

# Always returns the same string
vm = ExecutionVM(llm=MockLLMAdapter("SAFE"))

# Per-call sequence
vm = ExecutionVM(llm=MockLLMAdapter(["SAFE", "yes"]))

# Per-prompt mapping (substring match on last user message)
vm = ExecutionVM(llm=MockLLMAdapter({
    "Classify": "SAFE",
    "eligible": "yes",
    "__default__": "ok",
}))

trace = await vm.run(program, context={"user_input": "refund"})
assert trace.status == TraceStatus.SUCCESS
assert [s.step_id for s in trace.steps] == ["classify", "route", "verify_eligibility", ...]
```
Same input → same step sequence. Always. Testable in CI without any API key.
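A sketch of how this fits into a CI test with `pytest-asyncio`; the test runner and the stub tool signature are assumptions, and `program` refers to the refund_with_verification example above:

```python
import pytest

from nano_vm import ExecutionVM, TraceStatus
from nano_vm.adapters import MockLLMAdapter

@pytest.mark.asyncio
async def test_guards_always_run():
    async def stub_tool(**kwargs) -> str:   # stub; the real tool signature is an assumption
        return "ok"

    vm = ExecutionVM(
        llm=MockLLMAdapter({"Classify": "refund", "eligible": "yes", "__default__": "ok"}),
        tools={"process_payment": stub_tool, "send_rejection": stub_tool, "send_info": stub_tool},
    )
    trace = await vm.run(program, context={"user_input": "refund", "order_id": "123"})

    assert trace.status == TraceStatus.SUCCESS
    executed = [s.step_id for s in trace.steps]
    # The VM enforces these steps on every run; the mock LLM cannot skip them.
    assert "verify_eligibility" in executed
    assert "final_guard" in executed
```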
Observability
```python
trace.status           # TraceStatus.SUCCESS | FAILED | BUDGET_EXCEEDED | STALLED
trace.final_output     # last step output
trace.total_tokens()   # sum across all steps
trace.total_cost_usd() # sum across all steps (requires LiteLLMAdapter)
trace.state_snapshots  # list[(step_index, sha256_hex)] — one entry per executed step
trace.error            # set on FAILED / BUDGET_EXCEEDED / STALLED

for step in trace.steps:
    print(step.step_id, step.status, step.duration_ms, step.usage)
```
Parallel blocks expose sub-step hierarchy in the trace:
```
# fetch        SUCCESS   142ms  usage=None
# ├─ weather   SUCCESS    98ms  usage=None
# └─ news      SKIPPED   429ms  usage=None   ← rate-limited, skipped
# summarize    SUCCESS  1204ms  usage=TokenUsage(prompt=312, completion=87)
```
Each sub-step has its own status, duration_ms, and output. If on_error: skip was triggered, status=SKIPPED and output=None.
Planner (Optional)
```python
from nano_vm import Planner

planner = Planner(llm=adapter)
program = await planner.generate("Fetch latest AI news, summarize, classify by topic")
trace = await vm.run(program)
```
- exactly 1 LLM call
- outputs a validated Program object
- non-deterministic input → deterministic execution
Custom Adapter
Any object implementing the async Protocol works:
```python
class MyAdapter:
    async def complete(self, messages: list[dict], **kwargs) -> str:
        ...  # call any LLM API
```
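As a more concrete sketch, here is an adapter for an OpenAI-compatible HTTP endpoint using `httpx`; the endpoint path, payload, and response shape are assumptions about your provider, not part of llm-nano-vm:

```python
import httpx

class OpenAICompatibleAdapter:
    def __init__(self, base_url: str, model: str, api_key: str):
        self.base_url = base_url
        self.model = model
        self.api_key = api_key

    async def complete(self, messages: list[dict], **kwargs) -> str:
        # Assumes an OpenAI-style /chat/completions endpoint; adjust to your provider.
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(
                f"{self.base_url}/chat/completions",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"model": self.model, "messages": messages},
            )
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
```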
Built-in adapters via [litellm] extra:
LiteLLMAdapter("groq/llama-3.3-70b-versatile")
LiteLLMAdapter("openrouter/llama-3.3-70b-instruct:free")
LiteLLMAdapter("ollama/llama3")
LiteLLMAdapter("openai/gpt-4o-mini")
Performance
The VM itself introduces near-zero overhead. Your bottleneck is the LLM API.
| Metric | Adapter | Value |
|---|---|---|
| VM throughput | Mock (no network) | ~535 programs/sec |
| VM latency per step | Mock (no network) | ~1.80 ms |
| Parallel steps (20) | OpenRouter (network) | 1.7574 s → 11.38 steps/sec |
| max_steps check overhead | Mock | < 1% vs baseline |
| fingerprint (STALLED) overhead | Mock | < 1% vs baseline |
| max_tokens check overhead | Mock | < 1% vs baseline |
| Test suite | — | 46+ tests passing (v0.4.0) |
Note: Mock throughput measures pure VM overhead with no I/O. Real end-to-end latency is dominated by LLM API response time. Parallel steps execute via `asyncio.gather` — wall-clock time equals the slowest single step, not the sum. The v0.3.0 result matches the v0.2.0 baseline — no regression from the `max_concurrency` / `retry` additions. v0.4.0 budget checks (`max_steps`, `max_stalled_steps`, `max_tokens`) add < 1% overhead when not triggered. State snapshots (sha256 per step) add ~0.01 ms/step — negligible vs LLM API latency.
Benchmark
Real execution: 20 parallel steps via OpenRouter on a 2-core Linux VPS.
System: Linux 6.8.0-110-generic · x86_64 (2 cores) · Python 3.12.3
Test: 1 run × 20 parallel steps (StepType.PARALLEL, asyncio.gather)
v0.3.0 result:
```
Total Execution Time : 1.7574 s
Parallel Steps       : 20
Effective Throughput : 11.38 steps/sec
VM Overhead (Core)   : ~1.80 ms/step
Trace Status         : SUCCESS — all constraints enforced, 0 steps skipped
```
No regression vs v0.2.0 — max_concurrency / retry path adds zero overhead when not triggered.
v0.4.0 budget mechanisms (max_steps, max_stalled_steps, max_tokens) add < 1% overhead
when not triggered. Benchmarked via BM5–BM7 in benchmark_v040.py.
Reproduce locally:
```bash
pip install llm-nano-vm[litellm]
python benchmarks/stress_test.py      # sequential baseline
python benchmarks/benchmark_v030.py   # v0.3.0 suite: retry overhead, concurrency scaling
python benchmarks/benchmark_v040.py   # v0.4.0 suite: budget/guard overhead (BM5–BM7)
```
When to Use
Use llm-nano-vm when:
- the workflow structure is known in advance
- correctness and auditability matter (fintech, compliance, enterprise)
- you need a reproducible trace for debugging or logging
- you want guardrails enforced at the system level, not in the prompt
Do NOT use when:
- the workflow is unknown and must be discovered at runtime
- the task is open-ended creative reasoning
- you need fully autonomous multi-agent coordination
Comparison
| | LangChain | AutoGPT / CrewAI | Prefect / Airflow | llm-nano-vm |
|---|---|---|---|---|
| Layer | orchestration | reasoning / autonomy | workflow scheduler | execution guarantees |
| Execution order | flexible | model-driven | enforced | enforced |
| Guardrails | prompt-level | prompt-level | task-level | VM-level |
| Parallel execution | manual | model-driven | native | scoped, deterministic |
| Trace | partial | minimal | job logs | full, per-step + sub-step |
| LLM-native | yes | yes | no | yes |
| Overhead | heavy | heavy | heavy | near-zero (stdlib only) |
| Best for | flexible pipelines | autonomous tasks | data/ETL pipelines | compliance-grade LLM workflows |
vs Marvin / DSPy: those optimize what the LLM produces (structured outputs, prompt tuning). llm-nano-vm controls when and whether steps run — orthogonal concerns, composable.
Roadmap
- FSM execution engine (v0.1)
- `llm` / `tool` / `condition` step types
- LiteLLM adapter + cost tracking
- Published to PyPI as `llm-nano-vm`
- `parallel` steps — `asyncio.gather` for independent sub-steps (v0.2.0)
- `MockLLMAdapter` — deterministic testing without API keys (v0.2.0)
- `max_concurrency` — cap concurrent sub-steps per parallel block (v0.3.0)
- `retry` policy per sub-step — exponential backoff, `max_attempts` (v0.3.0)
- `max_steps` budget — BUDGET_EXCEEDED after N steps (v0.4.0)
- `max_stalled_steps` — STALLED on N consecutive no-op state fingerprints (v0.4.0)
- `max_tokens` budget — BUDGET_EXCEEDED when token count exceeds limit (v0.4.0)
- `state_snapshots` — sha256 fingerprint per step in Trace (v0.4.0)
- MCP server — `run_program`, `get_trace`, `list_programs` (nano-vm-mcp)
- REST API — pay-per-run, API keys (nano-vm-server)
💼 llm-nano-vm Pro
- 🆓 Core (this repo) — MIT, fully open-source
- 💼 Pro layer — planned commercial extensions
Planned Pro features:
- 📊 Visual execution graph (Trace UI)
- 🌐 Distributed multi-node execution
- 🔄 Provider pools & smart routing
- 🔐 Access control & multi-user support
- 📈 Cost analytics dashboard
Contact & Support
Author: @ale007xd on Telegram · @ale007xd on X
☕ Support the project
Direct wallet — USDT (TON):
UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9
License
This project is licensed under the MIT License.
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file llm_nano_vm-0.4.0.tar.gz.
File metadata
- Download URL: llm_nano_vm-0.4.0.tar.gz
- Upload date:
- Size: 755.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2a28c94aa082c1b0a831a1ad2608257f2e2dd5b1179a9053f27c3cee4907dd5 |
| MD5 | ad777014da6a2d4222d3a01c85868141 |
| BLAKE2b-256 | f4a47fb608020f7b006f2f80c84cd2b9d12eb5f4137dafe8c86dac9b99e2be98 |
File details
Details for the file llm_nano_vm-0.4.0-py3-none-any.whl.
File metadata
- Download URL: llm_nano_vm-0.4.0-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 763e1eef9ee55bdbe93a929e36341826a2798cb3812037afb451b2be61b60c4e |
| MD5 | 4e881ea5ba2ab1ac0127817e057452a6 |
| BLAKE2b-256 | fd1d4436de613d1b066c3d7a437c85b0b2359fd7273bab212e6c0a0856911c99 |