Durable checkpoint/resume runner for async state-machine loops built on loom-tailcalls.
loom-runner
Small durable checkpoint/resume runner for async state-machine loops built on
top of loom-tailcalls and flow-xray. Full stack overview: kroq86.github.io/loom-stack
Official showcases: loom-run (dev chat + MCP) · loom-ops (ops runbooks + HITL). Both wire loom-runner and flow-xray together; pick one by domain.
This is not a planner, memory system, graph DSL, hosted tracing product, or full agent SDK. It is the first slice of a Loom-based agent runtime: run a typed async transition loop, checkpoint each state transition, resume later, inspect history, and explain a run.
Loom stack
Overview (packages, flow, audience, quick start): kroq86.github.io/loom-stack
The stack is a pyramid, not five equal frameworks: tail-call optimization is the primitive, loom-runner is the durable runtime, flow-xray is the microscope, and the apps prove the stack in real workflows.
| Layer | Project | Job |
|---|---|---|
| Primitive | loom-tailcalls | Make async recursive/state-machine loops stack-safe |
| Runtime kernel | loom-runner ← this repo | Make those loops durable, resumable, idempotent |
| Microscope | flow-xray | Show what actually happened in one offline HTML trace |
| Proof app | loom-run | Chat agent reference implementation |
| Proof app | loom-ops | Ops/runbook agent reference implementation |
```text
@tailrec agent loop  →  loom-runner run/resume  →  --trace trace.html
      (shape)                 (durability)             (flow-xray)
```
This repo is the runtime kernel. loom-runner is the library package and
CLI for durable execution. loom-run is a runnable chat showcase built on it;
the names are close, but the layer is different.
Dependency direction: loom-runner depends on loom-tailcalls and optionally
emits flow-xray traces. loom-run and loom-ops depend on loom-runner; the
kernel never depends on the apps.
Who it is for
- Authors of long-running async agent loops who need checkpoint/resume without building their own store
- Users of loom-tailcalls who want persistence and CLI inspection on top of stack-safe transitions
- Users of flow-xray who want --trace trace.html from the runner CLI
- Anyone who needs an inspectable run (explain, history, attempts, tool-calls) rather than a black box
Not for you if the agent is a single LLM call, or you already have LangGraph/Temporal (or similar) with persistence you are happy with.
This is not reasoning, planning, memory, or a path to AGI; it is a durability and observability primitive for state-machine-shaped agent runtimes.
Runtime transitions are logged as logical steps with attempt history. A retry
does not create a new transition: for the same run_id, step_index, and
stable input hash, the runner reuses the committed outcome. Transient errors
are retryable by default; validation, business, permission, and unknown errors
fail the run unless the caller supplies a different policy.
Tool side effects are only idempotent when invoked through
RunContext.call_tool(...). Direct tool calls or external effects inside a
transition are intentionally treated as unmanaged user code in this first
runtime slice.
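To see why the managed path matters, here is a toy ledger that stores each tool result under a key built from the run, the step, the tool name, and a hash of the arguments, so a replayed call returns the stored result instead of repeating the side effect. `ToyToolLedger` and its `call_tool` signature are hypothetical and are not the real `RunContext.call_tool(...)` API; the key layout is an assumption.

```python
import hashlib
import json
from typing import Any, Callable


class ToyToolLedger:
    """Hypothetical stand-in for managed tool calls; not RunContext's real API."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, int, str, str], Any] = {}

    def call_tool(
        self,
        run_id: str,
        step_index: int,
        name: str,
        fn: Callable[..., Any],
        **kwargs: Any,
    ) -> Any:
        # Hash the arguments so the same call with the same inputs maps to one key.
        args_hash = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode("utf-8")
        ).hexdigest()
        key = (run_id, step_index, name, args_hash)
        if key in self._results:
            return self._results[key]  # replay: the side effect is not repeated
        result = fn(**kwargs)  # first execution: perform the side effect once
        self._results[key] = result
        return result
```

Calling the tool function directly inside a transition would bypass the ledger and run again on every retry or resume, which is the unmanaged case described above.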
Long runs can use bounded reads and explicit storage policies. By default the runner keeps every checkpoint and every inline tool payload for maximum inspectability.
Checkpoint policies
| Mode | steps history | attempts log | Use when |
|---|---|---|---|
| full (default) | every step | every attempt | Debugging, short runs |
| interval | every Nth + step 0 | every attempt | Long runs, need sparse step history |
| compact | every step (like full) | failures and retries only | Long deterministic loops (100k+) |
```python
CheckpointPolicy(mode="compact")  # skip successful single-shot attempt rows
CheckpointPolicy(mode="interval", every=1000)  # keep step 0 and every 1000th step
```
On a compact run, loom-runner explain and loom-runner attempts may show an attempt_count near zero along with a compact_attempt_log note; this is expected. Idempotency and resume still rely on committed_steps and runs.state_json.
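The policy table above reduces to two retention rules, one for step rows and one for attempt rows. The functions below restate it as a minimal sketch; their names and the `attempt_number`/`succeeded` parameters are hypothetical, and the real store may apply the rules differently (for example, in how it counts steps).

```python
from typing import Literal

Mode = Literal["full", "interval", "compact"]


def keep_step_row(mode: Mode, step_index: int, every: int = 1) -> bool:
    """Hypothetical: is a step checkpoint row retained under this mode?"""
    if mode == "interval":
        # The explicit step 0 check mirrors the table's "every Nth + step 0".
        return step_index == 0 or step_index % every == 0
    return True  # full and compact both keep every step


def keep_attempt_row(mode: Mode, attempt_number: int, succeeded: bool) -> bool:
    """Hypothetical: compact drops successful first attempts, keeps the rest."""
    if mode == "compact":
        return not (succeeded and attempt_number == 1)
    return True  # full and interval keep every attempt
```

Under compact, a failed attempt and any successful retry are still recorded, which is why attempt counts stay near zero only for loops that rarely fail.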
PayloadPolicy(max_inline_bytes=N) replaces large managed tool payloads with hash/size metadata.
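A minimal sketch of that replacement, assuming payloads are JSON-encoded before they are measured and that the metadata uses the hypothetical keys `inline`, `sha256`, and `size`; the real PayloadPolicy may use a different hash, encoding, or record shape.

```python
import hashlib
import json
from typing import Any


def apply_payload_policy(payload: Any, max_inline_bytes: int) -> dict[str, Any]:
    """Hypothetical: inline small payloads, replace large ones with hash/size."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    if len(encoded) <= max_inline_bytes:
        return {"inline": payload}
    # Large payload: keep only enough metadata to identify it later.
    return {
        "inline": None,
        "sha256": hashlib.sha256(encoded).hexdigest(),
        "size": len(encoded),
    }
```

The trade-off is deliberate: the stored row can still prove which payload was produced, but the payload itself is no longer available for inspection.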
Opening an existing SQLite DB migrates it to schema v2: redundant indexes on the steps and attempts primary keys are dropped, and pragma user_version is set to 2. Run VACUUM afterwards to reclaim file size.
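The schema version and the reclaim step can be checked with Python's standard sqlite3 module. The helper below is a hypothetical convenience, not part of loom-runner: it assumes the runner has already opened (and therefore migrated) the database, reads PRAGMA user_version, and runs VACUUM. Run it while no runner process has the file open.

```python
import os
import sqlite3


def inspect_and_vacuum(db_path: str) -> tuple[int, int, int]:
    """Hypothetical helper: return (user_version, bytes_before, bytes_after)."""
    size_before = os.path.getsize(db_path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        (version,) = conn.execute("PRAGMA user_version").fetchone()
        conn.execute("VACUUM")  # rewrite the file, dropping free pages
    finally:
        conn.close()
    size_after = os.path.getsize(db_path)
    return version, size_before, size_after
```

For example, `inspect_and_vacuum("runs.sqlite")` should report a version of 2 on a migrated database.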
Benchmark:
```bash
python scripts/bench_runtime.py --steps 100000 --policy full --db /tmp/full.sqlite
python scripts/bench_runtime.py --steps 100000 --policy compact --db /tmp/compact.sqlite
```
compact greatly reduces attempt_rows and the DB size taken up by the attempts table and its indexes; the size of committed_steps is unchanged in v0.1.3.
The import package remains loom_agent; the distribution and CLI are named
loom-runner because loom-agent is already occupied by an unrelated package
on PyPI.
Install
```bash
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
Minimal Shape
```python
from dataclasses import dataclass

from loom_agent import AgentRunner, Complete, Continue, RunContext, SQLiteCheckpointStore


@dataclass(frozen=True)
class State:
    current: int
    target: int


async def step(state: State, ctx: RunContext):
    # One transition: finish with a result, or continue with the next state.
    if state.current >= state.target:
        return Complete({"current": state.current})
    return Continue(State(current=state.current + 1, target=state.target))


runner = AgentRunner(
    step=step,
    store=SQLiteCheckpointStore("runs.sqlite"),  # durable checkpoint store
    # Codecs that turn state and result into plain serializable data and back.
    encode_state=lambda state: {"current": state.current, "target": state.target},
    decode_state=lambda data: State(**data),
    encode_result=lambda result: result,
    decode_result=lambda data: data,
)
```
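The run/resume contract behind this shape can be illustrated with a toy, in-memory driver that uses the same State and step logic. Everything below is a hypothetical sketch: `drive`, the dict-based store, and the local `Continue`/`Complete` classes stand in for loom_agent's real types and SQLiteCheckpointStore, and the sketch omits attempts, retries, and tool calls.

```python
import asyncio
from dataclasses import asdict, dataclass
from typing import Any, Union


# Hypothetical stand-ins; the real Continue/Complete live in loom_agent.
@dataclass(frozen=True)
class Continue:
    state: Any


@dataclass(frozen=True)
class Complete:
    result: Any


@dataclass(frozen=True)
class State:
    current: int
    target: int


async def step(state: State) -> Union[Continue, Complete]:
    if state.current >= state.target:
        return Complete({"current": state.current})
    return Continue(State(current=state.current + 1, target=state.target))


async def drive(run_id: str, initial: State, store: dict[str, dict], max_steps: int) -> dict:
    """Hypothetical: run or resume run_id, checkpointing state after each step."""
    record = store.setdefault(
        run_id, {"steps": 0, "state": asdict(initial), "result": None}
    )
    if record["result"] is not None:
        return record  # already complete; resuming is a no-op
    state = State(**record["state"])  # decode_state from the last checkpoint
    for _ in range(max_steps):
        outcome = await step(state)
        record["steps"] += 1
        if isinstance(outcome, Complete):
            record["result"] = outcome.result  # encode_result
            break
        state = outcome.state
        record["state"] = asdict(state)  # encode_state: the checkpoint
    return record


if __name__ == "__main__":
    store: dict[str, dict] = {}
    asyncio.run(drive("demo", State(0, 10), store, max_steps=5))
    print(store["demo"]["state"])  # {'current': 5, 'target': 10}
    asyncio.run(drive("demo", State(0, 10), store, max_steps=100))
    print(store["demo"]["result"])  # {'current': 10}
```

The second call ignores its `initial` argument because a checkpoint for `demo` already exists, which is the same run-then-resume pattern the CLI example uses with --max-steps.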
Example
```bash
loom-runner run examples/counter_agent.py --run-id demo --db runs.sqlite --max-steps 5
loom-runner resume examples/counter_agent.py --run-id demo --db runs.sqlite --max-steps 100
loom-runner list examples/counter_agent.py --db runs.sqlite
loom-runner get examples/counter_agent.py --run-id demo --db runs.sqlite
loom-runner history examples/counter_agent.py --run-id demo --db runs.sqlite
loom-runner attempts examples/counter_agent.py --run-id demo --db runs.sqlite --limit 20
loom-runner tool-calls examples/counter_agent.py --run-id demo --db runs.sqlite --limit 20
loom-runner explain examples/counter_agent.py --run-id demo --db runs.sqlite
```
Add --trace trace.html to the run or resume command to emit a local flow-xray HTML
trace. The runner traces the individual step calls (the leaves) and keeps the
tail-recursive driver itself as the durable loop boundary.
Or directly:
```bash
python3.13 examples/counter_agent.py
```
Tests
```bash
python3.13 -m pytest
```
Runtime Benchmark
```bash
python3.13 scripts/bench_runtime.py --steps 100000
python3.13 scripts/bench_runtime.py --steps 100000 --checkpoint-every 100
```
The benchmark reports wall time, retained checkpoint rows, attempt rows, DB size, and peak Python memory. It is a local regression tool, not a hosted-scale performance claim.
Download files
File details
Details for the file loom_runner-0.1.3.tar.gz.
File metadata
- Download URL: loom_runner-0.1.3.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 388301b5b726cd7ead9f79df7bdc55fec86ce317162e00cb4658845ec80757f7 |
| MD5 | 98df67d6086f22ad013a5cb474c3ea2a |
| BLAKE2b-256 | 77a1524d7c8d8cf36759a6e3a79dc41e9810d04420d65d368b15ee403bb50ef8 |
File details
Details for the file loom_runner-0.1.3-py3-none-any.whl.
File metadata
- Download URL: loom_runner-0.1.3-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e5ff6fbf61114f2eefb8fe6cac87fab9f5cfd33f92dc73710520e804e65d2c5 |
| MD5 | 5fa312a5fbfdc9269a0da5ddb005d9c6 |
| BLAKE2b-256 | 1bbe4f8ebf60ab6875456d3faab02ef0cca8cc5ad4688ff73e745ef41d4e5ac7 |