Turtles all the way down for AI coding agents. When done means done.
Lute
Turtles all the way down
In one sentence: Lute turns "I think the agent is done" into "the exam passed, the work is on a branch, and the transcript is inspectable."
AI coding agents changed the bottleneck. The hard part is no longer getting a model to write code. The hard part is proving the code is actually done.
Did the tests really pass? Did it delete the test? Did it leave a half-fixed migration on a branch somewhere? Did it spend twenty runs rediscovering the same dead end?
Lute is the missing primitive: a while-loop for agents with a real exam.
Give it any CLI agent that reads stdin and edits the working directory
(codex exec, claude -p, a shell script, your own tool). Lute runs the
agent, re-runs the check, commits the attempt, and repeats until the
machine-checkable done_when passes.
It is deliberately plain. A Lute run is just your repo, a lute.yaml, a git
branch, logs, cards, and checks.
The agent can be clever. The runner stays boring. That is the point.
See It Work
You: lute once --until "pytest -q" --agent "codex exec --sandbox workspace-write" -- "fix the failing tests"
Lute: check failed. Starting run 1 on branch lute/once.
Agent: edits app.py, appends the journal, exits.
Lute: pytest still fails. Commit the attempt. Start run 2.
Agent: fixes the edge case it missed.
Lute: pytest passes. Commit the close. Done.
That is the whole product. Agent writes. Lute checks. Git records. Budgets stop runaway loops. Cards pull you back in when the model needs help. The runner never trusts the agent's confidence or exit code. The exam decides.
Two demos run the real thing:
bash contrib/demo-catch-cheat.sh # hermetic: a scripted agent deletes the failing
# test; lute quarantines the edit and refuses the pass
bash contrib/demo.sh # live: codex fixes a rigged repo, nested exams close
"Isn't this just a bash loop?" At the kernel, yes — contrib/lute.sh is a
thirty-line one, and it passes Lute's first two notches. The product is
everything the thirty lines cannot say: the agent cannot pass by editing the
exam (tampered materials are quarantined and restored), cannot buy budget by
rewriting the ledger (accounting is re-derived from committed history), cannot
end the loop by sounding done (only the runner's own re-run of done_when
closes a loop), and cannot fail silently forever (budgets become cards in
INBOX/). How that differs from ralph-style loops, Aider, vendor goal
features, and plain CI is spelled out, both directions, in
docs/COMPARED.md.
Install: One Paste
Paste this in a terminal to install Lute:
curl -fsSL https://raw.githubusercontent.com/jethros-projects/luteloops/v0.2.0/scripts/install.sh | bash
Uninstall just the tool, leaving project repos, .lute/, INBOX/, branches,
logs, and crontab entries alone:
curl -fsSL https://raw.githubusercontent.com/jethros-projects/luteloops/v0.2.0/scripts/uninstall.sh | bash
Or ask Codex/Claude Code:
Install Lute from https://github.com/jethros-projects/luteloops by running the project installer. Verify with lute --help.
From PyPI, once the package is published (requires Python 3.10+):
pipx install luteloops
lute --help
To install the tagged release directly:
pipx install git+https://github.com/jethros-projects/luteloops.git@v0.2.0
Or run it straight from a checkout. The executable is a tiny lute script, and
the runtime stays Python standard library + PyYAML:
python3 -m pip install pyyaml # only needed for checkout-style use
python3 lute --help # zero-install: run it from the checkout
export PATH="$PWD:$PATH" # optional: keep this checkout on PATH
Keep lute, lute_core/, and luteloops/ together for checkout-style use;
the script is intentionally tiny and imports the package next to it.
Quick Start
- Try a one-shot loop against a real check
- For bigger work, write or generate a lute.yaml
- Let the loop run until it closes, blocks, or asks for approval
lute once --until "pytest -q" --agent "codex exec --sandbox workspace-write" -- "make the failing tests pass"
(Claude Code as the engine: --agent "claude -p --permission-mode acceptEdits".
The flags matter — see the two engine gotchas below.)
once writes no lute.yaml: it runs the agent on a throwaway lute/once
branch until --until exits 0, then stops. The check is mandatory. That is
what makes Lute different from a bare while agent; do ...; done.
For larger jobs:
lute plan "upgrade this repo to React 19"
lute lint lute.proposed.yaml
mv lute.proposed.yaml lute.yaml
lute run
For dependency-heavy jobs, lute plan --dag "..." asks the planner to reason
from a workflow dependency graph first, while still writing ordinary
lute.proposed.yaml. Pass --keep-dag to also write the intermediate
lute.plan.yaml review artifact.
lute plan builds a bounded repo briefing first: git status, detected
build/test/CI signals, existing test/check files, path hints from the goal, and
root AGENTS.md guidance. It then gives the agent that briefing plus the
packaged luteloops skill. Run lute init --skill only when you want a local
copy to inspect or customize.
Who this is for:
- Founders and solo builders who want AI agents to grind through real work without pretending a vibes-only answer is done
- Staff engineers and maintainers running upgrades, migrations, bug fixes, release prep, or test repair across a repo
- Teams using Claude, Codex, or custom agents who want one shared finish line: shell checks, budgets, gates, and human escalation
- People who like small tools: everything important is in git, .lute/, INBOX/, and the files your agent changed
Where Next
- docs/COMPARED.md — vs a bare loop, ralph-style loops, Aider, vendor goal features, and plain CI; both directions, by mechanism
- contrib/recipes/ — verified upgrades ready to adapt: React major, a bot's dependency bump, Django major, Node LTS
- docs/UPGRADE-BOT.md — drive Renovate/Dependabot PRs with a loop: locally, by cron, or in CI via contrib/action
- docs/INTEGRATIONS.md — reach lute from Claude Code and Codex: every verb as an MCP tool (lute mcp) plus a skill/AGENTS.md that hands walk-away tasks to lute
- docs/ATTESTATION.md — the verdict as a verifiable, signable artifact
- THREAT_MODEL.md — what holds uncaged, what needs the cage, what is out of scope
- INVARIANT.md — the one claim everything defends, mapped to the notches that enforce it
The Loop
Lute is a process, not a prompt library:
Check -> Work -> Verify -> Commit -> Repeat -> Escalate
- Check first: if done_when already passes, no agent wakes up
- Work in fresh iterations: each run gets the failing output and the loop task on stdin
- Verify outside the agent: Lute re-runs done_when itself
- Commit every attempt: inspect, diff, bisect, land, or throw away the branch
- Escalate when needed: budgets, gates, merge conflicts, and missing decisions produce INBOX/ cards instead of fake success
Nested loops close from the inside out. A parent closes only when its children
have closed and its own check passes. There is no if/else, depends_on, or
expression language. Order plus shell exit codes are the control flow.
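The cycle above can be sketched in a few lines of Python. This is an illustration of the control flow only, not Lute's implementation: `run_loop`, `check`, `agent`, and `commit` are hypothetical names, and the callables stand in for the shell check, the agent CLI, and the git commit.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins: a check returns (exit_code, output); an agent takes a prompt.
Check = Callable[[], Tuple[int, str]]
Agent = Callable[[str], None]


def run_loop(task: str, check: Check, agent: Agent,
             commit: Callable[[int], None], budget_runs: int) -> str:
    """Check -> Work -> Verify -> Commit -> Repeat -> Escalate, for one loop."""
    code, output = check()              # Check first: a passing exam wakes no agent
    runs = 0
    while code != 0:
        if runs >= budget_runs:
            return "blocked"            # Escalate: budget spent, a human gets a card
        runs += 1
        agent(f"{task}\n\nFailing check output:\n{output}")  # Work on task + failure
        code, output = check()          # Verify: the runner re-runs the exam itself
        commit(runs)                    # Commit every attempt, pass or fail
    return "closed"
```

The agent's own return value never appears in the sketch; only the re-run check decides whether the loop continues, which mirrors the rule that the exam decides.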
Nested Loops Scale The Work
This is where Lute gets big without getting complicated. A loop can contain loops, and those loops can contain loops. The same rule applies at every level: children close first, then the parent exam proves the integrated result.
That means a repo-sized migration can become a set of smaller exams instead of one giant prompt. Each loop can have its own task, check, budget, confirm streak, gate, protected files, and even its own agent. The root stays honest by running the final exam for the whole project.
loop: billing-migration
agent: claude -p --permission-mode acceptEdits
budget: 72h
done_when: "pytest tests/billing tests/api && npm test"
loops:
  - loop: data-model
    budget: 30 runs
    done_when: "pytest tests/billing/db"
    loops:
      - loop: ledger-schema
        task: Add the new ledger tables and migrations.
        done_when: "python scripts/check_schema.py"
        protected: ["scripts/check_schema.py"]
        budget: 8 runs
      - loop: backfill
        task: Write the idempotent backfill and its tests.
        done_when: "pytest tests/billing/test_backfill.py"
        budget: 10 runs
  - loop: billing-api
    budget: 30 runs
    done_when: "pytest tests/api/billing"
    loops:
      - loop: invoice-endpoints
        task: Move invoice reads and writes onto the new ledger.
        done_when: "pytest tests/api/billing/test_invoices.py"
        budget: 10 runs
      - loop: webhooks
        task: Preserve webhook behavior through the migration.
        done_when: "pytest tests/api/billing/test_webhooks.py"
        confirm: 2
        budget: 10 runs
  - loop: billing-ui
    budget: 20 runs
    done_when: "npm test -- --run billing"
    loops:
      - loop: invoice-screen
        task: Update the invoice UI for the ledger-backed API.
        done_when: "npm test -- --run invoice-screen"
        budget: 8 runs
      - loop: admin-reporting
        task: Keep admin reports consistent with the migrated data.
        done_when: "npm test -- --run admin-reporting"
        budget: 8 runs
The scale claim is real, but bounded. Lute is not literally infinite: your runtime, repo size, git operations, checks, agents, and patience are all finite. The scalable part is that the runner does not need a new abstraction when the work gets larger. If the job can be decomposed into independently checkable milestones, you can keep nesting the same primitive and let each child loop close under its own proof.
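The inside-out closing rule is a post-order walk of the manifest tree. The following is a minimal sketch under stated assumptions: `close_order` and `close_one` are hypothetical names, the tree is a plain dict using the manifest's `loop` and `loops` keys, and `close_one` stands in for running one loop's agent and exam.

```python
from typing import Callable, Dict


def close_order(loop: Dict, close_one: Callable[[str], bool]) -> bool:
    """Close children in document order, then the parent; stop at the first failure."""
    for child in loop.get("loops", []):
        if not close_order(child, close_one):
            return False                # a child that does not close blocks its parent
    return close_one(loop["loop"])      # the parent exam runs on the integrated result
```

With the billing manifest above, the walk would visit ledger-schema and backfill before data-model, and billing-migration last.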
DAG Planning, Lute Output
lute plan --dag "<goal>" is an authoring aid for complicated plans. The
planner first identifies checkable milestones and prerequisite edges, then
compiles that reasoning back into normal Lute YAML: list order for sequence,
nesting for integration, shell checks for conditions, and parallel: true only
for independent direct siblings with disjoint files/resources.
The final lute.proposed.yaml never gains depends_on, dag, nodes,
edges, Mermaid, Markdown plans, or a graph scheduler. It is the same contract
as a hand-written lute.yaml: children close first, the parent proves the
merged result, and the root exam proves the whole goal. Use --keep-dag when
you want to inspect the planner's lute.plan.yaml review artifact; Lute still
runs only the compiled proposal after you review and rename it.
The Commands
| verb | what it does |
|---|---|
| `lute init` | scaffold a lute.yaml and .lute/ (or lute init --skill to write a local copy of the packaged luteloops skill) |
| `lute lint [file]` | validate the schema, resolve uncaged agents (caged agent commands are not resolved on the host), and execute every done_when once except caged judge checks, which are reported as skipped; classify each pass / fail / error / not-yet / skipped; an error fails the lint, because an exam must be administrable before work begins |
| `lute run [root-id]` | run loops depth-first until everything is green (--agent CMD, --file F, --plain, --bg to detach, --dry-run to preview, --skip-if-running for cron overlap); child loops run through their parent |
| `lute once --until C -- "task"` | one-shot, no file: run an agent until check C passes (--agent, --id, --budget) |
| `lute watch [file]` | read-only event snapshot for a running or finished run (--snapshot text, --json machine-readable, --filter LOG de-noise a transcript) |
| `lute status [file]` | re-run each check once for loops without an unanswered card and print the loop hierarchy: ✔ done / ↻ running / ⏳ waiting / ✗ blocked / ✋ gated, plus cumulative agent time |
| `lute inbox` | list every blocked/gated loop with the exact command to answer it |
| `lute answer <loop> "..."` | reply to a card in INBOX/; blocked-loop answers are injected into the next run, while gated loops seal only on exact approve |
| `lute judge -- "<rubric>"` | run the configured judge once against HEAD's diff — the same oracle behind done_when: "judge: ..."; exits 0 only when the grader's first line is PASS |
| `lute attest [--verify F]` | emit (or verify) an in-toto statement binding a closed run's verdict to git-immutable facts: the pinned exam's hash, base + head SHAs, run count, and an event hash chain (docs/ATTESTATION.md) |
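| `lute quarantine` | list quarantine records for protected exam materials; `lute quarantine diff <id>` shows one and `lute quarantine drop <id>` (or `drop --all`) discards it (see Protected exams) |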
| `lute stop` | cleanly stop the active run (and any parallel children) in this repo |
| `lute land [branch]` | merge `lute/<root>` into the start branch only if the root exam still passes against the merged tree; conflict or a failed re-check aborts clean and escalates (--file F; opt-in — the default is review-then-merge-yourself) |
| `lute plan [--dag] [--keep-dag] "<goal>"` | an agent reads the luteloops skill and drafts lute.proposed.yaml; --dag uses dependency planning first, and --keep-dag also writes lute.plan.yaml for review (--agent CMD picks the drafting agent, as it does for lint's resolution pass) |
Plus lute cron sync / lute cron remove for the schedules: manifest (below), and
lute --help / lute <verb> --help / lute --version.
What Makes It Safe To Let Run
| feature | why it matters |
|---|---|
| Machine checks | "Done" means a command exited 0, not that the model sounds confident |
| Budgets | Cap loops by run count or wall-clock time; stuck agents become cards |
| Nested loops | Turn huge goals into independently checkable milestones |
| Journals | Keep short memory across fresh agent processes |
| Confirm streaks | Require multiple consecutive passes for flaky checks |
| `gate: human` | Pause before deploy, publish, migrate, send, or other irreversible steps |
| `protected:` | Quarantine edits to exam materials before they can buy a pass or enter the run commit |
| `cage:` | Run model-facing commands in a container with explicit mounts |
| `parallel: true` | Run independent child loops in separate worktrees, then integrate |
| `watch --json` | Stable machine-readable status for wrappers, cron, dashboards, and scripts |
Contracts (for scripting lute / bringing your own agent)
Exit codes, for a wrapper to branch on:
| code | meaning |
|---|---|
| 0 | all loops closed (or landed) |
| 1 | usage / precondition (bad invocation, missing file, dirty tree) |
| 2 | internal/git error |
| 3 | blocked: a loop hit its budget or a parallel/land merge conflicted; see lute inbox, then lute answer |
| 4 | gated: a passing loop is awaiting human approval (`lute answer <loop> approve`) |
For a detached or cron run the exit code reaches no one; read lute watch --snapshot --json, a
pure projection of events (no rechecks). Its shape is stable:
{
"root": "build", // root loop id
"outcome": "blocked", // running | closed | blocked | gated: the canonical verdict
"exit": 3, // matching exit int; null while outcome is "running"
"ended": false, // true only after a run_end event; blocked/gated runs may halt first
"branch": "lute/build",
"tree": { "id": "build", "depth": 0, "mark": "✗", "word": "blocked", "runs": 2, "secs": 41.0,
"active": false, "children": [ /* same shape, recursively */ ] },
"cards": [ { "lid": "build", "gated": false, "answered": false, "summary": "BLOCKED: ...", "next": "lute answer build \"...\"" } ]
}
Match on outcome (and per-node word), not the per-node mark glyph; mark is presentational
and may change. exit is null while outcome is running, then the integer code once it halts.
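A wrapper can reduce that payload to the few facts it acts on. The sketch below assumes only the fields shown in the shape above; `walk` and `triage` are hypothetical helper names, and the per-node words "blocked" and "gated" are taken from the status vocabulary listed under The Commands.

```python
import json
from typing import Dict, Iterator


def walk(node: Dict) -> Iterator[Dict]:
    """Yield a tree node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def triage(snapshot_text: str) -> Dict:
    """Reduce a `lute watch --snapshot --json` payload to what a wrapper acts on."""
    snap = json.loads(snapshot_text)
    return {
        "outcome": snap["outcome"],     # running | closed | blocked | gated
        "exit": snap["exit"],           # None while the run is still going
        # Match on the per-node word, never on the presentational mark glyph.
        "stuck": [n["id"] for n in walk(snap["tree"])
                  if n["word"] in ("blocked", "gated")],
        # The exact commands a human would type for each unanswered card.
        "todo": [c["next"] for c in snap["cards"] if not c["answered"]],
    }
```

In practice the text would come from the standard output of `lute watch --snapshot --json`, for example captured with `subprocess.run(..., capture_output=True, text=True)`.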
The agent contract
Any CLI is a valid engine if: it reads the prompt on stdin, makes its
edits in the working directory, and exits. The runner stages tracked changes plus new files
created during that run; it leaves pre-existing untracked clutter and INBOX/ cards alone. The
agent's exit code is logged but never trusted; the only verdict is the runner re-running
done_when. That is why lute can't lie about doneness, and why your wrapper need not produce a
meaningful exit code.
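To see how small the contract is, here is a toy engine that satisfies it. It is a plumbing test under stated assumptions, not a useful agent: `handle`, `main`, and the `AGENT_NOTES.md` file name are hypothetical, and all it does is record the prompt it was given.

```python
import sys
from pathlib import Path

NOTES = "AGENT_NOTES.md"  # hypothetical file name, chosen for this sketch


def handle(prompt: str, workdir: Path) -> Path:
    """Record the prompt in the working directory; a real engine would edit code here."""
    notes = workdir / NOTES
    with notes.open("a", encoding="utf-8") as fh:
        fh.write(f"## run\n{prompt.rstrip()}\n\n")
    return notes


def main() -> int:
    """Read the prompt on stdin, edit the current directory, then exit."""
    handle(sys.stdin.read(), Path.cwd())
    return 0  # logged by the runner but never trusted as a verdict
```

Saved as a script that ends with `raise SystemExit(main())`, it could be passed to `--agent`. Because it never changes code, any honest check keeps failing until the budget runs out, which makes it a cheap way to rehearse the escalation path.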
The two engine gotchas. Most agent CLIs are interactive by default, and a
non-interactive one still enforces its own permissions: without edit rights it
exits happily having changed nothing. Lute names such runs — no edits on the
console line, "edits": false in the event stream, and after two in a row the
escalation card asks whether the agent command is non-interactive and allowed
to edit files. Known-good spellings:
codex exec --sandbox workspace-write # reads stdin, edits the workspace
claude -p --permission-mode acceptEdits # non-interactive, file edits allowed
Give the engine the rights its task needs: a dependency upgrade must run the
package manager, so an edits-only mode is not enough there
(claude -p --dangerously-skip-permissions is full power and belongs inside
cage: docker; codex exec gains network inside its sandbox only with
-c sandbox_workspace_write.network_access=true). Lute pipes the prompt and
reads the tree; it cannot grant your agent permissions.
A circular exam can still pass by echoing the task string back into a file, so it measures obedience rather than behavior. Prefer tests, builds, protected fixtures, or ground truth the worker does not author.
State ownership: normal repo content outside .lute/ and INBOX/ is agent-owned work
product. Runner-owned state is .lute/config.yaml, .lute/ledger.jsonl, .lute/events.jsonl,
.lute/lock, .lute/journal/*, .lute/logs/*, .lute/wt/, and INBOX/*. Before writing events,
ledger entries, logs, cards, or lock files, lute repairs those paths as real files/directories and
never follows agent-created symlinks. Journals are prompt memory: agents append to them by contract,
but budget and closure decisions never trust journal contents. If an agent deletes .lute/logs,
symlinks the ledger to /dev/null, truncates it, or rewrites durations, the runner restores trusted
state and budget accounting continues from the ledger snapshot plus committed run history.
Knobs and handshakes (env). Two are yours to set: LUTE_CHECK_TIMEOUT
(seconds, default 600) caps every check and judge run, and LUTE_KEY_DIR
relocates the answer-auth key directory (default ~/.lute/keys). The rest are
the runner's own plumbing, exported to the processes it spawns: LUTE_SLOT
numbers parallel children (1, 2, 3…) so their checks can dodge collisions,
LUTE_TRUSTED_BASE pins the commit that exams and judges are audited against,
and LUTE_STATE_DIR points a parallel child at its own state directory. Read
them in checks freely; set them yourself only in tests.
Write Your First lute.yaml
Upgrade React the lute way: write the exams, then let the loop grind.
# lute.yaml
loop: react-19
# An upgrade must run npm, so the engine needs command rights, not just edit
# rights - full power belongs inside cage: docker (see the engine gotchas).
agent: claude -p --dangerously-skip-permissions
budget: 48h
done_when: "npm test && npm run build"
loops:
  - loop: bump-react # if-trick: skipped when already on 19
    task: Upgrade react and react-dom to ^19 in package.json, npm install.
    done_when: "node -e 'process.exit(require(\"react/package.json\").version.startsWith(\"19\")?0:1)'"
    budget: 3 runs
  - loop: fix-build
    task: Fix every build error from the upgrade. No downgrades, and no
      @ts-ignore/@ts-expect-error/@ts-nocheck to silence errors.
    done_when: "npm run build"
    budget: 15 runs
  - loop: fix-tests
    task: Repair tests broken by the upgrade. Never delete or skip a test.
    done_when: "npm test"
    confirm: 2
    budget: 15 runs
Then:
lute lint # caged judge checks are reported as skipped; caged agents/judges are image-local
lute run # grinds on branch lute/react-19, one commit per iteration
lute status # ✔ done / ↻ in progress / ◌ untouched
If a loop exhausts its budget you get INBOX/<loop>.md and exit code 3;
reply with lute answer fix-tests "the snapshot tests are obsolete; regenerate them"
and run again. Writing good loops is a skill, literally: see
luteloops/SKILL.md, which lute plan injects into its drafting prompt after a
bounded repo briefing.
On cost: budget caps iterations (N runs) and wall-clock (48h), never tokens or dollars; lute can't see your agent's API spend and does not accept dollar budgets. lute status reports cumulative runs and agent time so you can eyeball consumption, and lute inbox shows what's waiting on you. Set a tight runs budget before walking away if the bill matters.
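The two budget spellings used in this document ("N runs" and an s/m/h duration such as 48h) can be told apart mechanically. The parser below is a sketch of that distinction, not Lute's own grammar: `parse_budget` is a hypothetical name, and accepting the singular "1 run" is an assumption.

```python
import re
from typing import Tuple

_UNITS = {"s": 1, "m": 60, "h": 3600}  # seconds per unit


def parse_budget(text: str) -> Tuple[str, int]:
    """Return ("runs", n) for "N runs" or ("seconds", n) for an s/m/h duration."""
    text = text.strip()
    m = re.fullmatch(r"(\d+)\s+runs?", text)
    if m:
        return ("runs", int(m.group(1)))
    m = re.fullmatch(r"(\d+)([smh])", text)
    if m:
        return ("seconds", int(m.group(1)) * _UNITS[m.group(2)])
    # Anything else, including a dollar amount, is rejected rather than guessed at.
    raise ValueError(f"unrecognized budget: {text!r}")
```

For example, `parse_budget("15 runs")` yields a run cap and `parse_budget("48h")` yields a wall-clock cap in seconds, while a value such as "$5" raises an error.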
Unattended runs
Start it, walk away, get pulled back only when it needs you:
- Detach: lute run --bg returns immediately; the run lives in its own session and survives the terminal closing; re-attach with lute watch, end it with lute stop
- Get notified: set on_halt: in .lute/config.yaml to your own notifier; it fires the instant a loop blocks or gates, with $LUTE_LOOP, $LUTE_REASON (blocked/gated) and $LUTE_CARD in the environment (fire-and-forget; a failing hook never breaks the run):

      # .lute/config.yaml
      on_halt: 'curl -fsS -d "$LUTE_LOOP $LUTE_REASON" https://ntfy.sh/your-topic'

- Come back to it: lute inbox lists what's waiting and the exact lute answer to type; lute watch --snapshot --json is a stable surface for a wrapping script
Run State And Watch
Runs write files; renderers read files. Every agent run's full transcript lands
in .lute/logs/<loop>.run<N>.log (tail -f works mid-run), and the runner
appends one JSON event per line to .lute/events.jsonl.
In a real terminal, lute run detaches into its own session and prints the
process id plus the follow-up commands:
detached: run continues (pid N) · re-attach: lute watch · stop: lute stop
lute run --bg takes the same detached path explicitly; output from the runner
itself lands in .lute/logs/runner.log, which is handy for cron and scripts.
Use lute run --plain when you want a foreground process that streams one
compact line per event and exits with the run result.
lute watch is read-only and replay-only: it renders the current loop hierarchy
once from .lute/events.jsonl, without re-running checks. lute watch --json
emits the same replay state for wrappers, dashboards, and cron probes. To inspect
the active agent transcript, tail the log path named by the stream or event file;
lute watch --filter .lute/logs/<loop>.run<N>.log prints that log with repeated
blocks collapsed to a single copy with a ×N marker. Logs, events, worktrees,
and the run lock are runner-owned runtime state and stay out of your commits;
journals and the ledger are durable runner files committed by Lute after a run,
with ledger writes repaired and authenticated through the state store.
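The idea behind de-noising a transcript can be shown on single lines. This is a simplified sketch, not the behavior of `lute watch --filter`: it collapses identical consecutive lines rather than multi-line blocks, and both the `collapse_repeats` name and the placement of the ×N marker are assumptions.

```python
from itertools import groupby
from typing import Iterable, List


def collapse_repeats(lines: Iterable[str]) -> List[str]:
    """Collapse runs of identical consecutive lines into one copy with a ×N marker."""
    out = []
    for line, run in groupby(lines):
        n = sum(1 for _ in run)         # length of this run of equal lines
        out.append(line if n == 1 else f"{line} ×{n}")
    return out
```

Only adjacent repeats collapse, so a line that recurs later in the log is kept in its original position.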
Parallel siblings (parallel: true)
By default children run sequentially, in document order. When independent
child loops each take real time, mark their parent parallel: true and all of
its direct children run at once, each in its own git worktree on its own
branch, as a separate lute run process:
loop: ship-services
done_when: "./integration-test.sh" # the parent exam IS the integration check
parallel: true
loops:
  - loop: api # the three run concurrently, each in .lute/wt/<root>__<id>
    task: Build the API. Bind PORT=$((3000+LUTE_SLOT)).
    done_when: "cd api && npm test"
  - loop: web
    task: Build the web app.
    done_when: "cd web && npm test"
  - loop: worker
    task: Build the worker.
    done_when: "cd worker && npm test"
Isolation is a worktree per child; reconciliation is git merge as each closes.
Children must be genuinely independent, touching disjoint files.
Non-overlapping edits auto-merge. A real conflict is not auto-resolved: the
run halts with exit 3 and an escalation card naming the conflicting files and
loops, and the parent branch is left clean. Make the edits disjoint (or merge
by hand) and re-run. If a child escalates or gates instead of closing, the parent
collects all children to a stopping point, relays their cards, merges none, and
exits with the most severe child code. After every child merges cleanly, Lute
re-runs each direct child done_when once against the merged tree before the
parent can close. If a child invariant was broken by the merge, the failure
becomes the parent loop's next repair prompt. The parent still runs its own
done_when on the integrated tree, so write the parent exam to cover
cross-child behavior that no child owns alone.
LUTE_SLOT (1, 2, 3… per child) lets checks dodge collisions: a per-slot port
(PORT=$((3000+LUTE_SLOT))) or scratch path. A run is crash-durable by
re-derivation: git worktree list and the child branches are the state, so a
re-run skips children whose work is already merged and resumes the rest in their
worktrees. Only one top-level lute run may be active per repo; a .lute/lock
(pid + start) guards it, and a lock whose pid is dead is stale and reclaimed.
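A check written in Python can derive its per-slot port the same way the shell arithmetic does. In this sketch `slot_port` is a hypothetical helper, and falling back to slot 0 when LUTE_SLOT is unset (for example in a sequential run) is an assumption.

```python
import os
from typing import Mapping


def slot_port(base: int = 3000, env: Mapping[str, str] = os.environ) -> int:
    """Return base + LUTE_SLOT, or base itself when the variable is unset."""
    return base + int(env.get("LUTE_SLOT", "0"))
```

The same offset can name a scratch directory, so that concurrent children never share a port or a temporary path.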
Watchers (exit 75 = "not yet")
A check has three honest answers, not two: exit 0 is pass, exit 75 is
"not yet": nothing is wrong, nothing is done, ask me later. Anything
else is fail. On a not-yet the runner wakes no agent and spends no
run budget; it sleeps check_every (a new optional per-loop field: 30s,
5m, 2h; default 60s) and re-asks. Because run budgets do not tick while
waiting, any loop whose check returns 75 must have an s/m/h time budget,
and check_every must be positive.
lute lint errors when a dry-run returns 75 without a time cap, and lute run
escalates immediately instead of hanging. Only a real failure's output ever
rides into an agent prompt; silence is not evidence.
A check or judge: command that exceeds Lute's per-check timeout counts as a
failed exam, not an internal crash. The loop follows the normal budget path and
eventually opens an INBOX/ card with the timeout in the failure tail.
loop: deploy-quiet
task: Investigate and fix whatever broke the deploy.
done_when: "./checks/quiet.sh" # 0 quiet 24h · 75 waiting · 1 alerts found
check_every: 30m
budget: 48h
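A watcher check such as ./checks/quiet.sh has to map its observations onto the three answers. The following sketch shows one way to do that in Python; `quiet_verdict` and its parameters are hypothetical, and how the deploy time and alert count are obtained is left to the caller.

```python
PASS, FAIL, NOT_YET = 0, 1, 75   # 75 is the "not yet" exit code described above


def quiet_verdict(now: float, deployed_at: float, alert_count: int,
                  quiet_secs: float = 24 * 3600) -> int:
    """Exit code for a 'quiet since deploy' check: 0 pass, 75 not yet, 1 fail."""
    if alert_count > 0:
        return FAIL                      # a real problem: worth one agent run
    if now - deployed_at < quiet_secs:
        return NOT_YET                   # nothing wrong, nothing done: ask again later
    return PASS                          # quiet for the whole window
```

A script would finish with `sys.exit(quiet_verdict(...))`, so that the runner sees the verdict as a plain exit code.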
Combine the trio (a not-yet check, lute run --bg, and lute cron sync)
and lute is a monitor that costs ~nothing while things are healthy and spends
exactly one agent run per real problem. The event stream, watch, and plain
mode show waiting loops as ⏳ (⏳ deploy-quiet: not yet · next check in 30m),
and lute lint classifies a capped 75 dry-run as not_yet: a valid,
lint-passing outcome.
Gates (gate: human)
Some loops precede an irreversible verb (deploy, publish, send, migrate)
and must not auto-close on a green check. Mark them gate: human. When such
a loop would close, the run pauses with a READY card in INBOX/ and
exit 4 ("needs approval", distinct from exit 3's "needs help"). The
canonical pattern is a checkpoint between preparing and acting; list
order does the rest:
loops:
  - loop: release-ready
    done_when: "npm run build && test -f CHANGELOG.md"
    gate: human # ✋ READY card, exit 4
  - loop: ship # runs only after your approval
    task: npm publish, then verify.
    done_when: "npm view mypkg version | grep -qx 1.4.0"
Because approval is authenticated by a key under ~/.lute/keys/, gate: human
requires a configured cage:. Without a cage, the agent runs as your user and
can read the answer-auth key, so it can forge the approval token. lute lint
reports this as an error, and lute run refuses a gated manifest before work
starts. Use the cage even if the host has no other secrets; for gates, it is the
trust anchor, not just a convenience.
This is a different trust base from exam-pass integrity: host-side checks and
protected exams cover ordinary uncaged runs, and Lute reaps the agent's process
group before checking, but uncaged code is still host code and can deliberately
daemonize outside that group. Use a real cage for adversarial containment;
human approval and answered-card budget refresh require it. The full contract —
which guarantee holds caged versus uncaged, and what is explicitly out of scope —
is stated in THREAT_MODEL.md.
Approve with lute answer release-ready approve; after trimming whitespace,
the answer text must be exactly approve to seal. On the next run the exam is
re-verified once before sealing; if the world moved overnight the card is
marked SUPERSEDED and the loop takes the normal fail path. To reject,
answer with anything else; Lute records the note, does not inject it as repair
guidance, and does not unlock the following loop. Change whatever needed
changing and re-run; the gate re-examines the new state. A gated loop waiting
for you is exempt from time-budget expiry: the limit exists for unattended
waiting, and a gate is attended by definition.
Protected exams (protected:)
An agent runs as you, with your toolchain, so a careless or adversarial one
can pass an exam by editing the exam instead of doing the work: delete the
failing test and done_when goes green. List the exam's own materials under
protected: as globs, relative to the repo root and matched against the working
tree (so untracked check scripts count too):
loop: make-it-pass
task: Make the suite pass. Do not touch the tests.
done_when: "pytest -q"
protected:
- "tests/**"
- "checks/**"
budget: 20 runs
Lute compares those globs to their committed originals at the branch base.
Before and after checks, and after each agent run, it re-hashes the watched
materials plus trusted control files (the active manifest, whether lute.yaml
or --file, and .lute/config.yaml). If an agent modifies, deletes, or newly adds a watched file,
Lute saves the attempted edit under .lute/quarantine/<id>/, restores the
trusted copy, and leaves the quarantined edit out of the normal run commit. The
next prompt names the quarantine record so the agent can fix the actual work
instead of redefining the exam. Inspect records with:
lute quarantine
lute quarantine diff <id>
lute quarantine drop <id> # or: lute quarantine drop --all
The guard is opt-in per loop for protected: exam materials, while the active
manifest and config file are trusted control inputs. lute once is fileless, so
a committed lute.yaml is ordinary work there unless it is also listed under
protected:. done_when checks still run host-side; this protects the exam's
materials and control inputs from model-facing commands. lute lint warns when
a protected: glob matches nothing and when an inferable local check file is not
covered by protected:.
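The detection step (hash the watched files, then compare) can be sketched with the standard library. This illustrates the idea only and is not Lute's implementation: `snapshot` and `tampered` are hypothetical names, the glob semantics are those of Python's pathlib rather than Lute's, and quarantine and restore are left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(root: Path, globs: Iterable[str]) -> Dict[str, str]:
    """Map each file matched by the globs (relative to root) to its SHA-256."""
    digests = {}
    for pattern in globs:
        if pattern.endswith("**"):
            pattern += "/*"             # make a trailing ** match files on every Python version
        for path in root.glob(pattern):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def tampered(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Paths modified, deleted, or newly added between two snapshots."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))
```

Taking one snapshot before the agent runs and one after, a non-empty `tampered` result corresponds to the three cases named above: a modified, deleted, or newly added watched file.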
The cage (cage:)
By default an agent shares your filesystem and can read ~/.ssh. Set
cage: docker in .lute/config.yaml and every command lute runs on behalf of
a model runs inside a container: agents see only your repo (read-write at
/work) and whatever you name explicitly; judges see only an empty working directory and the stdin diff.
done_when checks stay on the host (they're yours and need your toolchain):
# .lute/config.yaml
cage: docker # or a custom template (podman) using {repo} {image} {cmd} {mounts}
cage_image: my-agent-cage # YOUR image; it must contain your agent CLI
cage_mounts: # extra host paths, mounted read-only, by name
- "~/.config/my-agent" # agent auth enters here; never implicitly
A done_when: "judge: <rubric>" runs the same oracle you can invoke by hand as
lute judge -- "<rubric>" (the check runner resolves the prefix to that grader
in-process rather than through your shell). Either way the judge keeps an
ordinary check's contract: the verdict is an exit code, like every other
done_when. The judge sends the rubric
as trusted instructions and wraps the candidate diff inside
BEGIN UNTRUSTED DIFF / END UNTRUSTED DIFF markers. The judge runs in an empty
working directory and grades the stdin diff only; if a grader must read or
execute working files, make that a normal shell done_when instead. The judge is
told to treat diff content as evidence only, never as instructions, and closes
only when stdout's first line is exactly PASS and the command exits 0.
The prompt still flows on stdin; output still lands in the same per-run log.
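The closing rule for a judge is easy to state as a predicate. In this sketch `judge_passes` is a hypothetical name, and taking "exactly PASS" literally (no trimming of surrounding spaces, only the line terminator) is an assumption about the real grader.

```python
def judge_passes(stdout: str, returncode: int) -> bool:
    """True only when the grader exits 0 and its first output line is exactly PASS."""
    first_line = stdout.split("\n", 1)[0].rstrip("\r")
    return returncode == 0 and first_line == "PASS"
```

Both conditions must hold, so a grader that prints PASS but exits non-zero, or that buries PASS on a later line, does not close the loop.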
Secrets policy is absence: nothing of the host is visible except the repo
and what cage_mounts names, so ~/.ssh and your environment simply aren't
there. The image is yours to build: it must contain your agent CLI, and auth
enters read-only through cage_mounts, by name, never implicitly.
contrib/cage/Dockerfile is a worked Codex sample (node:20-slim +
@openai/codex + git). The built-in cage: docker template can still reach the network so model CLIs can call their APIs. Custom cage templates are
operator-owned: if egress isolation matters, supply your own template with the
network policy you want, for example a Docker template that includes
--network none.
The same isolation protects Lute's own answer-auth key. Answered cards can
refresh a loop's budget once, and gated cards seal human approval. If agents are
uncaged, those mechanisms are useful operator workflow, not adversarial security
boundaries; lint warns for budget-refreshable loops and errors for human gates.
A no-op template such as sh -lc {cmd} is still uncaged in practice: it runs as
you and can read the answer-auth key.
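To make "secrets policy is absence" concrete, here is a rough sketch of how a caged command line could be assembled. It is not Lute's built-in template: the cage_argv function, the /work mount point, and the choice to mount each named path at the same path inside the container are all assumptions. The docker run flags used (--rm, -i, -v with :ro, -w, --network none) are standard Docker options.

```python
import os


def cage_argv(image, repo, mounts, cmd, network=True):
    """Build a docker run argv that exposes only the repo and the named mounts.

    Hypothetical helper: Lute's real cage template may differ.
    """
    argv = ["docker", "run", "--rm", "-i"]  # -i keeps stdin open for the prompt
    if not network:
        argv += ["--network", "none"]  # operator-chosen egress isolation
    # The repo is the only writable host path (mounted at an assumed /work).
    argv += ["-v", f"{os.path.abspath(repo)}:/work", "-w", "/work"]
    for mount in mounts:
        host = os.path.abspath(os.path.expanduser(mount))
        argv += ["-v", f"{host}:{host}:ro"]  # named paths enter read-only
    argv += [image, "sh", "-lc", cmd]
    return argv
```

Nothing is passed with -e and nothing is mounted unless it is named, so ~/.ssh and the host environment are absent by construction rather than filtered out.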
Schedules (cron, not a daemon)
A top-level schedules: section is a manifest, never a runtime:
schedules:
  - run: react-19 # root-level loops only
    at: "0 9 * * *"
lute cron sync compiles it into a managed block in your crontab
(# BEGIN lute <repo> … # END lute <repo>), idempotent and removable with
lute cron remove. Each tick is a fresh lute run --skip-if-running <root-id>:
if a previous tick still holds the repo lock, the new tick exits 0 without
starting duplicate work. Loops themselves never gain a time field. Note: cron
jobs run with a minimal environment; make sure your agent CLI is on cron's
PATH, and check mail (or wrap the entry) for tick output.
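The idempotent managed block can be illustrated with a small text transform. This is a sketch under stated assumptions, not the code behind lute cron sync: the sync_block function is hypothetical, and the exact entry format inside the block is whatever Lute writes. Only the # BEGIN lute <repo> and # END lute <repo> markers come from the text.

```python
def sync_block(crontab: str, repo: str, entries: list[str]) -> str:
    """Replace the managed block for `repo` with `entries`; empty entries remove it.

    Hypothetical helper: lines outside the markers are never touched.
    """
    begin, end = f"# BEGIN lute {repo}", f"# END lute {repo}"
    kept, inside = [], False
    for line in crontab.splitlines():
        if line.strip() == begin:
            inside = True  # start dropping the old managed block
        elif line.strip() == end:
            inside = False  # old block finished
        elif not inside:
            kept.append(line)  # user-owned line, keep as-is
    if entries:
        kept += [begin, *entries, end]  # append a fresh block
    return "\n".join(kept) + "\n" if kept else ""
```

Because the old block is always dropped before the new one is appended, running the sync twice gives the same crontab as running it once, and syncing with no entries removes the block without disturbing the rest of the file.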
What's deliberately not here
The initial release is the small durable primitive: foreground, branch-only,
fast-check-first (parallel siblings are opt-in per parent, but a lone loop still
runs as one plain process). lute plan --dag does not add runtime DAG syntax,
automatic graph scheduling, or a depends_on manifest key. The verdict cache,
cron-resumed ticks on an always-on box, merge gates, agent-resolved merge
conflicts, a registry, and richer cage policy are deliberately outside the
initial release. Each should enter only when a real loop fails without it, and
only if it adds no required fields.