Skip to main content

Turtles all the way down for AI coding agents. When done means done.

Project description

Lute

exam

Turtles all the way down

In one sentence: Lute turns "I think the agent is done" into "the exam passed, the work is on a branch, and the transcript is inspectable."

AI coding agents changed the bottleneck. The hard part is no longer getting a model to write code. The hard part is proving the code is actually done.

Did the tests really pass? Did it delete the test? Did it leave a half-fixed migration on a branch somewhere? Did it spend twenty runs rediscovering the same dead end?

Lute is the missing primitive: a while-loop for agents with a real exam. Give it any CLI agent that reads stdin and edits the working directory (codex exec, claude -p, a shell script, your own tool). Lute runs the agent, re-runs the check, commits the attempt, and repeats until the machine-checkable done_when passes.

It is deliberately plain. A Lute run is just your repo, a lute.yaml, a git branch, logs, cards, and checks. The agent can be clever. The runner stays boring. That is the point.

See It Work

You: lute once --until "pytest -q" --agent "codex exec --sandbox workspace-write" -- "fix the failing tests"

Lute: check failed. Starting run 1 on branch lute/once.
Agent: edits app.py, appends the journal, exits.
Lute: pytest still fails. Commit the attempt. Start run 2.
Agent: fixes the edge case it missed.
Lute: pytest passes. Commit the close. Done.

That is the whole product. Agent writes. Lute checks. Git records. Budgets stop runaway loops. Cards pull you back in when the model needs help. The runner never trusts the agent's confidence or exit code. The exam decides.

Two demos run the real thing:

bash contrib/demo-catch-cheat.sh   # hermetic: a scripted agent deletes the failing
                                   # test; lute quarantines the edit and refuses the pass
bash contrib/demo.sh               # live: codex fixes a rigged repo, nested exams close

"Isn't this just a bash loop?" At the kernel, yes — contrib/lute.sh is a thirty-line one, and it passes Lute's first two notches. The product is everything the thirty lines cannot say: the agent cannot pass by editing the exam (tampered materials are quarantined and restored), cannot buy budget by rewriting the ledger (accounting is re-derived from committed history), cannot end the loop by sounding done (only the runner's own re-run of done_when closes a loop), and cannot fail silently forever (budgets become cards in INBOX/). How that differs from ralph-style loops, Aider, vendor goal features, and plain CI is spelled out, both directions, in docs/COMPARED.md.

Install

Recommended — from PyPI with pipx (Python 3.10+):

brew install pipx        # or: python3 -m pip install --user pipx   (skip if you already have pipx)
pipx ensurepath          # first pipx tool on this machine only — puts ~/.local/bin on your PATH
pipx install luteloops
lute --version           # → lute 0.2.1

pipx ensurepath is a one-time-per-machine step, not part of installing lute: it makes sure the folder pipx drops commands into is on your $PATH, so a fresh shell can find lute. If you've used pipx before, it's a no-op. Open a new terminal tab after it so the change takes effect — if lute still isn't found, that's the fix.

No pipx? One paste installs Lute into a private venv and symlinks lute onto your PATH:

curl -fsSL https://raw.githubusercontent.com/jethros-projects/luteloops/v0.2.1/scripts/install.sh | bash

Or ask Codex/Claude Code:

Install Lute from https://github.com/jethros-projects/luteloops by running the project installer. Verify with lute --help.

Uninstall just the tool, leaving project repos, .lute/, INBOX/, branches, logs, and crontab entries alone — pipx uninstall luteloops, or for the one-paste install:

curl -fsSL https://raw.githubusercontent.com/jethros-projects/luteloops/v0.2.1/scripts/uninstall.sh | bash

Or run it straight from a checkout. The executable is a tiny lute script, and the runtime stays Python standard library + PyYAML:

python3 -m pip install pyyaml      # only needed for checkout-style use
python3 lute --help                # zero-install: run it from the checkout
export PATH="$PWD:$PATH"           # optional: keep this checkout on PATH

Keep lute, lute_core/, and luteloops/ together for checkout-style use; the script is intentionally tiny and imports the package next to it.

Quick Start

  1. Try a one-shot loop against a real check
  2. For bigger work, write or generate a lute.yaml
  3. Let the loop run until it closes, blocks, or asks for approval
lute once --until "pytest -q" --agent "codex exec --sandbox workspace-write" -- "make the failing tests pass"

(Claude Code as the engine: --agent "claude -p --permission-mode acceptEdits". The flags matter — see the two engine gotchas below.)

once writes no lute.yaml: it runs the agent on a throwaway lute/once branch until --until exits 0, then stops. The check is mandatory. That is what makes Lute different from a bare while agent; do ...; done.

For larger jobs:

lute plan "upgrade this repo to React 19"
lute lint lute.proposed.yaml
mv lute.proposed.yaml lute.yaml
lute run

For dependency-heavy jobs, lute plan --dag "..." asks the planner to reason from a workflow dependency graph first, while still writing ordinary lute.proposed.yaml. Pass --keep-dag to also write the intermediate lute.plan.yaml review artifact.

lute plan builds a bounded repo briefing first: git status, detected build/test/CI signals, existing test/check files, path hints from the goal, and root AGENTS.md guidance. It then gives the agent that briefing plus the packaged luteloops skill. Run lute init --skill only when you want a local copy to inspect or customize.

Who this is for:

  • Founders and solo builders who want AI agents to grind through real work without pretending a vibes-only answer is done
  • Staff engineers and maintainers running upgrades, migrations, bug fixes, release prep, or test repair across a repo
  • Teams using Claude, Codex, or custom agents who want one shared finish line: shell checks, budgets, gates, and human escalation
  • People who like small tools: everything important is in git, .lute/, INBOX/, and the files your agent changed

Where Next

  • docs/COMPARED.md — vs a bare loop, ralph-style loops, Aider, vendor goal features, and plain CI; both directions, by mechanism
  • contrib/recipes/ — verified upgrades ready to adapt: React major, a bot's dependency bump, Django major, Node LTS
  • docs/UPGRADE-BOT.md — drive Renovate/Dependabot PRs with a loop: locally, by cron, or in CI via contrib/action
  • docs/INTEGRATIONS.md — reach lute from Claude Code and Codex: every verb as an MCP tool (lute mcp) plus a skill/AGENTS.md that hands walk-away tasks to lute
  • docs/ATTESTATION.md — the verdict as a verifiable, signable artifact
  • THREAT_MODEL.md — what holds uncaged, what needs the cage, what is out of scope
  • INVARIANT.md — the one claim everything defends, mapped to the notches that enforce it

The Loop

Lute is a process, not a prompt library:

Check -> Work -> Verify -> Commit -> Repeat -> Escalate

  • Check first: if done_when already passes, no agent wakes up
  • Work in fresh iterations: each run gets the failing output and the loop task on stdin
  • Verify outside the agent: Lute re-runs done_when itself
  • Commit every attempt: inspect, diff, bisect, land, or throw away the branch
  • Escalate when needed: budgets, gates, merge conflicts, and missing decisions produce INBOX/ cards instead of fake success

Nested loops close from the inside out. A parent closes only when its children have closed and its own check passes. There is no if/else, depends_on, or expression language. Order plus shell exit codes are the control flow.

Nested Loops Scale The Work

This is where Lute gets big without getting complicated. A loop can contain loops, and those loops can contain loops. The same rule applies at every level: children close first, then the parent exam proves the integrated result.

That means a repo-sized migration can become a set of smaller exams instead of one giant prompt. Each loop can have its own task, check, budget, confirm streak, gate, protected files, and even its own agent. The root stays honest by running the final exam for the whole project.

loop: billing-migration
agent: claude -p --permission-mode acceptEdits
budget: 72h
done_when: "pytest tests/billing tests/api && npm test"
loops:
  - loop: data-model
    budget: 30 runs
    done_when: "pytest tests/billing/db"
    loops:
      - loop: ledger-schema
        task: Add the new ledger tables and migrations.
        done_when: "python scripts/check_schema.py"
        protected: ["scripts/check_schema.py"]
        budget: 8 runs
      - loop: backfill
        task: Write the idempotent backfill and its tests.
        done_when: "pytest tests/billing/test_backfill.py"
        budget: 10 runs

  - loop: billing-api
    budget: 30 runs
    done_when: "pytest tests/api/billing"
    loops:
      - loop: invoice-endpoints
        task: Move invoice reads and writes onto the new ledger.
        done_when: "pytest tests/api/billing/test_invoices.py"
        budget: 10 runs
      - loop: webhooks
        task: Preserve webhook behavior through the migration.
        done_when: "pytest tests/api/billing/test_webhooks.py"
        confirm: 2
        budget: 10 runs

  - loop: billing-ui
    budget: 20 runs
    done_when: "npm test -- --run billing"
    loops:
      - loop: invoice-screen
        task: Update the invoice UI for the ledger-backed API.
        done_when: "npm test -- --run invoice-screen"
        budget: 8 runs
      - loop: admin-reporting
        task: Keep admin reports consistent with the migrated data.
        done_when: "npm test -- --run admin-reporting"
        budget: 8 runs

The scale claim is real, but bounded. Lute is not literally infinite: your runtime, repo size, git operations, checks, agents, and patience are all finite. The scalable part is that the runner does not need a new abstraction when the work gets larger. If the job can be decomposed into independently checkable milestones, you can keep nesting the same primitive and let each child loop close under its own proof.

DAG Planning, Lute Output

lute plan --dag "<goal>" is an authoring aid for complicated plans. The planner first identifies checkable milestones and prerequisite edges, then compiles that reasoning back into normal Lute YAML: list order for sequence, nesting for integration, shell checks for conditions, and parallel: true only for independent direct siblings with disjoint files/resources.

The final lute.proposed.yaml never gains depends_on, dag, nodes, edges, Mermaid, Markdown plans, or a graph scheduler. It is the same contract as a hand-written lute.yaml: children close first, the parent proves the merged result, and the root exam proves the whole goal. Use --keep-dag when you want to inspect the planner's lute.plan.yaml review artifact; Lute still runs only the compiled proposal after you review and rename it.

The Commands

verb what it does
lute init scaffold a lute.yaml and .lute/ (or lute init --skill to write a local copy of the packaged luteloops skill)
lute lint [file] validate the schema, resolve uncaged agents (caged agent commands are not resolved on the host), and execute every done_when once except caged judge checks, which are reported as skipped; classify each pass / fail / error / not-yet / skipped; an error fails the lint, because an exam must be administrable before work begins
lute run [root-id] run loops depth-first until everything is green (--agent CMD, --file F, --plain, --bg to detach, --dry-run to preview, --skip-if-running for cron overlap); child loops run through their parent
lute once --until C -- "task" one-shot, no file: run an agent until check C passes (--agent, --id, --budget)
lute watch [file] read-only event snapshot for a running or finished run (--snapshot text, --json machine-readable, --filter LOG de-noise a transcript)
lute status [file] re-run each check once for loops without an unanswered card and print the loop hierarchy: ✔ done / ↻ running / ⏳ waiting / ✗ blocked / ✋ gated, plus cumulative agent time
lute inbox list every blocked/gated loop with the exact command to answer it
lute answer <loop> "..." reply to a card in INBOX/; blocked-loop answers are injected into the next run, while gated loops seal only on exact approve
lute judge -- "<rubric>" run the configured judge once against HEAD's diff — the same oracle behind done_when: "judge: ..."; exits 0 only when the grader's first line is PASS
lute attest [--verify F] emit (or verify) an in-toto statement binding a closed run's verdict to git-immutable facts: the pinned exam's hash, base + head SHAs, run count, and an event hash chain (docs/ATTESTATION.md)
`lute quarantine [list diff
lute stop cleanly stop the active run (and any parallel children) in this repo
lute land [branch] merge lute/<root> into the start branch only if the root exam still passes against the merged tree; conflict or a failed re-check aborts clean and escalates (--file F; opt-in — the default is review-then-merge-yourself)
lute plan [--dag] [--keep-dag] "<goal>" an agent reads the luteloops skill and drafts lute.proposed.yaml; --dag uses dependency planning first, and --keep-dag also writes lute.plan.yaml for review (--agent CMD picks the drafting agent, as it does for lint's resolution pass)

Plus lute cron sync / lute cron remove for the schedules: manifest (below), and lute --help / lute <verb> --help / lute --version.

What Makes It Safe To Let Run

feature why it matters
Machine checks "Done" means a command exited 0, not that the model sounds confident
Budgets Cap loops by run count or wall-clock time; stuck agents become cards
Nested loops Turn huge goals into independently checkable milestones
Journals Keep short memory across fresh agent processes
Confirm streaks Require multiple consecutive passes for flaky checks
gate: human Pause before deploy, publish, migrate, send, or other irreversible steps
protected: Quarantine edits to exam materials before they can buy a pass or enter the run commit
cage: Run model-facing commands in a container with explicit mounts
parallel: true Run independent child loops in separate worktrees, then integrate
watch --json Stable machine-readable status for wrappers, cron, dashboards, and scripts

Contracts (for scripting lute / bringing your own agent)

Exit codes: a wrapper branches on these:

code meaning
0 all loops closed (or landed)
1 usage / precondition (bad invocation, missing file, dirty tree)
2 internal/git error
3 blocked: a loop hit its budget or a parallel/land merge conflicted; see lute inbox, then lute answer
4 gated: a passing loop is awaiting human approval (lute answer <loop> approve)

For a detached or cron run the exit code reaches no one; read lute watch --snapshot --json, a pure projection of events (no rechecks). Its shape is stable:

{
  "root": "build",            // root loop id
  "outcome": "blocked",       // running | closed | blocked | gated: the canonical verdict
  "exit": 3,                  // matching exit int; null while outcome is "running"
  "ended": false,             // true only after a run_end event; blocked/gated runs may halt first
  "branch": "lute/build",
  "tree": { "id": "build", "depth": 0, "mark": "✗", "word": "blocked", "runs": 2, "secs": 41.0,
            "active": false, "children": [ /* same shape, recursively */ ] },
  "cards": [ { "lid": "build", "gated": false, "answered": false, "summary": "BLOCKED: ...", "next": "lute answer build \"...\"" } ]
}

Match on outcome (and per-node word), not the per-node mark glyph; mark is presentational and may change. exit is null while outcome is running, then the integer code once it halts.

The agent contract

Any CLI is a valid engine if: it reads the prompt on stdin, makes its edits in the working directory, and exits. The runner stages tracked changes plus new files created during that run; it leaves pre-existing untracked clutter and INBOX/ cards alone. The agent's exit code is logged but never trusted; the only verdict is the runner re-running done_when. That is why lute can't lie about doneness, and why your wrapper need not produce a meaningful exit code.

The two engine gotchas. Most agent CLIs are interactive by default, and a non-interactive one still enforces its own permissions: without edit rights it exits happily having changed nothing. Lute names such runs — no edits on the console line, "edits": false in the event stream, and after two in a row the escalation card asks whether the agent command is non-interactive and allowed to edit files. Known-good spellings:

codex exec --sandbox workspace-write        # reads stdin, edits the workspace
claude -p --permission-mode acceptEdits     # non-interactive, file edits allowed

Give the engine the rights its task needs: a dependency upgrade must run the package manager, so an edits-only mode is not enough there (claude -p --dangerously-skip-permissions is full power and belongs inside cage: docker; codex exec gains network inside its sandbox only with -c sandbox_workspace_write.network_access=true). Lute pipes the prompt and reads the tree; it cannot grant your agent permissions.

A circular exam can still pass by echoing the task string back into a file, so it measures obedience rather than behavior. Prefer tests, builds, protected fixtures, or ground truth the worker does not author.

State ownership: normal repo content outside .lute/ and INBOX/ is agent-owned work product. Runner-owned state is .lute/config.yaml, .lute/ledger.jsonl, .lute/events.jsonl, .lute/lock, .lute/journal/*, .lute/logs/*, .lute/wt/, and INBOX/*. Before writing events, ledger entries, logs, cards, or lock files, lute repairs those paths as real files/directories and never follows agent-created symlinks. Journals are prompt memory: agents append to them by contract, but budget and closure decisions never trust journal contents. If an agent deletes .lute/logs, symlinks the ledger to /dev/null, truncates it, or rewrites durations, the runner restores trusted state and budget accounting continues from the ledger snapshot plus committed run history.

Knobs and handshakes (env): two are yours to set: LUTE_CHECK_TIMEOUT (seconds, default 600) caps every check and judge run, and LUTE_KEY_DIR relocates the answer-auth key directory (default ~/.lute/keys). The rest are the runner's own plumbing, exported to the processes it spawns: LUTE_SLOT numbers parallel children (1, 2, 3…) so their checks can dodge collisions, LUTE_TRUSTED_BASE pins the commit that exams and judges are audited against, and LUTE_STATE_DIR points a parallel child at its own state directory. Read them in checks freely; set them yourself only in tests.

Write Your First lute.yaml

Upgrade React, the lute way: write the exams, then let the loop grind:

# lute.yaml
loop: react-19
# An upgrade must run npm, so the engine needs command rights, not just edit
# rights - full power belongs inside cage: docker (see the engine gotchas).
agent: claude -p --dangerously-skip-permissions
budget: 48h
done_when: "npm test && npm run build"
loops:
  - loop: bump-react             # if-trick: skipped when already on 19
    task: Upgrade react and react-dom to ^19 in package.json, npm install.
    done_when: "node -e 'process.exit(require(\"react/package.json\").version.startsWith(\"19\")?0:1)'"
    budget: 3 runs
  - loop: fix-build
    task: Fix every build error from the upgrade. No downgrades, and no
      @ts-ignore/@ts-expect-error/@ts-nocheck to silence errors.
    done_when: "npm run build"
    budget: 15 runs
  - loop: fix-tests
    task: Repair tests broken by the upgrade. Never delete or skip a test.
    done_when: "npm test"
    confirm: 2
    budget: 15 runs

Then:

lute lint     # caged judge checks are reported as skipped; caged agents/judges are image-local
lute run      # grinds on branch lute/react-19, one commit per iteration
lute status   # ✔ done / ↻ in progress / ◌ untouched

If a loop exhausts its budget you get INBOX/<loop>.md and exit code 3; reply with lute answer fix-tests "the snapshot tests are obsolete; regenerate them" and run again. Writing good loops is a skill, literally: see luteloops/SKILL.md, which lute plan injects into its drafting prompt after a bounded repo briefing.

On cost: budget caps iterations (N runs) and wall-clock (48h); never tokens or dollars; lute can't see your agent's API spend and does not accept dollar budgets. lute status reports cumulative runs and agent time so you can eyeball consumption, and lute inbox shows what's waiting on you. Set a tight runs budget before walking away if the bill matters.

Unattended runs

Start it, walk away, get pulled back only when it needs you:

  • Detach: lute run --bg returns immediately; the run lives in its own session and survives the terminal closing; re-attach with lute watch, end it with lute stop
  • Get notified: set on_halt: in .lute/config.yaml to your own notifier; it fires the instant a loop blocks or gates, with $LUTE_LOOP, $LUTE_REASON (blocked/gated) and $LUTE_CARD in the environment (fire-and-forget; a failing hook never breaks the run):
    # .lute/config.yaml
    on_halt: 'curl -fsS -d "$LUTE_LOOP $LUTE_REASON" https://ntfy.sh/your-topic'
    
  • Come back to it: lute inbox lists what's waiting and the exact lute answer to type; lute watch --snapshot --json is a stable surface for a wrapping script

Run State And Watch

Runs write files; renderers read files. Every agent run's full transcript lands in .lute/logs/<loop>.run<N>.log (tail -f works mid-run), and the runner appends one JSON event per line to .lute/events.jsonl.

In a real terminal, lute run detaches into its own session and prints the process id plus the follow-up commands:

detached: run continues (pid N) · re-attach: lute watch · stop: lute stop

lute run --bg takes the same detached path explicitly; output from the runner itself lands in .lute/logs/runner.log, which is handy for cron and scripts. Use lute run --plain when you want a foreground process that streams one compact line per event and exits with the run result.

lute watch is read-only and replay-only: it renders the current loop hierarchy once from .lute/events.jsonl, without re-running checks. lute watch --json emits the same replay state for wrappers, dashboards, and cron probes. To inspect the active agent transcript, tail the log path named by the stream or event file; lute watch --filter .lute/logs/<loop>.run<N>.log prints that log with repeated blocks collapsed to a single copy with a ×N marker. Logs, events, worktrees, and the run lock are runner-owned runtime state and stay out of your commits; journals and the ledger are durable runner files committed by Lute after a run, with ledger writes repaired and authenticated through the state store.

Parallel siblings (parallel: true)

By default children run sequentially, in document order. When independent child loops each take real time, mark their parent parallel: true and all of its direct children run at once, each in its own git worktree on its own branch, as a separate lute run process:

loop: ship-services
done_when: "./integration-test.sh"   # the parent exam IS the integration check
parallel: true
loops:
  - loop: api          # the three run concurrently, each in .lute/wt/<root>__<id>
    task: Build the API. Bind PORT=$((3000+LUTE_SLOT)).
    done_when: "cd api && npm test"
  - loop: web
    task: Build the web app.
    done_when: "cd web && npm test"
  - loop: worker
    task: Build the worker.
    done_when: "cd worker && npm test"

Isolation is a worktree per child; reconciliation is git merge as each closes. Children must be genuinely independent: touching disjoint files. Non-overlapping edits auto-merge; a real conflict is not auto-resolved: the run halts with an escalation card naming the conflicting files and loops, the parent branch left clean, exit 3; make the edits disjoint (or merge by hand) and re-run. If a child escalates or gates instead of closing, the parent collects all children to a stopping point, relays their cards, merges none, and exits with the most severe child code. After every child merges cleanly, Lute re-runs each direct child done_when once against the merged tree before the parent can close. If a child invariant was broken by the merge, the failure becomes the parent loop's next repair prompt. The parent still runs its own done_when on the integrated tree, so write the parent exam to cover cross-child behavior that no child owns alone.

LUTE_SLOT (1, 2, 3… per child) lets checks dodge collisions: a per-slot port (PORT=$((3000+LUTE_SLOT))) or scratch path. A run is crash-durable by re-derivation: git worktree list and the child branches are the state, so a re-run skips children whose work is already merged and resumes the rest in their worktrees. Only one top-level lute run may be active per repo; a .lute/lock (pid + start) guards it, and a lock whose pid is dead is stale and reclaimed.

Watchers (exit 75 = "not yet")

A check has three honest answers, not two: exit 0 is pass, exit 75 is "not yet": nothing is wrong, nothing is done, ask me later. Anything else is fail. On a not-yet the runner wakes no agent and spends no run budget; it sleeps check_every (a new optional per-loop field: 30s, 5m, 2h; default 60s) and re-asks. Because run budgets do not tick while waiting, any loop whose check returns 75 must have an s/m/h time budget, and check_every must be positive. lute lint errors when a dry-run returns 75 without a time cap, and lute run escalates immediately instead of hanging. Only a real failure's output ever rides into an agent prompt; silence is not evidence.

A check or judge: command that exceeds Lute's per-check timeout counts as a failed exam, not an internal crash. The loop follows the normal budget path and eventually opens an INBOX/ card with the timeout in the failure tail.

loop: deploy-quiet
task: Investigate and fix whatever broke the deploy.
done_when: "./checks/quiet.sh"   # 0 quiet 24h · 75 waiting · 1 alerts found
check_every: 30m
budget: 48h

Combine the trio: a not-yet check, lute run --bg, and lute cron sync, and lute is a monitor that costs ~nothing while things are healthy and spends exactly one agent run per real problem. The event stream, watch, and plain mode show waiting loops as ⏳ (⏳ deploy-quiet: not yet · next check in 30m), and lute lint classifies a capped 75 dry-run as not_yet: a valid, lint-passing outcome.

Gates (gate: human)

Some loops precede an irreversible verb: deploy, publish, send, migrate, and must not auto-close on a green check. Mark them gate: human: when such a loop would close, the run pauses with a READY card in INBOX/ and exit 4 ("needs approval", distinct from exit 3's "needs help"). The canonical pattern is a checkpoint between preparing and acting; list order does the rest:

loops:
  - loop: release-ready
    done_when: "npm run build && test -f CHANGELOG.md"
    gate: human                  # ✋ READY card, exit 4
  - loop: ship                   # runs only after your approval
    task: npm publish, then verify.
    done_when: "npm view mypkg version | grep -qx 1.4.0"

Because approval is authenticated by a key under ~/.lute/keys/, gate: human requires a configured cage:. Without a cage, the agent runs as your user and can read the answer-auth key, so it can forge the approval token. lute lint reports this as an error, and lute run refuses a gated manifest before work starts. Use the cage even if the host has no other secrets; for gates, it is the trust anchor, not just a convenience. This is a different trust base from exam-pass integrity: host-side checks and protected exams cover ordinary uncaged runs, and Lute reaps the agent's process group before checking, but uncaged code is still host code and can deliberately daemonize outside that group. Use a real cage for adversarial containment; human approval and answered-card budget refresh require it. The full contract — which guarantee holds caged versus uncaged, and what is explicitly out of scope — is stated in THREAT_MODEL.md.

Approve with lute answer release-ready approve; after trimming whitespace, the answer text must be exactly approve to seal. On the next run the exam is re-verified once before sealing; if the world moved overnight the card is marked SUPERSEDED and the loop takes the normal fail path. To reject, answer with anything else; Lute records the note, does not inject it as repair guidance, and does not unlock the following loop. Change whatever needed changing and re-run; the gate re-examines the new state. A gated loop waiting for you is exempt from time-budget expiry: the limit exists for unattended waiting, and a gate is attended by definition.

Protected exams (protected:)

An agent runs as you, with your toolchain, so a careless or adversarial one can pass an exam by editing the exam instead of doing the work: delete the failing test and done_when goes green. List the exam's own materials under protected:: globs, relative to the repo root, matched against the working tree (so untracked check scripts count too):

loop: make-it-pass
task: Make the suite pass. Do not touch the tests.
done_when: "pytest -q"
protected:
  - "tests/**"
  - "checks/**"
budget: 20 runs

Lute compares those globs to their committed originals at the branch base. Before and after checks, and after each agent run, it re-hashes the watched materials plus trusted control files (the active manifest, whether lute.yaml or --file, and .lute/config.yaml). If an agent modifies, deletes, or newly adds a watched file, Lute saves the attempted edit under .lute/quarantine/<id>/, restores the trusted copy, and leaves the quarantined edit out of the normal run commit. The next prompt names the quarantine record so the agent can fix the actual work instead of redefining the exam. Inspect records with:

lute quarantine
lute quarantine diff <id>
lute quarantine drop <id>     # or: lute quarantine drop --all

The guard is opt-in per loop for protected: exam materials, while the active manifest and config file are trusted control inputs. lute once is fileless, so a committed lute.yaml is ordinary work there unless it is also listed under protected:. done_when checks still run host-side; this protects the exam's materials and control inputs from model-facing commands. lute lint warns when a protected: glob matches nothing and when an inferable local check file is not covered by protected:.

The cage (cage:)

By default an agent shares your filesystem and can read ~/.ssh. Set cage: docker in .lute/config.yaml and every command lute runs on behalf of a model runs inside a container: agents see only your repo (read-write at /work) and whatever you name explicitly; judges see only an empty working directory and the stdin diff. done_when checks stay on the host (they're yours and need your toolchain):

# .lute/config.yaml
cage: docker                # or a custom template (podman) using {repo} {image} {cmd} {mounts}
cage_image: my-agent-cage   # YOUR image; it must contain your agent CLI
cage_mounts:                # extra host paths, mounted read-only, by name
  - "~/.config/my-agent"    # agent auth enters here; never implicitly

A done_when: "judge: <rubric>" runs the same oracle you can invoke by hand as lute judge -- "<rubric>" (the check runner resolves the prefix to that grader in-process rather than through your shell). Either way the judge keeps an ordinary check's contract: the verdict is an exit code, like every other done_when. The judge sends the rubric as trusted instructions and wraps the candidate diff inside BEGIN UNTRUSTED DIFF / END UNTRUSTED DIFF markers. The judge runs in an empty working directory and grades the stdin diff only; if a grader must read or execute working files, make that a normal shell done_when instead. The judge is told to treat diff content as evidence only, never as instructions, and closes only when stdout's first line is exactly PASS and the command exits 0.

The prompt still flows on stdin; output still lands in the same per-run log. Secrets policy is absence: nothing of the host is visible except the repo and what cage_mounts names, so ~/.ssh and your environment simply aren't there. The image is yours to build: it must contain your agent CLI, and auth enters read-only through cage_mounts, by name, never implicitly. contrib/cage/Dockerfile is a worked Codex sample (node:20-slim + @openai/codex + git). The built-in cage: docker template can still reach the network so model CLIs can call their APIs. Custom cage templates are operator-owned: if egress isolation matters, supply your own template with the network policy you want, for example a Docker template that includes --network none.

The same isolation protects Lute's own answer-auth key. Answered cards can refresh a loop's budget once, and gated cards seal human approval. If agents are uncaged, those mechanisms are useful operator workflow, not adversarial security boundaries; lint warns for budget-refreshable loops and errors for human gates. A no-op template such as sh -lc {cmd} is still uncaged in practice: it runs as you and can read the answer-auth key.

Schedules (cron, not a daemon)

A top-level schedules: section is a manifest, never a runtime:

schedules:
  - run: react-19        # root-level loops only
    at: "0 9 * * *"

lute cron sync compiles it into a managed block in your crontab (# BEGIN lute <repo> … # END lute <repo>), idempotent and removable with lute cron remove. Each tick is a fresh lute run --skip-if-running <root-id>: if a previous tick still holds the repo lock, the new tick exits 0 without starting duplicate work. Loops themselves never gain a time field. Note: cron jobs run with a minimal environment; make sure your agent CLI is on cron's PATH, and check mail (or wrap the entry) for tick output.

What's deliberately not here

The initial release is the small durable primitive: foreground, branch-only, fast-check-first (parallel siblings are opt-in per parent, but a lone loop still runs as one plain process). lute plan --dag does not add runtime DAG syntax, automatic graph scheduling, or a depends_on manifest key. The verdict cache, cron-resumed ticks on an always-on box, merge gates, agent-resolved merge conflicts, registry, and richer cage policy are deliberately outside the initial release. They should enter only when a real loop fails without them, and only if they add no required fields.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

luteloops-0.2.1.tar.gz (498.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

luteloops-0.2.1-py3-none-any.whl (121.6 kB view details)

Uploaded Python 3

File details

Details for the file luteloops-0.2.1.tar.gz.

File metadata

  • Download URL: luteloops-0.2.1.tar.gz
  • Upload date:
  • Size: 498.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for luteloops-0.2.1.tar.gz
Algorithm Hash digest
SHA256 348f092191993cdcc89bf278cca9251d52a2471af8a552cd07ee8697b4d18711
MD5 191a609f4a4962df340ede28d6dcc75d
BLAKE2b-256 690cd7ef85d7653cfec6d1933a3b9becb25fd4895f89b20b56edf4c626bd916d

See more details on using hashes here.

Provenance

The following attestation bundles were made for luteloops-0.2.1.tar.gz:

Publisher: publish.yml on jethros-projects/luteloops

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file luteloops-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: luteloops-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 121.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for luteloops-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 009ce5f634534cc75cdd295add1bf0fde3829628a11e164f547cb5e362aee058
MD5 0b7e3c1ef965aa73bb9dfd475b55e61f
BLAKE2b-256 420852e54e1c8a879c91078f427116740232fc0d4d9d8dec445c946ecd5c0688

See more details on using hashes here.

Provenance

The following attestation bundles were made for luteloops-0.2.1-py3-none-any.whl:

Publisher: publish.yml on jethros-projects/luteloops

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page