Fail-soft runner for YAML-defined agent workflows — drives the Claude CLI through a workflow graph unattended for days.
local-worker
A Dockerized agent controller that runs YAML-defined workflows using the Claude CLI. Each workflow is a graph of agent, script, and branch nodes. The controller walks the graph, renders Jinja2 prompts, invokes Claude or shell scripts, extracts JSON outputs, and writes run artifacts.
Intent
The local-worker exists to run long, multi-step agent workflows unattended — the design target is a single run that survives for a week without a human babysitting it. That goal drives the two defining properties of this tool:
- Resilience is the default, not a mode. A single flaky node (an empty
Claude response, a rate limit, a spending cap, an unparseable output) must
never crash the whole run. The runner retries transient failures, reframes the
prompt, and finally defaults a node's outputs so the graph advances to its
next node rather than aborting. See docs/GUARDRAILS.md for the full recovery ladder and its tuning knobs.
- Reproducibility and isolation. The agent works against its own clones inside the container (never a host working tree), all state lives in persistent named volumes, and every step is recorded as a run artifact. A run can be resumed from its checkpoint after a crash or reboot.
It is repository-agnostic: the same image runs any workflow against any repo a
workflow's setup.sh chooses to clone.
Prerequisites
- Docker Desktop (or Docker Engine + Compose plugin)
- A logged-in Claude subscription on the host (~/.claude/.credentials.json present, i.e. you have run claude and authenticated). This is the default auth path and matches what your interactive Claude CLI uses.
No Python, uv, or Claude CLI installation is required on the host — everything runs inside the container.
Authentication
By default the worker uses your Claude subscription. At startup
entrypoint.sh seeds ~/.claude/.credentials.json from the host (mounted
read-only) into the persistent claude-state volume once; the CLI then
refreshes/rotates the token in-volume across runs and reboots. A minimal
~/.claude.json onboarding stub is written so headless runs don't prompt.
Alternatives:
- Long-lived OAuth token: run claude setup-token on the host and export CLAUDE_CODE_OAUTH_TOKEN before run.sh (or put it in a .env beside compose.yaml). This skips the credentials-file seed.
- Bedrock: uncomment the CLAUDE_CODE_USE_BEDROCK / AWS_PROFILE env and the ~/.aws mount in compose.yaml.
To re-seed credentials after re-authenticating on the host, clear the
claude-state volume (docker volume rm local-worker_claude-state).
Quick start
# From this directory
./run.sh ../workflows/hello-world
run.sh resolves the workflow path to absolute, validates that workflow.yaml exists, and launches the container via compose.yaml. Calling it with no arguments prints the available workflows.
Running any workflow
./run.sh <path-to-workflow-dir> [docker compose flags]
# Examples
./run.sh ../workflows/story-coder
./run.sh ../workflows/refactor
./run.sh ../workflows/delphi-ci
# Force a full image rebuild
./run.sh ../workflows/hello-world --build
# Workflows installed into a target repo by install.py
./run.sh /path/to/repo/.agents/workflows/story-coder
The workflow directory must contain a workflow.yaml file. Any prompts/ and scripts/ subdirectories are mounted alongside it and are accessible from within the container.
Environment variables
| Variable | Default | Description |
|---|---|---|
| WORKFLOW_DIR | (required, set by run.sh) | Absolute path to the workflow directory |
| CLAUDE_CODE_OAUTH_TOKEN | (unset) | Optional long-lived OAuth token (claude setup-token); skips the credentials-file seed |
| AGENT_RUNS_DIR | /runs | Where to write run artifacts (set to the persistent runs volume by compose.yaml) |
| AGENT_CLI | claude | Which agent CLI drives the run: claude, codex, or copilot. Overridden by --cli. See Choosing the agent CLI backend |
| AGENT_MODEL | (unset) | Overrides every node's model for the run (a node's own model: still wins). Interpreted by the active backend |
| CODEX_PROFILE | (unset) | Run-level default codex config profile (e.g. openrouter, local). A node that names its own profile wins. Codex only |
| AWS_PROFILE | default | AWS profile (only when using the Bedrock alternative) |
Choosing the agent CLI backend
The controller drives one agent CLI per run, behind a backend facade
(workhorse/runner/backends.py). Selection is per-run, not per-node:
./run.sh ../workflows/story-coder # claude (default)
AGENT_CLI=codex ./run.sh ../workflows/story-coder
AGENT_CLI=copilot ./run.sh ../workflows/story-coder
# Direct controller invocation also accepts --cli {claude,codex,copilot}
| Backend | CLI | Default model | In-place compaction |
|---|---|---|---|
| claude | claude -p (stream-json) | sonnet | yes (/compact) |
| codex | codex exec --json | CLI/profile default | no (ladder reframes on overflow) |
| copilot | copilot -p --output-format json | CLI default | no (ladder reframes on overflow) |
Node model selection
A node's optional model: field is interpreted by the active backend. When unset,
the backend's own default applies (so workflows need not hard-code a Claude alias):
nodes:
- id: lead_review
type: agent
model: opus # claude: alias; codex: a config profile (see below)
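The precedence described above (a node's own model: first, then AGENT_MODEL, then the backend default) can be sketched as follows. This is a minimal illustration, not the controller's code: the function name resolve_model and the BACKEND_DEFAULT_MODEL mapping are hypothetical, and the defaults are taken from the backend table.

```python
from __future__ import annotations

import os
from typing import Mapping

# Defaults as listed in the backend table; None means "let the CLI decide".
# (Hypothetical mapping for illustration only.)
BACKEND_DEFAULT_MODEL = {"claude": "sonnet", "codex": None, "copilot": None}


def resolve_model(
    node_model: str | None,
    backend: str,
    env: Mapping[str, str] | None = None,
) -> str | None:
    """Pick the model for one node: node field, then AGENT_MODEL, then backend default."""
    env = os.environ if env is None else env
    if node_model:                      # a node's own model: always wins
        return node_model
    if env.get("AGENT_MODEL"):          # run-level setting
        return env["AGENT_MODEL"]
    return BACKEND_DEFAULT_MODEL[backend]
```

The returned string is still interpreted by the active backend, so the same value can mean a Claude alias on one backend and a codex profile on another.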
Codex config profiles (<profile>@<model-slug>)
For the codex backend, model: selects a codex config profile
(from ~/.codex/config.toml) — which bundles provider, auth and a pinned model —
plus an optional model override, written as <profile>[@<model-slug>]. @ is the
delimiter because / and : already appear inside model slugs:
| model: value | Resulting codex flags |
|---|---|
| local | --profile local (the profile pins the model) |
| openrouter@deepseek/deepseek-chat-v3.1 | --profile openrouter -m deepseek/deepseek-chat-v3.1 |
| openrouter@ | --profile openrouter |
| @gpt-5.5 | -m gpt-5.5 (no profile; falls back to CODEX_PROFILE) |
| (unset) | CODEX_PROFILE if set, else codex's own default |
CODEX_PROFILE is the run-level default; a node's own <profile>@… always wins.
This lets one workflow tier per node — e.g. a lead node on
openrouter@anthropic/claude-sonnet-4.5 and bookkeeping nodes on local (a local
Qwen server) — the same way Claude nodes tier across opus/sonnet/haiku.
nodes:
- id: lead_review
type: agent
model: openrouter@anthropic/claude-sonnet-4.5
- id: record
type: agent
model: local # the local profile's pinned model
Profiles live in ~/.codex/config.toml. Each names a model_provider (base_url + env_key) and a model; codex 0.128+ requires wire_api = "responses".
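To make the <profile>[@<model-slug>] mapping concrete, here is a small sketch that reproduces the rows of the flags table above. The function name codex_flags is hypothetical; only the --profile and -m flags and the CODEX_PROFILE fallback come from this README.

```python
from __future__ import annotations


def codex_flags(model: str | None, codex_profile: str | None = None) -> list[str]:
    """Translate a node's `model:` value into codex flags.

    `codex_profile` stands in for the CODEX_PROFILE environment variable.
    """
    # Split on the first "@": "/" and ":" may appear inside the model slug.
    profile, _, slug = (model or "").partition("@")
    profile = profile or codex_profile or ""   # node profile wins over run-level default
    flags: list[str] = []
    if profile:
        flags += ["--profile", profile]
    if slug:
        flags += ["-m", slug]
    return flags
```

An empty result means no flags are passed, so codex falls back to its own default.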
Mounts and volumes
| Source | Target | Type | Purpose |
|---|---|---|---|
| ~/.claude/.credentials.json | /mnt/claude-credentials.json | bind, read-only | Subscription auth; seeded into claude-state once at startup |
| ~/.claude/settings.json | /mnt/claude-settings.json | bind, read-only | Optional host Claude config (commented out by default) |
| $WORKFLOW_DIR | /workflow | bind | Workflow definition (yaml, prompts, scripts) |
| workspace volume | /workspace | named volume | Agent working tree (repo clones, branches, and commits); persists across reboots |
| claude-state volume | /claude-state | named volume | Claude sessions + seeded credentials + onboarding stub; persists across reboots |
| runs volume | /runs | named volume | Run artifacts; persists across reboots |
Persistence across reboots
All three named volumes (workspace, claude-state, runs) persist across
container restarts and host reboots, so the agent's work is never lost when the
container stops:
- workspace holds the cloned repo and the agent's committed branch (e.g. hrnet-research/auto). Even if a push out of the container fails, committed work survives here. (A workflow's setup.sh typically runs git reset --hard on the base branch on re-run, so commit work to a side branch, as the workflows do.)
- claude-state keeps Claude session history and the refreshed auth token, isolated from your host installation. (Note: each node runs with a clean context, as described in "Sessions" under Development, so this is not one growing cross-node conversation.)
- runs keeps all run artifacts.
Resuming and run identity
The controller is auto-resume-in-place by default. Each (workflow, run-id)
pair maps to one stable run dir (<workflow>-<run-id>, run-id defaults to
default). On start the controller looks for a checkpoint there:
- No checkpoint → start fresh from the start node in that dir.
- Checkpoint present → resume from the checkpointed node, restoring the saved context. A node that finished but didn't advance the cursor (killed in the gap) is fast-forwarded past rather than re-run, so side effects like git commits aren't duplicated.
This is what lets an unattended run survive a crash or reboot: relaunching the
same workflow continues where it left off. To start over, delete the run dir (or
the runs volume). To keep independent runs of the same workflow side by side,
pass distinct run ids.
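The start-or-resume decision can be pictured with the sketch below. It is an assumption-laden illustration: the checkpoint file name (checkpoint.json), its node_id/context keys, and both function names are hypothetical, and the fast-forward of an already-finished node is not modeled.

```python
from __future__ import annotations

import json
from pathlib import Path

CHECKPOINT_NAME = "checkpoint.json"  # hypothetical file name


def run_dir_for(runs_dir: Path, workflow: str, run_id: str = "default") -> Path:
    """One stable directory per (workflow, run-id) pair."""
    return runs_dir / f"{workflow}-{run_id}"


def starting_point(run_dir: Path, start_node: str, initial_vars: dict) -> tuple[str, dict]:
    """Return the (node id, context) the walk should begin from."""
    checkpoint = run_dir / CHECKPOINT_NAME
    if not checkpoint.exists():
        # No checkpoint: fresh start from the workflow's start node.
        return start_node, dict(initial_vars)
    # Checkpoint present: resume at the saved node with the saved context.
    saved = json.loads(checkpoint.read_text())
    return saved["node_id"], saved["context"]
```

Because the directory name depends only on the workflow and run id, relaunching the same command finds the same checkpoint, while a different run id gets an independent directory.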
Controller flags (passed to workhorse; --resume-* are manual overrides
of the auto behavior above):
| Flag | Purpose |
|---|---|
| --run-id <id> | Name the stable run dir (<workflow>-<id>); defaults to default |
| --resume-run <path-or-name> | Resume a specific run dir from its checkpoint |
| --resume-latest | Resume the most recent unfinished run under --runs-dir |
| --params '<json>' / --params-file <path> | Override workflow vars on a fresh start |
"Survives reboot" therefore covers both the work products (commits, sessions, artifacts) and graph position — an interrupted graph auto-resumes mid-run.
Run artifacts
Each workflow execution writes a timestamped directory:
runs/
└── <workflow-name>-<timestamp>-<id>/
├── run.json # start/end time, terminal state
├── context.json # final context snapshot
├── <step-id>/
│ ├── prompt.md # rendered Jinja2 prompt sent to Claude
│ ├── output.json # extracted JSON outputs
│ └── context_after.json # context state after this step
└── <branch-id>/
└── branch.json # { path, value, next }
compose.yaml sets AGENT_RUNS_DIR=/runs so artifacts are written to the
persistent runs named volume (they survive reboots and don't pollute the
host working tree). To pull them out, copy from the volume — e.g. from the
assembler repo: make research-artifacts.
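Once a run directory has been copied out of the volume, a short script can summarize what each step produced. The sketch below assumes only the layout shown above (one subdirectory per step containing output.json); the function name step_outputs is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def step_outputs(run_dir: Path) -> dict[str, dict]:
    """Map each step id to the parsed contents of its output.json.

    Directories without output.json (for example branch nodes, which
    write branch.json instead) are skipped.
    """
    outputs: dict[str, dict] = {}
    for step_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        output_file = step_dir / "output.json"
        if output_file.exists():
            outputs[step_dir.name] = json.loads(output_file.read_text())
    return outputs
```

For example, printing step_outputs(Path("runs/<run-dir>")) lists the extracted JSON per step id in name order.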
Repository isolation
The local-worker is repository-agnostic. Never add repo-specific bind mounts to compose.yaml — the agent must work against its own checkout of the target repository, not a host working tree.
If a workflow needs to operate on source code (read, edit, build, test), include a setup.sh script in the workflow directory. The script runs as the first node and clones the required repositories into the container at a known path (e.g. /workspace/<repo>). This ensures:
- The agent always works from a clean, versioned state
- No host working tree is mutated by accident
- The workflow is reproducible on any machine
See workflows/case-dev/scripts/setup.sh for an example.
Resetting state
# Wipe Claude session history + seeded credentials (re-seed auth on next run)
docker volume rm local-worker_claude-state
# Wipe all run artifacts in the volume
docker volume rm local-worker_runs
# Wipe the agent's working tree (clones/commits) — only if you want a clean clone
docker volume rm local-worker_workspace
# Wipe everything
docker compose down -v
Writing a workflow
A workflow is a directory with this layout:
my-workflow/
├── workflow.yaml # Graph definition
├── prompts/ # Jinja2 .md templates
│ └── step.md
└── scripts/ # Shell or Python scripts (must output JSON to stdout)
└── check.sh
workflow.yaml schema:
name: my-workflow
vars:
my_var: "default value" # Initial context variables
start: first_node
nodes:
- id: first_node
type: agent # agent | script | branch | terminal | fail
prompt: prompts/step.md
args:
key: "{{ my_var }}" # Jinja2 — rendered against context before sending
outputs:
- key: result # Extract this key from the agent's JSON response
default: {status: ok} # Optional: emitted if the node exhausts all retries
# (see "Unattended resilience" below). Unset → null.
next: check_result
- id: check_result
type: branch
path: result.status # Dot-path into context
cases:
ok: done
error: done
default: done
- id: done
type: terminal
Branch operators — in addition to cases (equality map), you can use conditions for numeric comparisons:
- id: decide
type: branch
path: result.count
conditions:
- op: ">="
value: "10"
next: bulk_path
default: single_path
Supported operators: ==, !=, <, >, <=, >=.
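The following sketch shows one way a branch node could be evaluated: resolve the dot-path, try the equality cases, then the numeric conditions, then the default. It is an illustration under assumptions, not the code in runner/branch.py: nodes are plain dicts here, both function names are hypothetical, and comparing values as floats is a guess at how a quoted value such as "10" is handled.

```python
from __future__ import annotations

import operator
from typing import Any

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def lookup(context: dict, path: str) -> Any:
    """Resolve a dot-path such as 'result.count'; missing keys give None."""
    value: Any = context
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def evaluate_branch(node: dict, context: dict) -> str | None:
    """Return the id of the next node for a branch node."""
    value = lookup(context, node["path"])
    # 1. Equality map.
    for key, target in node.get("cases", {}).items():
        if key != "default" and str(value) == str(key):
            return target
    # 2. Numeric conditions, first match wins.
    for cond in node.get("conditions", []):
        try:
            matched = OPS[cond["op"]](float(value), float(cond["value"]))
        except (TypeError, ValueError):
            continue  # non-numeric or missing value: this condition cannot match
        if matched:
            return cond["next"]
    # 3. Fallback route.
    return node.get("default") or node.get("cases", {}).get("default")
```

Falling through to the default when the path is missing keeps a malformed agent answer from stalling the graph.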
Agent prompts must output JSON containing the declared output keys:
Do the thing.
Output JSON only:
```json
{"result": {"status": "ok", "count": 5}}
```
Scripts receive Jinja2-rendered args as positional arguments and must print JSON to stdout:
```bash
#!/bin/bash
echo "{\"result\": {\"status\": \"ok\"}}"
```
Unattended resilience (output default)
Because runs are meant to survive a week without supervision, the controller
will, as a last resort, default an agent node's outputs and advance to next
rather than crash when Claude can't be coaxed into a usable answer (after
transient retries and prompt reframing — see docs/GUARDRAILS.md).
The runner is generic and doesn't know what your outputs mean, so you declare
the safe fallback per output via default:
outputs:
- key: decision
default: continue # branch-safe value if this node never answers
- key: review
default: {status: auto_approved}
- key: notes # no default → emitted as null
Choose defaults that keep the graph moving sensibly (e.g. a branch path that
lands on a safe route). An output with no default is emitted as null. To
disable defaulting entirely and hard-fail instead, set
AGENT_USE_DEFAULT_OUTPUTS=false.
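As a rough picture of the retry → reframe → default sequence, consider the sketch below. It is deliberately simplified and is not the logic in runner/agent.py: the function name, the max_retries parameter, and the reframing text are hypothetical, and real backoff, cap handling, and compaction are left out. The use_defaults argument stands in for AGENT_USE_DEFAULT_OUTPUTS.

```python
from __future__ import annotations

from typing import Callable


def run_with_ladder(
    attempt: Callable[[str], dict | None],
    prompt: str,
    outputs: list[dict],
    max_retries: int = 3,
    use_defaults: bool = True,
) -> dict:
    """Retry, then reframe, then fall back to the declared output defaults.

    `attempt` returns the parsed outputs, or None when no usable answer came back.
    """
    # Rung 1: retry the same prompt for transient failures.
    for _ in range(max_retries):
        result = attempt(prompt)
        if result is not None:
            return result
    # Rung 2: one reframed attempt (hypothetical wording).
    result = attempt("Reply with JSON only.\n\n" + prompt)
    if result is not None:
        return result
    # Rung 3: default the outputs, or hard-fail when defaulting is disabled.
    if not use_defaults:
        raise RuntimeError("node produced no usable output")
    return {spec["key"]: spec.get("default") for spec in outputs}
```

An output declared without a default comes back as None here, which matches the null described above.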
Development
This section is for working on the controller itself (the Python that runs workflows), not on individual workflows.
Project layout
local-worker/
├── workhorse/ # The workhorse Python package (entrypoint: workhorse:main)
│ ├── main.py # CLI + the graph walk loop: checkpoint → run node → advance
│ ├── templates.py # Jinja2 rendering (resilient: missing vars render empty, not raise)
│ ├── artifacts.py # ArtifactWriter: run dir, checkpoints, per-step artifacts
│ ├── graph/
│ │ ├── nodes.py # Pydantic node models (AgentNode/ScriptNode/BranchNode/TerminalNode) + Graph
│ │ ├── loader.py # Parse + validate workflow.yaml into a Graph
│ │ └── context.py # WorkflowContext: the key→value bag + dot-path lookup for branches
│ └── runner/
│ ├── agent.py # Invoke Claude CLI; the retry → reframe → default resilience ladder
│ ├── script.py # Run a ScriptNode, capture JSON stdout
│ └── branch.py # Evaluate a BranchNode (cases / numeric conditions / default)
├── tests/ # Standalone test files (see below)
├── compose.yaml # Service, env, mounts, named volumes
├── Dockerfile # Ubuntu + uv + Claude CLI + the controller package
├── entrypoint.sh # Auth seeding, perms, exec `workhorse`
├── run.sh # Host launcher: resolve workflow dir, `docker compose up`
├── pyproject.toml / uv.lock # Python deps (jinja2, pyyaml, pydantic); managed with uv
├── README.md # This file (usage + development)
├── CLAUDE.md # Agent entry point; imports README.md + docs/
└── docs/
└── GUARDRAILS.md # The resilience/error-recovery design and env-var reference
How the controller works (the loop)
main.run() is a single loop over graph nodes. For each node it:
- Checkpoints the current node id + context (ArtifactWriter.write_checkpoint) so a crash here is resumable.
- Dispatches by node type to a runner: runner/agent.py, runner/script.py, or runner/branch.py.
- Merges the node's outputs into the WorkflowContext.
- Writes a per-step artifact and advances current_id to node.next (or the branch target).
A terminal/fail node ends the loop. The resilience for agent nodes lives
entirely in runner/agent.py::run_agent — see docs/GUARDRAILS.md.
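The loop described above can be condensed into the sketch below. It is an illustration rather than the body of main.run(): nodes are plain dicts instead of Pydantic models, the walk function and its parameters are hypothetical, and artifact writing is omitted.

```python
from __future__ import annotations

from typing import Callable

# A runner takes (node, context) and returns a dict of outputs;
# a branch runner returns {"next": <target id>}. (Hypothetical convention.)
Runner = Callable[[dict, dict], dict]


def walk(
    nodes: dict[str, dict],
    start: str,
    context: dict,
    runners: dict[str, Runner],
    checkpoint: Callable[[str, dict], None],
) -> str:
    """Walk the graph from `start`; return the id of the terminal/fail node reached."""
    current_id = start
    while True:
        node = nodes[current_id]
        if node["type"] in ("terminal", "fail"):
            return current_id                      # ends the loop
        checkpoint(current_id, context)            # make this point resumable
        result = runners[node["type"]](node, context)  # dispatch by node type
        if node["type"] == "branch":
            current_id = result["next"]            # branch target
        else:
            context.update(result)                 # merge outputs into the context
            current_id = node["next"]              # advance the cursor
```

Checkpointing before the node runs is what makes a crash inside the node resumable at that same node.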
Sessions (per-node clean context)
Each node runs as a fresh prompt with a clean Claude context. The controller
does not chain one node's conversation into the next — node N does not inherit
node N‑1's messages. Concretely, run_agent drops any persisted .session_id
before a node's first attempt, and a reframed attempt also starts fresh.
The persisted session is --resumed in exactly one situation: continuing the
same node that was interrupted. When the controller resumes from a checkpoint
and re-enters a node that was killed mid-run (not fast-forwarded), it calls
run_agent(..., resume_session=True) for that one node so Claude picks up where
it left off; every node the run then advances to starts clean again.
Context overflow → compact & continue. If a node exhausts the model's
context window mid-run (the headless CLI returns instead of auto-compacting),
run_agent runs /compact on that node's session and retries the same prompt
on it, preserving the node's progress (bounded by AGENT_MAX_COMPACT_ATTEMPTS;
falls back to a fresh-session reframe if /compact can't help). Verified against
Claude Code 2.1.x. See the recovery ladder in docs/GUARDRAILS.md.
Not yet implemented: a configurable per-node turn limit (--max-turns) that proactively compacts before the window is exhausted. Today compaction is reactive: it is triggered when an overflow is detected.
Running tests
Tests live in tests/ and are dependency-free: each file runs standalone
(python tests/test_x.py prints PASS/FAIL and exits non-zero on failure) and is
also pytest-compatible. There is no pytest in the venv by default; run them with
the project's Python:
# One file
.venv/bin/python tests/test_agent_recovery.py
# All of them
for t in tests/test_*.py; do .venv/bin/python "$t"; done
If a .venv isn't present, create one with uv sync (or uv run python tests/...).
Where to put tests. Add a tests/test_<area>.py, mirroring the existing
style: an if __name__ == "__main__" runner that iterates test_* functions, and
unit tests that patch the CLI boundary (_run_claude_cli / _invoke_claude) and
sleeping so nothing hits the network or waits in real time. Group by concern:
test_agent_cap.py (cap/transient handling), test_agent_recovery.py (reframe →
default ladder), test_branch_guardrail.py, test_resume_auto.py,
test_idempotency.py, test_templates_resilient.py.
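A standalone test file in this style might look like the skeleton below. It is a sketch, not a copy of an existing test: parse_status and run_all are hypothetical names, and a real test would also patch the CLI boundary and sleeping as described above.

```python
import sys


def parse_status(payload: dict) -> str:
    """Hypothetical function under test."""
    return payload.get("result", {}).get("status", "unknown")


def test_status_present():
    assert parse_status({"result": {"status": "ok"}}) == "ok"


def test_status_missing():
    assert parse_status({}) == "unknown"


def run_all(namespace: dict) -> int:
    """Run every test_* callable in `namespace`; return the number of failures."""
    failures = 0
    for name, fn in sorted(namespace.items()):
        if name.startswith("test_") and callable(fn):
            try:
                fn()
                print(f"PASS {name}")
            except AssertionError as exc:
                failures += 1
                print(f"FAIL {name}: {exc}")
    return failures


if __name__ == "__main__":
    # Exit non-zero only when something failed, so shell loops can detect it.
    if run_all(globals()):
        sys.exit(1)
```

Because the test_* functions are plain module-level functions with bare asserts, pytest can collect the same file unchanged.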
Where docs go
- Tool/usage + development docs → this README.md (root).
- Design notes (resilience/error recovery, and any future deep-dives) → docs/, e.g. docs/GUARDRAILS.md. Put new long-form design docs here rather than at the root.
- CLAUDE.md (root) is the agent entry point and stays at the root so Claude Code auto-loads it; it @-imports README.md and docs/GUARDRAILS.md.
- Per-workflow docs → inside that workflow's own directory (under ../workflows/<name>/), not here. The controller is workflow-agnostic; keep workflow-specific knowledge with the workflow.
Keep these docs current when you change behavior — they are the contract for
operators running week-long jobs, and CLAUDE.md imports them, so updating them
keeps agent context accurate too.
Conventions
- Python 3.12, from __future__ import annotations at the top of each module.
- Pydantic models for anything parsed from YAML (see graph/nodes.py); add a new node type by extending the discriminated Node union and handling it in main.run() plus a runner/ module.
- Fail soft for unattended runs. New failure paths in agent handling should slot into the existing retry → reframe → default ladder rather than raising, so one bad node can't end a week-long run. Reserve hard raises for genuinely unrecoverable, deterministic errors.
- Comments explain why. Match the existing density — the tricky invariants (checkpoint/fast-forward idempotency, cap-vs-transient classification) are documented inline; keep them that way.
Editing the container
The image bundles the Claude CLI and the controller package. After changing
Dockerfile, pyproject.toml, or anything that affects the image, rebuild:
./run.sh ../workflows/hello-world --build
Pure controller .py edits also require a rebuild before they take effect,
since workhorse/ is COPY'd into the image (it is not bind-mounted).