Adversarial multi-agent development harness: PRD -> plan -> phased implementation, every artifact reviewed adversarially
Gauntlet
Adversarial multi-agent development harness. Every artifact — PRD, plan, and each implementation phase — runs the gauntlet of adversarial review before it ships: a builder agent implements, an independent reviewer agent attacks the result, a cheap triage model sorts the findings, the builder fixes, and the reviewer confirms the fix against the diff. A localhost judge service gates every tool call the agents make, failing closed.
The canonical spec is PRD-gauntlet.md. The bootstrap plan
is runs/gauntlet/plan.md.
Status: the bootstrap is complete — Gauntlet was built by running its own pipeline against itself (phases P1–P7, each adversarially reviewed and human-ratified). It is usable on other repositories via the steps below.
Table of contents
- How it works
- Prerequisites
- Install
- Configure credentials
- Quick start (≤ 3 commands)
- The run lifecycle
- Command reference
- Configuration
- Safety model
- Development
- Troubleshooting
How it works
A pipeline (YAML) is a sequence of stages; each stage is built from a few step types:
| Step type | What it does |
|---|---|
| `agent_task` | The builder implements a phase in the working tree. |
| `shell` | Runs a command (e.g. the test suite) as a hard gate. |
| `commit` | Commits the phase with an enforced message format. |
| `adversarial_cycle` | review → triage → fix → confirm, looped to convergence. |
| `human_gate` | Pauses the run for a human to approve / reject. |
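The `adversarial_cycle` step can be read as a bounded loop. The following is a minimal sketch, not Gauntlet's implementation: the callables `review`, `triage`, `fix`, and `confirm`, the `Finding` type, and the `max_rounds` cap are all hypothetical names introduced for illustration, and the real convergence rule may differ.

```python
from typing import Callable, List

Finding = str  # hypothetical: a finding is just a short description here


def adversarial_cycle(
    review: Callable[[], List[Finding]],
    triage: Callable[[List[Finding]], List[Finding]],
    fix: Callable[[List[Finding]], None],
    confirm: Callable[[List[Finding]], List[Finding]],
    max_rounds: int = 3,  # hypothetical cap so the loop always terminates
) -> bool:
    """Loop review -> triage -> fix -> confirm until nothing actionable remains.

    Returns True on convergence, False if max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        actionable = triage(review())      # reviewer attacks, triage filters
        if not actionable:
            return True                    # converged: nothing left to fix
        fix(actionable)                    # builder addresses the findings
        unresolved = confirm(actionable)   # reviewer re-checks the fixes
        if not unresolved:
            return True                    # every actionable finding confirmed fixed
    return False                           # did not converge within the cap
```

Returning `False` rather than raising leaves the caller free to decide what a non-converging cycle means, for example parking the run for a human.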
The central invariant is that the working tree is clean and committed at
every point where control passes to the reviewer — this is what makes review
diffs meaningful and kill -9 resume safe.
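One way to check the clean-tree invariant is to ask git for its machine-readable status. This is a sketch under assumptions, not the engine's actual check: the function names `dirty_paths` and `tree_is_clean` are hypothetical, and only the `git status --porcelain` command itself is standard git.

```python
import subprocess
from typing import List


def dirty_paths(porcelain: str) -> List[str]:
    """Return the paths listed in `git status --porcelain` output.

    Each non-empty line is two status characters, a space, then the path.
    An empty result means the tree is clean.
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def tree_is_clean(repo: str = ".") -> bool:
    """True when the working tree at `repo` has no uncommitted changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return not dirty_paths(out)
```

Keeping the parsing separate from the subprocess call makes the rule testable without a repository.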
Two pipelines ship by default: standard (for real work) and bootstrap (the
self-hosting pipeline used to build Gauntlet itself).
Prerequisites
Gauntlet is a thin orchestrator that drives external agent CLIs and model APIs. You need:
| Requirement | Why | Notes |
|---|---|---|
| Python ≥ 3.10 | runtime | Managed for you by uv. |
| `uv` | install + run | The only build/run tool you install by hand. |
| `claude` CLI (Claude Code) | the builder agent | Must be installed and authenticated. |
| `codex` CLI (Codex CLI) | the reviewer agent | Must be installed and authenticated. |
| `OPENAI_API_KEY` | triage / judge / escalation tiers | Default config uses gpt-5-mini (triage, judge) and gpt-5 (escalation) via LiteLLM. |
The default agent profiles are: builder = claude (model opus), reviewer =
codex (model gpt-5.5), triage/judge = gpt-5-mini, escalation = gpt-5.
You can repoint any tier to a different provider in config (see
Configuration); ANTHROPIC_API_KEY / GEMINI_API_KEY are
only needed if you switch the API tiers to those providers.
Install
macOS / Linux
1. Install uv (if you don't have it):
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Install the agent CLIs and sign in to each (follow each tool's own docs):
# Claude Code (builder) — see https://docs.claude.com/en/docs/claude-code
claude --version # confirm it's on PATH
claude /login # or however your install authenticates
# Codex CLI (reviewer) — see https://github.com/openai/codex
codex --version
codex login
3. Install Gauntlet as a global tool:
uv tool install gauntlet-spec # from PyPI; or the git URL below for HEAD
# uv tool install git+https://github.com/johnpletka/gauntlet.git
gauntlet version
The PyPI package is `gauntlet-spec`, not `gauntlet`. The bare name `gauntlet` on PyPI is an unrelated (and broken) project. The installed command is still `gauntlet`; only the install name differs.
Python 3.10+ is required. If your default interpreter is older, `uv` will refuse with `does not satisfy Python>=3.10`. Add `--python 3.10` (or newer) to the command and `uv` will fetch a suitable interpreter automatically.
This puts two console scripts on your PATH: gauntlet (the CLI) and
gauntlet-judge-hook (the per-tool-call safety hook, wired automatically by
gauntlet init).
Windows
Gauntlet itself is pure Python and runs natively on Windows via uv. Use
PowerShell.
1. Install uv:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
2. Install and authenticate the agent CLIs. Install claude (Claude Code)
and codex per their official docs and confirm each is on your PATH:
claude --version
codex --version
Note on the agent CLIs: if a given CLI does not yet ship a native Windows build, install Gauntlet and that CLI inside WSL2 (Ubuntu) and follow the macOS / Linux steps there instead. The orchestrator, judge service (loopback HTTP on `127.0.0.1`), and hooks are all cross-platform; the only platform-sensitive dependency is the agent CLIs themselves.
3. Install Gauntlet:
uv tool install gauntlet-spec
# or, for HEAD: uv tool install "git+https://github.com/johnpletka/gauntlet.git"
gauntlet version
The PyPI package is `gauntlet-spec`, not `gauntlet`; the bare name is an unrelated, broken project. The command is still `gauntlet`. If `uv` reports `does not satisfy Python>=3.10`, append `--python 3.10` (or newer) and it will fetch a compatible interpreter.
Configure credentials
The API tiers (triage, judge, escalation) read credentials from the environment only — never from repo config (so keys never get committed).
macOS / Linux (add to ~/.zshrc / ~/.bashrc to persist):
export OPENAI_API_KEY="sk-..."
Windows — PowerShell (current session):
$env:OPENAI_API_KEY = "sk-..."
Windows — persist across sessions:
setx OPENAI_API_KEY "sk-..."
# then open a new terminal
Run gauntlet doctor (below) to verify everything resolves before your first
run.
Quick start (≤ 3 commands)
From the repository you want Gauntlet to work on:
gauntlet init # 1. scaffold config, pipeline, prompts, policy + wire hooks (idempotent)
gauntlet doctor # 2. validate CLIs, auth, hook wiring, judge, API keys
gauntlet new myfeat # 3a. scaffold runs/myfeat/ with a PRD stub
# ...author runs/myfeat/prd.md...
gauntlet run myfeat # 3b. start the pipeline
If the repository already carries committed Gauntlet assets (a teammate ran
init before you), you only need to wire this machine's hooks:
gauntlet init --from-repo
gauntlet doctor reports actionable, per-check status — installed CLI versions
vs. the verified pin file (.gauntlet/pins.yaml), authentication, hook wiring,
judge startability, and ApiAdapter keys — and exits non-zero on any blocker.
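The "per-check status, non-zero exit on any blocker" behaviour can be sketched as a small aggregation. This is illustrative only: the `Check` dataclass, its fields, the `warn` status for non-blocking checks, and the exit code value `1` are assumptions, not Gauntlet's actual data model or output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Check:
    name: str
    ok: bool
    blocker: bool = True  # hypothetical: non-blocking checks only warn
    hint: str = ""        # hypothetical: what to do when the check fails


def summarize(checks: List[Check]) -> Tuple[List[str], int]:
    """Render one status line per check and compute the process exit code."""
    lines = []
    for c in checks:
        status = "ok" if c.ok else ("FAIL" if c.blocker else "warn")
        suffix = f" ({c.hint})" if c.hint and not c.ok else ""
        lines.append(f"[{status}] {c.name}{suffix}")
    # Non-zero only when at least one failing check is a blocker.
    exit_code = 1 if any(not c.ok and c.blocker for c in checks) else 0
    return lines, exit_code
```

A caller would print the lines and pass the code to `sys.exit`, so scripts and CI can gate on the result.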
The run lifecycle
A run advances automatically until it hits a human_gate, then parks for
your decision:
gauntlet run myfeat # start (parks at the first gate)
gauntlet status myfeat # see current step + every step's state
gauntlet approve myfeat # accept the parked gate; drive to the next one
gauntlet reject myfeat --notes "…" # send the phase back for another fix round
gauntlet resume myfeat # resume after an interruption (kill -9 safe)
gauntlet report myfeat # per-step / per-agent cost + token breakdown
- Interrupted runs are resumable. State lives in the run's `manifest.json`; `gauntlet resume` re-enters at the last incomplete step. A step that wrote a dirty tree before dying is parked or reset rather than re-run blindly.
- Approved artifacts are immutable. A later phase that finds an approved PRD/plan incomplete halts and surfaces the conflict rather than amending it.
- At the final gate a `PR.md` draft is written under `runs/<slug>/` (it is not opened or pushed; that stays a human action).
- After a run, `gauntlet feedback <slug>` captures your retrospective notes and triage corrections to feed the self-improvement loop.
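Resuming from recorded state can be sketched as a scan for the first step that is not complete. The manifest layout used here (a top-level `steps` list whose entries have `id` and `state` fields, with `"complete"` as the finished state) is a hypothetical schema for illustration; the real `manifest.json` format is not described in this README.

```python
import json
from typing import Optional


def resume_point(manifest_text: str) -> Optional[str]:
    """Return the id of the first step not marked complete, or None if all are.

    Assumes a hypothetical manifest shape:
    {"steps": [{"id": "...", "state": "complete" | "running" | "pending"}, ...]}
    """
    manifest = json.loads(manifest_text)
    for step in manifest["steps"]:
        if step["state"] != "complete":
            return step["id"]
    return None
```

A step that died while `running` is returned just like a `pending` one; deciding whether to park, reset, or re-run it is a separate policy question.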
Command reference
| Command | Purpose |
|---|---|
| `gauntlet init [--from-repo]` | Scaffold config/pipeline/prompts/policy + wire hooks (idempotent). |
| `gauntlet doctor` | Validate environment: CLIs, auth, hooks, judge, keys. |
| `gauntlet new <slug>` | Scaffold `runs/<slug>/` with a PRD stub. |
| `gauntlet run <slug> [--pipeline standard\|bootstrap] [--no-judge]` | Start a run on branch `gauntlet/<slug>`. |
| `gauntlet status <slug>` | Show run status and each step's state. |
| `gauntlet approve <slug> [--gate ID] [--notes …]` | Approve a parked gate, continue the run. |
| `gauntlet reject <slug> --notes … [--gate ID]` | Reject a parked gate. |
| `gauntlet resume <slug>` | Resume an interrupted run at its last incomplete step. |
| `gauntlet abort <slug>` | Abort a run. |
| `gauntlet report <slug>` | Per-step / per-agent-profile cost breakdown. |
| `gauntlet feedback <slug>` | Capture human feedback + triage corrections (FR-6.1). |
| `gauntlet rollback <slug> --phase N` | Reset the branch + manifest to a phase boundary (guarded). |
| `gauntlet judge serve [...]` | Run the localhost judge service (normally engine-managed). |
| `gauntlet version` | Print the installed version. |
--no-judge disables the safety judge and is for testing only — it leaves
agent tool calls ungated. Don't use it on real work.
Configuration
gauntlet init writes a .gauntlet/ directory in your repo:
- `.gauntlet/config.yaml`: agent profiles (adapter + model + flags), per-agent commit identities, run timeouts and budgets. References models, not credentials.
- `.gauntlet/pins.yaml`: the CLI versions and exact flags verified by the contract suite; `doctor` checks the installed CLIs against it.
Pipelines live in pipelines/*.yaml; prompt templates (versioned data, not
code) in prompts/; the judge fast-path allow/deny rules in policy.yaml.
To repoint a tier at a different provider, edit the agent profile's adapter
and model in .gauntlet/config.yaml and set that provider's key in your
environment (e.g. ANTHROPIC_API_KEY for an anthropic/* model). LiteLLM
model naming applies to api adapter profiles.
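Before repointing a tier, it can help to check which environment variables the chosen models will need. The sketch below is an assumption-laden helper, not part of Gauntlet: the `PROVIDER_KEYS` table covers only the three keys named in this README, and treating an unprefixed model name such as `gpt-5-mini` as OpenAI is an assumption based on the default configuration.

```python
import os
from typing import Iterable, List, Mapping

# Assumption: provider prefix -> env var, limited to the keys this README names.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def required_key(model: str) -> str:
    """Map a provider-prefixed model name to the env var holding its key.

    Unprefixed names are assumed to be OpenAI models. An unknown prefix
    raises KeyError rather than guessing.
    """
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    return PROVIDER_KEYS[provider]


def missing_keys(models: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted env var names that the given models need but lack."""
    needed = {required_key(m) for m in models}
    return sorted(k for k in needed if not env.get(k))
```

Passing `env` explicitly keeps the check testable; in normal use the default reads the real process environment, which matches the rule that credentials come from the environment only.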
Safety model
- Agent tool calls (e.g. the builder's shell commands and file writes) pass through a PreToolUse hook → localhost judge service. The judge decides via a deterministic policy fast-path, then an LLM classifier rung, and fails closed (deny) on timeout, parse error, or any unexpected outcome.
- The judge binds `127.0.0.1` only and rejects callers lacking the per-run token. Every decision is written to an audit log.
- The reviewer runs read-only (codex sandbox `read-only`); any worktree mutation by a reviewer is a detected process violation.
- Permission-bypass flags (e.g. `--dangerously-skip-permissions`) are rejected by config lint, since they would disable the hook layer.
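The fail-closed decision ladder described in the first bullet can be sketched as follows. This is a minimal illustration, not the judge service's code: the `judge` function, the `fast_path` and `classify` callables, and the `"allow"` / `"deny"` strings are hypothetical stand-ins for the policy rules and the LLM classifier.

```python
from typing import Callable, Optional

ALLOW, DENY = "allow", "deny"


def judge(
    tool_call: dict,
    fast_path: Callable[[dict], Optional[str]],
    classify: Callable[[dict], str],
) -> str:
    """Decide allow/deny for one tool call, denying on any failure."""
    try:
        verdict = fast_path(tool_call)     # deterministic policy rules first
        if verdict is None:                # no rule matched: ask the classifier
            verdict = classify(tool_call)  # may raise on timeout or bad output
    except Exception:
        return DENY                        # fail closed on any error
    # Anything other than an explicit allow or deny is also treated as deny.
    return verdict if verdict in (ALLOW, DENY) else DENY
```

The key design point is that `ALLOW` is only ever returned when a rung explicitly produced it; every other path, including exceptions and unrecognised verdicts, ends in `DENY`.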
Development
Working on Gauntlet itself:
uv sync # create the venv, install deps + package (editable)
uv run pytest # unit suite (no credentials required)
uv run pytest -m integration # contract tests against live CLIs/APIs (needs creds)
uv run gauntlet doctor # validate your dev environment
uv run pytest runs unit tests only; the integration marker selects the live
contract suite, which requires authenticated CLIs and API keys.
Troubleshooting
- `gauntlet` errors with `ModuleNotFoundError: No module named 'gauntlet'` (or `gauntlet.main`): you installed the unrelated PyPI package via `uv tool install gauntlet`. Run `uv tool uninstall gauntlet`, then reinstall the correct package: `uv tool install gauntlet-spec` (add `--python 3.10` if your default interpreter is older).
- `gauntlet-judge-hook: command not found` during a run: the hook console script isn't on the PATH the agent CLI sees. Re-run `gauntlet init` (or `gauntlet init --from-repo`) and confirm `uv tool`'s bin directory is on your PATH (`uv tool update-shell`, then open a new terminal).
- `doctor` reports a stale CLI version: your installed `claude` / `codex` differs from `.gauntlet/pins.yaml`. Re-verify with the integration suite, or update the pin file if the new version is intended.
- A run parks unexpectedly / a step is `failed`: `gauntlet status <slug>` shows where; the step's transcript under `runs/<slug>/<run>/steps/` has the detail. `gauntlet resume <slug>` re-enters safely once the cause is cleared.
- An agent hits a provider session/usage limit mid-step: the engine fails the step closed (it does not fake success). Wait for the limit to reset, then `gauntlet resume <slug>`.
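For the "command not found" case, a quick way to see which executables the current PATH cannot resolve is `shutil.which` from the standard library. The helper name `missing_scripts` is hypothetical; note that it inspects the PATH of the Python process that runs it, which may differ from the PATH the agent CLI sees.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_scripts(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be resolved on the current PATH."""
    return [n for n in names if which(n) is None]


if __name__ == "__main__":
    # The four executables this README expects to be installed.
    for name in missing_scripts(["gauntlet", "gauntlet-judge-hook", "claude", "codex"]):
        print(f"not on PATH: {name}")
```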