Adversarial multi-agent development harness: PRD -> plan -> phased implementation, every artifact reviewed adversarially

Maintainers

johnpletka

Gauntlet

Adversarial multi-agent development harness. Every artifact — PRD, plan, and each implementation phase — runs the gauntlet of adversarial review before it ships: a builder agent implements, an independent reviewer agent attacks the result, a cheap triage model sorts the findings, the builder fixes, and the reviewer confirms the fix against the diff. A localhost judge service gates every tool call the agents make, failing closed.

The canonical spec is PRD-gauntlet.md. The bootstrap plan is runs/gauntlet/plan.md.

Status: the bootstrap is complete — Gauntlet was built by running its own pipeline against itself (phases P1–P7, each adversarially reviewed and human-ratified). It is usable on other repositories via the steps below.

Table of contents

- How it works
- Prerequisites
- Install
  - macOS / Linux
  - Windows
- Configure credentials
- Quick start (≤ 3 commands)
- The run lifecycle
- Command reference
- Configuration
- Safety model
- Development
- Troubleshooting

How it works

A pipeline (YAML) is a sequence of stages; each stage is built from a few step types:

| Step type | What it does |
| --- | --- |
| `agent_task` | The builder implements a phase in the working tree. |
| `shell` | Runs a command (e.g. the test suite) as a hard gate. |
| `commit` | Commits the phase with an enforced message format. |
| `adversarial_cycle` | review → triage → fix → confirm, looped to convergence. |
| `human_gate` | Pauses the run for a human to `approve` / `reject`. |
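
The `adversarial_cycle` step can be pictured as a small loop. The sketch below is illustrative only and is not Gauntlet's implementation: the `Finding` type, the `blocking` flag, the four callables, and the `max_rounds` cap are all hypothetical names, and the convergence rule (no blocking findings, or the reviewer confirms the fix) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    summary: str
    blocking: bool  # hypothetical flag assigned by the triage step


def adversarial_cycle(
    review: Callable[[], List[str]],
    triage: Callable[[List[str]], List[Finding]],
    fix: Callable[[List[Finding]], None],
    confirm: Callable[[List[Finding]], bool],
    max_rounds: int = 3,
) -> bool:
    """Loop review -> triage -> fix -> confirm.

    Returns True once a round has no blocking findings or the reviewer
    confirms the fix; returns False if max_rounds is exhausted first.
    """
    for _ in range(max_rounds):
        raw_findings = review()
        blocking = [f for f in triage(raw_findings) if f.blocking]
        if not blocking:
            return True  # nothing left that must be fixed
        fix(blocking)
        if confirm(blocking):
            return True  # reviewer accepted the fix
    return False  # did not converge within the allowed rounds
```

Passing the agents in as plain callables keeps the loop testable without any CLI or API access.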

The central invariant is that the working tree is clean and committed at every point where control passes to the reviewer — this is what makes review diffs meaningful and kill -9 resume safe.
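
One way to check that invariant from outside the engine is to ask git whether anything is uncommitted. This is a minimal sketch, not the engine's own check; the function names are hypothetical, and it relies only on `git status --porcelain` printing nothing for a clean tree.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing.

    Any staged, unstaged, or untracked change produces at least one line.
    """
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Run git in repo_dir and report whether the tree is clean and committed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (e.g. not a repository)
    )
    return is_clean(result.stdout)
```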

Two pipelines ship by default: standard (for real work) and bootstrap (the self-hosting pipeline used to build Gauntlet itself).

Prerequisites

Gauntlet is a thin orchestrator that drives external agent CLIs and model APIs. You need:

| Requirement | Why | Notes |
| --- | --- | --- |
| Python ≥ 3.10 | runtime | Managed for you by `uv`. |
| `uv` | install + run | The only build/run tool you install by hand. |
| `claude` CLI (Claude Code) | the builder agent | Must be installed and authenticated. |
| `codex` CLI (Codex CLI) | the reviewer agent | Must be installed and authenticated. |
| `OPENAI_API_KEY` | triage / judge / escalation tiers | Default config uses `gpt-5-mini` (triage, judge) and `gpt-5` (escalation) via LiteLLM. |

The default agent profiles are: builder = claude (model opus), reviewer = codex (model gpt-5.5), triage/judge = gpt-5-mini, escalation = gpt-5. You can repoint any tier to a different provider in config (see Configuration); ANTHROPIC_API_KEY / GEMINI_API_KEY are only needed if you switch the API tiers to those providers.

Install

macOS / Linux

1. Install uv (if you don't have it):

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install the agent CLIs and sign in to each (follow each tool's own docs):

# Claude Code (builder) — see https://docs.claude.com/en/docs/claude-code
claude --version        # confirm it's on PATH
claude /login           # or however your install authenticates

# Codex CLI (reviewer) — see https://github.com/openai/codex
codex --version
codex login

3. Install Gauntlet as a global tool:

uv tool install gauntlet-spec       # from PyPI; or the git URL below for HEAD
# uv tool install git+https://github.com/johnpletka/gauntlet.git
gauntlet version

The PyPI package is gauntlet-spec, not gauntlet. The bare name gauntlet on PyPI is an unrelated (and broken) project. The installed command is still gauntlet — only the install name differs.

Python 3.10+ is required. If your default interpreter is older, uv refuses with the error `does not satisfy Python>=3.10`. Add `--python 3.10` (or newer) to the command and uv will fetch a suitable interpreter automatically.

This puts two console scripts on your PATH: gauntlet (the CLI) and gauntlet-judge-hook (the per-tool-call safety hook, wired automatically by gauntlet init).

Windows

Gauntlet itself is pure Python and runs natively on Windows via uv. Use PowerShell.

1. Install uv:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Install and authenticate the agent CLIs. Install claude (Claude Code) and codex per their official docs and confirm each is on your PATH:

claude --version
codex --version

Note on the agent CLIs: if a given CLI does not yet ship a native Windows build, install Gauntlet and that CLI inside WSL2 (Ubuntu) and follow the macOS / Linux steps there instead. The orchestrator, judge service (loopback HTTP on 127.0.0.1), and hooks are all cross-platform; the only platform-sensitive dependency is the agent CLIs themselves.

3. Install Gauntlet:

uv tool install gauntlet-spec
# or, for HEAD: uv tool install "git+https://github.com/johnpletka/gauntlet.git"
gauntlet version

The PyPI package is gauntlet-spec, not gauntlet; the bare name is an unrelated, broken project. The command is still gauntlet. If uv reports `does not satisfy Python>=3.10`, append `--python 3.10` (or newer) and it will fetch a compatible interpreter.

Configure credentials

The API tiers (triage, judge, escalation) read credentials from the environment only — never from repo config (so keys never get committed).

macOS / Linux (add to ~/.zshrc / ~/.bashrc to persist):

export OPENAI_API_KEY="sk-..."

Windows — PowerShell (current session):

$env:OPENAI_API_KEY = "sk-..."

Windows — persist across sessions:

setx OPENAI_API_KEY "sk-..."
# then open a new terminal

Run gauntlet doctor (below) to verify everything resolves before your first run.
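
If you want a quick pre-flight check of your own before running `gauntlet doctor`, something like the following would do. It is a sketch under the assumption that only `OPENAI_API_KEY` is needed (the default config); `missing_credentials` is a hypothetical helper, not part of Gauntlet.

```python
import os
from typing import List, Mapping, Sequence


def missing_credentials(
    required: Sequence[str] = ("OPENAI_API_KEY",),
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names of required variables that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit("missing credentials: " + ", ".join(missing))
    print("all required credentials are set")
```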

Quick start (≤ 3 commands)

From the repository you want Gauntlet to work on:

gauntlet init        # 1. scaffold config, pipeline, prompts, policy + wire hooks (idempotent)
gauntlet doctor      # 2. validate CLIs, auth, hook wiring, judge, API keys
gauntlet new myfeat  # 3a. scaffold runs/myfeat/ with a PRD stub
#    ...author runs/myfeat/prd.md...
gauntlet run myfeat  # 3b. start the pipeline

If the repository already carries committed Gauntlet assets (a teammate ran init before you), you only need to wire this machine's hooks:

gauntlet init --from-repo

gauntlet doctor reports actionable, per-check status — installed CLI versions vs. the verified pin file (.gauntlet/pins.yaml), authentication, hook wiring, judge startability, and ApiAdapter keys — and exits non-zero on any blocker.
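
The "per-check status, non-zero on any blocker" behaviour can be modelled roughly as below. This is an illustration of the idea rather than doctor's actual code or output format: `CheckResult`, its fields, and the `[ok]` / `[warn]` / `[FAIL]` labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool
    blocker: bool = True  # hypothetical: a failed non-blocker is only a warning
    hint: str = ""        # what the user should do about a failure


def summarize(results: Iterable[CheckResult]) -> Tuple[str, int]:
    """Render one line per check and compute the process exit code."""
    lines = []
    exit_code = 0
    for r in results:
        if r.ok:
            status = "ok"
        elif r.blocker:
            status = "FAIL"
            exit_code = 1  # any failed blocker makes the whole run non-zero
        else:
            status = "warn"
        suffix = f" ({r.hint})" if r.hint and not r.ok else ""
        lines.append(f"[{status}] {r.name}{suffix}")
    return "\n".join(lines), exit_code
```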

The run lifecycle

A run advances automatically until it hits a human_gate, then parks for your decision:

gauntlet run myfeat              # start (parks at the first gate)
gauntlet status myfeat           # see current step + every step's state
gauntlet approve myfeat          # accept the parked gate; drive to the next one
gauntlet reject myfeat --notes "…"   # send the phase back for another fix round
gauntlet resume myfeat           # resume after an interruption (kill -9 safe)
gauntlet report myfeat           # per-step / per-agent cost + token breakdown

- Interrupted runs are resumable. State lives in the run's `manifest.json`; `gauntlet resume` re-enters at the last incomplete step. A step that wrote a dirty tree before dying is parked or reset rather than re-run blindly.
- Approved artifacts are immutable. A later phase that finds an approved PRD/plan incomplete halts and surfaces the conflict rather than amending it.
- At the final gate a `PR.md` draft is written under `runs/<slug>/` (it is not opened or pushed; that stays a human action).
- After a run, `gauntlet feedback <slug>` captures your retrospective notes and triage corrections to feed the self-improvement loop.
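
To make the resume rule concrete, here is a sketch of picking the re-entry point from a manifest. The manifest layout shown (a `steps` list of objects with `id` and `state`, and the state name `done`) is an assumption for illustration; the real `manifest.json` schema is defined by the PRD and may differ. In a strictly sequential pipeline, the first step that is not done is the one to re-enter.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Optional

DONE = "done"  # hypothetical state name


def next_incomplete_step(manifest: Mapping[str, Any]) -> Optional[str]:
    """Return the id of the first step whose state is not DONE, or None."""
    for step in manifest.get("steps", []):
        if step.get("state") != DONE:
            return step["id"]
    return None  # every step finished; nothing to resume


def load_manifest(path: Path) -> Mapping[str, Any]:
    """Read a manifest file from disk as JSON."""
    return json.loads(path.read_text(encoding="utf-8"))
```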

Command reference

| Command | Purpose |
| --- | --- |
| `gauntlet init [--from-repo]` | Scaffold config/pipeline/prompts/policy + wire hooks (idempotent). |
| `gauntlet doctor` | Validate environment: CLIs, auth, hooks, judge, keys. |
| `gauntlet new <slug>` | Scaffold `runs/<slug>/` with a PRD stub. |
| `gauntlet run <slug> [--pipeline standard\|bootstrap] [--no-judge]` | Start a run on branch `gauntlet/<slug>`. |
| `gauntlet status <slug>` | Show run status and each step's state. |
| `gauntlet approve <slug> [--gate ID] [--notes …]` | Approve a parked gate, continue the run. |
| `gauntlet reject <slug> --notes … [--gate ID]` | Reject a parked gate. |
| `gauntlet resume <slug>` | Resume an interrupted run at its last incomplete step. |
| `gauntlet abort <slug>` | Abort a run. |
| `gauntlet report <slug>` | Per-step / per-agent-profile cost breakdown. |
| `gauntlet feedback <slug>` | Capture human feedback + triage corrections (FR-6.1). |
| `gauntlet rollback <slug> --phase N` | Reset the branch + manifest to a phase boundary (guarded). |
| `gauntlet judge serve [...]` | Run the localhost judge service (normally engine-managed). |
| `gauntlet version` | Print the installed version. |

--no-judge disables the safety judge and is for testing only — it leaves agent tool calls ungated. Don't use it on real work.

Configuration

gauntlet init writes a .gauntlet/ directory in your repo:

- `.gauntlet/config.yaml`: agent profiles (adapter + model + flags), per-agent commit identities, run timeouts and budgets. References models, not credentials.
- `.gauntlet/pins.yaml`: the CLI versions and exact flags verified by the contract suite; `doctor` checks the installed CLIs against it.

Pipelines live in pipelines/*.yaml; prompt templates (versioned data, not code) in prompts/; the judge fast-path allow/deny rules in policy.yaml.

To repoint a tier at a different provider, edit the agent profile's adapter and model in .gauntlet/config.yaml and set that provider's key in your environment (e.g. ANTHROPIC_API_KEY for an anthropic/* model). LiteLLM model naming applies to api adapter profiles.
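
As a rough guide to which key a repointed profile will need, the provider prefix of a LiteLLM-style model name can be mapped to its environment variable. The helper below is a hypothetical sketch, not part of Gauntlet; it assumes an un-prefixed name such as `gpt-5-mini` belongs to OpenAI and covers only the three providers this README mentions.

```python
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def required_key(model: str, default_provider: str = "openai") -> str:
    """Map a 'provider/model' name to the env var holding that provider's key.

    Names without a 'provider/' prefix are treated as default_provider
    (an assumption made for this sketch).
    """
    provider = model.split("/", 1)[0] if "/" in model else default_provider
    try:
        return PROVIDER_KEYS[provider]
    except KeyError:
        raise ValueError(f"no known credential for provider {provider!r}") from None
```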

Safety model

- Agent tool calls (e.g. the builder's shell commands and file writes) pass through a PreToolUse hook → localhost judge service. The judge decides via a deterministic policy fast-path, then an LLM classifier rung, and fails closed (deny) on timeout, parse error, or any unexpected outcome.
- The judge binds 127.0.0.1 only and rejects callers lacking the per-run token. Every decision is written to an audit log.
- The reviewer runs read-only (codex sandbox read-only); any worktree mutation by a reviewer is a detected process violation.
- Permission-bypass flags (e.g. `--dangerously-skip-permissions`) are rejected by config lint because they would disable the hook layer.
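
The fail-closed ordering described above (policy fast-path first, classifier second, deny on anything unexpected) can be sketched as follows. This is an illustration of the pattern, not the judge service's code: `Decision`, `judge`, and the two callables are hypothetical, and the classifier is assumed to answer with the text "allow" or "deny".

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def judge(
    tool_call: str,
    fast_path: Callable[[str], Optional[Decision]],
    classifier: Callable[[str], str],
) -> Decision:
    """Decide on a tool call; any error or unparseable answer becomes DENY."""
    try:
        verdict = fast_path(tool_call)
        if isinstance(verdict, Decision):
            return verdict  # a deterministic policy rule matched
        raw = classifier(tool_call)  # may raise, e.g. TimeoutError
        return Decision(raw.strip().lower())  # ValueError if not allow/deny
    except Exception:
        return Decision.DENY  # fail closed
```

Catching every exception is deliberate here: for a safety gate, an unknown failure mode should block the call rather than let it through.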

Development

Working on Gauntlet itself:

uv sync                       # create the venv, install deps + package (editable)
uv run pytest                 # unit suite (no credentials required)
uv run pytest -m integration  # contract tests against live CLIs/APIs (needs creds)
uv run gauntlet doctor        # validate your dev environment

uv run pytest runs unit tests only; the integration marker selects the live contract suite, which requires authenticated CLIs and API keys.

Troubleshooting

- `gauntlet` errors with `ModuleNotFoundError: No module named 'gauntlet'` (or `gauntlet.main`): you installed the unrelated PyPI package via `uv tool install gauntlet`. Run `uv tool uninstall gauntlet`, then reinstall the correct package: `uv tool install gauntlet-spec` (add `--python 3.10` if your default interpreter is older).
- `gauntlet-judge-hook: command not found` during a run: the hook console script isn't on the PATH the agent CLI sees. Re-run `gauntlet init` (or `gauntlet init --from-repo`) and confirm uv tool's bin directory is on your PATH (`uv tool update-shell`, then open a new terminal).
- `doctor` reports a stale CLI version: your installed `claude` / `codex` differs from `.gauntlet/pins.yaml`. Re-verify with the integration suite, or update the pin file if the new version is intended.
- A run parks unexpectedly or a step is failed: `gauntlet status <slug>` shows where; the step's transcript under `runs/<slug>/<run>/steps/` has the detail. `gauntlet resume <slug>` re-enters safely once the cause is cleared.
- An agent hits a provider session/usage limit mid-step: the engine fails the step closed (it does not fake success). Wait for the limit to reset, then `gauntlet resume <slug>`.

Project details

Release history

- 0.1.4 (Jun 15, 2026)
- 0.1.2 (Jun 14, 2026)
- 0.1.1 (Jun 14, 2026), this version

Download files

Source Distribution

gauntlet_spec-0.1.1.tar.gz (1.0 MB)

Uploaded Jun 14, 2026 Source

Built Distribution

gauntlet_spec-0.1.1-py3-none-any.whl (183.8 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file gauntlet_spec-0.1.1.tar.gz.

File metadata

Download URL: gauntlet_spec-0.1.1.tar.gz
Upload date: Jun 14, 2026
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gauntlet_spec-0.1.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `dbbe7a310fd71059f26609777e311db702f71fdee735945f4a7b1b5011d64595` |
| MD5 | `4d84bc975f4a4a08fa6e22a38c4f1d0f` |
| BLAKE2b-256 | `b605b486da2294b91de3492f4d787c42175fbc4f97d7c5c103e68f5c700766fd` |

Provenance

The following attestation bundles were made for gauntlet_spec-0.1.1.tar.gz:

Publisher: release.yml on johnpletka/gauntlet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gauntlet_spec-0.1.1.tar.gz
- Subject digest: dbbe7a310fd71059f26609777e311db702f71fdee735945f4a7b1b5011d64595
- Sigstore transparency entry: 1819615465
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: johnpletka/gauntlet@c5544b8b251cc99ca8e62ac76c611766ef7fefaf
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/johnpletka
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5544b8b251cc99ca8e62ac76c611766ef7fefaf
- Trigger Event: push

File details

Details for the file gauntlet_spec-0.1.1-py3-none-any.whl.

File metadata

Download URL: gauntlet_spec-0.1.1-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 183.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gauntlet_spec-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b11fb4cf0b41a1514695858e6395ff8529f2fb81de37c4fb5fc27beac3c8ae8e` |
| MD5 | `fafe8ca29c3ffe288f12e38493815b6b` |
| BLAKE2b-256 | `490249ebcc99b572c7b6dda16a3974fe3ee78caaae73cf033b8f5579cb7bb9ec` |

Provenance

The following attestation bundles were made for gauntlet_spec-0.1.1-py3-none-any.whl:

Publisher: release.yml on johnpletka/gauntlet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gauntlet_spec-0.1.1-py3-none-any.whl
- Subject digest: b11fb4cf0b41a1514695858e6395ff8529f2fb81de37c4fb5fc27beac3c8ae8e
- Sigstore transparency entry: 1819615493
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: johnpletka/gauntlet@c5544b8b251cc99ca8e62ac76c611766ef7fefaf
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/johnpletka
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c5544b8b251cc99ca8e62ac76c611766ef7fefaf
- Trigger Event: push
