Who guards the agents? A framework for orchestrating AI coding agents through verified implementation phases.
Juvenal
Quis custodiet ipsos agentes? — Who guards the agents?
Juvenal is a framework for orchestrating AI coding agents through verified implementation phases. It prevents agents from cheating on success criteria and helps them implement complex projects one phase at a time.
The Problem
Agents suck at giant problems. This is probably only a temporary problem, but for now, an AI agent given a massive problem will fumble it. It'll take shortcuts, lie, cheat, steal, the works.
The Solution
There's no honor among agents! Agent B feels no obligation to cover for some shortcut that Agent A made. This makes an implementation-verification loop with separate agents pretty effective for catching cut corners. When Agent B catches Agent A's shoddy work, Agent C can be spun up to implement fixes, and so on.
How It Works
A non-agentic Python script orchestrates AI coding agents (Claude or Codex) through alternating steps:
- Implementation: an agent executes a prompt to build/modify code
- Verification: separate checkers (scripts, agents, or both) verify the work
- Bounce: if verification fails, the pipeline bounces back (to a configurable target phase or the most recent implement phase) with failure context injected. A global bounce limit (max_bounces) prevents infinite loops.
The implementing agent and the checking agent are separate processes, so the implementer can't cheat by weakening tests, etc.
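The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Juvenal's actual code: `Phase`, `Checker`, `bounce_target`, and `run_pipeline` are hypothetical names, and the agents are stood in for by plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical types for illustration only; not Juvenal's real classes.
Checker = Callable[[], Optional[str]]  # returns None on pass, else a failure reason


@dataclass
class Phase:
    id: str
    implement: Callable[[str], None]     # receives failure context ("" on first try)
    checkers: List[Checker] = field(default_factory=list)
    bounce_target: Optional[str] = None  # phase id to return to on failure


def run_pipeline(phases: List[Phase], max_bounces: int) -> bool:
    """Run phases in order; on a failed check, bounce back with failure context."""
    index_of = {p.id: i for i, p in enumerate(phases)}
    i, bounces, context = 0, 0, ""
    while i < len(phases):
        phase = phases[i]
        phase.implement(context)
        # Run every checker and collect the reasons from the ones that failed.
        failures = [r for r in (check() for check in phase.checkers) if r is not None]
        if not failures:
            i, context = i + 1, ""
            continue
        bounces += 1
        if bounces > max_bounces:
            return False  # global bounce limit reached
        context = "; ".join(failures)
        # Bounce to the configured target, or retry the current phase by default.
        i = index_of.get(phase.bounce_target, i)
    return True
```

In this sketch the bounce counter is shared across all phases, which mirrors the global limit described above; a per-phase counter would be a different design.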
Other Such Frameworks
Juvenal is conceptually similar to ralph, but it works slightly better for my exact purposes and reinventing the wheel is cheap now!
Install
pip install -e ".[dev]"
Claude Code Skill
Juvenal ships as a Claude Code plugin, so you can use it directly from Claude Code with /juvenal.
Install the plugin
From the marketplace (pending approval):
/plugin install juvenal
From source (works now):
claude --plugin-dir /path/to/juvenal/plugin
Usage
Once installed, invoke the skill in Claude Code:
/juvenal add authentication to the Flask app
Claude will create a Juvenal workflow for your goal and run it. You can also ask for help with workflow formats or run existing workflows.
Quick Start
# Scaffold a workflow
juvenal init my-project
# Run a workflow
juvenal run workflow.yaml
# Generate a workflow from a goal
juvenal plan "implement a REST API with tests" -o workflow.yaml
# Plan and immediately run
juvenal do "add authentication to the Flask app"
Workflow Formats
YAML
name: "my-workflow"
backend: claude
max_bounces: 999
phases:
  - id: implement
    prompt: "Implement the feature."
    checkers:
      - type: script
        run: "pytest tests/ -x"
      - type: agent
        role: tester
Directory Convention
my-workflow/
  phases/
    01-setup/
      prompt.md          # implementation prompt
      check-build.sh     # script checker (exit 0 = pass)
      check-quality.md   # agent checker
    02-implement/
      prompt.md
      check-tests.sh     # paired with .md = composite
      check-tests.md     # gets {script_output} injected
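To make the pairing rule concrete, here is a rough sketch of how such a directory could be scanned. It assumes only what the layout above shows (a `prompt.md` per phase, `check-*.sh` and `check-*.md` files, and same-stem pairs forming a composite); `discover_phases` and the returned dictionary shape are hypothetical and may differ from what Juvenal does internally.

```python
from pathlib import Path
from typing import Dict, List


def discover_phases(workflow_dir: Path) -> List[Dict]:
    """Build phase descriptions from phases/<NN-name>/ directories."""
    phases = []
    phase_dirs = sorted(p for p in (workflow_dir / "phases").iterdir() if p.is_dir())
    for phase_dir in phase_dirs:
        scripts = {p.stem: p for p in phase_dir.glob("check-*.sh")}
        prompts = {p.stem: p for p in phase_dir.glob("check-*.md")}
        checkers = []
        for stem in sorted(scripts.keys() | prompts.keys()):
            if stem in scripts and stem in prompts:
                kind = "composite"  # script output is fed to the agent prompt
            elif stem in scripts:
                kind = "script"
            else:
                kind = "agent"
            checkers.append({"name": stem, "type": kind})
        phases.append(
            {"id": phase_dir.name, "prompt": phase_dir / "prompt.md", "checkers": checkers}
        )
    return phases
```

Sorting by directory name is what makes the numeric prefixes (`01-`, `02-`) determine phase order in this sketch.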
Bare Markdown
phases/
  01-setup.md   # single phase, default tester checker
Checker Types
| Type | Description |
|---|---|
| script | Shell command; exit 0 = PASS, nonzero = FAIL |
| agent | AI agent that emits VERDICT: PASS or VERDICT: FAIL: reason |
| composite | Script runs first, output fed to agent via {script_output} |
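The three checker types can be approximated with a handful of helpers. This is a sketch, not Juvenal's implementation: the function names are hypothetical, and two behaviours are assumptions on my part, namely that the last `VERDICT:` line wins and that agent output with no verdict counts as a failure.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches "VERDICT: PASS" or "VERDICT: FAIL: reason" on a line of its own.
VERDICT_RE = re.compile(r"^VERDICT:[ \t]*(PASS|FAIL)(?::[ \t]*(.*))?$", re.MULTILINE)


def run_script_checker(command: str) -> Tuple[bool, str]:
    """Run a shell command; exit code 0 counts as PASS."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def parse_verdict(agent_output: str) -> Tuple[bool, Optional[str]]:
    """Return (passed, reason) from the last VERDICT line.

    Assumption: output without any VERDICT line is treated as a failure.
    """
    matches = VERDICT_RE.findall(agent_output)
    if not matches:
        return False, "no VERDICT line found"
    verdict, reason = matches[-1]
    return verdict == "PASS", (reason.strip() or None)


def build_composite_prompt(template: str, script_output: str) -> str:
    """Inject script output into an agent prompt via the {script_output} placeholder."""
    return template.replace("{script_output}", script_output)
```

Using `str.replace` rather than `str.format` keeps any other braces in the prompt (for example, code snippets) from being interpreted as placeholders.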
Built-in Roles
Agent checkers can use built-in verification personas:
- tester: runs tests, checks for build errors
- architect: validates design, checks for circular dependencies
- pm: confirms requirements are met, no TODOs remain
- senior-tester: checks test integrity, looks for cheating
- senior-engineer: reviews code quality, completeness, security
CLI
juvenal run <workflow> [--resume] [--rewind N] [--rewind-to PHASE_ID] [--phase X]
                       [--max-bounces N] [--backend claude|codex] [--dry-run]
                       [--backoff SECONDS] [--notify WEBHOOK_URL]
                       [--working-dir DIR] [--state-file PATH]
juvenal plan "goal" [-o output.yaml] [--backend claude|codex]
juvenal do "goal" [--backend claude|codex] [--max-bounces N]
juvenal status [--state-file path]
juvenal init [directory] [--template name]
juvenal validate <workflow>
Resume & Rewind
# Resume from last saved state
juvenal run workflow.yaml --resume
# Rewind 2 phases back from the resume point
juvenal run workflow.yaml --rewind 2
# Rewind to a specific phase by ID
juvenal run workflow.yaml --rewind-to setup
--rewind and --rewind-to implicitly load existing state (no need for --resume) and invalidate from the target phase onward so everything from that point gets re-executed.
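The invalidation rule can be illustrated with a small sketch. The state shape here (a dict mapping phase ID to a completed flag) and the helper names `rewind_to` and `rewind_by` are assumptions for illustration; Juvenal's real state file format is not described above and may look quite different.

```python
from typing import Dict, List


def rewind_to(phase_ids: List[str], completed: Dict[str, bool], target: str) -> Dict[str, bool]:
    """Mark the target phase and everything after it as not completed."""
    start = phase_ids.index(target)  # raises ValueError for an unknown phase id
    invalid = set(phase_ids[start:])
    return {pid: done and pid not in invalid for pid, done in completed.items()}


def rewind_by(phase_ids: List[str], completed: Dict[str, bool], n: int) -> Dict[str, bool]:
    """Step back n phases from the resume point (the first phase not yet completed)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    resume = next(
        (i for i, pid in enumerate(phase_ids) if not completed.get(pid)),
        len(phase_ids),
    )
    # Clamp at the first phase so a large n simply restarts the workflow.
    return rewind_to(phase_ids, completed, phase_ids[max(resume - n, 0)])
```

For example, with phases `setup`, `implement`, `polish` and the first two completed, rewinding by 1 in this sketch re-opens `implement` and `polish` while leaving `setup` marked done.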
License
MIT