Who guards the agents? A framework for orchestrating AI coding agents through verified implementation phases.

Juvenal

Quis agit ipsos agentes? — Who acts upon the agents?

Juvenal is a framework for orchestrating AI coding agents through verified implementation phases. It keeps agents from cheating on success criteria and helps them implement complex projects one phase at a time.

The Problem

Agents suck at giant problems. This is probably only a temporary problem, but for now, an AI agent given a massive problem will fumble it. It'll take shortcuts, lie, cheat, steal, the works.

The Solution

There's no honor among agents! Agent B feels no obligation to cover for some shortcut that Agent A made. This makes an implementation-verification loop with separate agents pretty effective for catching cut corners. When Agent B catches Agent A's shoddy work, Agent C can be spun up to implement fixes, and so on.

How It Works

A deterministic Python runtime orchestrates AI coding agents (Claude or Codex) through alternating steps:

Implementation — an agent executes a prompt to build/modify code
Verification — separate checker agents verify the work, and can run commands when instructed to do so
Bounce — if verification fails, the pipeline bounces back (to a configurable target phase or the most recent implement phase) with failure context injected. A global bounce limit (max_bounces) prevents infinite loops.

The implementing agent and the checking agent are separate processes, so the implementer can't cheat by weakening tests, etc.
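
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Juvenal's runtime: `run_phase`, `Verdict`, `toy_implementer`, and `toy_checker` are hypothetical names, and plain functions stand in for the separate agent processes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    reason: str = ""


# Hypothetical stand-ins for agent processes; not Juvenal's real interfaces.
Implementer = Callable[[str, Optional[str]], str]  # (prompt, failure context) -> work
Checker = Callable[[str], Verdict]                 # work -> verdict


def run_phase(prompt: str, implement: Implementer, checkers: List[Checker],
              max_bounces: int) -> Tuple[str, int]:
    """Alternate implementation and verification until every checker passes."""
    bounces = 0
    failure_context: Optional[str] = None
    while True:
        work = implement(prompt, failure_context)
        verdicts = [check(work) for check in checkers]
        failed = next((v for v in verdicts if not v.passed), None)
        if failed is None:
            return work, bounces
        if bounces >= max_bounces:
            raise RuntimeError(f"bounce limit reached: {failed.reason}")
        bounces += 1
        failure_context = failed.reason  # injected into the next attempt


def toy_implementer(prompt: str, failure_context: Optional[str]) -> str:
    # Cuts a corner on the first attempt, does the work once called out.
    return "real code" if failure_context else "stub"


def toy_checker(work: str) -> Verdict:
    if work == "real code":
        return Verdict(True)
    return Verdict(False, "implementation is a stub")


result, bounces = run_phase("Implement the feature.", toy_implementer,
                            [toy_checker], max_bounces=3)
print(result, bounces)  # real code 1
```

The checker never sees the implementer's reasoning, only its output, which is the property the separate-process design is after.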

Other Such Frameworks

Juvenal is conceptually similar to ralph, but it works slightly better for my exact purposes and reinventing the wheel is cheap now!

Install

pip install -e ".[dev]"

Claude Code Skill

Juvenal ships as a Claude Code plugin, so you can use it directly from Claude Code with /juvenal.

Install the plugin

From the marketplace (pending approval):

/plugin install juvenal

From source (works now):

claude --plugin-dir /path/to/juvenal/plugin

Usage

Once installed, invoke the skill in Claude Code:

/juvenal add authentication to the Flask app

Claude will create a Juvenal workflow for your goal and run it. You can also ask for help with workflow formats or run existing workflows.

Quick Start

# Scaffold a workflow
juvenal init my-project

# Run a workflow
juvenal run workflow.yaml

# Decompose a complex goal into implement phases, then run planner + your fixed checker stack
juvenal run --standard-checkers --phased-implementer 'software-engineer:add authentication to the Flask app'

# Generate a workflow from a goal
juvenal plan "implement a REST API with tests" -o workflow.yaml

# Plan and immediately run
juvenal do "add authentication to the Flask app"

Embedded API

Juvenal exposes an embedded Python API. The repository includes a toy example at tests/mockup.py:

import subprocess, tempfile
from juvenal.api import do, goal, plan_and_do

tmpdir = tempfile.mkdtemp()
subprocess.run(["git", "init"], cwd=tmpdir, check=True)
subprocess.run(["git", "commit", "--allow-empty", "-m", "init"], cwd=tmpdir, check=True)

with goal("build a toy todo CLI", working_dir=tmpdir):
    do("write example-brief.md describing the toy todo CLI", checker="pm")
    do(
        ["create sample-interactions.md with happy-path transcripts",
         "add edge-case transcripts to sample-interactions.md"],
        checkers=["security-engineer"],
    )
    do(
        ["derive acceptance-checklist.md from the prep artifacts",
         "expand the checklist with missing coverage",
         "add persisted-state scenarios to the checklist",
         "write smoke-test.sh for shell validation"],
        checkers=["tester", "senior-tester"],
    )
    plan_and_do("build the toy todo CLI in toy_app/")

Run it from a repo checkout: python -m tests.mockup

Workflow Formats

YAML

name: "my-workflow"
backend: claude
max_bounces: 999
backoff: 2.0        # exponential backoff between bounces (seconds)
max_backoff: 60.0   # cap on backoff delay
notify:
  - https://example.com/webhook

phases:
  - id: implement
    prompt: "Implement the feature."
    timeout: 300
    env:
      NODE_ENV: production
    checks:
      - prompt: |
          Run `pytest tests/ -x` from the working directory and use the result to verify the implementation.
          Do not modify code while checking.
          Emit `VERDICT: FAIL: <reason>` if the command fails; otherwise emit `VERDICT: PASS`.
      - tester                          # built-in role shorthand
      - role: senior-engineer           # role as dict
      - prompt: "Check for security."   # inline prompt

Directory Convention

my-workflow/
  phases/
    01-setup/
      prompt.md            # implementation prompt
    02-parallel/           # "parallel" in name → parallel lane group
      feature-a/           #   each subdir is a lane
        prompt.md          #     implement phase
        check.md           #     check phase (auto-bounces to implement)
      feature-b/
        prompt.md
        check.md
    03-finish/
      prompt.md

Commands belong in checker prompts; .sh files are not auto-loaded as phases.

Lanes can also use subdirectories for more complex pipelines:

02-parallel/
  a/
    01-implement/
      prompt.md
    02-check-review/
      prompt.md

Phase IDs are derived from directory names. In simple mode (lane has prompt.md at root): a, a~check-1, a~check-2. In complex mode (subdirectories): a~01-implement, a~02-check-review.
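
As a rough sketch of that naming rule, the function below derives phase IDs from the names found inside a lane directory. `lane_phase_ids` is a hypothetical helper, and numbering check files in sorted order is an assumption; the source only gives the resulting ID patterns.

```python
from typing import List


def lane_phase_ids(lane: str, entries: List[str]) -> List[str]:
    """Derive phase IDs for one lane.

    `entries` holds the names directly inside the lane directory: file names
    in simple mode, subdirectory names in complex mode.
    """
    if "prompt.md" in entries:
        # Simple mode: the lane itself is the implement phase, and every other
        # .md file is treated as a check (assumed to be numbered in sorted order).
        checks = sorted(e for e in entries if e != "prompt.md" and e.endswith(".md"))
        return [lane] + [f"{lane}~check-{i}" for i in range(1, len(checks) + 1)]
    # Complex mode: each subdirectory becomes a phase named lane~subdir.
    return [f"{lane}~{name}" for name in sorted(entries)]


print(lane_phase_ids("a", ["prompt.md", "check.md"]))
print(lane_phase_ids("a", ["02-check-review", "01-implement"]))
```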

Bare Markdown

juvenal run task.md  # single implement phase from a .md file

Phase Types

| Type | Description |
| --- | --- |
| `implement` | Agent executes a prompt to build/modify code (default) |
| `check` | Separate agent verifies work, emits `VERDICT: PASS` or `VERDICT: FAIL: reason` |
| `workflow` | Dynamic sub-workflow: plans and executes a sub-pipeline from the prompt |

Workflow Phases

A workflow phase dynamically generates and executes a sub-pipeline. Useful for open-ended tasks where the exact phases aren't known ahead of time:

- id: dynamic-feature
  type: workflow
  prompt: "Build a REST API with authentication and tests."
  max_depth: 2  # recursion depth limit (default: 3)

Inline Checkers

Checks are defined inline on implement phases. Each entry can be:

Bare string — built-in role shorthand
role: NAME — agent checker with built-in role
role: NAME + prompt: TEXT — built-in role plus extra checker instructions
prompt: TEXT — agent checker with inline prompt
prompt_file: PATH — agent checker with prompt from file

Checkers can also carry timeout and env.

- id: implement
  prompt: "Build the feature."
  checks:
    - prompt: |
        Run `pytest tests/ -x` from the working directory and verify the result.
        Emit `VERDICT: FAIL: <reason>` on failure, otherwise emit `VERDICT: PASS`.
    - tester
    - role: senior-engineer
      prompt: "Focus on migration safety and rollback behavior."
    - prompt: "Check for security vulnerabilities."
    - prompt_file: checkers/review.md
    - prompt: "Run `npm run lint` and emit `VERDICT: PASS` only if it succeeds."
      timeout: 60
      env:
        CI: "true"
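
One way to picture the entry forms listed above is as a normalization step that turns each one into the same record. The sketch below assumes the YAML has already been loaded into Python strings and dicts; `CheckSpec` and `normalize_check` are hypothetical names, and whether Juvenal rejects unknown keys is not stated in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class CheckSpec:
    role: Optional[str] = None
    prompt: Optional[str] = None
    prompt_file: Optional[str] = None
    timeout: Optional[int] = None
    env: Dict[str, str] = field(default_factory=dict)


ALLOWED_KEYS = {"role", "prompt", "prompt_file", "timeout", "env"}


def normalize_check(entry: Union[str, dict]) -> CheckSpec:
    """Turn one `checks:` entry into a uniform CheckSpec."""
    if isinstance(entry, str):
        return CheckSpec(role=entry)  # bare string is role shorthand
    unknown = set(entry) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown checker keys: {sorted(unknown)}")
    if not any(key in entry for key in ("role", "prompt", "prompt_file")):
        raise ValueError("checker needs a role, prompt, or prompt_file")
    return CheckSpec(**entry)


checks = [
    "tester",
    {"role": "senior-engineer", "prompt": "Focus on migration safety."},
    {"prompt": "Run `npm run lint`.", "timeout": 60, "env": {"CI": "true"}},
]
specs = [normalize_check(c) for c in checks]
```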

Built-in Roles

Agent checkers can use built-in verification personas:

always-fail — deliberately emits `VERDICT: FAIL: keep going` to force another implementation pass
tester — runs tests, checks for build errors
architect — validates design, checks for circular dependencies
pm — confirms requirements are met, no TODOs remain
senior-tester — checks test integrity, looks for cheating
senior-engineer — reviews code quality, completeness, security
security-engineer — reviews security boundaries, exploitability, and risky defaults
technical-writer — reviews technical accuracy, structure, and clarity of written output
llm-writing — reviews English prose for excessive evidence of AI-style writing and generic LLM phrasing
professor — reviews research vision, experiment design, and scientific rigor
grant-reviewer — reviews grant proposals for significance, feasibility, and reviewer-facing gaps

Implementer roles (via --implementer):

software-engineer — structured implementation approach
professor-writer — structured approach for grants and scientific papers

Bounce Targets

On verification failure, the pipeline bounces back to re-implement. Two modes:

bounce_target (fixed): always bounces to this phase
bounce_targets (agent-guided): checker picks which phase via VERDICT: FAIL(target-id): reason

- id: review
  type: check
  bounce_targets:
    - design-experiments   # agent can bounce here
    - write-paper          # or here

These are mutually exclusive. If neither is set, the pipeline bounces to the most recent implement phase.
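
The three verdict shapes mentioned in this document (`VERDICT: PASS`, `VERDICT: FAIL: reason`, and `VERDICT: FAIL(target-id): reason`) can be recognized with one regular expression. This is a sketch only: `parse_verdict` and `ParsedVerdict` are hypothetical names, and taking the last verdict line in the output is an assumption rather than documented behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional

VERDICT_RE = re.compile(
    r"^VERDICT:\s*(PASS|FAIL)(?:\(([^)]+)\))?(?::\s*(.*))?[ \t]*$",
    re.MULTILINE,
)


@dataclass
class ParsedVerdict:
    passed: bool
    target: Optional[str] = None  # set only for agent-guided bounces
    reason: str = ""


def parse_verdict(output: str) -> Optional[ParsedVerdict]:
    """Return the last verdict found in checker output, or None if absent."""
    matches = list(VERDICT_RE.finditer(output))
    if not matches:
        return None
    status, target, reason = matches[-1].groups()
    if status == "PASS":
        return ParsedVerdict(passed=True)
    return ParsedVerdict(False, target, (reason or "").strip())


print(parse_verdict("notes...\nVERDICT: FAIL(write-paper): missing ablation"))
```

A `None` result lets the caller decide how to treat a checker that never emitted a verdict.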

Parallel Groups

Lanes

Each lane is a mini-pipeline (e.g., implement + check, or a workflow phase followed by checks) with its own internal bounce loop. All lanes run concurrently and share the global bounce budget. The group completes when every lane passes.

parallel_groups:
  - lanes:
      - [feature-a, check-a]
      - [feature-b, check-b]
      - [feature-c, check-c]

Lane constraints:

Bounce targets must stay within their lane
workflow phases are allowed in lanes and execute like any other lane step
No phase in multiple lanes
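
Two of these constraints are easy to check mechanically. The sketch below is illustrative: `validate_lanes` is a hypothetical helper that takes lanes as lists of phase IDs plus a mapping from each check phase to its bounce targets.

```python
from typing import Dict, List


def validate_lanes(lanes: List[List[str]],
                   bounce_targets: Dict[str, List[str]]) -> List[str]:
    """Return a list of constraint violations (empty when the lanes are valid)."""
    errors: List[str] = []
    owner: Dict[str, int] = {}  # phase ID -> index of the lane that owns it
    for index, lane in enumerate(lanes):
        for phase in lane:
            if phase in owner:
                errors.append(
                    f"phase {phase!r} appears in lanes {owner[phase]} and {index}")
            else:
                owner[phase] = index
    for phase, targets in bounce_targets.items():
        if phase not in owner:
            continue  # phase is outside this parallel group
        for target in targets:
            if owner.get(target) != owner[phase]:
                errors.append(f"phase {phase!r} bounces to {target!r} outside its lane")
    return errors


lanes = [["feature-a", "check-a"], ["feature-b", "check-b"]]
print(validate_lanes(lanes, {"check-a": ["feature-a"]}))  # []
print(validate_lanes(lanes, {"check-a": ["feature-b"]}))
```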

Legacy Flat Format

Run implement phases concurrently with no per-phase checking. A single failure aborts the group.

parallel_groups:
  - phases: [independent-a, independent-b]

Workflow Includes

Compose workflows from reusable pieces. Included phases and parallel groups are merged in order before the current workflow's phases:

include:
  - shared/setup.yaml
  - shared/linting.yaml
phases:
  - id: feature
    prompt: "Build the feature."

Nested includes are supported. Circular includes are detected.
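
A depth-first walk is one plausible way to get both behaviors: included phases land before the including workflow's own phases, and a cycle is reported as soon as a file shows up on the current include path. In this sketch workflows are plain dicts keyed by file name, `resolve_includes` is a hypothetical helper, and what happens when the same file is included twice without a cycle is left unspecified.

```python
from typing import Dict, List, Tuple


def resolve_includes(name: str, workflows: Dict[str, dict],
                     _stack: Tuple[str, ...] = ()) -> List[str]:
    """Return phase IDs in execution order, with included phases first."""
    if name in _stack:
        cycle = " -> ".join(_stack + (name,))
        raise ValueError(f"circular include: {cycle}")
    workflow = workflows[name]
    phases: List[str] = []
    for included in workflow.get("include", []):
        phases.extend(resolve_includes(included, workflows, _stack + (name,)))
    phases.extend(workflow.get("phases", []))
    return phases


workflows = {
    "main.yaml": {"include": ["shared/setup.yaml", "shared/linting.yaml"],
                  "phases": ["feature"]},
    "shared/setup.yaml": {"phases": ["setup"]},
    "shared/linting.yaml": {"phases": ["lint"]},
}
print(resolve_includes("main.yaml", workflows))  # ['setup', 'lint', 'feature']
```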

Exponential Backoff

Add a delay between bounces to avoid hammering APIs:

backoff: 2.0       # base delay in seconds (doubles each bounce)
max_backoff: 60.0  # cap

Or via CLI: --backoff 2.0
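
With the values above, the delay schedule would look roughly like this. The sketch assumes the first bounce waits exactly the base delay and each later bounce doubles it up to the cap; `backoff_delay` is a hypothetical helper, and the exact indexing Juvenal uses is not stated here.

```python
def backoff_delay(bounce: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay in seconds before bounce number `bounce` (1-based)."""
    if bounce < 1:
        raise ValueError("bounce numbers start at 1")
    return min(base * 2 ** (bounce - 1), cap)


print([backoff_delay(n) for n in range(1, 7)])
# [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```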

Notifications

Get webhook notifications on pipeline completion or failure:

notify:
  - https://hooks.slack.com/services/T.../B.../xxx

Or via CLI: --notify URL (repeatable). The webhook receives a JSON payload with workflow name, status, bounces, duration, token usage, and per-phase summaries.
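
To give a feel for what a receiver might see, here is a sketch that builds such a POST request with the standard library. Every field name and value in `example_summary` is an assumption chosen to match the list above; the real payload schema is not documented here, so treat this as illustrative only.

```python
import json
import urllib.request


def build_notification(url: str, summary: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON webhook request."""
    body = json.dumps(summary).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; keys and numbers are placeholders, not Juvenal's schema.
example_summary = {
    "workflow": "my-workflow",
    "status": "completed",
    "bounces": 2,
    "duration_seconds": 431.5,
    "tokens": {"input": 120000, "output": 18000},
    "phases": [{"id": "implement", "status": "passed", "bounces": 2}],
}

request = build_notification("https://example.com/webhook", example_summary)
# urllib.request.urlopen(request) would deliver it; omitted to avoid side effects.
```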

Context Preservation

By default, bounces resume the agent's session so it retains the full conversation context from the previous attempt. The failure details are sent as a follow-up message rather than re-rendering the full prompt. Use --clear-context-on-bounce to start a fresh session on each bounce instead.

Token Tracking

Juvenal tracks input and output token usage per phase. Token counts are shown in the run summary and included in webhook notifications. Token data is persisted in the state file for resume scenarios.

CLI

juvenal run <workflow> [--resume] [--rewind N] [--rewind-to PHASE_ID] [--phase X]
                       [--max-bounces N] [--backend claude|codex] [--dry-run]
                       [--backoff SECONDS] [--notify URL] [--working-dir DIR]
                       [--state-file PATH] [--checker SPEC] [--implementer ROLE]
                       [--phased-implementer SPEC] [--clear-context-on-bounce]
                       [-D VAR=VAL] [-i|--interactive] [--serialize]
juvenal plan "goal" [-o output.yaml] [--backend claude|codex] [-i|--interactive]
juvenal do "goal" [--backend claude|codex] [--max-bounces N] [-D VAR=VAL]
                  [-i|--interactive] [--serialize]
juvenal status [--state-file path]
juvenal init [directory] [--template name]
juvenal validate <workflow>

Key Flags

| Flag | Description |
| --- | --- |
| `--resume` | Resume from last saved state |
| `--rewind N` | Rewind N phases back from the resume point |
| `--rewind-to ID` | Rewind to a specific phase by ID |
| `--phase ID` | Start from a specific phase |
| `--dry-run` | Print execution plan without running |
| `--checker SPEC` | Inject checker on every implement phase (`tester`, `tester:"extra instructions"`, or `prompt:"TEXT"`). Repeatable. |
| `--implementer ROLE` | Prepend implementer role prompt to every implement phase |
| `--phased-implementer SPEC` | On `run`, first plan a complex goal into implement phases with their planner-authored checks, then execute them with your CLI-injected checker stack appended after each one. Accepts either `GOAL` or `ROLE:"GOAL"`. Implies `--linear`. |
| `--linear` | On `run --phased-implementer` or `do`, enforce that the planner produces a strictly linear workflow (implement phases followed by check phases that bounce to the immediately preceding implement). Implied by `--phased-implementer`. |
| `-i`, `--interactive` | For `run --phased-implementer`, `plan`, and `do`, allow the planner/refinement phase to ask the user one question at a time before execution continues. |
| `--clear-context-on-bounce` | Start fresh agent session on bounce (default: resume session) |
| `-D VAR=VAL` | Set a Jinja2 template variable. Repeatable. |
| `--backoff SECONDS` | Exponential backoff base delay between bounces |
| `--notify URL` | Webhook URL for completion/failure notifications. Repeatable. |
| `--serialize` | Disable all parallelization (run everything sequentially) |

Resume & Rewind

# Resume from last saved state
juvenal run workflow.yaml --resume

# Rewind 2 phases back from the resume point
juvenal run workflow.yaml --rewind 2

# Rewind to a specific phase by ID
juvenal run workflow.yaml --rewind-to setup

--rewind and --rewind-to implicitly load existing state (no need for --resume) and invalidate from the target phase onward so everything from that point gets re-executed.
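
The invalidation rule can be modeled on a plain list of phase IDs. This is a sketch under assumptions: the resume point is taken to be the first phase that has not completed, and `rewind_state` is a hypothetical helper rather than Juvenal's state-file logic.

```python
from typing import List, Optional, Set, Tuple


def rewind_state(phases: List[str], completed: Set[str], n: int = 0,
                 to: Optional[str] = None) -> Tuple[Optional[str], Set[str]]:
    """Return (phase to restart from, phases still counted as complete)."""
    # Resume point: first phase not yet completed, or past the end if all are.
    resume_index = next(
        (i for i, p in enumerate(phases) if p not in completed), len(phases))
    if to is not None:
        target = phases.index(to)  # raises ValueError for an unknown phase ID
    else:
        target = max(resume_index - n, 0)
    # Everything from the target onward is invalidated and re-executed.
    kept = {p for p in phases[:target] if p in completed}
    start = phases[target] if target < len(phases) else None
    return start, kept


phases = ["setup", "implement", "review", "finish"]
done = {"setup", "implement", "review"}
print(rewind_state(phases, done, n=2))       # ('implement', {'setup'})
print(rewind_state(phases, done, to="setup"))  # ('setup', set())
```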

Checker Injection

Inject checkers at the CLI without modifying the workflow file:

# Add a tester role checker to every implement phase
juvenal run workflow.yaml --checker tester

# Add a checker with explicit instructions
juvenal run workflow.yaml --checker 'prompt:"Run pytest tests/ -x and emit VERDICT based on the result."'

# Add a built-in checker role with extra instructions
juvenal run workflow.yaml --checker 'tester:"Focus on API error handling and regression coverage."'

# Combine a role checker and a prompt checker
juvenal run workflow.yaml --checker tester --checker 'prompt:"Run make lint and emit VERDICT based on the result."'
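
The three `--checker` spec forms can be told apart by splitting on the first colon. The sketch below is a guess at the shape of that parsing, not the CLI's actual code: `parse_checker_spec` is a hypothetical helper, and tolerating text without surrounding double quotes is an assumption.

```python
def parse_checker_spec(spec: str) -> dict:
    """Parse `ROLE`, `ROLE:"TEXT"`, or `prompt:"TEXT"` into a checker dict."""
    head, sep, tail = spec.partition(":")
    if not sep:
        return {"role": spec}  # bare role name, e.g. tester
    text = tail.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]  # drop the surrounding double quotes
    if head == "prompt":
        return {"prompt": text}
    return {"role": head, "prompt": text}


print(parse_checker_spec("tester"))
print(parse_checker_spec('tester:"Focus on API error handling."'))
print(parse_checker_spec('prompt:"Run pytest tests/ -x and emit VERDICT."'))
```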

Phased Complex Goals

If you want the built-in planner to split a large goal into multiple implement phases and run them with your own checker stack layered on top of the planner-authored verifiers, use run --phased-implementer.

Juvenal will:

plan the goal using the same planning pipeline that powers do
preserve the planned implement phases and their planner-authored check phases
append your CLI-selected checkers after each implement phase's existing checks

Example:

juvenal run \
  --standard-checkers \
  --phased-implementer 'software-engineer:add authentication, audit logging, and tests to the Flask app' \
  --interactive

That produces an execution shape like:

implement-a -> <planner checks> -> tester -> senior-tester -> senior-engineer -> architect -> pm
implement-b -> <planner checks> -> tester -> senior-tester -> senior-engineer -> architect -> pm
implement-c -> <planner checks> -> tester -> senior-tester -> senior-engineer -> architect -> pm
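
The layering itself is simple list concatenation, as this sketch shows. `layer_checkers` is a hypothetical helper, and the `STANDARD_CHECKERS` list is read off the execution shape above; that `--standard-checkers` expands to exactly this list is inferred, not confirmed.

```python
from typing import Dict, List

# Inferred from the example shape above; treat as an assumption.
STANDARD_CHECKERS = ["tester", "senior-tester", "senior-engineer", "architect", "pm"]


def layer_checkers(plan: Dict[str, List[str]], cli_checkers: List[str]) -> List[str]:
    """Append CLI checkers after each implement phase's planner-authored checks."""
    return [
        " -> ".join([phase, *planner_checks, *cli_checkers])
        for phase, planner_checks in plan.items()
    ]


plan = {
    "implement-a": ["<planner checks>"],
    "implement-b": ["<planner checks>"],
}
for line in layer_checkers(plan, STANDARD_CHECKERS):
    print(line)
```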

License

MIT
