
Who guards the agents? A framework for orchestrating AI coding agents through verified implementation phases.


Juvenal

Quis agit ipsos agentes? — Who acts upon the agents?

Juvenal

Juvenal is a framework for orchestrating AI coding agents through verified implementation phases. It prevents agents from cheating on success criteria and breaks complex projects into phases an agent can actually handle.

The Problem

Agents suck at giant problems. This is probably only a temporary limitation, but for now, an AI agent handed a massive problem will fumble it. It'll take shortcuts, lie, cheat, steal, the works.

The Solution

There's no honor among agents! Agent B feels no obligation to cover for a shortcut Agent A took. This makes an implementation-verification loop with separate agents effective at catching cut corners. When Agent B catches Agent A's shoddy work, Agent C can be spun up to implement fixes, and so on.

How It Works

A non-agentic Python script orchestrates AI coding agents (Claude or Codex) through alternating steps:

  1. Implementation — an agent executes a prompt to build/modify code
  2. Verification — separate checkers (scripts, agents, or both) verify the work
  3. Bounce — if verification fails, the pipeline bounces back (to a configurable target phase or the most recent implement phase) with failure context injected. A global bounce limit (max_bounces) prevents infinite loops.

The implementing agent and the checking agent are separate processes, so the implementer can't cheat by weakening tests or otherwise gaming verification.
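The implement-verify-bounce cycle above can be sketched in a few lines of Python. This is a hypothetical illustration, not Juvenal's actual code: the `run_phase` name, the callable signatures, and the in-memory failure context are all assumptions.

```python
def run_phase(implement, checkers, max_bounces=5):
    """Hypothetical sketch of the implement -> verify -> bounce loop."""
    failure_context = None
    for _bounce in range(max_bounces + 1):
        result = implement(failure_context)            # fresh attempt, failure context injected
        verdicts = [check(result) for check in checkers]
        failures = [v for v in verdicts if not v.startswith("VERDICT: PASS")]
        if not failures:
            return result                              # every checker passed
        failure_context = "\n".join(failures)          # fed into the next attempt
    raise RuntimeError(f"exceeded max_bounces ({max_bounces})")
```

Because the checkers are separate callables (in Juvenal, separate agent processes), the implementer never gets the chance to rewrite its own verification criteria.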

Other Such Frameworks

Juvenal is conceptually similar to ralph, but it works slightly better for my exact purposes, and reinventing the wheel is cheap now!

Install

pip install -e ".[dev]"

Claude Code Skill

Juvenal ships as a Claude Code plugin, so you can use it directly from Claude Code with /juvenal.

Install the plugin

From the marketplace (pending approval):

/plugin install juvenal

From source (works now):

claude --plugin-dir /path/to/juvenal/plugin

Usage

Once installed, invoke the skill in Claude Code:

/juvenal add authentication to the Flask app

Claude will create a Juvenal workflow for your goal and run it. You can also ask for help with workflow formats or run existing workflows.

Quick Start

# Scaffold a workflow
juvenal init my-project

# Run a workflow
juvenal run workflow.yaml

# Generate a workflow from a goal
juvenal plan "implement a REST API with tests" -o workflow.yaml

# Plan and immediately run
juvenal do "add authentication to the Flask app"

Workflow Formats

YAML

name: "my-workflow"
backend: claude
max_bounces: 999
backoff: 2.0        # exponential backoff between bounces (seconds)
max_backoff: 60.0   # cap on backoff delay
notify:
  - https://example.com/webhook

phases:
  - id: implement
    prompt: "Implement the feature."
    timeout: 300
    env:
      NODE_ENV: production
    checkers:
      - run: "pytest tests/ -x"       # script checker
      - tester                          # built-in role shorthand
      - role: senior-engineer           # role as dict
      - prompt: "Check for security."   # inline prompt

Directory Convention

my-workflow/
  phases/
    01-setup/
      prompt.md            # implementation prompt
    02-parallel/           # "parallel" in name → parallel lane group
      feature-a/           #   each subdir is a lane
        prompt.md          #     implement phase
        check.md           #     check phase (auto-bounces to implement)
        tests.sh           #     script phase (auto-bounces to implement)
      feature-b/
        prompt.md
        check.md
    03-finish/
      prompt.md

Lanes can also use subdirectories for more complex pipelines:

02-parallel/
  a/
    01-implement/
      prompt.md
    02-check-review/
      prompt.md

Phase IDs are derived from directory names. In simple mode (lane has prompt.md at root): a, a~check-1, a~script-1. In complex mode (subdirectories): a~01-implement, a~02-check-review.

Bare Markdown

juvenal run task.md  # single implement phase from a .md file

Phase Types

Type        Description
implement   Agent executes a prompt to build/modify code (default)
check       Separate agent verifies work; emits VERDICT: PASS or VERDICT: FAIL: reason
script      Shell command; exit 0 = PASS, nonzero = FAIL
workflow    Dynamic sub-workflow: plans and executes a sub-pipeline from the prompt

Workflow Phases

A workflow phase dynamically generates and executes a sub-pipeline. Useful for open-ended tasks where the exact phases aren't known ahead of time:

- id: dynamic-feature
  type: workflow
  prompt: "Build a REST API with authentication and tests."
  max_depth: 2  # recursion depth limit (default: 3)

Inline Checkers

Checkers are defined inline on implement phases. Each entry can be:

  • Bare string — built-in role shorthand
  • run: CMD — script checker (exit 0 = pass)
  • role: NAME — agent checker with built-in role
  • prompt: TEXT — agent checker with inline prompt
  • prompt_file: PATH — agent checker with prompt from file

Checkers can also carry timeout and env.

- id: implement
  prompt: "Build the feature."
  checkers:
    - run: "pytest tests/ -x"
    - tester
    - role: senior-engineer
    - prompt: "Check for security vulnerabilities."
    - prompt_file: checkers/review.md
    - run: "npm run lint"
      timeout: 60
      env:
        CI: "true"

Built-in Roles

Agent checkers can use built-in verification personas:

  • tester — runs tests, checks for build errors
  • architect — validates design, checks for circular dependencies
  • pm — confirms requirements are met, no TODOs remain
  • senior-tester — checks test integrity, looks for cheating
  • senior-engineer — reviews code quality, completeness, security

Implementer roles (via --implementer):

  • software-engineer — structured implementation approach

Bounce Targets

On verification failure, the pipeline bounces back to re-implement. Two modes:

  • bounce_target (fixed): always bounces to this phase
  • bounce_targets (agent-guided): checker picks which phase via VERDICT: FAIL(target-id): reason

- id: review
  type: check
  bounce_targets:
    - design-experiments   # agent can bounce here
    - write-paper          # or here

These are mutually exclusive. If neither is set, bounces to the most recent implement phase.
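The three verdict forms described above (PASS, FAIL with a reason, FAIL with an agent-chosen bounce target) can be parsed with a small regex. This is a sketch of the format as documented here; the regex and the returned dict shape are assumptions, not Juvenal's actual parser.

```python
import re

# Matches "VERDICT: PASS", "VERDICT: FAIL: reason",
# and "VERDICT: FAIL(target-id): reason".
VERDICT_RE = re.compile(
    r"VERDICT:\s*(PASS|FAIL)(?:\((?P<target>[\w-]+)\))?(?::\s*(?P<reason>.*))?"
)

def parse_verdict(line):
    m = VERDICT_RE.search(line)
    if not m:
        return None
    return {
        "passed": m.group(1) == "PASS",
        "target": m.group("target"),                      # bounce target, if the agent chose one
        "reason": (m.group("reason") or "").strip() or None,
    }
```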

Parallel Groups

Lanes

Each lane is a mini-pipeline (e.g., implement + check) with its own internal bounce loop. All lanes run concurrently and share the global bounce budget. The group completes when every lane passes.

parallel_groups:
  - lanes:
      - [feature-a, check-a]
      - [feature-b, check-b]
      - [feature-c, check-c]

Lane constraints:

  • Bounce targets must stay within their lane
  • No workflow-type phases in lanes
  • No phase in multiple lanes

Legacy Flat Format

Run implement phases concurrently with no per-phase checking. A single failure aborts the group.

parallel_groups:
  - phases: [independent-a, independent-b]

Workflow Includes

Compose workflows from reusable pieces. Included phases and parallel groups are merged in order before the current workflow's phases:

include:
  - shared/setup.yaml
  - shared/linting.yaml
phases:
  - id: feature
    prompt: "Build the feature."

Nested includes are supported. Circular includes are detected.
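The merge order and circular-include detection described above can be pictured as a small recursive flatten. This is a minimal sketch under assumed names (a dict of pre-loaded workflows standing in for YAML files); Juvenal's real loader resolves file paths.

```python
def resolve_includes(name, workflows, seen=None):
    """Flatten a workflow's include chain; raise on circular includes."""
    seen = seen or set()
    if name in seen:
        raise ValueError(f"circular include: {name}")
    seen = seen | {name}
    wf = workflows[name]
    phases = []
    for inc in wf.get("include", []):       # included phases first, in order
        phases.extend(resolve_includes(inc, workflows, seen))
    phases.extend(wf.get("phases", []))     # current workflow's phases last
    return phases
```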

Exponential Backoff

Add a delay between bounces to avoid hammering APIs:

backoff: 2.0       # base delay in seconds (doubles each bounce)
max_backoff: 60.0  # cap

Or via CLI: --backoff 2.0
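The schedule implied by these settings (base delay doubling each bounce, capped at max_backoff) works out to:

```python
def backoff_delay(bounce, base=2.0, cap=60.0):
    """Delay in seconds before retrying after the given bounce number (0-based)."""
    return min(base * (2 ** bounce), cap)
```

With backoff: 2.0 and max_backoff: 60.0, the delays run 2, 4, 8, 16, 32, 60, 60, ... seconds.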

Notifications

Get webhook notifications on pipeline completion or failure:

notify:
  - https://hooks.slack.com/services/T.../B.../xxx

Or via CLI: --notify URL (repeatable). The webhook receives a JSON payload with workflow name, status, bounces, duration, token usage, and per-phase summaries.
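An illustrative payload, inferred from the fields listed above (workflow name, status, bounces, duration, token usage, per-phase summaries). The exact field names are assumptions, not Juvenal's documented schema:

```python
import json

# Hypothetical payload shape -- field names inferred from the description,
# not taken from Juvenal's actual webhook schema.
payload = {
    "workflow": "my-workflow",
    "status": "success",            # or "failure"
    "bounces": 3,
    "duration_seconds": 412.7,
    "tokens": {"input": 182340, "output": 25911},
    "phases": [
        {"id": "implement", "status": "passed", "bounces": 3},
    ],
}
body = json.dumps(payload)          # POSTed to each notify URL
```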

Context Preservation

By default, each bounce starts a fresh agent session. With --preserve-context-on-bounce, the agent's session is resumed instead, so it retains the full conversation context from the previous attempt. The failure details are sent as a follow-up message rather than re-rendering the full prompt.

Token Tracking

Juvenal tracks input and output token usage per phase. Token counts are shown in the run summary and included in webhook notifications. Token data is persisted in the state file for resume scenarios.

CLI

juvenal run <workflow> [--resume] [--rewind N] [--rewind-to PHASE_ID] [--phase X]
                       [--max-bounces N] [--backend claude|codex] [--dry-run]
                       [--backoff SECONDS] [--notify URL] [--working-dir DIR]
                       [--state-file PATH] [--checker SPEC] [--implementer ROLE]
                       [--preserve-context-on-bounce]
juvenal plan "goal" [-o output.yaml] [--backend claude|codex]
juvenal do "goal" [--backend claude|codex] [--max-bounces N]
juvenal status [--state-file path]
juvenal init [directory] [--template name]
juvenal validate <workflow>

Key Flags

Flag                           Description
--resume                       Resume from last saved state
--rewind N                     Rewind N phases back from the resume point
--rewind-to ID                 Rewind to a specific phase by ID
--phase ID                     Start from a specific phase
--dry-run                      Print the execution plan without running
--checker SPEC                 Inject a checker on every implement phase (role, run:CMD, prompt:TEXT). Repeatable.
--implementer ROLE             Prepend an implementer role prompt to every implement phase
--preserve-context-on-bounce   Resume the agent session on bounce (preserves conversation context)
--backoff SECONDS              Exponential backoff base delay between bounces
--notify URL                   Webhook URL for completion/failure notifications. Repeatable.

Resume & Rewind

# Resume from last saved state
juvenal run workflow.yaml --resume

# Rewind 2 phases back from the resume point
juvenal run workflow.yaml --rewind 2

# Rewind to a specific phase by ID
juvenal run workflow.yaml --rewind-to setup

--rewind and --rewind-to implicitly load existing state (no need for --resume) and invalidate from the target phase onward so everything from that point gets re-executed.
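That invalidation can be pictured as dropping saved state from the target phase onward. A minimal sketch with an assumed state shape (a dict of completed phase IDs); Juvenal's actual state-file format is not shown here:

```python
def rewind_to(completed, phase_order, target):
    """Keep state only for phases before the target; target and later re-execute."""
    idx = phase_order.index(target)
    keep = set(phase_order[:idx])
    return {pid: state for pid, state in completed.items() if pid in keep}
```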

Checker Injection

Inject checkers at the CLI without modifying the workflow file:

# Add a tester role checker to every implement phase
juvenal run workflow.yaml --checker tester

# Add a script checker
juvenal run workflow.yaml --checker "run:pytest tests/ -x"

# Add both
juvenal run workflow.yaml --checker tester --checker "run:make lint"
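Based on the SPEC forms the flag accepts (a bare role, run:CMD, or prompt:TEXT), parsing might look like the sketch below. This is an assumption drawn from the flag description, not Juvenal's actual code:

```python
def parse_checker_spec(spec):
    """Map a --checker SPEC string onto a checker entry (hypothetical shapes)."""
    if spec.startswith("run:"):
        return {"run": spec[len("run:"):]}        # script checker
    if spec.startswith("prompt:"):
        return {"prompt": spec[len("prompt:"):]}  # inline agent prompt
    return {"role": spec}                         # bare built-in role
```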

License

MIT
