AI-driven autonomous software iteration loop: set a goal, let CLI coding agents architect, build, test, review and evolve a project version by version.

AutoDevLoop

Set one goal. Watch AI coding agents architect, build, test, review, and keep evolving a real project — version by version — until your target version count is reached.

English · 简体中文

AutoDevLoop is a small, dependency-free Python tool that drives a CLI coding agent (Claude Code by default; Codex / Gemini CLI also supported) through a standardised, multi-stage development loop. You give it a goal and a number of versions; it designs an architecture, plans each version, writes the code, runs tests, reviews the result, and — once your goal is met — proposes and value-gates new features to keep improving the product on its own.

Every version is a usable, snapshotted build. A glanceable FEATURES.md table records what each version delivers and what changed since the last one.


Highlights

  • Goal-driven, two-phase loop. A build phase drives straight at your goal; once an independent check decides the goal is genuinely met, it flips to an expand phase that builds valuable adjacent features.
  • Value gate for new features. In the expand phase, one agent scouts ideas and a separate agent scores each for value/effort — only accepted ideas enter the backlog and get built. No random feature bloat.
  • Standard flow, dynamic detail. The pipeline (architecture → plan → develop → test → review → fix → scout → evaluate) is fixed and predictable, but the planner decides how many dev agents to spawn, what each does, and the prompts leave room for the model's judgement. Prompts are editable template files, not hard-coded strings.
  • Simple vs advanced modes. simple runs a cheap core loop (plan → develop → test → review) to save tokens; advanced adds goal checks, a test-planning agent, docs, scouting, and the value gate. Individual steps are toggleable.
  • One working folder, many snapshots. Agents only ever edit current/; each finished version is copied to versions/vN/. If git is available, every version is also committed and tagged, and the version where your goal is first met gets a special goal-complete tag.
  • Cost & token tracking. Every provider call's cost and tokens are recorded and surfaced live.
  • Local web dashboard. Start projects, watch live progress (current version, agent, step, cost, per-agent output), read the changelog/feature table, and edit configuration + prompts — all in the browser. No build step, no dependencies.
  • Resilient. Threaded provider I/O (no stdin/stdout deadlocks), retry with exponential backoff on transient failures, atomic state writes, and automatic rollback of the working copy if a version errors out.
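
The retry and atomic-write behaviour named in the last bullet can be illustrated with a minimal sketch. This is not AutoDevLoop's actual code: the function names `retry_with_backoff` and `atomic_write_json` are hypothetical, and treating every exception as transient is a simplifying assumption.

```python
import json
import os
import tempfile
import time


def retry_with_backoff(call, retries=3, backoff_seconds=5.0, sleep=time.sleep):
    """Run call(); on failure wait backoff_seconds * 2**attempt, then retry.

    Assumption: every exception counts as transient. A real tool would
    likely retry only on specific error types or exit codes.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(backoff_seconds * (2 ** attempt))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the same directory, then rename it over path.

    Readers see either the old file or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # replaces an existing file in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With `retries=3` and `backoff_seconds=5` (the defaults shown in the configuration section), the waits between attempts in this sketch would be 5, 10, and 20 seconds.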

Requirements

  • Python 3.10+
  • A coding-agent CLI installed and authenticated locally, one of:
    • Claude Code: claude (default)
    • Codex CLI: codex
    • Gemini CLI: gemini

AutoDevLoop never asks for API keys. You authenticate your CLI of choice beforehand; switching providers just changes which command is invoked. (Using a third-party API endpoint behind the claude CLI works fine — the tool just calls claude.)

No Python runtime dependencies are required. PyYAML is optional (a built-in fallback YAML parser ships with the tool).
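
An optional dependency with a built-in fallback is commonly wired up as shown below. This is only a sketch of the pattern: `parse_flat_yaml` and `load_config` are hypothetical names, and the toy fallback handles just flat `key: value` lines, which is far less than the parser that ships with the tool would need to support.

```python
try:
    import yaml  # PyYAML, used only if it happens to be installed
except ImportError:
    yaml = None


def parse_flat_yaml(text):
    """Toy fallback: parse top-level `key: value` lines, ignoring comments.

    Illustrative only: no nesting, no lists, no type conversion, and a '#'
    inside a quoted value would be treated as a comment.
    """
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip('"')
    return result


def load_config(text):
    """Prefer PyYAML when available, otherwise use the fallback parser."""
    if yaml is not None:
        return yaml.safe_load(text) or {}
    return parse_flat_yaml(text)
```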


Install

# from the project root
pip install -e .
# now the `autodevloop` command is available
autodevloop --version

Or run without installing:

python -m autodevloop --help
# or the backward-compatible shim:
python autodev.py --help

Quick start

CLI

# Interactive (prompts for directory, goal, versions, mode):
autodevloop run

# Non-interactive:
autodevloop run --project-dir ./my-app \
  --goal "Build a WeChat-like app: real-time chat plus a moments feed" \
  --max-versions 8 --mode advanced

# Brainstorm the design first (interactive Q&A, then the loop runs):
autodevloop run --project-dir ./my-app --goal "a todo CLI" --brainstorm

Brainstorm mode (--brainstorm): before the autonomous loop starts, the AI asks you one question at a time to refine purpose, scope, constraints and success criteria, turning a rough idea into an agreed design. The transcript is saved to .autodev/brainstorm.json (so it survives interruptions and is not re-run on resume) and the final design to docs/brainstorm-spec.md; the refined goal then feeds the run. Type /done to wrap up early or /skip to cancel. Brainstorming is also available as a checkbox + chat panel when creating a project in the web dashboard, and it is skipped automatically in --non-interactive runs.
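
To make the resume behaviour concrete, here is a rough sketch of a transcript that is saved after every answer. The JSON layout (`turns`, `done`, `skipped`) and the function names are assumptions for illustration; the real schema of .autodev/brainstorm.json is not documented here.

```python
import json
from pathlib import Path


def load_transcript(path):
    """Load a saved transcript, or start a fresh one (hypothetical schema)."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"turns": [], "done": False}


def record_answer(path, transcript, question, answer):
    """Apply one user reply and persist the transcript immediately.

    /done ends the session and keeps what was gathered; /skip cancels it.
    """
    reply = answer.strip()
    if reply == "/skip":
        transcript["done"] = True
        transcript["skipped"] = True
    elif reply == "/done":
        transcript["done"] = True
    else:
        transcript["turns"].append({"question": question, "answer": answer})
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(transcript, ensure_ascii=False, indent=2), encoding="utf-8")
    return transcript
```

Because the file is rewritten after each reply, a resumed run can call `load_transcript` and check `done` before asking anything again.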

Watch / control a run:

autodevloop status --project-dir ./my-app
autodevloop stop   --project-dir ./my-app     # graceful stop after the current step

Web dashboard

autodevloop web            # http://127.0.0.1:8787
autodevloop web --port 9000

The UI is available in English / 简体中文 / 日本語 (compact 🌐 switcher, top-right). A built-in Help guide and hover tooltips (the ? icons) explain every setting, agent, and button, so first-time users aren't left guessing.

From the dashboard you can:

  1. Create a project: directory, goal, version count, mode, provider, architecture hint. Creating a project does not start it; you review or edit the settings and press Run when ready.
  2. Watch live: status, phase, current version, a per-agent live timer for every agent running right now (multiple at once when agents run in parallel), agent-call count, token usage, total run time, a scrollable activity log with a divider between versions, and each agent's full output (persistent viewer).
  3. Edit settings: pipeline mode and step toggles, max versions, review and value thresholds, retries, test command, provider command/model, and every prompt template. Required agents (plan, develop, test, review, fix) are shown but locked on; only optional steps can be toggled. Prompt edits are format-checked: you can rewrite the wording in any language, but the {{placeholders}} and JSON field names the engine depends on can't be removed. Settings are locked while a run is active and take effect on the next run.
  4. Stop two ways: graceful (finish the current version, then stop) or discard (kill immediately, throw away the unfinished version, and roll the working copy back to the last completed version). Each shows a confirmation explaining exactly what happened.
  5. Read the docs: the FEATURES.md overview table and CHANGELOG.md.
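
The format check on prompt edits mentioned in item 3 could work roughly like this. It is a sketch under assumptions: `missing_placeholders` is a hypothetical helper, the placeholder names in the usage note are invented, and the real check also covers JSON field names, which this example leaves out.

```python
import re

# Matches {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def missing_placeholders(original, edited):
    """Return placeholders found in the original template but not in the edit."""
    required = set(PLACEHOLDER.findall(original))
    present = set(PLACEHOLDER.findall(edited))
    return sorted(required - present)
```

For example, if a template containing `{{goal}}` and `{{version}}` is rewritten in another language and only `{{goal}}` survives, the function returns `["version"]` and the edit could be rejected.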

Cost in money is intentionally not shown (third-party API pricing behind a CLI is unreliable); the dashboard reports agent-call count and tokens instead.


How the loop works

            ┌─────────────────────────── once, at the start ──────────────────────────┐
            │  AgentARCH → picks a mainstream stack, layout, run & test strategy        │
            └──────────────────────────────────────────────────────────────────────────┘
 per version:
   AgentPLAN ── decides this version's goal + how many dev agents and what they own
       │
   AgentDEV_* ─ one or more (parallel) agents implement in isolated workspaces,
       │        then merge back into current/ (only changed files; first-writer-wins
       │        on conflict, with a warning)
   AgentDOC ── (advanced) keeps README / design docs accurate
       │
   AgentTEST ─ runs tests: built-in detection in simple mode, or an agent picks
       │        the test commands in advanced mode
   AgentREVIEW  scores quality, flags blockers, judges goal completeness, and
       │        writes the human-readable "what's new" summary
       │
   (fix loop) ─ if tests fail / blocking / below threshold, AgentFIX repairs and re-tests
       │
   AgentGOALCHECK (advanced) independently confirms whether the goal is met
       │
   ── if goal met for the first time → switch to EXPAND phase, tag goal-complete ──
       │
   AgentSCOUT + AgentEVALUATE (expand phase) propose & value-gate new features
       │        into a persistent backlog the planner draws from next time
       ▼
   snapshot → versions/vN/, git commit + tag vN, update CHANGELOG.md & FEATURES.md
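
The merge step in the diagram (only changed files, first-writer-wins on conflict, with a warning) might be sketched as follows. `merge_workspaces` is a hypothetical function, not the tool's API, and the sketch ignores file deletions for brevity.

```python
import filecmp
import shutil
import warnings
from pathlib import Path


def merge_workspaces(base, workspaces, target):
    """Copy files that each workspace changed (relative to base) into target.

    The first workspace to change a path wins; later changes to the same
    path are skipped with a warning. Deleted files are not handled.
    """
    base, target = Path(base), Path(target)
    owner = {}  # relative path -> workspace that wrote it first
    for workspace in map(Path, workspaces):
        for src in sorted(p for p in workspace.rglob("*") if p.is_file()):
            rel = src.relative_to(workspace)
            original = base / rel
            if original.exists() and filecmp.cmp(src, original, shallow=False):
                continue  # identical to the base copy: not a change
            if rel in owner:
                warnings.warn(f"conflict on {rel.as_posix()}: keeping {owner[rel].name}")
                continue
            owner[rel] = workspace
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
    return {rel.as_posix(): ws.name for rel, ws in owner.items()}
```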

The loop never stops early for being "good enough" — it runs until your max_versions (or you stop it). Reaching the goal switches what it works on, not whether it keeps going.
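
In other words, the version counter is the only stop condition, and the goal check merely flips the phase. A minimal sketch of that control flow, with hypothetical callables `build_version` and `goal_met` standing in for the real pipeline stages:

```python
def run_loop(max_versions, build_version, goal_met):
    """Build exactly max_versions versions; meeting the goal only changes the phase."""
    phase = "build"
    goal_version = None
    history = []
    for version in range(1, max_versions + 1):
        build_version(version, phase)
        history.append((version, phase))
        if phase == "build" and goal_met(version):
            goal_version = version  # this version would get the goal-complete tag
            phase = "expand"        # later versions draw on the feature backlog
    return history, goal_version
```

If the goal is first met at version 2 of 4, versions 1 and 2 run in the build phase and versions 3 and 4 run in the expand phase.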


Output layout

Each project directory gets:

Path                                      What
current/                                  The single working copy agents edit (git repo if enabled)
versions/vN/                              A full snapshot of every completed version
FEATURES.md                               At-a-glance table: each version's features + what changed
CHANGELOG.md                              Per-version changelog with summaries and test status
.autodev/state.json                       Full run state
.autodev/progress.json                    Live progress + event feed (used by the web UI)
.autodev/backlog.json                     Scouted features and their accept/reject verdicts
.autodev/architecture.md                  The initial architecture report
.autodev/prompts/templates/               Editable prompt templates
.autodev/plans/, reviews/, tests/, logs/  Per-stage artifacts
.autodev/final_report.md                  Summary written at the end of a run
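
As an illustration of the current/ to versions/vN/ relationship, a snapshot step could look like the sketch below. `snapshot_version` is a hypothetical helper, and leaving out the .git directory is an assumption made here, not documented behaviour.

```python
import shutil
from pathlib import Path


def snapshot_version(project_dir, version):
    """Copy current/ to versions/vN/ and return the snapshot path.

    Assumption: the .git directory is excluded from the copy.
    """
    project_dir = Path(project_dir)
    dest = project_dir / "versions" / f"v{version}"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.copytree(
        project_dir / "current",
        dest,
        ignore=shutil.ignore_patterns(".git"),
    )
    return dest
```

Refusing to overwrite an existing vN directory keeps completed snapshots immutable in this sketch.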

Configuration

Settings live in .autodevloop.yml in the project directory (the web settings page and CLI flags write to it). Everything has a sane default; a full file looks like:

project:
  name: My App
  max_versions: 8
  arch_hint: "React + FastAPI + SQLite"   # optional hint for AgentARCH

provider:
  name: claude          # claude | codex | gemini
  command: ""           # blank = use the profile's default command (e.g. "claude")
  model: ""             # optional model alias/name
  extra_args: []        # extra CLI args appended to every call
  profiles:
    claude: {command: "", model: ""}
    codex: {command: "", model: ""}
    gemini: {command: "", model: ""}
  assignments:          # blank/missing = inherit provider.name
    arch: codex
    plan: codex
    dev: claude
    review: codex
    bugfix: claude
    bugverify: codex

pipeline:
  mode: advanced        # simple | advanced
  steps:                # override individual steps on top of the mode defaults
    goal_check: true
    test_agent: true
    doc: true
    scout: true
    evaluate: true
    features_doc: true

agents:
  timeout: 1800         # seconds per provider call
  allow_parallel: true
  max_parallel: 3
  retries: 3            # retries on transient provider failures
  backoff_seconds: 5

review:
  threshold: 80         # review score below this triggers a fix pass

value:
  threshold: 65         # feature value below this is rejected by the gate

fix:
  retries: 2

tests:
  timeout: 120
  command: ""           # blank = auto-detected / agent-chosen

vcs:
  git: true             # commit + tag each version inside current/
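
The way pipeline.steps overrides sit on top of the mode defaults can be sketched as a dictionary merge. The default tables below are assumptions (every optional step off in simple mode, on in advanced mode); the tool's real defaults may differ, for example for features_doc.

```python
OPTIONAL_STEPS = ("goal_check", "test_agent", "doc", "scout", "evaluate", "features_doc")

# Assumed defaults, for illustration only.
MODE_DEFAULTS = {
    "simple": {step: False for step in OPTIONAL_STEPS},
    "advanced": {step: True for step in OPTIONAL_STEPS},
}


def effective_steps(pipeline):
    """Start from the mode's defaults, then apply explicit per-step overrides."""
    mode = pipeline["mode"]
    if mode not in MODE_DEFAULTS:
        raise ValueError(f"unknown mode: {mode!r}")
    steps = dict(MODE_DEFAULTS[mode])
    for name, enabled in (pipeline.get("steps") or {}).items():
        if name not in steps:
            raise KeyError(f"unknown step: {name!r}")
        steps[name] = bool(enabled)
    return steps
```

Under these assumptions, `{"mode": "simple", "steps": {"doc": True}}` would enable only the doc step on top of the core loop.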

Useful CLI flags: --mode, --provider, --provider-command, --model, --max-versions, --review-threshold, --fix-retries, --max-parallel-agents, --no-parallel, --no-git, --test-command, --reset, --non-interactive.
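
The two thresholds in the configuration act as simple gates. The sketch below shows one plausible reading of them, following the comments in the config ("below this triggers a fix pass", "below this is rejected"); the function names and the shape of an idea record are hypothetical.

```python
def needs_fix(tests_passed, has_blockers, review_score, review_threshold=80):
    """A fix pass runs if tests fail, a blocker is flagged, or the score is too low."""
    return (not tests_passed) or has_blockers or review_score < review_threshold


def value_gate(ideas, value_threshold=65):
    """Split scouted ideas into accepted and rejected by their value score.

    Each idea is assumed to be a dict with at least a numeric "value" key.
    """
    accepted = [idea for idea in ideas if idea["value"] >= value_threshold]
    rejected = [idea for idea in ideas if idea["value"] < value_threshold]
    return accepted, rejected
```

A score exactly equal to the threshold passes in this reading, since only values below it are rejected.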


Agent checkpoints, repairs, and CLI routing

The dashboard now has four distinct controls. Pause agent kills the active CLI process tree, discards only that in-flight agent (or parallel DEV batch), and preserves completed agents in .autodev/checkpoint.json. Continue restarts at that exact agent. Graceful stop still finishes the current version; discard-version stop rolls current/ back to the last completed version.

docs/development-progress.md is the human-readable checkpoint. While paused, you can add highest-priority human directives for the next agent, this version, or all future versions; their audit trail lives in .autodev/directives.json.

The Bug repair tab creates immutable branches such as repairs/v2/fix-001/, then runs BugFix, tests, and BugVerify without changing versions/v2/. Accepted repairs can optionally be promoted to current/ after the existing working copy is backed up.

Settings can independently route each agent role to Claude, Codex, or Gemini. Dynamic development agents inherit the DEV assignment. Blank assignments inherit the legacy default provider, keeping old projects compatible.
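
The inheritance rules for provider routing could be expressed as a small lookup. This is a sketch, not the tool's code: `resolve_provider` is a hypothetical name, and the lowercase `dev_` prefix for dynamic development agents is an assumption based on the AgentDEV_* naming in the loop diagram.

```python
def resolve_provider(role, provider_config):
    """Return the provider name for an agent role.

    Dynamic dev agents (assumed to be named like "dev_frontend") share the
    "dev" assignment; blank or missing assignments fall back to provider.name.
    """
    assignments = provider_config.get("assignments") or {}
    key = "dev" if role == "dev" or role.startswith("dev_") else role
    return assignments.get(key) or provider_config["name"]
```

With `name: claude` and `assignments: {arch: codex, dev: gemini}`, this sketch resolves `arch` to codex, `dev_frontend` to gemini, and an unassigned role such as `plan` to claude.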


⚠️ Security notice — please read

AutoDevLoop is an autonomous code generator that runs code on your machine. Treat it like any tool that executes untrusted code:

  • It writes and executes code. Agents run with file edit permissions, and AgentTEST runs shell test/build commands in your project directory. Generated code is not reviewed by a human before it runs.
  • It runs unattended and can spend money. The loop keeps calling your provider CLI until it reaches the version count or you stop it. Watch the live usage readout (agent calls and tokens), set a sensible --max-versions, and keep an eye on your provider billing.
  • Run it in an isolated environment. Prefer a dedicated directory, a container, or a VM. Don't point it at a directory containing secrets or important unrelated files.
  • The web dashboard is unauthenticated and binds to localhost. It can start runs and execute commands. Do not expose the port to a network you don't fully trust. There is no auth layer.
  • No API keys are handled by this tool — your provider CLI manages its own credentials. AutoDevLoop only invokes the command you configured.

By running AutoDevLoop you accept that you are responsible for the code it generates and the commands it executes.


Troubleshooting

  • "Provider command not found" — install the CLI and ensure it's on PATH, or set provider.command to the full path / wrapper command.
  • Git commits don't appear — git is optional; the tool falls back to folder snapshots. Corporate git hooks that block commits are tolerated silently.
  • Garbled characters on a legacy Windows console — output is forced to UTF-8; if your terminal still struggles, run inside Windows Terminal or set PYTHONIOENCODING=utf-8.
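
To check the first item yourself, you can ask Python whether the provider command resolves on PATH. `find_provider_command` is a hypothetical helper written for this example; it uses only the standard library and POSIX-style command splitting.

```python
import shlex
import shutil


def find_provider_command(command):
    """Return the resolved path of the command's executable, or None.

    Only the first word is looked up, so "claude --model x" checks "claude".
    Splitting follows POSIX shell rules, which may mangle unquoted
    backslashes in Windows paths.
    """
    parts = shlex.split(command)
    if not parts:
        return None
    return shutil.which(parts[0])
```

A `None` result for `find_provider_command("claude")` suggests the CLI is not installed or not on PATH, in which case setting provider.command to the full path is the usual workaround.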

License

MIT. Contributions welcome — see CONTRIBUTING.md.
