agent-task-runner

PM-driven review-loop orchestrator for AI coding agents.

agent-task-runner runs a multi-round review loop: a Worker writes code, a Reviewer checks it, and the loop repeats until approval or max rounds. It ships with built-in support for OpenAI Codex, Anthropic Claude, and OpenCode as worker/reviewer backends, with automatic dispatch and real-time streaming output. Need something different? Use register_backend() to plug in your own — no core modifications required.
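
The plug-in path can be pictured with a minimal sketch. This is a hypothetical registry shape, not loop_kit's actual API; the real register_backend() signature may differ, and the BACKENDS dict and runner callable here are illustrative assumptions:

```python
from typing import Callable, Dict

# Hypothetical registry shape; loop_kit's real register_backend() API may differ.
BACKENDS: Dict[str, Callable[[str], str]] = {}

def register_backend(name: str, runner: Callable[[str], str]) -> None:
    # Map a backend name (as used by --worker-backend/--reviewer-backend)
    # to a callable that takes a prompt and returns the backend's output.
    BACKENDS[name] = runner

def my_local_backend(prompt: str) -> str:
    # Stand-in: a real backend would spawn a subprocess for the model CLI.
    return "echo: " + prompt

register_backend("my-local", my_local_backend)
```

Once registered, the name would be selectable the same way the built-ins are, e.g. via --worker-backend my-local.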

Quick Start

# Install
pip install agent-task-runner
# or: uv add agent-task-runner

# Initialize loop directory in your project
loop init

# (optional) Build offline module map for the codebase
loop index

# Write a task card (see .loop/examples/task_card.json)
# Then run the loop with auto-dispatch
loop run --task .loop/task_card.json --auto-dispatch --worker-backend codex --reviewer-backend codex

Prerequisites

  • Python >= 3.11
  • Git repository (the orchestrator uses git commits as the source of truth)
  • At least one AI backend installed: codex, claude, or opencode

CI

The GitHub Actions workflow loop-ci.yml runs on pushes to main/master and on pull requests. It executes the following steps:

  • uv sync --frozen --group dev
  • uv run --group dev --with pytest-cov pytest --cov=src/loop_kit --cov-report=xml
  • uv run --group dev pytest -m integration when integration-marked tests exist
  • uv run --group dev ruff check src/loop_kit tests
  • uv run --group dev --with mypy mypy src/loop_kit (optional, non-blocking)

The workflow uploads coverage.xml to Codecov and stores JUnit XML test results as workflow artifacts.

CLI Reference

loop init                  Create .loop/ directory structure and templates
loop index                 Build offline module map for src/loop_kit
loop run                   Run the full PM-controlled review loop
loop knowledge             Manage built-in defaults knowledge JSONL files
loop status                Show current loop state
loop health                Show worker/reviewer heartbeat health
loop dispatch-metrics      Summarize dispatch phase latency metrics from feed logs
loop heartbeat             Write role heartbeat continuously
loop archive               List or restore archived bus files
loop extract-diff BASE HEAD  Print git diff between two commits

loop run flags

Flag                                      Default               Description
--task PATH                               .loop/task_card.json  Path to task card JSON
--max-rounds N                            3                     Maximum review rounds
--timeout N                               0                     Per-phase timeout in seconds (0=unlimited)
--auto-dispatch                           off                   Automatically invoke worker/reviewer backends each round
--dispatch-backend native                 native                Subprocess transport
--worker-backend codex|claude|opencode    codex                 Backend for worker dispatch
--reviewer-backend codex|claude|opencode  codex                 Backend for reviewer dispatch
--dispatch-timeout N                      0                     Per-dispatch timeout in seconds (0=unlimited)
--dispatch-retries N                      2                     Retries on non-zero dispatch exit
--dispatch-retry-base-sec N               5                     Base backoff seconds between dispatch retries (max delay 60s)
--max-session-rounds N                    0                     Max rounds to reuse one backend session before rotating (0 disables rotation)
--artifact-timeout N                      90                    Post-dispatch artifact wait in seconds
--require-heartbeat                       off                   Require live heartbeat while waiting
--heartbeat-ttl N                         30                    Heartbeat freshness threshold in seconds
--single-round                            off                   Run exactly one round and exit
--round N                                 -                     Round number for single-round mode
--resume                                  off                   Resume from .loop/state.json
--reset                                   off                   Reset stale bus files before running
--allow-dirty                             off                   Allow starting with dirty tracked files
--verbose                                 off                   Stream full backend stdout
--loop-dir PATH                           .loop                 Loop bus directory
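
The retry delay behind --dispatch-retry-base-sec can be sketched as capped exponential backoff. The doubling schedule is an assumption; only the base default (5s) and the 60s cap are stated above:

```python
def retry_delay(attempt: int, base_sec: float = 5.0, max_sec: float = 60.0) -> float:
    # attempt is 0-based; doubling per retry is assumed, the 60s cap is documented.
    return min(base_sec * (2 ** attempt), max_sec)
```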

loop dispatch-metrics flags

Flag                        Default       Description
--task-id ID                all task IDs  Filter dispatch_phase_metrics rows by task ID
--role all|worker|reviewer  all           Filter rows by role
--loop-dir PATH             .loop         Loop bus directory

Example:

loop dispatch-metrics --task-id T-715 --role worker

loop knowledge commands

loop knowledge list [--category CAT]
    Print facts, pitfalls, and patterns from src/loop_kit/defaults/*.jsonl in a table
loop knowledge add --pattern TEXT --category CAT --confidence 0..1 --source ORIGIN
    Append a pattern entry to src/loop_kit/defaults/patterns.jsonl
loop knowledge prune --older-than DAYS
    Remove entries whose source_version timestamp is older than DAYS
loop knowledge dedupe
    Deduplicate defaults knowledge entries and report removals

All write operations are atomic (temp file -> rename) and modify src/loop_kit/defaults/*.jsonl in place.
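
The temp-file-then-rename pattern can be sketched as follows. atomic_write_jsonl is an illustrative helper, not the project's actual function:

```python
import json
import os
import tempfile

def atomic_write_jsonl(path: str, entries: list) -> None:
    # Write everything to a temp file in the same directory, then rename.
    # os.replace is atomic, so readers never observe a half-written file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for entry in entries:
                f.write(json.dumps(entry) + "\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Creating the temp file in the target directory matters: os.replace is only guaranteed atomic within one filesystem.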

Configuration files and env vars

loop run also reads defaults from .loop/config.yaml (preferred) or .loop/config.json (backward compatible).

  • YAML support is optional and uses PyYAML when installed. If config.yaml exists but PyYAML is unavailable, it is skipped with a warning.
  • Existing .loop/config.json files continue to work unchanged.

Environment variable overrides:

  • LOOP_MAX_ROUNDS -> RunConfig.max_rounds
  • LOOP_DISPATCH_TIMEOUT -> RunConfig.dispatch_timeout
  • LOOP_BACKEND_PREFERENCE -> RunConfig.backend_preference (comma-separated, e.g. codex,claude,opencode)

Resolution order:

CLI args > environment variables > config file > built-in defaults
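
That precedence can be sketched as a single lookup helper. resolve is illustrative, not the orchestrator's real implementation, and this sketch handles scalar defaults only:

```python
import os

def resolve(cli_value, env_var, file_cfg, key, default):
    # CLI args > environment variables > config file > built-in defaults
    if cli_value is not None:
        return cli_value
    if env_var in os.environ:
        # Coerce to the default's type; scalar defaults only in this sketch.
        return type(default)(os.environ[env_var])
    if key in file_cfg:
        return file_cfg[key]
    return default
```

For example, resolve(None, "LOOP_MAX_ROUNDS", {"max_rounds": 5}, "max_rounds", 3) falls through to the config file value when the env var is unset.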

File Bus Protocol

All state passes through JSON files in .loop/:

PM  -> Worker:    task_card.json / fix_list.json
Worker -> PM:     work_report.json
PM  -> Reviewer:  review_request.json
Reviewer -> PM:   review_report.json

Context files

loop init also creates knowledge/context files in .loop/context/:

  • project_facts.md — Project-specific facts and conventions
  • pitfalls.md — Known pitfalls (auto-appended from review blocking issues)
  • patterns.jsonl — High-confidence patterns (auto-populated from review issues on approval)
  • module_map.json — Offline module index (populated by loop index)
  • handoff/ — Per-task role handoff artifacts (.loop/handoff/{task_id}/{role}_r{round}.json)

The worker first reads project AGENTS.md and docs/roles/code-writer.md, and the reviewer first reads project docs/roles/reviewer.md. If any of those files are missing, agent-task-runner falls back to built-in defaults in src/loop_kit/defaults/. Project files always override built-in defaults when present.
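
That fallback amounts to a first-existing-file lookup, sketched here with an illustrative helper (resolve_role_doc is not loop_kit's actual function):

```python
from pathlib import Path

def resolve_role_doc(project_file: str, builtin_default: str):
    # Project files always override built-in defaults when present.
    for candidate in (Path(project_file), Path(builtin_default)):
        if candidate.is_file():
            return candidate.read_text()
    return None
```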

Quickstart vs Handoff vs Warm Resume

  • Quickstart context: injected for cold task starts (round 1) with stable project constraints and execution contract.
  • Handoff context: structured bridge built every round for both roles (done, open_questions, next_actions, evidence, must_read_files) and injected into later prompts when available.
  • Warm resume: reuses backend session IDs from state.json for low-latency continuation.
  • Session rotation: set --max-session-rounds to intentionally start a fresh backend session after N rounds while preserving continuity via handoff context.
  • Fallback behavior: invalid resume sessions are detected, logged, and retried with a fresh session.

Dispatch phase metrics

The feed keeps backward-compatible dispatch events (dispatch_start, dispatch_artifact_written) and adds phase markers for timing analysis:

  • dispatch_first_stdout: first streamed stdout line from the backend process.
  • dispatch_first_work_action: first concrete execution signal (for example Codex item.started command/tool work), not summary prose.
  • dispatch_first_meaningful_action: first meaningful summary signal (message/tool summary). This is intentionally not the work-start boundary.
  • dispatch_phase_metrics: one aggregated event per role/round with:
    • startup_ms = t(first_stdout) - t(dispatch_start)
    • context_to_work_ms = t(first_work_action) - t(first_stdout)
    • work_to_artifact_ms = t(dispatch_artifact_written) - t(first_work_action)
    • total_ms = t(dispatch_artifact_written) - t(dispatch_start)
    • within work_to_artifact_ms, classified subphases:
      • read_ms, search_ms, edit_ms, test_ms, unknown_ms
      • read_count, search_count, edit_count, test_count, unknown_count
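
The segment arithmetic above reduces to pairwise timestamp differences. This sketch (phase_metrics is an illustrative helper; timestamps are epoch milliseconds) also shows the null behavior for missing boundaries:

```python
def phase_metrics(t: dict) -> dict:
    # Pairwise differences between event timestamps (epoch milliseconds);
    # a missing boundary makes that segment None while total_ms still reports.
    def delta(start: str, end: str):
        if t.get(start) is None or t.get(end) is None:
            return None
        return t[end] - t[start]
    return {
        "startup_ms": delta("dispatch_start", "first_stdout"),
        "context_to_work_ms": delta("first_stdout", "first_work_action"),
        "work_to_artifact_ms": delta("first_work_action", "artifact_written"),
        "total_ms": delta("dispatch_start", "artifact_written"),
    }
```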

Interpretation boundaries:

  • startup_ms captures process startup + first output availability.
  • context_to_work_ms captures initial context understanding before real execution begins.
  • work_to_artifact_ms captures execution-to-artifact completion.
  • If a boundary is missing (for example no concrete work signal), the missing segment is emitted as null while total_ms is still reported.
  • Subphase classification is deterministic from streamed backend action events:
    • read: file-read tools/commands
    • search: grep/glob/web-search style tools/commands
    • edit: file-change events and write/edit tools/commands
    • test: test-running commands (for example pytest, go test, npm test)
    • unknown: actionable events that do not match the above

loop dispatch-metrics now prints:

  • phase latency table (startup/context_to_work/work_to_artifact/total)
  • work subphase table (read/search/edit/test/unknown)

Use loop dispatch-metrics to aggregate these events directly from .loop/logs/feed.jsonl with count, missing, avg_ms, p50_ms, and p95_ms per segment, and use --task-id/--role filters to scope both tables.

Interpretation limits for the command output:

  • count is the number of non-null numeric values for that segment.
  • missing counts rows where the segment is null, absent, or non-numeric.
  • p50_ms/p95_ms use nearest-rank percentiles on the filtered rows.
  • The report only reflects events present in the local feed file; it does not infer missing telemetry from other artifacts.
  • For subphases, durations are bounded by observed action events; missing/truncated boundaries degrade to deterministic partial accounting rather than crashes.
  • unknown_ms and unknown_count are expected when commands/tools are not recognized by current classifiers.
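
Nearest-rank here follows the textbook definition: sort the n numeric values and take the one at 1-based rank ceil(p/100 * n). A sketch under that assumption (the command's exact handling of non-numeric rows is as described above):

```python
import math

def nearest_rank(values, p: float):
    # Drop non-numeric entries (they count as "missing"), then take the
    # nearest-rank percentile of what remains.
    xs = sorted(v for v in values if isinstance(v, (int, float)))
    if not xs:
        return None
    rank = max(1, math.ceil(p / 100 * len(xs)))
    return xs[rank - 1]
```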

Key JSON schemas

task_card.json

{
  "task_id": "T-001",
  "status": "todo",
  "goal": "One-sentence goal",
  "in_scope": ["file or module"],
  "out_of_scope": [],
  "acceptance_criteria": ["measurable criterion"],
  "constraints": []
}

status is system-managed while the loop runs: it is set to in_progress at start, to done on approved completion, and to blocked on non-approved terminal failures.
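
A task card can also be produced programmatically. This sketch fills the schema above with its defaults; write_task_card is an illustrative helper, not part of the package:

```python
import json
from pathlib import Path

def write_task_card(path: str, **fields) -> dict:
    # Start from the schema's defaults, then apply caller overrides.
    card = {
        "task_id": "T-001",
        "status": "todo",
        "goal": "",
        "in_scope": [],
        "out_of_scope": [],
        "acceptance_criteria": [],
        "constraints": [],
    }
    card.update(fields)
    Path(path).write_text(json.dumps(card, indent=2) + "\n")
    return card
```

For example, write_task_card(".loop/task_card.json", task_id="T-042", goal="...") before running loop run --task .loop/task_card.json.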

work_report.json

{
  "task_id": "T-001",
  "round": 1,
  "head_sha": "abc123...",
  "files_changed": ["src/main.py"],
  "notes": "What was done",
  "tests": [{"name": "test_foo", "result": "pass"}]
}

review_request.json

{
  "task_id": "T-001",
  "round": 1,
  "base_sha": "def456...",
  "head_sha": "abc123...",
  "commits": ["abc123 Fix foo"],
  "diff": "diff output...",
  "acceptance_criteria": ["measurable criterion"],
  "constraints": [],
  "worker_notes": "What was done",
  "worker_tests": [{"name": "test_foo", "result": "pass"}]
}

fix_list.json

{
  "task_id": "T-001",
  "round": 2,
  "base_sha": "def456...",
  "head_sha": "abc123...",
  "fixes": [{"severity": "high", "file": "src/main.py", "reason": "..."}],
  "prior_round_notes": "What was done",
  "prior_review_non_blocking": ["..."]
}

review_report.json

{
  "task_id": "T-001",
  "round": 1,
  "decision": "approve|changes_required",
  "blocking_issues": [{"severity": "high", "file": "src/main.py", "reason": "..."}],
  "non_blocking_suggestions": ["..."]
}

state.json — Internal orchestrator state, the single source of truth between rounds.

Archive

All bus files are archived to .loop/archive/{task_id}/r{N}_{name}.json before being overwritten, preserving full round history.
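
The naming scheme maps directly to a path template; archive_path here is an illustrative helper, not loop_kit's actual function:

```python
from pathlib import Path

def archive_path(loop_dir: str, task_id: str, round_num: int, name: str) -> Path:
    # .loop/archive/{task_id}/r{N}_{name}.json
    return Path(loop_dir) / "archive" / task_id / f"r{round_num}_{name}.json"
```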

Architecture

                   ┌──────────┐
                   │   PM     │  orchestrator.py
                   │(outer)   │
                   └────┬─────┘
                 ┌──────┴──────┐
                 ▼             ▼
            ┌──────────┐  ┌──────────┐
            │  Worker  │  │ Reviewer │   (codex/claude/opencode subprocess)
            │(codex/   │  │(codex/   │
            │claude/   │  │claude/   │
            │opencode) │  │opencode) │
            └──────────┘  └──────────┘

Each round runs as a fresh subprocess (python -m loop_kit run --single-round), so code changes in the orchestrator itself take effect immediately — the orchestrator can improve itself.

Backend discovery uses shutil.which() plus known install paths. The backend registry (register_backend()) supports adding custom backends.
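
The discovery order can be sketched as a PATH lookup followed by a scan of known locations. find_backend and the extra-paths list are illustrative; the orchestrator's actual install-path list is not reproduced here:

```python
import shutil
from pathlib import Path

def find_backend(name: str, extra_paths=()):
    # PATH lookup first, then known install locations.
    found = shutil.which(name)
    if found:
        return found
    for p in extra_paths:
        candidate = Path(p) / name
        if candidate.is_file():
            return str(candidate)
    return None
```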

Prompt Templates

loop init creates prompt templates in .loop/templates/:

  • worker_prompt.txt — Worker prompt with {task_id}, {round_num}, {agents_md}, {role_md}, {task_card_section}, {prior_context_section}, {work_report_path} placeholders
  • reviewer_prompt.txt — Reviewer prompt with {task_id}, {round_num}, {role_md}, {review_report_path} placeholders
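
The placeholders are str.format-style fields, so filling a template reduces to one format() call. The template text below is a stand-in, not the shipped worker_prompt.txt:

```python
# Stand-in template using the documented worker placeholders.
template = (
    "Task {task_id}, round {round_num}.\n"
    "{agents_md}\n{role_md}\n{task_card_section}\n"
    "{prior_context_section}\n"
    "Write your report to {work_report_path}.\n"
)
prompt = template.format(
    task_id="T-001",
    round_num=1,
    agents_md="(AGENTS.md contents)",
    role_md="(role doc contents)",
    task_card_section="(task card JSON)",
    prior_context_section="(handoff context)",
    work_report_path=".loop/work_report.json",
)
```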

Development

# Clone and install editable
git clone <repo-url>
cd <repo-dir>
uv sync

# Run tests
uv run --group dev pytest

# Run as module
uv run python -m loop_kit init

License

MIT
