Multi-agent adversarial review & deliberation for plans/specs on subscription CLIs (reduce rework before execution)

These details have been verified by PyPI

Maintainers

AdrianChen

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
Programming Language
- Python :: 3
- Python :: 3.10
Topic
- Software Development :: Quality Assurance

Project description

challenge-plans

Chinese documentation: README-zh.md

Adversarially review your plan or spec before you execute it — across the coding CLIs you already have logged in. No API keys.

challenge-plans orchestrates the subscription AI coding CLIs already on your machine (Claude Code, Codex, …) to cross-examine a plan/spec and surface the flaws that cause rework downstream — and to vote across options when you're unsure. It also reviews a raw git diff as a lightweight code review pass, and drops in as an agent skill. It runs on your existing subscriptions, so there are no per-token API charges. It slots into the superpowers plan lifecycle: writing-plans → challenge-plans → executing-plans.

$ challenge-plans run plan.md --type spec --profile standard --sink markdown

# challenge-plans · challenge · verdict: request_changes
- panel: expected 3 / collected 3 · complete ✓
- diversity: 2 families
- verified: 3 high/critical reviewed by Verifier (✓ verified, may hard-gate; ? unverified, advisory)
- surviving objections: 4

- [high✓]   sensitive data sent to a third-party LLM with no privacy boundary  @L42-43  (security_or_privacy_boundary, by claude:scope-boundary)
- [high✓]   "schema-aligned" claimed but there's no contract test             @L12-30  (integration_contract_gap, by gpt:correctness)
- [high✓]   no measurable acceptance threshold                               @L1      (contract_violation, by preflight)
- [medium?] missing_fields vs null semantics left undefined                  @L32-34  (ambiguity_to_wrong_implementation)

Why challenge-plans

🔑 No API keys, no per-token charges — it drives the subscription CLIs you're already logged into (Claude Code, Codex). Bring at least one.
🧪 Evidence beats headcount — a minority objection with a reproduction can override a majority vote; correctness is not decided by voting.
🤝 Cross-family verification — an objection only earns hard-gate authority (✓) when an independent model family reproduces it with concrete, line-anchored evidence. Single-model claims stay advisory.
🛡️ Guards 7 known multi-agent failure modes — vote loss, option anchoring, premature hand-off, majority-over-minority, single-round complacency, false consensus, false convergence. Each was hit (and fixed) while building this tool with its own adversarial process.
🌍 Reads in your language — the codebase is English, but --lang zh (or ja, de, fr, …) makes every reviewer write its findings in your language while JSON keys and line anchors stay machine-stable. One flag, no separate build — see Output in your language.

Quickstart

Requires Python ≥ 3.10 (PyYAML installs automatically). Bring at least one logged-in coding CLI — Claude Code (claude) or OpenAI Codex (codex); two different vendors unlock cross-family verification.

git clone https://github.com/hiadrianchen/challenge-plans && cd challenge-plans
pip install -e .                                                          # exposes the `challenge-plans` command
challenge-plans doctor                                                    # which backend CLIs are logged in
challenge-plans run examples/spec-sample.md --type spec --sink markdown   # see a verdict on the bundled sample

Hand the repo to your coding agent instead — "Install and set up challenge-plans from this repo, then run challenge-plans doctor" — and it'll do the above. To use it as an agent skill, drop SKILL.md where your agent discovers skills.

Use

challenge-plans doctor                                                                 # which backends are ready
challenge-plans run path/to/spec.md --type spec --profile standard --sink markdown     # harden a plan/spec
challenge-plans run change.diff --type diff --sink markdown                             # review a git diff
challenge-plans weigh path/to/options.yaml --profile standard --sink markdown           # vote across options
challenge-plans run path/to/spec.md --enforce                                           # CI gate: non-approve exits non-zero
challenge-plans run path/to/spec.md --type spec --sink markdown --lang zh                # findings written in Chinese
# not pip-installed? prefix with: PYTHONPATH=src python3 -m challenge_plans.cli ...
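For CI, the `--enforce` exit code can be consumed from a small wrapper. The following is a minimal sketch and not part of challenge-plans: the helper name `gate` is hypothetical, and it relies only on the behavior stated above (under `--enforce`, a non-approve verdict exits non-zero).

```python
import subprocess


def gate(cmd: list[str]) -> bool:
    """Run a command and report whether it exited with status 0.

    Hypothetical helper: under `--enforce`, a non-zero exit is read as
    "the verdict was not approve" (or the tool itself failed to run cleanly).
    """
    result = subprocess.run(cmd, check=False)  # check=False: inspect the code ourselves
    return result.returncode == 0
```

A CI step could then call `gate(["challenge-plans", "run", "path/to/spec.md", "--enforce"])` and fail the job when it returns `False`. The spec path is a placeholder.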

Ready-to-run samples live in examples/ (spec-sample.md, options.yaml). options.yaml:

question: Refactor auth with approach A or B?
options:
  - id: A
    text: One-shot rewrite — concentrated risk, clean result
  - id: B
    text: Incremental migration — slower, every step reversible

--profile fast|standard|deep, --sink stdout|markdown, --enforce (a non-approve verdict exits non-zero; without --enforce the run is advisory and exits 0).
--lang <code> writes the human-readable output in your language (default en) — see below.
[sev✓] = cross-family verified, may hard-gate; [sev?] = unverified, advisory only.
Artifact types: --type spec and --type diff are supported; plan / decision are reserved (rubric pending).
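To make the `[sev✓]` / `[sev?]` markers concrete, here is a hedged sketch that parses one finding line from the markdown sink. The line layout is inferred only from the sample output near the top of this page; it is not a documented stable interface, and the names `FINDING`, `Finding`, and `parse_finding` are hypothetical. For automation, the tool's machine-readable output is likely the better target.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "- [severity✓|?]  title  @Lstart[-end]  (category[, by source])"
FINDING = re.compile(
    r"^- \[(?P<severity>\w+)(?P<mark>[✓?])\]\s+"
    r"(?P<title>.*?)\s+@(?P<anchor>L\d+(?:-\d+)?)\s+"
    r"\((?P<category>\w+)(?:, by (?P<source>[^)]+))?\)\s*$"
)


class Finding(NamedTuple):
    severity: str
    verified: bool          # True for ✓ (cross-family verified), False for ? (advisory)
    title: str
    anchor: str
    category: str
    source: Optional[str]   # None when the line has no "by ..." part


def parse_finding(line: str) -> Optional[Finding]:
    """Return a Finding for a finding line, or None for any other line."""
    match = FINDING.match(line)
    if match is None:
        return None
    return Finding(
        severity=match["severity"],
        verified=match["mark"] == "✓",
        title=match["title"],
        anchor=match["anchor"],
        category=match["category"],
        source=match["source"],
    )
```

Lines that are not findings, such as the `- panel:` summary line, return `None`, so a report can be filtered line by line.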

The bundled SKILL.md routes review/QA of a plan/spec to run automatically; option-voting is the weigh subcommand.

Output in your language

challenge-plans ships an English codebase, but the reviewers can answer in any language — just add --lang:

challenge-plans run plan.md --type spec --lang zh     # objections, evidence, reproductions in Chinese
challenge-plans weigh options.yaml --lang ja          # deliberation reasons in Japanese

--lang only switches the human-readable prose (steelman, titles, evidence, reproductions, vote reasons). JSON keys, enum values, and L12-15 line anchors stay verbatim, so parsing, dedup, and CI gates are unaffected. It's equivalent to exporting CHALLENGE_PLANS_LANG once. There's no separate translated build to maintain — the same English source localizes on demand.

As an agent skill: your agent just passes --lang <your-language> and the whole cross-review comes back localized. The bundled SKILL.md documents the flag so the calling agent can set it from the user's language automatically.

Two modes

challenge-plans isn't one feature — it's two modes on one engine. The calling agent routes by intent; the user never has to pick:

| | challenge (adversarial) | weigh-options (deliberation) |
| --- | --- | --- |
| When | You have a drafted plan/spec to poke holes in / harden | You have several options / a pile of to-dos and aren't sure which |
| Routing signal | a single drafted artifact + "review / find flaws / can this execute" | multiple candidates + "which one / rank these / is it worth it" |
| Aggregation | Evidence survival: a minority can be right, no majority vote | Weighted majority + exposed dissent: only genuine trade-offs get voted on |
| Output | 6-state verdict + surviving objections + reproductions / counter-evidence | ranked options + vote tally + strongest dissent |

The agent picks the mode rather than leaving the choice to the user: it routes "review a drafted artifact" to adversarial mode and "choose among options" to deliberation, and the routing signals in the table above define the boundary deterministically. During deliberation, if an option is flagged with a mechanically verifiable blocker, the recommendation is downgraded to discuss and you are asked to verify the blocker in challenge mode instead of adopting the option outright, so a vote can never outweigh a falsifiable minority objection.
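The downgrade rule can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's code: the `RankedOption` type, the `"adopt"` label, and the choice to check only the top-ranked option are hypothetical; only the `discuss` outcome comes from the description above. Tie handling is left out.

```python
from dataclasses import dataclass


@dataclass
class RankedOption:
    option_id: str
    weighted_votes: float
    has_verifiable_blocker: bool = False  # flagged during deliberation


def recommend(options: list[RankedOption]) -> tuple[str, str]:
    """Return (action, option_id) for the highest-voted option.

    A flagged, mechanically verifiable blocker downgrades the action to
    "discuss", no matter how many votes the option collected.
    """
    top = max(options, key=lambda option: option.weighted_votes)
    if top.has_verifiable_blocker:
        return ("discuss", top.option_id)
    return ("adopt", top.option_id)
```

With option A leading on votes but carrying a blocker, `recommend` yields `("discuss", "A")` rather than adopting A.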

How it works

Adversarial mode (reduce-rework loop):

drafted artifact + bounded context
  → multiple persona/CLI challengers each steelman → find flaws (bound to specific text, no hedging)
  → Verifier (cross-family) produces a minimal reproduction / contradicting source line
  → dedup by canonical key + evidence-survival
  → single verdict pipeline → 6-state verdict + panel-integrity check
  → (--deep: multi-round to two-condition convergence)
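The dedup step in the loop above can be pictured as follows. This is a sketch under assumptions: the canonical key is taken to be (exact anchor, category), the `low` severity level is assumed (only medium, high, and critical appear on this page), and preferring verified objections over unverified ones is one plausible reading of "evidence-survival". The names `Objection` and `dedup` are hypothetical.

```python
from dataclasses import dataclass

# Assumed ordering; "low" does not appear in the source and is included for completeness.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Objection:
    anchor: str      # exact line anchor, e.g. "L42-43"
    category: str
    severity: str
    verified: bool   # True once another model family reproduced it


def _strength(obj: Objection) -> tuple[bool, int]:
    # Verified evidence outranks severity alone.
    return (obj.verified, SEVERITY_RANK[obj.severity])


def dedup(objections: list[Objection]) -> list[Objection]:
    """Keep the strongest objection per (anchor, category) key, in first-seen order."""
    best: dict[tuple[str, str], Objection] = {}
    for obj in objections:
        key = (obj.anchor, obj.category)
        current = best.get(key)
        if current is None or _strength(obj) > _strength(current):
            best[key] = obj
    return list(best.values())
```

Because the key uses the exact anchor, two objections about overlapping but different line ranges are kept separately, which matches the "exact-anchor only" boundary listed under Status.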

Deliberation mode — the methodology is a strict three-phase flow. The weigh CLI implements phase ③ (it votes on the options you hand it); phases ①② are the calling agent's responsibility before invoking it — no shortcuts:

① align    (agent) share full background with every voter first — the question, constraints, known facts — don't pre-supply options
② collect  (agent) each voter independently, unseen by the others and not fed the orchestrator's preferences, generates candidates → dedup/cluster into an option pool
③ vote     `challenge-plans weigh` votes on that option pool (model_family-weighted to block false consensus) → ranking + tally + dissent
           hands back to a human only on a tie / missing votes; otherwise closes the loop and returns a result
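A rough sketch of what model_family-weighted voting in phase ③ could look like. The weighting formula is an assumption: each family's combined weight is capped at `family_cap` by scaling down its voters' weights. The function name `tally` and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def tally(
    votes: list[tuple[str, str, str]], family_cap: float = 1.0
) -> tuple[dict[str, int], dict[str, float], bool]:
    """Tally (voter_id, model_family, option_id) votes.

    Returns (raw counts, family-capped weighted totals, single_family flag).
    """
    family_sizes = Counter(family for _, family, _ in votes)
    raw: Counter = Counter()
    weighted: defaultdict = defaultdict(float)
    for _, family, option in votes:
        raw[option] += 1
        # A family with n voters contributes at most family_cap in total.
        weighted[option] += min(1.0, family_cap / family_sizes[family])
    single_family = len(family_sizes) == 1  # would warrant a warning
    return dict(raw), dict(weighted), single_family
```

With three personas from one family voting A and one voter from another family voting B, the raw count favors A three to one, while the weighted totals come out roughly even. That gap between raw and weighted numbers is what showing both is meant to expose.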

What it guards against — 7 multi-agent failure modes

These traps are ones a naive multi-agent setup almost always falls into — and ones we hit ourselves while building this tool with its own adversarial process. Each guard is built into the design, and the design is dogfooded:

Vote/finding loss — a challenger is truncated/timed-out/unparseable and the system silently aggregates a partial panel. Guard: machine-readable capture + per-voter integrity self-check; missing votes never approve or declare a majority.
Option anchoring — the orchestrator only offers its own pre-picked options, so agents merely ratify the framing. Guard: deliberation always diverges (generate first, vote second); voters aren't fed the orchestrator's preferences.
Premature hand-off — the orchestrator bounces the open decision back to the human mid-way instead of finishing the vote. Guard: close the loop and return a result; hand back only on a tie / missing votes.
Majority over minority — out-voting a minority that has a reproducible blocker. Guard: two modes with split aggregation + the escape gate; adversarial mode bans voting and lets evidence beat headcount.
Single-round complacency — one pass declared sufficient. Guard: --deep multi-round to convergence + adversarial review of the code itself before shipping.
False consensus — same-model personas counted as independent votes, so one model's bias gets cloned into a "majority". Guard: per-model_family weight cap, raw/weighted both shown, single-family warning.
False convergence — declaring "done" when no new objection appeared but an old blocker is still open. Guard: two-condition convergence (new_surviving == 0 and unresolved_required == 0).
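The two-condition convergence guard is simple enough to state as code. The condition itself comes from the list above; the round loop around it, including the `max_rounds` limit and the function names, is a hypothetical illustration.

```python
from typing import Iterable, Optional


def converged(new_surviving: int, unresolved_required: int) -> bool:
    """Both conditions must hold: no new objections AND no open required blocker."""
    return new_surviving == 0 and unresolved_required == 0


def rounds_until_converged(
    rounds: Iterable[tuple[int, int]], max_rounds: int = 5
) -> Optional[int]:
    """Return the 1-based round at which convergence is reached, else None.

    `rounds` yields (new_surviving, unresolved_required) per review round.
    """
    for index, (new_surviving, unresolved_required) in enumerate(rounds, start=1):
        if converged(new_surviving, unresolved_required):
            return index
        if index >= max_rounds:
            break
    return None
```

A round with no new objections but one open blocker, `(0, 1)`, does not count as converged, which is the false-convergence case the guard exists for.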

Backends

challenge-plans drives whatever subscription coding CLI you already have logged in — e.g. Claude Code (claude) or OpenAI Codex (codex). You don't need any specific one. With two different vendors it can cross-verify findings; with one, results stay advisory. No API keys, and no per-token API charges from this tool (doctor checks the CLIs are logged in, not your billing; usage still counts against your normal subscription limits).

Status

v1 — usable. Both modes work end-to-end, validated against a real spec and pinned by a pytest suite, hardened across multiple cross-agent adversarial-review rounds.

Known boundaries (also reflected in the run output): concern dedup is exact-anchor only; no idle-timeout (wall-clock only); deliberation blockers are flagged, not yet auto-verified by the Verifier; the open-decision divergence phase is the calling agent's job; manual_paste/Gemini adapters are follow-ups.

Testing

pip install -e ".[dev]" && pytest      # pythonpath/testpaths preconfigured

The suite pins every invariant established across the adversarial-review rounds.

Contributing

Issues and PRs welcome — see CONTRIBUTING.md. The project is dogfooded: reviewing your own change with challenge-plans run <change>.diff --type diff before opening a PR is encouraged.

License

Apache-2.0.

Release history

0.1.1 (Jun 24, 2026)

0.1.0 (Jun 24, 2026, this version)

Download files

Source Distribution

challenge_plans-0.1.0.tar.gz (42.8 kB)

Uploaded Jun 24, 2026 Source

Built Distribution

challenge_plans-0.1.0-py3-none-any.whl (36.7 kB)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file challenge_plans-0.1.0.tar.gz.

File metadata

Download URL: challenge_plans-0.1.0.tar.gz
Upload date: Jun 24, 2026
Size: 42.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for challenge_plans-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3b09940f0019410d797707565ae5883e7395f8d006e89bbbb4fce4394c413331`
MD5	`456ab2eec5e3755be4374cdbcb1ededf`
BLAKE2b-256	`aa084ea066a8b4ccdca01e98bdb37c999f07f21b08b3f2cb313a1c5cae2bb3b9`

Provenance

The following attestation bundles were made for challenge_plans-0.1.0.tar.gz:

Publisher: release.yml on hiadrianchen/challenge-plans

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: challenge_plans-0.1.0.tar.gz
- Subject digest: 3b09940f0019410d797707565ae5883e7395f8d006e89bbbb4fce4394c413331
- Sigstore transparency entry: 1935219570
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: hiadrianchen/challenge-plans@1044316bb0c51396953e1d424c137d59334714cf
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hiadrianchen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1044316bb0c51396953e1d424c137d59334714cf
- Trigger Event: push

File details

Details for the file challenge_plans-0.1.0-py3-none-any.whl.

File metadata

Download URL: challenge_plans-0.1.0-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 36.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for challenge_plans-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cdacc3f6aace3a76d64a6618042a54295e61775f7fcf735e735837a0b307705`
MD5	`7ce98a3b318162c6ae7835757116be3c`
BLAKE2b-256	`de0193393fd832d65733f9dc8234e8b0b99a1564d033ef9c1adfbb32d9b179ff`

Provenance

The following attestation bundles were made for challenge_plans-0.1.0-py3-none-any.whl:

Publisher: release.yml on hiadrianchen/challenge-plans

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: challenge_plans-0.1.0-py3-none-any.whl
- Subject digest: 2cdacc3f6aace3a76d64a6618042a54295e61775f7fcf735e735837a0b307705
- Sigstore transparency entry: 1935219618
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: hiadrianchen/challenge-plans@1044316bb0c51396953e1d424c137d59334714cf
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/hiadrianchen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1044316bb0c51396953e1d424c137d59334714cf
- Trigger Event: push

challenge-plans 0.1.0
