Skip to main content

Ask one question to multiple local AI coding CLIs in parallel and collect their answers.

Project description

moa - mixture of agents

CI

MOA - Mixture of Agents

Ask one question to multiple local AI coding CLIs in parallel and collect their answers. MOA detects which agent CLIs you have installed (Claude Code, Codex, agy, opencode), fans your prompt out to them, and streams each answer back the moment that agent finishes. Or run moa distill to have a strong aggregator merge those answers into a single unified response, or moa debate to have them critique each other across rounds before a neutral judge gives the verdict.

It's a drop-in, batteries-included replacement for hand-rolling parallel claude -p / codex exec / opencode run calls (or a "peer review" agent skill): one command, clean attributed output, made to be called by a human or by another agent.

The package is named moa-cli but installs the command moa.

uv tool install moa-cli
moa ask "Is Postgres or SQLite better for a desktop app?"

Or run it once without installing:

uvx --from moa-cli moa ask "Review this plan."

Requirements. MOA drives agent CLIs you install separately - it ships no model or API key of its own. You need at least two of claude (Claude Code), codex, agy (Antigravity), and opencode on your PATH and logged in. Run moa doctor first to see which ones MOA can find; with only one installed, the "council" collapses to a single answer.

Why

A single model gives you one perspective. Asking three frontier models the same question - and seeing where they agree, diverge, or contradict - is a fast, cheap way to pressure-test an answer. MOA makes that a one-liner using the CLIs you already pay for, with no API keys of its own.

Example

$ moa ask "Is Postgres or SQLite better for a desktop app?"
Asking claude, codex, agy (timeout 180s, read-only)

──────────────── claude (opus) · OK · 3.2s ────────────────

For a single-user desktop app, SQLite is almost always the right call:
zero-config, serverless, the whole DB is one file you can ship... [trimmed]

─────────────── codex (gpt-5.5) · OK · 4.1s ───────────────

Use SQLite unless you expect concurrent writers or need network access.
For a desktop app neither is likely, so SQLite wins on simplicity... [trimmed]

The selection note goes to stderr; the attributed answers go to stdout. In a terminal each answer gets the rule shown above; when piped or read by another agent, the same blocks render as plain ## ... headings. Add --json for machine-readable JSONL.

Usage

MOA has three prompt verbs that share the same selection/output options:

  • moa ask PROMPT - council / peer review: N agents answer the same prompt in parallel; every answer is returned with attribution, streamed as it lands.
  • moa distill PROMPT - synthesis: run the council, then one strong aggregator merges the answers into a single unified response.
  • moa debate PROMPT - sequential debate: two debaters answer and adversarially critique each other across rounds, then a separate neutral judge writes the final verdict. The costliest mode; read the caveats below before reaching for it.
moa doctor                                  # show installed CLIs and their default models
moa ask "Should this feature use SQLite?"   # ask the top 3 installed agents (read-only)
moa ask -n 2 "..."                          # ask only the top 2 (priority order)
moa ask -p claude -p agy "..."              # pin specific agents
moa ask -x claude "..."                     # drop an agent (e.g. exclude the caller's own model)
moa ask -m claude=sonnet "..."              # override which model a tool uses
moa ask --yolo "..."                        # grant full write access (default is read-only)
moa ask --json "..."                        # machine-readable JSONL (for agents/pipes)
git diff | moa ask -f - "Review this diff." # read the prompt from stdin
moa distill "Design a rate limiter."        # council, then merge into one answer
moa distill -s codex "..."                  # pick who distills (auto | random | provider)
moa debate "Is this race condition real?"   # 2 debaters + a judge (default n=3)
moa debate -r 3 "..."                        # more rounds (default 2, hard max 4)
moa debate -j claude "..."                   # pin who judges (must not be a debater)

The shared options (-n/--num, -p/--provider, -x/--exclude, -m/--model, -t/--timeout, -f/--file, --json, --yolo) work identically on all three verbs. distill adds -s/--synthesizer; debate adds -r/--rounds and -j/--judge.

Read-only by default

MOA is built to be called autonomously, so by default no agent can write files or run mutating commands. Each agent runs in its tool's safest mode: it may read local files (and, where the tool allows, research online), but it cannot edit anything. This is enforced by spawning each CLI with its own read-only flags:

Provider Read-only (default) Reads files Web research
claude --permission-mode plan yes yes
codex -s read-only yes no (sandbox blocks network)
opencode --agent plan yes yes
agy --sandbox (partial: shell only - can still edit files) yes yes

codex's read-only mode is a kernel sandbox that also blocks network, so codex does no web research in the default mode (it still reads local files). agy has no true read-only mode: its --sandbox flag restricts agy's terminal/shell but does not stop its write_file tool, so agy can still edit files even in the default mode. This is partial protection (it closes the shell vector only), not read-only. moa applies --sandbox as the next-best safeguard and the selection note on stderr states honestly that agy is shell-sandboxed but can still edit files.

--yolo (full write access)

Pass --yolo to grant every agent full write access (file edits and shell commands, auto-approved). Use it only when you actually want the agents to change your working tree.

moa ask --yolo "Refactor this module and run the tests."

Under --yolo every agent gets full write access. For agy this means dropping --sandbox, so agy --yolo runs with no shell restrictions at all. In the default mode, agy runs with --sandbox (partial protection: shell only - it can still edit files), and MOA states that honestly on stderr.

How agents are selected

-n/--num (default 3) picks the first N installed agents from a popularity-ordered priority list:

claude  ->  codex  ->  agy  ->  opencode

So moa ask -n 3 on a machine with all four installed asks Claude, Codex, and agy (opencode is #4). agy has no true read-only mode, so in the default mode it runs with --sandbox (partial protection: shell only - it can still edit files) and MOA flags that with an honest note on stderr; it is not excluded. Use -p/--provider (repeatable) to pin an exact set and ignore -n.

Use -x/--exclude (repeatable) to drop one or more agents from the run. Exclusion is applied before -n takes the first N, and it also drops excluded names from an explicit -p set. It is off by default. The motivating case: an agent (e.g. Claude Code) calls moa for other opinions; moa ask -x claude makes sure one "peer" isn't just the caller's own model. So moa ask -n 3 -x claude asks Codex, agy, and opencode.

Choosing models

Each tool ships with a reasonable default model, but you can override which model any tool uses with -m/--model PROVIDER=MODEL (repeatable). Only the providers you name change; the rest keep their defaults.

moa ask -m claude=sonnet -m agy="Gemini 3.1 Pro (Low)" "..."

The model-string format differs per tool and is passed through verbatim (the tool's own CLI validates it):

Provider Default -m format
claude opus short id, e.g. claude=sonnet
codex gpt-5.5 model id, e.g. codex=gpt-5.5
agy Gemini 3.1 Pro (High) exact display name, e.g. agy="Gemini 3.1 Pro (Low)"
opencode (tool's authed default) provider/model slug, e.g. opencode=anthropic/claude-sonnet-4

opencode has no built-in default; without an override it omits -m and lets opencode pick. Pass -m opencode=provider/model to pin one.

Configuration

To avoid repeating the same flags on every call, persist your own defaults in a config file. MOA reads it for every verb and merges it under your flags.

Location. ~/.moa/config.toml (the dir is created on first write). Set $MOA_CONFIG_DIR to point the whole config layer somewhere else (useful in tests/CI).

Precedence. built-in default < config file < CLI flag. A flag always wins; the config file only changes a default when that flag is omitted; an absent file means today's built-in behaviour.

Keys (all shared across ask/distill/debate):

Key Type Example
num int (>= 1) num = 2
timeout seconds (> 0) timeout = 120
exclude list of provider names exclude = ["claude"]
synthesizer auto/random/provider synthesizer = "codex"
[models] provider -> model table claude = "sonnet"
# ~/.moa/config.toml
num = 2
timeout = 120
exclude = ["claude"]
synthesizer = "auto"

[models]
claude = "sonnet"
agy = "Gemini 3.1 Pro (Low)"

moa config inspects and edits the file (it creates the dir/file as needed and validates provider names):

moa config show                       # effective config (defaults + file) + path
moa config path                       # print the config file path
moa config set num 2                  # set a scalar
moa config set exclude claude,codex   # set the exclude list (comma-separated)
moa config set model claude=sonnet    # set one entry in [models]
moa config unset num                  # remove a key
moa config unset model claude         # remove one [models] entry

The synthesizer default is persistable too (e.g. moa config set synthesizer codex); debate's -r/--rounds and -j/--judge are not persisted. CLI -m overrides win per-provider over the config [models] table.

Output

  • stdout carries only content. In a terminal, each agent's answer is fronted by a centered box-drawing rule naming it (──── claude (opus) · OK · 3.5s ────) with blank lines for separation, flushed the instant that agent finishes. When stdout is piped or read by an agent (not a TTY), the same block renders as a plain, low-noise ## claude (opus) · OK · 3.5s heading instead - no box-drawing. moa distill emits only the final merged block.
  • stderr carries progress and selection notes (Asking claude, codex ...), so piping stdout stays clean.
  • --json emits one JSON object per line (JSONL): ask writes a {"type": "response", ...} record per agent as it completes; distill writes a single {"type": "synthesis", ...} record (only the merged answer); debate writes a {"type": "debate_turn", "round": N, ...} record per turn plus a final {"type": "verdict", ...} record. Ideal when another agent calls MOA and parses the result.

moa distill (synthesis)

distill runs the same council fan-out as ask, then one more pass where a strong aggregator merges the collected answers into a single, unified answer. It returns only that merged answer - the individual proposer responses are intermediates and are not printed (each one's arrival is noted on stderr so the wait isn't silent). It needs at least two successful proposer answers; with fewer it skips the merge and says so on stderr. The aggregator is chosen with -s/--synthesizer:

  • auto (default) - the highest-priority agent that ran (deterministic)
  • random - pick one of the agents that ran, at random
  • a provider name (claude, codex, agy, opencode)

The aggregator prompt is adapted from the Mixture-of-Agents "Aggregate-and-Synthesize" prompt (Wang et al. 2024): it tells the aggregator to critically evaluate the inputs (some may be biased or incorrect) and not to simply replicate them but offer a refined, accurate, comprehensive reply.

moa debate (sequential debate + neutral judge)

debate is the opt-in, highest-cost mode. Instead of fanning out in parallel, it runs a sequential, adversarial exchange and then asks a separate neutral judge to write the final answer.

Roles. By default the top 2 selected agents are the debaters and the 3rd is the judge - so the default -n 3 maps to 2 debaters + 1 judge. Pin a specific judge with -j/--judge PROVIDER; the judge must be one of the selected agents and must not also be a debater. Debate needs at least 2 debaters and 1 distinct judge, so it needs at least 3 agents; with fewer it exits with a clear message rather than silently degrading.

Rounds. -r/--rounds defaults to 2 (gains plateau around 2-3 rounds while token cost grows multiplicatively) and is hard-capped at 4 - higher values are clamped with a warning on stderr.

The loop. Round 1: debater A answers cold; debater B sees A's answer with an adversarial-stance instruction ("identify errors/weaknesses before giving your own answer; do not agree merely to reach consensus"). Each later round, every debater sees the other's latest answer and responds in the same spirit. If every debater signals it has no substantive change (it may open its reply with NO SUBSTANTIVE CHANGE), the debate stops early before the cap.

The judge. A model that is not a debater reads the full transcript - presented anonymized and order-shuffled (a model is judging, so brand/position bias is killed) - and writes the final answer. Its prompt instructs it to weigh correctness and evidence above confidence and fluency. The judge's verdict is the final block (──── verdict · judge <name> · ... ────).

Streaming/output. Each debater's turn streams as it completes (──── round N · <provider> · ... ────), then the judge's verdict last. --json emits a {"type": "debate_turn", "round": N, ...} record per turn plus a final {"type": "verdict", ...} record.

Safety. Debaters and the judge run in the same read-only (or --yolo) mode as the other verbs - there is no permission bypass. agy's partial-sandbox caveat (shell only; it can still edit files) applies here too.

Caveat - use sparingly. Debate is the costliest mode (roughly debaters x rounds + 1 model calls) and the least reliably beneficial. The research is mixed-to-negative: multi-agent debate can converge on a wrong answer through conformity, a confident-but-incorrect debater can win on persuasiveness over correctness, and more rounds can entrench an error rather than fix it. The separate neutral judge and the adversarial-stance prompt are there to fight these failure modes, but they do not eliminate them. For most questions, ask or distill is the better default; reach for debate when you specifically want to surface and stress-test disagreement. (See Can LLM Agents Really Debate? arXiv:2511.07784, Talk Isn't Always Cheap arXiv:2509.05396, and the conformity/position-bias work cited in the design notes.)

Attribution policy

The human (or agent) reading MOA's output always gets correct attribution: every response block shows the real provider name. There is no human-facing anonymization toggle.

The distill aggregator is a different story. To stop it picking favourites by brand, it always receives the proposer answers anonymized as "Response A / B / C" and order-shuffled (no toggle). The merged answer itself is brand-agnostic prose, and the A/B/C labels never leak into stdout, stderr, or the JSON.

Supported agents

Invocations below show the default (read-only) flags; --yolo swaps in each tool's full-access mode.

Provider CLI Invocation (read-only default)
claude claude claude --model opus --permission-mode plan -p PROMPT
codex codex codex exec -m gpt-5.5 --skip-git-repo-check -s read-only PROMPT
agy agy agy --sandbox --model "Gemini 3.1 Pro (High)" -p PROMPT (partial: shell only - can still edit files)
opencode opencode opencode run --agent plan PROMPT

Adding a new agent is a single entry in the PROVIDERS table in src/moa_cli/cli.py (executable, default model, command builder, permission flags); it then participates in detection, -n selection, and distill automatically.

Agent skill

If you drive MOA from an agent (e.g. Claude Code), there's a ready-made skill at skills/moa/SKILL.md: it tells an agent when to reach for MOA and how to use it (verb choice, self-exclusion via -x <self>, parsing the JSONL output). It supersedes hand-rolling a "peer review" skill.

Development

uv sync
uv run pytest
uv run ruff check src tests

MIT licensed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moa_cli-0.2.2.tar.gz (24.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

moa_cli-0.2.2-py3-none-any.whl (25.6 kB view details)

Uploaded Python 3

File details

Details for the file moa_cli-0.2.2.tar.gz.

File metadata

  • Download URL: moa_cli-0.2.2.tar.gz
  • Upload date:
  • Size: 24.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for moa_cli-0.2.2.tar.gz
Algorithm Hash digest
SHA256 79836603708332310d11b4cbbea6dc536b7dab82e30eedaa91abc19f733d1060
MD5 5ac71950c1257b1526c33a6cc73fa3ee
BLAKE2b-256 8c80b1331bba51a23e40531d71b51dfd9ae83dc31db72216d6933cb5cef52924

See more details on using hashes here.

File details

Details for the file moa_cli-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: moa_cli-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for moa_cli-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3f66ba578822b406d2fbd50ccddb1d681bf38ab02a1861253a68bd9296a5fea7
MD5 41d68e73a76d562b620f472bb934dea2
BLAKE2b-256 c656fa033f8c84f5d3c23022f8266df6c6b5fc520e8b600193ef35b09d4f1ba1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page