Ask one question to multiple local AI coding CLIs in parallel and collect their answers.
Project description
MOA - Mixture of Agents
Ask one question to multiple local AI coding CLIs in parallel and collect their answers. MOA detects which agent CLIs you have installed (Claude Code, Codex, agy, opencode), fans your prompt out to them, and streams each answer back the moment that agent finishes. Or run moa distill to have a strong aggregator merge those answers into a single unified response, or moa debate to have them critique each other across rounds while a moderator checks for convergence and writes the verdict.
It's a drop-in, batteries-included replacement for hand-rolling parallel claude -p / codex exec / opencode run calls (or a "peer review" agent skill): one command, clean attributed output, made to be called by a human or by another agent.
The package is named moa-cli but installs the command moa.
uv tool install moa-cli
moa ask "Is Postgres or SQLite better for a desktop app?"
Or run it once without installing:
uvx --from moa-cli moa ask "Review this plan."
Requirements. MOA drives agent CLIs you install separately - it ships no model or API key of its own. You need at least two of
claude(Claude Code),codex,agy(Antigravity), andopencodeon yourPATHand logged in. Runmoa doctorfirst to see which ones MOA can find; with only one installed, the "council" collapses to a single answer.
Why
A single model gives you one perspective. Asking three frontier models the same question - and seeing where they agree, diverge, or contradict - is a fast, cheap way to pressure-test an answer. MOA makes that a one-liner using the CLIs you already pay for, with no API keys of its own.
Example
$ moa ask "Is Postgres or SQLite better for a desktop app?"
Asking claude, codex, agy (timeout 900s, read-only)
──────────────── claude (opus) · OK · 3.2s ────────────────
For a single-user desktop app, SQLite is almost always the right call:
zero-config, serverless, the whole DB is one file you can ship... [trimmed]
─────────────── codex (gpt-5.5) · OK · 4.1s ───────────────
Use SQLite unless you expect concurrent writers or need network access.
For a desktop app neither is likely, so SQLite wins on simplicity... [trimmed]
The selection note goes to stderr; the attributed answers go to stdout. In a terminal
each answer gets the rule shown above; when piped or read by another agent, the same
blocks render as plain ## ... headings. Add --json for machine-readable JSONL.
Usage
MOA has three prompt verbs that share the same selection/output options:
moa ask PROMPT- council / peer review: N agents answer the same prompt in parallel; every answer is returned with attribution, streamed as it lands.moa distill PROMPT- synthesis: run the council, then one strong aggregator merges the answers into a single unified response.moa debate PROMPT- sequential debate: two debaters answer and adversarially critique each other across rounds, with a moderator that checks for convergence between rounds and writes the final verdict. The costliest mode; read the caveats below before reaching for it.
moa doctor # show installed CLIs and their default models
moa ask "Should this feature use SQLite?" # ask the top 3 installed agents (read-only)
moa ask -n 2 "..." # ask only the top 2 (priority order)
moa ask -p claude -p agy "..." # pin specific agents
moa ask -x claude "..." # drop an agent (e.g. exclude the caller's own model)
moa ask -m claude=sonnet "..." # override which model a tool uses
moa ask --yolo "..." # grant full write access (default is read-only)
moa ask --json "..." # machine-readable JSONL (for agents/pipes)
git diff | moa ask -f - "Review this diff." # read the prompt from stdin
moa distill "Design a rate limiter." # council, then merge into one answer
moa distill -s codex "..." # pick who distills (auto | random | provider)
moa debate "Is this race condition real?" # 2 debaters; the first also moderates (default 2 agents)
moa debate -r 3 "..." # more rounds (default 2, hard max 4)
moa debate --moderator agy "..." # pin a neutral moderator (a non-debater)
The shared options (-n/--num, -p/--provider, -x/--exclude, -m/--model, -t/--timeout, -f/--file, --json, --yolo) work identically on all three verbs. distill adds -s/--synthesizer; debate adds -r/--rounds and --moderator.
Read-only by default
MOA is built to be called autonomously, so by default no agent can write files or run mutating commands. Each agent runs in its tool's safest mode: it may read local files (and, where the tool allows, research online), but it cannot edit anything. This is enforced by spawning each CLI with its own read-only flags:
| Provider | Read-only (default) | Reads files | Web research |
|---|---|---|---|
claude |
--permission-mode default |
yes | yes |
codex |
-s read-only |
yes | no (sandbox blocks network) |
opencode |
--agent plan |
yes | yes |
agy |
--sandbox (partial: shell only - can still edit files) |
yes | yes |
claude's --permission-mode default is read-only in moa's non-interactive use: it reads
files and researches online with the full toolset, but any write or edit needs an interactive
approval that never comes under -p, so all mutations are denied. (plan mode is not
usable headless - it emits a plan and waits for approval instead of answering.)
codex's read-only mode is a kernel sandbox that also blocks network, so codex does no
web research in the default mode (it still reads local files). agy has no true
read-only mode: its --sandbox flag restricts agy's terminal/shell but does not stop
its write_file tool, so agy can still edit files even in the default mode. This is
partial protection (it closes the shell vector only), not read-only. moa applies
--sandbox as the next-best safeguard and the selection note on stderr states honestly that
agy is shell-sandboxed but can still edit files.
--yolo (full write access)
Pass --yolo to grant every agent full write access (file edits and shell commands,
auto-approved). Use it only when you actually want the agents to change your working tree.
moa ask --yolo "Refactor this module and run the tests."
Under --yolo every agent gets full write access. For agy this means dropping
--sandbox, so agy --yolo runs with no shell restrictions at all. In the default mode,
agy runs with --sandbox (partial protection: shell only - it can still edit files), and
MOA states that honestly on stderr.
How agents are selected
-n/--num (default 3) picks the first N installed agents from a popularity-ordered priority list:
claude -> codex -> agy -> opencode
So moa ask -n 3 on a machine with all four installed asks Claude, Codex, and agy (opencode is #4). agy has no true read-only mode, so in the default mode it runs with --sandbox (partial protection: shell only - it can still edit files) and MOA flags that with an honest note on stderr; it is not excluded. Use -p/--provider (repeatable) to pin an exact set and ignore -n.
Use -x/--exclude (repeatable) to drop one or more agents from the run. Exclusion is applied before -n takes the first N, and it also drops excluded names from an explicit -p set. It is off by default. The motivating case: an agent (e.g. Claude Code) calls moa for other opinions; moa ask -x claude makes sure one "peer" isn't just the caller's own model. So moa ask -n 3 -x claude asks Codex, agy, and opencode.
Choosing models
Each tool ships with a reasonable default model, but you can override which model any tool uses with -m/--model PROVIDER=MODEL (repeatable). Only the providers you name change; the rest keep their defaults.
moa ask -m claude=sonnet -m agy="Gemini 3.1 Pro (Low)" "..."
The model-string format differs per tool and is passed through verbatim (the tool's own CLI validates it):
| Provider | Default | -m format |
|---|---|---|
claude |
opus |
short id, e.g. claude=sonnet |
codex |
gpt-5.5 |
model id, e.g. codex=gpt-5.5 |
agy |
Gemini 3.1 Pro (High) |
exact display name, e.g. agy="Gemini 3.1 Pro (Low)" |
opencode |
(tool's authed default) | provider/model slug, e.g. opencode=anthropic/claude-sonnet-4 |
opencode has no built-in default; without an override it omits -m and lets opencode pick. Pass -m opencode=provider/model to pin one.
Configuration
To avoid repeating the same flags on every call, persist your own defaults in a config file. MOA reads it for every verb and merges it under your flags.
Location. ~/.moa/config.toml (the dir is created on first write). Set $MOA_CONFIG_DIR to point the whole config layer somewhere else (useful in tests/CI).
Precedence. built-in default < config file < CLI flag. A flag always wins; the config file only changes a default when that flag is omitted; an absent file means today's built-in behaviour.
Keys (all shared across ask/distill/debate):
| Key | Type | Example |
|---|---|---|
num |
int (>= 1) | num = 2 |
timeout |
seconds (> 0) | timeout = 120 |
exclude |
list of provider names | exclude = ["claude"] |
synthesizer |
auto/random/provider |
synthesizer = "codex" |
[providers.<name>] |
per-provider model + effort |
see below |
[models] |
DEPRECATED provider -> model table | claude = "sonnet" |
# ~/.moa/config.toml
num = 2
timeout = 120
exclude = ["claude"]
synthesizer = "auto"
[providers.codex]
model = "gpt-5.5"
effort = "high"
[providers.opencode]
model = "zai-coding-plan/glm-5.2"
effort = "high"
Model and effort are grouped per provider under [providers.<name>]. The flat [models] table still works as a deprecated alias for [providers.<name>].model; when both set a model for the same provider, the [providers.<name>] block wins (MOA prints a one-line note, not an error).
moa config inspects and edits the file (it creates the dir/file as needed and validates provider names):
moa config show # effective config (defaults + file) + path
moa config path # print the config file path
moa config set num 2 # set a scalar
moa config set exclude claude,codex # set the exclude list (comma-separated)
moa config set model codex=gpt-5.5 # set a provider's model
moa config set effort codex=high # set a provider's reasoning effort
moa config unset num # remove a key
moa config unset model codex # remove one provider's model
moa config unset effort codex # remove one provider's effort
The role defaults are persistable too: the distill synthesizer and the debate moderator (e.g. moa config set synthesizer codex, moa config set moderator agy). debate's -r/--rounds is not persisted. CLI -m overrides win per-provider over the config model.
Reasoning / effort
Pin a per-provider reasoning/effort level in config so the council runs each tool at the depth you want without repeating flags. This is config-only: there is intentionally no -e/--effort CLI flag.
MOA uses raw pass-through with zero value mapping. It does not normalize effort across providers or invent a canonical low/med/high scale. You write the exact value the target tool expects, and MOA pastes it verbatim into that provider's native flag. The only thing MOA maps is where the value lands in each provider's argv, never the value itself:
| Provider | effort lands in |
Notes |
|---|---|---|
codex |
-c model_reasoning_effort=<value> |
generic config override |
opencode |
--variant <value> |
opencode's "model variant (provider-specific reasoning effort)" |
agy |
(none) | reasoning is part of the model name, e.g. Gemini 3.1 Pro (High) |
claude |
(none) | no per-call effort flag |
Values are tool-specific and not validated by MOA (only "non-empty if present"): a value the target tool rejects fails at that tool, not in MOA. When no effort is configured for a provider, MOA passes no effort flag at all, so the tool's own default stands. Setting effort for agy/claude is stored but inert (they have no effort flag); MOA notes this when you set it.
Output
- stdout carries only content. In a terminal, each agent's answer is fronted by a centered box-drawing rule naming it (
──── claude (opus) · OK · 3.5s ────) with blank lines for separation, flushed the instant that agent finishes. When stdout is piped or read by an agent (not a TTY), the same block renders as a plain, low-noise## claude (opus) · OK · 3.5sheading instead - no box-drawing.moa distillemits only the final merged block. - stderr carries progress and selection notes (
Asking claude, codex ...), so piping stdout stays clean. --jsonemits one JSON object per line (JSONL):askwrites a{"type": "response", ...}record per agent as it completes;distillwrites a single{"type": "synthesis", ...}record (only the merged answer);debatewrites a{"type": "debate_turn", "round": N, ...}record per turn plus a final{"type": "verdict", ...}record. Ideal when another agent calls MOA and parses the result.
moa distill (synthesis)
distill runs the same council fan-out as ask, then one more pass where a strong aggregator merges the collected answers into a single, unified answer. It returns only that merged answer - the individual proposer responses are intermediates and are not printed (each one's arrival is noted on stderr so the wait isn't silent). It needs at least two successful proposer answers; with fewer it skips the merge and says so on stderr. The aggregator is chosen with -s/--synthesizer:
auto(default) - the highest-priority agent that ran (deterministic)random- pick one of the agents that ran, at random- a provider name (
claude,codex,agy,opencode)
The aggregator prompt is adapted from the Mixture-of-Agents "Aggregate-and-Synthesize" prompt (Wang et al. 2024): it tells the aggregator to critically evaluate the inputs (some may be biased or incorrect) and not to simply replicate them but offer a refined, accurate, comprehensive reply.
moa debate (sequential debate + moderator)
debate is the opt-in, highest-cost mode. Instead of fanning out in parallel, it runs a sequential, adversarial exchange overseen by a moderator that checks for convergence between rounds and writes the final answer.
Roles. The top 2 selected agents are the debaters. The moderator runs the per-round convergence check and writes the verdict; by default it is the top-priority selected agent (so the default 2-agent debate has agent #1 also moderate). Debate only needs 2 agents; with fewer it exits cleanly rather than silently degrading. For a neutral moderator that doesn't also debate, select a third agent and pin it: moa debate -n 3 --moderator <provider> (the moderator must be one of the selected agents). The moderator only ever sees the transcript anonymized + shuffled, so even when it is itself a debater it can't favour its own answer.
Rounds. -r/--rounds defaults to 2 (gains plateau around 2-3 rounds while token cost grows multiplicatively) and is hard-capped at 4 - higher values are clamped with a warning on stderr.
The loop. Round 1: debater A answers cold; debater B sees A's answer with an adversarial-stance instruction ("identify errors/weaknesses before giving your own answer; do not agree merely to reach consensus"). Each later round, every debater sees the other's latest answer and responds in the same spirit. After each non-final round the moderator reads the debaters' latest answers and replies DONE (they've converged or fully aired their disagreement) or CONTINUE; a DONE stops the debate before the cap.
The verdict. The moderator reads the full transcript - presented anonymized and order-shuffled (so brand/position bias is killed, even when the moderator was a debater) - and writes the final answer. Its prompt instructs it to weigh correctness and evidence above confidence and fluency. The verdict is the final block (──── verdict · moderator <name> · ... ────).
Streaming/output. Each debater's turn streams as it completes (──── round N · <provider> · ... ────), then the moderator's verdict last. --json emits a {"type": "debate_turn", "round": N, ...} record per turn plus a final {"type": "verdict", "moderator": "<name>", ...} record.
Safety. Debaters and the moderator run in the same read-only (or --yolo) mode as the other verbs - there is no permission bypass. agy's partial-sandbox caveat (shell only; it can still edit files) applies here too.
Caveat - use sparingly. Debate is the costliest mode (roughly
debaters × roundscalls, plus a moderator check per round and the verdict) and the least reliably beneficial. The research is mixed-to-negative: multi-agent debate can converge on a wrong answer through conformity, a confident-but-incorrect debater can win on persuasiveness over correctness, and more rounds can entrench an error rather than fix it. The moderator and the adversarial-stance prompt are there to fight these failure modes, but they do not eliminate them. For most questions,askordistillis the better default; reach fordebatewhen you specifically want to surface and stress-test disagreement. (See Can LLM Agents Really Debate? arXiv:2511.07784, Talk Isn't Always Cheap arXiv:2509.05396, and the conformity/position-bias work cited in the design notes.)
Attribution policy
The human (or agent) reading MOA's output always gets correct attribution: every response block shows the real provider name. There is no human-facing anonymization toggle.
The distill aggregator is a different story. To stop it picking favourites by brand, it always receives the proposer answers anonymized as "Response A / B / C" and order-shuffled (no toggle). The merged answer itself is brand-agnostic prose, and the A/B/C labels never leak into stdout, stderr, or the JSON.
Supported agents
Invocations below show the default (read-only) flags; --yolo swaps in each tool's full-access mode.
| Provider | CLI | Invocation (read-only default) |
|---|---|---|
claude |
claude |
claude --model opus --permission-mode default -p PROMPT |
codex |
codex |
codex exec -m gpt-5.5 --skip-git-repo-check -s read-only PROMPT |
agy |
agy |
agy --sandbox --model "Gemini 3.1 Pro (High)" -p PROMPT (partial: shell only - can still edit files) |
opencode |
opencode |
opencode run --agent plan PROMPT |
Adding a new agent is a single entry in the PROVIDERS table in
src/moa_cli/providers.py (executable, default model, command builder,
permission flags); it then participates in detection, -n selection, and
distill automatically.
Agent skill
If you drive MOA from an agent (e.g. Claude Code), there's a ready-made skill at
skills/moa/SKILL.md: it tells an agent when to reach for MOA and
how to use it (verb choice, self-exclusion via -x <self>, parsing the JSONL output). It
supersedes hand-rolling a "peer review" skill.
Development
uv sync
uv run pytest
uv run ruff check src tests
MIT licensed.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file moa_cli-0.3.4.tar.gz.
File metadata
- Download URL: moa_cli-0.3.4.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
32e979da7e7a8711c12dfb16270e8c4010422c09cca7cb91a40442b2b3b07402
|
|
| MD5 |
23e6d9fdecdc6c9bbdd249955688b27c
|
|
| BLAKE2b-256 |
62ab1de7d7a64679a4c3ed67d29d7b46b892b88f14f7a5cf1d6f3fa66222f527
|
File details
Details for the file moa_cli-0.3.4-py3-none-any.whl.
File metadata
- Download URL: moa_cli-0.3.4-py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4eb4d943eefb28d9043c6b6dc6f6300883d3a8967e89f115ca793d1e3845d4af
|
|
| MD5 |
fafe7b8b2c0210c9543ec9223adf044d
|
|
| BLAKE2b-256 |
33770eb03ecd90876330c3cb13123cb5af896e1725146e2c4a9e78f53d051dbe
|