Multi-model panel orchestration engine with an MCP adapter. The engine (consult.*) is MCP-free and usable as a library; the MCP adapter (consult.mcp.*) is an optional extra.
consult-mcp-server
Get a second opinion from a parallel panel of LLMs — without bloating your agent's context window.
consult is an MCP server that lets your agent (Claude Desktop, Cursor,
Claude Code, etc.) fan a single prompt out to many LLMs in parallel, then
return either the synthesised answer or a manifest of structured
~200-token capsules — so panel breadth doesn't cost parent-context tokens.
┌────────────┐    consult tool call     ┌──────────────────┐    parallel    ┌──────────┐
│ Your agent │ ───────────────────────▶ │ consult-mcp      │ ─────────────▶ │ Claude   │
│ (Claude    │   "what's your take?"    │ (this server)    │                │ GPT      │
│ Desktop /  │ ◀─────────────────────── │                  │ ◀───────────── │ Gemini   │
│ Cursor /   │   synthesis + manifest   │ capsules ~200t   │    capsules    │ Grok     │
│ …)         │                          │ + resources      │                │ DeepSeek │
└────────────┘                          └──────────────────┘                │ …        │
                                                                            └──────────┘
Why this exists
If your agent already calls Claude once, you might wonder why you'd want to
ask eight more models the same question. Three reasons:
- One pass, many perspectives. Different model families catch different things. Anthropic's models find different bugs than OpenAI's; Gemini calls out different risks; DeepSeek often surfaces the contrarian take.
- Cheap structured second opinion. The manifest's per-panellist capsule is ~200 tokens — your agent can synthesise it in-band without paying for another flagship round-trip.
- No context-window bloat. Full panellist bodies live as MCP resources at consult://runs/<id>/responses/<slug>; your agent only fetches them when it needs depth.
Alternatives fall short: PAL consensus serialises calls (total latency is the
sum of all calls); multi_mcp parallelises but offers no escape hatch from
server-side synthesis; skill-only fan-outs assembled by the LLM via bash are
brittle (token traps, key handling, endpoint drift).
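The latency argument can be made concrete with a small sketch. It uses only asyncio and simulated panellists; the fake_panellist coroutine and its delays are hypothetical stand-ins, not consult's runner. Sequential calls cost roughly the sum of the latencies, while a parallel fan-out costs roughly the slowest one.

```python
import asyncio
import time

# Hypothetical stand-in for one provider call; the delays are made up.
async def fake_panellist(name: str, delay_s: float) -> str:
    await asyncio.sleep(delay_s)
    return f"{name}: answer"

PANEL = {"model-a": 0.05, "model-b": 0.10, "model-c": 0.15}

async def run_serial() -> tuple[list[str], float]:
    """Await each panellist in turn: elapsed time is about the sum of delays."""
    start = time.perf_counter()
    answers = [await fake_panellist(n, d) for n, d in PANEL.items()]
    return answers, time.perf_counter() - start

async def run_parallel() -> tuple[list[str], float]:
    """Fan out with asyncio.gather: elapsed time is about the longest delay."""
    start = time.perf_counter()
    answers = await asyncio.gather(*(fake_panellist(n, d) for n, d in PANEL.items()))
    return list(answers), time.perf_counter() - start

serial_answers, serial_s = asyncio.run(run_serial())
parallel_answers, parallel_s = asyncio.run(run_parallel())
print(f"serial {serial_s:.2f}s, parallel {parallel_s:.2f}s")
```

asyncio.gather returns results in the order the awaitables were passed, so both runs produce the same list of answers.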
Install
Claude Desktop
Claude Desktop does not inherit your shell's PATH or environment variables, so you must give it the absolute path to consult-mcp and declare API keys inside the env block.
Tip: run consult-doctor --config after install to print a ready-to-paste
JSON block populated with the absolute binary path and whichever keys are
present in your shell environment.
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…",
        "GEMINI_API_KEY": "AIza…",
        "OPENROUTER_API_KEY": "sk-or-…"
      }
    }
  }
}
Restart Claude Desktop, then ask: "use the consult tool to ask 3 models which Python package manager I should use."
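Because Claude Desktop needs an absolute command path and explicit keys, it can help to generate the block rather than hand-edit it. The following is a minimal sketch of that idea and not the implementation of consult-doctor --config; the function name build_mcp_config is hypothetical, and it copies only the keys that are actually set.

```python
import json
import os
import shutil

KEY_NAMES = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY"]

def build_mcp_config(command_path: str, environ: dict[str, str]) -> dict:
    """Return an mcpServers block with an absolute command and only the keys present."""
    if not os.path.isabs(command_path):
        raise ValueError(f"command must be an absolute path, got {command_path!r}")
    # Skip keys that are missing or empty so the block never carries blank values.
    env = {name: environ[name] for name in KEY_NAMES if environ.get(name)}
    return {
        "mcpServers": {
            "consult": {
                "command": command_path,
                "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
                "env": env,
            }
        }
    }

if __name__ == "__main__":
    uvx = shutil.which("uvx")  # None when uvx is not on PATH
    if uvx is None:
        print("uvx not found on PATH")
    else:
        config = build_mcp_config(os.path.abspath(uvx), dict(os.environ))
        print(json.dumps(config, indent=2))
```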
Cursor
Edit ~/.cursor/mcp.json:
{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}
Same caveat as Claude Desktop: absolute path to uvx, env keys in the block.
Claude Code CLI
claude mcp add consult -- uvx --from "consult-mcp-server[mcp]" consult-mcp
The CLI inherits your shell env, so the keys you already have in .env /
your shell rc will be visible.
Docker
docker run -i --rm \
-e ANTHROPIC_API_KEY -e OPENAI_API_KEY -e GEMINI_API_KEY -e OPENROUTER_API_KEY \
-v ~/.consult:/root/.consult \
ghcr.io/irwin-r/consult-mcp-server:latest
Stdio in / stdio out, just like the local binary. Image published per release to GHCR (multi-stage Python 3.12-slim base, ~150MB).
Smithery
https://smithery.ai/server/consult-mcp-server
Smithery's hosted UI prompts for keys; the same smithery.yaml config-schema
applies.
From source (development)
git clone https://github.com/irwin-r/consult-mcp-server
cd consult-mcp-server
uv venv
uv pip install -e ".[dev]"
cp .env.example .env # fill in keys
uv run pytest -v
Verify the install
consult-doctor # offline: config + paths + key presence
consult-doctor --ping # also fires a 1-token call per provider (~$0.0001)
consult-doctor --config # print copy-paste-ready MCP client JSON
The five tools
| Tool | What it does | Use when |
|---|---|---|
| consult | Parallel panel + server-side synthesis. Hero. | "Just give me the answer." |
| panel | Parallel panel, returns raw manifest (no synth). | You want to synthesise yourself. |
| refine | Iterative consortium with arbiter scoring (≤3 rounds). | High-stakes; disagreement-heavy. |
| sequence | Chained multi-step where step N depends on N-1. | Decompose-then-answer; plan-then-execute. |
| synthesise | Re-collapse an existing run via a flagship model. | Different rubric/synthesiser on a prior run_id. |
Tool descriptions are intentionally written as prompts for the calling agent (verb-first, explicit "use when…/don't use for…") so the agent reliably picks the right one without you having to spell it out.
Tiers & cost
Aliases are <family>-<tier> — version-neutral. The registry maps each alias
to the current best model; the resolved LiteLLM ID is captured per run in
registry_snapshot.json for reproducibility.
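As an illustration of why a per-run snapshot helps reproducibility, here is a minimal sketch of alias resolution. The registry contents, the function names, and the snapshot layout are assumptions for this example; the real models.json and registry_snapshot.json formats are not shown in this README.

```python
import json

# Hypothetical registry excerpt. The claude-opus mapping echoes the model_id shown
# in the manifest example in this README; the second entry is a placeholder.
REGISTRY = {
    "claude-opus": "anthropic/claude-opus-4-7",
    "example-flash": "example-provider/example-flash-v1",
}

def resolve_panel(aliases: list[str], registry: dict[str, str]) -> dict[str, str]:
    """Map each version-neutral alias to a concrete model ID, failing on unknown aliases."""
    unknown = [alias for alias in aliases if alias not in registry]
    if unknown:
        raise KeyError(f"unknown aliases: {unknown}")
    return {alias: registry[alias] for alias in aliases}

def snapshot_json(resolved: dict[str, str]) -> str:
    """Serialise the resolved mapping so the exact models used can be recovered later."""
    return json.dumps(resolved, indent=2, sort_keys=True)

resolved = resolve_panel(["claude-opus", "example-flash"], REGISTRY)
print(snapshot_json(resolved))
```

If the registry later points claude-opus at a newer model, the stored snapshot still records which ID an older run used.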
| Tier | Models | Typical run cost | Use |
|---|---|---|---|
| nano (3) | claude-haiku, gemini-flash, gpt-nano | < $0.01 | smoke tests / trivia |
| quick (5) | claude-haiku, gemini-pro, grok, qwen-max, kimi | ~$0.05 | snap second opinions |
| standard (10) | opus, sonnet, gpt-pro, gpt, gemini-pro, grok, qwen-max, kimi, glm, llama | $0.30–0.60 | normal decisions |
| wide (10) | as standard, openrouter-routed where possible | $0.20–0.50 | maximum diversity |
| deep (14) | standard + mistral, deepseek, mimo, sonar-pro | $0.50–1.00 | high-stakes, includes web search |
| code (5) | opus, gpt-codex, gpt-mini, gemini-pro, deepseek | $0.20–0.40 | code-heavy questions |
| review (6) | opus, gpt-codex, gpt-pro, gemini-pro, deepseek, grok | $0.30–0.60 | PR / code review |
A per-run cap (max_run_usd, default $5.00) refuses panels whose estimated
cost exceeds the limit before any provider is called.
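The cap is a pre-flight check: estimate first, refuse before spending. A minimal sketch of that pattern follows, assuming per-panellist estimates are available up front. The Estimate and BudgetExceeded names and the figures are hypothetical; only the max_run_usd parameter and its $5.00 default come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    alias: str
    est_usd: float  # hypothetical per-panellist cost estimate

class BudgetExceeded(Exception):
    """Raised before any provider call when the estimate is over the cap."""

def check_budget(estimates: list[Estimate], max_run_usd: float = 5.00) -> float:
    """Return the estimated total, or raise if it exceeds the per-run cap."""
    total = sum(e.est_usd for e in estimates)
    if total > max_run_usd:
        raise BudgetExceeded(f"estimated ${total:.2f} exceeds cap ${max_run_usd:.2f}")
    return total

panel_estimates = [Estimate("claude-opus", 0.25), Estimate("gpt-pro", 0.50)]
print(check_budget(panel_estimates))       # within the default cap
try:
    check_budget(panel_estimates, max_run_usd=0.50)
except BudgetExceeded as exc:
    print("refused:", exc)
```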
The manifest capsule
Each panellist returns a ~200-token structured extract (decision shape shown
below; review and research kinds also supported):
{
  "slug": "claude-opus-1",
  "model_id": "anthropic/claude-opus-4-7",
  "status": "OK",
  "capsule": {
    "kind": "decision",
    "position": "supports B with caveats",
    "recommendation": "Use B with fallback to A",
    "key_points": ["…"],
    "unique_claims": ["Only model to flag cold-start regression"],
    "caveats": ["Assumes >100 RPS steady-state"],
    "confidence": 0.85
  },
  "resource_uri": "consult://runs/abc/responses/claude-opus-1",
  "latency_ms": 3420,
  "cost_usd": 0.04
}
Your agent can synthesise from this alone in most cases. Read the full body via the resource URI only when depth is needed.
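One way a caller might decide when depth is needed is to triage the capsules first. The sketch below is a hypothetical heuristic, not part of consult: it reads the full body only for successful panellists that are either unsure or make a unique claim. The min_confidence threshold and the second manifest entry are invented for illustration.

```python
from typing import Any

def uris_worth_reading(manifest: list[dict[str, Any]], min_confidence: float = 0.7) -> list[str]:
    """Pick the resource URIs whose full bodies seem worth fetching."""
    picked = []
    for entry in manifest:
        if entry.get("status") != "OK":
            continue  # this sketch skips failed panellists
        capsule = entry.get("capsule", {})
        unsure = capsule.get("confidence", 0.0) < min_confidence
        unique = bool(capsule.get("unique_claims"))
        if unsure or unique:
            picked.append(entry["resource_uri"])
    return picked

manifest = [
    {   # abridged from the capsule shown above
        "slug": "claude-opus-1", "status": "OK",
        "capsule": {"confidence": 0.85,
                    "unique_claims": ["Only model to flag cold-start regression"]},
        "resource_uri": "consult://runs/abc/responses/claude-opus-1",
    },
    {   # hypothetical second panellist: confident, nothing unique
        "slug": "example-model-1", "status": "OK",
        "capsule": {"confidence": 0.9, "unique_claims": []},
        "resource_uri": "consult://runs/abc/responses/example-model-1",
    },
]
print(uris_worth_reading(manifest))
```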
Quickstart
After installing, from any connected agent:
> consult: prompt="Polars vs DuckDB for a 10GB Parquet timeseries?", tier="code"
Returns {run_id, synthesis, manifest, cost_usd, synthesiser}. The synthesis
is markdown, ready to drop into your conversation.
For iterative consensus:
> refine:
    prompt="Should we migrate from REST to gRPC for the internal mesh?",
    models=[{model:"claude-opus"},{model:"gpt-pro"},{model:"gemini-pro"},{model:"deepseek"}],
    threshold=0.85
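The control flow behind an arbiter-driven loop can be sketched as follows. This is an illustration under stated assumptions, not consult's refine implementation: it assumes threshold is the arbiter score at which the loop stops, and run_round and score are hypothetical callables standing in for the panel and the arbiter. The three-round cap comes from the tool table above.

```python
from typing import Callable

MAX_ROUNDS = 3  # the README caps refine at three rounds

def refine_loop(
    run_round: Callable[[int, str], str],
    score: Callable[[str], float],
    prompt: str,
    threshold: float = 0.85,
) -> tuple[str, float, int]:
    """Run rounds until the arbiter score reaches the threshold or rounds run out.

    Returns (best answer, its score, number of rounds used).
    """
    best_answer, best_score = "", float("-inf")
    for round_no in range(1, MAX_ROUNDS + 1):
        answer = run_round(round_no, prompt)
        current = score(answer)
        if current > best_score:
            best_answer, best_score = answer, current
        if current >= threshold:
            return best_answer, best_score, round_no
    return best_answer, best_score, MAX_ROUNDS

# Fake panel and arbiter: round n yields "draft n" with a preset score.
fake_scores = {1: 0.60, 2: 0.90, 3: 0.95}
answer, final_score, rounds_used = refine_loop(
    run_round=lambda n, p: f"draft {n}",
    score=lambda a: fake_scores[int(a.split()[-1])],
    prompt="tough decision?",
)
print(answer, final_score, rounds_used)
```

With these fake scores the loop stops after the second round, because 0.90 is the first score at or above 0.85.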
For chained reasoning:
> sequence:
    prompts=[
      "Decompose 'how should we scale our event pipeline?' into 4 sub-questions",
      "Answer sub-question 1: throughput requirements",
      "Answer sub-question 2: ordering guarantees",
      "Synthesise the final recommendation across the prior steps"
    ],
    models=[{model:"claude-opus"},{model:"gpt-pro"}]
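The chaining idea, where step N depends on step N-1, can be sketched in a few lines. The run_sequence function and the way the previous output is spliced into the next prompt are assumptions for illustration; the actual prompt template used by sequence is not documented here.

```python
from typing import Callable

def run_sequence(prompts: list[str], ask: Callable[[str], str]) -> list[str]:
    """Run prompts in order, feeding each step the output of the step before it."""
    outputs: list[str] = []
    for step, prompt in enumerate(prompts, start=1):
        if outputs:
            # Hypothetical template: prepend only the immediately preceding output.
            prompt = f"Previous step output:\n{outputs[-1]}\n\nStep {step}: {prompt}"
        outputs.append(ask(prompt))
    return outputs

# A fake model that records what it was asked and returns a numbered answer.
seen: list[str] = []

def fake_ask(prompt: str) -> str:
    seen.append(prompt)
    return f"out{len(seen)}"

results = run_sequence(["decompose", "answer part 1", "synthesise"], fake_ask)
print(results)
```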
End-to-end walkthrough
A full tour. Assumes the install above and at least one provider key in
.env.
1. Smoke-test the install (no API spend)
.venv/bin/python -c "
import asyncio
from consult import panel, ModelSpec
async def go():
    h = await panel('hello', [ModelSpec(model='claude-haiku')], dry_run=True)
    print('partial:', h.partial, '| reason:', h.partial_reason)
asyncio.run(go())
"
# partial: True | reason: dry_run: estimated cost $0.0001
2. First real consult (~$0.20 on the code tier)
> consult: prompt="Polars vs DuckDB for 10GB Parquet timeseries?", tier="code"
3. Inspect a panellist's full body
> read resource: consult://runs/<run_id>/responses/claude-opus-1
4. Tail progress in real time
tail -f ~/.consult/runs/<run_id>/_progress.log
Agents that send a progressToken get the same events as
notifications/progress.
5. Follow-up via continuation_id
> refine: prompt="OK now what about Iceberg vs Delta on top of that?",
continuation_id="<prior run_id>",
models=[{model:"claude-opus"},{model:"deepseek"}]
The prior run's synthesis is prepended as "Prior consultation summary".
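A minimal sketch of the prepending step, assuming the summary is placed under a plain heading ahead of the new prompt. Only the "Prior consultation summary" label comes from the text above; the function name and exact layout are hypothetical.

```python
from typing import Optional

def with_continuation(prompt: str, prior_synthesis: Optional[str]) -> str:
    """Prepend the prior run's synthesis, if any, ahead of the follow-up prompt."""
    if not prior_synthesis:
        return prompt  # no earlier run to carry over
    return f"Prior consultation summary\n{prior_synthesis}\n\n{prompt}"

print(with_continuation("What about Iceberg vs Delta?", "Prefer DuckDB for this workload."))
```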
6. Stochastic averaging with model:N
> panel: models=[{model:"claude-haiku:3"},{model:"gpt-mini:3"}], prompt="…"
Six panellists total — three runs each of two cheap models.
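The model:N shorthand can be read as a simple expansion. The sketch below assumes slugs are numbered as alias-1, alias-2, and so on, which matches the claude-opus-1 slug shown in the manifest example but is otherwise a guess; expand_specs is a hypothetical helper.

```python
def expand_specs(models: list[str]) -> list[str]:
    """Expand 'alias:N' into N numbered panellist slugs; a bare alias counts once."""
    slugs: list[str] = []
    for spec in models:
        alias, sep, count = spec.partition(":")
        repeats = int(count) if sep else 1
        if repeats < 1:
            raise ValueError(f"repeat count must be at least 1 in {spec!r}")
        slugs.extend(f"{alias}-{i}" for i in range(1, repeats + 1))
    return slugs

panellists = expand_specs(["claude-haiku:3", "gpt-mini:3"])
print(len(panellists), panellists)
```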
7. Check today's spend
consult-ledger today
# {"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[…]}
total_known: false means at least one panellist had pricing missing from
the LiteLLM table.
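The total_known flag follows a common pattern for partial data: sum what is known and report whether anything was missing. A minimal sketch, assuming an unknown price is represented as None (the ledger's real internal representation is not documented here):

```python
from typing import Optional

def summarise(costs: list[Optional[float]]) -> dict:
    """Sum the known costs and flag whether every panellist had a price."""
    known = [c for c in costs if c is not None]
    return {
        "total_usd": round(sum(known), 4),
        "total_known": len(known) == len(costs),
    }

# One panellist has no pricing entry, so the total is a lower bound.
print(summarise([0.5, None, 0.25]))
```

When total_known is false, total_usd should be read as a floor on the actual spend.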
8. View a run as a rich HTML page
consult-view <run_id> # writes ~/.consult/runs/<run_id>/feed.html
consult-view <run_id> --open # also opens in default browser
Self-contained HTML — header pills, prompt, synthesis (markdown), per-round
arbiter verdicts (refine), per-panellist cards with capsule + full body, and
a chronological timeline from _progress.log. No external assets, no JS.
Driving the engine without MCP
The engine package (consult.*) is MCP-free and reusable as a library:
import asyncio

from consult import consult, panel, refine, ModelSpec

async def main():
    # Hero tool: parallel panel plus server-side synthesis
    result = await consult("question?", tier="standard")
    print(result.synthesis, result.cost_usd)

    # Lower-level: raw panel, no synthesis
    handle = await panel("question?", [ModelSpec(model="claude-opus"), ModelSpec(model="gpt-pro")])

    # Iterative: arbiter-scored refinement
    verdict = await refine(
        "tough decision?",
        [ModelSpec(model="claude-opus"), ModelSpec(model="deepseek")],
        threshold=0.85,
    )

asyncio.run(main())
Swap the URI scheme for a non-MCP transport:
from consult import artifacts
artifacts.set_resource_uri_formatter(
    lambda run_id, slug: f"https://api.example.com/runs/{run_id}/{slug}"
)
Security
Read SECURITY.md for the full threat model. Short version:
- File attachments and git_diff must resolve under CONSULT_TRUSTED_REPO_ROOTS (defaults to CWD). Symlinks resolved with strict=True; escape attempts fail closed.
- Run artefacts are chmod 0o700, so per-run prompts (often containing pasted credentials or code) are not world-readable on shared hosts.
- git diff runs with global/system git config neutralised so a malicious .gitattributes filter can't execute.
- LiteLLM exception strings are scrubbed for sk-…, AIza…, Bearer …, x-api-key: and similar before anything hits disk or the manifest.
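Two of the measures above, trusted-root enforcement and secret scrubbing, follow well-known patterns that can be sketched with the standard library. This is an illustration, not the code in attachments.py or the real scrubber: the function names, the regular expressions, and the [REDACTED] placeholder are all assumptions. It needs Python 3.9+ for Path.is_relative_to.

```python
import re
from pathlib import Path

def resolve_under_roots(candidate: str, roots: list[Path]) -> Path:
    """Resolve symlinks strictly and require the result to sit under a trusted root."""
    resolved = Path(candidate).resolve(strict=True)  # raises FileNotFoundError if missing
    for root in roots:
        if resolved.is_relative_to(root.resolve(strict=True)):
            return resolved
    # Fail closed: anything that resolves outside every root is rejected.
    raise PermissionError(f"{resolved} is outside the trusted roots")

# Hypothetical patterns covering the key shapes listed above.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_\-]{8,}"),
    re.compile(r"AIza[A-Za-z0-9_\-]{10,}"),
    re.compile(r"Bearer\s+[^\s\"']+"),
    re.compile(r"x-api-key:\s*[^\s\"']+", re.IGNORECASE),
]

def scrub(text: str) -> str:
    """Replace anything that looks like a credential before it is logged or stored."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(scrub("auth failed for sk-ant-abc123DEF456"))
```

Because the path is resolved before the containment check, a symlink inside a trusted root that points elsewhere is rejected rather than followed.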
Privacy note
The model registry tags each entry with a privacy_tier:
- first_party: direct API to Anthropic / OpenAI / Google.
- aggregator: routed via OpenRouter (Grok, Kimi, Qwen, DeepSeek, Llama, Mistral, GLM, MiMo, Sonar-Pro).
Mixing tiers in one panel broadcasts the same prompt to providers with
different data-retention policies. For prompts containing sensitive
material, prefer tier="standard" (mostly first-party) over tier="wide"
or tier="deep" (heavily aggregator-routed).
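A caller handling sensitive prompts could check a panel's privacy mix before sending anything. The sketch below uses a hypothetical excerpt of the registry; the alias-to-tier assignments follow the routing described above, but the lookup table and function names are invented for illustration.

```python
# Hypothetical excerpt: alias -> privacy_tier, following the routing described above.
PRIVACY_TIER = {
    "claude-opus": "first_party",
    "gpt-pro": "first_party",
    "gemini-pro": "first_party",
    "grok": "aggregator",
    "deepseek": "aggregator",
}

def privacy_tiers(panel: list[str]) -> set[str]:
    """Return the set of privacy tiers a panel would send the prompt to."""
    return {PRIVACY_TIER[alias] for alias in panel}

def is_first_party_only(panel: list[str]) -> bool:
    """True when every panellist is reached through a direct provider API."""
    return privacy_tiers(panel) == {"first_party"}

print(is_first_party_only(["claude-opus", "gpt-pro"]))   # direct APIs only
print(privacy_tiers(["claude-opus", "deepseek"]))        # mixed panel
```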
Repo layout
consult/                 # ENGINE: no mcp.* imports
  runner.py              # async fanout + LiteLLM + progress log
  capsule.py             # post-fanout structured extraction
  synth.py               # flagship synthesiser
  refine.py              # arbiter-driven loop (max 3 rounds) + continuation
  sequence.py            # chained multi-step
  orchestrate.py         # consult() hero
  ledger.py              # daily cost ledger (consult-ledger)
  viewer.py              # static HTML run renderer (consult-view)
  doctor.py              # diagnostic CLI (consult-doctor)
  registry.py            # models.json + stances.json loader
  artifacts.py           # ~/.consult/runs/<id>/ layout + URI formatter
  attachments.py         # file/diff inlining + trusted-roots enforcement
  sources.py             # git_diff resolver (hardened subprocess)
  context.py             # per-run bundle + blinding
  progress.py            # typed ProgressEvent union
  status.py              # LiteLLM response → Status
  types.py               # Pydantic models (StrictModel base)
  mcp/                   # MCP ADAPTER: only thing that imports mcp.*
    server.py, handlers.py, schemas.py, errors.py, __main__.py
config/
  models.json            # registry with privacy_tier annotations
  stances.json           # persona prompts
tests/                   # pytest (offline + live, gated on keys)
.github/workflows/       # CI: ruff + pytest on Py 3.11/3.12/3.13
FRICTION.md              # internal dogfooding log (kept for transparency)
SECURITY.md              # threat model + disclosure path
CONTRIBUTING.md          # dev setup + style
Contributing
See CONTRIBUTING.md. Issues and PRs welcome; please open
an issue first for non-trivial changes so we can agree on shape.
License
MIT. See LICENSE.