Multi-model panel orchestration engine with an MCP adapter. The engine (consult.*) is MCP-free and usable as a library; the MCP adapter (consult.mcp.*) is an optional extra.

consult-mcp-server

Get a second opinion from a parallel panel of LLMs — without bloating your agent's context window.

consult is an MCP server that lets your agent (Claude Desktop, Cursor, Claude Code, etc.) fan a single prompt out to many LLMs in parallel, then return either the synthesised answer or a manifest of structured ~200-token capsules — so panel breadth doesn't cost parent-context tokens.

┌────────────┐    consult tool call     ┌──────────────────┐    parallel    ┌──────────┐
│ Your agent │ ───────────────────────▶ │  consult-mcp     │ ─────────────▶ │ Claude   │
│ (Claude    │   "what's your take?"    │  (this server)   │                │ GPT      │
│  Desktop / │ ◀─────────────────────── │                  │ ◀───────────── │ Gemini   │
│  Cursor /  │  synthesis + manifest    │  capsules ~200t  │   capsules     │ Grok     │
│  …)        │                          │  + resources     │                │ DeepSeek │
└────────────┘                          └──────────────────┘                │ …        │
                                                                            └──────────┘

Why this exists

If your agent already calls Claude once, you might wonder why you'd want to ask eight more models the same question. Three reasons:

  1. One pass, many perspectives. Different families catch different things. Anthropic finds different bugs than OpenAI; Gemini calls out different risks; DeepSeek often surfaces the contrarian take.
  2. Cheap structured second opinion. The manifest's per-panellist capsule is ~200 tokens — your agent can synthesise it in-band without paying for another flagship round-trip.
  3. No context-window bloat. Full panellist bodies live as MCP resources at consult://runs/<id>/responses/<slug>; your agent only fetches them when it needs depth.

Alternatives fall short: PAL consensus serialises calls, so total latency is the sum of every call; multi_mcp parallelises but offers no escape hatch from server-side synthesis; skill-only fan-outs assembled by the LLM via bash are brittle (token traps, key handling, endpoint drift).
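To make the latency point concrete, here is a minimal sketch with simulated panellists. It only uses asyncio from the standard library; fake_panellist, run_serial, run_parallel and the delays are illustrative assumptions, not part of consult's API.

```python
import asyncio
import time


async def fake_panellist(name: str, delay_s: float) -> str:
    """Stand-in for one provider call; sleeps instead of hitting an API."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_serial(panel: dict[str, float]) -> list[str]:
    # One call at a time: total latency is roughly the sum of the delays.
    return [await fake_panellist(n, d) for n, d in panel.items()]


async def run_parallel(panel: dict[str, float]) -> list[str]:
    # All calls in flight at once: total latency is roughly the slowest delay.
    return await asyncio.gather(*(fake_panellist(n, d) for n, d in panel.items()))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return list(result), time.perf_counter() - start


PANEL = {"model-a": 0.05, "model-b": 0.10, "model-c": 0.15}  # made-up delays

if __name__ == "__main__":
    _, serial_s = timed(run_serial(PANEL))
    _, parallel_s = timed(run_parallel(PANEL))
    print(f"serial ~{serial_s:.2f}s, parallel ~{parallel_s:.2f}s")
```

With real providers the gap widens, since the slowest flagship call tends to dominate a parallel run while a serial run pays for every call in turn.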


Install

Claude Desktop

Claude Desktop does not inherit your shell's PATH or environment variables, so you must give it an absolute path to the executable (uvx in the example below) and declare API keys inside the env block.

Tip: run consult-doctor --config after install to print a ready-to-paste JSON block populated with the absolute binary path and whichever keys are present in your shell environment.

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…",
        "GEMINI_API_KEY": "AIza…",
        "OPENROUTER_API_KEY": "sk-or-…"
      }
    }
  }
}

Restart Claude Desktop, then ask: "use the consult tool to ask 3 models which Python package manager I should use."
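If you would rather generate that block than hand-edit it, a small script can do it. This is a rough sketch, not how consult-doctor --config is implemented: build_client_config is a hypothetical helper, and the key names are simply the ones from the example above.

```python
import json
import os
import shutil

# Key names taken from the example config above.
KEY_NAMES = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY"]


def build_client_config(command: str, env: dict[str, str]) -> dict:
    """Build an mcpServers block with an absolute command and only the keys that are set."""
    if not os.path.isabs(command):
        raise ValueError(f"command must be an absolute path, got {command!r}")
    present = {name: env[name] for name in KEY_NAMES if env.get(name)}
    return {
        "mcpServers": {
            "consult": {
                "command": command,
                "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
                "env": present,
            }
        }
    }


if __name__ == "__main__":
    uvx = shutil.which("uvx")  # path to uvx if it is on this shell's PATH, else None
    if uvx is None:
        print("uvx not found on PATH")
    else:
        print(json.dumps(build_client_config(os.path.abspath(uvx), dict(os.environ)), indent=2))
```

Rejecting a relative command up front surfaces the most common Claude Desktop failure (a bare "uvx" that the app cannot find) before you restart the client.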

Cursor

Edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}

Same caveat as Claude Desktop: use the absolute path to uvx and put the API keys in the env block.

Claude Code CLI

claude mcp add consult -- uvx --from "consult-mcp-server[mcp]" consult-mcp

The CLI inherits your shell env, so the keys you already have in .env / your shell rc will be visible.

Docker

docker run -i --rm \
  -e ANTHROPIC_API_KEY -e OPENAI_API_KEY -e GEMINI_API_KEY -e OPENROUTER_API_KEY \
  -v ~/.consult:/root/.consult \
  ghcr.io/irwin-r/consult-mcp-server:latest

Stdio in / stdio out, just like the local binary. Image published per release to GHCR (multi-stage Python 3.12-slim base, ~150MB).

Smithery

https://smithery.ai/server/consult-mcp-server

Smithery's hosted UI prompts for keys; the same smithery.yaml config-schema applies.

From source (development)

git clone https://github.com/irwin-r/consult-mcp-server
cd consult-mcp-server
uv venv
uv pip install -e ".[dev]"
cp .env.example .env   # fill in keys
uv run pytest -v

Verify the install

consult-doctor          # offline: config + paths + key presence
consult-doctor --ping   # also fires a 1-token call per provider (~$0.0001)
consult-doctor --config # print copy-paste-ready MCP client JSON

The five tools

| Tool | What it does | Use when |
|---|---|---|
| consult | Parallel panel + server-side synthesis. Hero. | "Just give me the answer." |
| panel | Parallel panel, returns raw manifest (no synth). | You want to synthesise yourself. |
| refine | Iterative consortium with arbiter scoring (≤3 rounds). | High-stakes; disagreement-heavy. |
| sequence | Chained multi-step where step N depends on N-1. | Decompose-then-answer; plan-then-execute. |
| synthesise | Re-collapse an existing run via a flagship model. | Different rubric/synthesiser on a prior run_id. |

Tool descriptions are intentionally written as prompts for the calling agent (verb-first, explicit "use when…/don't use for…") so the agent reliably picks the right one without you having to spell it out.


Tiers & cost

Aliases are <family>-<tier> — version-neutral. The registry maps each alias to the current best model; the resolved LiteLLM ID is captured per run in registry_snapshot.json for reproducibility.

| Tier (models) | Models | Typical run cost | Use |
|---|---|---|---|
| nano (3) | claude-haiku, gemini-flash, gpt-nano | < $0.01 | smoke tests / trivia |
| quick (5) | claude-haiku, gemini-pro, grok, qwen-max, kimi | ~$0.05 | snap second opinions |
| standard (10) | opus, sonnet, gpt-pro, gpt, gemini-pro, grok, qwen-max, kimi, glm, llama | $0.30–0.60 | normal decisions |
| wide (10) | as standard, openrouter-routed where possible | $0.20–0.50 | maximum diversity |
| deep (14) | standard + mistral, deepseek, mimo, sonar-pro | $0.50–1.00 | high-stakes, includes web search |
| code (5) | opus, gpt-codex, gpt-mini, gemini-pro, deepseek | $0.20–0.40 | code-heavy questions |
| review (6) | opus, gpt-codex, gpt-pro, gemini-pro, deepseek, grok | $0.30–0.60 | PR / code review |

A per-run cap (max_run_usd, default $5.00) refuses panels whose estimated cost exceeds the limit before any provider is called.
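The cap is a pre-flight check: add up the estimates, compare against the limit, and refuse before spending anything. A minimal sketch of that idea follows; BudgetExceeded, check_budget and the per-panellist figures are assumptions for illustration, and the engine's real estimator is not shown here.

```python
class BudgetExceeded(Exception):
    """Raised before any provider call when the estimate is over the cap."""


def check_budget(estimates_usd: dict[str, float], max_run_usd: float = 5.00) -> float:
    """Return the estimated total, or raise if it exceeds max_run_usd."""
    total = sum(estimates_usd.values())
    if total > max_run_usd:
        raise BudgetExceeded(f"estimated ${total:.2f} exceeds cap ${max_run_usd:.2f}")
    return total


# Hypothetical per-panellist estimates in USD.
estimates = {"claude-opus": 0.12, "gpt-pro": 0.15, "gemini-pro": 0.05}

print(f"estimated total: ${check_budget(estimates):.2f}")  # under the default cap

try:
    check_budget(estimates, max_run_usd=0.10)  # a deliberately tight cap
except BudgetExceeded as exc:
    print("refused:", exc)
```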


The manifest capsule

Each panellist returns a ~200-token structured extract (decision shape shown below; review and research kinds also supported):

{
  "slug": "claude-opus-1",
  "model_id": "anthropic/claude-opus-4-7",
  "status": "OK",
  "capsule": {
    "kind": "decision",
    "position": "supports B with caveats",
    "recommendation": "Use B with fallback to A",
    "key_points": ["…"],
    "unique_claims": ["Only model to flag cold-start regression"],
    "caveats": ["Assumes >100 RPS steady-state"],
    "confidence": 0.85
  },
  "resource_uri": "consult://runs/abc/responses/claude-opus-1",
  "latency_ms": 3420,
  "cost_usd": 0.04
}

Your agent can synthesise from this alone in most cases. Read the full body via the resource URI only when depth is needed.
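As a rough illustration of that workflow, the sketch below summarises each capsule in one line and collects resource URIs only for panellists whose confidence is low. It assumes the manifest is a JSON list of entries shaped like the one above; the second entry, the 0.6 cut-off, and the helper names triage and parse_resource_uri are all made up for the example.

```python
import json
import re

# Two entries shaped like the capsule above; the second one is invented for contrast.
MANIFEST_JSON = """
[
  {"slug": "claude-opus-1", "status": "OK",
   "capsule": {"recommendation": "Use B with fallback to A", "confidence": 0.85},
   "resource_uri": "consult://runs/abc/responses/claude-opus-1"},
  {"slug": "gpt-pro-1", "status": "OK",
   "capsule": {"recommendation": "Use A", "confidence": 0.40},
   "resource_uri": "consult://runs/abc/responses/gpt-pro-1"}
]
"""

URI_PATTERN = re.compile(r"consult://runs/(?P<run_id>[^/]+)/responses/(?P<slug>[^/]+)")


def parse_resource_uri(uri: str) -> tuple[str, str]:
    """Return (run_id, slug) for a panellist resource URI, or raise ValueError."""
    match = URI_PATTERN.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a consult response URI: {uri!r}")
    return match["run_id"], match["slug"]


def triage(manifest: list[dict], min_confidence: float = 0.6) -> tuple[list[str], list[str]]:
    """Split a manifest into one-line summaries and URIs worth fetching in full."""
    summaries, fetch = [], []
    for entry in manifest:
        capsule = entry["capsule"]
        summaries.append(
            f'{entry["slug"]}: {capsule["recommendation"]} ({capsule["confidence"]:.2f})'
        )
        if capsule["confidence"] < min_confidence:
            fetch.append(entry["resource_uri"])
    return summaries, fetch


summaries, fetch = triage(json.loads(MANIFEST_JSON))
print("\n".join(summaries))
print("fetch in full:", [parse_resource_uri(uri) for uri in fetch])
```

Low confidence is only one possible trigger for fetching depth; an agent might equally fetch whenever unique_claims is non-empty.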


Quickstart

After installing, from any connected agent:

> consult: prompt="Polars vs DuckDB for a 10GB Parquet timeseries?", tier="code"

Returns {run_id, synthesis, manifest, cost_usd, synthesiser}. The synthesis is markdown, ready to drop into your conversation.

For iterative consensus:

> refine:
    prompt="Should we migrate from REST to gRPC for the internal mesh?",
    models=[{model:"claude-opus"},{model:"gpt-pro"},{model:"gemini-pro"},{model:"deepseek"}],
    threshold=0.85
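Conceptually, refine is a bounded loop: run a round, let an arbiter score agreement, stop once the score reaches the threshold or the rounds run out. The sketch below shows that control flow only. refine_loop, toy_round and toy_arbiter are hypothetical, and the majority-fraction score is a stand-in for whatever the real arbiter computes.

```python
from typing import Callable

MAX_ROUNDS = 3  # refine is described as capped at three rounds


def refine_loop(
    run_round: Callable[[int], list[str]],
    score_agreement: Callable[[list[str]], float],
    threshold: float = 0.85,
) -> tuple[int, float, list[str]]:
    """Run rounds until the arbiter score meets the threshold or rounds run out."""
    answers: list[str] = []
    score = 0.0
    for round_no in range(1, MAX_ROUNDS + 1):
        answers = run_round(round_no)
        score = score_agreement(answers)
        if score >= threshold:
            break
    return round_no, score, answers


def toy_round(round_no: int) -> list[str]:
    # Toy panel: one dissenter in round one, full agreement afterwards.
    return ["gRPC", "REST", "gRPC"] if round_no == 1 else ["gRPC", "gRPC", "gRPC"]


def toy_arbiter(answers: list[str]) -> float:
    # Fraction of panellists that agree with the most common answer.
    top = max(set(answers), key=answers.count)
    return answers.count(top) / len(answers)


print(refine_loop(toy_round, toy_arbiter))  # stops in round two at score 1.0
```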

For chained reasoning:

> sequence:
    prompts=[
      "Decompose 'how should we scale our event pipeline?' into 4 sub-questions",
      "Answer sub-question 1: throughput requirements",
      "Answer sub-question 2: ordering guarantees",
      "Synthesise the final recommendation across the prior steps"
    ],
    models=[{model:"claude-opus"},{model:"gpt-pro"}]
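The defining property of sequence is that each step sees what came before it. One way to picture that, under the assumption that prior answers are simply prepended to the next prompt (the engine's actual context format may differ), is the sketch below; run_sequence and echo_model are illustrative names.

```python
import asyncio
from typing import Awaitable, Callable


async def run_sequence(prompts: list[str], ask: Callable[[str], Awaitable[str]]) -> list[str]:
    """Run prompts in order, giving each step the answers from the steps before it."""
    answers: list[str] = []
    for prompt in prompts:
        context = "\n".join(f"Step {i} answer: {a}" for i, a in enumerate(answers, start=1))
        full_prompt = f"{context}\n\n{prompt}" if context else prompt
        answers.append(await ask(full_prompt))
    return answers


async def echo_model(prompt: str) -> str:
    # Toy model: reports how many prior answers it was shown.
    return f"saw {prompt.count('Step ')} prior answer(s)"


print(asyncio.run(run_sequence(["decompose", "answer 1", "synthesise"], echo_model)))
```

Because step N waits on step N-1, a sequence cannot be parallelised across steps; only the panel inside each step can fan out.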

End-to-end walkthrough

A full tour. Assumes the install above and at least one provider key in .env.

1. Smoke-test the install (no API spend)

.venv/bin/python -c "
import asyncio
from consult import panel, ModelSpec
async def go():
    h = await panel('hello', [ModelSpec(model='claude-haiku')], dry_run=True)
    print('partial:', h.partial, '| reason:', h.partial_reason)
asyncio.run(go())
"
# partial: True | reason: dry_run: estimated cost $0.0001

2. First real consult (~$0.20 on the code tier)

> consult: prompt="Polars vs DuckDB for 10GB Parquet timeseries?", tier=code

3. Inspect a panellist's full body

> read resource: consult://runs/<run_id>/responses/claude-opus-1

4. Tail progress in real time

tail -f ~/.consult/runs/<run_id>/_progress.log

Agents that send a progressToken get the same events as notifications/progress.

5. Follow-up via continuation_id

> refine: prompt="OK now what about Iceberg vs Delta on top of that?",
          continuation_id="<prior run_id>",
          models=[{model:"claude-opus"},{model:"deepseek"}]

The prior run's synthesis is prepended as "Prior consultation summary".

6. Stochastic averaging with model:N

> panel: models=[{model:"claude-haiku:3"},{model:"gpt-mini:3"}], prompt="…"

Six panellists total — three runs each of two cheap models.
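The arithmetic behind that count can be sketched as a small expansion step. The parsing here is an assumption, and the "alias-i" slug format is inferred from slugs such as claude-opus-1 in the manifest example; expand_specs is a hypothetical helper, not the engine's function.

```python
def expand_specs(specs: list[str]) -> list[str]:
    """Expand "alias:N" into N numbered slugs; a bare alias counts as one."""
    slugs = []
    for spec in specs:
        alias, _, count = spec.partition(":")
        n = int(count) if count else 1
        if n < 1:
            raise ValueError(f"count must be at least 1 in {spec!r}")
        slugs.extend(f"{alias}-{i}" for i in range(1, n + 1))
    return slugs


slugs = expand_specs(["claude-haiku:3", "gpt-mini:3"])
print(len(slugs), slugs)
```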

7. Check today's spend

consult-ledger today
# {"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[…]}

total_known: false means at least one panellist had pricing missing from the LiteLLM table.
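If you script against the ledger, it is worth surfacing that flag rather than printing the bare total. A small sketch, assuming the output is one JSON object with the four fields shown above; describe_spend is a hypothetical helper, and reading an unknown total as a lower bound is an interpretation, not something the ledger states.

```python
import json


def describe_spend(ledger_line: str) -> str:
    """Turn one consult-ledger JSON line into a short human-readable summary."""
    ledger = json.loads(ledger_line)
    # Assumption: when pricing is missing for a panellist, the total undercounts.
    qualifier = "" if ledger["total_known"] else "at least "
    runs = len(ledger["runs"])
    return f'{ledger["date"]}: {qualifier}${ledger["total_usd"]:.2f} across {runs} run(s)'


# Same fields as the sample output above; the runs list is left empty here.
sample = '{"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[]}'
print(describe_spend(sample))
```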

8. View a run as a rich HTML page

consult-view <run_id>          # writes ~/.consult/runs/<run_id>/feed.html
consult-view <run_id> --open   # also opens in default browser

Self-contained HTML — header pills, prompt, synthesis (markdown), per-round arbiter verdicts (refine), per-panellist cards with capsule + full body, and a chronological timeline from _progress.log. No external assets, no JS.


Driving the engine without MCP

The engine package (consult.*) is MCP-free and reusable as a library:

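import asyncio

from consult import consult, panel, refine, ModelSpec


async def main():
    # Hero tool: parallel panel plus server-side synthesis
    result = await consult("question?", tier="standard")
    print(result.synthesis, result.cost_usd)

    # Lower-level: raw panel, no synthesis
    handle = await panel("question?", [ModelSpec(model="claude-opus"), ModelSpec(model="gpt-pro")])

    # Iterative: arbiter-scored rounds until the threshold is met
    verdict = await refine(
        "tough decision?",
        [ModelSpec(model="claude-opus"), ModelSpec(model="deepseek")],
        threshold=0.85,
    )


# The engine functions are coroutines, so drive them from an event loop.
asyncio.run(main())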

Swap the URI scheme for a non-MCP transport:

from consult import artifacts
artifacts.set_resource_uri_formatter(
    lambda run_id, slug: f"https://api.example.com/runs/{run_id}/{slug}"
)

Security

Read SECURITY.md for the full threat model. Short version:

  • File attachments and git_diff must resolve under CONSULT_TRUSTED_REPO_ROOTS (defaults to CWD). Symlinks resolved with strict=True; escape attempts fail closed.
  • Run artefacts are chmod 0o700 — per-run prompts (often containing pasted credentials or code) are not world-readable on shared hosts.
  • git diff runs with global/system git config neutralised so a malicious .gitattributes filter can't execute.
  • LiteLLM exception strings are scrubbed for sk-…, AIza…, Bearer …, x-api-key: and similar before anything hits disk or the manifest.
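Two of the controls above, fail-closed path containment and credential scrubbing, are easy to picture in a few lines. The sketch below is not the project's code (SECURITY.md is the authority): the regexes are guesses at the secret shapes named in the list, and scrub and resolve_under_roots are hypothetical names.

```python
import re
import tempfile
from pathlib import Path

# Guessed patterns for the secret shapes listed above.
SECRET_RE = re.compile(
    r"\bsk-[A-Za-z0-9_-]+|\bAIza[A-Za-z0-9_-]+|Bearer\s+\S+|(?i:x-api-key:\s*\S+)"
)


def scrub(text: str) -> str:
    """Replace anything that looks like a credential with a fixed marker."""
    return SECRET_RE.sub("[REDACTED]", text)


def resolve_under_roots(path: str, roots: list[Path]) -> Path:
    """Resolve path (following symlinks) and refuse it unless it sits under a trusted root."""
    resolved = Path(path).resolve(strict=True)  # raises if the path does not exist
    for root in roots:
        if resolved.is_relative_to(root.resolve(strict=True)):
            return resolved
    raise PermissionError(f"{path} is outside the trusted roots")


print(scrub("auth failed: key=sk-ant-abc123 header=Bearer tok_456"))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "repo")
    root.mkdir()
    (root / "notes.txt").write_text("ok")
    Path(tmp, "secret.txt").write_text("nope")
    print(resolve_under_roots(str(root / "notes.txt"), [root]).name)
    try:
        resolve_under_roots(str(root / ".." / "secret.txt"), [root])
    except PermissionError:
        print("refused: path escapes the trusted root")
```

Resolving before comparing is the important part: a check on the unresolved string would let ".." segments and symlinks slip past the root.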

Privacy note

The model registry tags each entry with a privacy_tier:

  • first_party — direct API to Anthropic / OpenAI / Google.
  • aggregator — routed via OpenRouter (Grok, Kimi, Qwen, DeepSeek, Llama, Mistral, GLM, MiMo, Sonar-Pro).

Mixing privacy tiers in one panel broadcasts the same prompt to providers with different data-retention policies. For prompts containing sensitive material, prefer tier="standard" (five of its ten models are first-party; the rest are aggregator-routed) over tier="wide" or tier="deep" (heavily aggregator-routed).
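For the strictest case you can filter a panel down to first-party models before sending anything. A minimal sketch, assuming a registry keyed by alias; only the privacy_tier field name and its two values come from the text above, while the REGISTRY slice and first_party_only are hypothetical.

```python
# Hypothetical slice of a registry, tagged as described in the list above.
REGISTRY = {
    "claude-opus": {"privacy_tier": "first_party"},
    "gpt-pro": {"privacy_tier": "first_party"},
    "gemini-pro": {"privacy_tier": "first_party"},
    "grok": {"privacy_tier": "aggregator"},
    "deepseek": {"privacy_tier": "aggregator"},
}


def first_party_only(aliases: list[str], registry: dict[str, dict]) -> list[str]:
    """Keep aliases tagged first_party; unknown aliases are dropped to stay on the safe side."""
    return [a for a in aliases if registry.get(a, {}).get("privacy_tier") == "first_party"]


print(first_party_only(["claude-opus", "grok", "gpt-pro", "deepseek", "mystery"], REGISTRY))
```

Dropping unknown aliases rather than passing them through keeps the filter fail-closed, in the same spirit as the attachment checks in the Security section.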


Repo layout

consult/                # ENGINE — no mcp.* imports
  runner.py             # async fanout + LiteLLM + progress log
  capsule.py            # post-fanout structured extraction
  synth.py              # flagship synthesiser
  refine.py             # arbiter-driven loop (max 3 rounds) + continuation
  sequence.py           # chained multi-step
  orchestrate.py        # consult() hero
  ledger.py             # daily cost ledger (consult-ledger)
  viewer.py             # static HTML run renderer (consult-view)
  doctor.py             # diagnostic CLI (consult-doctor)
  registry.py           # models.json + stances.json loader
  artifacts.py          # ~/.consult/runs/<id>/ layout + URI formatter
  attachments.py        # file/diff inlining + trusted-roots enforcement
  sources.py            # git_diff resolver (hardened subprocess)
  context.py            # per-run bundle + blinding
  progress.py           # typed ProgressEvent union
  status.py             # LiteLLM response → Status
  types.py              # Pydantic models (StrictModel base)
  mcp/                  # MCP ADAPTER — only thing that imports mcp.*
    server.py, handlers.py, schemas.py, errors.py, __main__.py
  config/
    models.json         # registry with privacy_tier annotations
    stances.json        # persona prompts
tests/                  # pytest (offline + live, gated on keys)
.github/workflows/      # CI: ruff + pytest on Py 3.11/3.12/3.13
FRICTION.md             # internal dogfooding log (kept for transparency)
SECURITY.md             # threat model + disclosure path
CONTRIBUTING.md         # dev setup + style

Contributing

See CONTRIBUTING.md. Issues and PRs welcome; please open an issue first for non-trivial changes so we can agree on shape.

License

MIT. See LICENSE.
