Multi-model panel orchestration engine with an MCP adapter. The engine (consult.*) is MCP-free and usable as a library; the MCP adapter (consult.mcp.*) is an optional extra.

consult-mcp-server

Get a second opinion from a parallel panel of LLMs — without bloating your agent's context window.

consult is an MCP server that lets your agent (Claude Desktop, Cursor, Claude Code, etc.) fan a single prompt out to many LLMs in parallel. It returns either the synthesised answer or a manifest of structured ~200-token capsules, so panel breadth doesn't cost parent-context tokens.

┌────────────┐    consult tool call     ┌──────────────────┐    parallel    ┌──────────┐
│ Your agent │ ───────────────────────▶ │  consult-mcp     │ ─────────────▶ │ Claude   │
│ (Claude    │   "what's your take?"    │  (this server)   │                │ GPT      │
│  Desktop / │ ◀─────────────────────── │                  │ ◀───────────── │ Gemini   │
│  Cursor /  │  synthesis + manifest    │  capsules ~200t  │   capsules     │ Grok     │
│  …)        │                          │  + resources     │                │ DeepSeek │
└────────────┘                          └──────────────────┘                │ …        │
                                                                            └──────────┘

Why this exists

If your agent already calls Claude once, you might wonder why you'd ask eight more models the same question. Three reasons:

One pass, many perspectives. Different model families catch different things: Anthropic models find different bugs than OpenAI's, Gemini calls out different risks, and DeepSeek often surfaces the contrarian take.
Cheap structured second opinion. The manifest's per-panellist capsule is ~200 tokens — your agent can synthesise it in-band without paying for another flagship round-trip.
No context-window bloat. Full panellist bodies live as MCP resources at consult://runs/<id>/responses/<slug>; your agent only fetches them when it needs depth.

Alternatives fall short: PAL consensus serialises calls, so total latency is the sum of every model's latency; multi_mcp parallelises but offers no escape hatch from server-side synthesis; skill-only fan-outs assembled by the LLM via bash are brittle (token traps, key handling, endpoint drift).
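To make the latency point concrete, here is a minimal sketch comparing a serial loop with a parallel fan-out. It does not use consult's runner: fake_model_call, the model names, and the latencies are all invented for illustration, with asyncio.sleep standing in for a provider round-trip.

```python
import asyncio
import time

# Hypothetical stand-in for one provider call; the sleep simulates network latency.
async def fake_model_call(name: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)
    return f"{name}: answer"

async def serial(models: dict[str, float]) -> list[str]:
    # One call after another: total time is roughly the SUM of the latencies.
    return [await fake_model_call(name, s) for name, s in models.items()]

async def parallel(models: dict[str, float]) -> list[str]:
    # All calls in flight at once: total time is roughly the SLOWEST latency.
    return await asyncio.gather(*(fake_model_call(name, s) for name, s in models.items()))

def timed(coro) -> tuple[list[str], float]:
    # Run a coroutine to completion and report wall-clock seconds.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

MODELS = {"model-a": 0.05, "model-b": 0.10, "model-c": 0.15}  # made-up latencies

if __name__ == "__main__":
    _, t_serial = timed(serial(MODELS))
    _, t_parallel = timed(parallel(MODELS))
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s")
```

With these made-up numbers the serial path waits about 0.30 s while the parallel path waits about 0.15 s, and the gap widens as the panel grows.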

Install

Claude Desktop

Claude Desktop does not inherit your shell's PATH or environment variables, so you must give it the absolute path to the launcher (uvx in the example below, which then runs consult-mcp) and declare API keys inside the env block.

Tip: run consult-doctor --config after install to print a ready-to-paste JSON block populated with the absolute binary path and whichever keys are present in your shell environment.

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…",
        "GEMINI_API_KEY": "AIza…",
        "OPENROUTER_API_KEY": "sk-or-…"
      }
    }
  }
}

Restart Claude Desktop, then ask: "use the consult tool to ask 3 models which Python package manager I should use."
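If you would rather generate that block than hand-edit it, the following is a rough sketch of the idea. It is not how consult-doctor --config is implemented; build_client_config and KEY_NAMES are names invented here, and the only things taken from this README are the JSON shape and the four key names.

```python
import json
import os
import shutil

# The four key names used in the example config above.
KEY_NAMES = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY"]

def build_client_config(command: str, environ: dict[str, str]) -> dict:
    """Build an mcpServers block; only keys that are set and non-empty are included."""
    env = {name: environ[name] for name in KEY_NAMES if environ.get(name)}
    return {
        "mcpServers": {
            "consult": {
                "command": command,
                "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
                "env": env,
            }
        }
    }

if __name__ == "__main__":
    uvx = shutil.which("uvx")  # look uvx up on the current shell PATH
    if uvx is None:
        raise SystemExit("uvx not found on PATH")
    # Note: this prints real key values to stdout, so don't run it in a shared terminal log.
    print(json.dumps(build_client_config(os.path.abspath(uvx), dict(os.environ)), indent=2))
```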

Cursor

Edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "consult": {
      "command": "/Users/you/.local/bin/uvx",
      "args": ["--from", "consult-mcp-server[mcp]", "consult-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-…",
        "OPENAI_API_KEY": "sk-…"
      }
    }
  }
}

Same caveat as Claude Desktop: absolute path to uvx, env keys in the block.

Claude Code CLI

claude mcp add consult -- uvx --from "consult-mcp-server[mcp]" consult-mcp

The CLI inherits your shell env, so the keys you already have in .env / your shell rc will be visible.

Docker

docker run -i --rm \
  -e ANTHROPIC_API_KEY -e OPENAI_API_KEY -e GEMINI_API_KEY -e OPENROUTER_API_KEY \
  -v ~/.consult:/root/.consult \
  ghcr.io/irwin-r/consult-mcp-server:latest

Stdio in / stdio out, just like the local binary. Image published per release to GHCR (multi-stage Python 3.12-slim base, ~150MB).

Smithery

https://smithery.ai/server/consult-mcp-server

Smithery's hosted UI prompts for keys; the same smithery.yaml config-schema applies.

From source (development)

git clone https://github.com/irwin-r/consult-mcp-server
cd consult-mcp-server
uv venv
uv pip install -e ".[dev]"
cp .env.example .env   # fill in keys
uv run pytest -v

Verify the install

consult-doctor          # offline: config + paths + key presence
consult-doctor --ping   # also fires a 1-token call per provider (~$0.0001)
consult-doctor --config # print copy-paste-ready MCP client JSON

The five tools

Tool	What it does	Use when
`consult`	Parallel panel + server-side synthesis. Hero.	"Just give me the answer."
`panel`	Parallel panel, returns raw manifest (no synth).	You want to synthesise yourself.
`refine`	Iterative consortium with arbiter scoring (≤3 rounds).	High-stakes; disagreement-heavy.
`sequence`	Chained multi-step where step N depends on N-1.	Decompose-then-answer; plan-then-execute.
`synthesise`	Re-collapse an existing run via a flagship model.	Different rubric/synthesiser on a prior `run_id`.

Tool descriptions are intentionally written as prompts for the calling agent (verb-first, explicit "use when…/don't use for…") so the agent reliably picks the right one without you having to spell it out.

Tiers & cost

Aliases are <family>-<tier> — version-neutral. The registry maps each alias to the current best model; the resolved LiteLLM ID is captured per run in registry_snapshot.json for reproducibility.

Tier	Models	Typical run cost	Use
`nano` (3)	claude-haiku, gemini-flash, gpt-nano	< $0.01	smoke tests / trivia
`quick` (5)	claude-haiku, gemini-pro, grok, qwen-max, kimi	~$0.05	snap second opinions
`standard` (10)	opus, sonnet, gpt-pro, gpt, gemini-pro, grok, qwen-max, kimi, glm, llama	$0.30–0.60	normal decisions
`wide` (10)	as standard, openrouter-routed where possible	$0.20–0.50	maximum diversity
`deep` (14)	standard + mistral, deepseek, mimo, sonar-pro	$0.50–1.00	high-stakes, includes web search
`code` (5)	opus, gpt-codex, gpt-mini, gemini-pro, deepseek	$0.20–0.40	code-heavy questions
`review` (6)	opus, gpt-codex, gpt-pro, gemini-pro, deepseek, grok	$0.30–0.60	PR / code review

A per-run cap (max_run_usd, default $5.00) refuses panels whose estimated cost exceeds the limit before any provider is called.
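The pre-flight check can be pictured roughly as follows. This is a sketch under stated assumptions, not the engine's code: Estimate, BudgetExceeded, check_budget, and the per-1k-token prices are hypothetical, and only the max_run_usd name and its $5.00 default come from this README.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-panellist cost estimate."""
    model: str
    input_tokens: int
    max_output_tokens: int
    usd_per_1k_input: float
    usd_per_1k_output: float

    @property
    def usd(self) -> float:
        # Worst case: assume the model uses its whole output budget.
        return (
            self.input_tokens / 1000 * self.usd_per_1k_input
            + self.max_output_tokens / 1000 * self.usd_per_1k_output
        )

class BudgetExceeded(RuntimeError):
    """Raised before any provider is called."""

def check_budget(estimates: list[Estimate], max_run_usd: float = 5.00) -> float:
    # Sum the estimates first; refuse the whole panel if the total is over the cap.
    total = sum(e.usd for e in estimates)
    if total > max_run_usd:
        raise BudgetExceeded(f"estimated ${total:.2f} exceeds cap ${max_run_usd:.2f}")
    return total
```

The point of checking before the fan-out is that a refused panel costs nothing, whereas a cap enforced mid-run would still pay for the calls already in flight.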

The manifest capsule

Each panellist returns a ~200-token structured extract (decision shape shown below; review and research kinds also supported):

{
  "slug": "claude-opus-1",
  "model_id": "anthropic/claude-opus-4-7",
  "status": "OK",
  "capsule": {
    "kind": "decision",
    "position": "supports B with caveats",
    "recommendation": "Use B with fallback to A",
    "key_points": ["…"],
    "unique_claims": ["Only model to flag cold-start regression"],
    "caveats": ["Assumes >100 RPS steady-state"],
    "confidence": 0.85
  },
  "resource_uri": "consult://runs/abc/responses/claude-opus-1",
  "latency_ms": 3420,
  "cost_usd": 0.04
}

Your agent can synthesise from this alone in most cases. Read the full body via the resource URI only when depth is needed.
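As an illustration of capsule-first triage, the sketch below picks which full bodies look worth fetching. The heuristic (low confidence or any unique_claims) and the function name uris_worth_reading are assumptions made for this example; the field names follow the capsule shown above, and any status other than "OK" is simply skipped.

```python
def uris_worth_reading(manifest: list[dict], min_confidence: float = 0.7) -> list[str]:
    """Return resource URIs for panellists whose capsule suggests reading the full body."""
    uris = []
    for entry in manifest:
        if entry.get("status") != "OK":
            continue  # failed panellists have nothing useful to fetch
        capsule = entry.get("capsule") or {}
        low_confidence = capsule.get("confidence", 0.0) < min_confidence
        has_unique_claims = bool(capsule.get("unique_claims"))
        if low_confidence or has_unique_claims:
            uris.append(entry["resource_uri"])
    return uris

# Example manifest; the second and third entries are invented for illustration.
manifest = [
    {
        "slug": "claude-opus-1",
        "status": "OK",
        "capsule": {"confidence": 0.85, "unique_claims": ["Only model to flag cold-start regression"]},
        "resource_uri": "consult://runs/abc/responses/claude-opus-1",
    },
    {
        "slug": "gpt-pro-1",
        "status": "OK",
        "capsule": {"confidence": 0.9, "unique_claims": []},
        "resource_uri": "consult://runs/abc/responses/gpt-pro-1",
    },
    {
        "slug": "grok-1",
        "status": "ERROR",  # hypothetical non-OK status value
        "capsule": None,
        "resource_uri": "consult://runs/abc/responses/grok-1",
    },
]

if __name__ == "__main__":
    print(uris_worth_reading(manifest))
```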

Quickstart

After installing, from any connected agent:

> consult: prompt="Polars vs DuckDB for a 10GB Parquet timeseries?", tier="code"

Returns {run_id, synthesis, manifest, cost_usd, synthesiser}. The synthesis is markdown, ready to drop into your conversation.

For iterative consensus:

> refine:
    prompt="Should we migrate from REST to gRPC for the internal mesh?",
    models=[{model:"claude-opus"},{model:"gpt-pro"},{model:"gemini-pro"},{model:"deepseek"}],
    threshold=0.85
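The control flow behind refine can be sketched as a bounded loop. This README does not describe how the arbiter scores a round or what it feeds back, so ask_panel, score_agreement, and the feedback format below are placeholders; the only details taken from the text are the threshold parameter and the three-round maximum.

```python
from typing import Callable

def refine_loop(
    ask_panel: Callable[[str], list[str]],
    score_agreement: Callable[[list[str]], float],
    prompt: str,
    threshold: float = 0.85,
    max_rounds: int = 3,
) -> tuple[int, float, list[str]]:
    """Re-ask the panel until the arbiter score reaches threshold or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    answers: list[str] = []
    score = 0.0
    for round_no in range(1, max_rounds + 1):
        answers = ask_panel(prompt)
        score = score_agreement(answers)
        if score >= threshold:
            break  # assumed stopping rule: consensus is good enough
        # Assumed feedback format: show the panel its previous answers.
        prompt = f"{prompt}\n\nPrevious round answers:\n" + "\n".join(answers)
    return round_no, score, answers
```

The hard round limit matters for cost: a disagreement-heavy panel would otherwise keep spending without converging.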

For chained reasoning:

> sequence:
    prompts=[
      "Decompose 'how should we scale our event pipeline?' into 4 sub-questions",
      "Answer sub-question 1: throughput requirements",
      "Answer sub-question 2: ordering guarantees",
      "Synthesise the final recommendation across the prior steps"
    ],
    models=[{model:"claude-opus"},{model:"gpt-pro"}]
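A chained run can be pictured as a loop where each step sees what came before. The sketch below assumes every step receives all prior outputs as plain-text context; the real sequence tool may pass context differently, and run_sequence and ask are names invented for this example.

```python
from typing import Callable

def run_sequence(prompts: list[str], ask: Callable[[str], str]) -> list[str]:
    """Run prompts in order, prepending earlier outputs to each later prompt."""
    outputs: list[str] = []
    for prompt in prompts:
        # Assumed context format: numbered outputs of all earlier steps.
        context = "\n\n".join(
            f"Step {i} output:\n{out}" for i, out in enumerate(outputs, start=1)
        )
        full_prompt = f"{context}\n\n{prompt}" if context else prompt
        outputs.append(ask(full_prompt))
    return outputs
```

Unlike a panel, nothing here can run in parallel across steps, so latency is the sum of the steps.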

End-to-end walkthrough

A full tour. Assumes the install above and at least one provider key in .env.

1. Smoke-test the install (no API spend)

.venv/bin/python -c "
import asyncio
from consult import panel, ModelSpec
async def go():
    h = await panel('hello', [ModelSpec(model='claude-haiku')], dry_run=True)
    print('partial:', h.partial, '| reason:', h.partial_reason)
asyncio.run(go())
"
# partial: True | reason: dry_run: estimated cost $0.0001

2. First real consult (~$0.20 on the `code` tier)

> consult: prompt="Polars vs DuckDB for 10GB Parquet timeseries?", tier=code

3. Inspect a panellist's full body

> read resource: consult://runs/<run_id>/responses/claude-opus-1

4. Tail progress in real time

tail -f ~/.consult/runs/<run_id>/_progress.log

Agents that send a progressToken get the same events as notifications/progress.

5. Follow-up via `continuation_id`

> refine: prompt="OK now what about Iceberg vs Delta on top of that?",
          continuation_id="<prior run_id>",
          models=[{model:"claude-opus"},{model:"deepseek"}]

The prior run's synthesis is prepended as "Prior consultation summary".

6. Stochastic averaging with `model:N`

> panel: models=[{model:"claude-haiku:3"},{model:"gpt-mini:3"}], prompt="…"

Six panellists total — three runs each of two cheap models.
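The expansion can be sketched as below. The slug format (alias, dash, run number) is inferred from the claude-opus-1 slug in the capsule example and may not match the engine exactly; expand_specs is a hypothetical helper, not part of the consult API.

```python
def expand_specs(specs: list[str]) -> list[str]:
    """Expand 'alias:N' into N panellist slugs; a bare alias means one run."""
    slugs: list[str] = []
    for spec in specs:
        alias, sep, count = spec.rpartition(":")
        if not sep:
            alias, n = spec, 1  # no ':N' suffix
        else:
            n = int(count)
        if n < 1:
            raise ValueError(f"run count must be >= 1 in {spec!r}")
        slugs.extend(f"{alias}-{i}" for i in range(1, n + 1))
    return slugs

if __name__ == "__main__":
    print(expand_specs(["claude-haiku:3", "gpt-mini:3"]))
```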

7. Check today's spend

consult-ledger today
# {"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[…]}

total_known: false means at least one panellist had pricing missing from the LiteLLM table.
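A script consuming that output might handle the flag as sketched below. It assumes the ledger prints a single JSON object with the fields shown above, and that an unknown total is a lower bound because unpriced panellists are left out of total_usd; summarise_ledger is a name invented here.

```python
import json

def summarise_ledger(raw: str) -> str:
    """Turn the ledger's JSON output into a one-line summary."""
    data = json.loads(raw)
    total = data["total_usd"]
    if data.get("total_known", True):
        return f"{data['date']}: ${total:.2f}"
    # Assumption: unpriced panellists are excluded, so the figure is a floor.
    return f"{data['date']}: at least ${total:.2f} (some panellists had no pricing data)"

if __name__ == "__main__":
    sample = '{"date":"2026-05-21","total_usd":2.36,"total_known":false,"runs":[]}'
    print(summarise_ledger(sample))
```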

8. View a run as a rich HTML page

consult-view <run_id>          # writes ~/.consult/runs/<run_id>/feed.html
consult-view <run_id> --open   # also opens in default browser

Self-contained HTML — header pills, prompt, synthesis (markdown), per-round arbiter verdicts (refine), per-panellist cards with capsule + full body, and a chronological timeline from _progress.log. No external assets, no JS.

Driving the engine without MCP

The engine package (consult.*) is MCP-free and reusable as a library:

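import asyncio

from consult import consult, panel, refine, ModelSpec


async def main() -> None:
    # Hero tool: parallel panel plus server-side synthesis
    result = await consult("question?", tier="standard")
    print(result.synthesis, result.cost_usd)

    # Lower-level: raw panel, no synthesis
    handle = await panel("question?", [ModelSpec(model="claude-opus"), ModelSpec(model="gpt-pro")])

    # Iterative: arbiter-scored rounds until the threshold is met
    verdict = await refine(
        "tough decision?",
        [ModelSpec(model="claude-opus"), ModelSpec(model="deepseek")],
        threshold=0.85,
    )


# The engine functions are coroutines, so a script needs an event loop.
asyncio.run(main())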

Swap the URI scheme for a non-MCP transport:

from consult import artifacts
artifacts.set_resource_uri_formatter(
    lambda run_id, slug: f"https://api.example.com/runs/{run_id}/{slug}"
)

Security

Read SECURITY.md for the full threat model. Short version:

File attachments and git_diff must resolve under CONSULT_TRUSTED_REPO_ROOTS (defaults to CWD). Symlinks resolved with strict=True; escape attempts fail closed.
Run artefacts are chmod 0o700 — per-run prompts (often containing pasted credentials or code) are not world-readable on shared hosts.
git diff runs with global/system git config neutralised so a malicious .gitattributes filter can't execute.
LiteLLM exception strings are scrubbed for sk-…, AIza…, Bearer …, x-api-key: and similar before anything hits disk or the manifest.
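Two of the measures above, trusted-root enforcement and secret scrubbing, can be sketched in a few lines. This is an illustration of the general techniques, not the code in attachments.py or the engine's scrubber: resolve_under_roots, scrub, and the regular expressions are assumptions written for this example.

```python
import re
from pathlib import Path

def resolve_under_roots(candidate: str, trusted_roots: list[Path]) -> Path:
    """Resolve symlinks, then require the real path to sit under a trusted root."""
    # strict=True raises FileNotFoundError for missing paths, so the check fails closed.
    resolved = Path(candidate).resolve(strict=True)
    for root in trusted_roots:
        if resolved.is_relative_to(root.resolve(strict=True)):
            return resolved
    raise PermissionError(f"{candidate} is outside the trusted roots")

# Illustrative patterns only; a real scrubber would need a broader, tested list.
_SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_\-]{8,}"),
    re.compile(r"\bAIza[A-Za-z0-9_\-]{8,}"),
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._\-]+"),
    re.compile(r"(?i)(x-api-key:\s*)\S+"),
]

def scrub(text: str) -> str:
    """Replace anything that looks like a credential with [REDACTED]."""
    for pattern in _SECRET_PATTERNS:
        # Keep a captured prefix such as 'Bearer ' and redact only the secret part.
        text = pattern.sub(lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]", text)
    return text
```

Resolving before comparing is what defeats a symlink that lives inside the repo but points outside it; comparing unresolved paths would let it through.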

Privacy note

The model registry tags each entry with a privacy_tier:

first_party — direct API to Anthropic / OpenAI / Google.
aggregator — routed via OpenRouter (Grok, Kimi, Qwen, DeepSeek, Llama, Mistral, GLM, MiMo, Sonar-Pro).

Mixing tiers in one panel broadcasts the same prompt to providers with different data-retention policies. For prompts containing sensitive material, prefer tier="standard" (mostly first-party) over tier="wide" or tier="deep" (heavily aggregator-routed).
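A pre-flight check for sensitive prompts might look like the sketch below. The PRIVACY_TIERS dict is a hand-written excerpt based on the two lists above, not the contents of models.json, and aggregator_models is a hypothetical helper; unknown aliases are treated as aggregator-routed so the check errs on the cautious side.

```python
# Hypothetical excerpt; the real registry lives in config/models.json.
PRIVACY_TIERS = {
    "claude-opus": "first_party",
    "gpt-pro": "first_party",
    "gemini-pro": "first_party",
    "grok": "aggregator",
    "deepseek": "aggregator",
}

def aggregator_models(panel: list[str], tiers: dict[str, str] = PRIVACY_TIERS) -> list[str]:
    """Return the panel members that are not known to be first-party."""
    return [alias for alias in panel if tiers.get(alias, "aggregator") != "first_party"]

if __name__ == "__main__":
    routed = aggregator_models(["claude-opus", "grok", "deepseek"])
    if routed:
        print("aggregator-routed panellists:", ", ".join(routed))
```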

Repo layout

consult/                # ENGINE — no mcp.* imports
  runner.py             # async fanout + LiteLLM + progress log
  capsule.py            # post-fanout structured extraction
  synth.py              # flagship synthesiser
  refine.py             # arbiter-driven loop (max 3 rounds) + continuation
  sequence.py           # chained multi-step
  orchestrate.py        # consult() hero
  ledger.py             # daily cost ledger (consult-ledger)
  viewer.py             # static HTML run renderer (consult-view)
  doctor.py             # diagnostic CLI (consult-doctor)
  registry.py           # models.json + stances.json loader
  artifacts.py          # ~/.consult/runs/<id>/ layout + URI formatter
  attachments.py        # file/diff inlining + trusted-roots enforcement
  sources.py            # git_diff resolver (hardened subprocess)
  context.py            # per-run bundle + blinding
  progress.py           # typed ProgressEvent union
  status.py             # LiteLLM response → Status
  types.py              # Pydantic models (StrictModel base)
  mcp/                  # MCP ADAPTER — only thing that imports mcp.*
    server.py, handlers.py, schemas.py, errors.py, __main__.py
  config/
    models.json         # registry with privacy_tier annotations
    stances.json        # persona prompts
tests/                  # pytest (offline + live, gated on keys)
.github/workflows/      # CI: ruff + pytest on Py 3.11/3.12/3.13
FRICTION.md             # internal dogfooding log (kept for transparency)
SECURITY.md             # threat model + disclosure path
CONTRIBUTING.md         # dev setup + style

Contributing

See CONTRIBUTING.md. Issues and PRs welcome; please open an issue first for non-trivial changes so we can agree on shape.

License

MIT. See LICENSE.

Project details

This version

0.2.0

Jun 1, 2026

Download files

Source Distribution

consult_mcp_server-0.2.0.tar.gz (402.4 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

consult_mcp_server-0.2.0-py3-none-any.whl (180.7 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file consult_mcp_server-0.2.0.tar.gz.

File metadata

Download URL: consult_mcp_server-0.2.0.tar.gz
Upload date: Jun 1, 2026
Size: 402.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27

File hashes

Hashes for consult_mcp_server-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`2b202439f6f1a2cae65bdf0cc5d7bbba823fc68aa007b612e23eae829be6f2be`
MD5	`1ba9804b9ea935891ea17242b0602d5e`
BLAKE2b-256	`be184e8b52f761b0757d0c7a92befd0a530bdd142f08272290144f4110a256a6`

File details

Details for the file consult_mcp_server-0.2.0-py3-none-any.whl.

File metadata

Download URL: consult_mcp_server-0.2.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 180.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.27

File hashes

Hashes for consult_mcp_server-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e6154ebb839d5589ed2199a7f6d121e3c4638ce0a1a9e0cc480005aea65f5afa`
MD5	`dbba5c97f9acdf2aaaf511a3d24855e6`
BLAKE2b-256	`ba827cd88d13f17d9ceb4f50c36db9cefec7e109b7b2af758d1e64636402b9cc`
