An offline Bayesian confidence gate for MCP tool calls — PROCEED / FLAG / BLOCK with calibrated, per-model confidence. Pure stdlib, BYO-LLM.

bayesian-cage

An offline Bayesian confidence gate for MCP tool calls — and any LLM output. It scores how much to trust an output and gates it PROCEED / FLAG / BLOCK with a calibrated, per-model confidence that sharpens over time from local memory. Pure stdlib. Bring your own LLM. Your data never leaves the machine.

Install

Requires Python 3.10+. Zero runtime dependencies.

# from PyPI
pip install bayesian-cage

# from source — use a venv; macOS's default python3 is often 3.9
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e .

Verify your install (30 seconds, offline)

python -m bayesian_cage.eval.sqlbench.run --model stub

Runs a 55-task SQL calibration benchmark against a real SQLite database with a built-in stub model. No network, no LLM, no API keys. Prints a reliability table comparing the cage to raw model confidence; writes dataset.jsonl, results.json, and a reliability.svg to bench_out/. If this finishes, the package works end-to-end.

Quickstart (library)

from bayesian_cage import Kernel

k = Kernel(db_path="~/.bayescore/bayescore.db")
v = k.check("SELECT 1", {"expected": "select 1"}, model_id="phi3", task="sql")
print(v.decision, v.p)            # PROCEED 1.0
k.observe("phi3", "sql", correct=True, observation_id=v.observation_id)

Gate an MCP server

The cage is itself an MCP server. It speaks JSON-RPC over stdio to your host (Claude Desktop, Cursor, Claude Code, …) and spawns the real server as a subprocess via BAYESIAN_CAGE_DOWNSTREAM. It transparently forwards initialize / tools/list / tools/call; every tools/call result is graded by a Verifier and stamped with a _bayescore envelope. Under enforce, BLOCK verdicts withhold the result and return isError: true.
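As a rough illustration of the stamping and enforce behaviour described above, the sketch below attaches an envelope to a tools/call result. Only the `_bayescore` key and `isError: true` come from this README; the function name `stamp_result`, the envelope fields, and the block message are assumptions made for the example, not the package's actual internals.

```python
from copy import deepcopy


def stamp_result(result: dict, decision: str, p: float,
                 observation_id: str, mode: str = "advisory") -> dict:
    """Attach a hypothetical _bayescore envelope to a tools/call result."""
    # Envelope field names are illustrative assumptions.
    envelope = {"decision": decision, "p": p, "observation_id": observation_id}
    if mode == "enforce" and decision == "BLOCK":
        # Withhold the downstream payload and signal an error to the host.
        return {
            "content": [{"type": "text", "text": "Blocked by bayesian-cage."}],
            "isError": True,
            "_bayescore": envelope,
        }
    # Advisory mode (or a non-BLOCK verdict): forward the result, labelled.
    stamped = deepcopy(result)
    stamped["_bayescore"] = envelope
    return stamped
```

In advisory mode the same BLOCK verdict would still forward the original content, with only the label attached.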

Install a stable binary

MCP-host integration needs bayesian-cage on a path the host can spawn without your shell PATH. pipx is the cleanest route:

brew install pipx && pipx ensurepath
pipx install bayesian-cage     # or:  pipx install .  from a clone
which bayesian-cage            # copy this absolute path

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) and add an mcpServers block. Use absolute paths — Claude Desktop spawns subprocesses without your shell's PATH:

{
  "mcpServers": {
    "filesystem-gated": {
      "command": "/Users/you/.local/bin/bayesian-cage",
      "env": {
        "BAYESIAN_CAGE_DOWNSTREAM": "/usr/local/bin/npx -y @modelcontextprotocol/server-filesystem /Users/you/Documents/Claude",
        "BAYESIAN_CAGE_MODE": "advisory",
        "BAYESIAN_CAGE_VERIFIER": "filesystem",
        "BAYESIAN_CAGE_MODEL": "fs-server"
      }
    }
  }
}

Quit and relaunch Claude Desktop (Cmd+Q, not just close the window). The filesystem-gated server appears in the tools picker; every tool result now carries a verdict. Same shape works for Cursor (~/.cursor/mcp.json) and any host that takes a stdio MCP server command.
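If you manage several gated servers, it may be easier to generate the block than to hand-edit it. The helper below is a small sketch: `gated_server_config` is a hypothetical name, not part of the package, and the example paths are the placeholders used above. It only builds the JSON; merging it into an existing config file is left to you.

```python
import json
import os


def gated_server_config(name, cage_bin, downstream_cmd,
                        verifier="heuristic", mode="advisory",
                        model="fs-server"):
    """Build an mcpServers entry for one gated downstream server."""
    # Hosts spawn the command without the shell PATH, so insist on an
    # absolute path for the cage binary.
    if not os.path.isabs(cage_bin):
        raise ValueError(f"expected an absolute path, got {cage_bin!r}")
    return {
        "mcpServers": {
            name: {
                "command": cage_bin,
                "env": {
                    "BAYESIAN_CAGE_DOWNSTREAM": downstream_cmd,
                    "BAYESIAN_CAGE_MODE": mode,
                    "BAYESIAN_CAGE_VERIFIER": verifier,
                    "BAYESIAN_CAGE_MODEL": model,
                },
            }
        }
    }


config = gated_server_config(
    "filesystem-gated",
    "/Users/you/.local/bin/bayesian-cage",
    "/usr/local/bin/npx -y @modelcontextprotocol/server-filesystem "
    "/Users/you/Documents/Claude",
    verifier="filesystem",
)
print(json.dumps(config, indent=2))
```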

One-command sanity check (no host required)

BAYESIAN_CAGE_DOWNSTREAM="npx -y @modelcontextprotocol/server-everything" \
BAYESIAN_CAGE_MODE=enforce bayesian-cage --selftest

Runs a canned initialize / tools/list / tools/call sequence against the configured downstream and prints the gated responses.
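The exact messages --selftest sends are not documented here. For orientation, the sketch below builds the general shape of such a sequence as newline-delimited JSON-RPC 2.0 messages, which is how MCP's stdio transport frames them. The function name `canned_sequence`, the client name, the protocol version string, and the `echo` tool call are assumptions for illustration.

```python
import json


def canned_sequence(tool_name: str, arguments: dict) -> list[str]:
    """Return one JSON line per message: initialize, tools/list, tools/call."""
    messages = [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
            "protocolVersion": "2024-11-05",  # assumed; use what your host speaks
            "capabilities": {},
            "clientInfo": {"name": "selftest-sketch", "version": "0.0.0"},
        }},
        # Notifications carry no id; this one completes the MCP handshake.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": tool_name, "arguments": arguments}},
    ]
    # json.dumps emits no raw newlines by default, so each message is one line.
    return [json.dumps(m) for m in messages]


for line in canned_sequence("echo", {"message": "hi"}):
    print(line)
```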

Knobs

  • BAYESIAN_CAGE_MODE: advisory (default; never blocks, just labels) · enforce (BLOCK halts) · iterate (retry instead of stop). Start advisory; switch to enforce once you trust the verifier.
  • BAYESIAN_CAGE_VERIFIER: heuristic (default) · sql · json · filesystem · ensemble:heuristic+sql · your.pkg.module:CustomVerifier
  • BAYESIAN_CAGE_MODEL: the bucket beliefs are keyed by. Use one per downstream (fs-server, pg-server); the LLM driving the host is not what's being scored, the server is.
  • BAYESIAN_CAGE_DB: SQLite belief store path (default ~/.bayescore/bayescore.db)

Closing the loop

Without outcome feedback, beliefs sit at the Beta(1,1) prior and p just equals the raw verifier signal — you get a useful day-one gate, but no learning. To sharpen calibration over time, feed real outcomes back from whatever ground truth you already have (test pass/fail, downstream success, manual labels):

bayesian-cage observe --model fs-server --task list_directory \
  --correct true --id <observation_id>

The observation_id is in the _bayescore envelope on each gated result. After a few dozen labeled outcomes the per-(model, task, signal-bin) Beta sharpens and the same raw signal earns a different p. bayesian-cage export --out labels.jsonl dumps the full observation log.
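The bookkeeping behind "the Beta sharpens" can be sketched with a few lines of stdlib Python. This is an in-memory stand-in under stated assumptions: `BetaStore`, `signal_bin`, and the ten-bin split are hypothetical names and choices, and how the cage blends the posterior with the raw verifier signal is not described in this README, so that step is left out.

```python
def signal_bin(signal: float, n_bins: int = 10) -> int:
    """Map a raw verifier signal in [0, 1] to a bin index (assumed binning)."""
    return min(int(signal * n_bins), n_bins - 1)


class BetaStore:
    """Hypothetical in-memory stand-in for the SQLite belief store."""

    def __init__(self):
        self._counts = {}  # (model, task, bin) -> (alpha, beta)

    def _get(self, key):
        # Buckets with no observations sit at the uniform Beta(1, 1) prior.
        return self._counts.get(key, (1.0, 1.0))

    def observe(self, model, task, bin_index, correct):
        key = (model, task, bin_index)
        alpha, beta = self._get(key)
        self._counts[key] = (alpha + 1.0, beta) if correct else (alpha, beta + 1.0)

    def mean(self, model, task, bin_index):
        alpha, beta = self._get((model, task, bin_index))
        return alpha / (alpha + beta)


store = BetaStore()
b = signal_bin(0.95)
for outcome in [True] * 8 + [False] * 2:
    store.observe("fs-server", "list_directory", b, outcome)
print(store.mean("fs-server", "list_directory", b))  # 0.75, from Beta(9, 3)
```

Eight correct and two wrong outcomes move the bucket from Beta(1, 1) to Beta(9, 3), so a signal that looked like 0.95 on day one is backed by a posterior mean of 0.75 in that bin.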

How it works

verify → calibrate → gate → observe. A pluggable Verifier scores the output; a per-(model, task, signal-bin) Beta belief in one local SQLite store calibrates that score into a probability; the gate decides by an explicit cost ratio; observed outcomes update the belief. Calibration is learned per model, so swapping the LLM starts a separate calibration instead of mixing one model's track record into another's.
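One way to read "decides by an explicit cost ratio" is as an expected-cost comparison. The sketch below assumes that passing a wrong output costs `cost_ratio` units and blocking a correct one costs 1 unit, which makes proceeding the cheaper choice when p >= cost_ratio / (1 + cost_ratio). The function name `gate`, the default ratio, and the FLAG band just below the threshold are illustrative assumptions, not the package's actual rule.

```python
def gate(p: float, cost_ratio: float = 3.0, flag_margin: float = 0.15) -> str:
    """Return PROCEED, FLAG, or BLOCK for a calibrated probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Expected cost of proceeding: (1 - p) * cost_ratio.
    # Expected cost of blocking:   p * 1.
    # Proceeding is no worse when p >= cost_ratio / (1 + cost_ratio).
    threshold = cost_ratio / (1.0 + cost_ratio)
    if p >= threshold:
        return "PROCEED"
    # Hypothetical grey zone: close to the threshold, label instead of block.
    if p >= threshold - flag_margin:
        return "FLAG"
    return "BLOCK"


for p in (0.9, 0.7, 0.3):
    print(p, gate(p))
```

With the default ratio of 3 the threshold is 0.75, so 0.9 proceeds, 0.7 is flagged, and 0.3 is blocked. Raising the ratio makes wrong passes more expensive and pushes the threshold toward 1.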

Verifiers

  • HeuristicVerifier — default, offline. Catches refusals, non-answers, error markers, placeholders, degenerate repetition, truncation, heavy hedging. Pass expected or reference for a grounded signal.
  • SqlVerifier — swallowed SQL errors, schema typos, broken-empty vs legitimately-empty results, all-NULL rows.
  • JsonVerifier — malformed/valid JSON and a required key list for structured outputs.
  • FilesystemVerifier — POSIX errno markers and access-denied envelopes block; file listings and plain content pass.
  • EnsembleVerifier — composes verifiers and gates on the strictest (minimum) signal.

Add your own by implementing Verifier.verify(output, context) -> VerifierResult.
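A custom verifier might look roughly like the sketch below. To stay self-contained it defines stand-ins rather than importing from the package: the `VerifierResult` fields (`signal`, `reason`), the `Verifier` protocol, both example verifiers, and the `strictest` helper are all assumptions, so check the real class definitions before adapting it. The helper mirrors the minimum-signal composition described for EnsembleVerifier.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Protocol, Sequence


@dataclass(frozen=True)
class VerifierResult:  # stand-in; the package's real class may differ
    signal: float      # 0.0 (distrust) to 1.0 (trust)
    reason: str = ""


class Verifier(Protocol):
    def verify(self, output: str, context: Mapping[str, Any]) -> VerifierResult: ...


class NonEmptyVerifier:
    """Distrust outputs that are empty or whitespace only."""

    def verify(self, output: str, context: Mapping[str, Any]) -> VerifierResult:
        if not output.strip():
            return VerifierResult(0.0, "empty output")
        return VerifierResult(1.0, "non-empty")


class MaxLengthVerifier:
    """Distrust outputs longer than a limit read from context (default 2000)."""

    def verify(self, output: str, context: Mapping[str, Any]) -> VerifierResult:
        limit = int(context.get("max_chars", 2000))
        if len(output) > limit:
            return VerifierResult(0.2, f"longer than {limit} chars")
        return VerifierResult(1.0, "within limit")


def strictest(verifiers: Sequence[Verifier], output: str,
              context: Mapping[str, Any]) -> VerifierResult:
    """Run every verifier and keep the lowest (strictest) signal."""
    results = [v.verify(output, context) for v in verifiers]
    return min(results, key=lambda r: r.signal)


print(strictest([NonEmptyVerifier(), MaxLengthVerifier()], "x" * 50, {"max_chars": 10}))
```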

Calibration benchmark — why not just ask the model?

Because a model's self-reported confidence is often miscalibrated. Concretely, on a 55-task, execution-graded text-to-SQL benchmark against phi3 (via Ollama, 5-fold, seed=7, accuracy 67.3%):

| metric | raw phi3 | cage | direction |
|---|---|---|---|
| ECE | 0.325 | 0.081 | lower is better (4× tighter) |
| Brier | 0.322 | 0.174 | lower is better (46% lower) |
| catch-rate | 0.000 | 0.333 | higher is better |
| wrong-passed | 18 | 12 | lower is better |
| over-block | 0.000 | 0.000 | lower is better |
| AUROC | 0.583 | 0.544 | higher is better |

Phi3's raw confidence isn't discriminative — every one of the 55 answers came back at ~1.0, so AUROC stays near chance whether you ask the model or the cage. What the cage does is stop the model from lying to you about that confidence: calibration error drops 4×, and a third of phi3's wrong answers now get caught instead of passed through, at the cost of zero correct answers blocked.
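To see where numbers of this size come from, the sketch below computes the Brier score and expected calibration error (ECE) for an idealized version of the raw-phi3 column: 55 answers, 37 correct (67.3%), every confidence exactly 1.0. The equal-width ten-bin ECE is an assumption; the benchmark's own binning is not stated here.

```python
def brier(confidences, labels):
    """Mean squared gap between confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confidences, labels)) / len(labels)


def ece(confidences, labels, n_bins=10):
    """Expected calibration error over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        # A confidence of exactly 1.0 falls into the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    total = len(labels)
    error = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(avg_conf - accuracy)
    return error


confidences = [1.0] * 55
labels = [1] * 37 + [0] * 18  # 37 of 55 correct, 18 wrong
print(round(brier(confidences, labels), 3))  # 0.327
print(round(ece(confidences, labels), 3))    # 0.327
```

Both come out at about 0.327, close to the reported 0.325 and 0.322; the small gap is consistent with the real confidences being near 1.0 rather than exactly 1.0.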

Reproduce:

ollama pull phi3
python -m bayesian_cage.eval.sqlbench.run --model phi3 --seed 7 --out bench_out

Correctness is labeled by executing the model's SQL against a real SQLite database and comparing to a gold query — nothing hand-labeled. Each example is held out exactly once and scored by a cage that has already observed the other folds; the published number reflects learned calibration, not just the raw verifier signal. Artifacts (reliability diagram, metrics, labeled dataset) land in bench_out/. The same rig runs offline against a stub model (--model stub) for CI and install verification.
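The held-out scheme can be sketched as cross-fitting: each example is scored by state fitted only on the other folds. The helper names, the fold assignment, and the toy `fit` and `score` callables below are assumptions; the benchmark's actual split may differ even with the same seed.

```python
import random


def kfold_indices(n, k=5, seed=7):
    """Shuffle range(n) with a fixed seed and deal it into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_fitted_scores(examples, fit, score, k=5, seed=7):
    """Score each example with state fitted on the other k-1 folds."""
    scores = [None] * len(examples)
    for held_out in kfold_indices(len(examples), k, seed):
        held = set(held_out)
        train = [ex for i, ex in enumerate(examples) if i not in held]
        state = fit(train)
        for i in held_out:
            scores[i] = score(state, examples[i])
    return scores


# Toy usage: the fitted state is just the training accuracy.
examples = [{"correct": i % 3 != 0} for i in range(55)]
fit = lambda train: sum(ex["correct"] for ex in train) / len(train)
score = lambda state, ex: state
print(cross_fitted_scores(examples, fit, score)[:3])
```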

License

MIT — free for any use, including commercial.
