An offline Bayesian confidence gate for MCP tool calls — PROCEED / FLAG / BLOCK with calibrated, per-model confidence. Pure stdlib, BYO-LLM.
bayesian-cage
An offline Bayesian confidence gate for MCP tool calls — and any LLM output. It scores how much to trust an output and gates it PROCEED / FLAG / BLOCK with a calibrated, per-model confidence that sharpens over time from local memory. Pure stdlib. Bring your own LLM. Your data never leaves the machine.
Install
Requires Python 3.10+. Zero runtime dependencies.
```bash
# from PyPI (once published)
pip install bayesian-cage

# from source: use a venv; macOS's default python3 is often 3.9
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e .
```
Verify your install (30 seconds, offline)
```bash
python -m bayesian_cage.eval.sqlbench.run --model stub
```
Runs a 55-task SQL calibration benchmark against a real SQLite database with a built-in
stub model. No network, no LLM, no API keys. Prints a reliability table comparing the
cage to raw model confidence; writes dataset.jsonl, results.json, and a
reliability.svg to bench_out/. If this finishes, the package works end-to-end.
Quickstart (library)
```python
from bayesian_cage import Kernel

k = Kernel(db_path="~/.bayescore/bayescore.db")
v = k.check("SELECT 1", {"expected": "select 1"}, model_id="phi3", task="sql")
print(v.decision, v.p)  # PROCEED 1.0
k.observe("phi3", "sql", correct=True, observation_id=v.observation_id)
```
Gate an MCP server
The cage is itself an MCP server. It speaks JSON-RPC over stdio to your host
(Claude Desktop, Cursor, Claude Code, …) and spawns the real server as a
subprocess via BAYESIAN_CAGE_DOWNSTREAM. It transparently forwards
initialize / tools/list / tools/call; every tools/call result is
graded by a Verifier and stamped with a _bayescore envelope. In enforce
mode, BLOCK verdicts withhold the result and return isError: true instead.
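To make the forwarding behaviour concrete, here is a minimal sketch of what stamping a downstream tools/call response could look like. The function name `gate_tools_call`, the fields placed inside `_bayescore` (`decision`, `p`, `observation_id`), and the replacement error text are hypothetical; only the `_bayescore` name, the `isError` flag, and the advisory/enforce behaviour come from the description above.

```python
from typing import Any


def gate_tools_call(response: dict[str, Any], decision: str, p: float,
                    observation_id: str, mode: str = "advisory") -> dict[str, Any]:
    """Return a copy of a JSON-RPC tools/call response stamped with a verdict.

    Hypothetical sketch: the real _bayescore envelope layout may differ.
    """
    result = dict(response.get("result", {}))  # shallow copy; caller's dict is untouched
    # Hypothetical envelope fields.
    result["_bayescore"] = {"decision": decision, "p": p,
                            "observation_id": observation_id}
    if mode == "enforce" and decision == "BLOCK":
        # Withhold the downstream content and surface a tool-level error instead.
        result["content"] = [{"type": "text",
                              "text": f"Withheld by bayesian-cage (p={p:.2f})"}]
        result["isError"] = True
    return {**response, "result": result}
```

In advisory mode the same BLOCK verdict only adds the envelope, so the host still sees the downstream content and can decide what to do with the label.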
Install a stable binary
MCP-host integration needs bayesian-cage on a path the host can spawn
without your shell PATH. pipx is the cleanest route:
```bash
brew install pipx && pipx ensurepath
pipx install bayesian-cage  # or: pipx install . from a clone
which bayesian-cage         # copy this absolute path
```
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
and add an mcpServers block. Use absolute paths — Claude Desktop spawns
subprocesses without your shell's PATH:
```json
{
  "mcpServers": {
    "filesystem-gated": {
      "command": "/Users/you/.local/bin/bayesian-cage",
      "env": {
        "BAYESIAN_CAGE_DOWNSTREAM": "/usr/local/bin/npx -y @modelcontextprotocol/server-filesystem /Users/you/Documents/Claude",
        "BAYESIAN_CAGE_MODE": "advisory",
        "BAYESIAN_CAGE_VERIFIER": "filesystem",
        "BAYESIAN_CAGE_MODEL": "fs-server"
      }
    }
  }
}
```
Quit and relaunch Claude Desktop (Cmd+Q, not just close the window). The
filesystem-gated server appears in the tools picker; every tool result now
carries a verdict. Same shape works for Cursor (~/.cursor/mcp.json) and any
host that takes a stdio MCP server command.
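If you prefer to script the config edit rather than hand-edit JSON, a small helper like the one below can merge a gated entry while keeping any servers already configured. `add_gated_server` is a hypothetical helper, not part of the package; back up your config file before running anything that rewrites it.

```python
import json
from pathlib import Path


def add_gated_server(config_path: Path, name: str, cage_bin: str, downstream: str,
                     verifier: str, model: str, mode: str = "advisory") -> dict:
    """Merge one gated server into an MCP host config, keeping existing entries."""
    if not Path(cage_bin).is_absolute():
        # Hosts spawn servers without your shell PATH, so insist on an absolute path.
        raise ValueError(f"command must be an absolute path, got {cage_bin!r}")
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": cage_bin,
        "env": {
            "BAYESIAN_CAGE_DOWNSTREAM": downstream,
            "BAYESIAN_CAGE_MODE": mode,
            "BAYESIAN_CAGE_VERIFIER": verifier,
            "BAYESIAN_CAGE_MODEL": model,
        },
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Pass the path printed by `which bayesian-cage` as `cage_bin`, then fully quit and relaunch the host as described above.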
One-command sanity check (no host required)
```bash
BAYESIAN_CAGE_DOWNSTREAM="npx -y @modelcontextprotocol/server-everything" \
  BAYESIAN_CAGE_MODE=enforce bayesian-cage --selftest
```
Runs a canned initialize / tools/list / tools/call sequence against the
configured downstream and prints the gated responses.
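MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages. The sketch below builds a comparable canned handshake and call, which can be handy for poking a gated server by hand. The exact messages `--selftest` sends are not documented here, so the protocol version string, the client name, and the choice of server-everything's `echo` tool are assumptions.

```python
import json


def selftest_messages(tool_name: str, arguments: dict) -> list[str]:
    """Build a canned MCP handshake plus one tool call as newline-delimited JSON-RPC."""
    messages = [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                    "clientInfo": {"name": "selftest-sketch", "version": "0"}}},
        # A notification carries no "id", so the server sends no response to it.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": tool_name, "arguments": arguments}},
    ]
    # json.dumps emits no raw newlines, so one message per line is safe.
    return [json.dumps(m) + "\n" for m in messages]
```

Writing these lines to the gated server's stdin and reading its stdout line by line should show each tools/call response with its `_bayescore` envelope attached.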
Knobs
- `BAYESIAN_CAGE_MODE`: `advisory` (default; never blocks, just labels) · `enforce` (BLOCK halts) · `iterate` (retry instead of stop). Start in `advisory`; switch to `enforce` once you trust the verifier.
- `BAYESIAN_CAGE_VERIFIER`: `heuristic` (default) · `sql` · `json` · `filesystem` · `ensemble:heuristic+sql` · `your.pkg.module:CustomVerifier`
- `BAYESIAN_CAGE_MODEL`: the bucket that beliefs are keyed by. Use one per downstream (`fs-server`, `pg-server`): the LLM driving the host is not what's being scored; the server is.
- `BAYESIAN_CAGE_DB`: SQLite belief store path (default `~/.bayescore/bayescore.db`)
Closing the loop
Without outcome feedback, beliefs sit at the Beta(1,1) prior and p just
equals the raw verifier signal — you get a useful day-one gate, but no
learning. To sharpen calibration over time, feed real outcomes back from
whatever ground truth you already have (test pass/fail, downstream success,
manual labels):
```bash
bayesian-cage observe --model fs-server --task list_directory \
  --correct true --id <observation_id>
```
The observation_id is in the _bayescore envelope on each gated result.
After a few dozen labeled outcomes, the per-(model, task, signal-bin) Beta
sharpens and the same raw signal earns a different p. To dump the full
observation log, run `bayesian-cage export --out labels.jsonl`.
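The learning step is a Beta-Bernoulli update: each correct outcome adds one to α, each wrong one adds one to β, and the posterior mean α / (α + β) becomes the calibrated p for that bucket. The in-memory sketch below illustrates that update. The bin count, the class names, and the rule that a bucket with no outcomes passes the raw signal through unchanged (mirroring the day-one behaviour described above) are assumptions, not the package's actual blending logic.

```python
from collections import defaultdict
from dataclasses import dataclass

N_BINS = 10  # hypothetical number of signal bins


def signal_bin(signal: float, n_bins: int = N_BINS) -> int:
    """Map a raw verifier signal in [0, 1] to a bin index."""
    return min(int(signal * n_bins), n_bins - 1)


@dataclass
class Beta:
    alpha: float = 1.0  # Beta(1, 1): uniform prior
    beta: float = 1.0

    @property
    def observations(self) -> int:
        return round(self.alpha + self.beta - 2)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


class BeliefStore:
    """In-memory stand-in for the per-(model, task, signal-bin) belief table."""

    def __init__(self) -> None:
        self.beliefs: defaultdict[tuple[str, str, int], Beta] = defaultdict(Beta)

    def calibrate(self, model: str, task: str, signal: float) -> float:
        belief = self.beliefs[(model, task, signal_bin(signal))]
        # Assumption: with no outcomes yet, pass the raw signal through unchanged.
        return signal if belief.observations == 0 else belief.mean()

    def observe(self, model: str, task: str, signal: float, correct: bool) -> None:
        belief = self.beliefs[(model, task, signal_bin(signal))]
        if correct:
            belief.alpha += 1
        else:
            belief.beta += 1
```

For example, after three wrong outcomes and one correct one in the same bucket, a raw signal of 0.9 would be calibrated to 2 / 6 ≈ 0.33, which is how the same signal can end up on the other side of the gate.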
How it works
verify → calibrate → gate → observe. A pluggable Verifier scores the output; a
per-(model, task, signal-bin) Beta belief in one local SQLite store calibrates that
score into a probability; the gate decides by an explicit cost ratio; observed outcomes
update the belief. Calibration is learned per model, so swapping the LLM starts a
fresh calibration instead of conflating the two models' track records.
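The cost-ratio gate follows from comparing expected costs. If passing a wrong output costs C_pass and blocking a correct one costs C_block, proceeding has expected cost (1 - p) * C_pass and blocking has expected cost p * C_block, so blocking is cheaper exactly when p < C_pass / (C_pass + C_block) = r / (1 + r), where r = C_pass / C_block. The sketch below turns that break-even point into a three-way decision; the default ratio of 3 and the width of the FLAG band are hypothetical values, not the package's defaults.

```python
def gate(p: float, cost_ratio: float = 3.0, flag_margin: float = 0.15) -> str:
    """Map a calibrated probability to PROCEED / FLAG / BLOCK.

    cost_ratio = C_pass / C_block: how much worse it is to pass a wrong output
    than to block a correct one. Default ratio and FLAG margin are hypothetical.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p must be in [0, 1], got {p}")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Break-even point where the expected costs of proceeding and blocking are equal.
    threshold = cost_ratio / (1.0 + cost_ratio)
    if p < threshold:
        return "BLOCK"
    # Just above break-even: let it through, but label it for review.
    if p < min(threshold + flag_margin, 1.0):
        return "FLAG"
    return "PROCEED"
```

With r = 3 the break-even point is 0.75, so p = 0.8 would be flagged and p = 1.0 would proceed; lowering r to 0.5 moves the break-even point down to 1/3 and lets far more through.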
Verifiers
- HeuristicVerifier: default, offline. Catches refusals, non-answers, error markers, placeholders, degenerate repetition, truncation, and heavy hedging. Pass `expected` or `reference` for a grounded signal.
- SqlVerifier: swallowed SQL errors, schema typos, broken-empty vs legitimately-empty results, all-NULL rows.
- JsonVerifier: malformed/valid JSON, plus a `required` key list for structured outputs.
- FilesystemVerifier: POSIX errno markers and access-denied envelopes block; file listings and plain content pass.
- EnsembleVerifier: composes verifiers and gates on the strictest (minimum) signal.
Add your own by implementing Verifier.verify(output, context) -> VerifierResult.
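A custom verifier only needs the `verify(output, context)` shape described above. The sketch below is self-contained, so it defines its own stand-in `VerifierResult` with hypothetical `signal` and `reasons` fields; the real class may carry different fields. The two toy verifiers and `MinEnsemble` are illustrative only, with `MinEnsemble` mirroring the "strictest (minimum) signal" rule that EnsembleVerifier is described as using.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class VerifierResult:  # hypothetical stand-in for the package's type
    signal: float                 # 0.0 = looks wrong, 1.0 = looks right
    reasons: tuple[str, ...] = ()


class Verifier(Protocol):
    def verify(self, output: str, context: dict[str, Any]) -> VerifierResult: ...


class RequiredTermsVerifier:
    """Toy verifier: the output must mention every term in context['must_contain']."""

    def verify(self, output: str, context: dict[str, Any]) -> VerifierResult:
        terms = context.get("must_contain", [])
        if not terms:
            return VerifierResult(0.5, ("no terms to check",))
        missing = [t for t in terms if t.lower() not in output.lower()]
        return VerifierResult(1.0 - len(missing) / len(terms),
                              tuple(f"missing {t!r}" for t in missing))


class RefusalVerifier:
    """Toy verifier: drop the signal to zero on common refusal phrases."""
    MARKERS = ("i can't", "i cannot", "as an ai")

    def verify(self, output: str, context: dict[str, Any]) -> VerifierResult:
        hits = [m for m in self.MARKERS if m in output.lower()]
        return VerifierResult(0.0 if hits else 1.0,
                              tuple(f"refusal marker {m!r}" for m in hits))


class MinEnsemble:
    """Run every verifier and keep the strictest (minimum) signal."""

    def __init__(self, *verifiers: Verifier) -> None:
        if not verifiers:
            raise ValueError("need at least one verifier")
        self.verifiers = verifiers

    def verify(self, output: str, context: dict[str, Any]) -> VerifierResult:
        results = [v.verify(output, context) for v in self.verifiers]
        reasons = tuple(r for res in results for r in res.reasons)
        return VerifierResult(min(res.signal for res in results), reasons)
```

Taking the minimum means a single confident objection from any member is enough to pull the combined signal down, which suits a gate whose costliest error is passing a wrong output.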
Calibration benchmark — why not just ask the model?
Because a model's self-reported confidence is miscalibrated. Concretely, on a
55-task execution-graded text-to-SQL bench against phi3 (via Ollama, 5-fold,
seed=7, accuracy 67.3%):
| metric | raw phi3 | cage | direction |
|---|---|---|---|
| ECE | 0.325 | 0.081 | lower better — 4× tighter |
| Brier | 0.322 | 0.174 | lower better — 46% lower |
| catch-rate | 0.000 | 0.333 | higher better |
| wrong-passed | 18 | 12 | lower better |
| over-block | 0.000 | 0.000 | lower better |
| AUROC | 0.583 | 0.544 | higher better |
phi3's raw confidence isn't discriminative: every one of the 55 answers came back at ~1.0, so AUROC stays near chance (0.583 raw, 0.544 caged) whether you ask the model or the cage. What the cage does is stop the model from overstating that confidence: calibration error drops 4×, and a third of phi3's wrong answers (6 of 18) are now caught instead of passed through, without blocking a single correct answer.
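The table's metrics can be reproduced from per-example probabilities and labels. The sketch below uses 10 equal-width bins for ECE and treats any decision other than PROCEED on a wrong answer as "caught"; both choices are assumptions and may not match the benchmark's exact definitions. As a sanity check on the raw row: 67.3% of 55 is 37 correct and 18 wrong, and a model reporting exactly 1.0 on every task would score ECE = Brier = 18/55 ≈ 0.327, close to the reported 0.325 and 0.322.

```python
def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared gap between the predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def ece(probs: list[float], labels: list[int], n_bins: int = 10) -> float:
    """Expected calibration error over equal-width confidence bins."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    error = 0.0
    for members in bins:
        if members:
            confidence = sum(p for p, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            error += len(members) / total * abs(accuracy - confidence)
    return error


def catch_rate(decisions: list[str], labels: list[int]) -> float:
    """Share of wrong answers not passed as PROCEED (assumed definition)."""
    wrong = [d for d, y in zip(decisions, labels) if not y]
    return sum(d != "PROCEED" for d in wrong) / len(wrong) if wrong else 0.0
```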
Reproduce:
```bash
ollama pull phi3
python -m bayesian_cage.eval.sqlbench.run --model phi3 --seed 7 --out bench_out
```
Correctness is labeled by executing the model's SQL against a real SQLite
database and comparing to a gold query — nothing hand-labeled. Each example is
held out exactly once and scored by a cage that has already observed the
other folds; the published number reflects learned calibration, not just the
raw verifier signal. Artifacts (reliability diagram, metrics, labeled dataset)
land in bench_out/. The same rig runs offline against a stub model
(--model stub) for CI and install verification.
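The held-out scheme is a form of cross-fitting: each example's score comes from beliefs trained only on the other folds, so its own label never leaks into its own p. The sketch below illustrates the idea with a per-bin Beta(1, 1) count table. The fold assignment via `random.Random(seed)` is hypothetical and will not reproduce the benchmark's actual folds, and the raw-signal fallback for empty bins is the same assumption used for the day-one gate.

```python
import random


def kfold(n: int, k: int = 5, seed: int = 7) -> list[list[int]]:
    """Shuffle indices with a fixed seed and deal them into k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def bin_of(signal: float, n_bins: int = 10) -> int:
    return min(int(signal * n_bins), n_bins - 1)


def cross_fit(signals: list[float], labels: list[int], k: int = 5,
              seed: int = 7) -> list[float]:
    """Score every example exactly once, using beliefs learned only from other folds."""
    scores: list[float] = [0.0] * len(signals)
    for held_out in kfold(len(signals), k, seed):
        held = set(held_out)
        counts: dict[int, list[float]] = {}  # bin -> [alpha, beta], Beta(1, 1) prior
        for i, (s, y) in enumerate(zip(signals, labels)):
            if i not in held:
                counts.setdefault(bin_of(s), [1.0, 1.0])[0 if y else 1] += 1
        for i in held_out:
            alpha, beta = counts.get(bin_of(signals[i]), [1.0, 1.0])
            # Empty bin: fall back to the raw signal (assumption).
            scores[i] = signals[i] if alpha + beta == 2.0 else alpha / (alpha + beta)
    return scores
```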
License
MIT — free for any use, including commercial.
Download files
File details
Details for the file bayesian_cage-0.1.0.tar.gz.
File metadata
- Download URL: bayesian_cage-0.1.0.tar.gz
- Upload date:
- Size: 59.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38c13ac0b5261878e1523b4108da8ae7a277370a209ce9d1d71bd2569601d42f |
| MD5 | bae81ff58bd0452626f20a6c60f86d5b |
| BLAKE2b-256 | cde925160828f8298a8a53dd40f8a68375862e121e692a86140da5d5dbd37cb6 |
Provenance
The following attestation bundles were made for bayesian_cage-0.1.0.tar.gz:
Publisher: release.yml on BayesCore/bayesian-cage
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bayesian_cage-0.1.0.tar.gz
- Subject digest: 38c13ac0b5261878e1523b4108da8ae7a277370a209ce9d1d71bd2569601d42f
- Sigstore transparency entry: 1818441869
- Sigstore integration time:
- Permalink: BayesCore/bayesian-cage@9ae0cc761499c5756d747795e121d771aea97367
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/BayesCore
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9ae0cc761499c5756d747795e121d771aea97367
- Trigger Event: push
File details
Details for the file bayesian_cage-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bayesian_cage-0.1.0-py3-none-any.whl
- Upload date:
- Size: 47.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | def760ccea51e223e841c51a6a73ac2419abc71ae983a1bac39b08c430a3a156 |
| MD5 | 2ac032f81f3a50542776db75bdc4f50e |
| BLAKE2b-256 | 94a7a9d4d4c1f4da6691aaad4be77c74a316c6ea65ae3b414037a92ffdfe8d12 |
Provenance
The following attestation bundles were made for bayesian_cage-0.1.0-py3-none-any.whl:
Publisher: release.yml on BayesCore/bayesian-cage
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bayesian_cage-0.1.0-py3-none-any.whl
- Subject digest: def760ccea51e223e841c51a6a73ac2419abc71ae983a1bac39b08c430a3a156
- Sigstore transparency entry: 1818441882
- Sigstore integration time:
- Permalink: BayesCore/bayesian-cage@9ae0cc761499c5756d747795e121d771aea97367
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/BayesCore
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9ae0cc761499c5756d747795e121d771aea97367
- Trigger Event: push