Skip to main content

Local-first MCP context budget and tool-selection verifier

Project description

mcp-context-budget

Local-first MCP context budget and tool-selection verifier for agentic coding environments.

MCP servers can load enough tool schema and response data to burn a large fraction of an agent context window before useful work starts. This tool gives developers a repeatable budget gate:

  • scan MCP config or tools/list fixtures
  • estimate schema and response-token cost
  • select a smaller task-relevant tool set with deterministic SQLite FTS5/BM25
  • optionally prove semantic tool selection from fixture or Ollama embeddings
  • write a lockfile for CI
  • fail builds when schema or response budgets regress
  • compress recorded response fixtures under a response budget
  • apply selected-tool locks back to caller-owned MCP config files
  • opt into local stdio tools/list introspection for command-discovered servers
  • audit MCP configs for plaintext secret exposure without printing values
  • prove the spine with a Docker demo

No private Orion services are required. The core CLI has no external runtime service dependency. Semantic selection can optionally call Ollama only when the --embedding-backend ollama flag is explicitly selected. Live stdio introspection can optionally start a caller-owned local MCP command only when --allow-start is explicitly selected.

Install

python3.11 -m venv .venv
. .venv/bin/activate
pip install -e '.[dev]'

Quick Demo

mcp-context-budget demo \
  --task "triage a GitHub issue and update one ticket" \
  --max-tools 8 \
  --max-schema-tokens 6000 \
  --max-response-tokens 4000

Expected spine proof:

DEMO_CATALOG_SERVERS=5
DEMO_CATALOG_TOOLS=120
BEFORE_SCHEMA_TOKENS=<large>
SELECTED_TOOLS=<8-or-less>
AFTER_SCHEMA_TOKENS=<cap-or-less>
OVERSIZED_RESPONSE_FIXTURE=flagged
BUDGET_STATUS=PASS

Commands

mcp-context-budget scan --tool-list fixtures/demo-tools.json --out mcp-budget.report.md --lock-out mcp-budget.lock.json
mcp-context-budget select --tool-list fixtures/demo-tools.json --task "triage a GitHub issue" --max-tools 8 --max-schema-tokens 6000 --out-lock mcp-budget.lock.json
mcp-context-budget semantic-select --tool-list fixtures/demo-tools.json --task "triage a GitHub issue" --embedding-backend fixture --embedding-file embeddings.json --out-lock mcp-budget.lock.json
mcp-context-budget check --lock mcp-budget.lock.json --max-schema-tokens 6000 --max-response-tokens 4000
mcp-context-budget compress-responses --fixtures responses/ --max-response-tokens 4000 --out-dir compressed-responses --report compression-report.json
mcp-context-budget config-apply --config mcp.json --lock mcp-budget.lock.json --dry-run --patch-out mcp-config.patch.json
mcp-context-budget config-audit --config mcp.json --json-out mcp-config-audit.json --fail-on high
mcp-context-budget export --lock mcp-budget.lock.json --format sarif --out mcp-budget.sarif

scan --config supports Claude/Cursor/Codex-style JSON with an mcpServers object. Server entries may include toolsListPath to point at a recorded tools/list JSON fixture. Server entries may also include stdioFraming (auto, json-lines, or content-length) when a local stdio server needs a fixed transport framing. Environment values are redacted in reports.

--allow-start is intentionally conservative. It is never implied by default, never required for static toolsListPath or inline-tool configs, and never starts a hosted service. When explicitly selected, the tool starts the caller-owned local stdio command as argv with shell=False, sends MCP initialize and tools/list, enforces timeout and stdio-byte caps, redacts env metadata, and exits the process after listing tools. The default --stdio-framing auto prefers the current MCP SDK JSON-lines stdio transport and falls back to the legacy Content-Length fixture transport; pass --stdio-framing json-lines or --stdio-framing content-length to force one.

For command-discovered servers that need to become enforceable by config-apply, combine --allow-start with --materialize-tools-list:

mcp-context-budget config-apply \
  --config mcp.json \
  --lock mcp-budget.lock.json \
  --write \
  --allow-start \
  --start-timeout-seconds 2 \
  --max-stdio-bytes 65536 \
  --stdio-framing auto \
  --materialize-tools-list materialized-tools/

This writes a local toolsListPath sidecar for the discovered tools, applies the selected-tool lock there, and leaves later scan/select runs static again.

Semantic Selection

semantic-select keeps the v0.1 lockfile shape but ranks tools by embedding similarity before applying --max-tools and --max-schema-tokens.

Fixture mode is deterministic and requires no service:

mcp-context-budget semantic-select \
  --tool-list tools.json \
  --task "diagnose bug report" \
  --embedding-backend fixture \
  --embedding-file embeddings.json \
  --out-lock semantic.lock.json

The fixture file must contain:

{
  "queries": {"diagnose bug report": [1.0, 0.0]},
  "tools": {"github/get_issue": [1.0, 0.0]}
}

Ollama mode uses stdlib HTTP and adds no Python package dependency:

mcp-context-budget semantic-select \
  --tool-list tools.json \
  --task "diagnose bug report" \
  --embedding-backend ollama \
  --ollama-url http://localhost:11434 \
  --ollama-model nomic-embed-text

Response Fixture Compression

compress-responses reads one response fixture or a directory of *.json fixtures, writes compressed copies, and emits a JSON report.

mcp-context-budget compress-responses \
  --fixtures responses/ \
  --max-response-tokens 4000 \
  --out-dir compressed-responses \
  --report compression-report.json

The v0.2 strategy is deterministic extractive compression. It preserves common identifier fields and writes a summary when large body fields need to be cut.

Config Apply

config-apply turns a selected-tool lock into a safe local MCP config patch. Dry-run is the default posture; --write is required before the config file is changed, and write mode creates a backup.

mcp-context-budget config-apply \
  --config mcp.json \
  --lock mcp-budget.lock.json \
  --mode disable-unselected \
  --dry-run \
  --patch-out mcp-config.patch.json

mcp-context-budget config-apply \
  --config mcp.json \
  --lock mcp-budget.lock.json \
  --mode disable-unselected \
  --write \
  --backup-dir backups/

Reports redact environment values.

The apply contract is enforced, not advisory:

  • Inline and toolsListPath tools are both patched. A server whose tools live in an external tools/list JSON has that file patched (and backed up) too — not silently skipped.
  • The lock is bound to the config. Each lock records a config_fingerprint of its tool universe; config-apply refuses a lock whose fingerprint does not match the target config (a foreign/stale lock would otherwise disable every tool and still report success). Override with --allow-fingerprint-mismatch.
  • Honest status, never a false PASS. A command-discovered server (no inline tools, no toolsListPath) cannot be enforced without live startup, so it is reported under not_patchable and the status is PARTIAL, not PASS.
  • Opt-in materialization closes the PARTIAL gap. With --allow-start and --materialize-tools-list, command-discovered tools are listed through local stdio, saved to a caller-owned sidecar, and enforced as a normal toolsListPath catalog.
  • Disabling takes effect. The loader honors enabled: false, so a disabled tool (or server) drops out of the budget on the next scan/select.

Config Secret Audit

config-audit is a read-only hygiene check for MCP config files. It flags high-confidence literal secrets in env values, args, and nested config fields, while treating ${TOKEN} references, op://... references, and redacted placeholders as safe references.

mcp-context-budget config-audit \
  --config mcp.json \
  --json-out mcp-config-audit.json \
  --fail-on high

Reports include the config path, finding path, severity, secret class, length bucket, and a short fingerprint. Literal secret values are never printed.

Docker

docker build -t mcp-context-budget:local .
docker run --rm mcp-context-budget:local demo \
  --task "triage a GitHub issue and update one ticket" \
  --max-tools 8 \
  --max-schema-tokens 6000 \
  --max-response-tokens 4000

The image exposes no service port.

v0.2 also includes independent Docker proof commands for the new capabilities:

docker run --rm mcp-context-budget:local semantic-demo \
  --task "diagnose bug report" \
  --max-tools 3 \
  --max-schema-tokens 3000
docker run --rm mcp-context-budget:local compress-demo --max-response-tokens 4000
docker run --rm mcp-context-budget:local config-demo
docker run --rm mcp-context-budget:local allow-start-demo --start-timeout-seconds 2 --max-stdio-bytes 65536 --stdio-framing auto
docker run --rm mcp-context-budget:local semantic-demo --task "diagnose bug report" --max-tools 3 --max-schema-tokens 3000 --embedding-backend fixture
docker run --rm mcp-context-budget:local prove-parallel-ollama-demo
docker run --rm mcp-context-budget:local config-audit-demo
docker run --rm mcp-context-budget:local config-multiserver-demo

Expected v0.3 proof lines:

LIVE_INTROSPECTION_STATUS=PASS
AFTER_CONFIG_NOT_PATCHABLE=0
CONFIG_AUDIT_STATUS=PASS
CONFIG_AUDIT_SECRET_VALUES_REDACTED=true
CONFIG_MULTISERVER_STATUS=PASS

Out of v0.3

Locked Out

  • Live runtime MCP proxy/gateway that intercepts and routes actual tool calls.
  • Browser UI.
  • Organization-wide background scanner.
  • Vendor-specific hosted dashboards.

These are not v0.4 commitments; they break the local-first CLI verifier shape.

Shipped in v0.4

  • Parallelized Ollama embeddings for semantic-select when --embedding-backend ollama is explicitly selected (fixture backend remains the default for CI/docker).

Deferred to v0.5

  • Automatic response compression for arbitrary live MCP servers.
  • Broader CLI polish.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_context_budget-0.4.0.tar.gz (45.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_context_budget-0.4.0-py3-none-any.whl (40.2 kB view details)

Uploaded Python 3

File details

Details for the file mcp_context_budget-0.4.0.tar.gz.

File metadata

  • Download URL: mcp_context_budget-0.4.0.tar.gz
  • Upload date:
  • Size: 45.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mcp_context_budget-0.4.0.tar.gz
Algorithm Hash digest
SHA256 dc628c4a3a8468f8009a251155ce2341f73ccdb0f60ae10c01eaaa015ffa5c69
MD5 ec607530bd9c4db75ce33d2746f1d318
BLAKE2b-256 8f9a70c512f6479994c09c4f57e8b7ee6ac4389f18c94139239cea8be63c6e44

See more details on using hashes here.

Provenance

The following attestation bundles were made for mcp_context_budget-0.4.0.tar.gz:

Publisher: release.yml on OrionArchitekton/mcp-context-budget

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcp_context_budget-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mcp_context_budget-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ceda20caf239f796e415152396dc9519610fc45e6eb851fd73d9cddbb92c227e
MD5 0e49dbbcf9ff4ef615cb1371d826037c
BLAKE2b-256 8eea80a4de47a3183503c4372092cc835267211190c01d4a5a0996fc8e5f708e

See more details on using hashes here.

Provenance

The following attestation bundles were made for mcp_context_budget-0.4.0-py3-none-any.whl:

Publisher: release.yml on OrionArchitekton/mcp-context-budget

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page