Plumbref

Plumbref verifies AI coding-agent claims against source references.

It exposes:

  • an MCP server for agent-driven verification workflows
  • a CLI for local smoke tests and report rendering
  • deterministic Markdown and JSON reports

Plumbref does not call a model API. It does not need an API key, database, vector store, hosted service, or UI.

Why A Harness

Prompts and skills can ask an agent to be careful, but they do not preserve a structured verification trail. Plumbref gives the agent a small protocol:

  1. start a verification session
  2. store atomic claims or predicted outcomes
  3. search the repository
  4. read bounded evidence snippets
  5. record conservative judgments
  6. render a report

The agent still extracts claims and reasons over evidence. Plumbref supplies the source-grounded workflow, budgets, redaction, status semantics, and report artifacts.

Install

Install the latest published package when it is available:

pipx install plumbref

Beta install from GitHub:

pipx install git+https://github.com/facundotaboada/plumbref.git

For local development:

python -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"

Plumbref uses ripgrep (rg) for repository search. Confirm that it is installed and on your PATH:

rg --version
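The same check can be scripted, for example in a setup step. This is a minimal sketch that uses only the Python standard library; the helper name `ripgrep_version` is hypothetical and is not part of Plumbref.

```python
import shutil
import subprocess
from typing import Optional


def ripgrep_version() -> Optional[str]:
    """Return the first line of `rg --version`, or None if rg is unavailable."""
    rg_path = shutil.which("rg")  # look up the rg executable on PATH
    if rg_path is None:
        return None
    result = subprocess.run(
        [rg_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ripgrep_version() or "ripgrep (rg) was not found on PATH")
```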

CLI

Run the MCP server against a repository:

plumbref mcp --repo-root /path/to/repo

Run a local verification smoke test:

plumbref verify \
  --repo-root /path/to/repo \
  --question "What does this scheduled job do?" \
  --answer answer.md

Use a config file:

plumbref verify \
  --repo-root /path/to/repo \
  --config /path/to/plumbref.toml \
  --question "What happens if provider_id is missing?" \
  --answer answer.md

Explicit modes are available:

plumbref verify \
  --repo-root /path/to/repo \
  --mode scenario \
  --scenario "run_scheduled_job receives provider_id=None" \
  --budget-mode normal \
  --output-mode engineer \
  --output-mode json \
  --question "What happens if provider_id is missing?" \
  --answer answer.md

For change-impact checks:

plumbref verify \
  --repo-root /path/to/repo \
  --mode change_impact \
  --changed-file app/reports.py \
  --question "What does this change affect?" \
  --answer impact.md

The CLI does not extract claims automatically. For a full workflow, use MCP or pass a JSON claims file with --claims.
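A claims file could be produced with a few lines of Python. The sketch below assumes the file mirrors the MCP claims payload shown later in this document (a top-level "claims" list whose items carry text, claim_type, and risk); that schema is an assumption, not something this README confirms for --claims. The helper name `write_claims_file` is hypothetical.

```python
import json
from pathlib import Path


def write_claims_file(path: Path, claims: list) -> Path:
    """Write claims to a JSON file.

    ASSUMPTION: the {"claims": [...]} layout is borrowed from the MCP payload
    in this README; the schema that --claims actually expects may differ.
    """
    for claim in claims:
        if not str(claim.get("text", "")).strip():
            raise ValueError("each claim needs a non-empty 'text' field")
    path.write_text(json.dumps({"claims": claims}, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    write_claims_file(
        Path("claims.json"),
        [
            {
                "text": "The scheduled job queues provider sync work when provider_id is present.",
                "claim_type": "behavior",
                "risk": "medium",
            }
        ],
    )
```

The resulting file would then be passed to plumbref verify with --claims claims.json alongside the other flags shown above.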

Config

Config discovery order:

  1. explicit --config
  2. <repo-root>/.plumbref.local.toml
  3. <repo-root>/.plumbref.toml
  4. ~/.config/plumbref/config.toml
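The discovery order above can be sketched as a small lookup function. This is an illustration, not Plumbref's code: it assumes the first existing file wins, that files are not merged, and that an explicit path is used as given. The function name `discover_config` and the `home` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def discover_config(
    repo_root: Path,
    explicit: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the config path chosen by the documented discovery order."""
    # 1. An explicit --config value takes priority (assumed: used as given).
    if explicit is not None:
        return explicit
    home = home if home is not None else Path.home()
    candidates = [
        repo_root / ".plumbref.local.toml",              # 2. local override
        repo_root / ".plumbref.toml",                    # 3. repository config
        home / ".config" / "plumbref" / "config.toml",   # 4. user-level config
    ]
    # ASSUMPTION: the first file that exists is used; nothing is merged.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Under these assumptions, a repository that contains both .plumbref.local.toml and .plumbref.toml resolves to the local file.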

Example config file:

ignored_paths = [
  ".git",
  ".venv",
  "node_modules",
  ".cache",
]

privacy_patterns = [
  "AKIA[0-9A-Z]{16}",
  "(?i)(api[_-]?key|secret|token|password)\\s*[:=]\\s*['\\\"][^'\\\"]+['\\\"]",
]

default_budget_mode = "normal"
default_output_modes = ["engineer", "json"]

redaction_patterns is accepted as an alias for privacy_patterns.
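To see what the example patterns match, they can be tried with Python's re module. The sketch below writes the two patterns as they read once TOML string escaping is decoded. How Plumbref applies them internally is not described here, so the `redact` helper and the "[REDACTED]" placeholder are assumptions for illustration only.

```python
import re

# The example patterns after TOML escape decoding (\\s becomes \s, \" becomes ").
PRIVACY_PATTERNS = [
    r"AKIA[0-9A-Z]{16}",
    r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]+['\"]",
]


def redact(text: str, patterns=PRIVACY_PATTERNS, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with a placeholder.

    ASSUMPTION: whole-match replacement with a fixed placeholder; Plumbref's
    actual redaction output may look different.
    """
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text)
    return text


if __name__ == "__main__":
    print(redact('api_key = "abc123"'))
    print(redact("aws id AKIAABCDEFGHIJKLMNOP in a log line"))
```

Note that the second pattern only matches quoted values, so an unquoted assignment such as token=abc123 would pass through unchanged.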

MCP Setup

Plumbref is a stdio MCP server. Any MCP-capable client can launch it with:

plumbref mcp --repo-root /path/to/repo

Cursor-style MCP config:

{
  "mcpServers": {
    "plumbref": {
      "command": "plumbref",
      "args": ["mcp", "--repo-root", "/path/to/repo"]
    }
  }
}

With explicit config:

{
  "mcpServers": {
    "plumbref": {
      "command": "plumbref",
      "args": [
        "mcp",
        "--repo-root",
        "/path/to/repo",
        "--config",
        "/path/to/repo/.plumbref.toml"
      ]
    }
  }
}

Claude Code, Codex, and other MCP clients generally use the same command/args shape for stdio servers. Use the client-specific location for MCP server configuration and point it at the same command.

MCP Workflow

Start a session:

{
  "question": "What does this scheduled job do?",
  "answer": "The scheduled job queues provider sync work when provider_id is present.",
  "mode": "explanation",
  "budget_mode": "normal",
  "output_modes": ["engineer", "json"]
}

Store claims extracted by the agent:

{
  "claims": [
    {
      "text": "The scheduled job queues provider sync work when provider_id is present.",
      "claim_type": "behavior",
      "risk": "medium"
    }
  ]
}

Then search, read evidence, record a judgment, and render the report with the MCP tools exposed by the server.

Explanation Mode

Use explanation mode for claims about current source behavior.

Question:

What does this scheduled job do?

Claim:

{
  "text": "The scheduled job queues provider sync work when provider_id is present.",
  "claim_type": "behavior",
  "risk": "medium"
}

Plumbref should mark it supported only when the agent cites source lines that show the queued path and has checked for relevant contradictions.

Scenario Mode

Use scenario mode for predicted outcomes.

Question:

What happens if provider_id is missing?

Start payload:

{
  "mode": "scenario",
  "scenario": "run_scheduled_job receives provider_id=None.",
  "question": "What happens if provider_id is missing?",
  "answer": "The scheduled job is skipped.",
  "output_modes": ["engineer", "json"]
}

Predicted outcome claim:

{
  "text": "run_scheduled_job returns skipped when provider_id is missing.",
  "expected_outcome": "The scheduled job is skipped.",
  "assumptions": ["provider_id is None."],
  "claim_type": "behavior",
  "risk": "medium"
}

Change-Impact Mode

Use change-impact mode to verify a factual impact statement against changed files, a diff, or a local git diff target.

Question:

What does this change affect?

Impact statement to verify:

This change only affects report wording.

Start payload:

{
  "mode": "change_impact",
  "question": "What does this change affect?",
  "answer": "This change only affects report wording.",
  "budget_mode": "normal",
  "output_modes": ["engineer", "json"]
}

Record explicit changed files:

{
  "source": "files",
  "changed_files": ["app/reports.py"],
  "changed_symbols": [
    {
      "name": "render_report_title",
      "kind": "function",
      "file": "app/reports.py"
    }
  ]
}
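When the changed files come from a local git diff, the file list for this payload could be gathered with `git diff --name-only`. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the default target "HEAD" is arbitrary, and it omits changed_symbols on the assumption that the field is optional, which this README does not confirm.

```python
import subprocess
from pathlib import Path


def parse_name_only(output: str) -> list:
    """Turn `git diff --name-only` output into a list of file paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def git_changed_files(repo_root: Path, target: str = "HEAD") -> list:
    """List files that differ between the working tree and `target`."""
    result = subprocess.run(
        ["git", "-C", str(repo_root), "diff", "--name-only", target],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return parse_name_only(result.stdout)


def changed_files_payload(files) -> dict:
    """Build a payload in the 'files' shape shown above.

    ASSUMPTION: changed_symbols may be left out; add it if the tool requires it.
    """
    return {"source": "files", "changed_files": list(files)}
```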

Claims containing absolute language such as "only", "always", or "never" require broader contradiction searches before they can be treated as supported.
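An agent can flag such claims before it starts searching. The following is a minimal sketch of that check, limited to the three terms named above; Plumbref's own detection, if any, may use a different word list or method. The name `absolute_terms` is hypothetical.

```python
import re

# Only the three examples named in this README; a real list may be longer.
ABSOLUTE_TERMS = ("only", "always", "never")
_ABSOLUTE_RE = re.compile(r"\b(" + "|".join(ABSOLUTE_TERMS) + r")\b", re.IGNORECASE)


def absolute_terms(claim_text: str) -> list:
    """Return the absolute terms found in a claim, lowercased, without repeats."""
    found = []
    for match in _ABSOLUTE_RE.finditer(claim_text):
        term = match.group(1).lower()
        if term not in found:
            found.append(term)
    return found


if __name__ == "__main__":
    print(absolute_terms("This change only affects report wording."))
```

The word boundaries keep the check from firing on words that merely contain a term, such as "commonly".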

Status Semantics

  • supported: cited source evidence supports the claim as written, and a contradiction pass was recorded.
  • too_broad: evidence supports a narrower or qualified version, but not the claim as written.
  • uncertain: relevant evidence exists, but it is insufficient for a confident judgment.
  • contradicted: source evidence conflicts with the claim.
  • not_found: searches did not find relevant evidence.
  • not_verifiable: the claim cannot be verified from local source evidence.
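The definitions above can be read as a decision procedure. The sketch below encodes one possible reading in Python. The six status values come from this README, but the boolean inputs, the function name `choose_status`, and the order in which the checks run are assumptions made for illustration, not Plumbref's implementation.

```python
from enum import Enum


class Status(str, Enum):
    SUPPORTED = "supported"
    TOO_BROAD = "too_broad"
    UNCERTAIN = "uncertain"
    CONTRADICTED = "contradicted"
    NOT_FOUND = "not_found"
    NOT_VERIFIABLE = "not_verifiable"


def choose_status(
    *,
    locally_verifiable: bool,
    evidence_found: bool,
    conflicts: bool,
    supports_as_written: bool,
    supports_narrower: bool,
    contradiction_pass_recorded: bool,
) -> Status:
    """Pick a conservative status. ASSUMPTION: this precedence is illustrative."""
    if not locally_verifiable:
        return Status.NOT_VERIFIABLE
    if not evidence_found:
        return Status.NOT_FOUND
    if conflicts:
        return Status.CONTRADICTED
    # supported requires both matching evidence and a recorded contradiction pass.
    if supports_as_written and contradiction_pass_recorded:
        return Status.SUPPORTED
    if supports_narrower:
        return Status.TOO_BROAD
    return Status.UNCERTAIN
```

In this reading, evidence that supports the claim as written but has no recorded contradiction pass falls through to uncertain rather than supported.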

Reports And Cache

By default, reports are written under:

.cache/plumbref/reports/

Generated reports and caches are ignored by the project .gitignore.
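To find the most recent reports from a script, the default directory can be listed directly. This sketch assumes the path is relative to the repository root and makes no assumption about report file names or extensions; the helper name `list_reports` is hypothetical.

```python
from pathlib import Path


def list_reports(repo_root: Path) -> list:
    """Return report files under the default directory, newest first."""
    # ASSUMPTION: the default reports path is resolved against the repo root.
    reports_dir = repo_root / ".cache" / "plumbref" / "reports"
    if not reports_dir.is_dir():
        return []
    files = [p for p in reports_dir.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in list_reports(Path(".")):
        print(report)
```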

Development

Run tests:

pytest

Run lint:

ruff check .

Limitations

  • Plumbref does not extract claims by itself.
  • Plumbref does not decide truth with an LLM.
  • Plumbref cannot verify claims that require production data, private services, or external systems unless the relevant evidence exists in the local repository.
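  • Plumbref search is lexical (ripgrep pattern matching) and limited to the local repository; it does not perform semantic or cross-repository search.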
  • supported means supported by the cited source evidence, not globally true for every deployment or runtime state.

Non-Goals

  • no model API dependency
  • no hosted service
  • no database
  • no vector store
  • no UI
  • no automatic code review replacement
  • no production-data inspection
