Plumbref
Plumbref verifies AI coding-agent claims against source references.
It exposes:
- an MCP server for agent-driven verification workflows
- a CLI for local smoke tests and report rendering
- deterministic Markdown and JSON reports
Plumbref does not call a model API. It does not need an API key, database, vector store, hosted service, or UI.
Why A Harness
Prompts and skills can ask an agent to be careful, but they do not preserve a structured verification trail. Plumbref gives the agent a small protocol:
- start a verification session
- store atomic claims or predicted outcomes
- search the repository
- read bounded evidence snippets
- record conservative judgments
- render a report
The agent still extracts claims and reasons over evidence. Plumbref supplies the source-grounded workflow, budgets, redaction, status semantics, and report artifacts.
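The protocol steps above can be pictured as a small in-memory trail. This is a hypothetical sketch, not Plumbref's actual data model or API: the names `Session`, `Evidence`, `read_snippet`, and `max_snippet_lines` are assumptions chosen for illustration, and the budget value is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    path: str
    start_line: int
    end_line: int
    text: str


@dataclass
class Session:
    """Hypothetical verification trail; not Plumbref's real data model."""

    question: str
    answer: str
    max_snippet_lines: int = 40  # assumed budget, for illustration only
    claims: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    judgments: dict = field(default_factory=dict)

    def store_claim(self, text: str) -> None:
        self.claims.append(text)

    def read_snippet(self, path: str, lines: list, start: int, end: int) -> Evidence:
        # Clamp the requested 1-based range so one read stays within budget.
        end = min(end, start + self.max_snippet_lines - 1, len(lines))
        snippet = Evidence(path, start, end, "\n".join(lines[start - 1:end]))
        self.evidence.append(snippet)
        return snippet

    def record_judgment(self, claim: str, status: str) -> None:
        if claim not in self.claims:
            raise ValueError("judgment recorded for an unknown claim")
        self.judgments[claim] = status

    def render_report(self) -> str:
        rows = [
            f"- {claim}: {self.judgments.get(claim, '(no judgment recorded)')}"
            for claim in self.claims
        ]
        return "\n".join([f"Question: {self.question}", *rows])
```

The point of the sketch is that every judgment is tied to a stored claim and every evidence read is bounded, so the report can be reconstructed from the trail alone.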
Install
Install the latest published package when it is available:
pipx install plumbref
Beta install from GitHub:
pipx install git+https://github.com/facundotaboada/plumbref.git
For local development:
python -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
Plumbref uses ripgrep (rg) for repository search. Confirm that it is installed:
rg --version
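The same check can be scripted, for example in a setup or CI step. A minimal sketch using only the standard library; the function name `ripgrep_available` is hypothetical and not part of Plumbref:

```python
import shutil


def ripgrep_available() -> bool:
    """Return True when an `rg` executable is found on PATH."""
    return shutil.which("rg") is not None


if __name__ == "__main__":
    if ripgrep_available():
        print("ripgrep found")
    else:
        print("ripgrep missing: install it before running Plumbref")
```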
CLI
Run the MCP server against a repository:
plumbref mcp --repo-root /path/to/repo
Run a local verification smoke test:
plumbref verify \
--repo-root /path/to/repo \
--question "What does this scheduled job do?" \
--answer answer.md
Use a config file:
plumbref verify \
--repo-root /path/to/repo \
--config /path/to/plumbref.toml \
--question "What happens if provider_id is missing?" \
--answer answer.md
Explicit modes are available:
plumbref verify \
--repo-root /path/to/repo \
--mode scenario \
--scenario "run_scheduled_job receives provider_id=None" \
--budget-mode normal \
--output-mode engineer \
--output-mode json \
--question "What happens if provider_id is missing?" \
--answer answer.md
For change-impact checks:
plumbref verify \
--repo-root /path/to/repo \
--mode change_impact \
--changed-file app/reports.py \
--question "What does this change affect?" \
--answer impact.md
The CLI does not extract claims automatically. For a full workflow, use MCP or
pass a JSON claims file with --claims.
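This README does not document the schema of the claims file. The sketch below assumes it mirrors the MCP claims payload shown under MCP Workflow (a top-level `claims` list of objects with `text`, `claim_type`, and `risk`); treat that shape, and the helper name `write_claims_file`, as assumptions to check against the tool itself.

```python
import json
from pathlib import Path


def write_claims_file(path: Path, claims: list) -> None:
    """Write claims in the assumed {"claims": [...]} shape."""
    for claim in claims:
        if not claim.get("text"):
            raise ValueError("each claim needs a non-empty 'text'")
    payload = {"claims": claims}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    write_claims_file(
        Path("claims.json"),
        [
            {
                "text": "run_scheduled_job returns skipped when provider_id is missing.",
                "claim_type": "behavior",
                "risk": "medium",
            }
        ],
    )
```

If the schema assumption holds, the resulting file would be passed as --claims claims.json alongside the other verify flags.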
Config
Config discovery order:
- explicit --config
- <repo-root>/.plumbref.local.toml
- <repo-root>/.plumbref.toml
- ~/.config/plumbref/config.toml
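The discovery order can be sketched as a first-match lookup. This assumes that discovery stops at the first file that exists and that an explicit --config always wins; whether Plumbref instead merges several files is not stated here. The function name `discover_config` and the `home` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def discover_config(
    repo_root: Path,
    explicit: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config path in discovery order, or None."""
    if explicit is not None:
        return explicit  # assumed: an explicit --config takes precedence
    home = home or Path.home()
    candidates = [
        repo_root / ".plumbref.local.toml",
        repo_root / ".plumbref.toml",
        home / ".config" / "plumbref" / "config.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Under this reading, a local override file shadows the shared repository config, which in turn shadows the per-user config.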
Example:
ignored_paths = [
".git",
".venv",
"node_modules",
".cache",
]
privacy_patterns = [
"AKIA[0-9A-Z]{16}",
"(?i)(api[_-]?key|secret|token|password)\\s*[:=]\\s*['\\\"][^'\\\"]+['\\\"]",
]
default_budget_mode = "normal"
default_output_modes = ["engineer", "json"]
redaction_patterns is accepted as an alias for privacy_patterns.
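To see what the example privacy_patterns match, the sketch below applies them with Python's `re` module. The patterns are the example values after TOML string unescaping. The `redact` helper and the `[REDACTED]` mask are assumptions for illustration; Plumbref's actual replacement text and matching engine may differ.

```python
import re

# The two example patterns from the config above, written as raw strings.
PRIVACY_PATTERNS = [
    r"AKIA[0-9A-Z]{16}",
    r"""(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]+['\"]""",
]


def redact(text: str, patterns=PRIVACY_PATTERNS, mask: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with the mask."""
    for pattern in patterns:
        text = re.sub(pattern, mask, text)
    return text


if __name__ == "__main__":
    print(redact('API_KEY = "abc123"'))
    print(redact("aws id AKIAABCDEFGHIJKLMNOP in logs"))
```

Note that the second pattern only matches quoted values, so an unquoted assignment such as token=abc would pass through unredacted.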
MCP Setup
Plumbref is a stdio MCP server. Any MCP-capable client can launch it with:
plumbref mcp --repo-root /path/to/repo
Cursor-style MCP config:
{
"mcpServers": {
"plumbref": {
"command": "plumbref",
"args": ["mcp", "--repo-root", "/path/to/repo"]
}
}
}
With explicit config:
{
"mcpServers": {
"plumbref": {
"command": "plumbref",
"args": [
"mcp",
"--repo-root",
"/path/to/repo",
"--config",
"/path/to/repo/.plumbref.toml"
]
}
}
}
Claude Code, Codex, and other MCP clients generally use the same command/args shape for stdio servers. Use the client-specific location for MCP server configuration and point it at the same command.
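When the same server entry is needed for several repositories or clients, it can be generated rather than hand-edited. A minimal sketch that reproduces the JSON shape shown above; the helper name `mcp_config` is hypothetical, and where the output belongs depends on the client.

```python
import json
from typing import Optional


def mcp_config(repo_root: str, config: Optional[str] = None) -> dict:
    """Build the mcpServers entry for a stdio Plumbref server."""
    args = ["mcp", "--repo-root", repo_root]
    if config is not None:
        args += ["--config", config]
    return {"mcpServers": {"plumbref": {"command": "plumbref", "args": args}}}


if __name__ == "__main__":
    print(json.dumps(mcp_config("/path/to/repo"), indent=2))
```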
MCP Workflow
Start a session:
{
"question": "What does this scheduled job do?",
"answer": "The scheduled job queues provider sync work when provider_id is present.",
"mode": "explanation",
"budget_mode": "normal",
"output_modes": ["engineer", "json"]
}
Store claims extracted by the agent:
{
"claims": [
{
"text": "The scheduled job queues provider sync work when provider_id is present.",
"claim_type": "behavior",
"risk": "medium"
}
]
}
Then search, read evidence, record a judgment, and render the report with the MCP tools exposed by the server.
Explanation Mode
Use explanation mode for claims about current source behavior.
Question:
What does this scheduled job do?
Claim:
{
"text": "The scheduled job queues provider sync work when provider_id is present.",
"claim_type": "behavior",
"risk": "medium"
}
Plumbref should mark it supported only when the agent cites source lines
that show the queued path and has checked for relevant contradictions.
Scenario Mode
Use scenario mode for predicted outcomes.
Question:
What happens if provider_id is missing?
Start payload:
{
"mode": "scenario",
"scenario": "run_scheduled_job receives provider_id=None.",
"question": "What happens if provider_id is missing?",
"answer": "The scheduled job is skipped.",
"output_modes": ["engineer", "json"]
}
Predicted outcome claim:
{
"text": "run_scheduled_job returns skipped when provider_id is missing.",
"expected_outcome": "The scheduled job is skipped.",
"assumptions": ["provider_id is None."],
"claim_type": "behavior",
"risk": "medium"
}
Change-Impact Mode
Use change-impact mode to verify a factual impact statement against changed files, a diff, or a local git diff target.
Impact statement:
This change only affects report wording.
Start payload:
{
"mode": "change_impact",
"question": "What does this change affect?",
"answer": "This change only affects report wording.",
"budget_mode": "normal",
"output_modes": ["engineer", "json"]
}
Record explicit changed files:
{
"source": "files",
"changed_files": ["app/reports.py"],
"changed_symbols": [
{
"name": "render_report_title",
"kind": "function",
"file": "app/reports.py"
}
]
}
Claims containing absolute language such as "only", "always", or "never" require broader contradiction searches before they can be treated as supported.
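A rough way to flag such claims before judging them is a whole-word match. The term list below contains only the three examples named above; the real list may be longer, and the function name `absolute_terms` is hypothetical.

```python
import re

# Only the examples named in this README; the real list may be longer.
ABSOLUTE_TERMS = ("only", "always", "never")
_ABSOLUTE_RE = re.compile(
    r"\b(" + "|".join(ABSOLUTE_TERMS) + r")\b", re.IGNORECASE
)


def absolute_terms(claim: str) -> list:
    """Return the absolute terms found in a claim, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in _ABSOLUTE_RE.finditer(claim)})


if __name__ == "__main__":
    print(absolute_terms("This change only affects report wording."))
```

The word boundaries keep words such as "commonly" from matching "only". A non-empty result would be the cue to run the broader contradiction search.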
Status Semantics
- supported: cited source evidence supports the claim as written, and a contradiction pass was recorded.
- too_broad: evidence supports a narrower or qualified version, but not the claim as written.
- uncertain: relevant evidence exists, but it is insufficient for a confident judgment.
- contradicted: source evidence conflicts with the claim.
- not_found: searches did not find relevant evidence.
- not_verifiable: the claim cannot be verified from local source evidence.
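The statuses can be read as a conservative decision order. The sketch below is one possible reading, not Plumbref's implementation: the agent records the judgment, and the `judge` function, its parameters, and the order of the checks are assumptions. Only the status values come from the definitions above.

```python
from enum import Enum


class Status(str, Enum):
    SUPPORTED = "supported"
    TOO_BROAD = "too_broad"
    UNCERTAIN = "uncertain"
    CONTRADICTED = "contradicted"
    NOT_FOUND = "not_found"
    NOT_VERIFIABLE = "not_verifiable"


def judge(
    *,
    verifiable_locally: bool,
    relevant_evidence: bool,
    conflicting: bool,
    supports_as_written: bool,
    supports_narrower: bool,
    contradiction_pass: bool,
) -> Status:
    """Pick the most cautious status consistent with the inputs."""
    if not verifiable_locally:
        return Status.NOT_VERIFIABLE
    if not relevant_evidence:
        return Status.NOT_FOUND
    if conflicting:
        return Status.CONTRADICTED
    # "supported" needs both matching evidence and a recorded contradiction pass.
    if supports_as_written and contradiction_pass:
        return Status.SUPPORTED
    if supports_narrower:
        return Status.TOO_BROAD
    return Status.UNCERTAIN
```

In this reading, evidence that matches the claim but lacks a recorded contradiction pass falls through to uncertain rather than supported.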
Reports And Cache
By default, reports are written under:
.cache/plumbref/reports/
Generated reports and caches are ignored by the project .gitignore.
Development
Run tests:
pytest
Run lint:
ruff check .
Limitations
- Plumbref does not extract claims by itself.
- Plumbref does not decide truth with an LLM.
- Plumbref cannot verify claims that require production data, private services, or external systems unless the relevant evidence exists in the local repository.
- Plumbref search is lexical and repo-local.
- supported means supported by the cited source evidence, not globally true for every deployment or runtime state.
Non-Goals
- no model API dependency
- no hosted service
- no database
- no vector store
- no UI
- no automatic code review replacement
- no production-data inspection