
openclaw-output-vetter-mcp

MCP server for verifying AI agent claims vs reality — single-transcript inline grounding-check that flags when an agent's response states facts not in the input context, when its code silently swallows exceptions and substitutes mock data, or when its multi-turn transcript contains contradictions or unverified completion claims. Sub-second, local, free, MCP-native — designed to be called inline from Claude Code / Cursor / Cline / OpenClaw agents during the conversation, not as a separate eval pipeline. The lightweight complement to dashboard-based eval frameworks (DeepEval, Phoenix, LangSmith).

Status: v1.0.0 · License: MIT · MCP · PyPI


What it does

Production AI agents fail in three quiet ways that pass every standard dashboard:

  • Hallucinated claims. r/SaaS founder thread (May 2 2026) verbatim: "Status 200, latency normal, tokens normal. A hallucinated response looks identical to a good one in every standard dashboard." The fix the founder describes is exactly what this MCP server provides: "a lightweight check that flags when the model states something not in the input context."
  • Silent fake success in agent-written code. r/ClaudeAI thread (509 pts, 186 comments) verbatim: "The agent couldn't get auth working, so it quietly inserted a try/catch that returns sample data on failure. The output you saw on day one was never real."
  • Unverified completion claims. r/AI_Agents (114 pts): the agent self-reports completion ("I've configured X"), but reality at the outcome level (booked meetings, deployed services, working integrations) doesn't match.

This MCP server runs three pure-Python checks inline during the conversation — no API key, no LLM-as-judge cost, sub-second:

> claude: did your last answer hallucinate anything?
[MCP tool: verify_response_grounding]

verdict: FABRICATED
ungrounded_count: 3
overall_grounding_score: 0.08
ungrounded claims:
  - "Pixelette Technologies has raised $12M in Series A funding" (overlap 0.04)
  - "led by Sequoia Capital" (overlap 0.00)
  - "47 full-time employees" (overlap 0.00)

summary: All 3 claim(s) lack grounding in the input context — likely hallucinated.

> claude: scan the code you just wrote for swallowed-exception patterns.
[MCP tool: find_swallowed_exceptions]

verdict: FABRICATED (one HIGH-severity finding)
findings:
  [HIGH] Line 12 — mock-substitution
    except Exception:
        return {"id": 1, "name": "sample"}
    Description: except handler returns fabricated/mock data instead of re-raising
    — the call site sees a 'successful' response built from constants. This is the
    silent-fake-success pattern.

summary: 1 swallowing pattern detected — at least one returns fabricated data.

> claude: review the agent's transcript so far.
[MCP tool: review_transcript]

verdict: FABRICATED
issue_count: 2
issues:
  [HIGH] turns [3] — unverified-completion-claim
    "I've configured the gateway and verified everything works."
    Description: assistant claims completion of an action but no tool calls are
    present in this turn or earlier turns.
  [MEDIUM] turns [2, 7] — cross-turn-contradiction
    Cross-turn factual drift on subject 'the api':
    turn 2 says 'returns json for every request',
    turn 7 says 'returns xml for legacy endpoints'.

summary: Reviewed 8 turn(s); flagged 2 issue(s) including unverified completion
claim(s) — investigate before trusting the transcript.
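
Under the hood the transcript check is heuristic pattern matching rather than an LLM judge. A minimal sketch of the unverified-completion-claim heuristic, using an assumed turn schema and regex (neither is the package's actual internals):

```python
# Illustrative sketch only: the regex, turn schema, and function name are assumptions,
# not the real implementation inside openclaw-output-vetter-mcp.
import re

COMPLETION_CLAIM = re.compile(
    r"\bI(?:'ve| have)\s+(?:configured|deployed|set up|verified|fixed|created)\b",
    re.IGNORECASE,
)

def flag_unverified_completions(turns: list[dict]) -> list[dict]:
    """turns: [{'role': 'user'|'assistant'|'tool', 'content': str}, ...]"""
    issues, seen_tool_call = [], False
    for idx, turn in enumerate(turns):
        if turn["role"] == "tool":
            seen_tool_call = True          # some action was actually executed
        if turn["role"] == "assistant" and COMPLETION_CLAIM.search(turn["content"]):
            if not seen_tool_call:         # claims completion, but no tool call so far
                issues.append({
                    "turns": [idx],
                    "kind": "unverified-completion-claim",
                    "severity": "HIGH",
                    "evidence": turn["content"][:120],
                })
    return issues
```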

Why openclaw-output-vetter-mcp

Three things existing eval frameworks (DeepEval, Phoenix, LangSmith, Galileo, Langfuse) don't do well together:

  1. Inline single-transcript scope, not eval-pipeline orchestration. DeepEval ships an MCP server — but its scope is "run evals, pull datasets, and inspect traces straight from claude code, cursor" (verbatim from their docs). That's eval-pipeline orchestration: schedule a named eval suite against a stored dataset; review trace history. This server is the opposite shape: verify this specific conversation right now, before the user sees the response. Same metric stack philosophically (faithfulness, grounding); different surface.

  2. Sub-second + local + free. No LLM-as-judge call, no API key, no per-call cost. Pure-Python claim splitting + Jaccard overlap + AST walking (sketched below, after this list). Tradeoff: lower theoretical accuracy than LLM-as-judge for ambiguous edge cases. For high-frequency inline use (every assistant turn) the speed-vs-accuracy tradeoff favors the lightweight approach. v1.1 will offer an optional DeepEval LLM-as-judge mode for users who want the higher-quality check.

  3. Three checks for three distinct failure modes, not one umbrella metric.

    • Grounding (verify_response_grounding) catches hallucinated facts
    • Swallowed exceptions (find_swallowed_exceptions) catches silent-fake-success in agent-written code
    • Transcript review (review_transcript) catches unverified completion claims + cross-turn drift

    Other tools collapse all three into "faithfulness." The failure modes are different and the corrective actions are different. Surfacing them separately makes the response actionable.
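
A minimal sketch of the claim-splitting + Jaccard-overlap idea from point 2, under stated assumptions (sentence-level claim splitting, an illustrative threshold, best overlap against any context sentence; the package's actual scoring may differ):

```python
# Sketch of the pure-Python grounding check: split the answer into sentence-level
# claims and score each by its best Jaccard overlap with any context sentence.
# Names and the 0.3 threshold are illustrative assumptions.
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def _sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def grounding_report(answer: str, context: str, threshold: float = 0.3) -> dict:
    context_sets = [_tokens(s) for s in _sentences(context)]
    scored = []
    for claim in _sentences(answer):
        claim_set = _tokens(claim)
        best = max(
            (len(claim_set & ctx) / len(claim_set | ctx)
             for ctx in context_sets if claim_set | ctx),
            default=0.0,
        )
        scored.append({"claim": claim, "overlap": round(best, 2),
                       "grounded": best >= threshold})
    ungrounded = [s for s in scored if not s["grounded"]]
    verdict = ("CLEAN" if not ungrounded
               else "FABRICATED" if len(ungrounded) == len(scored)
               else "PARTIALLY_GROUNDED")
    return {"verdict": verdict, "ungrounded_count": len(ungrounded), "claims": scored}
```

The real tool additionally reports an overall_grounding_score and a summary line, as in the demo output above.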

Built for the production AI operator who's already using Claude Code / Cursor / Cline / OpenClaw and wants a defensive layer the agent calls before its response goes user-facing.


Tool surface

| Tool | What it returns |
| --- | --- |
| verify_response_grounding | Per-claim grounded/ungrounded + overall verdict (CLEAN / PARTIALLY_GROUNDED / FABRICATED) + overlap scores + summary |
| find_swallowed_exceptions | Per-finding line number + pattern (pass-only / mock-substitution / silent-log-and-return / bare-except) + severity + code excerpt |
| review_transcript | Per-issue turn indices + issue kind (unverified-completion-claim / cross-turn-contradiction) + severity + evidence excerpt |

Resources:

  • vetter://demo/grounded — sample CLEAN grounding result
  • vetter://demo/fabricated — sample FABRICATED grounding result
  • vetter://demo/swallowed-exceptions — sample swallowed-exception scan

Prompts:

  • verify-this-answer(threshold) — walks verify_response_grounding on the most recent assistant answer
  • audit-this-code — walks find_swallowed_exceptions on a code block + explains each finding's risk

Quickstart

Install

pip install openclaw-output-vetter-mcp

Configure for Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "openclaw-output-vetter": {
      "command": "python",
      "args": ["-m", "openclaw_output_vetter_mcp"]
    }
  }
}

Restart Claude Desktop. Test:

Resource vetter://demo/grounded — read it back to me.

The demo resource returns a sample GroundingResult so you can verify the protocol wiring without authoring inputs.
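
Outside Claude Desktop, you can sanity-check the wiring from a standalone script. A rough sketch using the official mcp Python SDK stdio client (the demo URI is the one listed above; discover each tool's argument schema via list_tools rather than guessing it):

```python
# Hedged sketch: connect to the server over stdio with the official `mcp` SDK,
# list its tools, and read the demo grounding resource.
import asyncio
from pydantic import AnyUrl
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python", args=["-m", "openclaw_output_vetter_mcp"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])    # expect the three tools above
            demo = await session.read_resource(AnyUrl("vetter://demo/grounded"))
            print(demo)                                   # sample CLEAN GroundingResult

asyncio.run(main())
```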


Roadmap

| Version | Scope |
| --- | --- |
| v1.0 | 3 scanners (grounding via Jaccard / swallowed-exceptions via AST / transcript review via pattern matching), 3 tools / 3 demo resources / 2 prompts, GitHub Actions CI matrix, PyPI Trusted Publishing, MCP Registry submission, 40+ tests |
| v1.1 | Optional LLM-as-judge backend (wraps DeepEval's FaithfulnessMetric / HallucinationMetric for higher-quality grounding); embedding-based similarity option (sentence-transformers); custom claim-extraction prompts |
| v1.2 | Backend-pluggable architecture (per-tool backend selection); incremental review (verify only the last N turns); persistent issue tracking across multi-session work |
| v1.x | Webhook emit on FABRICATED verdict; integration with CI to gate AI-generated PRs that fail grounding checks |

Need this adapted to your stack?

If your AI deployment uses a different agent harness, needs custom claim-extraction prompts, targets a language other than Python for the swallowed-exception scanner, or has specific compliance / auditing requirements, that's a Custom MCP Build engagement.

| Tier | Scope | Investment | Timeline |
| --- | --- | --- | --- |
| Simple | Custom claim-extraction prompts + tuned thresholds for your domain | $8,000–$10,000 | 1–2 weeks |
| Standard | Multi-language swallowed-exception scanners (TypeScript / Go / Rust AST walks) + custom severity rules | $15,000–$25,000 | 2–4 weeks |
| Complex | LLM-as-judge backend with your hosted model + persistence + CI integration + audit trail | $30,000–$45,000 | 4–8 weeks |

To engage:

  1. Email temur@pixelette.tech with subject Custom MCP Build inquiry — output verification
  2. Include: 1-paragraph description of your stack + which tier
  3. You'll get a reply within 2 business days with a 30-min discovery-call slot

This server is part of a production-AI infrastructure MCP suite — companion to silentwatch-mcp (cron silent-failure detection), openclaw-health-mcp (deployment health), openclaw-cost-tracker-mcp (token-cost telemetry + 429 prediction), openclaw-skill-vetter-mcp (skill security vetting), and openclaw-upgrade-orchestrator-mcp (upgrade safety + provider-side regression detection). Install all six for full operational visibility.


Production AI audits

If you're running production AI and want an outside practitioner to score readiness, find the failure patterns already present (silent fake success being pattern P3.x in the catalog), and write the corrective-action plan:

| Tier | Scope | Investment | Timeline |
| --- | --- | --- | --- |
| Audit Lite | One system, top-5 findings, written report | $1,500 | 1 week |
| Audit Standard | Full audit, all 14 patterns, 5 Cs findings, 90-day follow-up | $3,000 | 2–3 weeks |
| Audit + Workshop | Standard audit + 2-day team workshop + first monthly audit included | $7,500 | 3–4 weeks |

Same email channel: temur@pixelette.tech with subject AI audit inquiry.


Contributing

PRs welcome. The three scanners are intentionally pluggable — each lives in its own module under src/openclaw_output_vetter_mcp/scanners/ and is a pure function over input → typed result. Adding a new scanner is one file + one test file + one tool registration in server.py.
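
For orientation, a new scanner could look roughly like this: a pure function over source code returning typed findings, here a bare-except detector built on the stdlib ast module (the Finding dataclass and its field names are illustrative; match the existing scanners' result types when contributing):

```python
# Illustrative scanner shape only; the result type and severity values here are
# examples, not the package's actual types.
import ast
from dataclasses import dataclass

@dataclass
class Finding:
    line: int
    pattern: str
    severity: str
    excerpt: str

def scan_bare_except(source: str) -> list[Finding]:
    """Flag `except:` handlers that catch everything without naming an exception type."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(Finding(
                line=node.lineno,
                pattern="bare-except",
                severity="MEDIUM",
                excerpt=ast.get_source_segment(source, node) or "except:",
            ))
    return findings
```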

Bug reports + feature requests: open a GitHub issue.


License

MIT — see LICENSE.


Built by Temur Khan — independent practitioner on production AI systems. Contact: temur@pixelette.tech
