Deterministic evaluation tools for AI coding agents, exposed as an MCP server.
🛡️ agent-eval-mcp
Deterministic Evaluation and Guardrails for AI Coding Agents.
Building autonomous coding agents is easy. Figuring out how to evaluate whether what they've done is actually good is incredibly hard.
agent-eval-mcp is a stateless, deterministic Model Context Protocol (MCP) server that stops AI agents from writing lazy, unverified, or hallucinated code. It provides language-agnostic rulesets and hybrid scoring to grade AI-generated revisions before they get merged.
⚠️ The Problem
When you ask an LLM to evaluate its own code, it suffers from sycophancy. It will confidently tell you its fix is perfect, even when it has:
- Generated dummy patterns like `new HashMap<>()` or `pass`.
- Left `// TODO: implement this` in the production patch.
- Hallucinated the surrounding `SEARCH/REPLACE` context, breaking the Git patch.
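To make the first two failure modes concrete, here is a minimal sketch of a deterministic placeholder check. The pattern names, the regexes, and the `find_placeholders` helper are all hypothetical, chosen for illustration; they are not the package's actual ruleset or API.

```python
import re

# Hypothetical patterns for illustration only; not the package's real rules.
PLACEHOLDER_PATTERNS = {
    "todo_comment": re.compile(r"(//|#)\s*TODO\b", re.IGNORECASE),
    "bare_pass": re.compile(r"^\s*pass\s*$", re.MULTILINE),
    "empty_hashmap": re.compile(r"new\s+HashMap<>\(\)"),
}


def find_placeholders(patch_text: str) -> list[str]:
    """Return the names of every placeholder pattern found in the patch text."""
    return [
        name
        for name, pattern in PLACEHOLDER_PATTERNS.items()
        if pattern.search(patch_text)
    ]


if __name__ == "__main__":
    patch = "// TODO: implement this\nreturn new HashMap<>();"
    print(find_placeholders(patch))
```

Because the check is plain pattern matching, the same patch always yields the same verdict, which is the property an LLM self-review lacks. A regex like `bare_pass` also flags legitimate uses of `pass`, which is one reason a real checker would likely lean on AST rules instead.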
💡 The Solution
This package exposes objective evaluation tools to your agentic workflows via MCP. It evaluates AI-generated `<<<< SEARCH ==== >>>> REPLACE` blocks using fuzzy matching and language-specific Abstract Syntax Tree (AST) rules (Java, Python, TypeScript) to catch hallucinations deterministically.
It completely decouples the heavy lifting of code validation from your LLM orchestration layer.
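As a rough illustration of the fuzzy-matching idea, the sketch below slides a SEARCH block over a file and reports the best similarity ratio using the standard library's `difflib.SequenceMatcher`. The function name `best_match_ratio` and the sliding-window approach are assumptions made for this example; the package's real matching logic and thresholds may differ.

```python
from difflib import SequenceMatcher


def best_match_ratio(search_block: str, file_text: str) -> float:
    """Return the highest similarity (0.0 to 1.0) between the SEARCH block
    and any same-sized window of lines in the file."""
    search_lines = search_block.strip("\n").splitlines()
    file_lines = file_text.splitlines()
    window = len(search_lines)
    if window == 0 or window > len(file_lines):
        return 0.0

    target = "\n".join(search_lines)
    best = 0.0
    for start in range(len(file_lines) - window + 1):
        candidate = "\n".join(file_lines[start:start + window])
        # autojunk=False keeps the comparison exact for longer blocks.
        ratio = SequenceMatcher(None, target, candidate, autojunk=False).ratio()
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    source = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
    real_context = "def add(a, b):\n    return a + b"
    invented_context = "class Foo:\n    x = compute_everything()"
    print(best_match_ratio(real_context, source))
    print(best_match_ratio(invented_context, source))
```

A SEARCH block copied from the file scores 1.0, while invented context scores lower. A caller could reject any patch whose best ratio falls under a chosen cutoff; that cutoff is a tuning decision this sketch leaves open.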
🚀 Quickstart
1. Install the Package
Install globally via pip so your MCP clients can execute it:
pip install agent-eval-mcp
File details
Details for the file agent_eval_mcp-0.1.0.tar.gz.
File metadata
- Download URL: agent_eval_mcp-0.1.0.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e612132d5b25c9c12c14be8eea04893628ea5f205ddbc50ef95501ef5d52b303 |
| MD5 | b911980759e1c6dd55c75442041f2713 |
| BLAKE2b-256 | 4a74cad2d9e12aa82daa14d4b639026520e90e5b5590b19974f3cb12e8b0069b |
File details
Details for the file agent_eval_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agent_eval_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb11085d9d0338550b1c8bc20605fdab4f1346e83f29070b77c639fea9fc5046 |
| MD5 | 313cacb194b7b74ed848193243812c5f |
| BLAKE2b-256 | 4a33af47254ee54714cdff8b8fe2139d185bc195f851f0ccf405e42244f24bb3 |