Deterministic evaluation tools for AI coding agents, exposed as an MCP server.

🛡️ agent-eval-mcp

Deterministic Evaluation and Guardrails for AI Coding Agents.

MCP compatible | Python 3.10+ | License: MIT

Building an autonomous coding agent is the easy part. Verifying that what it produced is actually correct is much harder.

agent-eval-mcp is a stateless, deterministic Model Context Protocol (MCP) server that catches lazy, unverified, or hallucinated code written by AI agents. It provides language-agnostic rulesets and hybrid scoring to grade AI-generated revisions before they are merged.
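The page does not explain how the hybrid score is computed. One plausible reading, assumed here purely for illustration, pairs hard pass/fail gates with a weighted average of softer checks. CheckResult and hybrid_score are hypothetical names and are not part of the package's API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one deterministic check (hypothetical structure)."""
    name: str
    passed: bool
    weight: float = 1.0
    hard_gate: bool = False  # a failed hard gate forces the total score to 0


def hybrid_score(results: list[CheckResult]) -> float:
    """Combine check results into a score between 0.0 and 1.0.

    Any failed hard gate (for example, a SEARCH block that does not match
    the file) rejects the revision outright. Otherwise the score is the
    weighted fraction of soft checks that passed.
    """
    if any(r.hard_gate and not r.passed for r in results):
        return 0.0
    soft = [r for r in results if not r.hard_gate]
    total_weight = sum(r.weight for r in soft)
    if total_weight == 0:
        return 1.0  # no soft checks ran and no hard gate failed
    return sum(r.weight for r in soft if r.passed) / total_weight


results = [
    CheckResult("search_block_matches", passed=True, hard_gate=True),
    CheckResult("no_todo_comments", passed=False, weight=1.0),
    CheckResult("no_placeholder_bodies", passed=True, weight=3.0),
]
print(hybrid_score(results))  # 0.75
```

Using hard gates means a structurally broken patch can never be outweighed by high marks on stylistic checks.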

⚠️ The Problem

When you ask an LLM to evaluate its own code, it tends toward sycophancy. It will confidently report that its fix is correct even when it has:

  • Generated placeholder code such as new HashMap<>() or a bare pass.
  • Left a // TODO: implement this comment in the production patch.
  • Hallucinated the SEARCH context of a SEARCH/REPLACE block, so the patch no longer applies.
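Here is a minimal sketch of how the first two patterns can be detected deterministically. It uses Python's standard ast and tokenize modules and handles Python source only. It does not reproduce the package's own Java or TypeScript rules, and find_placeholders is a hypothetical name, not the package's API.

```python
import ast
import io
import re
import tokenize

# Matches TODO/FIXME markers inside Python comments.
TODO_PATTERN = re.compile(r"#\s*(TODO|FIXME)\b", re.IGNORECASE)


def _is_placeholder_stmt(stmt: ast.stmt) -> bool:
    """True for `pass`, a bare `...`, or a lone string such as a docstring."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return stmt.value.value is Ellipsis or isinstance(stmt.value.value, str)
    return False


def find_placeholders(source: str) -> list[str]:
    """Report stub function bodies and TODO comments in Python source."""
    findings = []
    # Pass 1: functions whose body consists only of placeholder statements.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if all(_is_placeholder_stmt(s) for s in node.body):
                findings.append(f"line {node.lineno}: stub body in {node.name}()")
    # Pass 2: real comment tokens only, so "# TODO" inside a string is ignored.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and TODO_PATTERN.search(tok.string):
            findings.append(f"line {tok.start[0]}: {tok.string.strip()}")
    return findings


patch = '''
def load_config(path):
    # TODO: implement this
    pass

def add(a, b):
    return a + b
'''
for finding in find_placeholders(patch):
    print(finding)
# line 2: stub body in load_config()
# line 3: # TODO: implement this
```

Comments are taken from the tokenizer instead of a plain line regex because the AST discards comments, and a line regex would also flag text inside string literals.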

💡 The Solution

This package exposes objective evaluation tools to your agentic workflows via the Model Context Protocol (MCP). It checks AI-generated <<<< SEARCH ==== >>>> REPLACE blocks using fuzzy matching and language-specific Abstract Syntax Tree (AST) rules for Java, Python, and TypeScript, so hallucinations are caught deterministically.
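The page does not document the exact block syntax or the matching algorithm. The sketch below assumes that each marker line consists of four or more <, =, or > characters, which accepts both the markers shown above and the seven-character form some tools use. It uses difflib.SequenceMatcher from the standard library as a stand-in for the package's fuzzy matcher. check_blocks, best_match_ratio, and the 0.9 threshold are hypothetical.

```python
import difflib
import re

# Assumed syntax: "<<<< SEARCH", the old text, "====", the new text, ">>>> REPLACE".
BLOCK_RE = re.compile(
    r"^<{4,} SEARCH\n(?P<search>.*?)^={4,}\n(?P<replace>.*?)^>{4,} REPLACE$",
    re.DOTALL | re.MULTILINE,
)


def best_match_ratio(search: str, file_text: str) -> float:
    """Best similarity (0.0 to 1.0) between the SEARCH text and any window of
    the file with the same number of lines; 1.0 means an exact match."""
    search_lines = search.splitlines()
    file_lines = file_text.splitlines()
    n = len(search_lines)
    if n == 0 or n > len(file_lines):
        return 0.0  # empty or oversized SEARCH text cannot anchor an edit
    needle = "\n".join(search_lines)
    best = 0.0
    for start in range(len(file_lines) - n + 1):
        window = "\n".join(file_lines[start:start + n])
        best = max(best, difflib.SequenceMatcher(None, needle, window).ratio())
    return best


def check_blocks(response: str, file_text: str, threshold: float = 0.9) -> list[str]:
    """Classify each SEARCH block as exact, near-miss, or hallucinated."""
    verdicts = []
    for m in BLOCK_RE.finditer(response):
        ratio = best_match_ratio(m.group("search"), file_text)
        if ratio == 1.0:
            verdicts.append("exact")
        elif ratio >= threshold:
            verdicts.append("near-miss")  # e.g. whitespace drift; may be repairable
        else:
            verdicts.append("hallucinated")  # context does not exist in the file
    return verdicts


original = "def greet(name):\n    return 'hi ' + name\n"
response = """<<<< SEARCH
def greet(name):
    return 'hi ' + name
====
def greet(name):
    return f'hi {name}'
>>>> REPLACE
<<<< SEARCH
def greet(name):
  return 'hi ' + name
====
def greet(name):
  return f'hi {name}'
>>>> REPLACE
<<<< SEARCH
class Session:
    timeout = 30
====
class Session:
    timeout = 60
>>>> REPLACE"""
print(check_blocks(response, original))
```

A three-way verdict lets a pipeline auto-repair whitespace drift while rejecting SEARCH text that never existed in the file.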

It decouples code validation from your LLM orchestration layer.

🚀 Quickstart

1. Install the Package

Install globally via pip so your MCP clients can execute it:

pip install agent-eval-mcp


Download files

Source Distribution

agent_eval_mcp-0.1.0.tar.gz (10.9 kB)

Built Distribution

agent_eval_mcp-0.1.0-py3-none-any.whl (15.9 kB)

File details

Details for the file agent_eval_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: agent_eval_mcp-0.1.0.tar.gz
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for agent_eval_mcp-0.1.0.tar.gz:

  • SHA256: e612132d5b25c9c12c14be8eea04893628ea5f205ddbc50ef95501ef5d52b303
  • MD5: b911980759e1c6dd55c75442041f2713
  • BLAKE2b-256: 4a74cad2d9e12aa82daa14d4b639026520e90e5b5590b19974f3cb12e8b0069b

File details

Details for the file agent_eval_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agent_eval_mcp-0.1.0-py3-none-any.whl
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for agent_eval_mcp-0.1.0-py3-none-any.whl:

  • SHA256: cb11085d9d0338550b1c8bc20605fdab4f1346e83f29070b77c639fea9fc5046
  • MD5: 313cacb194b7b74ed848193243812c5f
  • BLAKE2b-256: 4a33af47254ee54714cdff8b8fe2139d185bc195f851f0ccf405e42244f24bb3