Pytest-style framework for evaluating Model Context Protocol (MCP) servers.

Project description

MCP-Eval: An Evaluation Framework for MCP Servers

MCP-Eval is a developer-first testing framework for Model Context Protocol (MCP) servers, built on the mcp-agent library. It enables you to write clear, concise, and powerful tests to evaluate the performance, reliability, and correctness of your AI agents and the MCP servers they connect to.

Core Features

  • Task-Based Testing: Define tests as async functions where an agent performs a task.
  • Automatic Metrics: Collects detailed metrics on latency, token usage, cost, and tool calls for every test run.
  • Rich Assertions: A set of assertions designed for AI testing, including:
    • contains(): Checks for substrings in responses.
    • tool_was_called(): Verifies that a specific tool was used.
    • tool_arguments_match(): Checks if a tool was called with the correct arguments.
    • cost_under(): Asserts that a test run stays within a defined cost budget.
    • number_of_steps_under(): Ensures an agent completes a task efficiently.
    • objective_succeeded(): Uses an LLM to verify if the agent's response achieved the overall goal.
    • plan_is_efficient(): Uses an LLM to check for redundant or inefficient steps in the agent's execution path.
  • Tool Coverage Reporting: Calculates the percentage of a server's tools that are exercised by your test suite.
  • Automated Test Generation: A CLI tool that generates a baseline test suite for any MCP server.
  • Detailed Reports: Get immediate feedback from rich console reports and generate detailed JSON reports for CI/CD or further analysis.
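To make the deterministic checks in the list above concrete, here is a minimal sketch of what each one verifies, plus the tool-coverage percentage. It is not the mcp_eval API: `RunRecord`, `ToolCall`, the function signatures, and the choice to count one tool call as one step are all assumptions made for illustration. The LLM-judged checks (objective_succeeded() and plan_is_efficient()) are left out because they need a model call.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation made by the agent."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class RunRecord:
    """Hypothetical summary of one test run (not an mcp_eval type)."""
    response: str
    tool_calls: list[ToolCall] = field(default_factory=list)
    cost_usd: float = 0.0


def contains(run: RunRecord, text: str) -> bool:
    # Substring check on the agent's final response.
    return text in run.response


def tool_was_called(run: RunRecord, name: str) -> bool:
    # True if any recorded call used the named tool.
    return any(call.name == name for call in run.tool_calls)


def tool_arguments_match(run: RunRecord, name: str, expected: dict[str, Any]) -> bool:
    # True if some call to `name` carried every expected key/value pair.
    return any(
        call.name == name
        and all(k in call.arguments and call.arguments[k] == v for k, v in expected.items())
        for call in run.tool_calls
    )


def cost_under(run: RunRecord, max_cost_usd: float) -> bool:
    # Budget check on the recorded cost of the run.
    return run.cost_usd < max_cost_usd


def number_of_steps_under(run: RunRecord, max_steps: int) -> bool:
    # Assumption: one tool call counts as one step.
    return len(run.tool_calls) < max_steps


def tool_coverage(runs: list[RunRecord], server_tools: list[str]) -> float:
    # Percentage of the server's tools called at least once across all runs.
    if not server_tools:
        return 0.0
    called = {call.name for run in runs for call in run.tool_calls}
    return 100.0 * len(called & set(server_tools)) / len(server_tools)


if __name__ == "__main__":
    run = RunRecord(
        response="The page title is Example Domain",
        tool_calls=[ToolCall("fetch", {"url": "https://example.com"})],
        cost_usd=0.002,
    )
    print(contains(run, "Example Domain"), tool_was_called(run, "fetch"))
    print(tool_coverage([run], ["fetch", "search"]))
```

In this sketch a run that calls only `fetch` on a server exposing `fetch` and `search` yields a coverage of 50.0, which is the kind of figure the coverage report summarizes across a whole suite.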

Getting Started

1. Installation

Install the dependencies that mcp_eval needs, and make sure mcp-agent is also installed in your environment.

pip install "typer[all]" rich pydantic jinja2
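After running the command above, a quick way to confirm the environment is ready is to check that each dependency can be imported. The sketch below uses only the standard library. The import names are assumptions, in particular `mcp_agent` as the module name for the mcp-agent package; `missing_modules` is a hypothetical helper, not part of mcp_eval.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above, plus mcp-agent.
    required = ["typer", "rich", "pydantic", "jinja2", "mcp_agent"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```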


Download files

Source Distribution

mcpevals-0.1.1.tar.gz (298.7 kB)

Uploaded Source

Built Distribution

mcpevals-0.1.1-py3-none-any.whl (292.7 kB)

Uploaded Python 3

File details

Details for the file mcpevals-0.1.1.tar.gz.

File metadata

  • Download URL: mcpevals-0.1.1.tar.gz
  • Upload date:
  • Size: 298.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.5

File hashes

Hashes for mcpevals-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        3d088471f1a7d68e9f1f74255b5ace4b17a09cd05355978d5c3400769a3c9790
MD5           12828a9f903946b9c207148a756e7fe2
BLAKE2b-256   19a20489f8146565d42d1dd5e87fde9305505e78963cf15ea5458260bcb67156

File details

Details for the file mcpevals-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mcpevals-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 292.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.5

File hashes

Hashes for mcpevals-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        8d12e8fac6b3553770ab8b1338943987124c8fbe4699311519dd2e8ab9341def
MD5           4858057dc460dcae8dbd935ddf1f3e4c
BLAKE2b-256   70a547e8f7389478dbdead2d9f41552ec55ddbd606d7a09b50595122647ed420