
mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

Installation

pip install mcp-data-check

Or install from source:

pip install -e .

Usage

Python API

from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

Options:

  • -q, --questions: Path to questions CSV file (required)
  • -k, --api-key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
  • -o, --output: Output directory for results (default: ./results)
  • -m, --model: Claude model to use (default: claude-sonnet-4-20250514)
  • -n, --server-name: Name for the MCP server (default: mcp-server)
  • -v, --verbose: Print detailed progress

Questions CSV Format

The questions CSV file must have three columns:

  • question: The question to ask the MCP server
  • expected_answer: The expected answer to compare against
  • eval_type: Evaluation method: numeric, string, or llm_judge

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge

Evaluation Types

  • numeric: Extracts a number from the response and compares it to the expected value with a 5% tolerance
  • string: Checks whether the expected string appears in the response (case-insensitive)
  • llm_judge: Uses Claude to judge semantically whether the response is correct
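A minimal sketch of how a numeric check like this can work (the helper names, the number-extraction regex, and the relative-tolerance formula are assumptions for illustration, not the package's actual implementation):

```python
import re

def extract_number(text):
    """Pull the first number (commas allowed) out of a response string."""
    match = re.search(r"-?\d[\d,]*\.?\d*", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

def numeric_passes(response, expected, tolerance=0.05):
    """True if the extracted number is within `tolerance` (relative) of expected."""
    actual = extract_number(response)
    if actual is None:
        return False
    expected = float(expected)
    if expected == 0:
        return actual == 0
    return abs(actual - expected) / abs(expected) <= tolerance

numeric_passes("There were 1,250 grants awarded in 2023.", "1234")  # → True (within 5%)
```

A relative (rather than absolute) tolerance keeps the check meaningful across answers of very different magnitudes.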

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None,
            "time_to_answer": 2.35,
            "tools_called": [
                {
                    "tool_name": "get_grants",
                    "server_name": "mcp-server",
                    "input": {"year": 2023}
                }
            ]
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
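As a usage sketch, the per-type counts in by_eval_type can be turned into pass rates; the results literal below simply mirrors the example shape above (pass_rates is an illustrative helper, not part of the package):

```python
def pass_rates(summary):
    """Compute a pass rate per eval_type from the summary's by_eval_type counts."""
    return {
        eval_type: (stats["passed"] / stats["total"] if stats["total"] else 0.0)
        for eval_type, stats in summary["by_eval_type"].items()
    }

# Mirrors the example return value shown above.
results = {
    "summary": {
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1},
        }
    }
}

print(pass_rates(results["summary"]))
# → {'numeric': 0.8, 'string': 1.0, 'llm_judge': 0.5}
```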

Result Fields

Each result in the results array contains:

  • question: The original question asked
  • expected_answer: The expected answer from the CSV
  • eval_type: The evaluation method used
  • model_response: The model's full response text
  • passed: Whether the evaluation passed
  • details: Additional evaluation details
  • error: Error message if the evaluation failed, otherwise None
  • time_to_answer: Response time in seconds for the MCP server call
  • tools_called: List of MCP tools invoked during the response

The tools_called array contains objects with:

  • tool_name: Name of the MCP tool called
  • server_name: Name of the MCP server that provided the tool
  • input: Parameters passed to the tool
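Putting these fields together, one way to pull out just the failures for debugging is a sketch like the following; it works against the result shape documented above, but failed_questions itself is not a function the package provides, and the sample values are made up:

```python
def failed_questions(results):
    """Collect failed results along with the MCP tools each one invoked."""
    return [
        {
            "question": r["question"],
            "expected": r["expected_answer"],
            "got": r["model_response"],
            "tools": [t["tool_name"] for t in r.get("tools_called", [])],
        }
        for r in results["results"]
        if not r["passed"]
    ]

# Example input using the documented result fields (values are made up):
results = {
    "results": [
        {"question": "How many grants were awarded in 2023?",
         "expected_answer": "1234", "eval_type": "numeric",
         "model_response": "There were 1,234 grants.", "passed": True,
         "tools_called": [{"tool_name": "get_grants",
                           "server_name": "mcp-server",
                           "input": {"year": 2023}}]},
        {"question": "What organization received the most funding?",
         "expected_answer": "NIH", "eval_type": "string",
         "model_response": "The NSF received the most.", "passed": False,
         "tools_called": [{"tool_name": "get_grants",
                           "server_name": "mcp-server",
                           "input": {}}]},
    ]
}

for failure in failed_questions(results):
    print(f"FAILED: {failure['question']} (used {failure['tools']})")
```

Inspecting tools_called alongside a failure often shows whether the model called the wrong tool or passed the wrong parameters.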

Requirements

  • Python 3.10+
  • Anthropic API key with MCP beta access

Download files

Download the file for your platform.

Source Distribution

mcp_data_check-0.4.0.tar.gz (69.4 kB)


Built Distribution


mcp_data_check-0.4.0-py3-none-any.whl (12.9 kB)


File details

Details for the file mcp_data_check-0.4.0.tar.gz.

File metadata

  • Download URL: mcp_data_check-0.4.0.tar.gz
  • Upload date:
  • Size: 69.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_data_check-0.4.0.tar.gz
  • SHA256: 2ef251ac320e4e5139683a11cd2768b0e59d24c0bb08ef251f949b8c59a1e054
  • MD5: 4573b65884109c9906f551eed46b7aca
  • BLAKE2b-256: 1fb7ca32990e5da6e6dff1079b52b0ea099efa9848dfebcb87a377919c81eb5e


Provenance

The following attestation bundles were made for mcp_data_check-0.4.0.tar.gz:

Publisher: publish.yml on GSA-TTS/mcp-data-check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcp_data_check-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: mcp_data_check-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_data_check-0.4.0-py3-none-any.whl
  • SHA256: 2eb7961c8900661c225ff9e3b6d8f703d8c802aa5506ef4e2a37943b2a5dc220
  • MD5: 8aef234a512d4136048cd8a665dcfa6a
  • BLAKE2b-256: 1569a7a4abf5b231e9859995f9cb90b954055fc2af465ed6cac94c0bcf005809


Provenance

The following attestation bundles were made for mcp_data_check-0.4.0-py3-none-any.whl:

Publisher: publish.yml on GSA-TTS/mcp-data-check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
