
mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

Installation

pip install mcp-data-check

Or install from source:

pip install -e .

Usage

Python API

from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")
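Beyond the summary, the returned dictionary carries a per-question results list (see "Return Value" below), which makes it easy to pull out just the failures. A minimal sketch — `failed_results` is a hypothetical helper, not part of the package:

```python
def failed_results(results: dict) -> list[dict]:
    """Return the result entries that did not pass, using the
    question/passed/error fields documented under "Return Value"."""
    return [r for r in results["results"] if not r["passed"]]
```

For example, `for r in failed_results(results): print(r["question"], r["error"])` prints each failing question alongside any error message.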

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

Options:

  • -q, --questions: Path to questions CSV file (required)
  • -k, --api-key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
  • -o, --output: Output directory for results (default: ./results)
  • -m, --model: Claude model to use (default: claude-sonnet-4-20250514)
  • -n, --server-name: Name for the MCP server (default: mcp-server)
  • -v, --verbose: Print detailed progress

Questions CSV Format

The questions CSV file must have three columns:

  • question: The question to ask the MCP server
  • expected_answer: The expected answer to compare against
  • eval_type: Evaluation method: numeric, string, or llm_judge

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
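A malformed CSV is easiest to catch before spending API calls. The sketch below is a hypothetical pre-flight check (not part of the package) that verifies the three required columns and the allowed eval_type values using Python's standard csv module:

```python
import csv

VALID_EVAL_TYPES = {"numeric", "string", "llm_judge"}
REQUIRED_COLUMNS = {"question", "expected_answer", "eval_type"}

def validate_questions(path: str) -> list[str]:
    """Return a list of problems found in a questions CSV:
    missing required columns or unknown eval_type values."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        # Data rows start on line 2, after the header.
        for lineno, row in enumerate(reader, start=2):
            if row["eval_type"] not in VALID_EVAL_TYPES:
                problems.append(
                    f"line {lineno}: unknown eval_type {row['eval_type']!r}"
                )
    return problems
```

An empty return value means the file is structurally sound; anything else lists the offending lines.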

Evaluation Types

  • numeric: Extracts numbers from the response and compares them to the expected value within a 5% tolerance
  • string: Checks whether the expected string appears in the response (case-insensitive)
  • llm_judge: Uses Claude to judge whether the response is semantically correct
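The numeric check is the least obvious of the three. The sketch below illustrates the general idea — extract a number from free-form text and compare it within a relative tolerance. The actual extraction and comparison logic in mcp-data-check may differ:

```python
import re

def within_tolerance(response: str, expected: float, tol: float = 0.05) -> bool:
    """Illustrative numeric check: pull the first number out of the
    response text and accept it if it is within a relative tolerance
    (5% by default) of the expected value."""
    # Drop thousands separators so "1,250" parses as one number.
    match = re.search(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    if match is None:
        return False
    value = float(match.group())
    return abs(value - expected) <= tol * abs(expected)
```

Under this scheme, a response of "There were 1,250 grants" would pass against an expected answer of 1234 (within 5%), while "about 900" would not.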

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None,
            "time_to_answer": 2.35,
            "tools_called": [
                {
                    "tool_name": "get_grants",
                    "server_name": "mcp-server",
                    "input": {"year": 2023}
                }
            ]
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
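The summary block above already breaks results down by evaluation type, so reporting is a matter of formatting those keys. A small sketch — `summarize` is a hypothetical helper built on the dictionary shape shown above:

```python
def summarize(results: dict) -> str:
    """Format the summary block as a one-line report, using the
    summary/by_eval_type keys from run_evaluation's return value."""
    s = results["summary"]
    parts = [f"{s['passed']}/{s['total']} passed ({s['pass_rate']:.0%})"]
    for etype, counts in s.get("by_eval_type", {}).items():
        parts.append(f"{etype}: {counts['passed']}/{counts['total']}")
    return "; ".join(parts)
```

Applied to the example dictionary above, this yields a line like "8/10 passed (80%); numeric: 4/5; string: 3/3; llm_judge: 1/2".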

Result Fields

Each result in the results array contains:

  • question: The original question asked
  • expected_answer: The expected answer from the CSV
  • eval_type: Evaluation method used
  • model_response: The model's full response text
  • passed: Whether the evaluation passed
  • details: Additional evaluation details
  • error: Error message if the evaluation failed
  • time_to_answer: Response time in seconds for the MCP server call
  • tools_called: List of MCP tools invoked during the response

The tools_called array contains objects with:

  • tool_name: Name of the MCP tool called
  • server_name: Name of the MCP server that provided the tool
  • input: Parameters passed to the tool
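Because every result records its tool calls, it is straightforward to tally how often each MCP tool was exercised across a run. A sketch using the standard library's Counter — `tool_usage` is a hypothetical helper, not part of the package:

```python
from collections import Counter

def tool_usage(results: dict) -> Counter:
    """Count how often each MCP tool was called across all results,
    using the tools_called field documented above."""
    return Counter(
        call["tool_name"]
        for r in results["results"]
        for call in r.get("tools_called", [])
    )
```

A tool that never appears in the tally may indicate questions that fail to reach the data they were written against.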

Requirements

  • Python 3.10+
  • Anthropic API key with MCP beta access
