mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

Installation

pip install mcp-data-check

Or install from source:

pip install -e .

Usage

Python API

from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

Options:

  • -q, --questions: Path to questions CSV file (required)
  • -k, --api-key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
  • -o, --output: Output directory for results (default: ./results)
  • -m, --model: Claude model to use (default: claude-sonnet-4-20250514)
  • -n, --server-name: Name for the MCP server (default: mcp-server)
  • -v, --verbose: Print detailed progress
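When the check is launched from a script, the documented flags can be assembled programmatically. The following is a minimal sketch, not part of the package: `build_command` and `run_check` are hypothetical helper names, and only the flags listed above are used. The API key is assumed to come from the ANTHROPIC_API_KEY environment variable, so it is not passed on the command line.

```python
import subprocess


def build_command(server_url, questions, output_dir=None, model=None, verbose=False):
    """Assemble an mcp-data-check argument list from the documented options."""
    cmd = ["mcp-data-check", server_url, "-q", questions]
    if output_dir:
        cmd += ["-o", output_dir]
    if model:
        cmd += ["-m", model]
    if verbose:
        cmd.append("-v")
    return cmd


def run_check(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


cmd = build_command(
    "https://mcp.example.com/sse",
    "questions.csv",
    output_dir="./results",
    verbose=True,
)
print(" ".join(cmd))
# prints: mcp-data-check https://mcp.example.com/sse -q questions.csv -o ./results -v
```

Calling `run_check(cmd)` would then start the evaluation, provided the package is installed and ANTHROPIC_API_KEY is set.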

Questions CSV Format

The questions CSV file must have three columns:

Column           Description
question         The question to ask the MCP server
expected_answer  The expected answer to compare against
eval_type        Evaluation method: numeric, string, or llm_judge

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
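It can help to check a questions file before spending API calls on it. The sketch below uses only the standard library and is not part of mcp-data-check: `validate_questions_csv` is a hypothetical helper that checks for the three required columns and for a recognized eval_type. It also shows that a question or answer containing a comma has to be quoted, as in any CSV file.

```python
import csv
import io

REQUIRED_COLUMNS = ["question", "expected_answer", "eval_type"]
VALID_EVAL_TYPES = {"numeric", "string", "llm_judge"}


def validate_questions_csv(text):
    """Return a list of problems found in questions CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Row numbers count the header as row 1, so data starts at row 2.
    for row_no, row in enumerate(reader, start=2):
        if not (row["question"] or "").strip():
            problems.append(f"row {row_no}: empty question")
        eval_type = (row["eval_type"] or "").strip()
        if eval_type not in VALID_EVAL_TYPES:
            problems.append(f"row {row_no}: unknown eval_type {eval_type!r}")
    return problems


# The fourth data row quotes fields that contain commas; the last row has
# a deliberately invalid eval_type.
sample = '''question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
"Which agencies, if any, funded the project?","NIH, NSF",string
Explain the grant distribution,Most grants went to research institutions...,fuzzy
'''

for problem in validate_questions_csv(sample):
    print(problem)
# prints: row 5: unknown eval_type 'fuzzy'
```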

Evaluation Types

  • numeric: Extracts numbers from the response and compares them with the expected value, allowing a 5% tolerance
  • string: Checks whether the expected string appears in the response (case-insensitive)
  • llm_judge: Uses Claude to judge whether the response is semantically correct
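To make the first two evaluation types concrete, here is an illustrative sketch of how such checks could work. It is not the package's actual implementation: the function names, the regular expression, the handling of thousands separators, and the reading of "5% tolerance" as relative to the expected value are all assumptions.

```python
import re

# Matches unsigned integers and decimals; negative numbers are not handled here.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def numeric_match(response, expected, tolerance=0.05):
    """True if any number in the response is within `tolerance` of expected."""
    target = float(expected)
    # Drop thousands separators so "1,250" is read as 1250.
    cleaned = re.sub(r"(?<=\d),(?=\d{3})", "", response)
    # Relative tolerance; fall back to an absolute check when the target is 0.
    allowed = tolerance * abs(target) if target != 0 else tolerance
    for token in NUMBER_PATTERN.findall(cleaned):
        if abs(float(token) - target) <= allowed:
            return True
    return False


def string_match(response, expected):
    """True if the expected string occurs in the response, ignoring case."""
    return expected.casefold() in response.casefold()


print(numeric_match("A total of 1,250 grants were awarded.", "1234"))  # True
print(numeric_match("About 1500 grants were awarded.", "1234"))        # False
print(string_match("The top recipient was the nih.", "NIH"))           # True
```

Under this reading, 1,250 passes against an expected 1234 because the difference (16) is below 5% of 1234 (61.7), while 1500 does not.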

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
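Given the structure above, a short script can report per-type pass rates and list the failures. This is a minimal sketch: `pass_rates_by_type` and `failed_questions` are hypothetical helpers, and `sample` is hand-written data in the documented shape, not real output. The contents of "details" are not specified here, so the sample leaves it empty.

```python
def pass_rates_by_type(summary):
    """Map each eval_type to its pass rate, skipping types with no questions."""
    return {
        eval_type: counts["passed"] / counts["total"]
        for eval_type, counts in summary["by_eval_type"].items()
        if counts["total"] > 0
    }


def failed_questions(results):
    """Return the per-question entries that did not pass."""
    return [entry for entry in results["results"] if not entry["passed"]]


# Hand-written sample following the documented return value.
sample = {
    "summary": {
        "total": 3,
        "passed": 2,
        "failed": 1,
        "pass_rate": 2 / 3,
        "by_eval_type": {
            "numeric": {"total": 2, "passed": 1},
            "string": {"total": 1, "passed": 1},
        },
    },
    "results": [
        {"question": "Q1", "expected_answer": "10", "eval_type": "numeric",
         "model_response": "10", "passed": True, "details": {}, "error": None},
        {"question": "Q2", "expected_answer": "20", "eval_type": "numeric",
         "model_response": "35", "passed": False, "details": {}, "error": None},
        {"question": "Q3", "expected_answer": "NIH", "eval_type": "string",
         "model_response": "NIH", "passed": True, "details": {}, "error": None},
    ],
}

for eval_type, rate in pass_rates_by_type(sample["summary"]).items():
    print(f"{eval_type}: {rate:.0%}")

for entry in failed_questions(sample):
    print(f"FAILED [{entry['eval_type']}] {entry['question']}: "
          f"expected {entry['expected_answer']!r}, got {entry['model_response']!r}")
```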

Requirements

  • Python 3.10+
  • Anthropic API key with MCP beta access
