mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

Installation

pip install mcp-data-check

Or install from source:

pip install -e .

Usage

Python API

from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

Options:

  • -q, --questions: Path to questions CSV file (required)
  • -k, --api-key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
  • -o, --output: Output directory for results (default: ./results)
  • -m, --model: Claude model to use (default: claude-sonnet-4-20250514)
  • -n, --server-name: Name for the MCP server (default: mcp-server)
  • -v, --verbose: Print detailed progress
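When the check runs from a script or CI job, it can help to assemble the command from the options listed above rather than hand-writing the string. The following is a minimal sketch that uses only the documented flags. The `build_command` helper is hypothetical and not part of mcp-data-check.

```python
import shlex


def build_command(server_url, questions, output=None, model=None, verbose=False):
    """Assemble an mcp-data-check argument list (hypothetical helper)."""
    cmd = ["mcp-data-check", server_url, "-q", questions]
    if output:
        cmd += ["-o", output]   # output directory for results
    if model:
        cmd += ["-m", model]    # Claude model to use
    if verbose:
        cmd.append("-v")        # print detailed progress
    return cmd


cmd = build_command(
    "https://mcp.example.com/sse",
    "questions.csv",
    output="./results",
    verbose=True,
)
# Show the command as a single shell-quoted string.
print(shlex.join(cmd))
```

The sketch leaves out -k on purpose: with ANTHROPIC_API_KEY set in the environment, the key does not have to appear on the command line. The list can then be passed to `subprocess.run(cmd, check=True)`.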

Questions CSV Format

The questions CSV file must have three columns:

Column           Description
question         The question to ask the MCP server
expected_answer  The expected answer to compare against
eval_type        Evaluation method: numeric, string, or llm_judge

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
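Before spending API calls, it can be useful to confirm that a questions file has the three columns and only the three eval_type values described above. The following is a minimal sketch using the standard csv module. The `validate_questions` helper is hypothetical and not part of mcp-data-check, and the package may perform its own checks.

```python
import csv
import io

REQUIRED_COLUMNS = ("question", "expected_answer", "eval_type")
VALID_EVAL_TYPES = {"numeric", "string", "llm_judge"}


def validate_questions(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Rows are counted from 2 because the header occupies line 1
    # (assumes one record per line).
    for line_no, row in enumerate(reader, start=2):
        eval_type = (row["eval_type"] or "").strip()
        if eval_type not in VALID_EVAL_TYPES:
            problems.append(f"line {line_no}: unknown eval_type {eval_type!r}")
        if not (row["question"] or "").strip():
            problems.append(f"line {line_no}: empty question")
    return problems


sample = """question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
"""
print(validate_questions(sample))
```

Because the file is comma-separated, a question or expected answer that itself contains a comma needs to be wrapped in double quotes.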

Evaluation Types

  • numeric: Extracts numbers from the response and compares them with the expected answer, allowing a 5% tolerance
  • string: Checks whether the expected string appears in the response (case-insensitive)
  • llm_judge: Uses Claude to judge semantically whether the response is correct
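To make the numeric and string checks concrete, here is an illustrative sketch of how such comparisons could work. It is not the package's actual implementation. The function names, the number-extraction regex, and the choice of a relative tolerance that passes if any extracted number matches are all assumptions.

```python
import re

# Matches integers and simple decimals; an assumption, not the package's regex.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def numeric_match(response, expected, tolerance=0.05):
    """True if any number in the response is within a relative tolerance of expected."""
    target = float(expected)
    # Drop thousands separators so "1,250" parses as 1250 (a simplification).
    cleaned = response.replace(",", "")
    candidates = [float(m) for m in NUMBER_RE.findall(cleaned)]
    return any(abs(value - target) <= tolerance * abs(target) for value in candidates)


def string_match(response, expected):
    """True if the expected string appears in the response, ignoring case."""
    return expected.casefold() in response.casefold()


print(numeric_match("There were 1,250 grants awarded in 2023.", "1234"))
print(string_match("The top recipient was the nih.", "NIH"))
```

Under these assumptions, 1,250 is accepted for an expected answer of 1234 because the difference of 16 is within 5% of 1234 (about 61.7). Note that a sketch like this also picks up unrelated numbers in the response, such as years.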

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
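The returned dictionary can be post-processed with plain Python. The sketch below lists the questions that did not pass and computes a pass rate per evaluation type, using only the keys shown above. The helper names and the abbreviated `example` data are hypothetical and hand-written for illustration. They are not real output.

```python
def failed_questions(results):
    """Return (question, error) pairs for every entry that did not pass."""
    return [(r["question"], r["error"]) for r in results["results"] if not r["passed"]]


def pass_rate_by_type(results):
    """Map each eval_type to passed / total, skipping types with no questions."""
    by_type = results["summary"]["by_eval_type"]
    return {
        name: counts["passed"] / counts["total"]
        for name, counts in by_type.items()
        if counts["total"]
    }


# Hand-written stand-in with the same shape as the structure above.
example = {
    "summary": {
        "total": 2,
        "passed": 1,
        "failed": 1,
        "pass_rate": 0.5,
        "by_eval_type": {
            "numeric": {"total": 1, "passed": 1},
            "string": {"total": 1, "passed": 0},
        },
    },
    "results": [
        {"question": "How many grants were awarded in 2023?", "eval_type": "numeric",
         "passed": True, "error": None},
        {"question": "What organization received the most funding?", "eval_type": "string",
         "passed": False, "error": None},
    ],
}

for question, error in failed_questions(example):
    print(f"FAILED: {question} (error: {error})")
print(pass_rate_by_type(example))
```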

Requirements

  • Python 3.10+
  • Anthropic API key with MCP beta access
