
Evaluate MCP server accuracy against known questions and answers


mcp-data-check


Installation

pip install mcp-data-check

Or, for development, install from a local checkout of the repository in editable mode:

pip install -e .

Usage

Python API

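import os

from mcp_data_check import run_evaluation

# Ask each question in the CSV against the MCP server and grade the answers.
results = run_evaluation(
    questions_filepath="questions.csv",
    api_key=os.environ["ANTHROPIC_API_KEY"],  # or pass the key string directly
    server_url="https://mcp.example.com/sse"
)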

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")
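Because run_evaluation returns a plain dictionary, its summary can serve as a pass/fail gate, for example in a CI job. The sketch below relies only on the summary["pass_rate"] key documented under Return Value; the helper name meets_threshold and the 0.9 default are illustrative assumptions, not part of the package.

```python
def meets_threshold(results: dict, min_pass_rate: float = 0.9) -> bool:
    """Return True if the overall pass rate is at least min_pass_rate.

    Hypothetical helper: reads only results["summary"]["pass_rate"],
    a fraction between 0 and 1 as in the Return Value example.
    """
    return results["summary"]["pass_rate"] >= min_pass_rate


# A trimmed stand-in shaped like the dictionary run_evaluation returns.
example = {"summary": {"total": 10, "passed": 8, "failed": 2, "pass_rate": 0.8}}
print(meets_threshold(example))        # False, since 0.8 < 0.9
print(meets_threshold(example, 0.75))  # True
```

In a CI script you might end with sys.exit(0 if meets_threshold(results) else 1) after calling run_evaluation, so the job fails when accuracy drops below your chosen threshold.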

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

The MCP server URL is passed as the first positional argument.

Options:

  • -q, --questions: Path to questions CSV file (required)
  • -k, --api-key: Anthropic API key (defaults to the ANTHROPIC_API_KEY environment variable)
  • -o, --output: Output directory for results (default: ./results)
  • -m, --model: Claude model to use (default: claude-sonnet-4-20250514)
  • -n, --server-name: Name for the MCP server (default: mcp-server)
  • -v, --verbose: Print detailed progress

Questions CSV Format

The questions CSV file must have a header row with these three columns:

  • question: The question to ask the MCP server
  • expected_answer: The expected answer to compare against
  • eval_type: Evaluation method, one of numeric, string, or llm_judge

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
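Before running an evaluation it can be worth checking the CSV locally. The following sketch uses only Python's standard csv module; the function name validate_questions is hypothetical, and it checks only the rules stated above (the three required columns and the three allowed eval_type values), not anything else the package may enforce.

```python
import csv
import io

REQUIRED_COLUMNS = {"question", "expected_answer", "eval_type"}
ALLOWED_EVAL_TYPES = {"numeric", "string", "llm_judge"}


def validate_questions(f) -> list[str]:
    """Return a list of problems found in a questions CSV (empty if none).

    `f` is any text file-like object, e.g. open("questions.csv", newline="").
    """
    reader = csv.DictReader(f)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    # Count data rows (not file lines), since quoted fields may span lines.
    for row_no, row in enumerate(reader, start=1):
        if not (row["question"] or "").strip():
            problems.append(f"row {row_no}: empty question")
        if row["eval_type"] not in ALLOWED_EVAL_TYPES:
            problems.append(f"row {row_no}: unknown eval_type {row['eval_type']!r}")
    return problems


sample = """question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
"""
print(validate_questions(io.StringIO(sample)))  # []
```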

Evaluation Types

  • numeric: Extracts numbers from the response and compares them with the expected value, allowing a 5% tolerance
  • string: Passes if the expected string appears anywhere in the response (case-insensitive)
  • llm_judge: Uses Claude to judge whether the response semantically matches the expected answer
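The numeric and string checks are simple enough to sketch. The code below illustrates how such checks could work; it is not the package's implementation. The number-extraction regex, treating 5% as a tolerance relative to the expected value, and passing when any extracted number is within tolerance are all assumptions. llm_judge is omitted because it requires a call to Claude.

```python
import re

# Assumed pattern: comma-grouped numbers (1,234) or plain numbers (2023, -3.5).
NUMBER_RE = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Pull every number out of free text, dropping thousands separators."""
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


def numeric_match(response: str, expected: str, tolerance: float = 0.05) -> bool:
    """True if any number in the response is within `tolerance` of expected.

    Assumption: the tolerance is relative to the expected value.
    """
    target = float(expected.replace(",", ""))
    return any(abs(v - target) <= tolerance * abs(target)
               for v in extract_numbers(response))


def string_match(response: str, expected: str) -> bool:
    """Case-insensitive substring check."""
    return expected.casefold() in response.casefold()


print(numeric_match("About 1,250 grants were awarded in 2023.", "1234"))  # True
print(numeric_match("Roughly 1,400 grants.", "1234"))                     # False
print(string_match("The top recipient was nih.", "NIH"))                  # True
```

Accepting any number in the response avoids failing when the answer also mentions other figures, such as the year in the question, at the cost of occasionally passing on an incidental match.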

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
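The per-type breakdown and the failing questions can be pulled straight out of this dictionary. A sketch, assuming only the keys shown above; the helper names are hypothetical, and the example dictionary is a trimmed stand-in, not real output.

```python
def pass_rates_by_type(results: dict) -> dict[str, float]:
    """Compute a pass rate for each eval_type from summary["by_eval_type"]."""
    rates = {}
    for eval_type, counts in results["summary"]["by_eval_type"].items():
        total = counts["total"]
        # Guard against an eval_type listed with zero questions.
        rates[eval_type] = counts["passed"] / total if total else 0.0
    return rates


def failed_questions(results: dict) -> list[dict]:
    """Return the per-question entries whose `passed` flag is false."""
    return [r for r in results["results"] if not r["passed"]]


example = {
    "summary": {
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1},
        }
    },
    "results": [
        {"question": "Q1", "eval_type": "numeric", "passed": True, "error": None},
        {"question": "Q2", "eval_type": "llm_judge", "passed": False, "error": None},
    ],
}

print(pass_rates_by_type(example))  # {'numeric': 0.8, 'string': 1.0, 'llm_judge': 0.5}
for r in failed_questions(example):
    print(f"[{r['eval_type']}] {r['question']} (error: {r['error']})")
```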

Requirements

  • Python 3.10+
  • Anthropic API key with MCP beta access



Download files


Source Distribution

mcp_data_check-0.2.0.tar.gz (68.8 kB)

Uploaded Source

Built Distribution


mcp_data_check-0.2.0-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file mcp_data_check-0.2.0.tar.gz.

File metadata

  • Download URL: mcp_data_check-0.2.0.tar.gz
  • Upload date:
  • Size: 68.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_data_check-0.2.0.tar.gz
  • SHA256: 6c53ace15534074243d3000569f51be224d2e162118d0c94741f3a1cfacca23d
  • MD5: 1aa52b426eb4d9f06b7fd865e97696df
  • BLAKE2b-256: 94487e5432501e6dfe2eac6f7dc8570a4b0ee8e2ae19c329753a59e548f7a71a
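To check a downloaded file against the SHA256 digest listed above, a short standard-library script is enough. This is a generic sketch using hashlib; the local file path is an assumption about where you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare with a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


EXPECTED = "6c53ace15534074243d3000569f51be224d2e162118d0c94741f3a1cfacca23d"
archive = Path("mcp_data_check-0.2.0.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if verify(archive, EXPECTED) else "MISMATCH")
```

pip can enforce the same check during installation: in hash-checking mode (pip install --require-hashes -r requirements.txt), each requirement line carries --hash=sha256:... options, one for every file pip might download.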


Provenance

The following attestation bundles were made for mcp_data_check-0.2.0.tar.gz:

Publisher: publish.yml on GSA-TTS/mcp-data-check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcp_data_check-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: mcp_data_check-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_data_check-0.2.0-py3-none-any.whl
  • SHA256: c1ae28ca4fe297702ba2d70e84ca9a05b1b781701e06227acd7d0516dd0c81f4
  • MD5: 8d70c65b7d5d7c832751e7b114c68616
  • BLAKE2b-256: 4a63d4e303507bc44e138d84367ef08af61e25d09078d67e443a399b68b3a0a7


Provenance

The following attestation bundles were made for mcp_data_check-0.2.0-py3-none-any.whl:

Publisher: publish.yml on GSA-TTS/mcp-data-check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
