# mcp-data-check

Evaluate MCP server accuracy against known questions and answers.
## Installation

```bash
pip install mcp-data-check
```

Or install from source:

```bash
pip install -e .
```
## Usage

### Python API

#### Anthropic (default)

```python
from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse",
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")
```
#### OpenAI

```python
from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-...",
    server_url="https://mcp.example.com/sse",
    provider="openai",
    model="gpt-4o",
)
```
### Command Line

#### Anthropic (default)

```bash
mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY
```

#### OpenAI

```bash
mcp-data-check https://mcp.example.com/sse -q questions.csv -p openai -m gpt-4o -k YOUR_API_KEY
```
Options:

- `-q, --questions`: Path to questions CSV file (required)
- `-p, --provider`: LLM provider to use: `anthropic` (default) or `openai`
- `-k, --api-key`: API key for the chosen provider (defaults to the `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` env var)
- `-o, --output`: Output directory for results (default: `./results`)
- `-m, --model`: Model to use for evaluation (default: `claude-sonnet-4-20250514`; use e.g. `gpt-4o` for OpenAI)
- `-n, --server-name`: Name for the MCP server (default: `mcp-server`)
- `-v, --verbose`: Print detailed progress
## Questions CSV Format

The questions CSV file must have three columns:

| Column | Description |
|---|---|
| `question` | The question to ask the MCP server |
| `expected_answer` | The expected answer to compare against |
| `eval_type` | Evaluation method: `numeric`, `string`, or `llm_judge` |
Example:

```csv
question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
```
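If you build your question set programmatically, the standard library's `csv` module avoids quoting mistakes. A minimal sketch (the rows are taken from the example above; nothing here is specific to this package):

```python
import csv

# Rows matching the documented three-column format
rows = [
    {"question": "How many grants were awarded in 2023?",
     "expected_answer": "1234", "eval_type": "numeric"},
    {"question": "What organization received the most funding?",
     "expected_answer": "NIH", "eval_type": "string"},
]

with open("questions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected_answer", "eval_type"])
    writer.writeheader()  # emits the required header row
    writer.writerows(rows)
```

`DictWriter` handles quoting automatically, so questions containing commas remain valid CSV.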
## Evaluation Types

- `numeric`: Extracts numbers from the response and compares against the expected value with 5% tolerance
- `string`: Checks whether the expected string appears in the response (case-insensitive)
- `llm_judge`: Uses the selected model to judge semantically whether the response is correct
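The first two checks are simple enough to sketch. The following is an illustrative approximation of the described behavior, not the package's actual implementation (`numeric_check` and `string_check` are hypothetical names):

```python
import re

def numeric_check(response: str, expected: str, tolerance: float = 0.05) -> bool:
    """Extract the first number from the response; pass if within tolerance."""
    match = re.search(r"-?\d[\d,]*\.?\d*", response)
    if not match:
        return False
    value = float(match.group().replace(",", ""))
    target = float(expected)
    if target == 0:
        return value == 0
    return abs(value - target) / abs(target) <= tolerance

def string_check(response: str, expected: str) -> bool:
    """Case-insensitive substring match."""
    return expected.lower() in response.lower()
```

For example, a response of "1,250" would pass against an expected "1234" (about 1.3% off), while "2,000" would not.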
## Return Value

The `run_evaluation` function returns a dictionary:

```python
{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None,
            "time_to_answer": 2.35,
            "tools_called": [
                {
                    "tool_name": "get_grants",
                    "server_name": "mcp-server",
                    "input": {"year": 2023}
                }
            ]
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "provider": "anthropic",
        "timestamp": "20250127_143022"
    }
}
```
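This structure is plain dictionaries and lists, so it is easy to post-process. A sketch that lists failing questions, run here against a hand-built sample dictionary rather than real output:

```python
# Hypothetical sample shaped like the run_evaluation return value above
results = {
    "summary": {"total": 2, "passed": 1, "failed": 1, "pass_rate": 0.5},
    "results": [
        {"question": "How many grants were awarded in 2023?",
         "eval_type": "numeric", "passed": True, "error": None},
        {"question": "What organization received the most funding?",
         "eval_type": "string", "passed": False, "error": None},
    ],
}

# Collect and report every failed evaluation
failed = [r for r in results["results"] if not r["passed"]]
for r in failed:
    print(f"FAILED [{r['eval_type']}]: {r['question']}")
print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
```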
## Result Fields

Each result in the `results` array contains:

| Field | Description |
|---|---|
| `question` | The original question asked |
| `expected_answer` | The expected answer from the CSV |
| `eval_type` | Evaluation method used |
| `model_response` | The model's full response text |
| `passed` | Whether the evaluation passed |
| `details` | Additional evaluation details |
| `error` | Error message if the evaluation failed |
| `time_to_answer` | Response time in seconds for the MCP server call |
| `tools_called` | List of MCP tools invoked during the response |
The `tools_called` array contains objects with:

- `tool_name`: Name of the MCP tool called
- `server_name`: Name of the MCP server that provided the tool
- `input`: Parameters passed to the tool
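One use of this field is checking which tools the model actually leaned on. A sketch using `collections.Counter` over hand-built entries (the `search_orgs` tool name is hypothetical, chosen only for illustration):

```python
from collections import Counter

# Hypothetical result entries shaped like the tools_called field described above
results_list = [
    {"tools_called": [
        {"tool_name": "get_grants", "server_name": "mcp-server", "input": {"year": 2023}},
    ]},
    {"tools_called": [
        {"tool_name": "get_grants", "server_name": "mcp-server", "input": {"year": 2022}},
        {"tool_name": "search_orgs", "server_name": "mcp-server", "input": {"q": "NIH"}},
    ]},
]

# Tally tool invocations across all results
tool_counts = Counter(
    call["tool_name"]
    for result in results_list
    for call in result["tools_called"]
)
print(tool_counts.most_common())  # most frequently invoked tools first
```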
## Requirements

- Python 3.10+
- API key for your chosen provider (Anthropic or OpenAI)