mcp-bigquery-evals
An MCP server for BigQuery exploration with cost guardrails and a built-in NL→SQL eval harness.
What it is
A Model Context Protocol server that lets Claude Desktop, Cursor, and Claude Code explore and query a BigQuery warehouse safely.
- 7 read-only tools for warehouse discovery + querying
- Mandatory dry-run cost cap on every run_query (default 100 MB scanned, ≈ $0.0005)
- Built-in eval harness with result-set-equivalence methodology: every release ships an accuracy number, not a vibe
Two clients ship in the box: RealBigQueryClient (production) and FakeBigQueryClient (in-memory, sqlite-backed; for dev and CI without GCP credentials).
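To illustrate why a sqlite-backed fake is enough for dev and CI, here is a minimal sketch of the idea. The class name TinyFakeClient and its load_table and query methods are hypothetical and are not the package's FakeBigQueryClient API; the sketch only shows how an in-memory sqlite database can answer simple SQL without GCP credentials.

```python
import sqlite3


class TinyFakeClient:
    """Hypothetical stand-in; NOT the package's FakeBigQueryClient."""

    def __init__(self) -> None:
        # An in-memory database lives only as long as this connection.
        self._conn = sqlite3.connect(":memory:")

    def load_table(self, name: str, columns: list[str], rows: list[tuple]) -> None:
        # Names are interpolated into SQL, so pass only trusted fixture names.
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        self._conn.execute(f"CREATE TABLE {name} ({cols})")
        self._conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)

    def query(self, sql: str) -> list[dict]:
        # Return rows as dicts keyed by column name.
        cursor = self._conn.execute(sql)
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]


client = TinyFakeClient()
client.load_table("questions", ["title", "view_count"], [("a", 10), ("b", 30), ("c", 20)])
top = client.query("SELECT title FROM questions ORDER BY view_count DESC LIMIT 2")
```

sqlite's SQL dialect differs from BigQuery's, so a fake along these lines suits tests of tool plumbing more than tests of BigQuery-specific SQL.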
Quickstart (5 minutes)
1. Install
uvx mcp-bigquery-evals --help
(Or pip install mcp-bigquery-evals if you prefer.)
2. Authenticate to GCP
gcloud auth application-default login
3. Add to claude_desktop_config.json
Open Claude Desktop → Settings → Developer → Edit Config, then add:
{
  "mcpServers": {
    "bigquery": {
      "command": "uvx",
      "args": ["mcp-bigquery-evals", "serve"],
      "env": {
        "BIGQUERY_PROJECT": "your-personal-gcp-project-id"
      }
    }
  }
}
Restart Claude Desktop. You should see "bigquery" with 7 tools in the MCP indicator.
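If the config file already lists other MCP servers, the "bigquery" entry has to be merged in rather than pasted over them. The helper below is a hypothetical sketch of that merge on an already-loaded config dict; the function name add_bigquery_server is an assumption, and reading or writing the actual file is left out because its location varies by platform.

```python
import json


def add_bigquery_server(config: dict, project_id: str) -> dict:
    """Return a copy of config with the "bigquery" server entry added."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["bigquery"] = {
        "command": "uvx",
        "args": ["mcp-bigquery-evals", "serve"],
        "env": {"BIGQUERY_PROJECT": project_id},
    }
    return merged


# Example: an existing config that already has one (made-up) server entry.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
updated = add_bigquery_server(existing, "your-personal-gcp-project-id")
print(json.dumps(updated, indent=2))
```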
4. Try it
Ask Claude:
"Using the bigquery tool, find the top 5 most-viewed Stack Overflow questions tagged 'python'."
Claude will use list_datasets → list_tables → describe_table → run_query to explore and answer. Every run_query is dry-run-cost-capped before execution.
The 7 tools
| Tool | Purpose |
|---|---|
| list_datasets | List all datasets in your GCP project |
| list_tables(dataset_id) | List tables in a dataset |
| describe_table(table_id) | Schema + row count + size |
| sample_table(table_id, n=5) | Up to n sample rows |
| search_schema(term) | Fuzzy-match a term against all column names |
| estimate_cost(sql) | Free dry-run; returns bytes_scanned + estimated USD |
| run_query(sql, max_bytes_scanned=100MB) | Dry-run, refuse if over cap, then execute |
All tools are read-only. There are no write operations in v1 by design. See docs/architecture.md.
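As a rough picture of what fuzzy-matching a term against column names can look like, here is a sketch built on the standard library's difflib. It is an assumption for illustration only: the README does not say which matching algorithm search_schema uses, and the function name search_columns, the schema dict shape, and the 0.6 cutoff are all hypothetical.

```python
import difflib


def search_columns(
    term: str, schema: dict[str, list[str]], cutoff: float = 0.6
) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name is close to term."""
    # Collect the distinct column names across all tables.
    all_names = sorted({col for cols in schema.values() for col in cols})
    # difflib scores similarity between 0 and 1 and keeps names at or above cutoff.
    close = set(difflib.get_close_matches(term, all_names, n=10, cutoff=cutoff))
    return [(table, col) for table, cols in schema.items() for col in cols if col in close]


# Made-up schema for the example.
schema = {
    "users": ["user_id", "display_name"],
    "posts": ["post_id", "owner_user_id", "title", "view_count"],
}
matches = search_columns("usr_id", schema)
```

With this cutoff a misspelled term such as usr_id still finds user_id, while unrelated columns such as title are left out; a lower cutoff would trade precision for recall.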
Cost guardrails
Every run_query call dry-runs first (free) before execution. If the dry-run estimate exceeds max_bytes_scanned (default 100 MB), the call returns a structured error rather than running:
{
"error": "cost_cap_exceeded",
"would_scan": "1.4 GB",
"cap": "100.0 MB",
"estimated_usd": 0.007,
"hint": "narrow your WHERE clause or pass max_bytes_scanned=1500000000 to override"
}
The agent can read the structured error and self-correct (narrow the WHERE clause, raise the cap explicitly, etc.).
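The check itself is simple arithmetic, and a minimal sketch may make the numbers above easier to follow. The function names are hypothetical, and the $5 per TB rate is an assumption inferred from the figures in this README (100 MB ≈ $0.0005, 1.4 GB ≈ $0.007), not a quoted BigQuery price. Unlike the example above, the hint here echoes the exact estimate rather than a rounded-up value.

```python
USD_PER_TB = 5.0  # assumption inferred from the README's figures, not a quoted price


def human_bytes(n: int) -> str:
    # Decimal units, to match strings such as "100.0 MB".
    for unit, size in (("TB", 10**12), ("GB", 10**9), ("MB", 10**6), ("KB", 10**3)):
        if n >= size:
            return f"{n / size:.1f} {unit}"
    return f"{n} B"


def check_cost_cap(estimated_bytes: int, max_bytes_scanned: int = 100_000_000):
    """Return None when under the cap, else a structured error dict."""
    if estimated_bytes <= max_bytes_scanned:
        return None
    return {
        "error": "cost_cap_exceeded",
        "would_scan": human_bytes(estimated_bytes),
        "cap": human_bytes(max_bytes_scanned),
        "estimated_usd": round(estimated_bytes / 10**12 * USD_PER_TB, 4),
        "hint": (
            "narrow your WHERE clause or pass "
            f"max_bytes_scanned={estimated_bytes} to override"
        ),
    }


# A dry-run estimate of 1.4 GB against the default 100 MB cap.
result = check_cost_cap(1_400_000_000)
```

In a real server the estimated_bytes value would come from the BigQuery dry run; here it is passed in directly so the sketch runs without credentials.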
Structured BigQuery errors
When a call against real BigQuery fails, the response carries a stable error code that an agent can reason about:
| Code | When |
|---|---|
| invalid_sql | SQL syntax error |
| table_not_found | Referenced table doesn't exist |
| permission_denied | IAM 403 |
| unauthenticated | Credentials missing or expired (run gcloud auth application-default login) |
| rate_limited | Quota or rate limit hit |
| query_timeout | Query exceeded its execution timeout |
| unknown | Catch-all for anything else |
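One way to picture how raw failures collapse into these codes is a small lookup keyed on the HTTP status and the error "reason" string that BigQuery attaches to failures. This is a sketch under assumptions: the README does not describe the server's actual mapping, the function name classify_error is hypothetical, and the reason strings below are taken to be the ones BigQuery reports. The reason is checked rather than the status alone because a 403 can mean either a permission problem or a quota problem.

```python
# Assumed mapping from BigQuery error "reason" strings to the README's codes.
REASON_TO_CODE = {
    "invalidQuery": "invalid_sql",
    "notFound": "table_not_found",
    "accessDenied": "permission_denied",
    "rateLimitExceeded": "rate_limited",
    "quotaExceeded": "rate_limited",
    "timeout": "query_timeout",
}


def classify_error(http_status: int, reason: str = "") -> dict:
    """Map an HTTP status and reason string to one of the stable codes."""
    if http_status == 401:
        # Missing or expired credentials surface as HTTP 401.
        code = "unauthenticated"
    else:
        # Anything not listed falls through to the catch-all.
        code = REASON_TO_CODE.get(reason, "unknown")
    return {"error": code}


quota_error = classify_error(403, "quotaExceeded")
access_error = classify_error(403, "accessDenied")
```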
Eval harness
Every release runs a result-set-equivalence eval suite against bigquery-public-data and updates the accuracy badge above. Run locally:
mcp-bigquery-evals evals run --model claude-haiku-4-5
See docs/how_evals_work.md for the methodology, golden pairs format, and how to add your own.
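For a quick intuition about result-set equivalence: two queries count as equivalent when they return the same rows, even if the SQL text, row order, or column order differs. The sketch below is one plausible reading of that idea and not the harness's implementation; the function names, the float rounding, and the requirement that column names match exactly are all assumptions.

```python
from collections import Counter


def normalize_row(row: dict) -> tuple:
    # Sort by column name so column order does not matter; round floats
    # so tiny numeric differences do not fail the comparison.
    return tuple(
        (key, round(value, 6) if isinstance(value, float) else value)
        for key, value in sorted(row.items())
    )


def result_sets_equivalent(expected: list[dict], actual: list[dict]) -> bool:
    """Compare two result sets as multisets of rows, ignoring row order."""
    return Counter(map(normalize_row, expected)) == Counter(map(normalize_row, actual))


# Made-up rows: same content, different row and column order.
golden = [{"title": "a", "views": 3}, {"title": "b", "views": 1}]
candidate = [{"views": 1, "title": "b"}, {"views": 3, "title": "a"}]
same = result_sets_equivalent(golden, candidate)
```

Comparing as multisets keeps duplicate rows significant, so a query that drops or doubles a row does not pass. Cell values need to be hashable for this sketch to work.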
Why this exists
There are a few BigQuery MCP servers floating around. This one is different in three ways:
- Cost guardrails are mandatory and surfaced as structured errors agents can act on. Most other servers don't have them.
- Result-set-equivalence evals ship in the box, with a live accuracy badge in this README. Agent quality is measurable, not assumed.
- Read-only by design — no INSERT/UPDATE/DELETE. The blast radius of an LLM mistake is bounded to scanning bytes, not mutating data.
Development
git clone https://github.com/Umarfarook1/mcp-bigquery-evals
cd mcp-bigquery-evals
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # unit tests (no GCP needed)
pytest -m bq # real-BQ integration tests (needs GCP)
pytest -m live # end-to-end with real model + real BQ
License
MIT — see LICENSE.
Contributing
Issues and PRs welcome. Especially valuable:
- More golden NL→SQL pairs (hand-verified, against bigquery-public-data)
- Improved prompts (with eval numbers showing the change moves the accuracy badge)
- Bug reports with reproduction steps