An MCP server for BigQuery exploration with cost guardrails and a built-in NL→SQL eval harness.

mcp-bigquery-evals

PyPI accuracy CI

What it is

A Model Context Protocol server that lets Claude Desktop, Cursor, and Claude Code explore and query a BigQuery warehouse safely.

  • 7 read-only tools for warehouse discovery + querying
  • Mandatory dry-run cost cap on every run_query (default 100 MB scanned, ≈ $0.0005)
  • Built-in eval harness with result-set-equivalence methodology — every release ships an accuracy number, not a vibe

Two clients ship in the box: RealBigQueryClient (production) and FakeBigQueryClient (in-memory, sqlite-backed; for dev and CI without GCP credentials).
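The README does not show how FakeBigQueryClient is implemented. Purely to illustrate the idea of an in-memory, sqlite-backed stand-in, a minimal sketch could look like the following. The class name `TinyFakeClient` and its methods are hypothetical and are not the package's API.

```python
import sqlite3


class TinyFakeClient:
    """Hypothetical warehouse stand-in backed by in-memory sqlite (not the real FakeBigQueryClient)."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.row_factory = sqlite3.Row  # rows become name-addressable

    def load_table(self, name, columns, rows):
        # Test-fixture helper: create a table and bulk-insert rows.
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        self._conn.execute(f"CREATE TABLE {name} ({cols})")
        self._conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)

    def list_tables(self):
        cur = self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row["name"] for row in cur]

    def run_query(self, sql):
        # Return each result row as a plain dict.
        return [dict(row) for row in self._conn.execute(sql)]


client = TinyFakeClient()
client.load_table("questions", ["title", "views"], [("a", 10), ("b", 30), ("c", 20)])
top = client.run_query(
    "SELECT title, views FROM questions ORDER BY views DESC LIMIT 2"
)
```

A stand-in like this lets unit tests exercise tool logic without GCP credentials, at the cost of sqlite's SQL dialect differing from BigQuery's.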

Quickstart (5 minutes)

1. Install

uvx mcp-bigquery-evals --help

(Or pip install mcp-bigquery-evals if you prefer.)

2. Authenticate to GCP

gcloud auth application-default login

3. Add to claude_desktop_config.json

Open Claude Desktop → Settings → Developer → Edit Config, then add:

{
  "mcpServers": {
    "bigquery": {
      "command": "uvx",
      "args": ["mcp-bigquery-evals", "serve"],
      "env": {
        "BIGQUERY_PROJECT": "your-personal-gcp-project-id"
      }
    }
  }
}

Restart Claude Desktop. You should see "bigquery" with 7 tools in the MCP indicator.

4. Try it

Ask Claude:

"Using the bigquery tool, find the top 5 most-viewed Stack Overflow questions tagged 'python'."

Claude will use list_datasets → list_tables → describe_table → run_query to explore and answer. Every run_query is dry-run-cost-capped before execution.

The 7 tools

| Tool | Purpose |
| --- | --- |
| list_datasets | List all datasets in your GCP project |
| list_tables(dataset_id) | List tables in a dataset |
| describe_table(table_id) | Schema + row count + size |
| sample_table(table_id, n=5) | Up to n sample rows |
| search_schema(term) | Fuzzy-match a term against all column names |
| estimate_cost(sql) | Free dry-run; returns bytes_scanned + estimated USD |
| run_query(sql, max_bytes_scanned=100MB) | Dry-run, refuse if over cap, then execute |

All tools are read-only. There are no write operations in v1 by design. See docs/architecture.md.
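How search_schema performs its fuzzy match is not described here. One plausible approach, shown only as a sketch, combines substring hits with similarity matching from the standard library's `difflib`. The function name `fuzzy_find_columns`, the `dataset.table.column` path format, and the 0.6 cutoff are all assumptions for illustration.

```python
import difflib


def fuzzy_find_columns(term, column_paths, cutoff=0.6):
    """Return column paths whose column name contains or resembles `term`.

    `column_paths` is assumed to hold 'dataset.table.column' strings.
    """
    by_name = {}
    for path in column_paths:
        name = path.rsplit(".", 1)[-1].lower()
        by_name.setdefault(name, []).append(path)

    needle = term.lower()
    # Exact substring hits come first, then near-misses such as typos.
    hits = [name for name in by_name if needle in name]
    for name in difflib.get_close_matches(needle, list(by_name), n=10, cutoff=cutoff):
        if name not in hits:
            hits.append(name)
    return [path for name in hits for path in by_name[name]]


columns = ["so.posts.view_count", "so.posts.title", "so.users.display_name"]
typo_hits = fuzzy_find_columns("veiw_count", columns)
substring_hits = fuzzy_find_columns("name", columns)
```

Here the misspelled `veiw_count` still resolves to the `view_count` column, and `name` matches `display_name` by substring.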

Cost guardrails

Every run_query call performs a free dry run before executing anything. If the dry-run estimate exceeds max_bytes_scanned (default 100 MB), the call returns a structured error rather than running:

{
  "error": "cost_cap_exceeded",
  "would_scan": "1.4 GB",
  "cap": "100.0 MB",
  "estimated_usd": 0.007,
  "hint": "narrow your WHERE clause or pass max_bytes_scanned=1500000000 to override"
}

The agent can read the structured error and self-correct (narrow the WHERE clause, raise the cap explicitly, etc.).
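The cap check itself can be sketched as a pure function. The version below is illustrative rather than the package's code: `check_cost_cap` and `human_bytes` are hypothetical names, decimal units are assumed, and the $5 per TB rate is inferred from the figures in this README (100 MB ≈ $0.0005). In a real client the byte estimate would presumably come from a dry-run job, for example `QueryJobConfig(dry_run=True)` and the job's `total_bytes_processed` in google-cloud-bigquery.

```python
USD_PER_TB = 5.0  # assumption: rate implied by this README's examples


def human_bytes(n):
    """Format a byte count with decimal units, e.g. 1400000000 -> '1.4 GB'."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1000


def check_cost_cap(estimated_bytes, max_bytes_scanned=100_000_000):
    """Return None when the estimate is within the cap, else a structured error."""
    if estimated_bytes <= max_bytes_scanned:
        return None
    return {
        "error": "cost_cap_exceeded",
        "would_scan": human_bytes(estimated_bytes),
        "cap": human_bytes(max_bytes_scanned),
        "estimated_usd": round(estimated_bytes / 1e12 * USD_PER_TB, 4),
        "hint": "narrow your WHERE clause or raise max_bytes_scanned explicitly",
    }


blocked = check_cost_cap(1_400_000_000)
allowed = check_cost_cap(50_000_000)
```

Returning a dict instead of raising keeps the refusal machine-readable, which is what lets an agent decide between narrowing the query and raising the cap.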

Structured BigQuery errors

When a call to real BigQuery fails, the response carries a stable error code that an agent can reason about:

| Code | When |
| --- | --- |
| invalid_sql | SQL syntax error |
| table_not_found | Referenced table doesn't exist |
| permission_denied | IAM 403 |
| unauthenticated | Credentials missing or expired (run gcloud auth application-default login) |
| rate_limited | Quota or rate limit hit |
| query_timeout | Query exceeded its execution timeout |
| unknown | Catch-all for anything else |
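A mapping from raw failures to these codes might be sketched as below. This is an assumption about how such a classifier could work, not the package's implementation: `classify_error` is a hypothetical name, and the `reason` strings are modelled on BigQuery's error table and should be checked against the current documentation.

```python
# Assumed BigQuery error "reason" strings mapped to the stable codes above.
REASON_TO_CODE = {
    "invalidQuery": "invalid_sql",
    "notFound": "table_not_found",
    "accessDenied": "permission_denied",
    "rateLimitExceeded": "rate_limited",
    "quotaExceeded": "rate_limited",
    "timeout": "query_timeout",
}

# Fallback when only an HTTP status is available.
STATUS_TO_CODE = {
    401: "unauthenticated",
    403: "permission_denied",
    404: "table_not_found",
    429: "rate_limited",
    504: "query_timeout",
}


def classify_error(http_status=None, reason=None):
    """Pick a stable code; the reason wins because it is more specific than the status."""
    if reason in REASON_TO_CODE:
        return REASON_TO_CODE[reason]
    return STATUS_TO_CODE.get(http_status, "unknown")
```

Checking the reason first matters because quota failures can share a 403 status with IAM denials, and the two call for different recovery steps.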

Eval harness

Every release runs a result-set-equivalence eval suite against bigquery-public-data and updates the accuracy badge above. Run locally:

mcp-bigquery-evals evals run --model claude-haiku-4-5

See docs/how_evals_work.md for the methodology, golden pairs format, and how to add your own.
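The exact methodology lives in docs/how_evals_work.md and is not reproduced here. As a rough illustration of what result-set equivalence usually means, the sketch below compares the rows returned by a generated query and a golden query as multisets, so row order and column names do not matter. The function name, the float rounding, and the `ordered` flag are assumptions.

```python
from collections import Counter


def _normalize(value, ndigits=6):
    # Round floats so tiny numeric noise does not cause a mismatch.
    if isinstance(value, float):
        return round(value, ndigits)
    return value


def result_sets_equivalent(expected, actual, ordered=False):
    """Compare two result sets given as sequences of rows (sequences of hashable values)."""
    exp = [tuple(_normalize(v) for v in row) for row in expected]
    act = [tuple(_normalize(v) for v in row) for row in actual]
    if ordered:
        return exp == act  # for queries whose ORDER BY is part of the answer
    return Counter(exp) == Counter(act)  # multiset comparison keeps duplicate counts


golden = [(1, "a"), (2, "b")]
generated = [(2, "b"), (1, "a")]
same = result_sets_equivalent(golden, generated)
```

Comparing results rather than SQL text means two differently written queries count as equal when they return the same rows.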

Why this exists

There are a few BigQuery MCP servers floating around. This one is different in three ways:

  1. Cost guardrails are mandatory and surfaced as structured errors agents can act on. Most other BigQuery MCP servers don't have them.
  2. Result-set-equivalence evals ship in the box, with a live accuracy badge in this README. Agent quality is measurable, not assumed.
  3. Read-only by design — no INSERT/UPDATE/DELETE. The blast radius of an LLM mistake is bounded to scanning bytes, not mutating data.

Development

git clone https://github.com/Umarfarook1/mcp-bigquery-evals
cd mcp-bigquery-evals
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest                    # unit tests (no GCP needed)
pytest -m bq              # real-BQ integration tests (needs GCP)
pytest -m live            # end-to-end with real model + real BQ

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome. Especially valuable:

  • More golden NL→SQL pairs (hand-verified, against bigquery-public-data)
  • Improved prompts (with eval numbers showing the change moves the accuracy badge)
  • Bug reports with reproduction steps
