Minimal MCP server for BigQuery SQL validation and dry-run analysis
Project description
mcp-bigquery
Safe BigQuery exploration through Model Context Protocol
๐ What is this?
mcp-bigquery is an MCP (Model Context Protocol) server that enables AI assistants like Claude to safely interact with Google BigQuery.
๐ฏ Key Features
graph LR
A[AI Assistant] -->|MCP Protocol| B[mcp-bigquery]
B -->|Dry-run Only| C[BigQuery API]
B -.->|โ Never Executes| D[Actual Query Execution]
- ๐ก๏ธ 100% Safe: All operations are dry-run only (never executes queries)
- ๐ฐ Cost Transparency: See costs before running any query
- ๐ Complete Analysis: Analyze SQL structure, dependencies, and performance
- ๐ Schema Explorer: Browse datasets, tables, and columns with ease
โก Why use mcp-bigquery?
| Problem | Solution with mcp-bigquery |
|---|---|
| ๐ธ Accidentally running expensive queries | Check costs before execution |
| ๐ Wasting time on SQL errors | Detect syntax errors before running |
| ๐บ๏ธ Unknown table structures | Easily explore schemas |
| โ ๏ธ AI executing dangerous operations | Everything is read-only and safe |
๐ Quick Start (4 minutes)
Step 1: Install (1 minute)
pip install mcp-bigquery
Step 2: Authenticate with Google Cloud (2 minutes)
# For personal accounts
gcloud auth application-default login
# For service accounts
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
Step 3: Configure Claude Desktop (1 minute)
Open your Claude Desktop config:
- Mac:
~/Library/Application Support/Claude/claude_desktop_config.json - Windows:
%APPDATA%\Claude\claude_desktop_config.json
Add this configuration:
{
"mcpServers": {
"mcp-bigquery": {
"command": "mcp-bigquery",
"env": {
"BQ_PROJECT": "your-gcp-project-id" // โ Replace with your project ID
}
}
}
}
Step 4: Test It!
Restart Claude Desktop and try these questions:
"What datasets are available in my BigQuery project?"
"Can you estimate the cost of: SELECT * FROM dataset.table"
"Show me the schema for the users table"
๐ ๏ธ Available Tools
๐ SQL Validation & Analysis
| Tool | Purpose | When to Use |
|---|---|---|
| bq_validate_sql | Check SQL syntax | Before running any query |
| bq_dry_run_sql | Get cost estimates & metadata | ๐ฐ To check costs |
| bq_analyze_query_structure | Analyze query complexity | To improve performance |
| bq_extract_dependencies | Extract table dependencies | To understand data lineage |
| bq_validate_query_syntax | Detailed error analysis | To debug SQL errors |
๐ Schema Discovery
| Tool | Purpose | When to Use |
|---|---|---|
| bq_list_datasets | List all datasets | To explore your project |
| bq_list_tables | List tables with partitioning info | To browse a dataset |
| bq_describe_table | Get detailed table schema | To understand columns |
| bq_get_table_info | Complete table metadata | To get statistics |
| bq_query_info_schema | Query INFORMATION_SCHEMA | For advanced metadata queries |
โก Performance Optimization
| Tool | Purpose | When to Use |
|---|---|---|
| bq_analyze_query_performance | Analyze performance | To optimize queries |
๐ก Real-World Examples
Example 1: Check Costs Before Running
# Before running an expensive query...
query = "SELECT * FROM `bigquery-public-data.github_repos.commits`"
# First, check the cost
result = bq_dry_run_sql(sql=query)
print(f"Estimated cost: ${result['usdEstimate']}")
print(f"Data processed: {result['totalBytesProcessed'] / 1e9:.2f} GB")
# Output:
# Estimated cost: $12.50
# Data processed: 2500.00 GB
Example 2: Understand Table Structure
# Check table schema
result = bq_describe_table(
dataset_id="your_dataset",
table_id="users"
)
# Output:
# โโโ user_id (INTEGER, REQUIRED)
# โโโ email (STRING, NULLABLE)
# โโโ created_at (TIMESTAMP, REQUIRED)
# โโโ profile (RECORD, REPEATED)
# โโโ name (STRING)
# โโโ age (INTEGER)
Example 3: Get Optimization Suggestions
# Analyze a slow query
query = """
SELECT *
FROM large_table
WHERE date > '2024-01-01'
"""
result = bq_analyze_query_performance(sql=query)
# Output:
# Performance Score: 45/100 (Needs Improvement)
#
# Optimization Suggestions:
# 1. Avoid SELECT * - specify only needed columns
# 2. Use partition filter on date field
# 3. Consider adding LIMIT clause
Example 4: Track Data Dependencies
# Understand query dependencies
query = """
WITH user_stats AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
)
SELECT u.name, s.order_count
FROM users u
JOIN user_stats s ON u.id = s.user_id
"""
result = bq_extract_dependencies(sql=query)
# Output:
# Tables: ['orders', 'users']
# Columns: ['user_id', 'name', 'id']
# Dependency Graph:
# orders โ user_stats โ final_result
# users โ final_result
๐จ How It Works
Your Code โ โ Claude/AI Assistant
โ
MCP Protocol
โ
mcp-bigquery
โ
BigQuery API (Dry-run)
โ
BigQuery
(Never executes actual queries)
โ๏ธ Configuration
Environment Variables
export BQ_PROJECT="my-project" # GCP Project ID (required)
export BQ_LOCATION="asia-northeast1" # Region (optional)
export SAFE_PRICE_PER_TIB="5.0" # Price per TiB (default: $5)
export DEBUG="true" # Enable debug logging
Full Claude Desktop Configuration
{
"mcpServers": {
"mcp-bigquery": {
"command": "mcp-bigquery",
"env": {
"BQ_PROJECT": "my-production-project",
"BQ_LOCATION": "asia-northeast1",
"SAFE_PRICE_PER_TIB": "6.0",
"DEBUG": "false"
}
}
}
}
๐ง Troubleshooting
Common Issues & Solutions
โ Authentication Error
Error: Could not automatically determine credentials
Solution:
gcloud auth application-default login
โ Permission Error
Error: User does not have bigquery.tables.get permission
Solution: Grant BigQuery Data Viewer role
gcloud projects add-iam-policy-binding YOUR_PROJECT \
--member="user:your-email@example.com" \
--role="roles/bigquery.dataViewer"
โ Project Not Set
Error: Project ID is required
Solution: Set BQ_PROJECT in your configuration
Debug Mode
If issues persist, enable debug mode:
{
"env": {
"DEBUG": "true",
"BQ_PROJECT": "your-project"
}
}
๐ Learn More
Getting Started
For Developers
๐ฆ Project Status
| Version | Release Date | Key Features |
|---|---|---|
| v0.4.2 | 2025-12-08 | Modular schema explorer, unified client/logging controls |
| v0.4.1 | 2025-01-22 | Better error handling, debug logging |
| v0.4.0 | 2025-01-22 | Added 6 schema discovery tools |
| v0.3.0 | 2025-01-17 | SQL analysis engine |
| v0.2.0 | 2025-01-16 | Basic validation & dry-run |
๐ค Contributing
Pull requests are welcome! See our Contributing Guide.
# Setup development environment
git clone https://github.com/caron14/mcp-bigquery.git
cd mcp-bigquery
pip install -e ".[dev]"
# Run tests
pytest tests/
๐ License
MIT License - see LICENSE for details.
๐ Acknowledgments
- Google BigQuery team for the excellent API
- Anthropic for the MCP protocol
- All contributors and users
Built for safe BigQuery exploration ๐ก๏ธ
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mcp_bigquery-0.4.2.tar.gz.
File metadata
- Download URL: mcp_bigquery-0.4.2.tar.gz
- Upload date:
- Size: 38.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c9b393a7d86b9fbbd8e01ab0b3a3eaf59463167968bc78494cf2de6dba07bd85
|
|
| MD5 |
ff59bfbdff1f795a9cf4859ccbf1e508
|
|
| BLAKE2b-256 |
e6e05a30bb30ae5d9db2e10dff93c0de7e3628711bedf43f23c4ac4206dd4f6c
|
File details
Details for the file mcp_bigquery-0.4.2-py3-none-any.whl.
File metadata
- Download URL: mcp_bigquery-0.4.2-py3-none-any.whl
- Upload date:
- Size: 48.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c12259481a4cc284451725949ecf379a0a30c2bff32407c753ed98da88913216
|
|
| MD5 |
2cd3c9cad1a56c3ec0b76176a6e0a075
|
|
| BLAKE2b-256 |
55f8840b5dcf3737c2f2a8036b3b585de469623e3aa0a1bd5d4bfb90c8a57fdf
|