Skip to main content

Minimal MCP server for BigQuery SQL validation and dry-run analysis

Project description

mcp-bigquery

MCP BigQuery Logo

Safe BigQuery exploration through Model Context Protocol

MIT License PyPI Version Python Support Downloads

Documentation | Quick Start | Tools | Examples


๐Ÿ“Œ What is this?

mcp-bigquery is an MCP (Model Context Protocol) server that enables AI assistants like Claude to safely interact with Google BigQuery.

๐ŸŽฏ Key Features

graph LR
    A[AI Assistant] -->|MCP Protocol| B[mcp-bigquery]
    B -->|Dry-run Only| C[BigQuery API]
    B -.->|โŒ Never Executes| D[Actual Query Execution]
  • ๐Ÿ›ก๏ธ 100% Safe: All operations are dry-run only (never executes queries)
  • ๐Ÿ’ฐ Cost Transparency: See costs before running any query
  • ๐Ÿ” Complete Analysis: Analyze SQL structure, dependencies, and performance
  • ๐Ÿ“Š Schema Explorer: Browse datasets, tables, and columns with ease

โšก Why use mcp-bigquery?

Problem Solution with mcp-bigquery
๐Ÿ’ธ Accidentally running expensive queries Check costs before execution
๐Ÿ› Wasting time on SQL errors Detect syntax errors before running
๐Ÿ—บ๏ธ Unknown table structures Easily explore schemas
โš ๏ธ AI executing dangerous operations Everything is read-only and safe

๐Ÿš€ Quick Start (4 minutes)

Step 1: Install (1 minute)

pip install mcp-bigquery

Step 2: Authenticate with Google Cloud (2 minutes)

# For personal accounts
gcloud auth application-default login

# For service accounts
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json

Step 3: Configure Claude Desktop (1 minute)

Open your Claude Desktop config:

  • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add this configuration:

{
  "mcpServers": {
    "mcp-bigquery": {
      "command": "mcp-bigquery",
      "env": {
        "BQ_PROJECT": "your-gcp-project-id"  // โ† Replace with your project ID
      }
    }
  }
}

Step 4: Test It!

Restart Claude Desktop and try these questions:

"What datasets are available in my BigQuery project?"
"Can you estimate the cost of: SELECT * FROM dataset.table"
"Show me the schema for the users table"

๐Ÿ› ๏ธ Available Tools

๐Ÿ“ SQL Validation & Analysis

Tool Purpose When to Use
bq_validate_sql Check SQL syntax Before running any query
bq_dry_run_sql Get cost estimates & metadata ๐Ÿ’ฐ To check costs
bq_analyze_query_structure Analyze query complexity To improve performance
bq_extract_dependencies Extract table dependencies To understand data lineage
bq_validate_query_syntax Detailed error analysis To debug SQL errors

๐Ÿ” Schema Discovery

Tool Purpose When to Use
bq_list_datasets List all datasets To explore your project
bq_list_tables List tables with partitioning info To browse a dataset
bq_describe_table Get detailed table schema To understand columns
bq_get_table_info Complete table metadata To get statistics
bq_query_info_schema Query INFORMATION_SCHEMA For advanced metadata queries

โšก Performance Optimization

Tool Purpose When to Use
bq_analyze_query_performance Analyze performance To optimize queries

๐Ÿ’ก Real-World Examples

Example 1: Check Costs Before Running

# Before running an expensive query...
query = "SELECT * FROM `bigquery-public-data.github_repos.commits`"

# First, check the cost
result = bq_dry_run_sql(sql=query)
print(f"Estimated cost: ${result['usdEstimate']}")
print(f"Data processed: {result['totalBytesProcessed'] / 1e9:.2f} GB")

# Output:
# Estimated cost: $12.50
# Data processed: 2500.00 GB

Example 2: Understand Table Structure

# Check table schema
result = bq_describe_table(
    dataset_id="your_dataset",
    table_id="users"
)

# Output:
# โ”œโ”€โ”€ user_id (INTEGER, REQUIRED)
# โ”œโ”€โ”€ email (STRING, NULLABLE)
# โ”œโ”€โ”€ created_at (TIMESTAMP, REQUIRED)
# โ””โ”€โ”€ profile (RECORD, REPEATED)
#     โ”œโ”€โ”€ name (STRING)
#     โ””โ”€โ”€ age (INTEGER)

Example 3: Get Optimization Suggestions

# Analyze a slow query
query = """
SELECT * 
FROM large_table 
WHERE date > '2024-01-01'
"""

result = bq_analyze_query_performance(sql=query)

# Output:
# Performance Score: 45/100 (Needs Improvement)
# 
# Optimization Suggestions:
# 1. Avoid SELECT * - specify only needed columns
# 2. Use partition filter on date field
# 3. Consider adding LIMIT clause

Example 4: Track Data Dependencies

# Understand query dependencies
query = """
WITH user_stats AS (
  SELECT user_id, COUNT(*) as order_count
  FROM orders
  GROUP BY user_id
)
SELECT u.name, s.order_count
FROM users u
JOIN user_stats s ON u.id = s.user_id
"""

result = bq_extract_dependencies(sql=query)

# Output:
# Tables: ['orders', 'users']
# Columns: ['user_id', 'name', 'id']
# Dependency Graph:
#   orders โ†’ user_stats โ†’ final_result
#   users โ†’ final_result

๐ŸŽจ How It Works

Your Code โ† โ†’ Claude/AI Assistant
                   โ†“
            MCP Protocol
                   โ†“
            mcp-bigquery
                   โ†“
         BigQuery API (Dry-run)
                   โ†“
             BigQuery
      (Never executes actual queries)

โš™๏ธ Configuration

Environment Variables

export BQ_PROJECT="my-project"        # GCP Project ID (required)
export BQ_LOCATION="asia-northeast1"  # Region (optional)
export SAFE_PRICE_PER_TIB="5.0"      # Price per TiB (default: $5)
export DEBUG="true"                   # Enable debug logging

Full Claude Desktop Configuration

{
  "mcpServers": {
    "mcp-bigquery": {
      "command": "mcp-bigquery",
      "env": {
        "BQ_PROJECT": "my-production-project",
        "BQ_LOCATION": "asia-northeast1",
        "SAFE_PRICE_PER_TIB": "6.0",
        "DEBUG": "false"
      }
    }
  }
}

๐Ÿ”ง Troubleshooting

Common Issues & Solutions

โŒ Authentication Error

Error: Could not automatically determine credentials

Solution:

gcloud auth application-default login

โŒ Permission Error

Error: User does not have bigquery.tables.get permission

Solution: Grant BigQuery Data Viewer role

gcloud projects add-iam-policy-binding YOUR_PROJECT \
  --member="user:your-email@example.com" \
  --role="roles/bigquery.dataViewer"

โŒ Project Not Set

Error: Project ID is required

Solution: Set BQ_PROJECT in your configuration

Debug Mode

If issues persist, enable debug mode:

{
  "env": {
    "DEBUG": "true",
    "BQ_PROJECT": "your-project"
  }
}

๐Ÿ“š Learn More

Getting Started

For Developers

๐Ÿšฆ Project Status

Version Release Date Key Features
v0.4.2 2025-12-08 Modular schema explorer, unified client/logging controls
v0.4.1 2025-01-22 Better error handling, debug logging
v0.4.0 2025-01-22 Added 6 schema discovery tools
v0.3.0 2025-01-17 SQL analysis engine
v0.2.0 2025-01-16 Basic validation & dry-run

๐Ÿค Contributing

Pull requests are welcome! See our Contributing Guide.

# Setup development environment
git clone https://github.com/caron14/mcp-bigquery.git
cd mcp-bigquery
pip install -e ".[dev]"

# Run tests
pytest tests/

๐Ÿ“„ License

MIT License - see LICENSE for details.

๐Ÿ™ Acknowledgments

  • Google BigQuery team for the excellent API
  • Anthropic for the MCP protocol
  • All contributors and users

Built for safe BigQuery exploration ๐Ÿ›ก๏ธ

Report Bug ยท Request Feature ยท Discussions

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_bigquery-0.4.2.tar.gz (38.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_bigquery-0.4.2-py3-none-any.whl (48.4 kB view details)

Uploaded Python 3

File details

Details for the file mcp_bigquery-0.4.2.tar.gz.

File metadata

  • Download URL: mcp_bigquery-0.4.2.tar.gz
  • Upload date:
  • Size: 38.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for mcp_bigquery-0.4.2.tar.gz
Algorithm Hash digest
SHA256 c9b393a7d86b9fbbd8e01ab0b3a3eaf59463167968bc78494cf2de6dba07bd85
MD5 ff59bfbdff1f795a9cf4859ccbf1e508
BLAKE2b-256 e6e05a30bb30ae5d9db2e10dff93c0de7e3628711bedf43f23c4ac4206dd4f6c

See more details on using hashes here.

File details

Details for the file mcp_bigquery-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: mcp_bigquery-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 48.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for mcp_bigquery-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c12259481a4cc284451725949ecf379a0a30c2bff32407c753ed98da88913216
MD5 2cd3c9cad1a56c3ec0b76176a6e0a075
BLAKE2b-256 55f8840b5dcf3737c2f2a8036b3b585de469623e3aa0a1bd5d4bfb90c8a57fdf

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page