
Advanced MCP server for Databricks workspace intelligence — dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Project description

Databricks Advanced MCP Server

Python 3.11+ PyPI License: MIT MCP

An advanced Model Context Protocol (MCP) server that gives AI assistants deep visibility into your Databricks workspace - dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, and table metadata inspection.

Features

Domain | What it does
SQL Execution | Run SQL queries against Databricks SQL warehouses with configurable result limits
Table Information | Inspect table metadata, schemas, column details, row counts, and storage info
Dependency Scanning | Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG)
Impact Analysis | Predict downstream breakage from column drops, schema changes, or pipeline failures
Notebook Review | Detect performance anti-patterns and coding-standard violations, and suggest optimizations
Job & Pipeline Ops | List jobs/pipelines, get run status with error diagnostics, trigger reruns

Demo

Click to play the full demo video:

https://github.com/user-attachments/assets/579282ca-bb26-4244-b0c6-3ad26050aca3

Covers SQL execution, dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Quick Start

Prerequisites

  • Python 3.11+
  • uv — fast Python package manager
  • A Databricks workspace with a SQL warehouse
  • A Databricks personal access token

Other auth methods: The Databricks SDK supports unified authentication — if you don't set DATABRICKS_TOKEN, it will fall back to Azure CLI, managed identity, or .databrickscfg. The .env setup below uses a PAT for simplicity.
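
For a quick check of which credentials the SDK actually picks up, construct a client with no arguments and let unified authentication resolve them. A minimal sketch, assuming the databricks-sdk package is installed:

from databricks.sdk import WorkspaceClient

# With no arguments, the SDK resolves credentials via unified authentication:
# environment variables first, then ~/.databrickscfg, then Azure CLI or
# managed identity on Azure.
w = WorkspaceClient()
print(w.current_user.me().user_name)  # prints the authenticated identity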

Don't have a Databricks workspace yet? See infra/INSTALL.md for a one-command Azure deployment using Bicep.

1. Install

Option A: Install from PyPI (recommended)

uv pip install databricks-advanced-mcp

Or with pip:

pip install databricks-advanced-mcp

Option B: Install from source

git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server

Create and activate a virtual environment:

Windows (PowerShell)

uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .

macOS / Linux

uv venv .venv
source .venv/bin/activate
uv pip install -e .

2. Configure

cp .env.example .env

Edit .env with your Databricks credentials:

# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com

DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id

# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default

3. Add to your IDE

Create .vscode/mcp.json in your project to register the MCP server with VS Code / GitHub Copilot.

Option A: PyPI install (recommended)

If you installed from PyPI (pip install databricks-advanced-mcp), the databricks-mcp CLI is available on your PATH:

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_your_token",
        "DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
      }
    }
  }
}

Option B: Virtual environment (source install)

If you cloned the repo and installed into a local .venv, point directly to the Python interpreter:

Windows

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/Scripts/python.exe",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

macOS / Linux

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

Multiple Workspaces

Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:

{
  "servers": {
    // AWS / GCP workspace
    "databricks-cloud": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
        "DATABRICKS_TOKEN": "dapi_cloud_token",
        "DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
        "DATABRICKS_CATALOG": "workspace"
      }
    },
    // Azure workspace
    "databricks-azure": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_azure_token",
        "DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
        "DATABRICKS_CATALOG": "main"
      }
    }
  }
}

Alternatively, with a source install you can use separate .env files per workspace:

{
  "servers": {
    "databricks-cloud": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    },
    "databricks-azure": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env_azure"
    }
  }
}

4. Start using

Once configured, your AI assistant can call any of the 18 tools below. Here are example prompts organized by domain:

Explore your data

  • "What tables exist in the analytics schema?"
  • "Show me the schema and metadata for main.sales.orders"
  • "Run a query that counts and sums orders by status from main.sales.orders"

Understand dependencies

  • "Build the full workspace dependency graph"
  • "What are the upstream and downstream dependencies of main.default.customers?"
  • "Scan the /Shared/mandated_broker_v2_etl_pipeline notebook for table references"
  • "Scan all jobs and show their table dependencies"

Assess impact before making changes

  • "What would break if I drop the customer_id column from main.default.customers?"
  • "What's the impact of removing the amount column and renaming status to order_status in main.sales.orders?"

Review notebook quality

  • "Review /Shared/mandated_broker_v2_etl_pipeline for performance issues"
  • "Review /Shared/analysis for all issues — performance, coding standards, and optimizations"

Monitor jobs and pipelines

  • "List all jobs in the workspace"
  • "What's the current status of job 12345?"
  • "Show me the pipeline status for my DLT pipeline"

MCP Tools

Tool | Description
execute_query | Execute SQL against a Databricks SQL warehouse
get_table_info | Get table metadata — columns, row count, properties, storage
list_tables | List tables in a catalog.schema
scan_notebook | Scan a notebook for table/column references
scan_jobs | Scan all jobs for table dependencies
scan_dlt_pipelines | Scan all DLT pipelines for source/target tables
scan_dlt_pipeline | Scan a single DLT pipeline by ID for source/target tables
build_dependency_graph | Build the full workspace dependency graph
get_table_dependencies | Get upstream/downstream dependencies for a table
refresh_graph | Invalidate and rebuild the dependency graph cache
analyze_impact | Analyze impact of column drop / schema change / pipeline failure
review_notebook | Review a notebook for issues, anti-patterns, and optimizations
list_jobs | List jobs with status and schedule info
get_job_status | Get detailed job run status with error diagnostics
list_pipelines | List DLT pipelines with state and update status
get_pipeline_status | Get pipeline update details with event log
trigger_rerun | Trigger a job rerun (requires confirmation)
list_workspace_notebooks | List all notebooks in a workspace path
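
Your IDE normally calls these tools for you, but any MCP client can exercise them over stdio. A minimal sketch using the official MCP Python SDK (the tool argument names here are assumptions, not the server's published schema):

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Forward the DATABRICKS_* variables so the spawned server can authenticate.
    server = StdioServerParameters(
        command="databricks-mcp",
        env={k: v for k, v in os.environ.items() if k.startswith("DATABRICKS_")},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # should list the 18 tools above
            result = await session.call_tool(
                "list_tables", {"catalog": "main", "schema": "default"}
            )
            print(result.content)

asyncio.run(main())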

Configuration Reference

Variable | Required | Default | Description
DATABRICKS_HOST | Yes | | Workspace URL (https://adb-xxx.azuredatabricks.net for Azure, https://dbc-xxx.cloud.databricks.com for AWS/GCP)
DATABRICKS_TOKEN | Yes | | Personal access token or service principal token
DATABRICKS_WAREHOUSE_ID | Yes | | SQL warehouse ID for query execution
DATABRICKS_CATALOG | No | main | Default catalog for unqualified table names — use workspace for AWS/GCP
DATABRICKS_SCHEMA | No | default | Default schema for unqualified table names
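
Internally, config.py loads these variables with Pydantic settings (see Architecture below). A hypothetical sketch of that mapping, with field names chosen for illustration rather than taken from the source:

from pydantic_settings import BaseSettings

class DatabricksSettings(BaseSettings):
    # Matching is case-insensitive, so DATABRICKS_HOST fills databricks_host, etc.
    databricks_host: str
    databricks_token: str
    databricks_warehouse_id: str
    databricks_catalog: str = "main"
    databricks_schema: str = "default"

settings = DatabricksSettings()  # raises a validation error if a required variable is missing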

Cloud Provider Notes

This server is tested against Azure Databricks and Databricks on AWS (.cloud.databricks.com). Key differences:

Aspect | Azure | AWS / GCP
Host format | https://adb-xxx.azuredatabricks.net | https://dbc-xxx.cloud.databricks.com
Default catalog | main | workspace
Workspace root objects | DIRECTORY | DIRECTORY and REPO

All tools work on both platforms. Set DATABRICKS_CATALOG to match your workspace's default catalog.

Infrastructure (Optional)

If you need to provision a new Azure Databricks workspace, the infra/ directory contains:

  • main.bicep — Azure Bicep template (Premium SKU, Unity Catalog enabled)
  • deploy.ps1 — One-command PowerShell deployment script
  • INSTALL.md — Detailed step-by-step deployment guide
cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2

Development

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run mypy src/

Architecture

src/databricks_advanced_mcp/
├── server.py          # FastMCP server + CLI entry point
├── config.py          # Pydantic settings from env vars
├── client.py          # Databricks SDK client factory
├── tools/             # MCP tool implementations
│   ├── sql_executor.py
│   ├── dependency_scanner.py
│   ├── impact_analysis.py
│   ├── notebook_reviewer.py
│   ├── job_pipeline_ops.py
│   ├── table_info.py
│   └── workspace_listing.py
├── parsers/           # Code parsing engines
│   ├── sql_parser.py       # sqlglot-based SQL extraction
│   ├── notebook_parser.py  # Databricks notebook cell parsing
│   └── dlt_parser.py       # DLT pipeline definition parsing
├── graph/             # Dependency graph
│   ├── models.py      # Node, Edge, DependencyGraph data models
│   ├── builder.py     # Graph builder (orchestrates scans)
│   └── cache.py       # In-memory graph cache with TTL
└── reviewers/         # Notebook review rule engines
    ├── performance.py # Performance anti-patterns
    ├── standards.py   # Coding standards checks
    └── suggestions.py # Optimization suggestions
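
server.py wires the tool modules into a FastMCP server. A stripped-down sketch of that pattern using the MCP Python SDK (the decorator style is standard FastMCP; the tool body is illustrative, not the actual server code):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("databricks-mcp")

@mcp.tool()
def list_tables(catalog: str = "main", schema: str = "default") -> str:
    """List tables in a catalog.schema (illustrative body only)."""
    return f"Would list tables in {catalog}.{schema} via the Databricks SDK"

def main() -> None:
    mcp.run()  # stdio transport by default, matching the IDE config above

if __name__ == "__main__":
    main()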

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

databricks_advanced_mcp-0.0.3.tar.gz (9.5 MB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

databricks_advanced_mcp-0.0.3-py3-none-any.whl (51.6 kB)

File details

Details for the file databricks_advanced_mcp-0.0.3.tar.gz.

File metadata

  • Download URL: databricks_advanced_mcp-0.0.3.tar.gz
  • Upload date:
  • Size: 9.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_advanced_mcp-0.0.3.tar.gz
Algorithm Hash digest
SHA256 0503fdaebeda17bdce1d82a512bd5647aa62579338b3baa16c37d3ae1f6be661
MD5 084f608ca0af3d21f958ff8fe820cbc4
BLAKE2b-256 b3531b4f64fd23fc838537fb3c6de229440639aec3c7b64509f101d82a2da735

See more details on using hashes here.

Provenance

The following attestation bundles were made for databricks_advanced_mcp-0.0.3.tar.gz:

Publisher: workflow.yml on henrybravo/databricks-advanced-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file databricks_advanced_mcp-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for databricks_advanced_mcp-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 dc308763d05ad1a21ebe68aa49d3b36e80d8e7fc93181df1377052c064e779e8
MD5 bbbe5b904dd5e103b5ecf105229f21a5
BLAKE2b-256 1f684fd226a1f0579bc9328ebdd355baeddde87e526f39f8d2dede69b26397d2

See more details on using hashes here.

Provenance

The following attestation bundles were made for databricks_advanced_mcp-0.0.3-py3-none-any.whl:

Publisher: workflow.yml on henrybravo/databricks-advanced-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
