Databricks Advanced MCP Server
An advanced Model Context Protocol (MCP) server that gives AI assistants deep visibility into your Databricks workspace - dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, and table metadata inspection.
Features
| Domain | What it does |
|---|---|
| SQL Execution | Run SQL queries against Databricks SQL warehouses with configurable result limits |
| Table Information | Inspect table metadata, schemas, column details, row counts, and storage info |
| Dependency Scanning | Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG) |
| Impact Analysis | Predict downstream breakage from column drops, schema changes, or pipeline failures |
| Notebook Review | Detect performance anti-patterns, coding standard violations, and suggest optimizations |
| Job & Pipeline Ops | List jobs/pipelines, get run status with error diagnostics, trigger reruns |
Demo
Click to play full video
https://github.com/user-attachments/assets/579282ca-bb26-4244-b0c6-3ad26050aca3
Covers SQL execution, dependency scanning, impact analysis, notebook review, and job/pipeline operations.
Quick Start
Prerequisites
- Python 3.11+
- uv — fast Python package manager
- A Databricks workspace with a SQL warehouse
- A Databricks personal access token
Other auth methods: The Databricks SDK supports unified authentication — if you don't set
DATABRICKS_TOKEN, it will fall back to Azure CLI, managed identity, or .databrickscfg. The .env setup below uses a PAT for simplicity.
Don't have a Databricks workspace yet? See infra/INSTALL.md for a one-command Azure deployment using Bicep.
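For reference, the profile-based alternative is a ~/.databrickscfg file. A minimal sketch (host and token values are placeholders):
[DEFAULT]
host  = https://adb-xxxx.azuredatabricks.net
token = dapi_your_token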
1. Install
Option A: Install from PyPI (recommended)
uv pip install databricks-advanced-mcp
Or with pip:
pip install databricks-advanced-mcp
Option B: Install from source
git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server
Create and activate a virtual environment:
Windows (PowerShell)
uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .
macOS / Linux
uv venv .venv
source .venv/bin/activate
uv pip install -e .
2. Configure
cp .env.example .env
Edit .env with your Databricks credentials:
# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id
# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default
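Before wiring up your IDE, you can sanity-check the credentials with the Databricks SDK (already a dependency of this server, via its client factory). A quick sketch, assuming python-dotenv is installed to load the .env file:
# check_auth.py - verify the workspace credentials resolve
from dotenv import load_dotenv
from databricks.sdk import WorkspaceClient
load_dotenv()                           # load DATABRICKS_* variables from .env
w = WorkspaceClient()                   # unified auth: env vars, then CLI / managed identity / .databrickscfg
print(w.current_user.me().user_name)    # prints the authenticated principal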
3. Add to your IDE
Create .vscode/mcp.json in your project to register the MCP server with VS Code / GitHub Copilot.
Option A: PyPI install (recommended)
If you installed from PyPI (pip install databricks-advanced-mcp), the databricks-mcp CLI is available on your PATH:
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
"DATABRICKS_TOKEN": "dapi_your_token",
"DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
}
}
}
}
Option B: Virtual environment (source install)
If you cloned the repo and installed into a local .venv, point directly to the Python interpreter:
Windows
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/Scripts/python.exe",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
}
}
}
macOS / Linux
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
}
}
}
Multiple Workspaces
Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:
{
"servers": {
// AWS / GCP workspace
"databricks-cloud": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
"DATABRICKS_TOKEN": "dapi_cloud_token",
"DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
"DATABRICKS_CATALOG": "workspace"
}
},
// Azure workspace
"databricks-azure": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
"DATABRICKS_TOKEN": "dapi_azure_token",
"DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
"DATABRICKS_CATALOG": "main"
}
}
}
}
Alternatively, with a source install you can use separate .env files per workspace:
{
"servers": {
"databricks-cloud": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
},
"databricks-azure": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env_azure"
}
}
}
4. Start using
Once configured, your AI assistant can call any of the 18 tools below. Here are example prompts organized by domain:
Explore your data
- "What tables exist in the
analyticsschema?" - "Show me the schema and metadata for
main.sales.orders" - "Run a query that counts and sums orders by status from
main.sales.orders"
Understand dependencies
- "Build the full workspace dependency graph"
- "What are the upstream and downstream dependencies of
main.default.customers?" - "Scan the
/Shared/mandated_broker_v2_etl_pipelinenotebook for table references" - "Scan all jobs and show their table dependencies"
Assess impact before making changes
- "What would break if I drop the
customer_idcolumn frommain.default.customers?" - "What's the impact of removing the
amountcolumn and renamingstatustoorder_statusinmain.sales.orders?"
Review notebook quality
- "Review
/Shared/mandated_broker_v2_etl_pipelinefor performance issues" - "Review
/Shared/analysisfor all issues — performance, coding standards, and optimizations"
Monitor jobs and pipelines
- "List all jobs in the workspace"
- "What's the current status of job 12345?"
- "Show me the pipeline status for my DLT pipeline"
MCP Tools
| Tool | Description |
|---|---|
| `execute_query` | Execute SQL against a Databricks SQL warehouse |
| `get_table_info` | Get table metadata — columns, row count, properties, storage |
| `list_tables` | List tables in a catalog.schema |
| `scan_notebook` | Scan a notebook for table/column references |
| `scan_jobs` | Scan all jobs for table dependencies |
| `scan_dlt_pipelines` | Scan all DLT pipelines for source/target tables |
| `scan_dlt_pipeline` | Scan a single DLT pipeline by ID for source/target tables |
| `build_dependency_graph` | Build the full workspace dependency graph |
| `get_table_dependencies` | Get upstream/downstream dependencies for a table |
| `refresh_graph` | Invalidate and rebuild the dependency graph cache |
| `analyze_impact` | Analyze impact of column drop / schema change / pipeline failure |
| `review_notebook` | Review a notebook for issues, anti-patterns, and optimizations |
| `list_jobs` | List jobs with status and schedule info |
| `get_job_status` | Get detailed job run status with error diagnostics |
| `list_pipelines` | List DLT pipelines with state and update status |
| `get_pipeline_status` | Get pipeline update details with event log |
| `trigger_rerun` | Trigger a job rerun (requires confirmation) |
| `list_workspace_notebooks` | List all notebooks in a workspace path |
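The tools are normally called by your AI assistant, but you can also exercise them directly for debugging with the MCP Python client over stdio. A minimal sketch, assuming the mcp package is installed and the server was installed from PyPI; the argument names passed to the tool are illustrative, not the exact parameter names:
import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server the way an IDE would; pass credentials explicitly,
    # since stdio servers do not inherit the full parent environment by default.
    params = StdioServerParameters(
        command="databricks-mcp",
        env={k: v for k, v in os.environ.items() if k.startswith("DATABRICKS_")},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tool_list = await session.list_tools()
            print([t.name for t in tool_list.tools])   # should include the 18 tools above
            result = await session.call_tool("list_tables", {"catalog": "main", "schema": "default"})
            print(result.content)

asyncio.run(main())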
Configuration Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABRICKS_HOST` | Yes | — | Workspace URL (`https://adb-xxx.azuredatabricks.net` for Azure, `https://dbc-xxx.cloud.databricks.com` for AWS/GCP) |
| `DATABRICKS_TOKEN` | Yes | — | Personal access token or service principal token |
| `DATABRICKS_WAREHOUSE_ID` | Yes | — | SQL warehouse ID for query execution |
| `DATABRICKS_CATALOG` | No | `main` | Default catalog for unqualified table names — use `workspace` for AWS/GCP |
| `DATABRICKS_SCHEMA` | No | `default` | Default schema for unqualified table names |
Cloud Provider Notes
This server is tested against Azure Databricks and Databricks on AWS (.cloud.databricks.com). Key differences:
| Aspect | Azure | AWS / GCP |
|---|---|---|
| Host format | `https://adb-xxx.azuredatabricks.net` | `https://dbc-xxx.cloud.databricks.com` |
| Default catalog | `main` | `workspace` |
| Workspace root objects | `DIRECTORY` | `DIRECTORY` and `REPO` |
All tools work on both platforms. Set DATABRICKS_CATALOG to match your workspace's default catalog.
Infrastructure (Optional)
If you need to provision a new Azure Databricks workspace, the infra/ directory contains:
- `main.bicep` — Azure Bicep template (Premium SKU, Unity Catalog enabled)
- `deploy.ps1` — One-command PowerShell deployment script
- `INSTALL.md` — Detailed step-by-step deployment guide
cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2
Development
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests
uv run pytest
# Lint
uv run ruff check src/ tests/
# Type check
uv run mypy src/
Architecture
src/databricks_advanced_mcp/
├── server.py # FastMCP server + CLI entry point
├── config.py # Pydantic settings from env vars
├── client.py # Databricks SDK client factory
├── tools/ # MCP tool implementations
│ ├── sql_executor.py
│ ├── dependency_scanner.py
│ ├── impact_analysis.py
│ ├── notebook_reviewer.py
│ ├── job_pipeline_ops.py
│   ├── table_info.py
│   └── workspace_listing.py
├── parsers/ # Code parsing engines
│ ├── sql_parser.py # sqlglot-based SQL extraction
│ ├── notebook_parser.py # Databricks notebook cell parsing
│ └── dlt_parser.py # DLT pipeline definition parsing
├── graph/ # Dependency graph
│ ├── models.py # Node, Edge, DependencyGraph data models
│ ├── builder.py # Graph builder (orchestrates scans)
│ └── cache.py # In-memory graph cache with TTL
└── reviewers/ # Notebook review rule engines
├── performance.py # Performance anti-patterns
├── standards.py # Coding standards checks
└── suggestions.py # Optimization suggestions
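The scanner's parsing approach builds on sqlglot: parse each SQL statement into an AST and collect the tables it reads or writes. This is not the project's actual sql_parser.py, just an illustrative sketch of the sqlglot technique it uses:
import sqlglot
from sqlglot import exp

def extract_tables(sql: str) -> set[str]:
    """Return fully qualified table names referenced in a SQL statement."""
    tree = sqlglot.parse_one(sql, read="databricks")  # Databricks / Spark SQL dialect
    return {
        ".".join(part for part in (t.catalog, t.db, t.name) if part)
        for t in tree.find_all(exp.Table)
    }

print(extract_tables(
    "INSERT INTO main.sales.daily SELECT o.status, sum(o.amount) "
    "FROM main.sales.orders o JOIN main.ref.status s ON o.status = s.code GROUP BY o.status"
))
# prints three tables: main.sales.daily, main.sales.orders, main.ref.status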
License