Databricks Advanced MCP Server
An advanced Model Context Protocol (MCP) server that gives AI assistants deep visibility into your Databricks workspace — 43 tools covering dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, catalog management, compute & warehouse control, and Unity Catalog volumes.
Features
| Domain | What it does |
|---|---|
| SQL Execution | Run SQL queries against Databricks SQL warehouses with configurable result limits |
| Table Information | Inspect table metadata, schemas, column details, row counts, and storage info |
| Dependency Scanning | Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG) |
| Graph Operations | Build, query, and refresh the workspace dependency graph |
| Impact Analysis | Predict downstream breakage from column drops, schema changes, or pipeline failures |
| Notebook Review | Detect performance anti-patterns, coding standard violations, and suggest optimizations |
| Job & Pipeline Ops | List jobs/pipelines, get run status with error diagnostics, trigger reruns |
| Catalog & Schema | List catalogs, list/describe/create/drop Unity Catalog schemas |
| Compute | List clusters, inspect status, start/stop/restart clusters |
| SQL Warehouses | List warehouses, inspect status, start/stop SQL warehouses |
| Workspace Ops | Create/read/delete notebooks, upload files, get workspace object metadata |
| UC Volumes | List volumes, inspect metadata, browse and read files in Unity Catalog volumes |
Demo
Click to play full video
https://github.com/user-attachments/assets/579282ca-bb26-4244-b0c6-3ad26050aca3
Covers SQL execution, dependency scanning, impact analysis, notebook review, and job/pipeline operations.
Quick Start
Prerequisites
- Python 3.11+
- uv — fast Python package manager
- A Databricks workspace with a SQL warehouse
- A Databricks personal access token
Other auth methods: The Databricks SDK supports unified authentication — if you don't set `DATABRICKS_TOKEN`, it will fall back to Azure CLI, managed identity, or `.databrickscfg`. The `.env` setup below uses a PAT for simplicity.
Don't have a Databricks workspace yet? See `infra/INSTALL.md` for a one-command Azure deployment using Bicep.
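As a quick check of the unified-auth fallback mentioned above, you can ask the SDK which identity it resolves. This is a minimal sketch, assuming the `databricks-sdk` package is installed and some credential source is configured:

```python
# Minimal sketch: confirm which identity the SDK's unified auth resolves
# when DATABRICKS_TOKEN is not set (env vars, Azure CLI, managed identity,
# or .databrickscfg are tried in turn).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()                  # no explicit credentials passed
print(w.current_user.me().user_name)  # prints the authenticated principal
```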
1. Install
Option A: Install from PyPI (recommended)
uv pip install databricks-advanced-mcp
Or with pip:
pip install databricks-advanced-mcp
Option B: Install from source
git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server
Create and activate a virtual environment:
Windows (PowerShell)
uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .
macOS / Linux
uv venv .venv
source .venv/bin/activate
uv pip install -e .
2. Configure
cp .env.example .env
Edit .env with your Databricks credentials:
# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id
# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default
3. Add to your IDE
Create .vscode/mcp.json in your project to register the MCP server with VS Code / GitHub Copilot.
Option A: PyPI install (recommended)
If you installed from PyPI (pip install databricks-advanced-mcp), the databricks-mcp CLI is available on your PATH:
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
"DATABRICKS_TOKEN": "dapi_your_token",
"DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
}
}
}
}
Option B: Virtual environment (source install)
If you cloned the repo and installed into a local .venv, point directly to the Python interpreter:
Windows
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/Scripts/python.exe",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
}
}
}
macOS / Linux
{
"servers": {
"databricks-mcp": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
}
}
}
Multiple Workspaces
Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:
{
"servers": {
// AWS / GCP workspace
"databricks-cloud": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
"DATABRICKS_TOKEN": "dapi_cloud_token",
"DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
"DATABRICKS_CATALOG": "workspace"
}
},
// Azure workspace
"databricks-azure": {
"type": "stdio",
"command": "databricks-mcp",
"env": {
"DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
"DATABRICKS_TOKEN": "dapi_azure_token",
"DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
"DATABRICKS_CATALOG": "main"
}
}
}
}
Alternatively, with a source install you can use separate .env files per workspace:
{
"servers": {
"databricks-cloud": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env"
},
"databricks-azure": {
"type": "stdio",
"command": "${workspaceFolder}/.venv/bin/python",
"args": ["-m", "databricks_advanced_mcp.server"],
"envFile": "${workspaceFolder}/.env_azure"
}
}
}
4. Start using
Once configured, your AI assistant can call any of the 43 tools below. Here are example prompts organized by domain:
Explore your data
- "What tables exist in the
analyticsschema?" - "Show me the schema and metadata for
main.sales.orders" - "Run a query that counts and sums orders by status from
main.sales.orders"
Unity Catalog & schemas
- "List all catalogs I have access to"
- "What schemas exist in the analytics catalog?"
- "Describe the
main.defaultschema" - "Create a new schema called
stagingin the analytics catalog"
Understand dependencies
- "Build the full workspace dependency graph"
- "What are the upstream and downstream dependencies of
main.default.customers?" - "Scan the
/Shared/mandated_broker_v2_etl_pipelinenotebook for table references" - "Scan all jobs and show their table dependencies"
Assess impact before making changes
- "What would break if I drop the
customer_idcolumn frommain.default.customers?" - "What's the impact of removing the
amountcolumn and renamingstatustoorder_statusinmain.sales.orders?"
Review notebook quality
- "Review
/Shared/mandated_broker_v2_etl_pipelinefor performance issues" - "Review
/Shared/analysisfor all issues — performance, coding standards, and optimizations"
Monitor jobs and pipelines
- "List all jobs in the workspace"
- "What's the current status of job 12345?"
- "Show me the pipeline status for my DLT pipeline"
- "Trigger a new run of job 67890 with parameter env=prod"
Compute & SQL warehouses
- "Show me the status of cluster abc-123"
- "List all running clusters"
- "Stop the dev SQL warehouse"
- "What warehouses are currently active?"
Workspace & volumes
- "Export the ETL notebook as source"
- "What's the status of the notebook at /Workspace/Users/me/analysis?"
- "What files are in the raw-data volume?"
- "Read the config.json file from the settings volume"
MCP Tools
SQL & Tables (3 tools)
| Tool | Description |
|---|---|
| `execute_query` | Execute SQL against a Databricks SQL warehouse |
| `get_table_info` | Get table metadata — columns, row count, properties, storage |
| `list_tables` | List tables in a catalog.schema |
Dependency Scanning (4 tools)
| Tool | Description |
|---|---|
| `scan_notebook` | Scan a notebook for table/column references |
| `scan_jobs` | Scan all jobs for table dependencies |
| `scan_dlt_pipelines` | Scan all DLT pipelines for source/target tables |
| `scan_dlt_pipeline` | Scan a single DLT pipeline by ID for source/target tables |
Graph Operations (3 tools)
| Tool | Description |
|---|---|
| `build_dependency_graph` | Build the full workspace dependency graph |
| `get_table_dependencies` | Get upstream/downstream dependencies for a table |
| `refresh_graph` | Invalidate and rebuild the dependency graph cache |
Impact Analysis & Review (2 tools)
| Tool | Description |
|---|---|
| `analyze_impact` | Analyze impact of column drop / schema change / pipeline failure |
| `review_notebook` | Review a notebook for issues, anti-patterns, and optimizations |
Job & Pipeline Ops (6 tools)
| Tool | Description |
|---|---|
| `list_jobs` | List jobs with status and schedule info |
| `get_job_status` | Get detailed job run status with error diagnostics |
| `list_pipelines` | List DLT pipelines with state and update status |
| `get_pipeline_status` | Get pipeline update details with event log |
| `trigger_rerun` | Trigger a rerun of the latest failed job run (requires confirmation) |
| `trigger_job_run` | Trigger a brand-new job run with optional parameters (requires confirmation) |
Catalog & Schema (5 tools)
| Tool | Description |
|---|---|
| `list_catalogs` | List all Unity Catalog catalogs accessible to the current principal |
| `list_schemas` | List all schemas in a catalog |
| `describe_schema` | Get schema metadata, owner, comment, and properties |
| `create_schema` | Create a new schema in a catalog (requires confirmation) |
| `drop_schema` | Drop a schema — must be empty (requires confirmation) |
Compute (5 tools)
| Tool | Description |
|---|---|
| `list_clusters` | List all clusters with state, creator, and node type |
| `get_cluster_status` | Get detailed cluster status, Spark version, and config |
| `start_cluster` | Start a terminated cluster (requires confirmation) |
| `stop_cluster` | Stop (terminate) a running cluster (requires confirmation) |
| `restart_cluster` | Restart a running cluster (requires confirmation) |
SQL Warehouses (4 tools)
| Tool | Description |
|---|---|
| `list_warehouses` | List all SQL warehouses with state, size, and type |
| `get_warehouse_status` | Get detailed warehouse config, scaling, and auto-stop settings |
| `start_warehouse` | Start a stopped SQL warehouse (requires confirmation) |
| `stop_warehouse` | Stop a running SQL warehouse (requires confirmation) |
Workspace Ops (7 tools)
| Tool | Description |
|---|---|
| `list_workspace_notebooks` | List all notebooks in a workspace path |
| `create_job` | Create a new Databricks job (requires confirmation) |
| `create_notebook` | Create a notebook in the workspace (requires confirmation) |
| `workspace_upload` | Upload a local file to the workspace (requires confirmation) |
| `read_notebook` | Read/export a notebook's content (SOURCE or HTML) |
| `delete_workspace_item` | Delete a notebook or folder (requires confirmation) |
| `get_workspace_status` | Get metadata for a workspace object (type, language, modified) |
UC Volumes (4 tools)
| Tool | Description |
|---|---|
| `list_volumes` | List Unity Catalog volumes in a catalog.schema |
| `get_volume_info` | Get volume metadata (type, storage location, owner) |
| `list_volume_files` | List files and directories inside a volume |
| `read_volume_file` | Read contents of a file from a volume |
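These tools are normally invoked by your IDE's AI assistant, but they can also be exercised directly with the MCP Python client SDK, which is handy for smoke-testing a configuration. The sketch below is illustrative only: it assumes the `mcp` package is installed, the `databricks-mcp` CLI is on your PATH, and the `query` argument name is an assumption rather than a documented parameter.

```python
# Illustrative sketch: call the execute_query tool over stdio via the MCP
# Python SDK. The "query" argument name is an assumption for illustration.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="databricks-mcp",  # PyPI install puts this CLI on PATH
        env={
            "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
            "DATABRICKS_TOKEN": "dapi_your_token",
            "DATABRICKS_WAREHOUSE_ID": "your_warehouse_id",
        },
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # enumerate the available tools
            print([t.name for t in tools.tools])
            result = await session.call_tool("execute_query", {"query": "SELECT 1"})
            print(result)

asyncio.run(main())
```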
Configuration Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABRICKS_HOST` | Yes | — | Workspace URL (`https://adb-xxx.azuredatabricks.net` for Azure, `https://dbc-xxx.cloud.databricks.com` for AWS/GCP) |
| `DATABRICKS_TOKEN` | Yes | — | Personal access token or service principal token |
| `DATABRICKS_WAREHOUSE_ID` | Yes | — | SQL warehouse ID for query execution |
| `DATABRICKS_CATALOG` | No | `main` | Default catalog for unqualified table names — use `workspace` for AWS/GCP |
| `DATABRICKS_SCHEMA` | No | `default` | Default schema for unqualified table names |
| `GRAPH_CACHE_TTL` | No | `3600` | Dependency graph cache TTL in seconds |
| `GRAPH_REFRESH_INTERVAL` | No | `0` | Auto-refresh the graph in the background every N seconds; 0 disables auto-refresh |
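The server reads these variables at startup through Pydantic settings (see `config.py` in the Architecture section). A rough sketch of that pattern, with class and field names that are illustrative rather than the server's actual config module:

```python
# Rough sketch of env-driven settings with pydantic-settings; the class and
# field names are illustrative, not the server's actual config module.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    databricks_host: str
    databricks_token: str
    databricks_warehouse_id: str
    databricks_catalog: str = "main"
    databricks_schema: str = "default"
    graph_cache_ttl: int = 3600        # seconds
    graph_refresh_interval: int = 0    # 0 disables background refresh

settings = Settings()  # values come from env vars / .env, matched case-insensitively
```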
Security note: The `execute_query` tool can run any SQL against your warehouse, including DDL (`DROP TABLE`, `ALTER TABLE`) and DML (`DELETE`, `UPDATE`). Use a least-privilege service principal (see Security & Governance below) rather than a personal admin PAT in production environments.
Cloud Provider Notes
This server is tested against Azure Databricks and Databricks on AWS (.cloud.databricks.com). Key differences:
| Aspect | Azure | AWS / GCP |
|---|---|---|
| Host format | `https://adb-xxx.azuredatabricks.net` | `https://dbc-xxx.cloud.databricks.com` |
| Default catalog | `main` | `workspace` |
| Workspace root objects | `DIRECTORY` | `DIRECTORY` and `REPO` |
All tools work on both platforms. Set DATABRICKS_CATALOG to match your workspace's default catalog.
Security & Governance
Recommended: Service Principal with Least Privilege
Avoid storing a personal admin PAT in .env or VS Code config. Instead, create a dedicated service principal with only the permissions required:
-- Grant read access on the catalog and schema
GRANT USE CATALOG ON CATALOG main TO `sp-databricks-mcp`;
GRANT USE SCHEMA ON SCHEMA main.default TO `sp-databricks-mcp`;
GRANT SELECT ON SCHEMA main.default TO `sp-databricks-mcp`;
-- For job operations (trigger_rerun, trigger_job_run, get_job_status)
-- Grant CAN_MANAGE_RUN on specific jobs only, via the Databricks UI or API
Set DATABRICKS_TOKEN to the service principal's OAuth token or M2M secret. The Databricks SDK supports OAuth M2M authentication natively — no PAT required.
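For reference, this is how OAuth M2M credentials look when passed straight to the SDK; a minimal sketch, assuming a service principal client ID and secret:

```python
# Minimal sketch: OAuth M2M auth with a service principal instead of a PAT.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-xxxx.azuredatabricks.net",
    client_id="your-service-principal-application-id",
    client_secret="your-oauth-secret",
)
print(w.current_user.me().user_name)  # should print the service principal's name
```

If the server builds its client through the SDK's unified auth, exporting `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET` instead of `DATABRICKS_TOKEN` should have the same effect; verify this against the server's client factory before relying on it.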
Tool Risk Levels
| Risk | Tools | Notes |
|---|---|---|
| Read-only | `execute_query` (SELECT only), `get_table_info`, `list_tables`, `scan_*`, `list_*`, `get_*`, `describe_schema`, `build_dependency_graph`, `get_table_dependencies`, `analyze_impact`, `review_notebook`, `read_notebook`, `read_volume_file` | Safe with read-only grants |
| Mutating | `trigger_rerun`, `trigger_job_run`, `create_schema`, `start_cluster`, `stop_cluster`, `restart_cluster`, `start_warehouse`, `stop_warehouse`, `create_job`, `create_notebook`, `workspace_upload` | Require `confirm=True` |
| Destructive | `drop_schema`, `delete_workspace_item`, `execute_query` with DDL/DML | Require `confirm=True`; scope permissions carefully |
Unity Catalog ACLs
All execute_query and get_table_info calls respect Unity Catalog row/column-level security. If the token's principal lacks SELECT on a table, the operation fails with a permission error — expected and correct behaviour.
Query Guard
To limit execute_query to read-only statements, restrict the SQL warehouse's channel policy or use a dedicated read-only warehouse for this MCP server.
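Another option is a client-side allow-list check before forwarding SQL to the warehouse. The sketch below uses sqlglot (which the server already uses for parsing) but is not a built-in feature of the server:

```python
# Sketch of a read-only SQL guard using sqlglot (not a built-in server feature).
import sqlglot
from sqlglot import exp

def is_read_only(sql: str) -> bool:
    """Return True only if every statement in `sql` is a plain SELECT."""
    try:
        statements = sqlglot.parse(sql, read="databricks")
    except sqlglot.errors.ParseError:
        return False  # refuse anything we cannot parse
    return all(isinstance(stmt, exp.Select) for stmt in statements if stmt is not None)

assert is_read_only("SELECT * FROM main.sales.orders LIMIT 10")
assert not is_read_only("DROP TABLE main.sales.orders")
```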
Infrastructure (Optional)
If you need to provision a new Azure Databricks workspace, the infra/ directory contains:
- `main.bicep` — Azure Bicep template (Premium SKU, Unity Catalog enabled)
- `deploy.ps1` — One-command PowerShell deployment script
- `INSTALL.md` — Detailed step-by-step deployment guide
cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2
Development
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests (excluding live integration tests)
uv run pytest tests/ --ignore=tests/test_workspace_ops_live.py -v
# Run tests with coverage report
uv run pytest tests/ --cov=src/databricks_advanced_mcp --cov-report=term-missing --ignore=tests/test_workspace_ops_live.py
# Lint
uv run ruff check src/ tests/
# Type check
uv run mypy src/
Architecture
src/databricks_advanced_mcp/
├── server.py # FastMCP server + CLI entry point
├── config.py # Pydantic settings from env vars
├── client.py # Databricks SDK client factory
├── tools/ # MCP tool implementations (43 tools across 13 modules)
│ ├── __init__.py # Central registration of all tool modules
│ ├── sql_executor.py # SQL execution (1 tool)
│ ├── table_info.py # Table metadata (2 tools)
│ ├── dependency_scanner.py # Scan notebooks/jobs/pipelines (4 tools)
│ ├── graph_ops.py # Build/query/refresh dependency graph (3 tools)
│ ├── impact_analysis.py # Impact analysis (1 tool)
│ ├── notebook_reviewer.py # Notebook review (1 tool)
│ ├── job_pipeline_ops.py # Job & pipeline operations (6 tools)
│ ├── workspace_listing.py # Workspace listing (1 tool)
│ ├── workspace_ops.py # Workspace mutations + read/delete (6 tools)
│ ├── catalog_ops.py # Unity Catalog & schema management (5 tools)
│ ├── compute_ops.py # Cluster management (5 tools)
│ ├── warehouse_ops.py # SQL warehouse management (4 tools)
│ └── volume_ops.py # Unity Catalog volumes (4 tools)
├── parsers/ # Code parsing engines
│ ├── sql_parser.py # sqlglot-based SQL extraction
│ ├── notebook_parser.py # Databricks notebook cell parsing
│ └── dlt_parser.py # DLT pipeline definition parsing
├── graph/ # Dependency graph
│ ├── models.py # Node, Edge, DependencyGraph data models
│ ├── builder.py # Graph builder (orchestrates scans)
│ └── cache.py # In-memory graph cache with TTL
└── reviewers/ # Notebook review rule engines
├── performance.py # Performance anti-patterns
├── standards.py # Coding standards checks
└── suggestions.py # Optimization suggestions
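Tool modules follow the FastMCP registration pattern: `server.py` creates the FastMCP app and each module under `tools/` registers its functions against it. A simplified sketch of that pattern, with function names that are illustrative rather than the actual module contents:

```python
# Simplified sketch of the FastMCP tool-registration pattern; the tool body
# here is an illustrative stub, not the server's real implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("databricks-advanced-mcp")

@mcp.tool()
def list_tables(catalog: str = "main", schema: str = "default") -> str:
    """List tables in a catalog.schema (illustrative stub)."""
    return f"tables in {catalog}.{schema}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, matching the IDE config above
```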
Contributing
Contributions welcome! See CONTRIBUTING.md for setup instructions, PR checklist, and a list of wanted features.