Advanced MCP server for Databricks workspace intelligence — dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Project description

Databricks Advanced MCP Server


An advanced Model Context Protocol (MCP) server that gives AI assistants deep visibility into your Databricks workspace — 43 tools covering dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, catalog management, compute & warehouse control, and Unity Catalog volumes.

Features

Domain What it does
SQL Execution Run SQL queries against Databricks SQL warehouses with configurable result limits
Table Information Inspect table metadata, schemas, column details, row counts, and storage info
Dependency Scanning Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG)
Graph Operations Build, query, and refresh the workspace dependency graph
Impact Analysis Predict downstream breakage from column drops, schema changes, or pipeline failures
Notebook Review Detect performance anti-patterns, coding standard violations, and suggest optimizations
Job & Pipeline Ops List jobs/pipelines, get run status with error diagnostics, trigger reruns
Catalog & Schema List catalogs, list/describe/create/drop Unity Catalog schemas
Compute List clusters, inspect status, start/stop/restart clusters
SQL Warehouses List warehouses, inspect status, start/stop SQL warehouses
Workspace Ops Create/read/delete notebooks, upload files, get workspace object metadata
UC Volumes List volumes, inspect metadata, browse and read files in Unity Catalog volumes

Demo

Full demo video: https://github.com/user-attachments/assets/579282ca-bb26-4244-b0c6-3ad26050aca3

Covers SQL execution, dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Quick Start

Prerequisites

  • Python 3.11+
  • uv — fast Python package manager
  • A Databricks workspace with a SQL warehouse
  • A Databricks personal access token

Other auth methods: The Databricks SDK supports unified authentication — if you don't set DATABRICKS_TOKEN, it will fall back to Azure CLI, managed identity, or .databrickscfg. The .env setup below uses a PAT for simplicity.
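
For example, with the Databricks SDK installed you can check that unified authentication resolves credentials without a PAT. A minimal sketch (not part of this package; assumes pip install databricks-sdk and, e.g., az login or a .databrickscfg profile):

# Sketch only: verify unified auth when DATABRICKS_TOKEN is not set.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves host/credentials from env vars, Azure CLI, or .databrickscfg
print(w.current_user.me().user_name)  # prints the authenticated principal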

Don't have a Databricks workspace yet? See infra/INSTALL.md for a one-command Azure deployment using Bicep.

1. Install

Option A: Install from PyPI (recommended)

uv pip install databricks-advanced-mcp

Or with pip:

pip install databricks-advanced-mcp

Option B: Install from source

git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server

Create and activate a virtual environment:

Windows (PowerShell)

uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .

macOS / Linux

uv venv .venv
source .venv/bin/activate
uv pip install -e .

2. Configure

cp .env.example .env

Edit .env with your Databricks credentials:

# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com

DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id

# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default

3. Add to your IDE

Create .vscode/mcp.json in your project to register the MCP server with VS Code / GitHub Copilot.

Option A: PyPI install (recommended)

If you installed from PyPI (pip install databricks-advanced-mcp), the databricks-mcp CLI is available on your PATH:

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_your_token",
        "DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
      }
    }
  }
}

Option B: Virtual environment (source install)

If you cloned the repo and installed into a local .venv, point directly to the Python interpreter:

Windows

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/Scripts/python.exe",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

macOS / Linux

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

Multiple Workspaces

Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:

{
  "servers": {
    // AWS / GCP workspace
    "databricks-cloud": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
        "DATABRICKS_TOKEN": "dapi_cloud_token",
        "DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
        "DATABRICKS_CATALOG": "workspace"
      }
    },
    // Azure workspace
    "databricks-azure": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_azure_token",
        "DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
        "DATABRICKS_CATALOG": "main"
      }
    }
  }
}

Alternatively, with a source install you can use separate .env files per workspace:

{
  "servers": {
    "databricks-cloud": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    },
    "databricks-azure": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env_azure"
    }
  }
}

4. Start using

Once configured, your AI assistant can call any of the 43 tools below. Here are example prompts organized by domain:

Explore your data

  • "What tables exist in the analytics schema?"
  • "Show me the schema and metadata for main.sales.orders"
  • "Run a query that counts and sums orders by status from main.sales.orders"

Unity Catalog & schemas

  • "List all catalogs I have access to"
  • "What schemas exist in the analytics catalog?"
  • "Describe the main.default schema"
  • "Create a new schema called staging in the analytics catalog"

Understand dependencies

  • "Build the full workspace dependency graph"
  • "What are the upstream and downstream dependencies of main.default.customers?"
  • "Scan the /Shared/mandated_broker_v2_etl_pipeline notebook for table references"
  • "Scan all jobs and show their table dependencies"

Assess impact before making changes

  • "What would break if I drop the customer_id column from main.default.customers?"
  • "What's the impact of removing the amount column and renaming status to order_status in main.sales.orders?"

Review notebook quality

  • "Review /Shared/mandated_broker_v2_etl_pipeline for performance issues"
  • "Review /Shared/analysis for all issues — performance, coding standards, and optimizations"

Monitor jobs and pipelines

  • "List all jobs in the workspace"
  • "What's the current status of job 12345?"
  • "Show me the pipeline status for my DLT pipeline"
  • "Trigger a new run of job 67890 with parameter env=prod"

Compute & SQL warehouses

  • "Show me the status of cluster abc-123"
  • "List all running clusters"
  • "Stop the dev SQL warehouse"
  • "What warehouses are currently active?"

Workspace & volumes

  • "Export the ETL notebook as source"
  • "What's the status of the notebook at /Workspace/Users/me/analysis?"
  • "What files are in the raw-data volume?"
  • "Read the config.json file from the settings volume"

MCP Tools

SQL & Tables (3 tools)

Tool Description
execute_query Execute SQL against a Databricks SQL warehouse
get_table_info Get table metadata — columns, row count, properties, storage
list_tables List tables in a catalog.schema

Dependency Scanning (4 tools)

Tool Description
scan_notebook Scan a notebook for table/column references
scan_jobs Scan all jobs for table dependencies
scan_dlt_pipelines Scan all DLT pipelines for source/target tables
scan_dlt_pipeline Scan a single DLT pipeline by ID for source/target tables

Graph Operations (3 tools)

Tool Description
build_dependency_graph Build the full workspace dependency graph
get_table_dependencies Get upstream/downstream dependencies for a table
refresh_graph Invalidate and rebuild the dependency graph cache

Impact Analysis & Review (2 tools)

Tool Description
analyze_impact Analyze impact of column drop / schema change / pipeline failure
review_notebook Review a notebook for issues, anti-patterns, and optimizations

Job & Pipeline Ops (6 tools)

Tool Description
list_jobs List jobs with status and schedule info
get_job_status Get detailed job run status with error diagnostics
list_pipelines List DLT pipelines with state and update status
get_pipeline_status Get pipeline update details with event log
trigger_rerun Trigger a rerun of the latest failed job run (requires confirmation)
trigger_job_run Trigger a brand-new job run with optional parameters (requires confirmation)

Catalog & Schema (5 tools)

Tool Description
list_catalogs List all Unity Catalog catalogs accessible to the current principal
list_schemas List all schemas in a catalog
describe_schema Get schema metadata, owner, comment, and properties
create_schema Create a new schema in a catalog (requires confirmation)
drop_schema Drop a schema — must be empty (requires confirmation)

Compute (5 tools)

Tool Description
list_clusters List all clusters with state, creator, and node type
get_cluster_status Get detailed cluster status, spark version, and config
start_cluster Start a terminated cluster (requires confirmation)
stop_cluster Stop (terminate) a running cluster (requires confirmation)
restart_cluster Restart a running cluster (requires confirmation)

SQL Warehouses (4 tools)

Tool Description
list_warehouses List all SQL warehouses with state, size, and type
get_warehouse_status Get detailed warehouse config, scaling, and auto-stop settings
start_warehouse Start a stopped SQL warehouse (requires confirmation)
stop_warehouse Stop a running SQL warehouse (requires confirmation)

Workspace Ops (7 tools)

Tool Description
list_workspace_notebooks List all notebooks in a workspace path
create_job Create a new Databricks job (requires confirmation)
create_notebook Create a notebook in the workspace (requires confirmation)
workspace_upload Upload a local file to the workspace (requires confirmation)
read_notebook Read/export a notebook's content (SOURCE or HTML)
delete_workspace_item Delete a notebook or folder (requires confirmation)
get_workspace_status Get metadata for a workspace object (type, language, modified)

UC Volumes (4 tools)

Tool Description
list_volumes List Unity Catalog volumes in a catalog.schema
get_volume_info Get volume metadata (type, storage location, owner)
list_volume_files List files and directories inside a volume
read_volume_file Read contents of a file from a volume

Configuration Reference

Variable Required Default Description
DATABRICKS_HOST Yes (none) Workspace URL (https://adb-xxx.azuredatabricks.net for Azure, https://dbc-xxx.cloud.databricks.com for AWS/GCP)
DATABRICKS_TOKEN Yes (none) Personal access token or service principal token
DATABRICKS_WAREHOUSE_ID Yes (none) SQL warehouse ID for query execution
DATABRICKS_CATALOG No main Default catalog for unqualified table names — use workspace for AWS/GCP
DATABRICKS_SCHEMA No default Default schema for unqualified table names
GRAPH_CACHE_TTL No 3600 Dependency graph cache TTL in seconds
GRAPH_REFRESH_INTERVAL No 0 Auto-refresh the graph in the background every N seconds. 0 disables auto-refresh
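
For example, to keep the dependency graph cached for 30 minutes and rebuild it in the background every 10 minutes, you could add the following to .env (values are illustrative, not recommendations):

# Optional graph-cache tuning
GRAPH_CACHE_TTL=1800
GRAPH_REFRESH_INTERVAL=600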

Security note: The execute_query tool can run any SQL against your warehouse, including DDL (DROP TABLE, ALTER TABLE) and DML (DELETE, UPDATE). Use a least-privilege service principal (see Security & Governance below) rather than a personal admin PAT in production environments.

Cloud Provider Notes

This server is tested against Azure Databricks and Databricks on AWS (.cloud.databricks.com). Key differences:

Aspect Azure AWS / GCP
Host format https://adb-xxx.azuredatabricks.net https://dbc-xxx.cloud.databricks.com
Default catalog main workspace
Workspace root objects DIRECTORY DIRECTORY and REPO

All tools work on both platforms. Set DATABRICKS_CATALOG to match your workspace's default catalog.

Security & Governance

Recommended: Service Principal with Least Privilege

Avoid storing a personal admin PAT in .env or VS Code config. Instead, create a dedicated service principal with only the permissions required:

-- Grant read access on the catalog and schema
GRANT USE CATALOG ON CATALOG main TO `sp-databricks-mcp`;
GRANT USE SCHEMA ON SCHEMA main.default TO `sp-databricks-mcp`;
GRANT SELECT ON SCHEMA main.default TO `sp-databricks-mcp`;

-- For job operations (trigger_rerun, trigger_job_run, get_job_status)
-- Grant CAN_MANAGE_RUN on specific jobs only, via the Databricks UI or API

Set DATABRICKS_TOKEN to the service principal's OAuth token or M2M secret. The Databricks SDK supports OAuth M2M authentication natively — no PAT required.
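
To sanity-check that the service principal can authenticate before wiring it into the MCP config, here is a minimal sketch using the Databricks SDK directly (assumes OAuth M2M client credentials; host and secret values are placeholders):

# Sketch only: confirm the service principal can reach the workspace via OAuth M2M.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-xxxx.azuredatabricks.net",
    client_id="your-sp-application-id",
    client_secret="your-sp-oauth-secret",
)
print(w.current_user.me().user_name)  # should print the service principal's name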

Tool Risk Levels

Risk Tools Notes
Read-only execute_query (SELECT only), get_table_info, list_tables, scan_*, list_*, get_*, describe_schema, build_dependency_graph, get_table_dependencies, analyze_impact, review_notebook, read_notebook, read_volume_file Safe with read-only grants
Mutating trigger_rerun, trigger_job_run, create_schema, start_cluster, stop_cluster, restart_cluster, start_warehouse, stop_warehouse, create_job, create_notebook, workspace_upload Require confirm=True
Destructive drop_schema, delete_workspace_item, execute_query with DDL/DML Require confirm=True; scope permissions carefully
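
As the table above notes, mutating and destructive tools only proceed when the call includes confirm=True. A hypothetical MCP tools/call payload for a rerun (argument names other than confirm are illustrative, not the server's exact schema):

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "trigger_rerun",
    "arguments": { "job_id": 12345, "confirm": true }
  }
}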

Unity Catalog ACLs

All execute_query and get_table_info calls respect Unity Catalog row/column-level security. If the token's principal lacks SELECT on a table, the operation fails with a permission error — expected and correct behaviour.

Query Guard

To limit execute_query to read-only statements, restrict the SQL warehouse's channel policy or use a dedicated read-only warehouse for this MCP server.

Infrastructure (Optional)

If you need to provision a new Azure Databricks workspace, the infra/ directory contains:

  • main.bicep — Azure Bicep template (Premium SKU, Unity Catalog enabled)
  • deploy.ps1 — One-command PowerShell deployment script
  • INSTALL.md — Detailed step-by-step deployment guide

cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2

Development

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests (excluding live integration tests)
uv run pytest tests/ --ignore=tests/test_workspace_ops_live.py -v

# Run tests with coverage report
uv run pytest tests/ --cov=src/databricks_advanced_mcp --cov-report=term-missing --ignore=tests/test_workspace_ops_live.py

# Lint
uv run ruff check src/ tests/

# Type check
uv run mypy src/

Architecture

src/databricks_advanced_mcp/
├── server.py              # FastMCP server + CLI entry point
├── config.py              # Pydantic settings from env vars
├── client.py              # Databricks SDK client factory
├── tools/                 # MCP tool implementations (43 tools across 13 modules)
│   ├── __init__.py        # Central registration of all tool modules
│   ├── sql_executor.py    # SQL execution (1 tool)
│   ├── table_info.py      # Table metadata (2 tools)
│   ├── dependency_scanner.py # Scan notebooks/jobs/pipelines (4 tools)
│   ├── graph_ops.py       # Build/query/refresh dependency graph (3 tools)
│   ├── impact_analysis.py # Impact analysis (1 tool)
│   ├── notebook_reviewer.py # Notebook review (1 tool)
│   ├── job_pipeline_ops.py  # Job & pipeline operations (6 tools)
│   ├── workspace_listing.py # Workspace listing (1 tool)
│   ├── workspace_ops.py   # Workspace mutations + read/delete (6 tools)
│   ├── catalog_ops.py     # Unity Catalog & schema management (5 tools)
│   ├── compute_ops.py     # Cluster management (5 tools)
│   ├── warehouse_ops.py   # SQL warehouse management (4 tools)
│   └── volume_ops.py      # Unity Catalog volumes (4 tools)
├── parsers/               # Code parsing engines
│   ├── sql_parser.py      # sqlglot-based SQL extraction
│   ├── notebook_parser.py # Databricks notebook cell parsing
│   └── dlt_parser.py      # DLT pipeline definition parsing
├── graph/                 # Dependency graph
│   ├── models.py          # Node, Edge, DependencyGraph data models
│   ├── builder.py         # Graph builder (orchestrates scans)
│   └── cache.py           # In-memory graph cache with TTL
└── reviewers/             # Notebook review rule engines
    ├── performance.py     # Performance anti-patterns
    ├── standards.py       # Coding standards checks
    └── suggestions.py     # Optimization suggestions
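
For orientation, a minimal sketch of how a tool module plugs into a FastMCP server; names and signatures are illustrative, not the project's actual code:

# Sketch only: the FastMCP registration pattern suggested by the layout above.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("databricks-advanced-mcp")

@mcp.tool()
def list_tables(catalog: str, schema: str) -> list[str]:
    """Hypothetical signature: list tables in catalog.schema."""
    # A real implementation would call the Databricks SDK here.
    return [f"{catalog}.{schema}.example_table"]

if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the IDE configs above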

License

MIT


Contributions welcome! See CONTRIBUTING.md for setup instructions, PR checklist, and a list of wanted features.

