
Advanced MCP server for Databricks workspace intelligence — dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Project description

Databricks Advanced MCP Server

Python 3.11+ PyPI License: MIT MCP

An advanced Model Context Protocol (MCP) server that gives AI assistants deep visibility into your Databricks workspace - dependency scanning, impact analysis, notebook review, job/pipeline operations, SQL execution, and table metadata inspection.

Features

Domain | What it does
SQL Execution | Run SQL queries against Databricks SQL warehouses with configurable result limits
Table Information | Inspect table metadata, schemas, column details, row counts, and storage info
Dependency Scanning | Scan notebooks, jobs, and DLT pipelines to build a workspace dependency graph (DAG)
Impact Analysis | Predict downstream breakage from column drops, schema changes, or pipeline failures
Notebook Review | Detect performance anti-patterns and coding-standard violations, and suggest optimizations
Job & Pipeline Ops | List jobs/pipelines, get run status with error diagnostics, trigger reruns

Demo

Click to play the full demo video:

https://github.com/user-attachments/assets/579282ca-bb26-4244-b0c6-3ad26050aca3

Covers SQL execution, dependency scanning, impact analysis, notebook review, and job/pipeline operations.

Quick Start

Prerequisites

  • Python 3.11+
  • uv — fast Python package manager
  • A Databricks workspace with a SQL warehouse
  • A Databricks personal access token

Other auth methods: The Databricks SDK supports unified authentication — if you don't set DATABRICKS_TOKEN, it will fall back to Azure CLI, managed identity, or .databrickscfg. The .env setup below uses a PAT for simplicity.
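
For a quick check of which credentials the SDK actually picks up, construct a client with no arguments and let unified authentication resolve them. A minimal sketch, assuming the databricks-sdk package is installed:

from databricks.sdk import WorkspaceClient

# With no arguments, the SDK resolves credentials via unified authentication:
# environment variables first, then ~/.databrickscfg, then Azure CLI or
# managed identity on Azure.
w = WorkspaceClient()
print(w.current_user.me().user_name)  # prints the authenticated identity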

Don't have a Databricks workspace yet? See infra/INSTALL.md for a one-command Azure deployment using Bicep.

1. Install

Option A: Install from PyPI (recommended)

uv pip install databricks-advanced-mcp

Or with pip:

pip install databricks-advanced-mcp

Option B: Install from source

git clone https://github.com/henrybravo/databricks-advanced-mcp-server.git
cd databricks-advanced-mcp-server

Create and activate a virtual environment:

Windows (PowerShell)

uv venv .venv
.\.venv\Scripts\Activate.ps1
uv pip install -e .

macOS / Linux

uv venv .venv
source .venv/bin/activate
uv pip install -e .

2. Configure

cp .env.example .env

Edit .env with your Databricks credentials:

# Azure Databricks:
DATABRICKS_HOST=https://adb-xxxx.azuredatabricks.net
# Databricks on AWS / GCP:
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com

DATABRICKS_TOKEN=dapi_your_token
DATABRICKS_WAREHOUSE_ID=your_warehouse_id

# Optional (defaults shown)
# Azure workspaces typically use "main"; AWS/GCP workspaces use "workspace"
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default

3. Add to your IDE

Create .vscode/mcp.json in your project to register the MCP server with VS Code / GitHub Copilot.

Option A: PyPI install (recommended)

If you installed from PyPI (pip install databricks-advanced-mcp), the databricks-mcp CLI is available on your PATH:

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_your_token",
        "DATABRICKS_WAREHOUSE_ID": "your_warehouse_id"
      }
    }
  }
}

Option B: Virtual environment (source install)

If you cloned the repo and installed into a local .venv, point directly to the Python interpreter:

Windows

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/Scripts/python.exe",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

macOS / Linux

{
  "servers": {
    "databricks-mcp": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    }
  }
}

Multiple Workspaces

Each MCP server instance connects to exactly one Databricks workspace. To work with multiple workspaces simultaneously, register a separate server entry per workspace — each with its own credentials:

{
  "servers": {
    // AWS / GCP workspace
    "databricks-cloud": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://dbc-xxxx.cloud.databricks.com",
        "DATABRICKS_TOKEN": "dapi_cloud_token",
        "DATABRICKS_WAREHOUSE_ID": "cloud_warehouse_id",
        "DATABRICKS_CATALOG": "workspace"
      }
    },
    // Azure workspace
    "databricks-azure": {
      "type": "stdio",
      "command": "databricks-mcp",
      "env": {
        "DATABRICKS_HOST": "https://adb-xxxx.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapi_azure_token",
        "DATABRICKS_WAREHOUSE_ID": "azure_warehouse_id",
        "DATABRICKS_CATALOG": "main"
      }
    }
  }
}

Alternatively, with a source install you can use separate .env files per workspace:

{
  "servers": {
    "databricks-cloud": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env"
    },
    "databricks-azure": {
      "type": "stdio",
      "command": "${workspaceFolder}/.venv/bin/python",
      "args": ["-m", "databricks_advanced_mcp.server"],
      "envFile": "${workspaceFolder}/.env_azure"
    }
  }
}

4. Start using

Once configured, your AI assistant can call any of the 18 tools below. Here are example prompts organized by domain:

Explore your data

  • "What tables exist in the analytics schema?"
  • "Show me the schema and metadata for main.sales.orders"
  • "Run a query that counts and sums orders by status from main.sales.orders"

Understand dependencies

  • "Build the full workspace dependency graph"
  • "What are the upstream and downstream dependencies of main.default.customers?"
  • "Scan the /Shared/mandated_broker_v2_etl_pipeline notebook for table references"
  • "Scan all jobs and show their table dependencies"

Assess impact before making changes

  • "What would break if I drop the customer_id column from main.default.customers?"
  • "What's the impact of removing the amount column and renaming status to order_status in main.sales.orders?"

Review notebook quality

  • "Review /Shared/mandated_broker_v2_etl_pipeline for performance issues"
  • "Review /Shared/analysis for all issues — performance, coding standards, and optimizations"

Monitor jobs and pipelines

  • "List all jobs in the workspace"
  • "What's the current status of job 12345?"
  • "Show me the pipeline status for my DLT pipeline"

MCP Tools

Tool | Description
execute_query | Execute SQL against a Databricks SQL warehouse
get_table_info | Get table metadata — columns, row count, properties, storage
list_tables | List tables in a catalog.schema
scan_notebook | Scan a notebook for table/column references
scan_jobs | Scan all jobs for table dependencies
scan_dlt_pipelines | Scan all DLT pipelines for source/target tables
scan_dlt_pipeline | Scan a single DLT pipeline by ID for source/target tables
build_dependency_graph | Build the full workspace dependency graph
get_table_dependencies | Get upstream/downstream dependencies for a table
refresh_graph | Invalidate and rebuild the dependency graph cache
analyze_impact | Analyze impact of column drop / schema change / pipeline failure
review_notebook | Review a notebook for issues, anti-patterns, and optimizations
list_jobs | List jobs with status and schedule info
get_job_status | Get detailed job run status with error diagnostics
list_pipelines | List DLT pipelines with state and update status
get_pipeline_status | Get pipeline update details with event log
trigger_rerun | Trigger a job rerun (requires confirmation)
list_workspace_notebooks | List all notebooks in a workspace path
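
Your IDE normally calls these tools for you, but any MCP client can exercise them over stdio. A minimal sketch using the official MCP Python SDK (the tool argument names here are assumptions, not the server's published schema):

import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Forward the DATABRICKS_* variables so the spawned server can authenticate.
    server = StdioServerParameters(
        command="databricks-mcp",
        env={k: v for k, v in os.environ.items() if k.startswith("DATABRICKS_")},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # should list the 18 tools above
            result = await session.call_tool(
                "list_tables", {"catalog": "main", "schema": "default"}
            )
            print(result.content)

asyncio.run(main())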

Configuration Reference

Variable | Required | Default | Description
DATABRICKS_HOST | Yes | | Workspace URL (https://adb-xxx.azuredatabricks.net for Azure, https://dbc-xxx.cloud.databricks.com for AWS/GCP)
DATABRICKS_TOKEN | Yes | | Personal access token or service principal token
DATABRICKS_WAREHOUSE_ID | Yes | | SQL warehouse ID for query execution
DATABRICKS_CATALOG | No | main | Default catalog for unqualified table names — use workspace for AWS/GCP
DATABRICKS_SCHEMA | No | default | Default schema for unqualified table names
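
Internally, config.py loads these variables with Pydantic settings (see Architecture below). A hypothetical sketch of that mapping, with field names chosen for illustration rather than taken from the source:

from pydantic_settings import BaseSettings

class DatabricksSettings(BaseSettings):
    # Matching is case-insensitive, so DATABRICKS_HOST fills databricks_host, etc.
    databricks_host: str
    databricks_token: str
    databricks_warehouse_id: str
    databricks_catalog: str = "main"
    databricks_schema: str = "default"

settings = DatabricksSettings()  # raises a validation error if a required variable is missing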

Cloud Provider Notes

This server is tested against Azure Databricks and Databricks on AWS (.cloud.databricks.com). Key differences:

Aspect | Azure | AWS / GCP
Host format | https://adb-xxx.azuredatabricks.net | https://dbc-xxx.cloud.databricks.com
Default catalog | main | workspace
Workspace root objects | DIRECTORY | DIRECTORY and REPO

All tools work on both platforms. Set DATABRICKS_CATALOG to match your workspace's default catalog.

Infrastructure (Optional)

If you need to provision a new Azure Databricks workspace, the infra/ directory contains:

  • main.bicep — Azure Bicep template (Premium SKU, Unity Catalog enabled)
  • deploy.ps1 — One-command PowerShell deployment script
  • INSTALL.md — Detailed step-by-step deployment guide
cd infra
./deploy.ps1 -ResourceGroupName rg-databricks-mcp -Location eastus2

Development

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run mypy src/

Architecture

src/databricks_advanced_mcp/
├── server.py          # FastMCP server + CLI entry point
├── config.py          # Pydantic settings from env vars
├── client.py          # Databricks SDK client factory
├── tools/             # MCP tool implementations
│   ├── sql_executor.py
│   ├── dependency_scanner.py
│   ├── impact_analysis.py
│   ├── notebook_reviewer.py
│   ├── job_pipeline_ops.py
│   ├── table_info.py
│   └── workspace_listing.py
├── parsers/           # Code parsing engines
│   ├── sql_parser.py       # sqlglot-based SQL extraction
│   ├── notebook_parser.py  # Databricks notebook cell parsing
│   └── dlt_parser.py       # DLT pipeline definition parsing
├── graph/             # Dependency graph
│   ├── models.py      # Node, Edge, DependencyGraph data models
│   ├── builder.py     # Graph builder (orchestrates scans)
│   └── cache.py       # In-memory graph cache with TTL
└── reviewers/         # Notebook review rule engines
    ├── performance.py # Performance anti-patterns
    ├── standards.py   # Coding standards checks
    └── suggestions.py # Optimization suggestions
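
server.py wires the tool modules into a FastMCP server. A stripped-down sketch of that pattern using the MCP Python SDK (the decorator style is standard FastMCP; the tool body is illustrative, not the actual server code):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("databricks-mcp")

@mcp.tool()
def list_tables(catalog: str = "main", schema: str = "default") -> str:
    """List tables in a catalog.schema (illustrative body only)."""
    return f"Would list tables in {catalog}.{schema} via the Databricks SDK"

def main() -> None:
    mcp.run()  # stdio transport by default, matching the IDE config above

if __name__ == "__main__":
    main()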

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

databricks_advanced_mcp-0.0.3.tar.gz (9.5 MB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

databricks_advanced_mcp-0.0.3-py3-none-any.whl (51.6 kB)

File details

Details for the file databricks_advanced_mcp-0.0.3.tar.gz.

File metadata

  • Download URL: databricks_advanced_mcp-0.0.3.tar.gz
  • Upload date:
  • Size: 9.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_advanced_mcp-0.0.3.tar.gz
Algorithm Hash digest
SHA256 0503fdaebeda17bdce1d82a512bd5647aa62579338b3baa16c37d3ae1f6be661
MD5 084f608ca0af3d21f958ff8fe820cbc4
BLAKE2b-256 b3531b4f64fd23fc838537fb3c6de229440639aec3c7b64509f101d82a2da735

See more details on using hashes here.

Provenance

The following attestation bundles were made for databricks_advanced_mcp-0.0.3.tar.gz:

Publisher: workflow.yml on henrybravo/databricks-advanced-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file databricks_advanced_mcp-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for databricks_advanced_mcp-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 dc308763d05ad1a21ebe68aa49d3b36e80d8e7fc93181df1377052c064e779e8
MD5 bbbe5b904dd5e103b5ecf105229f21a5
BLAKE2b-256 1f684fd226a1f0579bc9328ebdd355baeddde87e526f39f8d2dede69b26397d2

See more details on using hashes here.

Provenance

The following attestation bundles were made for databricks_advanced_mcp-0.0.3-py3-none-any.whl:

Publisher: workflow.yml on henrybravo/databricks-advanced-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
