A Model Context Protocol (MCP) server for Databricks
Built by Markov
When AI changes everything, you start from scratch.
Markov specializes in cutting-edge AI solutions and automation. From neural ledgers to MCP servers, we're building the tools that power the next generation of AI-driven applications.
We're always hiring exceptional engineers! Join us in shaping the future of AI.
Visit markov.bot • Get in Touch • Careers
Databricks MCP Server
A Model Context Protocol (MCP) server for Databricks that exposes Databricks functionality over MCP. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.
Version 0.4.0 - Structured MCP responses, resource caching, and resilience upgrades.
One-Click Install
For Cursor Users
Click this link to install instantly:
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
Or copy and paste this deeplink:
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
Install Databricks MCP in Cursor
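The config query parameter of the deeplink is Base64-encoded JSON, so it can be inspected before installing. The following is a minimal sketch using only the Python standard library; the helper name decode_deeplink_config is illustrative and not part of this project.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# The deeplink exactly as published in the install section above.
DEEPLINK = (
    "cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config="
    "eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ=="
)


def decode_deeplink_config(deeplink: str) -> dict:
    """Return the JSON object carried in the deeplink's `config` parameter."""
    query = parse_qs(urlsplit(deeplink).query)
    encoded = query["config"][0]
    return json.loads(base64.b64decode(encoded))


config = decode_deeplink_config(DEEPLINK)
print(json.dumps(config, indent=2))
```

Decoding shows that the link runs databricks-mcp-server through uvx and passes DATABRICKS_HOST, DATABRICKS_TOKEN, and DATABRICKS_WAREHOUSE_ID as ${...} placeholders.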
This project is maintained by Olivier Debeuf De Rijcker (olivier@markov.bot).
Credit for the initial version goes to @JustTryAI.
Features
- MCP Protocol Support: Implements the MCP protocol to allow LLMs to interact with Databricks
- Databricks API Integration: Provides access to Databricks REST API functionality
- Tool Registration: Exposes Databricks functionality as MCP tools
- Async Support: Built with asyncio for efficient operation
Available Tools
The Databricks MCP Server exposes the following tools:
Cluster Management
- list_clusters: List all Databricks clusters
- create_cluster: Create a new Databricks cluster
- terminate_cluster: Terminate a Databricks cluster
- get_cluster: Get information about a specific Databricks cluster
- start_cluster: Start a terminated Databricks cluster
Job Management
- list_jobs: List all Databricks jobs
- run_job: Run a Databricks job
- run_notebook: Submit and wait for a one-time notebook run
- create_job: Create a new Databricks job
- delete_job: Delete a Databricks job
- get_run_status: Get status information for a job run
- list_job_runs: List recent runs for a job
- cancel_run: Cancel a running job
Workspace Files
- list_notebooks: List notebooks in a workspace directory
- export_notebook: Export a notebook from the workspace
- import_notebook: Import a notebook into the workspace
- delete_workspace_object: Delete a notebook or directory
- get_workspace_file_content: Retrieve content of any workspace file (JSON, notebooks, scripts, etc.)
- get_workspace_file_info: Get metadata about workspace files
File System
- list_files: List files and directories in a DBFS path
- dbfs_put: Upload a small file to DBFS
- dbfs_delete: Delete a DBFS file or directory
Cluster Libraries
- install_library: Install libraries on a cluster
- uninstall_library: Remove libraries from a cluster
- list_cluster_libraries: Check installed libraries on a cluster
Repos
- create_repo: Clone a Git repository
- update_repo: Update an existing repo
- list_repos: List repos in the workspace
- pull_repo: Pull the latest commit for a Databricks repo
Unity Catalog
- list_catalogs: List catalogs
- create_catalog: Create a catalog
- list_schemas: List schemas in a catalog
- create_schema: Create a schema
- list_tables: List tables in a schema
- create_table: Execute a CREATE TABLE statement
- get_table_lineage: Fetch lineage information for a table
Composite
- sync_repo_and_run_notebook: Pull a repo and execute a notebook in one call
SQL Execution
- execute_sql: Execute a SQL statement (optional warehouse_id, catalog, schema_name)
Recent Updates
Structured Output Refresh (current)
- Typed MCP Schemas: Tools expose precise input schemas using FastMCP's metadata (no { "params": ... } envelope).
- Structured Results: Each tool now returns CallToolResult with a concise text summary and the full Databricks payload in _meta['data'].
- Resource URIs for Large Payloads: Notebook/workspace exports stash resource://databricks/exports/{id} entries in _meta['resources'] instead of embedding large blobs.
- Resilience Improvements: Per-tool concurrency limits, timeouts, and retry-with-backoff for transient Databricks errors.
- Progress & Telemetry: Tools publish MCP progress notifications and surface _meta._request_id plus per-tool success/error counters for easier observability.
- Correlation IDs: All API requests and tool responses carry _meta._request_id for traceability.
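To make the response layout described above concrete, here is a minimal sketch of how a client might separate the text summary from the structured payload. The dictionary is a hypothetical stand-in shaped after the bullet points in this section; real responses are CallToolResult objects, and their exact attribute names and payload contents should be checked against the server and SDK you are using.

```python
from typing import Any, Optional

# Hypothetical response, shaped after the description above (not captured output).
example_result: dict[str, Any] = {
    "content": [{"type": "text", "text": "Found 2 clusters"}],
    "_meta": {
        "data": {"clusters": [{"cluster_id": "abc-123"}, {"cluster_id": "def-456"}]},
        "resources": [],
        "_request_id": "req-0001",
    },
}


def unpack_result(result: dict[str, Any]) -> tuple[str, Any, list[Any], Optional[str]]:
    """Split a response into (summary text, payload, resource entries, request id)."""
    meta = result.get("_meta") or {}
    summary = " ".join(
        part["text"] for part in result.get("content", []) if part.get("type") == "text"
    )
    return summary, meta.get("data"), meta.get("resources", []), meta.get("_request_id")


summary, data, resources, request_id = unpack_result(example_result)
print(summary, request_id)
```

Keeping the request id alongside the payload makes it easier to correlate a client-side failure with the server logs.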
v0.3.0 Highlights
- Repository Management: Pull latest commits from Databricks repos with pull_repo.
- One-time Notebook Execution: Submit and wait for notebook runs with run_notebook.
- Composite Operations: Combined repo sync + notebook execution with sync_repo_and_run_notebook.
- Enhanced Job Management: Extended job APIs with submit, status checking, and run management.
Previous Updates:
- v0.2.1: Enhanced Codespaces support, documentation improvements, publishing process streamlining
- v0.2.0: Major package refactoring from src/ to databricks_mcp/ structure
Backwards Compatibility: This release contains a breaking change. Tools now require flat arguments and emit structured responses; update custom clients accordingly.
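For custom clients that still wrap arguments in the old { "params": ... } envelope, a small adapter can ease the migration. This is a sketch only: flatten_arguments is a hypothetical helper, and cluster_id is used as an illustrative argument name.

```python
from typing import Any


def flatten_arguments(arguments: dict[str, Any]) -> dict[str, Any]:
    """Unwrap a legacy {"params": {...}} envelope; return flat arguments unchanged."""
    if set(arguments) == {"params"} and isinstance(arguments["params"], dict):
        return dict(arguments["params"])
    return dict(arguments)


legacy_call = {"params": {"cluster_id": "abc-123"}}  # pre-0.4.0 style (assumed shape)
flat_call = flatten_arguments(legacy_call)
print(flat_call)
```

The adapter returns a copy in both cases, so callers can modify the result without touching the original dictionary.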
Installation
Quick Install (Recommended)
Use the One-Click Install deeplink at the top of this document to install with one click.
This will automatically install the MCP server using uvx and configure it in Cursor. You'll need to set these environment variables:
- DATABRICKS_HOST - Your Databricks workspace URL
- DATABRICKS_TOKEN - Your Databricks personal access token
- DATABRICKS_WAREHOUSE_ID - (Optional) Your default SQL warehouse ID
Manual Installation
Prerequisites
- Python 3.10 or higher
- uv package manager (recommended for MCP servers)
Setup
- Install uv if you don't have it already:
    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Windows (in PowerShell)
    irm https://astral.sh/uv/install.ps1 | iex
  Restart your terminal after installation.
- Clone the repository:
    git clone https://github.com/markov-kernel/databricks-mcp.git
    cd databricks-mcp
- Create a virtual environment (optional) and install dependencies for local development:
    # Create and activate virtual environment
    uv venv
    # On Windows
    .\.venv\Scripts\activate
    # On Linux/Mac
    source .venv/bin/activate
    # Install dependencies in development mode
    uv pip install -e .
    # Install development dependencies
    uv pip install -e ".[dev]"
- Set up environment variables:
    # Required variables
    # Windows
    set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
    set DATABRICKS_TOKEN=your-personal-access-token
    # Linux/Mac
    export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
    export DATABRICKS_TOKEN=your-personal-access-token
    # Optional: Set default SQL warehouse (makes warehouse_id optional in execute_sql)
    export DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345
  You can also create an .env file based on the .env.example template.
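Before starting the server, it can help to confirm that the required variables are present. The check below is a minimal sketch and not part of the package; missing_required is a hypothetical helper name.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
OPTIONAL_VARS = ("DATABRICKS_WAREHOUSE_ID",)


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_required(os.environ)
if missing:
    print("Missing required variables: " + ", ".join(missing))
else:
    print("Required Databricks variables are set.")
```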
Running the MCP Server
Standalone
To start the MCP server directly for testing or development, run:
uvx databricks-mcp-server@latest
Pass --log-level DEBUG or other options using standard CLI flags:
uvx databricks-mcp-server@latest -- --log-level DEBUG
Integrating with AI Clients
To use this server with AI clients like Cursor or Claude CLI, you need to register it.
Cursor Setup
- Open your global MCP configuration file located at ~/.cursor/mcp.json (create it if it doesn't exist).
- Add the following entry within the mcpServers object, replacing placeholders with your actual values:
    {
      "mcpServers": {
        // ... other servers ...
        "databricks-mcp-local": {
          "command": "uvx",
          "args": ["databricks-mcp-server@latest"],
          "env": {
            "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
            "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
            "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
            "RUNNING_VIA_CURSOR_MCP": "true"
          }
        }
        // ... other servers ...
      }
    }
- Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials, then restart Cursor.
- You can now invoke tools using databricks-mcp-local:<tool_name> (e.g., databricks-mcp-local:list_jobs).
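If you prefer to generate the entry rather than edit the file by hand, the merge can be sketched in Python as follows. The function name add_databricks_server is hypothetical and the credential values are placeholders. Note that strict JSON has no comment syntax, so the // lines in the snippet above should be treated as placeholders rather than copied into a file that is parsed with a standard JSON parser.

```python
import copy
import json
from typing import Any, Optional


def add_databricks_server(
    config: dict[str, Any],
    host: str,
    token: str,
    warehouse_id: Optional[str] = None,
    name: str = "databricks-mcp-local",
) -> dict[str, Any]:
    """Return a copy of `config` with the Databricks MCP entry added or replaced."""
    updated = copy.deepcopy(config)
    env = {
        "DATABRICKS_HOST": host,
        "DATABRICKS_TOKEN": token,
        "RUNNING_VIA_CURSOR_MCP": "true",
    }
    if warehouse_id:
        env["DATABRICKS_WAREHOUSE_ID"] = warehouse_id
    updated.setdefault("mcpServers", {})[name] = {
        "command": "uvx",
        "args": ["databricks-mcp-server@latest"],
        "env": env,
    }
    return updated


# Example with an existing, unrelated server entry (placeholder values throughout).
existing = {"mcpServers": {"other-server": {"command": "example"}}}
merged = add_databricks_server(
    existing,
    host="https://your-databricks-instance.azuredatabricks.net",
    token="your-personal-access-token",
)
print(json.dumps(merged, indent=2))
```

Writing the result back to ~/.cursor/mcp.json is left out on purpose, so the sketch never touches a real configuration file.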
Claude CLI Setup
- Use the claude mcp add command to register the server. Provide your credentials using the -e flag for environment variables and point the command to uvx databricks-mcp-server@latest:
    claude mcp add databricks-mcp-local \
      -s user \
      -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
      -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
      -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
      -- uvx databricks-mcp-server@latest
- Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.
- You can now invoke tools using databricks-mcp-local:<tool_name> in your Claude interactions.
Usage Examples
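The examples in this section call session.call_tool(...) on an already initialized MCP client session. One way to obtain such a session is sketched below, assuming the official MCP Python SDK (the mcp package) is installed and the Databricks variables are exported in your shell. The server_env helper is hypothetical, and calling list_clusters with an empty argument dictionary is an assumption about that tool's schema.

```python
import asyncio
import os


def server_env() -> dict[str, str]:
    """Collect the Databricks variables to hand to the server process."""
    names = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_WAREHOUSE_ID")
    return {name: os.environ[name] for name in names if name in os.environ}


async def main() -> None:
    # Imported lazily so the helper above works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uvx",
        args=["databricks-mcp-server@latest"],
        env=server_env(),
    )
    # Launch the server over stdio and open a client session on top of it.
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_clusters", {})
            print(result)


# To run against a real workspace, uncomment the next line:
# asyncio.run(main())
```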
SQL Execution with Default Warehouse
# With DATABRICKS_WAREHOUSE_ID set, warehouse_id is optional
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10"
})
# You can still override the default warehouse
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10",
"warehouse_id": "sql_warehouse_specific"
})
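Because tools take flat arguments, optional fields can simply be left out when they are not needed. The helper below is an illustrative sketch (build_execute_sql_args is not part of the package) that builds the argument dictionary from the parameter names listed for execute_sql and drops anything left unset.

```python
from typing import Any, Optional


def build_execute_sql_args(
    statement: str,
    warehouse_id: Optional[str] = None,
    catalog: Optional[str] = None,
    schema_name: Optional[str] = None,
) -> dict[str, Any]:
    """Build flat execute_sql arguments, omitting optional values that are None."""
    optional = {
        "warehouse_id": warehouse_id,
        "catalog": catalog,
        "schema_name": schema_name,
    }
    args: dict[str, Any] = {"statement": statement}
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


default_args = build_execute_sql_args("SELECT * FROM my_table LIMIT 10")
override_args = build_execute_sql_args(
    "SELECT * FROM my_table LIMIT 10", warehouse_id="sql_warehouse_specific"
)
print(default_args)
print(override_args)
```

The resulting dictionaries can be passed as the second argument to session.call_tool("execute_sql", ...).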
Workspace File Content Retrieval
# Get JSON file content from workspace
await session.call_tool("get_workspace_file_content", {
"path": "/Users/user@domain.com/config/settings.json"
})
# Get notebook content in Jupyter format
await session.call_tool("get_workspace_file_content", {
"path": "/Users/user@domain.com/my_notebook",
"format": "JUPYTER"
})
# Get file metadata without downloading content
await session.call_tool("get_workspace_file_info", {
"path": "/Users/user@domain.com/large_file.py"
})
Repo Sync and Notebook Execution
await session.call_tool("sync_repo_and_run_notebook", {
"repo_id": 123,
"notebook_path": "/Repos/user/project/run_me"
})
Create Nightly ETL Job
job_conf = {
"name": "Nightly ETL",
"tasks": [
{
"task_key": "etl",
"notebook_task": {"notebook_path": "/Repos/me/etl.py"},
"existing_cluster_id": "abc-123"
}
]
}
await session.call_tool("create_job", job_conf)
Project Structure
databricks-mcp/
├── AGENTS.md                          # Contributor guidelines (agents/LLM focus)
├── ARCHITECTURE.md                    # Deep architecture walkthrough
├── databricks_mcp/                    # Main package
│   ├── __init__.py                    # Package initialization
│   ├── __main__.py                    # Run via `python -m databricks_mcp`
│   ├── main.py                        # CLI/stdio launcher
│   ├── api/                           # Databricks API clients
│   │   ├── clusters.py                # Cluster management
│   │   ├── jobs.py                    # Job management
│   │   ├── notebooks.py               # Notebook operations
│   │   ├── sql.py                     # SQL execution
│   │   └── dbfs.py                    # DBFS operations
│   ├── core/                          # Core functionality
│   │   ├── auth.py                    # Authentication helpers
│   │   ├── config.py                  # Settings and env loading
│   │   ├── logging_utils.py           # Centralized logging
│   │   └── utils.py                   # HTTP utilities & error helpers
│   ├── server/                        # MCP server implementation
│   │   ├── __main__.py                # Server entry point
│   │   ├── databricks_mcp_server.py   # Main MCP server class
│   │   └── tool_helpers.py            # Shared response builders
│   └── cli/                           # Command-line interface
│       └── commands.py                # CLI commands
├── tests/                             # Test directory
│   ├── test_clusters.py               # Cluster tests
│   ├── test_mcp_server.py             # Server tests
│   └── test_*.py                      # Other test files
├── README.md                          # Project overview (this file)
├── TODO.md                            # Active refactor checklist
├── pyproject.toml                     # Package metadata
├── uv.lock                            # Dependency lock file
└── .gitignore                         # Git ignore rules
Development
Documentation
- ARCHITECTURE.md: End-to-end component overview, resource flow, and integration details.
- AGENTS.md: Contributor guidelines and MCP agent conventions.
Cross-Platform Notes
- uvx databricks-mcp-server@latest works on macOS, Linux, and Windows (PowerShell) without per-platform scripts.
- Tests run portably with uv run pytest; no shell-specific harnesses remain.
- Progress notifications and structured outputs follow the MCP spec, so clients on any OS receive the same responses.
Code Standards
- Python code follows PEP 8 style guide with a maximum line length of 100 characters
- Use 4 spaces for indentation (no tabs)
- Use double quotes for strings
- All classes, methods, and functions should have Google-style docstrings
- Type hints are required for all code except tests
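As an illustration of these conventions, the function below uses type hints, double-quoted strings, 4-space indentation, and a Google-style docstring. It is an invented example for this section, not code from the package.

```python
def chunk_paths(paths: list[str], size: int) -> list[list[str]]:
    """Split a list of workspace paths into fixed-size batches.

    Args:
        paths: Workspace paths to batch.
        size: Maximum number of paths per batch; must be positive.

    Returns:
        A list of batches, each containing at most ``size`` paths.

    Raises:
        ValueError: If ``size`` is not positive.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [paths[start:start + size] for start in range(0, len(paths), size)]


batches = chunk_paths(["/Users/a", "/Users/b", "/Users/c"], 2)
print(batches)
```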
Linting
The project uses the following linting tools:
# Run all linters
uv run pylint databricks_mcp/ tests/
uv run flake8 databricks_mcp/ tests/
uv run mypy databricks_mcp/
Testing
The project uses pytest for testing. To run the tests:
# Run all tests with our convenient script
.\scripts\run_tests.ps1
# Run with coverage report
.\scripts\run_tests.ps1 -Coverage
# Run specific tests with verbose output
.\scripts\run_tests.ps1 -Verbose -Coverage tests/test_clusters.py
You can also run the tests directly with pytest:
# Run all tests
uv run pytest tests/
# Run with coverage report
uv run pytest --cov=databricks_mcp tests/ --cov-report=term-missing
The project targets a minimum code coverage of 80%.
Documentation
- API documentation is generated using Sphinx and can be found in the docs/api directory
- All code includes Google-style docstrings
- See the examples/ directory for usage examples
Examples
Check the examples/ directory for usage examples. To run examples:
# Run example scripts with uv
uv run examples/direct_usage.py
uv run examples/mcp_client_usage.py
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Ensure your code follows the project's coding standards
- Add tests for any new functionality
- Update documentation as necessary
- Verify all tests pass before submitting
License
This project is licensed under the MIT License - see the LICENSE file for details.
About
A Model Context Protocol (MCP) server for interacting with Databricks services. Maintained by markov.bot.