MCP server for Databricks with enhanced Genie AI integration - natural language data analysis

Databricks MCP Genie

A Model Context Protocol (MCP) server with Genie AI integration that lets AI assistants such as Claude Desktop and Cursor work with Databricks workspaces in natural language.

What This Does

Enables AI assistants to directly interact with your Databricks workspace:

  • Execute SQL queries and manage warehouses
  • Control clusters (create, start, stop, monitor)
  • Run jobs and notebooks
  • Ask natural language questions with Genie AI
  • Manage Unity Catalog (catalogs, schemas, tables)
  • Work with DBFS, repos, and libraries

Quick Start

For Cursor Users

Team Installation (Recommended): See Cursor Setup Guide for one-click installation instructions.

Quick install:

pip install databricks-mcp-genie

Then configure in Cursor settings - full details in the Cursor Setup Guide.

Prerequisites

  • Python 3.10 or higher
  • Databricks workspace with personal access token
  • Cursor IDE, Claude Desktop, or any MCP-compatible client

Installation

# Install from PyPI (recommended)
pip install databricks-mcp-genie

# Or install from source
git clone https://github.com/sidart10/databricks-mcp-genie.git
cd databricks-mcp-genie
pip install -e ".[dev]"

Configuration

  1. Get your Databricks credentials:

    • Workspace URL: https://your-workspace.cloud.databricks.com
    • Personal Access Token: Generate from User Settings > Developer > Access Tokens
  2. Configure MCP client (Claude Desktop example):

Edit ~/.config/Claude/claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "databricks": {
      "command": "/path/to/databricks-mcp-genie/.venv/bin/python",
      "args": ["-m", "databricks_mcp.main"],
      "cwd": "/path/to/databricks-mcp-genie",
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-personal-access-token-here"
      }
    }
  }
}
  3. Restart Claude Desktop
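
The client configuration can also be generated by a script rather than edited by hand. The following is a minimal sketch, not part of the package: build_server_entry and merge_server_entry are hypothetical helper names, and the sketch assumes a source checkout with a virtualenv at .venv inside the project directory, as in the example configuration.

```python
import json
from pathlib import PurePosixPath


def build_server_entry(project_dir: str, host: str, token: str) -> dict:
    """Build one entry for the "mcpServers" section of the client config.

    Assumes the interpreter lives at <project_dir>/.venv/bin/python.
    """
    project = PurePosixPath(project_dir)
    return {
        "command": str(project / ".venv" / "bin" / "python"),
        "args": ["-m", "databricks_mcp.main"],
        "cwd": str(project),
        "env": {"DATABRICKS_HOST": host, "DATABRICKS_TOKEN": token},
    }


def merge_server_entry(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added, keeping other servers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = build_server_entry(
        "/path/to/project",
        "https://your-workspace.cloud.databricks.com",
        "your-personal-access-token-here",
    )
    print(json.dumps(merge_server_entry({}, "databricks", entry), indent=2))
```

Merging into a copy leaves any other MCP servers already present in the file untouched.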

Verify Installation

# Test server starts correctly
.venv/bin/python -m databricks_mcp.main

# Run test suite
.venv/bin/pytest tests/ -v

# Quick server test script
./test_server.sh

Available Features

43 MCP Tools Across 9 API Modules

Genie AI (5 tools) - Natural language data analysis

  • list_genie_spaces - List available Genie AI spaces
  • start_genie_conversation - Ask questions in natural language
  • send_genie_followup - Continue conversations with context
  • get_genie_message_status - Check message processing status
  • get_genie_query_results - Retrieve SQL results from Genie
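
These tools suggest an ask, poll, fetch flow: start a conversation, check the message status until processing finishes, then retrieve the results. The polling step might look like the sketch below. It is illustrative only: wait_for_message is a hypothetical helper, the status strings are assumptions rather than confirmed Genie values, and a stub coroutine stands in for get_genie_message_status.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal states; the real values come from the Genie API and may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


async def wait_for_message(
    get_status: Callable[[], Awaitable[str]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll get_status until it reports a terminal state or the timeout passes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status()
        if status in TERMINAL_STATES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"message still {status!r} after {timeout}s")
        await asyncio.sleep(interval)


async def demo() -> str:
    # Stub that mimics a message moving through processing states.
    states = iter(["SUBMITTED", "EXECUTING_QUERY", "COMPLETED"])

    async def fake_status() -> str:
        return next(states)

    return await wait_for_message(fake_status, interval=0.01)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the status check in as a callable keeps the loop independent of how the status is actually obtained.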

Clusters API (5 tools)

  • list_clusters, create_cluster, get_cluster
  • start_cluster, terminate_cluster

SQL API (1 tool)

  • execute_sql - Run SQL queries on a SQL warehouse

Jobs API (9 tools)

  • list_jobs, create_job, delete_job, run_job
  • list_job_runs, get_run_status, cancel_run
  • run_notebook, sync_repo_and_run_notebook

Notebooks API (6 tools)

  • list_notebooks, export_notebook, import_notebook
  • delete_workspace_object, get_workspace_file_content, get_workspace_file_info

DBFS API (3 tools)

  • list_files, dbfs_put, dbfs_delete

Unity Catalog API (7 tools)

  • list_catalogs, create_catalog
  • list_schemas, create_schema
  • list_tables, create_table, get_table_lineage

Repos API (4 tools)

  • list_repos, create_repo, update_repo, pull_repo

Libraries API (3 tools)

  • install_library, uninstall_library, list_cluster_libraries

Usage Examples

Using with Claude Desktop

Once configured, you can ask Claude to interact with Databricks:

"List all my running clusters"
"Execute this SQL query: SELECT * FROM my_catalog.my_schema.my_table LIMIT 10"
"Ask Genie: What were the top products by revenue last month?"
"Create a new job to run my ETL notebook daily"

Programmatic Usage

from databricks_mcp.server import DatabricksMCPServer

# Initialize server
server = DatabricksMCPServer()

# Use via MCP protocol
server.run()

Direct API Usage

import asyncio

from databricks_mcp.api import clusters, genie, sql


async def main():
    # List clusters
    clusters_list = await clusters.list_clusters()

    # Ask Genie a question
    response = await genie.start_conversation(
        space_id="01efc298aabd1ae9bac6128988a6eaaa",
        question="Show me revenue trends by product category"
    )

    # Execute SQL
    results = await sql.execute_sql(
        statement="SELECT * FROM sales.orders LIMIT 100",
        warehouse_id="your-warehouse-id"
    )


# The API functions are coroutines, so they must run inside an event loop
asyncio.run(main())

Project Structure

databricks-mcp-genie/
├── databricks_mcp/           # Main Python package
│   ├── api/                  # API modules (clusters, sql, genie, etc.)
│   ├── core/                 # Core utilities and config
│   ├── server/               # MCP server implementation
│   └── cli/                  # CLI commands
├── tests/                    # Test suite
├── examples/                 # Usage examples
├── scripts/                  # Utility scripts
├── docs/                     # Documentation
├── pyproject.toml           # Package configuration
├── .mcp.json                # MCP client configuration
└── test_server.sh           # Quick server test

Troubleshooting

Server Won't Start

Check logs: databricks_mcp.log

Common issues:

  • Invalid credentials in .mcp.json
  • Incorrect Python path in MCP config
  • Missing dependencies (run pip install -e ".[dev]")
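
The first two issues can be checked mechanically before restarting the client. The sketch below is one possible diagnostic, not something shipped with the package: check_server_config is a hypothetical name, and it assumes the config file has the mcpServers layout shown under Configuration. It can only confirm that credentials are present, not that they are valid.

```python
import json
import shutil
from pathlib import Path

REQUIRED_ENV = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def check_server_config(config_path: str, server_name: str = "databricks") -> list[str]:
    """Return a list of human-readable problems found in an MCP config file."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        return [f"config file not found: {path}"]
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]

    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no mcpServers entry named {server_name!r}"]

    problems = []
    command = server.get("command", "")
    # shutil.which accepts an absolute path or a bare name found on PATH.
    if not command or shutil.which(command) is None:
        problems.append(f"command is not an executable file: {command!r}")
    env = server.get("env", {})
    for key in REQUIRED_ENV:
        if not env.get(key):
            problems.append(f"missing or empty env value: {key}")
    return problems
```

An empty list means the file parsed, the interpreter path resolves to an executable, and both variables are set.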

Import Errors

# Verify all imports work
.venv/bin/python -c "from databricks_mcp.server import DatabricksMCPServer"
.venv/bin/python -c "from databricks_mcp.api import clusters, sql, genie"

Connection Issues

Verify credentials:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
.venv/bin/python -c "
from databricks_mcp.api import clusters
import asyncio
print(asyncio.run(clusters.list_clusters()))
"

See TROUBLESHOOTING.md for detailed solutions.
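
Before running the connection check above, it can help to confirm that both environment variables are set and well formed. The following is a minimal sketch: validate_databricks_env is a hypothetical helper, and its rules are assumptions inferred from the example values in this README (an https:// workspace URL and a non-empty token), not checks performed by the package.

```python
import os
from urllib.parse import urlparse


def validate_databricks_env(env=None) -> list[str]:
    """Return a list of problems with DATABRICKS_HOST and DATABRICKS_TOKEN."""
    env = os.environ if env is None else env
    problems = []

    host = env.get("DATABRICKS_HOST", "")
    parsed = urlparse(host)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("DATABRICKS_HOST should look like https://<workspace-host>")

    token = env.get("DATABRICKS_TOKEN", "")
    if not token.strip():
        problems.append("DATABRICKS_TOKEN is empty or unset")
    elif token != token.strip():
        # Stray whitespace is easy to pick up when copying a token.
        problems.append("DATABRICKS_TOKEN has leading or trailing whitespace")

    return problems


if __name__ == "__main__":
    for problem in validate_databricks_env():
        print(problem)
```

Run without arguments, the function reads the current process environment, so it reports on the same values the server would see.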

Development

Running Tests

# All tests
.venv/bin/pytest tests/ -v

# Specific test file
.venv/bin/pytest tests/test_clusters.py -v

# With coverage
.venv/bin/pytest tests/ --cov=databricks_mcp

Code Quality

# Format code
.venv/bin/black databricks_mcp/

# Lint
.venv/bin/pylint databricks_mcp/

Adding New Tools

  1. Add API function in databricks_mcp/api/
  2. Register tool in databricks_mcp/server/databricks_mcp_server.py:
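# Define this where the other tools are registered, so that self.tool is available.
# Dict, Any, List, TextContent, json, logger, and _unwrap_params must be in scope.
@self.tool(
    name="your_tool_name",
    description="What your tool does with parameters: param1 (required), param2 (optional)"
)
async def your_tool(params: Dict[str, Any]) -> List[TextContent]:
    try:
        actual_params = _unwrap_params(params)
        result = await your_api_module.your_function(actual_params)
        return [{"type": "text", "text": json.dumps(result)}]
    except Exception as e:
        # Name the tool in the log line so failures can be traced to it
        logger.error(f"Error in your_tool_name: {str(e)}")
        return [{"type": "text", "text": json.dumps({"error": str(e)})}]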
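
The try/except pattern above can be exercised without a running server. The sketch below reproduces only its shape: make_handler and fake_api are hypothetical names, fake_api stands in for a function in databricks_mcp/api/, and _unwrap_params is left out because its behavior is not shown here.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Awaitable[List[Dict[str, str]]]]


def make_handler(api_call: Callable[[Dict[str, Any]], Awaitable[Any]]) -> Handler:
    """Wrap an async API function so results and errors both come back as JSON text."""

    async def handler(params: Dict[str, Any]) -> List[Dict[str, str]]:
        try:
            result = await api_call(params)
            return [{"type": "text", "text": json.dumps(result)}]
        except Exception as exc:  # report the failure to the client as JSON
            return [{"type": "text", "text": json.dumps({"error": str(exc)})}]

    return handler


async def fake_api(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for a real API module function.
    if "cluster_id" not in params:
        raise ValueError("cluster_id is required")
    return {"cluster_id": params["cluster_id"], "state": "RUNNING"}


if __name__ == "__main__":
    handler = make_handler(fake_api)
    print(asyncio.run(handler({"cluster_id": "abc"})))
    print(asyncio.run(handler({})))
```

Returning the error as a JSON payload instead of raising means the assistant receives a readable message rather than a failed tool call.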

Documentation

Setup & Installation

Development & Publishing

Requirements

  • Python >=3.10
  • mcp[cli] >=1.2.0
  • httpx
  • databricks-sdk
  • pytest (dev)
  • black (dev)
  • pylint (dev)

License

MIT License - See LICENSE file for details

Acknowledgments

  • Package: databricks-mcp-genie
  • Maintainer: Sid
  • Original Author: Olivier Debeuf De Rijcker (databricks-mcp)
  • Repository: https://github.com/sidart10/databricks-mcp-genie

Special thanks to:

  • Olivier Debeuf De Rijcker for the original databricks-mcp implementation
  • Anthropic for Claude and the MCP protocol
  • Databricks for their comprehensive SDK and Genie AI
  • The open source community

Built with Claude Code - AI-assisted development tool by Anthropic
