jupyter-databricks-kernel

A Jupyter kernel for complete remote execution on Databricks clusters.

1. Features

  • Execute Python code entirely on Databricks clusters
  • Works with VS Code, JupyterLab, and other Jupyter frontends

2. Requirements

  • Python 3.11 or later
  • Databricks workspace with Personal Access Token
  • Classic all-purpose cluster

3. Quick Start

  1. Install the kernel:

    # With uv
    uv pip install jupyter-databricks-kernel
    uv run python -m jupyter_databricks_kernel.install
    
    # With pip
    pip install jupyter-databricks-kernel
    python -m jupyter_databricks_kernel.install
    

    Install options:

    Option          Description
    (default)       Install to the current virtual environment (sys.prefix)
    --user          Install to the user directory (~/.local/share/jupyter/kernels/)
    --prefix PATH   Install to a custom path
  2. Configure authentication and cluster:

    # Recommended: Use Databricks CLI to set up everything
    databricks auth login --configure-cluster
    

    This creates ~/.databrickscfg with authentication credentials and cluster ID.

    Alternatively, use environment variables:

    # Override cluster ID (optional, takes priority over ~/.databrickscfg)
    export DATABRICKS_CLUSTER_ID=your-cluster-id
    
    # Authentication (if not using ~/.databrickscfg)
    export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
    export DATABRICKS_TOKEN=your-personal-access-token
    
    # Use specific profile from ~/.databrickscfg (optional)
    export DATABRICKS_CONFIG_PROFILE=your-profile-name
    

    For authentication options, see Databricks SDK Authentication.

  3. Open a notebook and select "Databricks" kernel:

    VS Code:

    1. Install the Jupyter extension
    2. Open a .ipynb file
    3. Click "Select Kernel" and choose "Databricks"

    JupyterLab:

    jupyter-lab
    

    Select "Databricks" from the kernel list.

  4. Run a simple test:

    spark.version
    

If the cluster is stopped, the first execution may take 5-6 minutes while the cluster starts.
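The configuration lookup described in step 2 can be sketched in plain Python: environment variables take priority, then the active profile in ~/.databrickscfg. This is a simplified illustration, not the kernel's actual implementation, which delegates to the Databricks SDK.

```python
import configparser
import os
from pathlib import Path

def resolve_auth(cfg_path=Path.home() / ".databrickscfg"):
    """Return (host, token): environment variables first, then the active profile."""
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        return host, token
    profile = os.environ.get("DATABRICKS_CONFIG_PROFILE", "DEFAULT")
    cfg = configparser.ConfigParser()
    cfg.read(cfg_path)  # silently yields an empty config if the file is missing
    if profile != "DEFAULT" and not cfg.has_section(profile):
        return host, token
    return (host or cfg.get(profile, "host", fallback=None),
            token or cfg.get(profile, "token", fallback=None))
```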

4. Configuration

4.1. Cluster ID

The cluster ID is read from the following sources, in order of priority:

  1. DATABRICKS_CLUSTER_ID environment variable
  2. ~/.databrickscfg (from active profile)

The active profile is selected by the DATABRICKS_CONFIG_PROFILE environment variable; if it is not set, the DEFAULT profile is used.

Example ~/.databrickscfg:

[DEFAULT]
host = https://your-workspace.cloud.databricks.com
token = dapi...
cluster_id = 0123-456789-abcdef12
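The priority order above can be sketched with the standard library (an illustrative simplification; the kernel itself relies on the Databricks SDK for this):

```python
import configparser
import os
from pathlib import Path

def resolve_cluster_id(cfg_path=Path.home() / ".databrickscfg"):
    """DATABRICKS_CLUSTER_ID wins; otherwise cluster_id from the active profile."""
    env_id = os.environ.get("DATABRICKS_CLUSTER_ID")
    if env_id:
        return env_id
    profile = os.environ.get("DATABRICKS_CONFIG_PROFILE", "DEFAULT")
    cfg = configparser.ConfigParser()
    cfg.read(cfg_path)
    if profile != "DEFAULT" and not cfg.has_section(profile):
        return None
    return cfg.get(profile, "cluster_id", fallback=None)
```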

4.2. Sync Settings

You can configure file synchronization in pyproject.toml:

[tool.jupyter-databricks-kernel.sync]
enabled = true
source = "."
exclude = ["*.log", "data/"]
max_size_mb = 100.0
max_file_size_mb = 10.0
use_gitignore = true

Option                 Description                          Default
sync.enabled           Enable file synchronization          true
sync.source            Source directory to sync             "."
sync.exclude           Additional exclude patterns          []
sync.max_size_mb       Maximum total project size (MB)      No limit
sync.max_file_size_mb  Maximum individual file size (MB)    No limit
sync.use_gitignore     Respect .gitignore patterns          true
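A rough sketch of how such exclude patterns typically match (fnmatch-style globs for files, a trailing slash for directories). This is an illustration only; the kernel's actual matcher, including its .gitignore handling, may differ:

```python
from fnmatch import fnmatch

def excluded(rel_path, patterns):
    """True if rel_path (POSIX-style, relative to sync.source) hits an exclude pattern."""
    parts = rel_path.split("/")
    for pat in patterns:
        if pat.endswith("/"):
            # directory pattern: exclude everything under that directory
            if pat.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pat) or fnmatch(rel_path, pat):
            return True
    return False
```

With the configuration above, excluded("app.log", ["*.log", "data/"]) is true, as is anything under data/, while source files pass through.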

5. Known Limitations

  • Serverless compute is not supported (Command Execution API limitation)
  • input() and interactive prompts do not work
  • Interactive widgets (ipywidgets) are not supported

6. Troubleshooting

6.1. Kernel feels slow

File sync may be uploading unnecessary files. Check your sync settings:

  1. Ensure .gitignore includes large/unnecessary files:

    .venv/
    __pycache__/
    *.pyc
    data/
    *.parquet
    node_modules/
    
  2. Add exclude patterns in pyproject.toml:

    [tool.jupyter-databricks-kernel.sync]
    exclude = ["data/", "models/", "*.csv"]
    
  3. Set size limits to catch unexpected large files:

    [tool.jupyter-databricks-kernel.sync]
    max_size_mb = 50.0
    max_file_size_mb = 10.0
    
  4. Disable sync entirely if not needed:

    [tool.jupyter-databricks-kernel.sync]
    enabled = false
    

7. Development

See CONTRIBUTING.md for development setup and guidelines.

8. License

Apache License 2.0
