A Jupyter kernel for complete remote execution on Databricks clusters

jupyter-databricks-kernel

1. Features

  • Execute Python code entirely on Databricks clusters
    • Works with VS Code, JupyterLab, and other Jupyter frontends
    • CLI execution support with the jupyter execute command
  • Automatic file synchronization to Databricks workspace
    • Syncs your local project files to the remote cluster before execution
    • Respects .gitignore patterns and configurable exclude rules
    • Configurable size limits to prevent syncing large files

2. Requirements

  • Python 3.11 or later
  • Databricks workspace with authentication configured (supports Personal Access Token, OAuth M2M with Service Principal, etc.)
  • Classic all-purpose cluster

3. Quick Start

  1. Install the kernel:

    pip install jupyter-databricks-kernel
    python -m jupyter_databricks_kernel.install
    

    Install options:

    Option          Description
    (default)       Install to current venv (sys.prefix)
    --user          Install to user site (~/.local/share/jupyter/kernels/)
    --prefix PATH   Install to custom path

  2. Configure authentication and cluster:

    # Recommended: Use Databricks CLI to set up everything
    databricks auth login --configure-cluster
    

    This creates ~/.databrickscfg with authentication credentials and cluster ID.

    Alternatively, use environment variables:

    # Override cluster ID (optional, takes priority over ~/.databrickscfg)
    export DATABRICKS_CLUSTER_ID=your-cluster-id
    
    # Authentication (if not using ~/.databrickscfg)
    export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
    export DATABRICKS_TOKEN=your-personal-access-token
    
    # Use specific profile from ~/.databrickscfg (optional)
    export DATABRICKS_CONFIG_PROFILE=your-profile-name
    

    For authentication options, see Databricks SDK Authentication.

  3. Open a notebook and select "Databricks" kernel:

    VS Code:

    1. Install the Jupyter extension
    2. Open a .ipynb file
    3. Click "Select Kernel" and choose "Databricks"

    JupyterLab:

    jupyter-lab
    

    Select "Databricks" from the kernel list.

  4. Run a simple test:

    spark.version
    

If the cluster is stopped, the first execution may take 5-6 minutes while the cluster starts.
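
If the kernel fails to connect, it can help to check which of the variables from step 2 are actually visible to the process that launches Jupyter. The following is a minimal sketch, not part of the kernel: the function name and the grouping of variables are assumptions made for illustration, and it only covers the Personal Access Token variables shown above, not other authentication methods.

```python
import os
from collections.abc import Mapping

# Variables named in the Quick Start. Grouping them like this is an
# assumption of this sketch, not something the kernel defines.
TOKEN_AUTH_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
OPTIONAL_VARS = ("DATABRICKS_CLUSTER_ID", "DATABRICKS_CONFIG_PROFILE")


def summarize_databricks_env(
    env: Mapping[str, str] = os.environ,
) -> dict[str, list[str]]:
    """Report which Quick Start variables are set, without printing values."""
    return {
        # Only relevant when authenticating through environment variables
        # instead of ~/.databrickscfg.
        "missing_for_token_auth": [n for n in TOKEN_AUTH_VARS if not env.get(n)],
        "optional_set": [n for n in OPTIONAL_VARS if env.get(n)],
    }


if __name__ == "__main__":
    for key, names in summarize_databricks_env().items():
        print(f"{key}: {', '.join(names) or '(none)'}")
```

An empty missing_for_token_auth list only means the two variables are set. It does not confirm that the token is valid or that the cluster exists.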

4. Configuration

4.1. Cluster ID

The cluster ID is read from the following sources, in order of priority:

  1. DATABRICKS_CLUSTER_ID environment variable
  2. ~/.databrickscfg (from active profile)

The active profile is selected by the DATABRICKS_CONFIG_PROFILE environment variable, and falls back to DEFAULT when that variable is not set.

Example ~/.databrickscfg:

[DEFAULT]
host = https://your-workspace.cloud.databricks.com
token = dapi...
cluster_id = 0123-456789-abcdef12

4.2. Sync Settings

You can configure file synchronization in pyproject.toml:

[tool.jupyter-databricks-kernel.sync]
enabled = true
source = "."
exclude = ["*.log", "data/"]
max_size_mb = 100.0
max_file_size_mb = 10.0
use_gitignore = true

Option                  Description                          Default
sync.enabled            Enable file synchronization          true
sync.source             Source directory to sync             "."
sync.exclude            Additional exclude patterns          []
sync.max_size_mb        Maximum total project size in MB     No limit
sync.max_file_size_mb   Maximum individual file size in MB   No limit
sync.use_gitignore      Respect .gitignore patterns          true

5. CLI Execution

You can execute notebooks from the command line using jupyter execute:

jupyter execute notebook.ipynb --kernel_name=databricks --inplace

To save the output to a different file:

jupyter execute notebook.ipynb --kernel_name=databricks --output=output.ipynb

5.1. Options

Option              Description
--kernel_name       Kernel name (use databricks)
--output            Output file name
--inplace           Overwrite input file with results
--timeout           Cell execution timeout in seconds
--startup_timeout   Kernel startup timeout in seconds (default: 60)
--allow-errors      Continue execution even if a cell raises an error

5.2. Notes

If the cluster is stopped, kernel startup may take 5-6 minutes. Increase --startup_timeout to avoid timeout errors:

jupyter execute notebook.ipynb --kernel_name=databricks --startup_timeout=600

6. Known Limitations

  • Serverless compute is not supported (Command Execution API limitation)
  • input() and interactive prompts do not work
  • Interactive widgets (ipywidgets) are not supported

7. Troubleshooting

7.1. Kernel feels slow

File sync may be uploading unnecessary files. Check your sync settings:

  1. Ensure .gitignore lists large or unnecessary files:

    .venv/
    __pycache__/
    *.pyc
    data/
    *.parquet
    node_modules/
    
  2. Add exclude patterns in pyproject.toml:

    [tool.jupyter-databricks-kernel.sync]
    exclude = ["data/", "models/", "*.csv"]
    
  3. Set size limits to catch unexpected large files:

    [tool.jupyter-databricks-kernel.sync]
    max_size_mb = 50.0
    max_file_size_mb = 10.0
    
  4. Disable sync entirely if not needed:

    [tool.jupyter-databricks-kernel.sync]
    enabled = false
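
Before tuning these settings, it can be useful to see which local files are largest and how much a set of exclude patterns would remove. The script below is a rough local estimate only. Its matching rules are a simplification invented for this sketch (a trailing slash excludes a directory name, anything else is a glob on the file name), it does not read .gitignore, and treating 1 MB as 1,000,000 bytes is an assumption, so the kernel's own selection may differ.

```python
import fnmatch
from pathlib import Path


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Simplified matching: 'dir/' excludes a directory, others glob file names."""
    parts = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory of the file.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch.fnmatch(parts[-1], pattern):
            return True
    return False


def largest_sync_candidates(
    root: str, patterns: list[str], top: int = 10
) -> tuple[float, list[tuple[float, str]]]:
    """Return (total_mb, [(size_mb, path), ...]) for files that are not excluded."""
    sizes = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if not is_excluded(rel, patterns):
                sizes.append((path.stat().st_size / 1_000_000, rel))
    sizes.sort(reverse=True)  # largest files first
    return sum(size for size, _ in sizes), sizes[:top]


if __name__ == "__main__":
    total_mb, biggest = largest_sync_candidates(".", ["data/", "models/", "*.csv"])
    print(f"total: {total_mb:.1f} MB")
    for size_mb, rel in biggest:
        print(f"{size_mb:8.2f} MB  {rel}")
```

Files that dominate the list are good candidates for an exclude pattern or a .gitignore entry.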
    

8. Development

See CONTRIBUTING.md for development setup and guidelines.

9. License

Apache License 2.0
