jupyter-databricks-kernel
A Jupyter kernel for complete remote execution on Databricks clusters.
1. Features
- Execute Python code entirely on Databricks clusters
- Works with VS Code, JupyterLab, and other Jupyter frontends
- CLI execution support with the `jupyter execute` command
- Automatic file synchronization to Databricks workspace
  - Syncs your local project files to the remote cluster before execution
  - Respects `.gitignore` patterns and configurable exclude rules
  - Configurable size limits to prevent syncing large files
2. Requirements
- Python 3.11 or later
- Databricks workspace with authentication configured (supports Personal Access Token, OAuth M2M with Service Principal, etc.)
- Classic all-purpose cluster
3. Quick Start
- Install the kernel:

      pip install jupyter-databricks-kernel
      python -m jupyter_databricks_kernel.install

  Install options:

  | Option | Description |
  |---|---|
  | (default) | Install to current venv (`sys.prefix`) |
  | `--user` | Install to user site (`~/.local/share/jupyter/kernels/`) |
  | `--prefix PATH` | Install to custom path |
- Configure authentication and cluster:

      # Recommended: Use Databricks CLI to set up everything
      databricks auth login --configure-cluster

  This creates `~/.databrickscfg` with authentication credentials and cluster ID.

  Alternatively, use environment variables:

      # Override cluster ID (optional, takes priority over ~/.databrickscfg)
      export DATABRICKS_CLUSTER_ID=your-cluster-id

      # Authentication (if not using ~/.databrickscfg)
      export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
      export DATABRICKS_TOKEN=your-personal-access-token

      # Use specific profile from ~/.databrickscfg (optional)
      export DATABRICKS_CONFIG_PROFILE=your-profile-name

  For authentication options, see Databricks SDK Authentication.
- Open a notebook and select the "Databricks" kernel:

  VS Code:

  - Install the Jupyter extension
  - Open a `.ipynb` file
  - Click "Select Kernel" and choose "Databricks"

  JupyterLab:

      jupyter-lab

  Select "Databricks" from the kernel list.
- Run a simple test:

      spark.version

  If the cluster is stopped, the first execution may take 5-6 minutes while the cluster starts.
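To confirm that the install step registered the kernel, one option is to inspect the output of `jupyter kernelspec list --json`. The sketch below assumes the kernel is registered under the name `databricks` (the name this README passes to `--kernel_name`); the helper names `find_kernel` and `installed_kernel_dir` are hypothetical and not part of this package.

```python
import json
import subprocess


def find_kernel(listing_json, name="databricks"):
    """Return the resource directory of kernel `name`, or None if absent."""
    specs = json.loads(listing_json).get("kernelspecs", {})
    spec = specs.get(name)
    return spec.get("resource_dir") if spec else None


def installed_kernel_dir(name="databricks"):
    """Ask the local Jupyter installation where kernel `name` is installed."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_kernel(result.stdout, name)
```

Calling `installed_kernel_dir()` in the environment where Jupyter runs should return a path when the kernel is visible to that environment, and `None` otherwise. A `None` result may simply mean the kernel was installed into a different venv or prefix.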
4. Configuration
4.1. Cluster ID
Cluster ID is read from (in order of priority):
- `DATABRICKS_CLUSTER_ID` environment variable
- `~/.databrickscfg` (from active profile)
Active profile is determined by DATABRICKS_CONFIG_PROFILE environment
variable, or DEFAULT if not set.
Example ~/.databrickscfg:
[DEFAULT]
host = https://your-workspace.cloud.databricks.com
token = dapi...
cluster_id = 0123-456789-abcdef12
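The priority order above can be illustrated with a short sketch. This is not the kernel's own code: `resolve_cluster_id` is a hypothetical helper, and it assumes that a named profile does not inherit values from `[DEFAULT]`, which may differ from how the kernel or the Databricks SDK reads the file.

```python
import configparser
import os
from pathlib import Path


def resolve_cluster_id(env=None, cfg_path=None):
    """Return (cluster_id, source) following the documented priority order."""
    env = os.environ if env is None else env
    cfg_path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"

    # 1. The environment variable takes priority.
    cluster_id = env.get("DATABRICKS_CLUSTER_ID")
    if cluster_id:
        return cluster_id, "environment"

    # 2. Otherwise read the active profile (DEFAULT if none is selected).
    profile = env.get("DATABRICKS_CONFIG_PROFILE") or "DEFAULT"
    # Assumption: treat [DEFAULT] as an ordinary section so that other
    # profiles do not silently inherit its cluster_id.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read(cfg_path)  # a missing file is ignored
    if parser.has_section(profile):
        cluster_id = parser.get(profile, "cluster_id", fallback=None)
        if cluster_id:
            return cluster_id, f"{cfg_path} [{profile}]"
    return None, None
```

Calling `resolve_cluster_id()` with no arguments uses the real environment and `~/.databrickscfg`; the second element of the result says where the value came from, which can help when a stale environment variable overrides the config file.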
4.2. Sync Settings
You can configure file synchronization in pyproject.toml:
[tool.jupyter-databricks-kernel.sync]
enabled = true
source = "."
exclude = ["*.log", "data/"]
max_size_mb = 100.0
max_file_size_mb = 10.0
use_gitignore = true
| Option | Description | Default |
|---|---|---|
| `sync.enabled` | Enable file synchronization | `true` |
| `sync.source` | Source directory to sync | `"."` |
| `sync.exclude` | Additional exclude patterns | `[]` |
| `sync.max_size_mb` | Maximum total project size in MB | No limit |
| `sync.max_file_size_mb` | Maximum individual file size in MB | No limit |
| `sync.use_gitignore` | Respect `.gitignore` patterns | `true` |
5. CLI Execution
You can execute notebooks from the command line using jupyter execute:
jupyter execute notebook.ipynb --kernel_name=databricks --inplace
To save the output to a different file:
jupyter execute notebook.ipynb --kernel_name=databricks --output=output.ipynb
5.1. Options
| Option | Description |
|---|---|
| `--kernel_name` | Kernel name (use `databricks`) |
| `--output` | Output file name |
| `--inplace` | Overwrite input file with results |
| `--timeout` | Cell execution timeout in seconds |
| `--startup_timeout` | Kernel startup timeout in seconds (default: 60) |
| `--allow-errors` | Continue execution even if a cell raises an error |
5.2. Notes
If the cluster is stopped, kernel startup may take 5-6 minutes. Increase
--startup_timeout to avoid timeout errors:
jupyter execute notebook.ipynb --kernel_name=databricks --startup_timeout=600
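When notebooks are run from a script or CI job, it can be convenient to assemble the command in one place. The sketch below only uses the options listed in the table above; `build_execute_command` is a hypothetical helper, and the 600-second startup default is taken from the example in this section rather than from the tool itself.

```python
import shlex


def build_execute_command(notebook, output=None, startup_timeout=600,
                          timeout=None, allow_errors=False):
    """Build a `jupyter execute` argument list for the Databricks kernel."""
    cmd = [
        "jupyter", "execute", notebook,
        "--kernel_name=databricks",
        f"--startup_timeout={startup_timeout}",
    ]
    # Write to a separate file when requested, otherwise overwrite the input.
    cmd.append(f"--output={output}" if output else "--inplace")
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    if allow_errors:
        cmd.append("--allow-errors")
    return cmd


cmd = build_execute_command("notebook.ipynb", output="output.ipynb")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a failed cell or a startup timeout surfaces as a non-zero exit.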
6. Known Limitations
- Serverless compute is not supported (Command Execution API limitation)
- `input()` and interactive prompts do not work
- Interactive widgets (ipywidgets) are not supported
7. Troubleshooting
7.1. Kernel feels slow
File sync may be uploading unnecessary files. Check your sync settings:
- Ensure `.gitignore` includes large/unnecessary files:

      .venv/
      __pycache__/
      *.pyc
      data/
      *.parquet
      node_modules/

- Add exclude patterns in `pyproject.toml`:

      [tool.jupyter-databricks-kernel.sync]
      exclude = ["data/", "models/", "*.csv"]

- Set size limits to catch unexpected large files:

      [tool.jupyter-databricks-kernel.sync]
      max_size_mb = 50.0
      max_file_size_mb = 10.0

- Disable sync entirely if not needed:

      [tool.jupyter-databricks-kernel.sync]
      enabled = false
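To decide which patterns to exclude, it may help to list the large files in the local project first. The sketch below is a rough local check, not the kernel's sync logic: `large_files` is a hypothetical helper, its pattern matching is much simpler than real `.gitignore` semantics (only `name/` directory prefixes and filename globs), and it assumes 1 MB = 1024 * 1024 bytes.

```python
import fnmatch
import os
from pathlib import Path


def large_files(root=".", limit_mb=10.0, exclude=()):
    """Yield (relative_path, size_mb) for files larger than limit_mb."""
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune excluded directories such as "data/" so they are not walked.
        dirnames[:] = [
            d for d in dirnames
            if not _excluded((rel_dir / d).as_posix() + "/", exclude)
        ]
        for name in filenames:
            rel = (rel_dir / name).as_posix()
            if _excluded(rel, exclude):
                continue
            size_mb = (root / rel).stat().st_size / (1024 * 1024)
            if size_mb > limit_mb:
                yield rel, size_mb


def _excluded(rel_path, patterns):
    """Simplified matching: directory prefixes ("data/") and filename globs."""
    for pattern in patterns:
        if pattern.endswith("/"):
            if rel_path.startswith(pattern) or f"/{pattern}" in f"/{rel_path}":
                return True
        elif fnmatch.fnmatch(Path(rel_path).name, pattern):
            return True
    return False
```

Iterating over `large_files(".", limit_mb=10.0, exclude=["data/", "*.csv"])` from the project root lists the files that would still exceed a 10 MB per-file limit after those excludes, which gives a starting point for tuning `exclude` and `max_file_size_mb`.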
8. Development
See CONTRIBUTING.md for development setup and guidelines.
9. License
Apache License 2.0