Keep your local Python scripts installed and in sync with a Databricks notebook. Shortens the feedback loop for developing projects in a hybrid environment.
Databricks-Rocket
Databricks-Rocket (db-rockets for short) keeps your local Python scripts installed and synchronized with a Databricks notebook. Every change on your local machine is automatically reflected in the notebook. This shortens the feedback loop for developing git-based projects and eliminates the need to set up a local development environment.
Installation
Install databricks-rocket using pip:
pip install databricks-rocket
Setup
Ensure you've created a personal access token in Databricks (official documentation). Afterward, set up the Databricks CLI by executing:
databricks configure --token
Alternatively, you can set the Databricks token and host in your environment variables:
export DATABRICKS_HOST="mydatabrickshost"
export DATABRICKS_TOKEN="mydatabrickstoken"
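If you use the environment-variable route, a small pre-flight check can catch a missing value before you launch. The following is a minimal sketch, not part of databricks-rocket: the helper name `missing_databricks_variables` is hypothetical, and it only inspects the two variables named above (it does not know about a CLI configuration created with `databricks configure --token`).

```python
import os

# The two variables described in the setup instructions above.
REQUIRED_VARIABLES = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_databricks_variables()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Databricks host and token are set.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching your shell.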
If your project isn't already a pip package, you'll need to convert it into one. Use dbrocket for this:
rocket setup
This creates a setup.py for you.
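For orientation, here is a rough sketch of what a bare-bones setup.py can look like and how one could be written by hand. This is an assumption for illustration only: the file that rocket setup actually generates may differ, and the helper `write_minimal_setup_py`, the template, and the version `0.0.1` are hypothetical.

```python
from pathlib import Path

# Hypothetical minimal template; the generated file may contain more fields.
SETUP_TEMPLATE = '''\
from setuptools import setup, find_packages

setup(
    name="{name}",
    version="0.0.1",
    packages=find_packages(),
)
'''


def write_minimal_setup_py(project_dir, name):
    """Write a bare-bones setup.py unless one already exists; return its path."""
    target = Path(project_dir) / "setup.py"
    if not target.exists():
        # Never overwrite an existing setup.py.
        target.write_text(SETUP_TEMPLATE.format(name=name))
    return target
```

The point of the sketch is only that the project needs a name and a package list so that pip can install it.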
Usage
To Sync Your Project
By default, databricks-rocket syncs your project to DBFS automatically. This allows you to update your code and have those changes reflected in your Databricks notebook without restarting the Python kernel. Simply execute:
rocket launch
You'll then receive the exact command to run in your notebook. Example:
stevenmi@MacBook db-rocket % rocket launch
>> Watch activated. Uploaded your project to databricks. Install your project in your databricks notebook by running:
>> %pip install --upgrade pip
>> %pip install -r /dbfs/temp/stevenmi/db-rocket/requirements.txt
>> %pip install --no-deps -e /dbfs/temp/stevenmi/db-rocket
and following in a new Python cell:
>> %load_ext autoreload
>> %autoreload 2
Finally, paste these commands into your Databricks notebook.
Include non-python files
Upload all root level json files:
rocket launch --glob_path="*.json"
To also upload all env files:
rocket launch --glob_path="[\"*.json\", \".env*\"]"
When specifying lists, be mindful about the formatting of the parameter string.
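Hand-escaping the quotes in a list is easy to get wrong. One way to avoid it is to let Python build the argument: the sketch below serializes the patterns as a JSON-style list and shell-quotes the result. The helper name `glob_path_option` is hypothetical; the sketch assumes the tool accepts the same list text as in the example above, which is what the shell passes along after removing the outer quotes and backslashes.

```python
import json
import shlex


def glob_path_option(patterns):
    """Build a shell-safe --glob_path option from a list of glob patterns."""
    # json.dumps yields ["*.json", ".env*"], the same text as the escaped example.
    value = json.dumps(list(patterns))
    # shlex.quote wraps the value so the shell passes it through unchanged.
    return "--glob_path=" + shlex.quote(value)


print("rocket launch " + glob_path_option(["*.json", ".env*"]))
# prints: rocket launch --glob_path='["*.json", ".env*"]'
```

Single quotes around the whole value mean the inner double quotes need no backslashes.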
To Upload Your Python Package
If you've disabled the watch feature, databricks-rocket will only upload your project as a wheel to DBFS:
rocket launch --watch=False
Example:
stevenmi@MacBook db-rocket % rocket launch --watch=False
>> Watch is disabled. Building creating a python wheel from your project
>> Found setup.py. Building python library
>> Uploaded ./dist/databricks_rocket-2.0.0-py3-none-any.whl to dbfs:/temp/stevenmi/db-rocket/dist/databricks_rocket-2.0.0-py3-none-any.whl
>> Uploaded wheel to databricks. Install your library in your databricks notebook by running:
>> %pip install --upgrade pip
>> %pip install /dbfs/temp/stevenmi/db-rocket/databricks_rocket-2.0.0-py3-none-any.whl --force-reinstall
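Note that the upload message refers to the wheel with a dbfs:/ URI, while the %pip line uses a /dbfs/ path; in a Databricks notebook the /dbfs/ directory is the local mount of DBFS. The sketch below shows that mapping with two hypothetical helpers and a made-up wheel path; it is not code from databricks-rocket.

```python
def dbfs_uri_to_local_path(uri):
    """Map a dbfs:/ URI to the /dbfs/ path visible from a notebook."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


def pip_install_line(wheel_uri):
    """Build the notebook command that installs a wheel stored on DBFS."""
    return f"%pip install {dbfs_uri_to_local_path(wheel_uri)} --force-reinstall"


# Hypothetical wheel location, used only for illustration.
print(pip_install_line("dbfs:/temp/me/my_project-0.0.1-py3-none-any.whl"))
```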
Blogposts
- DBrocket 2.0: A summary of the big improvements we made to the tool in the new release.
- DB Rocket 1.0: This post gives more details about the rationale behind dbrocket.
Support
- Databricks: >=7
- Python: >=3.7
- Tested on Platform: Linux, macOS. Windows will probably not work, but contributions are welcome!
- Supports uploading to Unity Catalog Volumes starting from version 3.0.0. Note that the underlying dependency, databricks-sdk, is still in beta. We do not recommend using UC Volumes in production.
Acknowledgments
- Thanks Leon Poli for the Logo :)
- Thanks Stephane Leonard for source-code and documentation improvements :)
- Thanks Malachi Soord for the CICD setup and README improvements
Contributions are welcomed!
Security
For security issues please contact security@getyourguide.com.
Legal
db-rocket is licensed under the Apache License, Version 2.0. See LICENSE for the full text.