
Keep your local Python scripts installed and in sync with a Databricks notebook. This shortens the feedback loop when developing projects in a hybrid environment.

Project description

Databricks-Rocket


Databricks-Rocket (db-rocket for short) keeps your local Python scripts installed and synchronized with a Databricks notebook. Every change on your local machine is automatically reflected in the notebook. This shortens the feedback loop for developing git-based projects and eliminates the need to set up a local development environment.

Installation

Install databricks-rocket using pip:

pip install databricks-rocket

Setup

Make sure you have created a personal access token in Databricks (see the official documentation). Then set up the Databricks CLI by running:

databricks configure --token

Alternatively, you can set the Databricks token and host in your environment variables:

export DATABRICKS_HOST="mydatabrickshost"
export DATABRICKS_TOKEN="mydatabrickstoken"
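If you go the environment-variable route, it can help to confirm that both values are actually visible to processes started from your shell before running rocket. The following is a minimal sketch; read_databricks_credentials is a hypothetical helper, and it does not reproduce how db-rocket or the Databricks CLI decide between environment variables and the CLI configuration file.

```python
import os


def read_databricks_credentials(env=None):
    """Return (host, token) from DATABRICKS_HOST and DATABRICKS_TOKEN.

    Hypothetical helper for illustration only; db-rocket's real credential
    lookup order is not documented here.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["DATABRICKS_HOST"], env["DATABRICKS_TOKEN"]
```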

If your project isn't already a pip package, you'll need to convert it into one. db-rocket can do this for you:

rocket setup

This creates a setup.py file for your project.
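pip can only install a local directory that contains a setup.py or a pyproject.toml file, so a quick way to tell whether you need rocket setup is to look for either one. A minimal sketch; is_pip_package is a hypothetical helper, and whether db-rocket itself accepts pyproject.toml-only projects is not stated here.

```python
from pathlib import Path

# Files pip looks for when installing a local project directory.
# Assumption: db-rocket's own check may differ (its log mentions setup.py).
PACKAGING_FILES = ("setup.py", "pyproject.toml")


def is_pip_package(project_dir):
    """Return True if project_dir contains setup.py or pyproject.toml."""
    root = Path(project_dir)
    return any((root / name).is_file() for name in PACKAGING_FILES)
```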

Usage

To Sync Your Project

By default, databricks-rocket syncs your project to DBFS automatically. This allows you to update your code and have those changes reflected in your Databricks notebook without restarting the Python kernel. Simply execute:

rocket launch

You'll then get the exact commands to run in your notebook. Example:

stevenmi@MacBook db-rocket % rocket launch
>> Watch activated. Uploaded your project to databricks. Install your project in your databricks notebook by running:
>> %pip install --upgrade pip
>> %pip install  -r /dbfs/temp/stevenmi/db-rocket/requirements.txt
>> %pip install --no-deps -e /dbfs/temp/stevenmi/db-rocket

and the following in a new Python cell:
>> %load_ext autoreload
>> %autoreload 2

Finally, add these commands to your Databricks notebook, as shown in imgs/img_2.png.
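%autoreload 2 is an IPython extension setting that reloads modules before each cell runs, which is why synced edits show up without restarting the kernel. The underlying idea can be illustrated with the standard library's importlib.reload: change a module on disk, reload it, and the running interpreter sees the new code. This is only a sketch of the mechanism, not how IPython's autoreload (which also patches existing functions and classes in place) or db-rocket is implemented; the module name rocket_demo_mod is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload(module_name="rocket_demo_mod"):
    """Edit a module on disk and reload it without restarting the interpreter.

    module_name is a hypothetical, throwaway module used only for this demo.
    """
    old_flag = sys.dont_write_bytecode
    # Skip writing .pyc files so the rewritten source is always re-read on reload.
    sys.dont_write_bytecode = True
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / f"{module_name}.py"
        module_path.write_text("GREETING = 'hello'\n")
        sys.path.insert(0, tmp)
        try:
            mod = importlib.import_module(module_name)
            before = mod.GREETING
            # Simulate a local edit that has been synced to the notebook's copy.
            module_path.write_text("GREETING = 'hello, rocket'\n")
            importlib.reload(mod)
            after = mod.GREETING
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            sys.dont_write_bytecode = old_flag
    return before, after


print(demo_reload())
```

The same module object is updated in place, so code that already holds a reference to the module sees the new value after the reload.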

Include non-python files

To upload all root-level JSON files:

rocket launch --glob_path="*.json"

To also upload all .env files:

rocket launch --glob_path="[\"*.json\", \".env*\"]"

When passing a list, pay attention to the quoting: the whole list is passed as one double-quoted string, and the inner quotes must be escaped.
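With the quoting above, the shell removes the outer quotes and the backslashes, so the program receives the literal string ["*.json", ".env*"]. The sketch below shows one way such a value could be parsed and matched against root-level files, assuming the patterns behave like standard shell-style wildcards. parse_glob_arg and match_root_files are hypothetical helpers; db-rocket's actual parsing and matching logic is not documented here.

```python
import ast
import fnmatch
import os


def parse_glob_arg(value):
    """Accept a single pattern or a list literal such as '["*.json", ".env*"]'.

    Assumption: mimics how a --glob_path string might be interpreted.
    """
    value = value.strip()
    if value.startswith("["):
        return list(ast.literal_eval(value))
    return [value]


def match_root_files(project_dir, patterns):
    """Return sorted root-level file names (no subdirectories) matching any pattern."""
    names = sorted(
        name
        for name in os.listdir(project_dir)
        if os.path.isfile(os.path.join(project_dir, name))
    )
    return [n for n in names if any(fnmatch.fnmatchcase(n, p) for p in patterns)]
```

Note that ".env*" is needed as a separate pattern because "*.json" only matches names ending in .json.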

To Upload Your Python Package

If you disable the watch feature, databricks-rocket instead builds your project as a wheel and uploads it to DBFS:

rocket launch --watch=False

Example:

stevenmi@MacBook db-rocket % rocket launch --watch=False
>> Watch is disabled. Building creating a python wheel from your project
>> Found setup.py. Building python library
>> Uploaded ./dist/databricks_rocket-2.0.0-py3-none-any.whl to dbfs:/temp/stevenmi/db-rocket/dist/databricks_rocket-2.0.0-py3-none-any.whl
>> Uploaded wheel to databricks. Install your library in your databricks notebook by running:
>> %pip install --upgrade pip
>> %pip install  /dbfs/temp/stevenmi/db-rocket/databricks_rocket-2.0.0-py3-none-any.whl --force-reinstall
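The log refers to DBFS in two notations: the dbfs:/ URI used for the upload, and the /dbfs/ path that %pip reads through the driver's local FUSE mount. A small hypothetical helper showing that mapping, assuming the standard /dbfs mount is available (this depends on the workspace and cluster configuration):

```python
def dbfs_uri_to_fuse_path(uri):
    """Map a dbfs:/ URI to the equivalent /dbfs/ local path on the driver.

    Hypothetical helper; assumes DBFS is mounted at /dbfs.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


print(dbfs_uri_to_fuse_path("dbfs:/temp/stevenmi/db-rocket/dist/databricks_rocket-2.0.0-py3-none-any.whl"))
```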

Blogposts

  • DBrocket 2.0: A summary of the big improvements we made to the tool in the new release.
  • DB Rocket 1.0: a post with more details about the rationale behind dbrocket.

Support

  • Databricks: >=7
  • Python: >=3.7
  • Tested on platforms: Linux, macOS. Windows will probably not work, but contributions are welcome!
  • Supports uploading to Unity Catalog Volumes starting from version 3.0.0. Note that the underlying dependency, databricks-sdk, is still in beta. We do not recommend using UC Volumes in production.

Acknowledgments

  • Thanks Leon Poli for the Logo :)
  • Thanks Stephane Leonard for source-code and documentation improvements :)
  • Thanks Malachi Soord for the CICD setup and README improvements

Contributions are welcome!

Security

For security issues please contact security@getyourguide.com.

Legal

db-rocket is licensed under the Apache License, Version 2.0. See LICENSE for the full text.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

databricks-rocket-3.0.3.tar.gz (12.6 kB)

Uploaded Source

Built Distribution

databricks_rocket-3.0.3-py3-none-any.whl (14.4 kB)

Uploaded Python 3

File details

Details for the file databricks-rocket-3.0.3.tar.gz.

File metadata

  • Download URL: databricks-rocket-3.0.3.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for databricks-rocket-3.0.3.tar.gz
Algorithm Hash digest
SHA256 fc1cba18311bf6d658bb6c24afd03015cbca3026e9b489873615c6098b6db1bc
MD5 906c4bc3b4bcc9cbc52e35738f7cb954
BLAKE2b-256 c9bdb59272207147a057af1401232c881ca5fe8bff9a112c428ea70adb8c712c


File details

Details for the file databricks_rocket-3.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for databricks_rocket-3.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 34e96e4a84f5cfa8e80cf7d71fc5c0779e89878a89924f91f12dcc45bfaf5b39
MD5 06a9f16c45e01044daf386a353e23b25
BLAKE2b-256 90517aaae5b24234aa95230d4351157a0b0ed62823526c3465f4df5f31d236e9

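To check a downloaded file against the SHA256 digests published in the hash tables, a minimal sketch using the standard library's hashlib; sha256_of and verify_download are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA-256 digest matches the published value."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_download("databricks_rocket-3.0.3-py3-none-any.whl", "34e96e4a84f5cfa8e80cf7d71fc5c0779e89878a89924f91f12dcc45bfaf5b39") should return True for an unmodified copy of the wheel. pip can enforce the same kind of check at install time with its hash-checking mode (--require-hashes).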
