CICD tool for testing and deploying to Databricks
Databricks CI/CD
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a simple file structure) to validate and deploy content to Databricks. Currently, it can handle the following content:
- Workspace - a collection of notebooks written in Scala, Python, R or SQL
- Jobs - list of Databricks jobs
- Clusters
- Instance Pools
- DBFS - an arbitrary collection of files that may be deployed on a Databricks workspace
Installation
pip install databricks-cicd
Requirements
To use this tool, you need a source directory structure (preferably as a private GIT repository) that has the following structure:
any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir/
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawberry_jam.jar
    ├── subdir/
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
Note: All folder names above are the defaults and can be configured. This is just a sample.
Usage
For the latest options and commands run:
cicd -h
A sample command could be:
cicd deploy \
-w sample_12432.7.azuredatabricks.net \
-u john.smith@domain.com \
-t dapi_sample_token_0d5-2 \
-lp '~/git/my-private-repo' \
-tp /blabla \
-c DEV.ini \
--verbose
Note: Paths on Windows need to be in double quotes.
The default configuration is defined in default.ini and can be overridden with a custom INI file via the -c option, typically one config file per target environment.
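For illustration, a per-environment override might look like the sketch below. The section and key names here are assumptions for illustration only, not the tool's documented schema; check default.ini inside the installed package for the real option names.

```ini
; DEV.ini — hypothetical per-environment override (key names are assumptions)
[workspace]
target_path = /Shared/dev      ; where notebooks land in the Databricks workspace

[local]
workspace = workspace          ; local folder names, if you deviate from the defaults
jobs = jobs
clusters = clusters
instance_pools = instance_pools
dbfs = dbfs
```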
Create content
Notebooks:
- Add a notebook to source:
  - On the Databricks UI, go to your notebook.
  - Click File -> Export -> Source file.
  - Add that file to the workspace folder of this repo without changing the file name.
Jobs:
- Add a job to source:
  - Get the source of the job and write it to a file. You need to have the Databricks CLI and jq installed. On Windows, it is easiest to rename jq-win64.exe to jq.exe and place it in the c:\Windows\System32 folder. Then on Windows/Linux/macOS:
    databricks jobs get --job-id 74 | jq .settings > Job_Name.json
    This downloads the source JSON of the job from the Databricks server, extracts only the settings, and writes them to a file.
    Note: The file name should be the same as the job name within the JSON file. Please avoid spaces in names.
  - Add that file to the jobs folder.
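To see what the jq .settings step does without touching a live workspace, you can run it on a local stand-in for the CLI's output. The JSON below is a made-up minimal job payload, not a real databricks jobs get response:

```shell
# Stand-in for `databricks jobs get --job-id 74` output (hypothetical minimal payload)
cat > job_raw.json <<'EOF'
{"job_id": 74, "settings": {"name": "Job_Name", "max_concurrent_runs": 1}}
EOF

# Keep only the .settings object, as in the workflow above
jq .settings job_raw.json > Job_Name.json
cat Job_Name.json
```

The resulting Job_Name.json contains only the settings object, with the server-side fields such as job_id stripped away.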
Clusters:
- Add a cluster to source:
  - Get the source of the cluster and write it to a file:
    databricks clusters get --cluster-name orion > orion.json
    Note: The file name should be the same as the cluster name within the JSON file. Please avoid spaces in names.
  - Add that file to the clusters folder.
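Since deployment matches each file to a cluster by name, a quick local check can catch a mismatch before deploying. This is a sketch; the JSON below is a made-up minimal cluster spec, not real databricks clusters get output:

```shell
# Hypothetical minimal cluster spec, shaped like `databricks clusters get` output
cat > orion.json <<'EOF'
{"cluster_name": "orion", "spark_version": "7.3.x-scala2.12", "num_workers": 2}
EOF

# Verify the file name (without .json) matches cluster_name inside the file
name=$(jq -r .cluster_name orion.json)
[ "$name" = "orion" ] && echo "OK: orion.json matches cluster_name"
```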
Instance pools:
- Add an instance pool to source:
  - Similar to clusters, just use instance-pools instead of clusters.
DBFS:
- Add a file to DBFS:
  - Just add the file to the dbfs folder.