CICD tool for testing and deploying to Databricks
Databricks CI/CD
Forked from Manol Manolov's original databricks-cicd repo for use with the tactivos/data-databricks repo.
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a simple file structure) to validate and deploy content to Databricks. Currently, it can handle the following content:
- Workspace - a collection of notebooks written in Scala, Python, R or SQL
- Jobs - list of Databricks jobs
- Clusters
- Instance Pools
- DBFS - an arbitrary collection of files that may be deployed on a Databricks workspace
Installation
pip install tactivos-databricks-cicd
Requirements
To use this tool, you need a source directory (preferably a private Git repository) with the following structure:
any_local_folder_or_git_repo/
├── workspace/
│ ├── some_notebooks_subdir
│ │ └── Notebook 1.py
│ ├── Notebook 2.sql
│ ├── Notebook 3.r
│ └── Notebook 4.scala
├── jobs/
│ ├── My first job.json
│ └── Side gig.json
├── clusters/
│ ├── orion.json
│ └── Another cluster.json
├── instance_pools/
│ ├── Pool 1.json
│ └── Pool 2.json
└── dbfs/
  ├── strawbery_jam.jar
  ├── subdir
  │ └── some_other.jar
  ├── some_python.egg
  └── Ice cream.jpeg
Note: The folder names shown are the defaults and can be configured. This is just a sample.
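Before running a deployment, it can help to check what a source directory actually contains. The following is a minimal sketch, not part of the package: `summarize_source_dir` and `DEFAULT_FOLDERS` are hypothetical names, and the folder list simply mirrors the sample layout above (the real defaults live in the tool's configuration).

```python
from pathlib import Path

# Folder names copied from the sample layout; assumed defaults, not read
# from the tool's own configuration.
DEFAULT_FOLDERS = ("workspace", "jobs", "clusters", "instance_pools", "dbfs")


def summarize_source_dir(root):
    """Return {folder_name: sorted relative file paths} for each folder.

    Folders that do not exist are reported with an empty list.
    """
    root = Path(root)
    summary = {}
    for name in DEFAULT_FOLDERS:
        folder = root / name
        if folder.is_dir():
            # Walk subdirectories too, since workspace/ and dbfs/ may nest.
            files = sorted(
                p.relative_to(folder).as_posix()
                for p in folder.rglob("*")
                if p.is_file()
            )
        else:
            files = []
        summary[name] = files
    return summary
```

Calling `summarize_source_dir("~/git/my-private-repo")` would need the `~` expanded first (for example with `Path.expanduser()`), since `pathlib` does not expand it implicitly.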
Usage
For the latest options and commands run:
cicd -h
A sample command could be:
cicd deploy \
-w sample_12432.7.azuredatabricks.net \
-u john.smith@domain.com \
-t dapi_sample_token_0d5-2 \
-lp '~/git/my-private-repo' \
-tp /blabla \
-c DEV.ini \
--verbose
Note: On Windows, paths need to be enclosed in double quotes.
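When the command is launched from a script (for example in a CI job), building the argument list in code avoids shell quoting problems altogether. This is a rough sketch that uses only the flags shown in the sample command above; `build_deploy_command` is a hypothetical helper, not something the package provides.

```python
import shlex


def build_deploy_command(workspace, user, token, local_path, target_path,
                         config=None, verbose=False):
    """Assemble the argument list for `cicd deploy`.

    Only flags that appear in the sample command are used here.
    """
    args = [
        "cicd", "deploy",
        "-w", workspace,
        "-u", user,
        "-t", token,
        "-lp", local_path,
        "-tp", target_path,
    ]
    if config:
        args += ["-c", config]
    if verbose:
        args.append("--verbose")
    return args


def as_shell_string(args):
    """Render the argument list as a POSIX-shell-safe string for logging."""
    return shlex.join(args)
```

The list can be passed directly to `subprocess.run(args, check=True)`. Because no shell is involved, paths containing spaces need no extra quoting on any platform.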
The default configuration is defined in default.ini and can be overridden with a custom ini file using the -c option, usually one config file per target environment. (sample)
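The override behaviour described here can be illustrated with Python's standard `configparser`: values from a later source replace values from an earlier one, and anything not overridden keeps its default. This is only a sketch of the general layering idea. How the tool itself loads its files is not shown in this document, and the section and key names below (`workspace`, `deploy`, `local_sub_dir`) are invented for the example, so check default.ini for the real ones.

```python
import configparser


def load_config(default_text, override_text=""):
    """Layer an override ini on top of a default ini.

    Both arguments are ini-formatted strings. Keys present in the override
    win; all other keys keep their default values.
    """
    parser = configparser.ConfigParser()
    parser.read_string(default_text)
    if override_text:
        # A second read merges into the sections already loaded.
        parser.read_string(override_text)
    return parser


# Hypothetical contents standing in for default.ini and DEV.ini.
DEFAULT_INI = """
[workspace]
local_sub_dir = workspace
deploy = true
"""

DEV_INI = """
[workspace]
deploy = false
"""
```

With these inputs, `load_config(DEFAULT_INI, DEV_INI)["workspace"]["deploy"]` is taken from the override, while `local_sub_dir` still comes from the defaults.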
Create content
Notebooks:
- Add a notebook to source
  - In the Databricks UI, go to your notebook.
  - Click File -> Export -> Source file.
  - Add that file to the workspace folder of this repo without changing the file name.
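A quick sanity check can catch notebooks that were exported in the wrong format (for example as HTML or a DBC archive) before they are committed. The sketch below relies on the observation that Databricks source exports begin with a "Databricks notebook source" comment line in the notebook's language. That header convention is an assumption based on typical exports rather than something stated in this document, and `looks_like_source_export` is a hypothetical helper.

```python
from pathlib import Path

# Line-comment prefix per file extension, for the four notebook languages
# listed above.
COMMENT_PREFIX = {".py": "#", ".r": "#", ".sql": "--", ".scala": "//"}

# Assumed first-line marker of a "Source file" export.
HEADER = "Databricks notebook source"


def looks_like_source_export(file_name, first_line):
    """Return True if first_line is the expected header for file_name."""
    prefix = COMMENT_PREFIX.get(Path(file_name).suffix.lower())
    if prefix is None:
        # Unknown extension: not one of the supported notebook languages.
        return False
    return first_line.strip() == f"{prefix} {HEADER}"
```

For a file on disk, the first line can be read with `open(path, encoding="utf-8").readline()` and passed in together with the file name.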
Jobs:
- Add a job to source
  - Get the source of the job and write it to a file. You need to have the Databricks CLI and jq installed. On Windows, it is easiest to rename jq-win64.exe to jq.exe and place it in the c:\Windows\System32 folder. Then, on Windows/Linux/macOS:
    databricks jobs get --job-id 74 | jq .settings > Job_Name.json
    This downloads the source JSON of the job from the Databricks server, extracts only the settings, and writes them to a file.
    Note: The file name should be the same as the job name within the JSON file. Please avoid spaces in names.
  - Add that file to the jobs folder.
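If jq is not available, the same extraction can be done in a few lines of Python. This sketch mirrors what `jq .settings` does on the output of `databricks jobs get`: it keeps only the `settings` object and derives the file name from the job's `name` field. The helper names are hypothetical, and the sample JSON in the usage note is trimmed to the fields the sketch needs.

```python
import json


def extract_job_settings(job_json_text):
    """Return the `settings` object from the JSON printed by `jobs get`."""
    job = json.loads(job_json_text)
    return job["settings"]


def job_file_name(settings):
    """File name that matches the job name, as the note above requires."""
    return f"{settings['name']}.json"


def write_job_file(job_json_text, jobs_dir="jobs"):
    """Write the settings to <jobs_dir>/<job name>.json and return the path."""
    settings = extract_job_settings(job_json_text)
    path = f"{jobs_dir}/{job_file_name(settings)}"
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
    return path
```

For example, given `{"job_id": 74, "settings": {"name": "Job_Name"}}`, `extract_job_settings` returns `{"name": "Job_Name"}` and `job_file_name` returns `Job_Name.json`.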
Clusters:
- Add a cluster to source
  - Get the source of the cluster and write it to a file:
    databricks clusters get --cluster-name orion > orion.json
    Note: The file name should be the same as the cluster name within the JSON file. Please avoid spaces in names.
  - Add that file to the clusters folder.
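The naming rule in the note is easy to get wrong by hand, so a small pre-commit check may be useful. The sketch below compares a file's base name with the `cluster_name` field inside it and flags spaces. `check_definition_name` is a hypothetical helper, and this document does not say whether the tool performs any such validation itself.

```python
import json
from pathlib import Path


def check_definition_name(file_name, json_text, name_key="cluster_name"):
    """Return a list of human-readable problems (empty if the file is fine)."""
    problems = []
    stem = Path(file_name).stem  # file name without the .json extension
    declared = json.loads(json_text).get(name_key)
    if declared != stem:
        problems.append(
            f"{file_name}: file name does not match {name_key}={declared!r}"
        )
    if " " in stem:
        problems.append(f"{file_name}: name contains spaces")
    return problems
```

The same function could be reused for job files by passing `name_key="name"`, since the job name sits at the top level of the extracted settings.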
Instance pools:
- Add an instance pool to source
  - Similar to clusters, just use instance-pools instead of clusters.
DBFS:
- Add a file to DBFS
  - Just add a file to the dbfs folder.