CICD tool for testing and deploying to Databricks

Databricks CI/CD

This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a plain directory structure) to validate and deploy content to Databricks. It currently handles the following content:

  • Workspace - a collection of notebooks written in Scala, Python, R or SQL
  • Jobs - list of Databricks jobs
  • Clusters
  • Instance Pools
  • DBFS - an arbitrary collection of files that may be deployed on a Databricks workspace

Installation

pip install databricks-cicd

Requirements

To use this tool, you need a source directory (preferably a private Git repository) with the following structure:

any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawbery_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg

Note: The folder names shown are the defaults and can be configured. This tree is only a sample.
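Before a first deployment it can help to confirm that a local checkout matches this layout. The following is a minimal sketch and not part of databricks-cicd: the function name summarize_layout is hypothetical, and the folder names are simply the defaults from the sample tree above.

```python
from pathlib import Path

# Default folder names copied from the sample tree; the real tool lets you
# rename them through its configuration, so adjust this tuple if you did.
DEFAULT_FOLDERS = ("workspace", "jobs", "clusters", "instance_pools", "dbfs")


def summarize_layout(root, folders=DEFAULT_FOLDERS):
    """Map each expected folder to its file count, or None if it is missing."""
    root = Path(root).expanduser()
    summary = {}
    for name in folders:
        folder = root / name
        if folder.is_dir():
            # Count files at any depth, because subdirectories are allowed.
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    for name, count in summarize_layout(".").items():
        status = "missing" if count is None else str(count) + " file(s)"
        print(name + ": " + status)
```

A folder reported as missing is not necessarily an error. It may only mean that the repository does not contain that type of content.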

Usage

For the latest options and commands, run:

cicd -h

A sample command could be:

cicd deploy \
   -w sample_12432.7.azuredatabricks.net \
   -u john.smith@domain.com \
   -t dapi_sample_token_0d5-2 \
   -lp '~/git/my-private-repo' \
   -tp /blabla \
   -c DEV.ini \
   --verbose

Note: On Windows, paths must be enclosed in double quotes.

The default configuration is defined in default.ini. To override it, pass a custom ini file with the -c option; a common setup is one config file per target environment.
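The override behaviour can be pictured with Python's standard configparser, where values read later replace values read earlier. This is only an illustrative sketch: the section and key names below are hypothetical placeholders, not the actual contents of default.ini, and the package's own loading code may work differently.

```python
import configparser

# Hypothetical stand-in for default.ini (the real keys may differ).
DEFAULT_INI = """
[workspace]
folder = workspace
deploy = true

[jobs]
folder = jobs
deploy = true
"""

# Hypothetical stand-in for an environment file such as DEV.ini.
DEV_INI = """
[jobs]
deploy = false
"""


def load_config(default_text, override_text=None):
    """Read the defaults first, then layer the optional override on top."""
    parser = configparser.ConfigParser()
    parser.read_string(default_text)
    if override_text:
        # Keys present in the override replace the defaults; all others stay.
        parser.read_string(override_text)
    return parser


if __name__ == "__main__":
    config = load_config(DEFAULT_INI, DEV_INI)
    for section in config.sections():
        print(section, dict(config[section]))
```

With this layering, an environment file only needs to list the settings that differ from the defaults.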

Create content

Notebooks:

  1. Add a notebook to source
    1. In the Databricks UI, open your notebook.
    2. Click File -> Export -> Source file.
    3. Add that file to the workspace folder of this repo without changing the file name.

Jobs:

  1. Add a job to source
    1. Get the source of the job and write it to a file. This requires the Databricks CLI and jq. On Windows, the simplest setup is to rename jq-win64.exe to jq.exe and place it in the c:\Windows\System32 folder. Then, on Windows, Linux, or macOS, run:

      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      

      This downloads the job's JSON definition from the Databricks server, extracts only its settings, and writes them to a file.

      Note: The file name should match the job name inside the JSON file. Please avoid spaces in names.

    2. Add that file to the jobs folder
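If jq is not available, the same extraction can be sketched in Python. This is an illustration and not part of databricks-cicd: save_job_settings is a hypothetical helper, and it assumes the output of `databricks jobs get` is a JSON object with a top-level "settings" key that contains the job "name".

```python
import json
from pathlib import Path


def save_job_settings(job_json_text, jobs_dir):
    """Write the job's "settings" object to <jobs_dir>/<job name>.json."""
    # Assumption: the CLI output has a top-level "settings" object.
    settings = json.loads(job_json_text)["settings"]
    # Naming the file after the job keeps the two in sync automatically.
    target = Path(jobs_dir) / (settings["name"] + ".json")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    import sys

    # Example: databricks jobs get --job-id 74 | python save_job.py jobs
    print(save_job_settings(sys.stdin.read(), sys.argv[1]))
```

Deriving the file name from the settings avoids a mismatch between the file name and the job name.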

Clusters:

  1. Add a cluster to source
    1. Get the source of the cluster and write it to a file.

      databricks clusters get --cluster-name orion > orion.json

      Note: The file name should match the cluster name inside the JSON file. Please avoid spaces in names.

    2. Add that file to the clusters folder
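The naming rule is easy to break when files are renamed by hand, so a small check can be useful before committing. The sketch below is not part of databricks-cicd: find_name_mismatches is a hypothetical helper, and it assumes the cluster definition stores its name under the "cluster_name" key.

```python
import json
from pathlib import Path


def find_name_mismatches(folder, name_key="cluster_name"):
    """Return (file_name, declared_name) pairs where the two disagree."""
    mismatches = []
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        declared = data.get(name_key)
        # path.stem is the file name without the trailing ".json".
        if declared != path.stem:
            mismatches.append((path.name, declared))
    return mismatches


if __name__ == "__main__":
    for file_name, declared in find_name_mismatches("clusters"):
        print(file_name + " declares the name " + repr(declared))
```

The same check could cover the jobs folder by passing name_key="name", assuming the job settings store the job name under that key.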

Instance pools:

  1. Add an instance pool to source
    1. Follow the same steps as for clusters, using instance-pools in place of clusters.

DBFS:

  1. Add a file to dbfs
    1. Just add a file to the dbfs folder.
