CICD tool for testing and deploying to Databricks

Databricks CI/CD

Forked from Manol Manolov's original databricks-cicd repository for use with the tactivos/data-databricks repository.

This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a simple file structure) to validate and deploy content to Databricks. Currently, it can handle the following content:

  • Workspace - a collection of notebooks written in Scala, Python, R or SQL
  • Jobs - list of Databricks jobs
  • Clusters
  • Instance Pools
  • DBFS - an arbitrary collection of files that may be deployed on a Databricks workspace

Installation

pip install tactivos-databricks-cicd

Requirements

To use this tool, you need a source directory (preferably a private Git repository) with the following structure:

any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawberry_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg

Note: All folder names represent the default and can be configured. This is just a sample.
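Before running a deploy, the layout above can be sanity-checked locally. A minimal sketch, assuming the default folder names shown in the sample (all of them configurable):

```python
from pathlib import Path

# Default content folder names from the sample layout above.
EXPECTED_DIRS = ["workspace", "jobs", "clusters", "instance_pools", "dbfs"]

def check_layout(root):
    """Return the list of expected content folders missing under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = check_layout(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
```

Folders you have renamed in your ini file would need to be adjusted in EXPECTED_DIRS accordingly.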

Usage

For the latest options and commands run:

cicd -h

A sample command could be:

cicd deploy \
   -w sample_12432.7.azuredatabricks.net \
   -u john.smith@domain.com \
   -t dapi_sample_token_0d5-2 \
   -lp '~/git/my-private-repo' \
   -tp /blabla \
   -c DEV.ini \
   --verbose

Note: On Windows, paths must be enclosed in double quotes.

The default configuration is defined in default.ini and can be overridden with a custom ini file using the -c option, usually one config file per target environment.
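The override behavior can be illustrated with Python's configparser, which merges files read later on top of earlier ones. The section and key names below are illustrative only, not the tool's actual schema:

```python
import configparser

# Stand-in for the package's default.ini (key names are hypothetical).
defaults = """
[folders]
workspace = workspace
jobs = jobs
"""

# Stand-in for a per-environment file passed via -c DEV.ini.
dev_override = """
[folders]
jobs = jobs_dev
"""

config = configparser.ConfigParser()
config.read_string(defaults)       # defaults are loaded first
config.read_string(dev_override)   # later values win on key collisions
print(config["folders"]["jobs"])   # jobs_dev
```

Keys absent from the environment file keep their default values, so each DEV.ini/PROD.ini only needs to list what differs.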

Create content

Notebooks:

  1. Add a notebook to source
    1. In the Databricks UI, open your notebook.
    2. Click File -> Export -> Source file.
    3. Add that file to the workspace folder of this repo without changing the file name.

Jobs:

  1. Add a job to source
    1. Get the source of the job and write it to a file. You need the Databricks CLI and jq installed. On Windows, it is easiest to rename jq-win64.exe to jq.exe and place it in the C:\Windows\System32 folder. Then, on Windows, Linux, or macOS:

      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      

      This downloads the job's JSON from the Databricks server, extracts only the settings object, and writes it to a file.

      Note: The file name must match the job name inside the JSON file. Please avoid spaces in names.

    2. Add that file to the jobs folder
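The jq step above can also be done in a few lines of Python, assuming the payload shape the command relies on (a top-level settings object in the `databricks jobs get` output):

```python
import json

def extract_settings(raw_job_json):
    """Extract only the settings object from `databricks jobs get` output,
    mirroring the `jq .settings` step above."""
    return json.loads(raw_job_json)["settings"]

# Illustrative payload with the shape the jq command assumes:
payload = '{"job_id": 74, "settings": {"name": "Job_Name"}}'
print(json.dumps(extract_settings(payload)))  # {"name": "Job_Name"}
```

This avoids the jq dependency on machines where only Python is available.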

Clusters:

  1. Add a cluster to source
    1. Get the source of the cluster and write it to a file.
      databricks clusters get --cluster-name orion > orion.json
      
      Note: The file name must match the cluster name inside the JSON file. Please avoid spaces in names.
    2. Add that file to the clusters folder
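The name-matching rule in the note can be verified mechanically. A sketch, assuming the exported cluster JSON carries a cluster_name field as `databricks clusters get` output does:

```python
import json
from pathlib import Path

def name_matches_file(path):
    """Check that the cluster_name inside an exported cluster JSON file
    matches the file's stem, as the note above requires."""
    data = json.loads(Path(path).read_text())
    return data.get("cluster_name") == Path(path).stem
```

Running such a check over the clusters folder in CI catches renamed files before a deploy does.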

Instance pools:

  1. Add an instance pool to source
    1. Follow the same steps as for clusters, substituting instance-pools for clusters in the CLI commands.

DBFS:

  1. Add a file to dbfs
    1. Just add the file to the dbfs folder.

