Skip to main content

Snowflake CI/CD utility

Project description

Snowflake CICD utility

Python based CICD tool for for managing Snowflake databases.

Table of contents:

  1. Installation
  2. Configuration
  3. Usage
  4. Work cycle
  5. Use cases
  6. How it works
  7. Troubleshooting

Installation

1. Install Python

Make sure that python3 executable is present in your path:

$ python3
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Deployment script won't work with Python2!

2. Install python dependencies

This should be as easy as running:

$ python3 -m pip install -r requirements.txt

Of course you can use pip instead, but this way it easier to make sure the packages will be installed in a proper location.

Feel free to use one of Python's environment managers.

Configuration

Current version supports Key Pair Authentication with Snowflake. Only unencrypted keys are supported! There is an undocumented username+password method but should not be used.

If you have your key pair already generated you can jump to section 3.

1. Generate key pair

Generate unencrypted private key:

$ openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out [username]_snowflake.p8 -nocrypt

Now generate the public key by referencing the private key:

$ openssl rsa -in [username]_snowflake.p8 -pubout -out [username]_snowflake.pub

Assign the public key to the Snowflake user using ALTER USER. Run the query using your Snowflake's client:

alter user [username] set rsa_public_key='MIIBIjANBgkqh...
WbyzHiMVJw8u+...
xwIDAQAB';

To fill in the rsa_public_key part you have to copy [username]_snowflake.pub file contents and paste it in SQL console.

Note: You have to remove -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines from public key in your SQL query.

Now place your keys in a safe location. For Linux and OSX this can be for example:

$ mv [username]_snowflake.* ~/.ssh/

2. Prepare data model repository

Prepare (or reuse existing) GIT repository that holds data model.

If you don't have one, you can follow this procedure:

# you are in `snowflake-cicd` folder
$ cp -r model-repo-sample ../data_model
$ cd ../data_model
$ git init

3. Update configuration file

Go to your model GIT repository, and open .snowflake-cicd.ini file:

# you are in `snowflake-cicd` folder
$ cp snowflake-cicd.ini ~/.snowflake-cicd.ini
$ edit .snowflake-cicd.ini

In the .snowflake-cicd.ini file you have to fill in at least two fields: user and private_key.

Usage

You can run cicd without arguments to see help:

$ cicd
usage: cicd [-h] [-v] [-t] [-f]
                 {abandoned,clone,compare,deploy,diff,history,migrate,prepare,sync,test_sync}
                 [{abandoned,clone,compare,deploy,diff,history,migrate,prepare,sync,test_sync} ...]

Git <-> Snowflake sync and automatic deployment.

Actions:

  prepare               Prepares release candidate file.
  deploy                Deploys changes from release candidate file.
  migrate               prepare + deploy 
  validate              Validates all .sql files in model directory
  history               Prints release history.
  clone                 Clones (or replaces) database based on prod.
  sync                  Syncs unapplied changes from model and releases dirs.
  test_sync             Test release on a separate clone (run it before creating pull request).
  compare               Compares Snowflake and current branch DDLs.
  diff                  Prints diff from production.
  abandoned             Compares active branches and development clones.

positional arguments:
  {prepare,deploy,migrate,validate,history,clone,sync,test_sync,compare,diff,abandoned}
                        Action to run
                        Action to run

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose mode. Shows SQL statements.
  -t, --dry-run         Show SQL to be executed, but doesn't run it.
  -f, --force           Force command without yes/no question asked in terminal.
  --file [FILE]         Filename for compare action

Actions

abandoned

Compares active development branches and database clones. It does so by querying all the databases LIKE '_DEV%' in INFORMATION_SCHEMA.DATABASES. For each clone found it is searching for a corresponding branch.

Branch names are converted to database clone names this way:

  1. Uppercase
  2. Replace all non-words characters (\W) with underscore (_)
  3. Take first 30 chars
  4. Add prefix _DEV_

This way branch feature/DW-1249-create-pre-deploy-and-post-deploy-tests is mapped to _DEV_FEATURE_DW_1249_CREATE_PRE_DEP.

abandoned example output:

|____________clone_____________|_____created______|_____updated______|_has branch?__|
| _DEV_DW_1336_CREATE_A_SAMPLE | 2020-04-20 17:18 | 2020-04-20 17:19 |      no      |
| _DEV_FEATURE_A               | 2020-04-22 12:04 | 2020-04-22 12:06 |      no      |
| _DEV_FEATURE_SQL_QUERIES_IN_ | 2020-04-23 19:27 | 2020-04-23 19:29 |      no      |
| _DEV_FEATURE_DW_1249_CREATE_ | 2020-04-24 23:13 | 2020-04-24 23:14 |     yes      |

Database clones with no in the last column (has branch?) should be dropped.

clone

Clones (or replaces) database based on production. New clone is created based on active branch name (see branch to clone mapping in abandoned action to learn how branch name is mapped to clone name). Due to this it is impossible to run clone on main branch (CICD will refuse to run).

If target clone does not exist the action will perform immediately. If target clone exists CICD will ask for confirmation before running. This behavior can be overwritten by --force flag.

clone example output:

Checking if _DEV_FEATURE_A already exists...
_DEV_FEATURE_A already exists. Are you sure you want to replace it (y/n): y
Cloning DWH into _DEV_FEATURE_A
Cloning finished

Example run on main branch:

Trying to create new database with name dwh...

compare

Compare allows you to list all the objects modified in your cloned database since cloning. The comparison is made purely on a database level. Even if altered an object manually on the server (ignoring GIT and releases history table) the object will be listed.

Technically compare list all the objects from INFORMATION_SCHEMA.[TABLES|FUNCTIONS|FILE_FORMATS|EXTERNAL_TABLES|PIPES|PROCEDURES|SEQUENCES|STAGES|VIEWS] looking for those with LAST_ALTERED greater than current clone creation time (+1 minute).

Comparison algorithm assumes the object was changed in both situation: it's definition has changed or it contents has changed (in case of a table).

compare example output:

|___schema___|__________object name___________|____type____|___last change____|
|     PUBLIC . dwh_releases_history           |   TABLE    | 2020-04-12 15:42 |
|        RAW . lnk_event_sess                 |   TABLE    | 2020-04-12 15:42 |
|        RAW . hub_event                      |   TABLE    | 2020-04-12 15:42 |
|         DM . dim_page                       |    VIEW    | 2020-04-12 15:42 |
|        RAW . load_hub_session               | PROCEDURE  | 2020-04-12 15:42 |
|        BIZ . sat_event                      |    VIEW    | 2020-04-12 15:42 |

In the example above all the listed objects were somehow altered after the clone was created. All the (beside dwh_releases_history) should be included in release file.

deploy

Deploys changes from release candidate file.

diff

Show all the changes in model folder between current branch and main. Technically it runs:

$ git diff --color=always main model

diff example output:

diff --git a/model/DV_Layers/dm/Dimension/tables/dim_time.sql b/model/DV_Layers/dm/Dimension/tables/dim_time.sql
index c9a758c..3303895 100644
@@ -2,7 +2,7 @@ CREATE TABLE dm.dim_time
        sk_time varchar(4) PRIMARY KEY NOT NULL,
        time_of_day     varchar(8) NOT NULL,
-       hour_of_day varchar(2) NOT NULL,
+       hour_of_day varchar(4) NOT NULL,
        minute_of_hour varchar(2) NOT NULL

history

This action displays a comparison of all the releases applied in on clone and production. It works only on a database level. Use diff action if you want to see the comparison on GIT level.

Technically the comparison is done by comparing entries in tables PUBLIC.DWH_RELEASES_HISTORY both in the production database, and cloned database.

history example output:

|_commit hash_|___________file name____________|__applied by__|____applied on____|branch|_prod_|
| bc2cbffee0  | <<init>>                       |     USER     | 2020-04-05 15:16 |  •   |  •   |
| 84d474c10a  | lites/views/view_sat_event.sql |     USER     | 2020-04-08 14:55 |  •   |  •   |
| dffcae544d  | imension/views/dim_browser.sql |     USER     | 2020-04-08 14:56 |  •   |  •   |
| dc84b4ac53  | m/Dimension/views/dim_page.sql |     USER     | 2020-04-08 14:56 |  •   |  •   |
| 0b66beaeac  | releases/fact_visits.sql       |     USER     | 2020-04-08 14:56 |      |  •   |
| 4f79836a92  | releases/dim_page.sql          |     USER     | 2020-04-08 14:57 |  •   |      |

In the example above we can see that:

  1. Release file named releases/dim_page.sql was applied only in our branch. Most likely this is something we did in our branch.
  2. Release file named releases/fact_visits.sql was applied on production but not applied in our clone. There are two possible ways of syncing with production: a. Faster: apply this missing release file by running sync action. b. Safer: run clone again to get a fresh copy of production.

migrate

This is an equivalent of:

$ cicd prepare deploy

In other words it prepares releases/release_candidate.sql file first, and then releases it. This can work only if you are making safe changes (as described in prepare action).

prepare

Prepares release candidate file. Let's assume we are on a branch feature/a and we already cloned database into _DEV_FEATURE_A (as described in branch to clone mapping). Prepare works this way:

  1. Getting latest applied commit in a current clone. It's as simple as selecting the latest COMMIT from _DEV_FEATURE_A.PUBLIC.DWH_RELEASES_HISTORY.
  2. Getting all the modified (renamed, removed, added) files in model folder. These files were not yet applied on current clone.
  3. Creating the file: releases/release_candidate.sql. This file has a recipe on how to apply changes on clone. Additionally releases/release_candidate.sha file is created with release_candidate.sql checksum inside.

release_candidate.sql file contains entries in this form:

--.Release candidate file, branch: feature_a

--.File was changed but *will not be* included in the release.
-- [M] NOT_INCLUDED:model/DV_Layers/dm/Dimension/tables/dim_time.sql
--.DIFF: @@ -5 +5 @@ CREATE TABLE dm.dim_time
--.DIFF: -	hour_of_day varchar(2) NOT NULL,
--.DIFF: +	hour_of_day varchar(3) NOT NULL,

   You may need to include an ALTER TABLE statement to properly sync database state:
   ALTER TABLE DM.DIM_TIME <<HERE>>;

--.File was changed and will be included in the release.
--.You can change the order of the INCLUDE: statement below.
-- [M] INCLUDED:model/DV_Layers/dm/Dimension/views/dim_event_la.sql

As you can see each modified (renamed, removed, added) leaves an entry in release_candidate.sql. There are two main types of entries:

Safe changes

Safe changes are those that can be applied without any additional attention. These are:

  • newly created files
  • modified files with objects other that tables and streams

You can easily notice that view dim_event_la.sql is a safe change. Updated view definition can be easily applied on the server. There are two easy ways of detecting safe changes:

  1. By looking at INCLUDED: keyword in file description line
  2. By having no non-comment lines below the entry. Notice that there is nothing below [M] INCLUDED:model/.../dim_event_la.sql

In most of the cases you don't have to do anything with this entries. Exceptions are:

  1. This is a table or stream object definition that you already added into production database.

    In this case you should change keyword INCLUDE to NOT_INCLUDE to avoid table/stream definition being executed.

  2. The order or INCLUDE statements is wrong.

    In this case you shoud reorder the entries accordingly. You have to be careful with lines with filenames. Those starting from --. are comments that can be ignored.

Unsafe changes

Unsafe changes are:

  • modified or renamed files with table or stream definition
  • removed files with objects of any type definition

You can easily notice that table dim_time.sql is not a safe change. Updated table definition can not be easily applied on the server. You have to prepare a proper ALTER TABLE statement. There are two easy ways of detecting unsafe changes:

  1. By looking at NON_INCLUDED: keyword in file description line
  2. By having a non-comment lines below the entry. Notice the sentences below [M] INCLUDED:model/.../dim_time.sql

Your job here is to replace the suggestion and sample code generated by CICD and fill it in with a valid SQL statement (if necessary). In the example above we should replace:

   You may need to include an ALTER TABLE statement to properly sync database state:
   ALTER TABLE DM.DIM_TIME <<HERE>>;

with:

ALTER TABLE DM.DIM_TIME CHANGE hour_of_day hour_of_day varchar(3) NOT NULL;

To make this process easier you have a difference highlighted. Notice lines starting from --.DIFF:

@@ -5 +5 @@ CREATE TABLE dm.dim_time
-	hour_of_day varchar(2) NOT NULL,
+	hour_of_day varchar(3) NOT NULL,

sync

Sync performs all the missing releases in releases folder and run them against the database. It is somehow similar to prepare but it scans releases folder (with already prepared releases) instead of raw model folder.

Syncing can be necessary if you merge changes from other branch (including main) into your branch. Imagine this scenario:

  1. You branched out to feature/a and created already a release file releases/feature_a.sql.
  2. Other person was working on a feature/b, created releases/feature_b.sql and merge it to main.
  3. You decided to git merge before creating a pull request. This created releases/feature_b.sql in your repository. You have two options (as described in history action): a. Faster: apply this missing release file by running sync action. b. Safer: run clone again to get a fresh copy of production.

sync example output:

  [A] releases/feature_DW-1249-create-pre-deploy-and-post-deploy-tests.sql
Syncing changes:
Running release file releases/feature_DW-1249-create-pre-deploy-and-post-deploy-tests.sql:
   'DWH_RELEASES_HISTORY was sucesfully created',
New entry in release history table with d31197261424b6c96ae7b8d31fcc1ac9220f934b

test_sync

Test sync combines three operations in one action:

  1. Creates a fresh clone of production database. This clone has a new name made out last commit SHA, not branch name. This way it's independent from all you existing changes.
  2. Syncs all the unsynchronized releases in releases directory.
  3. Drops clone created in step 1.

This step is useful for testing your changes before creating a pull-request. It ensures your release file is deployable on production.

test_sync example output:

Cloning DWH into _DEV_26FC3058369ACECB30C48A9C69FB47
Cloning finished
   [A] releases/feature_DW-1249-create-pre-deploy-and-post-deploy-tests.sql
Syncing changes:
Running release file releases/ feature_DW-1249-create-pre-deploy-and-post-deploy-tests.sql:
   'DWH_RELEASES_HISTORY was sucesfully created',
 New entry in release history table with d31197261424b6c96ae7b8d31fcc1ac9220f934b
 Dropping clone _DEV_26FC3058369ACECB30C48A

Optional arguments

--help

Displays help message. Running CICD without arguments will also cause the script to show help.

--verbose

Use --verbose or -v to display more details. It enables debugging messages and print-outs all the SQL queries being raised against Snowflake.

--dry-run

Use --dry-run or -t with deploy or sync actions to preview the release SQL to be performed. Has no impact on other actions.

--force

Use --force or -f to:

  1. Ignore question Clone already exists. Are you sure you want to replace it while running clone action.
  2. Bypass error releases/release_candidate.sql was changed or you changed branch, can't create new release candidate. In this case it the file will be overwritten.

Work cycle

1. Create branch

Start from branching out from main branch. It's usually main but it doesn't have to. You can branch out from other active branch.

You can branch-out using Jira, Bitbucket or command line. In all the examples below we are going to use command line and we are going to work on branch feature/a.

# checking active branch
$ git status
On branch main
[...]
# making sure we have all the latest changes
$ git pull
# branching out
$ git checkout -b feature/a

2. Clone database

We have a branch to work on. Now we need a fresh clone of production to have a safe destination of all our changes.

Clone database into new _DEV_FEATURE_A with:

$ cicd clone

(read more about cloning)

Now we have a new branch feature/a and new development clone _DEV_FEATURE_A. Both equal to production.

3. Modify and deploy changes

Work on files in the model/ folder and migrate all the changes with:

$ cicd migrate

Migrate is actually running two tasks: prepare followed by deploy. The second step can raise an error if you made an unsafe change. It usually happens the moment you changed table or stream, or you dropped a file from model folder.

In this case the script raises an error:

Code placeholder (<<HERE>>) found in the release candidate file.

You have to manually change the releases/release_candidate.sql file as described in unsafe changes section. Once done you can repeat the second step with:

$ cicd deploy

You should notice that while you modify files in model folder and deploy your changes in your clone, there is a new file being updated after each action. It's releases/feature_a.sql file. This file holds all the changes you've made.

4. Check your changes

You are give three ways to check your changes. Let's start from source code level.

To preview detailed diff of your branch against production run:

$ cicd diff

You can also compare all the releases applied on your clone and compare the with production by running history action:

$ cicd history

The last option to examine your changes is it preview all the objects definitions changed in your cloned database since it's creation. To do so run compare:

$ cicd compare

5. Test your changes

Before submitting a pull-request test your changes against a fresh copy of your database. There is a command that automates all the steps. It test_sync:

$ cicd test_sync

If it's all good you are ready to create a pull-request!

Use cases

1. Changing object definition

All situations in which you modify an object DDL. This including renaming the object (and a corresponding file).

a) safe changes

This is the simplest scenario. Open any .sql file under model/ folder and perform changes. Then run:

$ cicd migrate

The script will detect the change and reapply it on the server.

b) unsafe changes

Change table or stream definition in the model/ folder and commit your changes. Then run:

$ cicd prepare

Open the releases/release_candidate.sql file. You will notice that the changed table is listed in the release_candidate text, and you have to manually add a proper ALTER TABLE SQL statement.

Once done run:

$ cicd deploy

2. Adding a new object

It's as simple as adding a new file into model/ folder, and commiting your changes, and running:

$ cicd migrate

This works regardless the type of new object.

3. Dropping an object

Remove a file from the model/ folder and commit your changes. Then run:

$ cicd prepare

Open the releases/release_candidate.sql file. You will notice that the dropped file is listed in the release_candidate text, and you have to manually add a proper DROP OBJECT SQL statement.

Once done run:

$ cicd deploy

4. Altering a database state

All the other SQL statements that are not tracked in the model/ folder can be also applied and tracked. To do so simply run:

$ cicd prepare

This will generate an empty releases/release_candidate.sql file. Open it and add any SQL code you like.

Once done run:

$ cicd deploy

How it works

It's explained partially in the Actions section.

Troubleshooting

In most of the situations CICD is not leaving you alone. It explains what went wrong and what you can do about it.

For example if you start working on model folder before cloning, script will suggest you doing so the moment you run prepare for the first time.

If your deployment script failed, the script will print out the exact SQL statement that failed, allow you to fix it and deploy again.

In other cases adding --verbose can be useful.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snowflake-cicd-0.1.1.tar.gz (32.1 kB view hashes)

Uploaded Source

Built Distribution

snowflake_cicd-0.1.1-py3-none-any.whl (28.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page