Compares models in dbt during an open PR
Overview:
This repository compares dbt models that have changed in an open PR.
Note: It currently supports only BigQuery.
Usage
The repository has been published as a GitHub Action and a PyPI package, so it can be used in several ways:
- Directly in Python via run_dbt_table_diff.
- Directly in the terminal via python3 -m dbt_table_diff.
- In a GitHub workflow file via GitHub Actions, to automatically add comments on open PRs.
Quick Start:
pip3 install dbt_table_diff
Example Code Usage:
from dbt_table_diff import run_dbt_table_diff
run_dbt_table_diff(
    project_id="ultimate-bit-359101",
    keyfile_path="secrets/bq_keyfile.json",
    manifest_file="target/manifest.json",
    dev_prefix="dev_",
    prod_prefix="prod_",
    fallback_prefix="fb_",
    custom_checks_path="",
    ignored_schemas=[],
    irregular_schemas=[],
    org_name="org-not-included",
    repo_name="dbt_example",
    pr_id="2",
    auth_token="my_github_pat",
)
Example CLI Usage:
python3 -m dbt_table_diff -t $GH_TOKEN -o org-not-included -r dbt_example -l 2 \
--manifest_file 'target/manifest.json' --project_id 'ultimate-bit-359101' \
--keyfile_path 'secrets/bq_keyfile.json' --dev_prefix 'dev_' --prod_prefix 'prod_' --fallback_prefix 'fb_'
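When the CLI is called from a script (for example a CI step), it can help to assemble the argument list in Python rather than by string concatenation. The following is a minimal sketch, not part of the package: build_command is a hypothetical helper, and it only emits the long-form flags listed in the --help output. It prints the command rather than running it.

```python
import shlex


def build_command(options):
    """Build an argv list for `python3 -m dbt_table_diff`.

    `options` maps long flag names (without the leading dashes) to values.
    Hypothetical helper for illustration; empty values are skipped.
    """
    command = ["python3", "-m", "dbt_table_diff"]
    for name, value in options.items():
        if value is None or value == "":
            continue  # leave the flag off so the CLI falls back to its default
        command += [f"--{name}", str(value)]
    return command


example = build_command(
    {
        "org_name": "org-not-included",
        "repo_name": "dbt_example",
        "pr_id": 2,
        "manifest_file": "target/manifest.json",
        "project_id": "ultimate-bit-359101",
        "keyfile_path": "secrets/bq_keyfile.json",
        "dev_prefix": "dev_",
        "prod_prefix": "prod_",
        "fallback_prefix": "fb_",
        "custom_checks_path": "",
    }
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(example))
```

The resulting list can be passed to subprocess.run(example, check=True). The auth token is left out of the example on purpose so that it is not printed; add it as "auth_token" when actually running the command.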
Example Github Action Usage:
Github Actions Input Arguments:
Input Parameter | Description |
---|---|
GCP_TOKEN | Used to connect to BigQuery (runs dbt compile and the dbt_table_diff/sql_checks queries that compare tables) |
GH_TOKEN | Used to connect to GitHub (i.e. fetches the modified models/*.sql files in your PR and adds a comment to your PR) |
PR_NUMBER | Pull Request ID [int], used to fetch the open PR from GitHub |
GH_REPO | Repository name, used to fetch the open PR from GitHub |
GH_ORG | Repository owner/organization name, used to fetch the open PR from GitHub |
DBT_PROFILE_FILE | Local path in your repo to your dbt profiles.yml (needed to compile manifest.json during setup) |
dev_prefix | Prefix used when running dbt locally (your source schema/environment for the comparison) |
prod_prefix | Prefix used when running dbt remotely (your target schema/environment for the comparison) |
fallback_prefix | Useful if your dbt project overrides the generate_schema_name macro so that some schemas in prod use a different prefix |
irregular_schemas | Comma-separated string of schemas that use fallback_prefix |
project_id | BigQuery Project ID, used to connect to BigQuery |
ignored_schemas | Comma-separated string of schemas to ignore (skipped during the GitHub Action's checks) |
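The interaction between dev_prefix, prod_prefix, fallback_prefix, irregular_schemas and ignored_schemas is easier to see in code. The sketch below is an illustration only and is not taken from the package. It assumes that a dataset name is simply the prefix followed by the schema name; parse_schema_list and resolve_datasets are hypothetical helpers.

```python
def parse_schema_list(raw):
    """Split a comma-separated string such as "finance, legacy" into a list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def resolve_datasets(schema, dev_prefix, prod_prefix, fallback_prefix,
                     irregular_schemas, ignored_schemas):
    """Return (dev_dataset, prod_dataset) for a schema, or None if it is ignored.

    Assumption: dataset name = prefix + schema. Your generate_schema_name
    macro may build names differently.
    """
    if schema in ignored_schemas:
        return None  # skipped entirely, no comparison is run
    # Irregular schemas use the fallback prefix on the production side only.
    target_prefix = fallback_prefix if schema in irregular_schemas else prod_prefix
    return (dev_prefix + schema, target_prefix + schema)


irregular = parse_schema_list("finance, legacy")
ignored = parse_schema_list("scratch")

for schema in ["marketing", "finance", "scratch"]:
    print(schema, resolve_datasets(schema, "dev_", "prod_", "fb_", irregular, ignored))
```

With the example prefixes from this README, marketing would be compared as dev_marketing vs prod_marketing, finance as dev_finance vs fb_finance, and scratch would be skipped.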
Step-By-Step Break Down of Process:
- Fetches the list of files modified in the Pull Request
  - by CURLing api.github.com/repos/{organization}/{repository}/pulls/{pull_request_id}/files
- Filters on relevant_files
  - which are files matching models/*.sql
- Builds manifest.json
  - by running dbt deps; dbt compile
- Parses manifest.json for relevant_models
  - using the manifest attribute original_file_path matching relevant_files
- Runs all SQL files in dbt_table_diff/sql_checks
  - for each of the relevant_models, compares the two dbt targets (dev_prefix vs prod_prefix)
- Saves output to file
  - in a format supported by GitHub comments
- Posts a comment on the open PR
  - leveraging the py-github-helper PyPI package
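The filtering and manifest-parsing steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's actual code. It uses a small inline sample in place of the GitHub API response (a list of objects with a filename field) and in place of manifest.json (a nodes mapping whose entries carry resource_type, name and original_file_path). filter_relevant_files and find_relevant_models are hypothetical helper names.

```python
from fnmatch import fnmatch


def filter_relevant_files(pr_files, pattern="models/*.sql"):
    """Keep the paths of PR files that match the models/*.sql pattern.

    fnmatch's "*" also matches "/" characters, so files in nested
    folders under models/ are included.
    """
    return [f["filename"] for f in pr_files if fnmatch(f["filename"], pattern)]


def find_relevant_models(manifest, relevant_files):
    """Return names of manifest models whose original_file_path was modified."""
    wanted = set(relevant_files)
    models = []
    for node in manifest.get("nodes", {}).values():
        is_model = node.get("resource_type") == "model"
        if is_model and node.get("original_file_path") in wanted:
            models.append(node["name"])
    return sorted(models)


# Sample data standing in for the GitHub response and for target/manifest.json.
pr_files = [
    {"filename": "models/staging/stg_orders.sql", "status": "modified"},
    {"filename": "macros/helpers.sql", "status": "modified"},
    {"filename": "README.md", "status": "modified"},
]
manifest = {
    "nodes": {
        "model.dbt_example.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "original_file_path": "models/staging/stg_orders.sql",
        },
        "model.dbt_example.dim_users": {
            "resource_type": "model",
            "name": "dim_users",
            "original_file_path": "models/marts/dim_users.sql",
        },
    }
}

relevant_files = filter_relevant_files(pr_files)
relevant_models = find_relevant_models(manifest, relevant_files)
print(relevant_files)
print(relevant_models)
```

In a real run the manifest dictionary would come from json.load on the file produced by dbt compile, and each name in relevant_models would then be compared across the dev and prod targets.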
Docs
python3 -m dbt_table_diff --help
usage: dbt_table_diff [-h] [-o ORG_NAME] [-r REPO_NAME] [-t AUTH_TOKEN] [-l PR_ID] [--manifest_file MANIFEST_FILE] [--project_id PROJECT_ID] [--keyfile_path KEYFILE_PATH] [--ignored_schemas IGNORED_SCHEMAS]
                      [--irregular_schemas IRREGULAR_SCHEMAS] [--dev_prefix DEV_PREFIX] [--prod_prefix PROD_PREFIX] [--fallback_prefix FALLBACK_PREFIX] [--custom_checks_path CUSTOM_CHECKS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -o ORG_NAME, --org_name ORG_NAME
                        Owner of GitHub repository.
  -r REPO_NAME, --repo_name REPO_NAME
                        Name of the GitHub repository.
  -t AUTH_TOKEN, --auth_token AUTH_TOKEN
                        User's GitHub Personal Access Token.
  -l PR_ID, --pr_id PR_ID
                        The issue # of the Pull Request.
  --manifest_file MANIFEST_FILE
                        The path to dbt's manifest file.
  --project_id PROJECT_ID
                        The BigQuery Project ID to leverage.
  --keyfile_path KEYFILE_PATH
                        The path to the keyfile to use during BQ calls.
  --ignored_schemas IGNORED_SCHEMAS
                        Folders in models/ to always ignore during row/col checks.
  --irregular_schemas IRREGULAR_SCHEMAS
                        Folders in models/ which use 'fallback_prefix' in prod.
  --dev_prefix DEV_PREFIX
                        Prefix used by development datasets in dbt.
  --prod_prefix PROD_PREFIX
                        Prefix used by production datasets in dbt.
  --fallback_prefix FALLBACK_PREFIX
                        Uncommon prefix used by only some production datasets in dbt.
  --custom_checks_path CUSTOM_CHECKS_PATH
                        A local folder containing any custom SQL to run.