Skip to main content

Compares models in dbt during an open PR

Project description

Overview:

This repository is intended for comparing models in dbt that have changed during an open PR.

Note: It only currently supports BigQuery.


Usage

The repository has been published as a Github Action and PyPi Package, which means it can be leveraged in a variety of ways:


Quick Start:

pip3 install dbt_table_diff

Example Code Usage:

from dbt_table_diff import run_dbt_table_diff

run_dbt_table_diff(
        project_id="ultimate-bit-359101",
        keyfile_path="secrets/bq_keyfile.json",
        manifest_file="target/manifest.json",
        dev_prefix="dev_",
        prod_prefix="prod_",
        fallback_prefix="fb_",
        custom_checks_path="",
        ignored_schemas=[],
        irregular_schemas=[],
        org_name="org-not-included",
        repo_name="dbt_example",
        pr_id="2",
        auth_token="my_github_pat",
)

Example CLI Usage:

python3 -m dbt_table_diff -t $GH_TOKEN -o org-not-included -r dbt_example -l 2 \
--manifest_file 'target/manifest.json' --project_id 'ultimate-bit-359101' \
--keyfile_path 'secrets/bq_keyfile.json' --dev_prefix 'dev_' --prod_prefix 'prod_' --fallback_prefix 'fb_'

Example Github Action Usage:

  • Overview of Github Actions
  • Open PR showing how to use dbt_table_diff as a Github Action.

Github Actions Input Arguments:

Input Parameter Description
GCP_TOKEN for connecting to BQ (runs dbt compile and dbt_table_diff/sql_checks to compare tables)
GH_TOKEN for connecting to Github (ie. fetches modified models/*.sql in your PR, adds comment on your PR)
PR_NUMBER for fetching open PR from github (Pull Request ID [int])
GH_REPO for fetching open PR from github (Repository Name)
GH_ORG for fetching open PR from github (Repository owner/organization name)
DBT_PROFILE_FILE the local path in your repo to your profile.yml for dbt (this is necessary for compiling manifest.json during setup process)
dev_prefix the prefix used when running dbt locally (Your source schema/environment for comparison)
prod_prefix the prefix used when running dbt remotely (Your target schema/environment for comparison)
fallback_prefix useful if you have an overriden macro for generate_schema_name in your dbt project, which leverages a different prefix for some schemas in prod.
irregular_schemas comma separated string of schemas which use fallback_prefix
project_id for connecting to BQ (BigQuery Project ID)
ignored_schemas comma separated string of schemas to ignore (skip checking during github action)

Step-By-Step Break Down of Process:

  • Fetches list of files modified in Pull Request
    • by CURLing github.api.com/repos/{organization}/{repository}/pulls/{pull_request_id}/files
  • Filters on relevant_files
    • which are files matching models/*.sql
  • Builds manifest.json
    • By running dbt deps; dbt compile
  • Parses manifest.json for relevant_models
    • using manifest-attribute original_file_path matching relevant_files
  • Runs all SQL files in dbt_table_diff/sql_checks
    • for each of the relevant_models, compare the two dbt targets (dev_prefix vs prod_prefix)
  • Saves output to file
    • in a format supported by Github comments
  • Posts comment on open PR
    • leveraging py-github-helper PyPi package

Docs

python3 -m dbt_table_diff --help

usage: dbt_table_diff [-h] [-o ORG_NAME] [-r REPO_NAME] [-t AUTH_TOKEN] [-l PR_ID] [--manifest_file MANIFEST_FILE] [--project_id PROJECT_ID] [--keyfile_path KEYFILE_PATH] [--ignored_schemas IGNORED_SCHEMAS]
                      [--irregular_schemas IRREGULAR_SCHEMAS] [--dev_prefix DEV_PREFIX] [--prod_prefix PROD_PREFIX] [--fallback_prefix FALLBACK_PREFIX] [--custom_checks_path CUSTOM_CHECKS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -o ORG_NAME, --org_name ORG_NAME
                        Owner of GitHub repository.
  -r REPO_NAME, --repo_name REPO_NAME
                        Name of the GitHub repository.
  -t AUTH_TOKEN, --auth_token AUTH_TOKEN
                        User's GitHub Personal Access Token.
  -l PR_ID, --pr_id PR_ID
                        The issue # of the Pull Request.
  --manifest_file MANIFEST_FILE
                        The path to dbt's manifest file.
  --project_id PROJECT_ID
                        The BigQuery Project ID to leverage.
  --keyfile_path KEYFILE_PATH
                        The path to the keyfile to use during BQ calls.
  --ignored_schemas IGNORED_SCHEMAS
                        Folders in models/ to always ignore during row/col checks.
  --irregular_schemas IRREGULAR_SCHEMAS
                        Folders in models/ which use 'fallback_prefix' in prod.
  --dev_prefix DEV_PREFIX
                        Prefix used by development datasets in dbt.
  --prod_prefix PROD_PREFIX
                        Prefix used by production datasets in dbt.
  --fallback_prefix FALLBACK_PREFIX
                        Uncommon prefix used by only some production datasets in dbt.
  --custom_checks_path CUSTOM_CHECKS_PATH
                        A local folder containing any custom SQL to run.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_table_diff-2.2.2.tar.gz (21.5 kB view details)

Uploaded Source

Built Distribution

dbt_table_diff-2.2.2-py3-none-any.whl (20.6 kB view details)

Uploaded Python 3

File details

Details for the file dbt_table_diff-2.2.2.tar.gz.

File metadata

  • Download URL: dbt_table_diff-2.2.2.tar.gz
  • Upload date:
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dbt_table_diff-2.2.2.tar.gz
Algorithm Hash digest
SHA256 1c5b77021974ae7f6aaa7aad774b186e997d04107794caa604e04750baa7f329
MD5 60cfa554363f1be7ae16300e967702b6
BLAKE2b-256 704b983ac9f4350cb193c5527c7770dcbfb696dd92e984d78ca00a36acdab2d6

See more details on using hashes here.

File details

Details for the file dbt_table_diff-2.2.2-py3-none-any.whl.

File metadata

File hashes

Hashes for dbt_table_diff-2.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5be352dc18107192321b2f0c0f3209835c2c994ca4166243e9ce02ebc991bd36
MD5 d7fb319aa03e81338f2dab739bf38daf
BLAKE2b-256 1f29e58ed78a34640b68b363865850925138ac7482a62fb503c7f03810ba5c71

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page