Skip to main content

Compares models in dbt during an open PR

Project description

dbt_table_diff

This repository is intended for comparing BigQuery models in dbt that have changed during an open PR.

PyPI version CodeFactor Grade GitHub license
GitHub pull requests GitHub issues GitHub contributors
GitHub Release Date GitHub last commit GitHub commit activity
GitHub forks GitHub stars GitHub watchers Twitter Follow

Usage

The repository has been published as a Github Action and PyPi Package, which means it can be leveraged in a variety of ways:


Quick Start:

pip3 install dbt_table_diff

Example Code Usage:

from dbt_table_diff import run_dbt_table_diff

run_dbt_table_diff(
        project_id="ultimate-bit-359101",
        keyfile_path="secrets/bq_keyfile.json",
        manifest_file="target/manifest.json",
        dev_prefix="dev_",
        prod_prefix="prod_",
        fallback_prefix="fb_",
        custom_checks_path="",
        ignored_schemas=[],
        irregular_schemas=[],
        org_name="org-not-included",
        repo_name="dbt_example",
        pr_id="2",
        auth_token="my_github_pat",
)

Example CLI Usage:

python3 -m dbt_table_diff -t $GH_TOKEN -o org-not-included -r dbt_example -l 2 \
--manifest_file 'target/manifest.json' --project_id 'ultimate-bit-359101' \
--keyfile_path 'secrets/bq_keyfile.json' --dev_prefix 'dev_' --prod_prefix 'prod_' --fallback_prefix 'fb_'

Example Github Action Usage:

  • Overview of Github Actions
  • Open PR showing how to use dbt_table_diff as a Github Action.

Github Actions Input Arguments:

Input Parameter Description
GCP_TOKEN for connecting to BQ (runs dbt compile and dbt_table_diff/sql_checks to compare tables)
GH_TOKEN for connecting to Github (ie. fetches modified models/*.sql in your PR, adds comment on your PR)
PR_NUMBER for fetching open PR from github (Pull Request ID [int])
GH_REPO for fetching open PR from github (Repository Name)
GH_ORG for fetching open PR from github (Repository owner/organization name)
DBT_PROFILE_FILE the local path in your repo to your profile.yml for dbt (this is necessary for compiling manifest.json during setup process)
dev_prefix the prefix used when running dbt locally (Your source schema/environment for comparison)
prod_prefix the prefix used when running dbt remotely (Your target schema/environment for comparison)
fallback_prefix useful if you have an overriden macro for generate_schema_name in your dbt project, which leverages a different prefix for some schemas in prod.
irregular_schemas comma separated string of schemas which use fallback_prefix
project_id for connecting to BQ (BigQuery Project ID)
ignored_schemas comma separated string of schemas to ignore (skip checking during github action)
custom_checks_path A local folder containing any custom SQL checks to run.

Step-By-Step Break Down of Process:

  • Fetches list of files modified in Pull Request
    • by CURLing github.api.com/repos/{organization}/{repository}/pulls/{pull_request_id}/files
  • Filters on relevant_files
    • which are files matching models/*.sql
  • Builds manifest.json
    • By running dbt deps; dbt compile
  • Parses manifest.json for relevant_models
    • using manifest-attribute original_file_path matching relevant_files
  • Runs all SQL files in dbt_table_diff/sql_checks
    • for each of the relevant_models, compare the two dbt targets (dev_prefix vs prod_prefix)
  • Saves output to file
    • in a format supported by Github comments
  • Posts comment on open PR
    • leveraging dbt_table_diff PyPi package

Docs

python3 -m dbt_table_diff --help

usage: dbt_table_diff [-h] [-o ORG_NAME] [-r REPO_NAME] [-t AUTH_TOKEN] [-l PR_ID] [--manifest_file MANIFEST_FILE] [--project_id PROJECT_ID] [--keyfile_path KEYFILE_PATH] [--ignored_schemas IGNORED_SCHEMAS]
                      [--irregular_schemas IRREGULAR_SCHEMAS] [--dev_prefix DEV_PREFIX] [--prod_prefix PROD_PREFIX] [--fallback_prefix FALLBACK_PREFIX] [--custom_checks_path CUSTOM_CHECKS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -o ORG_NAME, --org_name ORG_NAME
                        Owner of GitHub repository.
  -r REPO_NAME, --repo_name REPO_NAME
                        Name of the GitHub repository.
  -t AUTH_TOKEN, --auth_token AUTH_TOKEN
                        User's GitHub Personal Access Token.
  -l PR_ID, --pr_id PR_ID
                        The issue # of the Pull Request.
  --manifest_file MANIFEST_FILE
                        The path to dbt's manifest file.
  --project_id PROJECT_ID
                        The BigQuery Project ID to leverage.
  --keyfile_path KEYFILE_PATH
                        The path to the keyfile to use during BQ calls.
  --ignored_schemas IGNORED_SCHEMAS
                        Folders in models/ to always ignore during row/col checks.
  --irregular_schemas IRREGULAR_SCHEMAS
                        Folders in models/ which use 'fallback_prefix' in prod.
  --dev_prefix DEV_PREFIX
                        Prefix used by development datasets in dbt.
  --prod_prefix PROD_PREFIX
                        Prefix used by production datasets in dbt.
  --fallback_prefix FALLBACK_PREFIX
                        Uncommon prefix used by only some production datasets in dbt.
  --custom_checks_path CUSTOM_CHECKS_PATH
                        A local folder containing any custom SQL to run.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_table_diff-2.2.3.tar.gz (22.5 kB view hashes)

Uploaded Source

Built Distribution

dbt_table_diff-2.2.3-py3-none-any.whl (21.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page