Skip to main content

Compares models in dbt during an open PR

Project description

dbt_table_diff

This repository is intended for comparing BigQuery models in dbt that have changed during an open PR.

PyPI version CodeFactor Grade GitHub license
GitHub pull requests GitHub issues GitHub contributors
GitHub Release Date GitHub last commit GitHub commit activity
GitHub forks GitHub stars GitHub watchers Twitter Follow

Usage

The repository has been published as a Github Action and PyPi Package, which means it can be leveraged in a variety of ways:


Quick Start:

pip3 install dbt_table_diff

Example Code Usage:

from dbt_table_diff import run_dbt_table_diff

run_dbt_table_diff(
        project_id="ultimate-bit-359101",
        keyfile_path="secrets/bq_keyfile.json",
        manifest_file="target/manifest.json",
        dev_prefix="dev_",
        prod_prefix="prod_",
        fallback_prefix="fb_",
        custom_checks_path="",
        ignored_schemas=[],
        irregular_schemas=[],
        org_name="org-not-included",
        repo_name="dbt_example",
        pr_id="2",
        auth_token="my_github_pat",
)

Example CLI Usage:

python3 -m dbt_table_diff -t $GH_TOKEN -o org-not-included -r dbt_example -l 2 \
--manifest_file 'target/manifest.json' --project_id 'ultimate-bit-359101' \
--keyfile_path 'secrets/bq_keyfile.json' --dev_prefix 'dev_' --prod_prefix 'prod_' --fallback_prefix 'fb_'

Example Github Action Usage:

  • Overview of Github Actions
  • Open PR showing how to use dbt_table_diff as a Github Action.

Github Actions Input Arguments:

Input Parameter Description
GCP_TOKEN for connecting to BQ (runs dbt compile and dbt_table_diff/sql_checks to compare tables)
GH_TOKEN for connecting to Github (ie. fetches modified models/*.sql in your PR, adds comment on your PR)
PR_NUMBER for fetching open PR from github (Pull Request ID [int])
GH_REPO for fetching open PR from github (Repository Name)
GH_ORG for fetching open PR from github (Repository owner/organization name)
DBT_PROFILE_FILE the local path in your repo to your profile.yml for dbt (this is necessary for compiling manifest.json during setup process)
dev_prefix the prefix used when running dbt locally (Your source schema/environment for comparison)
prod_prefix the prefix used when running dbt remotely (Your target schema/environment for comparison)
fallback_prefix useful if you have an overriden macro for generate_schema_name in your dbt project, which leverages a different prefix for some schemas in prod.
irregular_schemas comma separated string of schemas which use fallback_prefix
project_id for connecting to BQ (BigQuery Project ID)
ignored_schemas comma separated string of schemas to ignore (skip checking during github action)
custom_checks_path A local folder containing any custom SQL checks to run.

Step-By-Step Break Down of Process:

  • Fetches list of files modified in Pull Request
    • by CURLing github.api.com/repos/{organization}/{repository}/pulls/{pull_request_id}/files
  • Filters on relevant_files
    • which are files matching models/*.sql
  • Builds manifest.json
    • By running dbt deps; dbt compile
  • Parses manifest.json for relevant_models
    • using manifest-attribute original_file_path matching relevant_files
  • Runs all SQL files in dbt_table_diff/sql_checks
    • for each of the relevant_models, compare the two dbt targets (dev_prefix vs prod_prefix)
  • Saves output to file
    • in a format supported by Github comments
  • Posts comment on open PR
    • leveraging dbt_table_diff PyPi package

Docs

python3 -m dbt_table_diff --help

usage: dbt_table_diff [-h] [-o ORG_NAME] [-r REPO_NAME] [-t AUTH_TOKEN] [-l PR_ID] [--manifest_file MANIFEST_FILE] [--project_id PROJECT_ID] [--keyfile_path KEYFILE_PATH] [--ignored_schemas IGNORED_SCHEMAS]
                      [--irregular_schemas IRREGULAR_SCHEMAS] [--dev_prefix DEV_PREFIX] [--prod_prefix PROD_PREFIX] [--fallback_prefix FALLBACK_PREFIX] [--custom_checks_path CUSTOM_CHECKS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -o ORG_NAME, --org_name ORG_NAME
                        Owner of GitHub repository.
  -r REPO_NAME, --repo_name REPO_NAME
                        Name of the GitHub repository.
  -t AUTH_TOKEN, --auth_token AUTH_TOKEN
                        User's GitHub Personal Access Token.
  -l PR_ID, --pr_id PR_ID
                        The issue # of the Pull Request.
  --manifest_file MANIFEST_FILE
                        The path to dbt's manifest file.
  --project_id PROJECT_ID
                        The BigQuery Project ID to leverage.
  --keyfile_path KEYFILE_PATH
                        The path to the keyfile to use during BQ calls.
  --ignored_schemas IGNORED_SCHEMAS
                        Folders in models/ to always ignore during row/col checks.
  --irregular_schemas IRREGULAR_SCHEMAS
                        Folders in models/ which use 'fallback_prefix' in prod.
  --dev_prefix DEV_PREFIX
                        Prefix used by development datasets in dbt.
  --prod_prefix PROD_PREFIX
                        Prefix used by production datasets in dbt.
  --fallback_prefix FALLBACK_PREFIX
                        Uncommon prefix used by only some production datasets in dbt.
  --custom_checks_path CUSTOM_CHECKS_PATH
                        A local folder containing any custom SQL to run.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_table_diff-2.2.3.tar.gz (22.5 kB view details)

Uploaded Source

Built Distribution

dbt_table_diff-2.2.3-py3-none-any.whl (21.0 kB view details)

Uploaded Python 3

File details

Details for the file dbt_table_diff-2.2.3.tar.gz.

File metadata

  • Download URL: dbt_table_diff-2.2.3.tar.gz
  • Upload date:
  • Size: 22.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.14

File hashes

Hashes for dbt_table_diff-2.2.3.tar.gz
Algorithm Hash digest
SHA256 90673b6e454e0cf1755d1e66bb3c8fc6bd3233ec5e7d99ea0764c80523dbc6b8
MD5 d31e3b6f29c868dcaf413d3218b603f4
BLAKE2b-256 d6be17e30b960be498c45639aa8518847191a6dc761d85f37ffac4652b4468d5

See more details on using hashes here.

File details

Details for the file dbt_table_diff-2.2.3-py3-none-any.whl.

File metadata

File hashes

Hashes for dbt_table_diff-2.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1e6da17b03ef3366cf951fec40e86336ef66fe5c97e5b442bcc58d7289000279
MD5 a2537468a487cb8437b75a0dacfc2564
BLAKE2b-256 ef1d977b80064849dddb5e9f2fc8ff0142ccabc5b6e3047b6ec7090b45eb86e1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page