Column-level lineage breaking change detection for dbt Core CI pipelines

dbt-guard

dbt-guard detects when a model's output columns change in a way that would break downstream consumers — before the code reaches production. It works by comparing two manifest.json files (the base branch vs. the PR branch) using static analysis only: no database connection required.

This tool addresses the gap described in dbt-core issue #6869: dbt has no built-in mechanism for blocking PRs that silently remove or rename columns that downstream models depend on.

Quick start

pip install dbt-guard

Then in your CI pipeline, after running dbt parse on both the base branch and the PR branch:

dbt-guard diff \
  --base path/to/base/target/ \
  --current path/to/current/target/ \
  --dialect snowflake \
  --format github

Exit code 0 means no breaking changes were detected, 1 means at least one breaking change was detected, and 2 means dbt-guard itself failed (for example, a manifest was not found).

GitHub Actions integration

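- name: Generate base manifest
  run: |
    # Check out the PR's base branch, parse it, then return to the PR commit
    git fetch origin ${{ github.base_ref }}
    git checkout FETCH_HEAD
    dbt parse --profiles-dir . --target ci
    mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
    git checkout -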

- name: Generate current manifest
  run: dbt parse --profiles-dir . --target ci

- name: Column lineage check
  run: |
    dbt-guard diff \
      --base /tmp/base_target \
      --current target/ \
      --dialect snowflake \
      --format github

Bitbucket Pipelines integration

pipelines:
  pull-requests:
    '**':
      - step:
          name: Column lineage check
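          image: python:3.12-slim
          script:
            - apt-get update && apt-get install -y --no-install-recommends git  # the slim image ships without git
            - pip install dbt-guard dbt-core dbt-snowflake
            - git fetch origin $BITBUCKET_PR_DESTINATION_BRANCH
            - git checkout FETCH_HEAD  # build the base manifest from the destination branch
            - dbt parse --profiles-dir . --target ci
            - mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
            - git checkout -  # return to the PR commit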
            - dbt parse --profiles-dir . --target ci
            - dbt-guard diff --base /tmp/base_target --current target/ --dialect snowflake

CLI reference

dbt-guard diff [OPTIONS]

Options:
  --base PATH           Directory containing base manifest.json  [required]
  --current PATH        Directory containing current manifest.json  [required]
  --dialect TEXT        SQL dialect: default, snowflake, bigquery, databricks,
                        redshift, trino  [default: default]
  --format TEXT         Output format: text, json, github  [default: text]
  --fail-on TEXT        When to exit non-zero: breaking, any, never
                        [default: breaking]
  --no-impact           Skip downstream impact analysis
  --max-depth INT       Max DAG hops for impact traversal  [default: 10]
  --output PATH         Write report to file instead of stdout
  --select MODEL        Limit diff to specific model names (repeatable)
  --quiet               Print one-line summary only
  --version             Show version and exit
  --help                Show this message and exit

Exit codes

Code  Meaning
0     No breaking changes (or --fail-on never)
1     Breaking changes detected (or any changes with --fail-on any)
2     Tool error (manifest not found, invalid JSON, etc.)
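The interaction between --fail-on and the exit codes fits in a small decision function. This is a hypothetical sketch of the documented semantics, not dbt-guard's own code; the names ExitCode and exit_code_for are illustrative.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    NO_FAILURE = 0        # nothing the --fail-on policy treats as a failure
    CHANGES_DETECTED = 1  # breaking changes (or any changes with --fail-on any)
    TOOL_ERROR = 2        # e.g. manifest not found or invalid JSON


def exit_code_for(fail_on: str, n_breaking: int, n_non_breaking: int) -> ExitCode:
    """Hypothetical mapping from a --fail-on policy and change counts to an exit code."""
    if fail_on == "never":
        return ExitCode.NO_FAILURE
    if fail_on == "breaking":
        failing = n_breaking
    elif fail_on == "any":
        failing = n_breaking + n_non_breaking
    else:
        raise ValueError(f"unknown --fail-on value: {fail_on!r}")
    return ExitCode.CHANGES_DETECTED if failing > 0 else ExitCode.NO_FAILURE
```

In a CI wrapper, checking for exit code 2 separately keeps a missing or malformed manifest from being mistaken for a breaking change.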

How it works

  1. Parse both manifests. dbt-guard reads manifest.json from the base and current target directories. No dbt execution, no database connection.

  2. Extract column inventories. For each model, it reads the documented columns from manifest.json. If compiled SQL is present on disk (in target/compiled/), it additionally parses the SQL with SQLGlot to detect undocumented columns.

  3. Diff columns. For each model present in both manifests, it compares column sets:

    • Column removed → breaking
    • Column renamed (1 removed + 1 added, matching type) → breaking
    • Column type changed (only when documented on both sides) → breaking
    • Column added → non-breaking
  4. Impact analysis. For each breaking change, it traverses the child_map in the manifest via BFS to find downstream models affected transitively.

  5. Report. Output in text, JSON, or GitHub Actions annotation format.
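Step 4 is a plain breadth-first search. The sketch below relies only on the manifest's top-level child_map (each node's unique_id mapped to its children's unique_ids) and applies a hop limit in the spirit of --max-depth. Reporting only nodes whose id starts with "model." is an assumption about what counts as an affected model, and downstream_models is a hypothetical name, not dbt-guard's API.

```python
from __future__ import annotations

from collections import deque


def downstream_models(manifest: dict, start: str, max_depth: int = 10) -> dict[str, int]:
    """Map each model reachable from `start` to its hop distance (BFS on child_map).

    Non-model children (tests, exposures, ...) are traversed but not reported.
    """
    child_map: dict[str, list[str]] = manifest.get("child_map", {})
    distance: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node_id, depth = queue.popleft()
        if depth >= max_depth:
            continue  # hop limit reached: do not expand further
        for child in child_map.get(node_id, []):
            if child in seen:
                continue  # diamond in the DAG: BFS already recorded the shortest distance
            seen.add(child)
            if child.startswith("model."):
                distance[child] = depth + 1
            queue.append((child, depth + 1))
    return distance
```

Because BFS visits nodes in order of distance, the first time a model is reached is also its shortest path, so a model reachable through several branches is reported once.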

What counts as breaking vs. non-breaking

A change is breaking if it can cause downstream consumers to fail or to produce incorrect results:

Change                      Breaking?  Why
Column removed              Yes        Downstream SELECT or JOIN on that column will fail
Column renamed              Yes        All references to the old name break
Column type changed         Yes        Implicit casts may fail or produce wrong results
Column added                No         Additive; downstream consumers are unaffected
New model added             No         Nothing depends on it yet
Model removed from current  No         Not diffed; dbt will surface this as a ref() error
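The column-level rules in the table above, together with the rename heuristic from step 3 of How it works, can be sketched from documented columns alone. The sketch assumes the usual manifest layout (nodes keyed by unique_id, each with resource_type and a columns mapping whose entries may carry data_type). Case-insensitive comparison of names and types, and treating two undocumented types as "matching" for the rename heuristic, are assumptions; all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnChange:
    """One column-level change in one model (hypothetical report record)."""

    model: str
    kind: str  # "removed", "renamed", "type_changed" or "added"
    column: str
    detail: str = ""

    @property
    def breaking(self) -> bool:
        # Per the table above, only additions are non-breaking.
        return self.kind != "added"


def documented_columns(node: dict) -> dict[str, str | None]:
    """Map lower-cased column name -> lower-cased documented data_type (or None)."""
    columns: dict[str, str | None] = {}
    for name, col in node.get("columns", {}).items():
        data_type = col.get("data_type")
        columns[name.lower()] = data_type.strip().lower() if data_type else None
    return columns


def diff_model(
    model: str, base: dict[str, str | None], current: dict[str, str | None]
) -> list[ColumnChange]:
    removed = sorted(base.keys() - current.keys())
    added = sorted(current.keys() - base.keys())
    changes: list[ColumnChange] = []

    # Rename heuristic: exactly one column removed and one added, with matching
    # types (two undocumented types count as matching here, an assumption).
    if len(removed) == 1 and len(added) == 1 and base[removed[0]] == current[added[0]]:
        changes.append(ColumnChange(model, "renamed", removed[0], f"renamed to {added[0]}"))
        removed, added = [], []

    changes += [ColumnChange(model, "removed", name) for name in removed]
    changes += [ColumnChange(model, "added", name) for name in added]

    # Type changes are reported only when both sides document a data_type.
    for name in sorted(base.keys() & current.keys()):
        old_type, new_type = base[name], current[name]
        if old_type and new_type and old_type != new_type:
            changes.append(ColumnChange(model, "type_changed", name, f"{old_type} -> {new_type}"))
    return changes


def diff_manifests(base_manifest: dict, current_manifest: dict) -> list[ColumnChange]:
    """Diff models present in both manifests; added or removed models are skipped."""

    def models(manifest: dict) -> dict[str, dict]:
        return {
            uid: node
            for uid, node in manifest.get("nodes", {}).items()
            if node.get("resource_type") == "model"
        }

    base_models, current_models = models(base_manifest), models(current_manifest)
    changes: list[ColumnChange] = []
    for uid in sorted(base_models.keys() & current_models.keys()):
        changes += diff_model(
            uid, documented_columns(base_models[uid]), documented_columns(current_models[uid])
        )
    return changes
```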

Limitations

SELECT * expansion. When a model ends with SELECT * FROM final_cte, dbt-guard tries to resolve the star by tracing back through the CTE chain. If the star references a physical table (not a CTE), expansion fails and dbt-guard falls back to documented columns from schema.yml.
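Star resolution through a CTE chain can be illustrated without a SQL parser. dbt-guard parses the compiled SQL with SQLGlot; this sketch skips parsing and works on a hypothetical pre-parsed representation in which each CTE is either an explicit list of output columns or a bare SELECT * FROM <source>. The representation and the function names are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Union

# Hypothetical pre-parsed form of a model's CTEs: each CTE either lists its
# output columns explicitly, or is a bare SELECT * FROM <source>, written ("*", source).
Projection = Union[list[str], tuple[str, str]]


def resolve_star(
    ctes: dict[str, Projection], source: str, seen: frozenset[str] = frozenset()
) -> list[str] | None:
    """Expand SELECT * FROM source by walking back through the CTE chain.

    Returns None when the chain reaches something that is not a CTE (for
    example a physical table), which cannot be expanded statically.
    """
    if source not in ctes or source in seen:
        return None
    projection = ctes[source]
    if isinstance(projection, tuple):  # another star: keep walking back
        return resolve_star(ctes, projection[1], seen | {source})
    return list(projection)


def output_columns(
    ctes: dict[str, Projection], final_cte: str, documented: list[str]
) -> list[str]:
    """Columns of a model ending in SELECT * FROM final_cte, with schema.yml fallback."""
    resolved = resolve_star(ctes, final_cte)
    return resolved if resolved is not None else list(documented)
```

For example, with ctes = {"renamed": ["order_id", "amount"], "final": ("*", "renamed")}, output_columns(ctes, "final", []) returns ["order_id", "amount"], while a chain that ends at a name absent from ctes falls back to the documented list.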

No catalog required. dbt-guard does not need catalog.json (the output of dbt docs generate). Column types are taken from schema.yml documentation when available. Type-change detection only fires when both the base and current sides have a documented data_type. Models whose columns are entirely undocumented can still be diffed by column name (removal/addition) when compiled SQL is available for extraction, just not by type.

Parse-only manifests. dbt parse does not compile SQL, so no compiled files are written. In this mode, dbt-guard works exclusively from documented columns; this applies to the CI examples above, which run dbt parse. Run dbt compile instead of dbt parse to enable SQL-based column extraction.
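Whether SQL-based extraction is possible can be decided per model by looking for the compiled file on disk, falling back to documented columns when it is absent or cannot be parsed (the graceful-degradation behavior listed under Key design decisions). The sketch assumes dbt's usual compiled layout, target/compiled/<package_name>/<original_file_path>, and takes the SQL column extractor as a parameter (for example, a SQLGlot-based function); compiled_sql_path and column_inventory are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def compiled_sql_path(target_dir: Path, node: dict) -> Path:
    # Assumed layout: <target>/compiled/<package_name>/<original_file_path>
    return target_dir / "compiled" / node["package_name"] / node["original_file_path"]


def column_inventory(
    target_dir: Path, node: dict, extract_columns: Callable[[str], list[str]]
) -> tuple[list[str], str]:
    """Return (column names, origin), where origin is "sql" or "documented".

    extract_columns is any SQL-to-column-names function; it is passed in so
    this sketch has no parser dependency.
    """
    documented = [name.lower() for name in node.get("columns", {})]
    path = compiled_sql_path(target_dir, node)
    if not path.is_file():
        # Parse-only manifest: no compiled SQL, so documented columns only.
        return documented, "documented"
    try:
        from_sql = [name.lower() for name in extract_columns(path.read_text())]
    except Exception:
        # Graceful degradation: a parse failure never aborts the run.
        return documented, "documented"
    # Keep documented columns and add undocumented ones found in the SQL (order-preserving dedupe).
    return list(dict.fromkeys(documented + from_sql)), "sql"
```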

Rename heuristic. Rename detection (1 removed + 1 added with matching type) is a best-effort heuristic: if a model removes one column and adds an unrelated one of the same type in the same PR, dbt-guard will report it as a rename. Both a rename and a removal are classified as breaking, so the exit code is the same either way; use --format json to inspect the raw events.

Column ordering. dbt-guard does not detect column reordering. Moving a column within a SELECT is non-breaking for references by name but breaking for positional references (for example, a downstream UNION ALL over SELECT * FROM upstream, which matches columns by position rather than by name). This is a known gap.

Contributing

Contributions are welcome. The project uses standard Python tooling:

# Clone and install in editable mode with dev dependencies
git clone https://github.com/dbt-guard/dbt-guard
cd dbt-guard
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check dbt_guard/

# Type check
mypy dbt_guard/

The test suite uses synthetic manifest fixtures in tests/fixtures/manifests/. To add a new test scenario, add a manifest pair there and write the corresponding test.
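A helper for writing minimal synthetic manifests keeps new test scenarios short. The sketch writes only a small subset of a real manifest (model nodes with columns, child_map, parent_map); which fields dbt-guard actually requires is an assumption, so compare against the existing fixtures in tests/fixtures/manifests/ before relying on it. write_manifest and the "fixture" package name are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_manifest(
    target_dir: Path,
    models: dict[str, dict[str, str | None]],
    edges: dict[str, list[str]] | None = None,
) -> Path:
    """Write a minimal synthetic manifest.json and return its path.

    models maps a model name to {column name: data_type or None}; edges maps a
    model name to the names of its direct children.
    """

    def uid(name: str) -> str:
        return f"model.fixture.{name}"  # hypothetical package name "fixture"

    nodes = {
        uid(name): {
            "unique_id": uid(name),
            "resource_type": "model",
            "name": name,
            "package_name": "fixture",
            "original_file_path": f"models/{name}.sql",
            "columns": {col: {"name": col, "data_type": dtype} for col, dtype in columns.items()},
        }
        for name, columns in models.items()
    }
    child_map: dict[str, list[str]] = {node_id: [] for node_id in nodes}
    parent_map: dict[str, list[str]] = {node_id: [] for node_id in nodes}
    for parent, children in (edges or {}).items():
        for child in children:
            child_map.setdefault(uid(parent), []).append(uid(child))
            parent_map.setdefault(uid(child), []).append(uid(parent))

    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / "manifest.json"
    manifest = {"nodes": nodes, "child_map": child_map, "parent_map": parent_map}
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

In a test, write the base and current manifests into separate directories (for example under pytest's tmp_path), run dbt-guard diff --base <base dir> --current <current dir>, and assert exit code 1 when a column was removed.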

Key design decisions:

  • Minimal dependencies: only sqlglot and click. No pandas, no dbt-core.
  • Graceful degradation: if SQL parsing fails, fall back to documented columns rather than raising.
  • Static analysis only: no database connection, no dbt run needed.

License

Apache 2.0. See LICENSE.

Download files

Source Distribution

dbt_guard-0.1.2.tar.gz (28.1 kB)

Built Distribution

dbt_guard-0.1.2-py3-none-any.whl (25.2 kB)

File details

Details for the file dbt_guard-0.1.2.tar.gz.

File metadata

  • Download URL: dbt_guard-0.1.2.tar.gz
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_guard-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       32fb9e1f1de661b6150b338084bc43c9bb115fc0d5c259fef2c97cc28c3c9856
MD5          087a79679a7d7797448a14c7ae818b9f
BLAKE2b-256  93ca6d8de86e2cf923ff91e1163d564948fac27d702a64aa5284750ad912d3c4

Provenance

The following attestation bundles were made for dbt_guard-0.1.2.tar.gz:

Publisher: release.yml on damione1/dbt-guard

File details

Details for the file dbt_guard-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: dbt_guard-0.1.2-py3-none-any.whl
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_guard-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       b28dd61c28c0c6cd04e75c312816874cc61c9c730f74b0f8cc3b9a9493db5222
MD5          76ddd0040570915e1810208a2291253f
BLAKE2b-256  6432590f40c000a5f24f950de83543d79912b0554b4cd0f99411a6984cb1715b

Provenance

The following attestation bundles were made for dbt_guard-0.1.2-py3-none-any.whl:

Publisher: release.yml on damione1/dbt-guard
