dbt-guard
Column-level lineage breaking change detection for dbt Core CI pipelines.
dbt-guard detects when a model's output columns change in a way that would break downstream consumers — before the code reaches production. It works by comparing two manifest.json files (the base branch vs. the PR branch) using static analysis only: no database connection required.
This tool addresses the gap described in dbt-core issue #6869: dbt has no built-in mechanism for blocking PRs that silently remove or rename columns that downstream models depend on.
Quick start
pip install dbt-guard
Then in your CI pipeline, after running dbt parse on both the base branch and the PR branch:
dbt-guard diff \
--base path/to/base/target/ \
--current path/to/current/target/ \
--dialect snowflake \
--format github
Exit code 0 means no breaking changes. Exit code 1 means breaking changes were detected.
GitHub Actions integration
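- name: Generate base manifest
  run: |
    # Check out the PR's target branch so the manifest reflects the base state.
    # (git stash only shelves uncommitted changes; it does not switch branches.)
    git fetch origin "$GITHUB_BASE_REF"
    git checkout FETCH_HEAD
    dbt parse --profiles-dir . --target ci
    mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
    # Return to the PR commit
    git checkout -
- name: Generate current manifest
  run: dbt parse --profiles-dir . --target ci
- name: Column lineage check
  run: |
    dbt-guard diff \
      --base /tmp/base_target \
      --current target/ \
      --dialect snowflake \
      --format github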
Bitbucket Pipelines integration
pipelines:
  pull-requests:
    '**':
      - step:
          name: Column lineage check
          image: python:3.12-slim
          script:
            # python:3.12-slim does not ship git
            - apt-get update && apt-get install -y git
            - pip install dbt-guard dbt-core dbt-snowflake
            # Build the base manifest from the PR's destination branch
            - git fetch origin $BITBUCKET_PR_DESTINATION_BRANCH
            - git checkout FETCH_HEAD
            - dbt parse --profiles-dir . --target ci
            - mkdir -p /tmp/base_target && cp target/manifest.json /tmp/base_target/
            # Return to the PR checkout and build the current manifest
            - git checkout -
            - dbt parse --profiles-dir . --target ci
            - dbt-guard diff --base /tmp/base_target --current target/ --dialect snowflake
CLI reference
dbt-guard diff [OPTIONS]

Options:
  --base PATH       Directory containing base manifest.json [required]
  --current PATH    Directory containing current manifest.json [required]
  --dialect TEXT    SQL dialect: default, snowflake, bigquery, databricks,
                    redshift, trino [default: default]
  --format TEXT     Output format: text, json, github [default: text]
  --fail-on TEXT    When to exit non-zero: breaking, any, never
                    [default: breaking]
  --no-impact       Skip downstream impact analysis
  --max-depth INT   Max DAG hops for impact traversal [default: 10]
  --output PATH     Write report to file instead of stdout
  --select MODEL    Limit diff to specific model names (repeatable)
  --quiet           Print one-line summary only
  --version         Show version and exit
  --help            Show this message and exit
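If you drive the check from a Python script instead of a shell step, the options above can be assembled into an argument list. The following is a minimal sketch; the helper names build_diff_command and run_diff are hypothetical and not part of dbt-guard, and only flags listed in the reference above are used.

```python
import subprocess
from typing import Iterable, List


def build_diff_command(
    base: str,
    current: str,
    dialect: str = "default",
    fmt: str = "text",
    fail_on: str = "breaking",
    select: Iterable[str] = (),
) -> List[str]:
    """Assemble the argv for `dbt-guard diff` from documented options only."""
    cmd = [
        "dbt-guard", "diff",
        "--base", base,
        "--current", current,
        "--dialect", dialect,
        "--format", fmt,
        "--fail-on", fail_on,
    ]
    # --select is documented as repeatable, so emit one flag per model name.
    for model in select:
        cmd += ["--select", model]
    return cmd


def run_diff(cmd: List[str]) -> int:
    """Run the command and hand back its exit code without raising."""
    return subprocess.run(cmd, check=False).returncode


example = build_diff_command(
    "/tmp/base_target", "target/", dialect="snowflake", fmt="github",
    select=["orders", "customers"],
)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces.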
Exit codes
| Code | Meaning |
|---|---|
| 0 | No breaking changes (or --fail-on never) |
| 1 | Breaking changes detected (or any changes with --fail-on any) |
| 2 | Tool error (manifest not found, invalid JSON, etc.) |
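The interaction between the change counts and --fail-on in the table above can be written out as a small function. This is an illustrative sketch of the documented behaviour, not dbt-guard's actual code; the function name and its parameters are assumptions, and tool errors (code 2) are out of scope here.

```python
def expected_exit_code(breaking: int, non_breaking: int, fail_on: str = "breaking") -> int:
    """Map change counts to the exit code described in the table (0 or 1)."""
    if fail_on not in {"breaking", "any", "never"}:
        raise ValueError(f"unknown --fail-on value: {fail_on!r}")
    if fail_on == "never":
        return 0  # never fail the build, whatever was found
    if breaking > 0:
        return 1  # breaking changes fail under both 'breaking' and 'any'
    if fail_on == "any" and non_breaking > 0:
        return 1  # 'any' also fails on additive changes
    return 0
```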
How it works
1. Parse both manifests. dbt-guard reads manifest.json from the base and current target directories. No dbt execution, no database connection.
2. Extract column inventories. For each model, it reads the documented columns from manifest.json. If compiled SQL is present on disk (in target/compiled/), it additionally parses the SQL with SQLGlot to detect undocumented columns.
3. Diff columns. For each model present in both manifests, it compares column sets:
   - Column removed → breaking
   - Column renamed (1 removed + 1 added, matching type) → breaking
   - Column type changed (only when documented on both sides) → breaking
   - Column added → non-breaking
4. Impact analysis. For each breaking change, it traverses the child_map in the manifest via BFS to find downstream models affected transitively.
5. Report. Output in text, JSON, or GitHub Actions annotation format.
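The impact-analysis step can be sketched as a breadth-first walk over child_map, which in a dbt manifest maps each node's unique ID to the IDs of its direct children. This is a minimal sketch under assumptions, not dbt-guard's implementation: the function name, the choice to report only IDs starting with "model.", and the example IDs are all illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple


def downstream_models(
    child_map: Dict[str, List[str]], start: str, max_depth: int = 10
) -> List[Tuple[str, int]]:
    """Return (model_id, hops) for models reachable from `start` within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    affected: List[Tuple[str, int]] = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for child in child_map.get(node, []):
            if child in seen:
                continue  # already reached by a path that is no longer
            seen.add(child)
            if child.startswith("model."):
                affected.append((child, depth + 1))
            queue.append((child, depth + 1))
    return affected


# Hypothetical project: stg_orders -> orders -> revenue, plus one test node.
child_map = {
    "model.shop.stg_orders": ["model.shop.orders", "test.shop.not_null_stg_orders_id"],
    "model.shop.orders": ["model.shop.revenue"],
    "model.shop.revenue": [],
}
impact = downstream_models(child_map, "model.shop.stg_orders")
```

Because BFS visits nodes in order of distance, the first time a model is reached is also its shortest hop count from the changed model.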
What counts as breaking vs. non-breaking
Breaking changes will cause downstream consumers to fail or produce incorrect results:
| Change | Breaking? | Why |
|---|---|---|
| Column removed | Yes | Downstream SELECT or JOIN on that column will fail |
| Column renamed | Yes | All references to the old name break |
| Column type changed | Yes | Implicit casts may fail or produce wrong results |
| Column added | No | Additive; downstream consumers are unaffected |
| New model added | No | Nothing depends on it yet |
| Model removed from current | No | Not diffed; dbt will surface this as a ref() error |
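The column-level rows of the table translate into a short set comparison. The sketch below is illustrative only: the function name, the event dictionaries, and the exact-string type comparison are assumptions, and rename pairing is left out. Each input maps a column name to its documented data_type, or None when undocumented.

```python
from typing import Dict, List, Optional

Columns = Dict[str, Optional[str]]


def diff_columns(base: Columns, current: Columns) -> List[dict]:
    """Classify column changes for one model as breaking or non-breaking."""
    events = []
    for name in sorted(base.keys() - current.keys()):
        events.append({"column": name, "change": "removed", "breaking": True})
    for name in sorted(current.keys() - base.keys()):
        events.append({"column": name, "change": "added", "breaking": False})
    for name in sorted(base.keys() & current.keys()):
        old, new = base[name], current[name]
        # A type change only counts when both sides document a type.
        if old is not None and new is not None and old != new:
            events.append({"column": name, "change": "type_changed", "breaking": True})
    return events


base = {"id": "integer", "amount": "numeric", "note": None, "legacy": None}
current = {"id": "integer", "amount": "varchar", "note": "text", "created_at": None}
events = diff_columns(base, current)
```

In this example, note gains a documented type but is not flagged, because the base side had none to compare against.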
Limitations
SELECT * expansion. When a model ends with SELECT * FROM final_cte, dbt-guard tries to resolve the star by tracing back through the CTE chain. If the star references a physical table (not a CTE), expansion fails and dbt-guard falls back to documented columns from schema.yml.
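The star-resolution fallback can be illustrated with a toy model of a CTE chain. This is a simplified sketch, not how dbt-guard represents SQL internally: the ctes dictionary (CTE name mapped to its selected columns and the single relation it reads from) and the function name are hypothetical, and joins or mixed selects such as "*, extra_col" are ignored.

```python
from typing import Dict, List, Tuple


def resolve_star(
    final_source: str,
    ctes: Dict[str, Tuple[List[str], str]],
    documented: List[str],
) -> List[str]:
    """Follow SELECT * back through CTEs; fall back to documented columns."""
    visited = set()
    source = final_source
    while source in ctes and source not in visited:
        visited.add(source)
        columns, upstream = ctes[source]
        if columns != ["*"]:
            return list(columns)  # reached a CTE with an explicit column list
        source = upstream  # this CTE is itself a star; keep tracing
    # The chain ended at a physical table (or looped): expansion is not possible.
    return list(documented)


# final -> renamed, which lists its columns explicitly.
resolved = resolve_star(
    "final",
    {"renamed": (["order_id", "amount"], "raw_orders"), "final": (["*"], "renamed")},
    documented=["order_id"],
)

# final reads straight from a physical table, so the documented columns are used.
fallback = resolve_star("final", {"final": (["*"], "raw.orders")}, documented=["order_id"])
```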
No catalog required. dbt-guard does not need catalog.json (the output of dbt docs generate). Column types are taken from schema.yml documentation when available. Type-change detection only fires when both the base and current sides have a documented data_type. Models where columns are entirely undocumented are still diffed by column name (removal/addition), just not by type.
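As a rough illustration of where these documented columns live, the sketch below pulls a per-model column inventory out of a parsed manifest. It relies on the manifest layout (nodes keyed by unique ID, each with resource_type and a columns mapping whose entries may carry data_type); the function name and the sample manifest are assumptions for the example.

```python
from typing import Dict, Optional


def documented_columns(manifest: dict) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each model's unique ID to {column name: documented data_type or None}."""
    inventory = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        inventory[unique_id] = {
            name: meta.get("data_type")
            for name, meta in node.get("columns", {}).items()
        }
    return inventory


# Trimmed-down stand-in for json.load(open("target/manifest.json")).
manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "columns": {
                "order_id": {"name": "order_id", "data_type": "integer"},
                "status": {"name": "status", "data_type": None},
            },
        },
        "test.shop.not_null_orders_order_id": {"resource_type": "test", "columns": {}},
    }
}
inventory = documented_columns(manifest)
```

A column with data_type of None, like status here, can still take part in removal and addition checks but not in type-change detection.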
Parse-only manifests. dbt parse does not compile SQL (compiled files are absent). In this mode, dbt-guard works exclusively from documented columns. Run dbt compile instead of dbt parse to enable SQL-based column extraction.
Rename heuristic. Rename detection (exactly one column removed and one added, with matching types) is a best-effort heuristic. If a PR drops one column and adds an unrelated one of the same type, dbt-guard will report the pair as a rename. The change is flagged as breaking either way, since the removal alone is breaking. Use --format json to inspect the raw events.
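The false positive described here follows directly from how such a heuristic pairs columns. The sketch below is an assumption-laden illustration, not dbt-guard's code: the function name is hypothetical, and treating two undocumented (None) types as matching is a guess about the behaviour.

```python
from typing import Dict, Optional, Tuple


def detect_rename(
    removed: Dict[str, Optional[str]], added: Dict[str, Optional[str]]
) -> Optional[Tuple[str, str]]:
    """Return (old_name, new_name) when exactly one removal pairs with one addition."""
    if len(removed) != 1 or len(added) != 1:
        return None  # ambiguous: the heuristic only handles a single pair
    (old_name, old_type), = removed.items()
    (new_name, new_type), = added.items()
    if old_type != new_type:
        return None  # types differ, so report a removal plus an addition
    return (old_name, new_name)


# A genuine rename and an unrelated swap look identical to the heuristic.
genuine = detect_rename({"cust_id": "integer"}, {"customer_id": "integer"})
unrelated = detect_rename({"legacy_flag": "boolean"}, {"is_active": "boolean"})
```

Both calls return a pair, which is why the raw events are worth checking when a reported rename looks surprising.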
Column ordering. dbt-guard does not detect column reordering. Changing the position of a column in a SELECT is non-breaking for named references but breaking for positional ones (e.g. a UNION ALL over SELECT * FROM upstream, where columns are matched by position). This is a known gap.
Contributing
Contributions are welcome. The project uses standard Python tooling:
# Clone and install in editable mode with dev dependencies
git clone https://github.com/dbt-guard/dbt-guard
cd dbt-guard
pip install -e ".[dev]"
# Run tests
pytest
# Lint
ruff check dbt_guard/
# Type check
mypy dbt_guard/
The test suite uses synthetic manifest fixtures in tests/fixtures/manifests/. To add a new test scenario, add a manifest pair there and write the corresponding test.
Key design decisions:
- Minimal dependencies: only sqlglot and click. No pandas, no dbt-core.
- Graceful degradation: if SQL parsing fails, fall back to documented columns rather than raising.
- Static analysis only: no database connection, no dbt run needed.
License
Apache 2.0. See LICENSE.