dbt-guard
Column-level lineage breaking change detection for dbt Core CI pipelines.
dbt-guard detects when a model's output columns change in a way that would break downstream consumers — before the code reaches production. It works by comparing two manifest.json files (the base branch vs. the PR branch) using static analysis only: no database connection required.
This tool addresses the gap described in dbt-core issue #6869: dbt has no built-in mechanism for blocking PRs that silently remove or rename columns that downstream models depend on.
Quick start
pip install dbt-guard
Then in your CI pipeline, after running dbt compile on both the base branch and the PR branch:
dbt-guard diff \
--base path/to/base/target/ \
--current path/to/current/target/ \
--dialect snowflake \
--format github \
--include-sources \
--include-exposures \
--column-lineage
Exit code 0 means no breaking changes. Exit code 1 means breaking changes were detected.
GitHub Actions integration
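- name: Generate base manifest
  run: |
    # On a clean CI checkout `git stash` has nothing to save and `git stash pop` fails,
    # so check out the base branch explicitly. GITHUB_BASE_REF is set on pull_request events.
    PR_SHA=$(git rev-parse HEAD)
    git fetch --depth=1 origin "$GITHUB_BASE_REF"
    git checkout FETCH_HEAD
    dbt compile --profiles-dir . --target ci
    cp -r target/ /tmp/base_target/
    git checkout "$PR_SHA"
- name: Generate current manifest
  run: dbt compile --profiles-dir . --target ci
- name: Column lineage check
  run: |
    dbt-guard diff \
      --base /tmp/base_target \
      --current target/ \
      --dialect snowflake \
      --format github \
      --include-sources \
      --include-exposures \
      --column-lineage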
Note: dbt compile produces compiled SQL files that enable column-level lineage resolution. If you use dbt parse instead, dbt-guard still works but falls back to documented columns only (no column-level tracing).
Bitbucket Pipelines integration
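pipelines:
  pull-requests:
    '**':
      - step:
          name: Column lineage check
          image: python:3.12-slim
          script:
            # The slim image does not ship git, which the checkout steps below need.
            - apt-get update && apt-get install -y git
            - pip install dbt-guard dbt-core dbt-snowflake
            # Compile the destination branch first to produce the base manifest.
            - git fetch origin $BITBUCKET_PR_DESTINATION_BRANCH
            - git checkout FETCH_HEAD
            - dbt compile --profiles-dir . --target ci
            - cp -r target/ /tmp/base_target/
            # Return to the PR commit and compile the current manifest.
            - git checkout $BITBUCKET_COMMIT
            - dbt compile --profiles-dir . --target ci
            - dbt-guard diff --base /tmp/base_target --current target/ --dialect snowflake --column-lineage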
CLI reference
dbt-guard diff [OPTIONS]

Options:
  --base PATH                   Directory containing base manifest.json [required]
  --current PATH                Directory containing current manifest.json [required]
  --dialect TEXT                SQL dialect: default, snowflake, bigquery, databricks,
                                redshift, trino [default: default]
  --format TEXT                 Output format: text, json, github [default: text]
  --fail-on TEXT                When to exit non-zero: breaking, any, never
                                [default: breaking]
  --no-impact                   Skip downstream impact analysis
  --max-depth INT               Max DAG hops for impact traversal [default: 10]
  --output PATH                 Write report to file instead of stdout
  --select MODEL                Limit diff to specific model names (repeatable)
  --quiet                       Print one-line summary only
  --include-sources             Include dbt sources in the diff analysis
  --include-exposures           Include dbt exposures in impact analysis
  --include-snapshots           Include dbt snapshots in the diff analysis
  --column-lineage              Enable column-level lineage to reduce false positives
  --strict-lineage              Fail if compiled SQL is missing (requires --column-lineage)
  --warn-undocumented-sources   Warn about sources with no documented columns
  --version                     Show version and exit
  --help                        Show this message and exit
Exit codes
| Code | Meaning |
|---|---|
| 0 | No breaking changes (or --fail-on never) |
| 1 | Breaking changes detected (or any changes with --fail-on any) |
| 2 | Tool error (manifest not found, invalid JSON, etc.) |
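A wrapper script in CI may want to tell "breaking changes" apart from "the tool itself failed". The sketch below maps the exit codes from the table above to messages and shows one way to invoke the CLI from Python. The helper names `describe_exit_code` and `run_diff` are hypothetical and not part of dbt-guard.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "no breaking changes",
    1: "breaking changes detected",
    2: "tool error",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable meaning for a dbt-guard exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_diff(base: str, current: str) -> int:
    """Run `dbt-guard diff` with the two required options and return its exit code.

    Hypothetical wrapper: it assumes dbt-guard is installed and on PATH.
    """
    result = subprocess.run(
        ["dbt-guard", "diff", "--base", base, "--current", current]
    )
    return result.returncode


print(describe_exit_code(1))  # breaking changes detected
```

Treating code 2 separately lets a pipeline retry or alert on a missing manifest instead of reporting it as a schema break.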
How it works
1. Parse both manifests. dbt-guard reads manifest.json from the base and current target directories. No dbt execution, no database connection.
2. Extract column inventories. For each model (and optionally sources and snapshots), it reads the documented columns from manifest.json. If compiled SQL is present on disk (in target/compiled/), it additionally parses the SQL with SQLGlot to detect undocumented columns.
3. Diff columns. For each model present in both manifests, it compares column sets:
   - Column removed → breaking
   - Column renamed (1 removed + 1 added, matching type) → breaking
   - Column type changed (only when documented on both sides) → breaking
   - Column added → non-breaking
4. Impact analysis. For each breaking change, it traverses the child_map in the manifest via BFS to find downstream models affected transitively.
5. Column-level lineage (opt-in). When --column-lineage is enabled, dbt-guard parses each downstream model's compiled SQL and uses sqlglot.lineage to trace column-to-column dependencies. Models that don't reference any changed column are cleared from the impact list. This propagates through the DAG: if model B references a changed column from model A, model B's affected output columns are tracked into model C, and so on.
6. Exposure impact (opt-in). When --include-exposures is enabled, dbt-guard checks which exposures depend on changed or impacted models and reports owner, type, and URL for each affected exposure.
7. Report. Output in text, JSON, or GitHub Actions annotation format.
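The column diff step can be illustrated with a small sketch. This is not dbt-guard's actual code: the function name `diff_columns`, the tuple layout, and the choice to treat two undocumented types as "matching" for the rename heuristic are all assumptions made for illustration. The classification rules themselves follow the list above.

```python
from typing import Dict, List, Optional, Tuple

# Column inventory: column name -> documented data_type (None if undocumented).
Columns = Dict[str, Optional[str]]
Change = Tuple[str, str, bool]  # (kind, detail, is_breaking)


def diff_columns(base: Columns, current: Columns) -> List[Change]:
    """Classify column changes for one model present in both manifests."""
    removed = sorted(set(base) - set(current))
    added = sorted(set(current) - set(base))
    changes: List[Change] = []

    # Rename heuristic: exactly one removed and one added column with the same type.
    # Assumption: two undocumented (None) types also count as matching.
    if len(removed) == 1 and len(added) == 1 and base[removed[0]] == current[added[0]]:
        changes.append(("renamed", f"{removed[0]} -> {added[0]}", True))
        removed, added = [], []

    changes += [("removed", name, True) for name in removed]
    changes += [("added", name, False) for name in added]

    # Type changes only count when both sides document a data_type.
    for name in sorted(set(base) & set(current)):
        old, new = base[name], current[name]
        if old is not None and new is not None and old != new:
            changes.append(("type_changed", f"{name}: {old} -> {new}", True))
    return changes


base = {"id": "int", "phone": "varchar", "amount": "int"}
current = {"id": "int", "amount": "numeric", "email": "text", "created_at": "timestamp"}
for change in diff_columns(base, current):
    print(change)
```

Here `phone` is reported as removed rather than renamed because two columns were added, so the one-for-one heuristic does not apply.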
Column-level lineage: eliminating false positives
Without --column-lineage, dbt-guard uses model-level BFS: if model A has a breaking change, every downstream model is flagged. This produces false positives when a downstream model doesn't actually use the changed column.
With --column-lineage, dbt-guard traces which output columns reference the changed upstream column. Models with no dependency are cleared and removed from the impact list.
Example: stg_users.phone removed
Without --column-lineage:
stg_users → int_order_summary ← IMPACTED (false positive — uses name, not phone)
stg_users → int_user_metrics ← IMPACTED (true positive — uses phone)
With --column-lineage:
stg_users → int_order_summary ← CLEARED
stg_users → int_user_metrics ← IMPACTED
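The difference between the two modes can be sketched as a BFS over `child_map` followed by a per-model filter. This is a simplified illustration, not dbt-guard's implementation: `columns_used` is a hypothetical precomputed mapping (the real tool derives it from compiled SQL), short model names stand in for manifest unique IDs, and the filter only checks a single changed column without propagating through intermediate models.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def downstream_models(
    child_map: Dict[str, List[str]], start: str, max_depth: int = 10
) -> List[str]:
    """Breadth-first traversal of child_map; returns models reachable from start."""
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


def split_impacted(
    models: List[str], columns_used: Dict[str, Set[str]], changed_column: str
) -> Tuple[List[str], List[str]]:
    """Split models into (impacted, cleared) for one changed upstream column.

    Models with no column information stay impacted (conservative default).
    """
    impacted = [
        m for m in models
        if m not in columns_used or changed_column in columns_used[m]
    ]
    cleared = [m for m in models if m not in impacted]
    return impacted, cleared


child_map = {"stg_users": ["int_order_summary", "int_user_metrics"]}
columns_used = {"int_order_summary": {"name"}, "int_user_metrics": {"phone"}}

models = downstream_models(child_map, "stg_users")
impacted, cleared = split_impacted(models, columns_used, "phone")
print(impacted)  # ['int_user_metrics']
print(cleared)   # ['int_order_summary']
```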
Output formats
Text (default)
Human-readable report with sections for breaking changes, non-breaking changes, downstream impact, source changes, column lineage detail, cleared models, exposure impact, and warnings.
JSON
Machine-readable output for CI artifacts:
{
"summary": {
"breaking": 1,
"non_breaking": 1,
"impacted_models": 1,
"sources_changed": 1,
"models_cleared": 1,
"exposures_impacted": 1
},
"breaking_changes": [...],
"non_breaking_changes": [...],
"impacted_models": [...],
"source_changes": [...],
"column_lineage_impact": [...],
"cleared_models": [...],
"exposure_impact": [...],
"undocumented_sources": [...]
}
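A downstream CI step could read this report and build a short summary line. The sketch below uses only the `summary` keys shown above; the function name `summarize` and the sample payload are illustrative, and the contents of the list fields are not assumed.

```python
import json


def summarize(report: dict) -> str:
    """Build a one-line summary from the report's "summary" object."""
    s = report["summary"]
    return (
        f"{s['breaking']} breaking, {s['non_breaking']} non-breaking, "
        f"{s['impacted_models']} impacted, {s['models_cleared']} cleared"
    )


# Sample payload mirroring the summary block above.
sample = json.loads(
    '{"summary": {"breaking": 1, "non_breaking": 1, "impacted_models": 1,'
    ' "sources_changed": 1, "models_cleared": 1, "exposures_impacted": 1}}'
)
print(summarize(sample))  # 1 breaking, 1 non-breaking, 1 impacted, 1 cleared
```

In practice the report would come from a file written with `--format json --output <path>` and loaded with `json.load`.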
GitHub Actions
Annotation format:
- ::error:: for breaking changes (model and source)
- ::warning:: for exposure impacts
- ::notice:: for models cleared by column-level lineage
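For readers unfamiliar with GitHub Actions workflow commands, the sketch below shows how such annotation lines are typically built. The message wording is hypothetical (dbt-guard's exact text is not documented here); the escaping of `%`, carriage returns, and newlines follows GitHub's workflow command rules.

```python
ALLOWED_LEVELS = {"error", "warning", "notice"}


def annotation(level: str, message: str) -> str:
    """Format a single-line GitHub Actions workflow command."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported annotation level: {level}")
    # Workflow commands are single-line, so escape % first, then line breaks.
    escaped = (
        message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    )
    return f"::{level}::{escaped}"


# Hypothetical messages for illustration only.
print(annotation("error", "Column removed: stg_users.phone"))
print(annotation("notice", "int_order_summary cleared by column lineage"))
```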
What counts as breaking vs. non-breaking
| Change | Breaking? | Why |
|---|---|---|
| Column removed | Yes | Downstream SELECT or JOIN on that column will fail |
| Column renamed | Yes | All references to the old name break |
| Column type changed | Yes | Implicit casts may fail or produce wrong results |
| Column added | No | Additive; downstream consumers are unaffected |
| New model added | No | Nothing depends on it yet |
| Model removed from current | No | Not diffed; dbt will surface this as a ref() error |
| Source column removed | Yes | Models referencing that source column will fail |
| Source column type changed | Yes | Type mismatches in downstream models |
Limitations
SELECT * expansion. When a model ends with SELECT * FROM final_cte, dbt-guard tries to resolve the star by tracing back through the CTE chain. If the star references a physical table (not a CTE), expansion fails and dbt-guard falls back to documented columns from schema.yml.
No catalog required. dbt-guard does not need catalog.json (the output of dbt docs generate). Column types are taken from schema.yml documentation when available. Type-change detection only fires when both the base and current sides have a documented data_type. Models where columns are entirely undocumented are still diffed by column name (removal/addition), just not by type.
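As a rough illustration of where documented columns and types live, the sketch below reads them from a parsed manifest. It assumes the usual dbt manifest layout (`nodes` keyed by unique ID, each with a `columns` mapping whose entries may carry `data_type`); the function name and the sample data are hypothetical.

```python
from typing import Dict, Optional


def documented_columns(manifest: dict, unique_id: str) -> Dict[str, Optional[str]]:
    """Return {column name: data_type or None} for one node in a parsed manifest."""
    node = manifest["nodes"][unique_id]
    return {
        name: column.get("data_type")
        for name, column in node.get("columns", {}).items()
    }


# Minimal hand-written stand-in for a manifest.json loaded with json.load().
manifest = {
    "nodes": {
        "model.demo.stg_users": {
            "columns": {
                "id": {"name": "id", "data_type": "integer"},
                "phone": {"name": "phone"},  # documented, but no data_type
            }
        }
    }
}

print(documented_columns(manifest, "model.demo.stg_users"))
# {'id': 'integer', 'phone': None}
```

A column such as `phone` above can still be diffed by name, but a type change on it cannot be detected because no `data_type` is documented.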
Parse-only manifests. dbt parse does not compile SQL (compiled files are absent). In this mode, dbt-guard works exclusively from documented columns. Run dbt compile instead of dbt parse to enable SQL-based column extraction and column-level lineage resolution.
Rename heuristic. The rename detection (1 removed + 1 added with matching type) is a best-effort heuristic. If a model removes one column and adds an unrelated column of the same type in the same PR, dbt-guard will report the pair as a rename. Both outcomes are breaking, so the exit code is unaffected; only the label in the report differs. Use --format json to inspect the raw events.
Column ordering. dbt-guard does not detect column reordering. Changing the position of a column in a SELECT is non-breaking for named references but breaking for positional references (e.g. SELECT * FROM upstream in the middle of a CTE). This is a known gap.
Column lineage accuracy. The --column-lineage feature relies on SQLGlot's ability to parse and trace column references through SQL. Complex SQL patterns (UDFs, dynamic SQL, certain dialect-specific syntax) may not resolve correctly. When tracing fails for a column, dbt-guard conservatively marks it as impacted rather than clearing it.
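The conservative fallback can be sketched as follows. Here `trace` is a hypothetical stand-in for a SQLGlot-based tracer that returns the upstream columns feeding one output column; it is not dbt-guard's API. The point is only the error handling: any tracing failure keeps the model on the impact list.

```python
from typing import Callable, Iterable, Set


def is_impacted(
    trace: Callable[[str], Set[str]],
    output_columns: Iterable[str],
    changed_column: str,
) -> bool:
    """Return True if any output column depends on changed_column.

    If tracing raises for a column, treat the model as impacted instead of clearing it.
    """
    for column in output_columns:
        try:
            upstream = trace(column)
        except Exception:
            return True  # conservative: unknown lineage counts as impacted
        if changed_column in upstream:
            return True
    return False


def fake_trace(column: str) -> Set[str]:
    """Toy tracer: pretends 'score' comes from a UDF that cannot be resolved."""
    if column == "score":
        raise ValueError("cannot resolve UDF")
    return {"name"}


print(is_impacted(fake_trace, ["full_name"], "phone"))           # False
print(is_impacted(fake_trace, ["full_name", "score"], "phone"))  # True
```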
Contributing
Contributions are welcome. The project uses standard Python tooling:
# Clone and install in editable mode with dev dependencies
git clone https://github.com/dbt-guard/dbt-guard
cd dbt-guard
pip install -e ".[dev]"
# Run tests
pytest
# Lint
ruff check dbt_guard/
# Type check
mypy dbt_guard/
The test suite uses synthetic manifest fixtures in tests/fixtures/manifests/. To add a new test scenario, add a manifest pair there and write the corresponding test.
Key design decisions:
- Minimal dependencies: only sqlglot and click. No pandas, no dbt-core.
- Graceful degradation: if SQL parsing or column lineage tracing fails, fall back to documented columns or model-level impact rather than raising.
- Static analysis only: no database connection, no dbt run needed.
License
Apache 2.0. See LICENSE.