dbt-semguard
Catch semantic breaking changes in dbt metrics before they land in production.
dbt-semguard is a CLI-first semantic change detector for dbt Semantic Layer definitions. It compares two versions of the semantic contract, classifies changes as breaking, risky, or safe, and renders local or GitHub-friendly output without requiring warehouse access or dbt runtime internals.
What Is This For?
dbt-semguard is a semantic PR guard for dbt metrics and semantic models.
It answers one question:
What changed in the meaning of this metric?
That matters because many dbt changes are valid from a parser or build point of view, but still dangerous for downstream consumers.
For example, a PR may:
- change `gross_revenue` from `sum(order_total)` to `avg(order_total)`
- remove a dimension people use to slice a KPI
- change a ratio metric denominator
- widen or narrow a metric filter
- change entity or time-grain semantics
In all of those cases, dbt may still parse successfully and CI may still be green. But the business meaning of the metric has changed, and dashboards, notebooks, reverse ETL jobs, or APIs may silently start returning different answers.
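To make the first example concrete, here is a tiny Python illustration using made-up order totals. This is hypothetical data, not dbt-semguard output. Both versions of the YAML would parse fine, yet the metric now answers a different question.

```python
from statistics import mean

# Hypothetical order totals for one day; not real data.
order_totals = [120.0, 80.0, 40.0]

# Before the PR: gross_revenue = sum(order_total)
gross_revenue_before = sum(order_totals)

# After the PR: gross_revenue = avg(order_total)
gross_revenue_after = mean(order_totals)

# Same metric name, same dashboards, a different number.
print(f"before={gross_revenue_before} after={gross_revenue_after}")
```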
dbt-semguard exists to catch that class of change before it reaches production.
What It Does Exactly
dbt-semguard does not lint YAML style and it does not validate warehouse execution.
Instead, it:
- reads the dbt Semantic Layer definition from two inputs
- extracts only the semantic parts that affect meaning
- builds a canonical contract for each side
- diffs those contracts
- classifies each change as `breaking`, `risky`, or `safe`
- renders the result for local CLI use or GitHub Actions
In practical terms, it helps teams review semantic changes the same way they already review code changes.
How It Works
The tool reduces dbt semantic definitions into a normalized contract that is easier to compare than raw YAML.
It keeps fields that affect meaning, such as:
- semantic model identity
- backing model name
- entities and entity types
- dimensions and time granularity
- measures and measure expressions
- metric type
- aggregation and expression
- filters
- ratio numerator and denominator
It intentionally ignores noise such as:
- descriptions
- docs blocks
- YAML ordering
- whitespace and comments
That means the output is focused on semantic drift, not formatting drift.
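Here is a minimal sketch of that normalization. It assumes the YAML has already been parsed into Python dicts and lists, and parsing alone discards comments and whitespace. The `NOISE_KEYS` set and the rule that lists of named items are order-insensitive are assumptions for illustration. The actual contract fields are defined in the contract spec.

```python
import json
from typing import Any

# Assumed documentation-only keys that never affect meaning.
NOISE_KEYS = frozenset({"description", "docs"})


def canonicalize(node: Any) -> Any:
    """Drop noise keys and make lists of named items order-insensitive."""
    if isinstance(node, dict):
        return {k: canonicalize(v) for k, v in node.items() if k not in NOISE_KEYS}
    if isinstance(node, list):
        items = [canonicalize(v) for v in node]
        # Entities, dimensions, measures, and metrics are identified by name, not position.
        if items and all(isinstance(v, dict) and "name" in v for v in items):
            items.sort(key=lambda v: v["name"])
        return items
    return node


def fingerprint(node: Any) -> str:
    """Stable JSON serialization: equal fingerprints mean equal semantics in this sketch."""
    return json.dumps(canonicalize(node), sort_keys=True, separators=(",", ":"))
```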
Install From PyPI
```bash
python -m pip install dbt-semguard
```
dbt-semguard requires Python 3.11 or newer.
Install From GitHub
```bash
python -m pip install "git+https://github.com/yeaight7/dbt-semguard.git@v0.5.4"
```
Use the GitHub install path when you need to pin directly to a repository tag.
Install From Source
```bash
git clone https://github.com/yeaight7/dbt-semguard.git
cd dbt-semguard
python -m pip install .
```
How To Use It
Run locally before opening a PR
Use this when you want to sanity-check semantic changes while you are still developing:
```bash
semguard diff --base-ref main --head-ref HEAD --project-dir .
semguard check --base-ref main --head-ref HEAD --project-dir . --fail-on breaking
```
Typical use:
- `diff` when you want to inspect what changed
- `check` when you want a blocking exit code for automation or local scripts
For monorepos, always point --project-dir at the dbt project root you want to analyze:
```bash
semguard diff --base-ref main --head-ref HEAD --project-dir analytics/dbt
```
Git ref mode and local YAML mode now both scope discovery to this directory.
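If you drive these commands from a Python script, a small wrapper keeps the flags in one place. This helper is hypothetical and not part of dbt-semguard. It uses only the flags shown above. Based on the description of `check`, it also assumes that a non-zero exit code means the change should not proceed, whether the `--fail-on` threshold was reached or the run failed.

```python
import shlex
import subprocess


def build_check_command(
    base_ref: str,
    head_ref: str,
    project_dir: str = ".",
    fail_on: str = "breaking",
) -> list[str]:
    """Assemble a `semguard check` invocation as an argv list (no shell quoting pitfalls)."""
    return [
        "semguard", "check",
        "--base-ref", base_ref,
        "--head-ref", head_ref,
        "--project-dir", project_dir,
        "--fail-on", fail_on,
    ]


def semantic_check_passes(base_ref: str, head_ref: str, project_dir: str = ".") -> bool:
    """Run the check; assumes any non-zero exit code should block the change."""
    result = subprocess.run(build_check_command(base_ref, head_ref, project_dir), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command for review before running it.
    print(shlex.join(build_check_command("main", "HEAD", "analytics/dbt")))
```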
Compare exported contracts directly
Use this when you want to compare two precomputed semantic contracts:
```bash
semguard diff --base-contract base-contract.json --head-contract head-contract.json --format markdown
```
Compare manifests explicitly
Use this when your workflow already has dbt semantic_manifest.json artifacts available:
```bash
semguard diff --base-manifest base-semantic-manifest.json --head-manifest head-semantic-manifest.json --format json
```
Extract a contract
Use this when you want a stable machine-readable snapshot of semantic meaning:
```bash
semguard extract --source yaml --project-dir examples/ecommerce_dbt_project --output base-contract.json
semguard extract --source manifest --manifest semantic_manifest.json --output manifest-contract.json
```
Configure YAML discovery with .semguard.yml
Create .semguard.yml in your dbt project root to control which YAML files are scanned:
```yaml
include:
  - models/**/*.yml
  - models/**/*.yaml
  - metrics/**/*.yml
  - metrics/**/*.yaml
  - semantic_models/**/*.yml
  - semantic_models/**/*.yaml
exclude:
  - target/**
  - dbt_packages/**
  - .venv/**
  - .github/**
```
If the file is not present, these defaults are applied automatically.
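Here is a sketch of how such include/exclude rules might be evaluated. It assumes gitignore-style globs, where `**/` matches zero or more directories and `*` stays within one path segment. It also assumes an exclude match wins over an include match. Both points are assumptions, not documented dbt-semguard behavior; check the tool's documentation for the exact matching rules.

```python
import re
from functools import lru_cache

DEFAULT_INCLUDE = (
    "models/**/*.yml", "models/**/*.yaml",
    "metrics/**/*.yml", "metrics/**/*.yaml",
    "semantic_models/**/*.yml", "semantic_models/**/*.yaml",
)
DEFAULT_EXCLUDE = ("target/**", "dbt_packages/**", ".venv/**", ".github/**")


@lru_cache(maxsize=None)
def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex: `**/` spans directories, `*` and `?` do not cross `/`."""
    out: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # `**` not followed by `/`: anything, across directories
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_scanned(path: str, include=DEFAULT_INCLUDE, exclude=DEFAULT_EXCLUDE) -> bool:
    """Return True if a project-relative path would be read; excludes take precedence."""
    posix = path.replace("\\", "/")
    if any(glob_to_regex(p).fullmatch(posix) for p in exclude):
        return False
    return any(glob_to_regex(p).fullmatch(posix) for p in include)
```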
Example Review Flow
- A developer changes a metric or semantic model in dbt.
- `semguard diff` compares the base branch and the current branch.
- The tool reports semantic changes only.
- The team decides whether the change is acceptable, needs migration planning, or should be blocked.
- In CI, `semguard check --fail-on breaking` can fail the PR automatically.
How To Read The Result
- `breaking`: the semantic meaning changed in a way that should usually block by default
- `risky`: the change may be legitimate, but downstream consumers should review it
- `safe`: cosmetic-only changes that do not appear in the semantic diff
Output
`diff` and `check` emit one of three formats: `text`, `markdown`, or `json`.
JSON reports contain the fields `summary`, `highest_severity`, `blocking`, `changes`, and `metadata`.
Example Markdown report
```markdown
## dbt-semguard report

### Breaking changes

#### Metric `gross_revenue`

- Metric `gross_revenue` changed aggregation from `sum` to `avg`.

Status: blocking
```
Example JSON report (excerpt)
```json
{
  "summary": {
    "breaking": 3,
    "risky": 1,
    "safe": 0
  },
  "highest_severity": "breaking",
  "blocking": true
}
```
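Because the JSON report is machine-readable, a follow-up script can act on it without scraping text. This minimal sketch reads only the `summary`, `highest_severity`, and `blocking` fields shown in the example. How you capture the report, whether from redirected output or a saved file, depends on your workflow. Treating a missing or empty `highest_severity` as `none` is an assumption.

```python
import json


def summarize_report(raw: str) -> str:
    """Turn a dbt-semguard JSON report into a one-line status message."""
    report = json.loads(raw)
    counts = report["summary"]
    highest = report.get("highest_severity") or "none"  # assumed empty when nothing changed
    status = "blocking" if report["blocking"] else "passing"
    return (
        f"{status}: highest severity {highest} "
        f"({counts['breaking']} breaking, {counts['risky']} risky, {counts['safe']} safe)"
    )


example = """
{
  "summary": {"breaking": 3, "risky": 1, "safe": 0},
  "highest_severity": "breaking",
  "blocking": true
}
"""
print(summarize_report(example))
```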
Coverage
dbt-semguard currently covers the highest-value semantic changes in the latest dbt Semantic Layer spec.
Covered extractors and inputs:
- Latest-spec YAML projects
- Legacy top-level `semantic_models`/`metrics` YAML projects
- Explicit dbt `semantic_manifest.json` input
- Canonical contract JSON emitted by `semguard extract`
Covered semantic comparisons:
- Semantic model add/remove and backing model changes
- Semantic model default aggregation time dimension changes
- Entity add/remove, type changes, and expression changes
- Dimension add/remove, type changes, expression changes, and time granularity changes
- Measure add/remove, aggregation, expression, aggregation-time, and non-additive changes
- Simple metric aggregation, expression, label, filter, ownership, aggregation-time, and non-additive changes
- Ratio metric numerator and denominator changes
- Derived metric expression and input metric changes
- Cumulative metric input, window, grain-to-date, and period-aggregation changes
- Conversion metric entity, calculation, base metric, conversion metric, and constant-property changes
- Additive changes such as new entities, new dimensions, new measures, and new metrics
Current automated coverage:
- YAML extraction for the latest spec
- Manifest normalization
- Semantic diff severity mapping for breaking and risky changes
- Declarative field-coverage policy so contract fields are explicitly diffed, nested, or intentionally excluded
- Source diagnostics in extracted YAML contracts and change reports
- CLI `extract`, `diff`, and `check`
- Sticky PR comment delivery through the GitHub Action
- Checkout-free git ref mode
- Pre-release local action smoke coverage in CI, plus post-release published action smoke coverage in both git-ref and manifest modes, including spaced manifest paths
Current Limitations
Known v0.5.4 limitations are intentionally narrow:
- There is no allowlist for intentional semantic changes yet.
- Manifest parsing expects dbt `semantic_manifest.json`, not the general-purpose dbt `manifest.json` artifact.
- Legacy YAML support covers top-level `semantic_models`, `measures`, and `type_params`, but cross-project ref semantics are still normalized conservatively into the single `model_name` contract field.
- Rename handling is intentionally conservative: a rename is treated as a removal plus an addition.
- Source diagnostics are best-effort and currently strongest for YAML extraction; manifest-derived contracts may still lack file/line detail.
- GitHub integration supports sticky PR comments and inline annotations for pull_request workflows, but does not yet manage review-thread lifecycles.
Use As A GitHub Action
Use the included composite action from this repository:
```yaml
jobs:
  semguard:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
      pull-requests: read
      checks: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: yeaight7/dbt-semguard@v0.5.4
        id: semguard
        with:
          base-ref: ${{ github.event.pull_request.base.sha }}
          head-ref: ${{ github.sha }}
          fail-on: breaking
          pr-comment: true
          pr-comment-mode: sticky
          github-token: ${{ github.token }}
      - name: Inspect semguard outputs
        run: |
          echo "Highest severity: ${{ steps.semguard.outputs.highest-severity }}"
          echo "Blocking: ${{ steps.semguard.outputs.blocking }}"
```
The action now exposes structured outputs so downstream CI can branch on semantic severity without reparsing JSON:
- `steps.semguard.outputs.highest-severity`
- `steps.semguard.outputs.blocking`
- `steps.semguard.outputs.breaking-count`
- `steps.semguard.outputs.risky-count`
- `steps.semguard.outputs.safe-count`
`pr-comment-mode` accepts:
- `sticky`: update the previous dbt-semguard PR comment when one already exists
- `create`: always publish a new PR comment instead of updating the previous one
The action writes:
- a Markdown summary to the workflow summary
- a JSON artifact named `semguard-report`
- structured step outputs for severity and counts
- an optional sticky PR comment when `pr-comment: true`
- inline check-run annotations when source diagnostics are available
- a failing status when the configured threshold is reached
The action requires Python 3.11 or newer. GitHub API calls for PR comments and annotations use a 30-second timeout so stalled API responses do not hold CI indefinitely.
When there are zero semantic changes, the Markdown artifact and workflow summary explicitly include `No semantic changes detected.` followed by `Status: passing`.
This is the recommended setup when you want the semantic review to happen automatically on every PR.
If you enable `pr-comment: true`, the workflow needs these permissions:
- `contents: read`
- `issues: write`
- `pull-requests: read`
- `checks: write`
Missing checks: write can prevent inline annotations and check runs from appearing even when the semantic diff succeeds.
For forked pull requests, the standard pull_request event usually does not get a write-capable GITHUB_TOKEN, so sticky PR comments and check-run annotations may be unavailable unless you adopt a separate trusted workflow pattern.
Troubleshooting
Common CI and configuration issues are covered in docs/troubleshooting.md.
Migration notes (v0.5.4)
- Severity handling now uses an internal enum while preserving the same JSON strings (`breaking`, `risky`, `safe`).
- SQL filter diffs preserve case and quote semantics while still ignoring insignificant operator spacing.
- GitHub workflow examples now scope write access to PR comments and check annotations only.
- Extractor internals are split into YAML, manifest, and normalization modules behind the same public facade.
- Native measure diffing, sub-day granularity severity, 30-second GitHub API timeouts, and git ref validation are included in the release surface.
- Git ref extraction now scopes strictly to `--project-dir` for monorepos.
- YAML discovery now uses safe default include/exclude patterns.
- Optional `.semguard.yml` include/exclude rules are applied in both local and git-ref YAML extraction.
- Invalid semantic YAML now raises user-facing errors with source context instead of raw `KeyError` tracebacks.
- Composite action shell steps now read user-controlled values from environment variables instead of embedding GitHub expressions directly in Bash.
- Composite action now generates JSON, Markdown, summary text, and step outputs in a single pass before enforcing the blocking threshold.
- Composite action report files now live in an isolated runner temp directory derived from `artifact-name`, which avoids workspace filename collisions in matrix-style CI jobs.
- The repository now documents security reporting, contribution setup, and common action troubleshooting paths.
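The note on SQL filter diffs describes a normalization that is easy to get wrong, so here is a minimal sketch of the idea. Outside quoted literals, it collapses whitespace and drops spaces around comparison operators. Quoted text and letter case are left untouched. This is an illustration under assumptions, not the tool's actual normalizer. It assumes single- and double-quoted spans with doubled-quote escapes and uses a fixed operator list.

```python
import re

# Quoted spans are kept verbatim: 'single' (string literal) and "double" (identifier).
_QUOTED = re.compile("|".join([r"'(?:[^']|'')*'", r'"(?:[^"]|"")*"']))
# Comparison operators, longest first, with any surrounding whitespace.
_OP_SPACE = re.compile(r"\s*(<=|>=|<>|!=|=|<|>)\s*")
_WS = re.compile(r"\s+")


def _normalize_unquoted(chunk: str) -> str:
    chunk = _WS.sub(" ", chunk)         # collapse runs of whitespace
    return _OP_SPACE.sub(r"\1", chunk)  # drop spaces around operators


def normalize_filter(sql: str) -> str:
    """Normalize insignificant spacing while preserving case and quoted text."""
    parts: list[str] = []
    pos = 0
    for match in _QUOTED.finditer(sql):
        parts.append(_normalize_unquoted(sql[pos:match.start()]))
        parts.append(match.group(0))  # quoted literal kept exactly as written
        pos = match.end()
    parts.append(_normalize_unquoted(sql[pos:]))
    return "".join(parts).strip()
```

Comparing normalized strings then flags `'completed'` versus `'Completed'` as a change while ignoring `status = 'x'` versus `status='x'`.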
Example project
An example latest-spec dbt project lives in examples/ecommerce_dbt_project.
Documentation
- Contract spec
- How to use and explain dbt-semguard
- Severity rules
- Troubleshooting
- Roadmap
- Changelog
- Contributing
- Security policy
License
This project is open source under the MIT License. See LICENSE.
Download files
File details
Details for the file dbt_semguard-0.5.4.tar.gz.
File metadata
- Download URL: dbt_semguard-0.5.4.tar.gz
- Upload date:
- Size: 56.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 371c89767514f8af0896583d03cdec00a4d58007fbda607661dab06e0910922b |
| MD5 | d9207c46bc37ba502c7f51996b6cc720 |
| BLAKE2b-256 | 5cd95a73c8de69e4f523d016e88057014dd159e9f4e61af4c2dde061d9604b64 |
Provenance
The following attestation bundles were made for dbt_semguard-0.5.4.tar.gz:
Publisher: publish.yml on yeaight7/dbt-semguard
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dbt_semguard-0.5.4.tar.gz
- Subject digest: 371c89767514f8af0896583d03cdec00a4d58007fbda607661dab06e0910922b
- Sigstore transparency entry: 1391965798
- Sigstore integration time:
- Permalink: yeaight7/dbt-semguard@92fc0defe50430eba4c0c19207693a26a6e16956
- Branch / Tag: refs/tags/v0.5.4
- Owner: https://github.com/yeaight7
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@92fc0defe50430eba4c0c19207693a26a6e16956
- Trigger Event: release
File details
Details for the file dbt_semguard-0.5.4-py3-none-any.whl.
File metadata
- Download URL: dbt_semguard-0.5.4-py3-none-any.whl
- Upload date:
- Size: 30.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62c22d63b33b26c0f138b5da2805c6082e2646f3082a615a94b4e748dd21fbc9 |
| MD5 | 1dd83d1dffdf4d2ef4c6cf6235a29f77 |
| BLAKE2b-256 | f96851acc20472cb78bac211711b73052da0638e85b00e4db039821fc6f00a82 |
Provenance
The following attestation bundles were made for dbt_semguard-0.5.4-py3-none-any.whl:
Publisher: publish.yml on yeaight7/dbt-semguard
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dbt_semguard-0.5.4-py3-none-any.whl
- Subject digest: 62c22d63b33b26c0f138b5da2805c6082e2646f3082a615a94b4e748dd21fbc9
- Sigstore transparency entry: 1391965807
- Sigstore integration time:
- Permalink: yeaight7/dbt-semguard@92fc0defe50430eba4c0c19207693a26a6e16956
- Branch / Tag: refs/tags/v0.5.4
- Owner: https://github.com/yeaight7
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@92fc0defe50430eba4c0c19207693a26a6e16956
- Trigger Event: release