dbt-semguard

Catch semantic breaking changes in dbt metrics before they land in production.

dbt-semguard is a CLI-first semantic change detector for dbt Semantic Layer definitions. It compares two versions of the semantic contract, classifies changes as breaking, risky, or safe, and renders local or GitHub-friendly output without requiring warehouse access or dbt runtime internals.

What Is This For?

dbt-semguard is a semantic PR guard for dbt metrics and semantic models.

It answers one question:

What changed in the meaning of this metric?

That matters because many dbt changes are valid from a parser or build point of view, but still dangerous for downstream consumers.

For example, a PR may:

  • change gross_revenue from sum(order_total) to avg(order_total)
  • remove a dimension people use to slice a KPI
  • change a ratio metric denominator
  • widen or narrow a metric filter
  • change entity or time-grain semantics

In all of those cases, dbt may still parse successfully and CI may still be green. But the business meaning of the metric has changed, and dashboards, notebooks, reverse ETL jobs, or APIs may silently start returning different answers.

dbt-semguard exists to catch that class of change before it reaches production.

What It Does Exactly

dbt-semguard does not lint YAML style and it does not validate warehouse execution.

Instead, it:

  1. reads the dbt Semantic Layer definition from two inputs
  2. extracts only the semantic parts that affect meaning
  3. builds a canonical contract for each side
  4. diffs those contracts
  5. classifies each change as breaking, risky, or safe
  6. renders the result for local CLI use or GitHub Actions

In practical terms, it helps teams review semantic changes the same way they already review code changes.
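The diff and classify steps above can be sketched in a few lines of Python. This is an illustration only, not dbt-semguard's implementation: the Change dataclass, the diff_contracts function, the flat contract shape, and the severity rules (edits to agg, expr, or filter are breaking, other edits are risky, new metrics are safe) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    metric: str
    field: str
    severity: str  # "breaking", "risky", or "safe"


# Hypothetical rule set: edits to these fields are treated as breaking.
BREAKING_FIELDS = {"agg", "expr", "filter"}


def diff_contracts(base: dict, head: dict) -> list[Change]:
    """Compare two {metric_name: {field: value}} contracts (assumed shape)."""
    changes = []
    for name in sorted(base.keys() | head.keys()):
        if name not in head:
            # Metric removed on the head side.
            changes.append(Change(name, "<metric>", "breaking"))
        elif name not in base:
            # Metric added on the head side (assumed to be safe here).
            changes.append(Change(name, "<metric>", "safe"))
        else:
            for field in sorted(base[name].keys() | head[name].keys()):
                if base[name].get(field) != head[name].get(field):
                    severity = "breaking" if field in BREAKING_FIELDS else "risky"
                    changes.append(Change(name, field, severity))
    return changes


base = {"gross_revenue": {"agg": "sum", "expr": "order_total"}}
head = {
    "gross_revenue": {"agg": "avg", "expr": "order_total"},
    "order_count": {"agg": "count", "expr": "order_id"},
}
changes = diff_contracts(base, head)
for change in changes:
    print(change)
```

With these inputs the sketch reports one breaking change (the aggregation of gross_revenue) and one safe addition (order_count).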

How It Works

The tool reduces dbt semantic definitions into a normalized contract that is easier to compare than raw YAML.

It keeps fields that affect meaning, such as:

  • semantic model identity
  • backing model name
  • entities and entity types
  • dimensions and time granularity
  • metric type
  • aggregation and expression
  • filters
  • ratio numerator and denominator

It intentionally ignores noise such as:

  • descriptions
  • docs blocks
  • YAML ordering
  • whitespace and comments

That means the output is focused on semantic drift, not formatting drift.
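One way to picture this normalization is a function that drops noise keys and sorts mapping keys before comparison. The sketch below is a guess at the general idea, not the tool's code: the NOISE_KEYS set and the canonicalize and fingerprint helpers are hypothetical names, and list order is left untouched for simplicity.

```python
import json

# Hypothetical noise keys; the real tool's exclusion list may differ.
NOISE_KEYS = {"description", "docs"}


def canonicalize(node):
    """Drop noise keys and sort mapping keys so key ordering never matters."""
    if isinstance(node, dict):
        return {
            key: canonicalize(value)
            for key, value in sorted(node.items())
            if key not in NOISE_KEYS
        }
    if isinstance(node, list):
        # List order is preserved in this sketch.
        return [canonicalize(item) for item in node]
    return node


def fingerprint(node) -> str:
    """Serialize the canonical form so two definitions can be compared as strings."""
    return json.dumps(canonicalize(node), sort_keys=True)


a = {"name": "gross_revenue", "description": "Total revenue", "agg": "sum"}
b = {"agg": "sum", "name": "gross_revenue", "description": "Revenue, gross"}
c = {"agg": "avg", "name": "gross_revenue"}

print(fingerprint(a) == fingerprint(b))  # same meaning, different docs and key order
print(fingerprint(a) == fingerprint(c))  # aggregation differs
```

Definitions a and b differ only in description text and key order, so they share a fingerprint; c changes the aggregation, so it does not.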

Install From GitHub

python -m pip install "git+https://github.com/yeaight7/dbt-semguard.git@v0.5.1"

dbt-semguard requires Python 3.11 or newer.

Install From Source

git clone https://github.com/yeaight7/dbt-semguard.git
cd dbt-semguard
python -m pip install .

How To Use It

Run locally before opening a PR

Use this when you want to sanity-check semantic changes while you are still developing:

semguard diff --base-ref main --head-ref HEAD --project-dir .
semguard check --base-ref main --head-ref HEAD --project-dir . --fail-on breaking

Typical use:

  • diff when you want to inspect what changed
  • check when you want a blocking exit code for automation or local scripts
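
For local scripts, the check command can be wrapped in a small Python helper. This is a minimal sketch that uses only the flags shown above; the helper names are hypothetical, and it assumes a non-zero exit code means the threshold was reached (a non-zero code could also signal an unrelated error, so inspect the output before acting on it).

```python
import subprocess


def build_check_command(
    base_ref: str, head_ref: str, project_dir: str, fail_on: str = "breaking"
) -> list[str]:
    """Assemble the semguard check invocation as an argument list."""
    return [
        "semguard", "check",
        "--base-ref", base_ref,
        "--head-ref", head_ref,
        "--project-dir", project_dir,
        "--fail-on", fail_on,
    ]


def semantic_check_passes(
    base_ref: str = "main", head_ref: str = "HEAD", project_dir: str = "."
) -> bool:
    """Run the check; assumes exit code 0 means the threshold was not reached."""
    result = subprocess.run(build_check_command(base_ref, head_ref, project_dir))
    return result.returncode == 0
```

Calling semantic_check_passes() from a pre-push hook or task runner requires semguard to be installed and on PATH.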

For monorepos, always point --project-dir at the dbt project root you want to analyze:

semguard diff --base-ref main --head-ref HEAD --project-dir analytics/dbt

Git ref mode and local YAML mode now both scope discovery to this directory.

Compare exported contracts directly

Use this when you want to compare two precomputed semantic contracts:

semguard diff --base-contract base-contract.json --head-contract head-contract.json --format markdown

Compare manifests explicitly

Use this when your workflow already has dbt semantic_manifest.json artifacts available:

semguard diff --base-manifest base-semantic-manifest.json --head-manifest head-semantic-manifest.json --format json

Extract a contract

Use this when you want a stable machine-readable snapshot of semantic meaning:

semguard extract --source yaml --project-dir examples/ecommerce_dbt_project --output base-contract.json
semguard extract --source manifest --manifest semantic_manifest.json --output manifest-contract.json

Configure YAML discovery with .semguard.yml

Create .semguard.yml in your dbt project root to control which YAML files are scanned:

include:
  - models/**/*.yml
  - models/**/*.yaml
  - metrics/**/*.yml
  - metrics/**/*.yaml
  - semantic_models/**/*.yml
  - semantic_models/**/*.yaml
exclude:
  - target/**
  - dbt_packages/**
  - .venv/**
  - .github/**

If the file is not present, these defaults are applied automatically.
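
To reason about which files a given pattern set would select, the following Python sketch applies include and exclude globs to project-relative paths. It is not the tool's matcher: glob_to_regex and is_scanned are hypothetical helpers, and the sketch assumes that "**/" matches zero or more directories, that "*" does not cross a "/", and that exclude rules win over include rules.

```python
import re

DEFAULT_INCLUDE = (
    "models/**/*.yml", "models/**/*.yaml",
    "metrics/**/*.yml", "metrics/**/*.yaml",
    "semantic_models/**/*.yml", "semantic_models/**/*.yaml",
)
DEFAULT_EXCLUDE = ("target/**", "dbt_packages/**", ".venv/**", ".github/**")


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex under the assumptions stated above."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_scanned(path: str, include=DEFAULT_INCLUDE, exclude=DEFAULT_EXCLUDE) -> bool:
    """Return True when path matches an include glob and no exclude glob."""
    included = any(glob_to_regex(p).fullmatch(path) for p in include)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in exclude)
    return included and not excluded


for candidate in ("models/marts/revenue.yml", "models/revenue.sql", "target/x.yml"):
    print(candidate, is_scanned(candidate))
```

If the real matcher treats "**" differently, the results for files directly under models/ or metrics/ may differ from this sketch.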

Example Review Flow

  1. A developer changes a metric or semantic model in dbt.
  2. dbt-semguard diff compares the base branch and the current branch.
  3. The tool reports semantic changes only.
  4. The team decides whether the change is acceptable, needs migration planning, or should be blocked.
  5. In CI, semguard check --fail-on breaking can fail the PR automatically.

How To Read The Result

  • breaking: the semantic meaning changed in a way that should block the PR by default
  • risky: the change may be legitimate, but downstream consumers should review it
  • safe: cosmetic-only changes that do not appear in the semantic diff
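
A threshold such as --fail-on breaking can be thought of as a comparison over an ordered severity scale. The sketch below assumes the ordering safe < risky < breaking and assumes that risky is also an accepted threshold value; SEVERITY_ORDER and is_blocking are hypothetical names, not part of the tool.

```python
# Assumed ordering of severities, lowest to highest.
SEVERITY_ORDER = {"safe": 0, "risky": 1, "breaking": 2}


def is_blocking(severities: list[str], fail_on: str = "breaking") -> bool:
    """Return True when the worst observed severity reaches the threshold."""
    if not severities:
        return False  # no semantic changes, nothing to block on
    worst = max(SEVERITY_ORDER[severity] for severity in severities)
    return worst >= SEVERITY_ORDER[fail_on]


print(is_blocking(["risky", "safe"]))              # below the default threshold
print(is_blocking(["risky"], fail_on="risky"))     # threshold lowered to risky
print(is_blocking(["breaking", "risky"]))          # breaking reaches the default
```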

Output

diff and check emit one of:

  • text
  • markdown
  • json

JSON reports contain:

  • summary
  • highest_severity
  • blocking
  • changes
  • metadata

Example Markdown report

## dbt-semguard report

### Breaking changes
#### Metric `gross_revenue`
- Metric `gross_revenue` changed aggregation from `sum` to `avg`.

Status: blocking

Example JSON report

{
  "summary": {
    "breaking": 3,
    "risky": 1,
    "safe": 0
  },
  "highest_severity": "breaking",
  "blocking": true
}
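
A downstream script can read this report with the standard json module. The sketch below uses only the fields shown in the example above; the summarize helper is a hypothetical name, and the changes and metadata keys are omitted because their structure is not shown here.

```python
import json

report_text = """
{
  "summary": {"breaking": 3, "risky": 1, "safe": 0},
  "highest_severity": "breaking",
  "blocking": true
}
"""


def summarize(text: str) -> str:
    """Build a one-line summary from a dbt-semguard JSON report."""
    report = json.loads(text)
    counts = report["summary"]
    status = "blocking" if report["blocking"] else "passing"
    return (
        f"{counts['breaking']} breaking, {counts['risky']} risky, "
        f"{counts['safe']} safe "
        f"(highest: {report['highest_severity']}, status: {status})"
    )


print(summarize(report_text))
# prints: 3 breaking, 1 risky, 0 safe (highest: breaking, status: blocking)
```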

Coverage

dbt-semguard currently covers the highest-value semantic changes in the latest dbt Semantic Layer spec.

Covered extractors and inputs:

  • Latest-spec YAML projects
  • Legacy top-level semantic_models / metrics YAML projects
  • Explicit dbt semantic_manifest.json input
  • Canonical contract JSON emitted by semguard extract

Covered semantic comparisons:

  • Semantic model add/remove and backing model changes
  • Semantic model default aggregation time dimension changes
  • Entity add/remove, type changes, and expression changes
  • Dimension add/remove, type changes, expression changes, and time granularity changes
  • Simple metric aggregation, expression, label, filter, ownership, aggregation-time, and non-additive changes
  • Ratio metric numerator and denominator changes
  • Derived metric expression and input metric changes
  • Cumulative metric input, window, grain-to-date, and period-aggregation changes
  • Conversion metric entity, calculation, base metric, conversion metric, and constant-property changes
  • Additive changes such as new entities, new dimensions, and new metrics

Current automated coverage:

  • YAML extraction for the latest spec
  • Manifest normalization
  • Semantic diff severity mapping for breaking and risky changes
  • Declarative field-coverage policy so contract fields are explicitly diffed, nested, or intentionally excluded
  • Source diagnostics in extracted YAML contracts and change reports
  • CLI extract, diff, and check
  • Sticky PR comment delivery through the GitHub Action
  • Checkout-free git ref mode
  • Smoke tests of the local action in CI before each release, plus smoke tests of the published action after each release in both git-ref and manifest modes, including manifest paths that contain spaces

Current Limitations

Known v0.5.1 limitations are intentionally narrow:

  • There is no fail-on: none advisory-only mode yet.
  • There is no allowlist for intentional semantic changes yet.
  • Manifest parsing expects dbt semantic_manifest.json, not the general-purpose dbt manifest.json artifact.
  • Legacy YAML support covers top-level semantic_models, measures, and type_params, but cross-project ref semantics are still normalized conservatively into the single model_name contract field.
  • Rename handling is intentionally conservative: a rename is treated as a removal plus an addition.
  • Source diagnostics are best-effort and currently strongest for YAML extraction; manifest-derived contracts may still lack file/line detail.
  • GitHub integration supports sticky PR comments for pull_request workflows, but does not yet manage review-thread lifecycles or inline annotations.
  • PyPI publishing is not available yet; install from GitHub or source instead.
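
The conservative rename handling noted in the list above follows naturally from comparing names as sets: a renamed metric has no counterpart on the other side, so it surfaces as one removal and one addition. A minimal Python sketch, with name_level_changes as a hypothetical helper:

```python
def name_level_changes(
    base_names: set[str], head_names: set[str]
) -> list[tuple[str, str]]:
    """Report names present on only one side; no rename matching is attempted."""
    removed = [("removed", name) for name in sorted(base_names - head_names)]
    added = [("added", name) for name in sorted(head_names - base_names)]
    return removed + added


print(name_level_changes({"gross_revenue"}, {"total_revenue"}))
# prints: [('removed', 'gross_revenue'), ('added', 'total_revenue')]
```

Because the removal side is typically the more severe one, a pure rename can still trip a blocking threshold even when the definition is otherwise unchanged.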

Use As A GitHub Action

Use the included composite action from this repository:

jobs:
  semguard:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
      pull-requests: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: yeaight7/dbt-semguard@v0.5.1
        id: semguard
        with:
          base-ref: ${{ github.event.pull_request.base.sha }}
          head-ref: ${{ github.sha }}
          fail-on: breaking
          pr-comment: true
          pr-comment-mode: sticky
          github-token: ${{ github.token }}

      - name: Inspect semguard outputs
        run: |
          echo "Highest severity: ${{ steps.semguard.outputs.highest-severity }}"
          echo "Blocking: ${{ steps.semguard.outputs.blocking }}"

The action now exposes structured outputs so downstream CI can branch on semantic severity without reparsing JSON:

  • steps.semguard.outputs.highest-severity
  • steps.semguard.outputs.blocking
  • steps.semguard.outputs.breaking-count
  • steps.semguard.outputs.risky-count
  • steps.semguard.outputs.safe-count

pr-comment-mode accepts:

  • sticky: update the previous dbt-semguard PR comment when one already exists
  • create: always publish a new PR comment instead of updating the previous one

The action writes:

  • a Markdown summary to the workflow summary
  • a JSON artifact named semguard-report
  • structured step outputs for severity and counts
  • an optional PR comment when pr-comment: true (updated in place or newly created, depending on pr-comment-mode)
  • a failing status when the configured threshold is reached

When there are zero semantic changes, the Markdown artifact and workflow summary explicitly include "No semantic changes detected." followed by "Status: passing".

This is the recommended setup when you want the semantic review to happen automatically on every PR.

If you enable pr-comment: true, the workflow needs:

  • contents: read
  • issues: write
  • pull-requests: read

For forked pull requests, the standard pull_request event usually does not get a write-capable GITHUB_TOKEN, so sticky PR comments may be unavailable unless you adopt a separate trusted workflow pattern.

Troubleshooting

Common CI and configuration issues are covered in docs/troubleshooting.md.

Migration notes (v0.5.1)

  • Git ref extraction now scopes strictly to --project-dir for monorepos.
  • YAML discovery now uses safe default include/exclude patterns.
  • Optional .semguard.yml include/exclude rules are applied in both local and git-ref YAML extraction.
  • Invalid semantic YAML now raises user-facing errors with source context instead of raw KeyError tracebacks.
  • Composite action shell steps now read user-controlled values from environment variables instead of embedding GitHub expressions directly in Bash.
  • Composite action now generates JSON, Markdown, summary text, and step outputs in a single pass before enforcing the blocking threshold.
  • Composite action report files now live in an isolated runner temp directory derived from artifact-name, which avoids workspace filename collisions in matrix-style CI jobs.
  • The repository now documents security reporting, contribution setup, and common action troubleshooting paths.

Example project

An example latest-spec dbt project lives in examples/ecommerce_dbt_project.

License

This project is open source under the MIT License. See LICENSE.
