Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks

🛡️ Data Contract Validator

Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.

🎯 What it solves

Your analytics team changes a dbt model. Your API team's FastAPI service still expects the old shape. Nobody notices until production 500s at 2 AM.

This tool sits on that boundary. It extracts the schema your dbt models produce and the schema your Pydantic models expect, compares them, and fails CI when the data side can no longer satisfy the API side.

   dbt models                 Data Contract Validator                FastAPI / Pydantic
(what the pipeline   ──▶   extract → normalize → compare   ◀──   (what the API expects)
    produces)                     ↓
                          critical issues block the build

Built for trust

A check that gates a deploy is only useful if it doesn't cry wolf. v1.1 re-architected extraction around that principle:

- Canonical types: dbt varchar and Pydantic str are treated as the same type, so you aren't flooded with spurious "type mismatch" warnings.
- A real SQL parser (sqlglot) instead of regex: CTEs, || concatenation, window functions and quoted identifiers are parsed correctly.
- Confidence-aware: if the tool can't fully resolve a model's columns (e.g. SELECT *), it warns rather than falsely blocking your build.
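
To make the canonical-type idea concrete, here is a minimal sketch. The alias table and the `to_canonical` / `types_compatible` helpers are hypothetical and written only for illustration; the package's real type names and internals are not documented in this README.

```python
import re

# Hypothetical alias table: illustrative only, not the package's real mapping.
CANONICAL_ALIASES = {
    "string": {"varchar", "char", "text", "string", "str"},
    "integer": {"int", "integer", "bigint", "smallint"},
    "float": {"float", "double", "real", "numeric", "decimal"},
    "boolean": {"bool", "boolean"},
    "timestamp": {"timestamp", "datetime", "timestamptz"},
}


def to_canonical(type_name: str) -> str:
    """Map a warehouse or Python type name to a canonical family."""
    # Drop length/precision suffixes such as varchar(255) or numeric(10, 2).
    base = re.sub(r"\(.*\)", "", type_name).strip().lower()
    for canonical, aliases in CANONICAL_ALIASES.items():
        if base in aliases:
            return canonical
    return "unknown"


def types_compatible(source_type: str, target_type: str) -> bool:
    """Treat an unresolved type as compatible so it cannot raise a false alarm."""
    source, target = to_canonical(source_type), to_canonical(target_type)
    if "unknown" in (source, target):
        return True
    return source == target
```

Under these assumptions, `types_compatible("VARCHAR(255)", "str")` holds while `types_compatible("bigint", "str")` does not, and a type the table doesn't know is never reported as a mismatch.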

⚡ Quick start

pip install data-contract-validator

# Initialize config + CI workflow in your dbt project
contract-validator init --interactive

# Sanity-check the setup
contract-validator test

# Validate
contract-validator validate

🚀 Getting started, step by step

If you're setting this up on a project for the first time, the order below avoids the sharp edges:

Install into the same environment dbt runs in (not a separate venv) — the tool needs to see your dbt project:
```
pip install data-contract-validator
```
Already have .retl-validator.yml committed by a teammate? Skip to step 5.
Generate the config + CI workflow (one-time):
```
contract-validator init --interactive
```
You'll be asked: where your dbt project is, which API framework you use, whether your models live in this local project or a different GitHub repo, and then the local path (or the org/repo + path within it). It's asked explicitly rather than guessed from the path's shape — a local path like app/models is syntactically identical to a GitHub org/repo string, so there's no reliable way to infer which one you mean. If you pick GitHub, it checks the path actually exists before writing the config — so a typo surfaces here instead of at validate time.

init refuses to touch an existing .retl-validator.yml or workflow file — it won't clobber hand-added mapping entries just because you upgraded the package and re-ran init. Pass --force if you really want to regenerate them from the new version's defaults.
Pre-commit hook: init --interactive asks whether you want one set up right after creating the config and CI workflow — say yes there and it's done. To add one later (or if you used non-interactive init, which doesn't prompt), run it standalone:
```
contract-validator setup-precommit --install-hooks
```
If the target repo is private, set a token before running anything that talks to GitHub locally:
```
export GITHUB_TOKEN=$(gh auth token)   # or a PAT with repo read access
```
See Private GitHub repos need GITHUB_TOKEN below for why this is easy to miss.
Sanity-check the setup:
```
contract-validator test
```
Confirms the config parses, the dbt project is found, and the target (local path or GitHub path) is reachable. If this fails, validate will fail the same way — fix it here first.
Run it:
```
contract-validator validate
```
When it reports a critical issue, diagnose before assuming your dbt model is wrong:
- Real missing column/table → fix the dbt model.
- Target name doesn't match the dbt model by convention (renamed/prefixed) → add an entry under mapping.tables in .retl-validator.yml (see When do I need mapping?).
- A table that's genuinely populated by something other than dbt (e.g. a separate streaming pipeline) and has no source model on purpose → add it to mapping.exclude. table=True alone is not used to infer this automatically — see FastAPI side for why.
For accurate type-checking (not just column-presence checks), run dbt docs generate before validate so it picks up catalog.json (Tier 1, real warehouse types) instead of inferring from SQL text — see How extraction works below.

One-off validation (no config file)

# Local dbt project against a local Pydantic models file or directory
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/app/models.py

# dbt project against models in another GitHub repo (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api" \
  --fastapi-path "app/models.py"

🔍 How extraction works (and why it's accurate)

dbt side — tiered, best-source-wins

| Tier | Source | Types | Confidence | Notes |
|------|--------|-------|------------|-------|
| 1 | `target/catalog.json` | Real warehouse types | high | Produced by `dbt docs generate`. Most accurate. |
| 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column names; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |

The tool auto-detects what's available and degrades gracefully — so it works offline in pre-commit and with full type fidelity in a warehouse-connected CI job.
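
The best-source-wins rule can be sketched roughly as follows. `pick_tier` and its `sql_parse_ok` parameter are hypothetical names, and the condition for falling back to Tier 3 (here, "the SQL parser could not handle the model") is an assumption; only the tier order, the confidence labels and the `disable_manifest` switch come from this README.

```python
from pathlib import Path


def pick_tier(
    project_path: str,
    disable_manifest: bool = False,
    sql_parse_ok: bool = True,  # assumption: Tier 3 is used only when parsing fails
) -> tuple[int, str]:
    """Return (tier, confidence) for the best schema source that is available."""
    catalog = Path(project_path) / "target" / "catalog.json"
    # Tier 1: catalog.json from `dbt docs generate` carries real warehouse types.
    if catalog.is_file() and not disable_manifest:
        return 1, "high"
    # Tier 2: column names from a real SQL parse of the model.
    if sql_parse_ok:
        return 2, "medium"
    # Tier 3: regex fallback; results at this level should never block a build.
    return 3, "low"
```

In a pre-commit run without a `target/` directory this sketch returns `(2, "medium")`; once `dbt docs generate` has written `catalog.json`, the same call returns `(1, "high")`.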

💡 Tip: run dbt docs generate in CI before validating to unlock Tier 1 (real types). Without it, you still get accurate column-presence checks from Tier 2. The workflow init generates includes this step already, commented out — it needs your warehouse adapter and credentials filled in, which can't be guessed, so it isn't active by default.
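
For a sense of what Tier 1 reads, the sketch below pulls column names and warehouse types out of a `catalog.json` document. The `columns_from_catalog` helper and the sample data are illustrative, not the package's own code. It assumes the standard dbt catalog layout (`nodes` → `metadata.name` and `columns`) and that the relation name equals the model name, which will not hold for models that set an alias.

```python
import json


def columns_from_catalog(catalog_text: str) -> dict[str, dict[str, str]]:
    """Map each relation name to {column_name: warehouse_type}."""
    catalog = json.loads(catalog_text)
    schemas: dict[str, dict[str, str]] = {}
    for node in catalog.get("nodes", {}).values():
        # Some warehouses (e.g. Snowflake) report upper-case names, so normalize.
        relation = node["metadata"]["name"].lower()
        schemas[relation] = {
            column["name"].lower(): column["type"]
            for column in node.get("columns", {}).values()
        }
    return schemas


# A trimmed, hand-written sample in the shape dbt writes to target/catalog.json.
SAMPLE = """
{"nodes": {"model.shop.user_analytics": {
  "metadata": {"name": "USER_ANALYTICS", "schema": "ANALYTICS", "type": "BASE TABLE"},
  "columns": {
    "USER_ID": {"name": "USER_ID", "type": "NUMBER", "index": 1},
    "EMAIL": {"name": "EMAIL", "type": "VARCHAR", "index": 2}
  }}}}
"""

schemas = columns_from_catalog(SAMPLE)
```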

FastAPI side

Pydantic / SQLModel classes are parsed from source with Python's ast (no imports executed). Optional[...] controls whether a field is required. An explicit __tablename__ is used as the table name when present; otherwise the class name is converted to snake_case.

table=True SQLModel classes are validated the same as any other class — they are not skipped. Whether a table is meant to come from dbt is business knowledge that isn't recoverable from the Python source: two structurally identical table=True classes can need opposite treatment (one is a normal dbt-fed table your API also returns directly; another is populated by a Kafka stream and was never meant to have a dbt model). Use mapping.exclude to state the latter case explicitly rather than relying on table=True to imply it.

🚦 What gets flagged

| Severity | Meaning | Example |
|----------|---------|---------|
| 🚨 Critical | Blocks the build | API requires a column the dbt model no longer produces |
| ⚠️ Warning | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |

$ contract-validator validate

🛡️ Data Contract Validation Results:
Status: ❌ FAILED
Critical: 1 | Warnings: 0

🚨 Critical Issues (Must Fix):
  💥 user_analytics
     Column: total_orders
     Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
     🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
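
The severity rule above can be sketched as a small function. `Issue` and `check_table` are hypothetical stand-ins, not the package's classes; the sketch assumes that a missing column blocks the build only when it is required, the source model was fully resolved, and the extraction tier is not the low-confidence regex fallback.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str  # "critical" or "warning"
    table: str
    column: str
    message: str


def check_table(
    table: str,
    source_columns: set[str],
    target_columns: dict[str, bool],  # column name -> required?
    confidence: str,  # "high", "medium" or "low"
    fully_resolved: bool = True,  # False when e.g. SELECT * hid the column list
) -> list[Issue]:
    issues = []
    for column, required in target_columns.items():
        if column in source_columns:
            continue
        # Block only when the column is required AND the source schema is trustworthy.
        can_block = required and fully_resolved and confidence != "low"
        issues.append(
            Issue(
                severity="critical" if can_block else "warning",
                table=table,
                column=column,
                message=f"Target expects column '{column}' but source doesn't provide it",
            )
        )
    return issues
```

With a fully resolved, high-confidence source, a missing required `total_orders` comes back as critical, matching the sample output; the same gap on a `SELECT *` model would only warn.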

🔧 Configuration (`.retl-validator.yml`)

version: "1.0"
name: "my-project-contracts"

source:
  dbt:
    project_path: "."
    auto_compile: true
    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
    disable_manifest: false

target:
  fastapi:
    # GitHub repo:
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    # ...or local:
    # type: "local"
    # path: "../my-api/app/models.py"

# Optional: explicit mapping for when names don't line up by convention.
mapping:
  tables:
    # target (Pydantic) table : source (dbt) model
    user_analytics: user_analytics_summary
  columns:
    user_analytics:
      # target column : source column
      userId: user_id
  # Target tables with no source model on purpose (e.g. Kafka-populated,
  # not dbt) -- see "When do I need mapping?" below.
  exclude:
    - feed_interaction

validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]

Private GitHub repos need `GITHUB_TOKEN`

If target.*.repo points at a private repository, contract-validator needs a token with read access to it. Where that token comes from is different locally vs. in CI — and the CI case has a sharp edge worth understanding before it silently fails on a PR.

Locally, set the GITHUB_TOKEN environment variable before running the CLI. On bash/zsh that's export (there's nothing to install — export just makes the variable visible to the contract-validator process you run next):

export GITHUB_TOKEN=$(gh auth token)   # or a PAT with repo read access
contract-validator validate

GitHub's API returns 404 (not 403) for an unauthenticated request to a private path, so without a token a private repo looks identical to a plain typo in path. Both contract-validator init --interactive and contract-validator test check that target.*.path actually exists, and they point you at this cause if the lookup returns 404 with no token set.
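
The ambiguity can be illustrated with a small sketch. `build_contents_request` and `explain_status` are hypothetical helpers, not functions from this package; the only external fact relied on is GitHub's REST endpoint for repository contents, `GET /repos/{owner}/{repo}/contents/{path}`. The request is built but never sent, so the sketch runs offline.

```python
import urllib.request
from typing import Optional


def build_contents_request(repo: str, path: str, token: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST 'get repository content' request."""
    request = urllib.request.Request(f"https://api.github.com/repos/{repo}/contents/{path}")
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def explain_status(status: int, token_set: bool) -> str:
    """Turn an HTTP status into a hint, following the 404 behavior described above."""
    if status == 200:
        return "ok"
    if status == 404 and not token_set:
        return "not found: the path may be wrong, or the repo is private and GITHUB_TOKEN is unset"
    if status == 404:
        return "not found: check the path, and that the token can read this repo"
    return f"unexpected HTTP status {status}"
```

A caller would typically pass `os.environ.get("GITHUB_TOKEN")` as the token, so that an unset variable simply produces an unauthenticated request and the first of the two 404 hints.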

In CI, the workflow that init generates for a GitHub target sets GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}. That is a token you create, not the auto-provided secrets.GITHUB_TOKEN. The auto-provided token only has access to the repository the workflow is running in, so if your dbt repo and your API repo are different repos and the API repo is private, it silently can't read the target. A PAT works identically for a public target, so there's no reason to default to the token that only sometimes works. To finish the setup the generated workflow expects:

Create a token with read access to the target repo — a fine-grained PAT scoped to just that repo's Contents (read-only) is the least-privilege option; a classic PAT with the repo scope also works.
In the repo running the workflow (your dbt repo): Settings → Secrets and variables → Actions → New repository secret. Name it API_REPO_TOKEN exactly (that's the name the generated workflow already references) and paste the token as the value.
⚠️ GitHub rejects any secret name starting with GITHUB_ — it's a reserved prefix. You cannot create a secret literally called GITHUB_TOKEN; that's not a naming suggestion, the UI will refuse it. That's exactly why the workflow's secret is named API_REPO_TOKEN instead, even though the environment variable it feeds is GITHUB_TOKEN — two different things with confusingly similar names:
```
env:
  GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}
# ^^^^^^^^^^^^  environment variable name -- the CLI looks for
#               the token under exactly this name
#                           ^^^^^^^^^^^^^^ the *secret's* name --
#                           this is what GitHub restricts
```

Skip all of this for a local target — init omits the whole env: block since a local target never talks to the GitHub API at all.

When do I need `mapping`?

Most of the time you don't. Names are matched automatically across:

- snake_case / camelCase / casing: UserAnalytics → user_analytics, userId → user_id
- plural ↔ singular: dbt's plural users matches Pydantic's User (→ user) with no config, and it won't over-match (address is never confused with addres).

Reach for mapping.tables / mapping.columns only when a model or column is named so differently that convention can't bridge it (e.g. Pydantic user_id ↔ dbt customer_identifier).
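
The convention matching described above can be approximated in a few lines. This is a deliberately naive sketch with hypothetical helper names; the package's real rules are likely more careful. Here, singularization just strips one trailing "s" unless the word ends in "ss".

```python
import re


def normalize(name: str) -> str:
    """Lower-case snake_case, with a cautious singular form."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
    # Strip one trailing "s", but leave words ending in "ss" (address) alone.
    if snake.endswith("s") and not snake.endswith("ss"):
        snake = snake[:-1]
    return snake


def names_match(source_name: str, target_name: str) -> bool:
    """Compare two names after applying the same normalization to both sides."""
    return normalize(source_name) == normalize(target_name)
```

Because both sides go through the same normalization, `users` matches `User` and `user_id` matches `userId`, while `customer_identifier` and `user_id` still need an explicit mapping.columns entry.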

mapping.exclude is different — it's not about renamed models, it's for a target table that has no source model on purpose, because it's populated by something other than dbt (a Kafka stream, a cron job, etc.). This can't be inferred from the code (a table=True SQLModel class looks identical whether or not dbt is supposed to feed it), so it has to be a deliberate, human-stated exception:

mapping:
  exclude:
    - feed_interaction
    - affiliate_reward

Anything not listed is validated normally — including table=True classes, which are treated the same as any other target and are not silently skipped.
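
Put together, the lookup order for a target table might look like the sketch below. `resolve_source` is a hypothetical function, not part of the package's API; it takes a dict shaped like the mapping block of .retl-validator.yml and, for brevity, uses an exact name match where the real tool applies its naming conventions.

```python
from typing import Optional


def resolve_source(target_table: str, mapping: dict, source_models: set[str]) -> Optional[str]:
    """Return the dbt model to compare against, or None if the table is excluded.

    Raises LookupError when the table is neither excluded nor resolvable.
    """
    # Excluded tables are skipped on purpose: they have no dbt model by design.
    if target_table in mapping.get("exclude", []):
        return None
    # An explicit mapping.tables entry wins over the naming convention.
    candidate = mapping.get("tables", {}).get(target_table, target_table)
    if candidate in source_models:
        return candidate
    raise LookupError(f"no source model found for target table '{target_table}'")
```

The important property is the last line: a table that is neither mapped, matched nor excluded is reported rather than skipped, which is what keeps table=True classes from slipping through unvalidated.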

🐍 Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

dbt = DBTExtractor(project_path="./dbt-project")
fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")

validator = ContractValidator(
    source_extractor=dbt,
    target_extractor=fastapi,
    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
)
result = validator.validate()

if not result.success:
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

🪝 CI / pre-commit integration

GitHub Actions

contract-validator init generates a workflow for you. Minimal version:

name: 🛡️ Data Contract Validation
on:
  pull_request:
    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: "3.11" }
      - run: pip install data-contract-validator
      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
      - run: contract-validator validate --output github
        env:
          GITHUB_TOKEN: ${{ secrets.API_REPO_TOKEN }}

GITHUB_TOKEN here is only needed if target is a github repo (init omits the whole env: block for a local target). secrets.API_REPO_TOKEN is a token you create yourself, not GitHub's auto-provided secrets.GITHUB_TOKEN — see Private GitHub repos need GITHUB_TOKEN above for why, and how to set it up.

Pre-commit

contract-validator setup-precommit --install-hooks

repos:
  - repo: https://github.com/OGsiji/data-contract-validator
    rev: v1.1.0
    hooks:
      - id: contract-validation

🧪 Output formats

contract-validator validate --output terminal   # human-friendly (default)
contract-validator validate --output json       # machine-readable for CI
contract-validator validate --output github     # GitHub Actions annotations

🚀 Supported frameworks

Source: dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …). Target: FastAPI (Pydantic v2 + SQLModel).

The extractor architecture is intentionally pluggable (BaseExtractor → Dict[str, Schema] with canonical types), so additional sources/targets can be added without touching the validator. Open an issue to request one.

🛠️ Development & testing

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"     # or: pip install -e ".[test]"

# Run the suite
pytest

# Lint / format
black data_contract_validator tests

The test suite covers the canonical type system (tests/test_core/test_types.py), the tiered dbt extractor including sqlglot CTE handling and catalog.json (tests/test_extractors/test_dbt.py), and the confidence/mapping behavior of the validator (tests/test_core/test_validator.py).

Adding an extractor

from data_contract_validator.extractors.base import BaseExtractor
from data_contract_validator.core.types import CanonicalType

class MyExtractor(BaseExtractor):
    def extract_schemas(self):
        # return Dict[str, Schema]; use self._make_column(...) so each column
        # carries a canonical_type the validator can compare.
        ...

🗺️ Roadmap

- Real compatibility semantics (nullability, additive vs. breaking changes)
- Reporter/logging abstraction (quiet/embeddable core)
- A canonical, language-neutral contract artifact + baseline/snapshot diffing
- More targets (Django, SQLAlchemy, GraphQL, OpenAPI)

📄 License

MIT — see LICENSE.

🆘 Support

🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
📧 Email: ogunniransiji@gmail.com

If this saves you a production incident, please ⭐ the repo.

Release history

- 1.1.7 (this version): Jul 3, 2026
- 1.1.1: Jun 30, 2026
- 1.1.0: Jun 30, 2026
- 1.0.5: Oct 24, 2025
- 1.0.4: Sep 17, 2025
- 1.0.4b0 (pre-release): Sep 18, 2025
- 1.0.4a0 (pre-release): Sep 18, 2025
- 1.0.3: Aug 10, 2025
- 1.0.2: Aug 10, 2025
- 1.0.2.dev1 (pre-release): Aug 10, 2025
- 1.0.1: Aug 10, 2025
- 1.0.0: Aug 10, 2025

Download files

Source Distribution

data_contract_validator-1.1.7.tar.gz (51.4 kB view details)

Uploaded Jul 3, 2026 Source

Built Distribution

data_contract_validator-1.1.7-py3-none-any.whl (44.3 kB view details)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file data_contract_validator-1.1.7.tar.gz.

File metadata

Download URL: data_contract_validator-1.1.7.tar.gz
Upload date: Jul 3, 2026
Size: 51.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for data_contract_validator-1.1.7.tar.gz
Algorithm	Hash digest
SHA256	`e31e13ae518593ac7d2852a750c9563e9db977ab01dd342c7c09358f7b858df2`
MD5	`2e9f863c70bea77879312ecc32f000cf`
BLAKE2b-256	`07f203f12817ac056a3005979c953cd4bb8096e5a440a417d3787ac2bb92f5b1`

Provenance

The following attestation bundles were made for data_contract_validator-1.1.7.tar.gz:

Publisher: publish.yml on OGsiji/data-contract-validator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_contract_validator-1.1.7.tar.gz
- Subject digest: e31e13ae518593ac7d2852a750c9563e9db977ab01dd342c7c09358f7b858df2
- Sigstore transparency entry: 2064490109
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: OGsiji/data-contract-validator@09f28da9305b36c6f52f8996aca01b7bf9e1222e
- Branch / Tag: refs/tags/v1.1.7
- Owner: https://github.com/OGsiji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@09f28da9305b36c6f52f8996aca01b7bf9e1222e
- Trigger Event: release

File details

Details for the file data_contract_validator-1.1.7-py3-none-any.whl.

File metadata

Download URL: data_contract_validator-1.1.7-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 44.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for data_contract_validator-1.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3fec515db24dc49c6a0831f80de494de689191ed679343d2f1e9428f6b22ee07`
MD5	`12d6b4160a8acff91150e32bc505fce2`
BLAKE2b-256	`d7cbc323354f9b2f087de26143c96a22322abdd399d7f532e36fd8f58fbcfe33`

Provenance

The following attestation bundles were made for data_contract_validator-1.1.7-py3-none-any.whl:

Publisher: publish.yml on OGsiji/data-contract-validator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: data_contract_validator-1.1.7-py3-none-any.whl
- Subject digest: 3fec515db24dc49c6a0831f80de494de689191ed679343d2f1e9428f6b22ee07
- Sigstore transparency entry: 2064490124
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: OGsiji/data-contract-validator@09f28da9305b36c6f52f8996aca01b7bf9e1222e
- Branch / Tag: refs/tags/v1.1.7
- Owner: https://github.com/OGsiji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@09f28da9305b36c6f52f8996aca01b7bf9e1222e
- Trigger Event: release
