Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks

🛡️ Data Contract Validator

Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.

🎯 What it solves

Your analytics team changes a dbt model. Your API team's FastAPI service still expects the old shape. Nobody notices until production 500s at 2 AM.

This tool sits on that boundary. It extracts the schema your dbt models produce and the schema your Pydantic models expect, compares them, and fails CI when the data side can no longer satisfy the API side.

   dbt models                 Data Contract Validator                FastAPI / Pydantic
(what the pipeline   ──▶   extract → normalize → compare   ◀──   (what the API expects)
    produces)                     ↓
                          critical issues block the build

Built for trust

A check that gates a deploy is only useful if it doesn't cry wolf. v1.1 re-architected extraction around that principle:

  • Canonical types — dbt varchar and Pydantic str are understood to be the same thing, so you don't get drowned in fake "type mismatch" warnings.
  • A real SQL parser (sqlglot) instead of regex — CTEs, || concatenation, window functions and quoted identifiers are parsed correctly.
  • Confidence-aware — if the tool can't fully resolve a model's columns (e.g. SELECT *), it will warn rather than falsely block your build.
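The canonical-type idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the `Canonical` enum, the alias tables, and the function names are all hypothetical, and a real mapping would cover many more type spellings.

```python
from enum import Enum


class Canonical(Enum):
    """Hypothetical canonical type families (not the library's actual enum)."""
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"
    TIMESTAMP = "timestamp"
    UNKNOWN = "unknown"


# Illustrative alias tables; assumed for this sketch only.
_SQL_ALIASES = {
    "varchar": Canonical.STRING, "text": Canonical.STRING, "string": Canonical.STRING,
    "int": Canonical.INTEGER, "integer": Canonical.INTEGER, "bigint": Canonical.INTEGER,
    "float": Canonical.FLOAT, "double": Canonical.FLOAT,
    "boolean": Canonical.BOOLEAN, "bool": Canonical.BOOLEAN,
    "timestamp": Canonical.TIMESTAMP, "datetime": Canonical.TIMESTAMP,
}
_PY_ALIASES = {
    "str": Canonical.STRING, "int": Canonical.INTEGER, "float": Canonical.FLOAT,
    "bool": Canonical.BOOLEAN, "datetime": Canonical.TIMESTAMP,
}


def canonical_sql(type_name: str) -> Canonical:
    # Drop length/precision such as varchar(255) and normalise case.
    base = type_name.split("(", 1)[0].strip().lower()
    return _SQL_ALIASES.get(base, Canonical.UNKNOWN)


def canonical_py(annotation: str) -> Canonical:
    return _PY_ALIASES.get(annotation.strip(), Canonical.UNKNOWN)


def types_conflict(sql_type: str, py_type: str) -> bool:
    a, b = canonical_sql(sql_type), canonical_py(py_type)
    # An unknown type on either side is not evidence of a mismatch.
    if Canonical.UNKNOWN in (a, b):
        return False
    return a is not b
```

Treating unknown types as "no conflict" is the design choice that keeps false positives down: only two positively identified, different families count as a mismatch.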

⚡ Quick start

# Install
pip install data-contract-validator

# Initialize config + CI workflow in your dbt project
contract-validator init --interactive

# Sanity-check the setup
contract-validator test

# Validate
contract-validator validate

One-off validation (no config file)

# Local dbt project against a local Pydantic models file or directory
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/app/models.py

# dbt project against models in another GitHub repo (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api" \
  --fastapi-path "app/models.py"

🔍 How extraction works (and why it's accurate)

dbt side — tiered, best-source-wins

| Tier | Source | Types | Confidence | Notes |
|------|--------|-------|------------|-------|
| 1 | target/catalog.json | Real warehouse types | high | Produced by dbt docs generate. Most accurate. |
| 2 | sqlglot SQL parse | Inferred (often unknown) | medium | Trusted column names; enriched with documented types from manifest.json. Detects SELECT *. |
| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |

The tool auto-detects what's available and degrades gracefully — so it works offline in pre-commit and with full type fidelity in a warehouse-connected CI job.

💡 Tip: run dbt docs generate in CI before validating to unlock Tier 1 (real types). Without it, you still get accurate column-presence checks from Tier 2.
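To make the tier fallback concrete, here is a rough sketch of how Tier 1 detection and loading might look using only the standard library. The function names are hypothetical and this is not the tool's code; it assumes the usual dbt `catalog.json` layout, where each entry under `nodes` has a `metadata.name` and a `columns` mapping whose values carry `name` and `type`.

```python
import json
from pathlib import Path


def pick_tier(project_path: str) -> int:
    """Return 1 if target/catalog.json is present, else 2 (fall back to SQL parsing)."""
    catalog = Path(project_path) / "target" / "catalog.json"
    return 1 if catalog.is_file() else 2


def columns_from_catalog(project_path: str) -> dict:
    """Map model name -> {column name: warehouse type} from catalog.json."""
    catalog = Path(project_path) / "target" / "catalog.json"
    data = json.loads(catalog.read_text())
    schemas = {}
    for node in data.get("nodes", {}).values():
        # Warehouses differ in identifier casing, so lowercase for comparison.
        name = node["metadata"]["name"].lower()
        schemas[name] = {
            col["name"].lower(): col["type"]
            for col in node.get("columns", {}).values()
        }
    return schemas
```

The sketch only distinguishes Tier 1 from everything else; the sqlglot and regex tiers are left out.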

FastAPI side

Pydantic / SQLModel classes are parsed from source with Python's ast (no imports executed). Optional[...] controls whether a field is required; table=True SQLModel classes (DB tables, not API contracts) are skipped.

🚦 What gets flagged

| Severity | Meaning | Example |
|----------|---------|---------|
| 🚨 Critical | Blocks the build | API requires a column the dbt model no longer produces |
| ⚠️ Warning | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |

$ contract-validator validate

🛡️ Data Contract Validation Results:
Status:  FAILED
Critical: 1 | Warnings: 0

🚨 Critical Issues (Must Fix):
  💥 user_analytics
     Column: total_orders
     Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
     🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
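The severity rule for missing columns can be approximated as below. This is an illustrative sketch under stated assumptions: the `Issue` dataclass, the `resolved` flag, and the function name are hypothetical stand-ins, and the warning message text is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str  # "critical" or "warning"
    table: str
    column: str
    message: str


def missing_column_issues(table, source_columns, target_fields, resolved):
    """Compare one table.

    source_columns: set of column names the dbt model provides
    target_fields:  {field name: is it required?} from the Pydantic model
    resolved:       False when the source columns could not be fully
                    determined (for example, the model uses SELECT *)
    """
    issues = []
    for name, required in target_fields.items():
        if name in source_columns:
            continue
        # Block only when the field is required AND the source schema is trusted.
        if required and resolved:
            severity = "critical"
            message = f"Target REQUIRES column '{name}' but source doesn't provide it"
        else:
            severity = "warning"
            message = f"Column '{name}' not found in source"
        issues.append(Issue(severity, table, name, message))
    return issues
```

With `resolved=False`, the same missing required column is downgraded to a warning, which is what keeps an unresolved `SELECT *` model from blocking a build.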

🔧 Configuration (.retl-validator.yml)

version: "1.0"
name: "my-project-contracts"

source:
  dbt:
    project_path: "."
    auto_compile: true
    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
    disable_manifest: false

target:
  fastapi:
    # GitHub repo:
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    # ...or local:
    # type: "local"
    # path: "../my-api/app/models.py"

# Optional: explicit mapping for when names don't line up by convention.
mapping:
  tables:
    # target (Pydantic) table : source (dbt) model
    user_analytics: user_analytics_summary
  columns:
    user_analytics:
      # target column : source column
      userId: user_id

validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]

When do I need mapping?

By default, names are matched across snake_case, camelCase, and differences in letter case (UserAnalytics ↔ user_analytics, userId ↔ user_id). Reach for mapping only when a model or column is named so differently that the convention can't bridge it (e.g. Pydantic user_id ↔ dbt customer_identifier).
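One plausible way to implement this kind of convention matching is to normalise both names to lowercase snake_case before comparing, with an explicit mapping taking precedence. The helpers below are hypothetical and only sketch the idea; the tool's real matching rules may differ.

```python
import re


def normalize_name(name: str) -> str:
    """Reduce snake_case, camelCase and PascalCase to one lowercase snake form."""
    # Underscore at each lower/digit -> upper boundary: userId -> user_Id
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split acronym runs before a capitalised word: HTTPServer -> HTTP_Server
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", spaced)
    return spaced.lower()


def resolve_source_name(target_name, source_names, explicit=None):
    """Find the source name for a target name; an explicit mapping wins."""
    explicit = explicit or {}
    if target_name in explicit:
        return explicit[target_name]
    wanted = normalize_name(target_name)
    for candidate in source_names:
        if normalize_name(candidate) == wanted:
            return candidate
    return None  # no conventional match; a mapping entry would be needed
```

For example, `resolve_source_name("userId", ["user_id", "email"])` finds `user_id` by convention, while `user_id` against `customer_identifier` only resolves when the explicit mapping supplies it.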

🐍 Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

dbt = DBTExtractor(project_path="./dbt-project")
fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")

validator = ContractValidator(
    source_extractor=dbt,
    target_extractor=fastapi,
    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
)
result = validator.validate()

if not result.success:
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

🪝 CI / pre-commit integration

GitHub Actions

contract-validator init generates a workflow for you. Minimal version:

name: 🛡️ Data Contract Validation
on:
  pull_request:
    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: "3.11" }
      - run: pip install data-contract-validator
      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
      - run: contract-validator validate --output github
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Pre-commit

contract-validator setup-precommit --install-hooks

The hook entry in .pre-commit-config.yaml:

repos:
  - repo: https://github.com/OGsiji/data-contract-validator
    rev: v1.1.0
    hooks:
      - id: contract-validation

🧪 Output formats

contract-validator validate --output terminal   # human-friendly (default)
contract-validator validate --output json        # machine-readable for CI
contract-validator validate --output github       # GitHub Actions annotations

🚀 Supported frameworks

Source: dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …). Target: FastAPI (Pydantic v2 + SQLModel).

The extractor architecture is intentionally pluggable (BaseExtractor → Dict[str, Schema] with canonical types), so additional sources/targets can be added without touching the validator. Open an issue to request one.

🛠️ Development & testing

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"     # or: pip install -e ".[test]"

# Run the suite
pytest

# Lint / format
black data_contract_validator tests

The test suite covers the canonical type system (tests/test_core/test_types.py), the tiered dbt extractor including sqlglot CTE handling and catalog.json (tests/test_extractors/test_dbt.py), and the confidence/mapping behavior of the validator (tests/test_core/test_validator.py).

Adding an extractor

from data_contract_validator.extractors.base import BaseExtractor
from data_contract_validator.core.types import CanonicalType

class MyExtractor(BaseExtractor):
    def extract_schemas(self):
        # return Dict[str, Schema]; use self._make_column(...) so each column
        # carries a canonical_type the validator can compare.
        ...

🗺️ Roadmap

  • Real compatibility semantics (nullability, additive vs. breaking changes)
  • Reporter/logging abstraction (quiet/embeddable core)
  • A canonical, language-neutral contract artifact + baseline/snapshot diffing
  • More targets (Django, SQLAlchemy, GraphQL, OpenAPI)

📄 License

MIT — see LICENSE.

🆘 Support

If this saves you a production incident, please ⭐ the repo.
