Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks

🛡️ Data Contract Validator

Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.

🎯 What it solves

Your analytics team changes a dbt model. Your API team's FastAPI service still expects the old shape. Nobody notices until production 500s at 2 AM.

This tool sits on that boundary. It extracts the schema your dbt models produce and the schema your Pydantic models expect, compares them, and fails CI when the data side can no longer satisfy the API side.

   dbt models                 Data Contract Validator                FastAPI / Pydantic
(what the pipeline   ──▶   extract → normalize → compare   ◀──   (what the API expects)
    produces)                     ↓
                          critical issues block the build

Built for trust

A check that gates a deploy is only useful if it doesn't cry wolf. v1.1 re-architected extraction around that principle:

  • Canonical types — dbt varchar and Pydantic str are understood to be the same thing, so you don't get drowned in fake "type mismatch" warnings.
  • A real SQL parser (sqlglot) instead of regex — CTEs, || concatenation, window functions and quoted identifiers are parsed correctly.
  • Confidence-aware — if the tool can't fully resolve a model's columns (e.g. SELECT *), it will warn rather than falsely block your build.
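To make the canonical-type idea concrete, here is a minimal sketch of how a warehouse type and a Python annotation might be reduced to one shared vocabulary before comparison. The `Canonical` enum, the lookup tables, and `types_compatible` are hypothetical names chosen for illustration; they are not the library's actual internals.

```python
import re
from enum import Enum


class Canonical(Enum):
    """Hypothetical shared vocabulary for both sides of the contract."""
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"
    TIMESTAMP = "timestamp"
    UNKNOWN = "unknown"


# Illustrative lookup tables (assumed, deliberately incomplete).
_SQL_TYPES = {
    "varchar": Canonical.STRING, "text": Canonical.STRING, "string": Canonical.STRING,
    "int": Canonical.INTEGER, "integer": Canonical.INTEGER, "bigint": Canonical.INTEGER,
    "float": Canonical.FLOAT, "double": Canonical.FLOAT, "real": Canonical.FLOAT,
    "bool": Canonical.BOOLEAN, "boolean": Canonical.BOOLEAN,
    "timestamp": Canonical.TIMESTAMP,
}
_PY_TYPES = {
    "str": Canonical.STRING, "int": Canonical.INTEGER, "float": Canonical.FLOAT,
    "bool": Canonical.BOOLEAN, "datetime": Canonical.TIMESTAMP,
}


def canonical_sql(type_name: str) -> Canonical:
    # Drop length/precision arguments such as VARCHAR(255) before the lookup.
    base = re.sub(r"\(.*\)", "", type_name).strip().lower()
    return _SQL_TYPES.get(base, Canonical.UNKNOWN)


def canonical_py(type_name: str) -> Canonical:
    return _PY_TYPES.get(type_name.strip(), Canonical.UNKNOWN)


def types_compatible(sql_type: str, py_type: str) -> bool:
    a, b = canonical_sql(sql_type), canonical_py(py_type)
    # An unknown type cannot prove a mismatch, so it is not reported as one.
    if Canonical.UNKNOWN in (a, b):
        return True
    return a is b
```

With this sketch, `types_compatible("VARCHAR(255)", "str")` holds while `types_compatible("bigint", "str")` does not, which is the kind of distinction that keeps spurious mismatch warnings out of the report.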

⚡ Quick start

pip install data-contract-validator

# Initialize config + CI workflow in your dbt project
contract-validator init --interactive

# Sanity-check the setup
contract-validator test

# Validate
contract-validator validate

One-off validation (no config file)

# Local dbt project against a local Pydantic models file or directory
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/app/models.py

# dbt project against models in another GitHub repo (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api" \
  --fastapi-path "app/models.py"

🔍 How extraction works (and why it's accurate)

dbt side — tiered, best-source-wins

| Tier | Source | Types | Confidence | Notes |
|------|--------|-------|------------|-------|
| 1 | target/catalog.json | Real warehouse types | high | Produced by dbt docs generate. Most accurate. |
| 2 | sqlglot SQL parse | Inferred (often unknown) | medium | Trusted column names; enriched with documented types from manifest.json. Detects SELECT *. |
| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |

The tool auto-detects what's available and degrades gracefully — so it works offline in pre-commit and with full type fidelity in a warehouse-connected CI job.

💡 Tip: run dbt docs generate in CI before validating to unlock Tier 1 (real types). Without it, you still get accurate column-presence checks from Tier 2.
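The tier selection described above could look roughly like the following sketch. It only decides which tier applies; it does not parse anything. `Tier` and `pick_tier` are hypothetical names, and the real detection logic in the package may check more than this (for example manifest.json).

```python
import importlib.util
from pathlib import Path
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    number: int
    source: str
    confidence: str


def pick_tier(project_path: str, sqlglot_available: Optional[bool] = None) -> Tier:
    """Pick the best extraction tier for a dbt project (illustrative only)."""
    if sqlglot_available is None:
        # Check whether sqlglot can be imported, without importing it.
        sqlglot_available = importlib.util.find_spec("sqlglot") is not None

    # Tier 1: catalog.json written by `dbt docs generate`.
    if (Path(project_path) / "target" / "catalog.json").is_file():
        return Tier(1, "target/catalog.json", "high")
    # Tier 2: parse the model SQL with sqlglot.
    if sqlglot_available:
        return Tier(2, "sqlglot SQL parse", "medium")
    # Tier 3: regex fallback, never strong enough to block a build.
    return Tier(3, "regex parse", "low")
```

Calling `pick_tier(".")` in a project without a `target/` directory would fall through to Tier 2 or 3, which matches the offline pre-commit scenario.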

FastAPI side

Pydantic / SQLModel classes are parsed from source with Python's ast (no imports executed). Optional[...] controls whether a field is required; table=True SQLModel classes (DB tables, not API contracts) are skipped.

🚦 What gets flagged

| Severity | Meaning | Example |
|----------|---------|---------|
| 🚨 Critical | Blocks the build | API requires a column the dbt model no longer produces |
| ⚠️ Warning | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |

$ contract-validator validate

🛡️ Data Contract Validation Results:
Status:  FAILED
Critical: 1 | Warnings: 0

🚨 Critical Issues (Must Fix):
  💥 user_analytics
     Column: total_orders
     Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
     🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
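
The severity rules above can be sketched as a small decision function. This is an interpretation of the behavior described in this README, not the validator's actual code: `classify_missing_column` and its parameters are hypothetical, and the assumption that only high- and medium-confidence sources may block is inferred from the tier table.

```python
def classify_missing_column(required: bool, confidence: str, fully_resolved: bool) -> str:
    """Return "critical" or "warning" for a column the source does not provide."""
    # Optional columns never block the build.
    if not required:
        return "warning"
    # If the model's columns could not be fully resolved (e.g. SELECT *),
    # the column may exist after all, so downgrade to a warning.
    if not fully_resolved:
        return "warning"
    # Regex-derived (low confidence) schemas are never used to hard-fail.
    if confidence == "low":
        return "warning"
    return "critical"
```

In the example output, `total_orders` is required by the target and the source was fully resolved, so the issue is reported as critical.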

🔧 Configuration (.retl-validator.yml)

version: "1.0"
name: "my-project-contracts"

source:
  dbt:
    project_path: "."
    auto_compile: true
    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
    disable_manifest: false

target:
  fastapi:
    # GitHub repo:
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    # ...or local:
    # type: "local"
    # path: "../my-api/app/models.py"

# Optional: explicit mapping for when names don't line up by convention.
mapping:
  tables:
    # target (Pydantic) table : source (dbt) model
    user_analytics: user_analytics_summary
  columns:
    user_analytics:
      # target column : source column
      userId: user_id

validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]

When do I need mapping?

Most of the time you don't. Names are matched automatically across:

  • snake_case / camelCase / casing: UserAnalytics ↔ user_analytics, userId ↔ user_id
  • plural ↔ singular: dbt's plural users matches Pydantic's User (→ user) with no config, and it won't over-match (address is never confused with addres).

Reach for mapping only when a model or column is named so differently that convention can't bridge it (e.g. Pydantic user_id ↔ dbt customer_identifier).
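
For intuition, here is a minimal sketch of convention-based matching with an explicit mapping as an override. The helpers `to_snake`, `singular`, and `names_match` are hypothetical, and the singularization rule is a deliberately conservative assumption; the package's real rules may differ.

```python
import re
from typing import Dict, Optional


def to_snake(name: str) -> str:
    # Insert "_" at each lower/digit -> upper boundary, then lowercase.
    # Acronym runs such as "HTTPServer" are not split by this simple rule.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def singular(word: str) -> str:
    # Leave words like "address" or "status" alone to avoid over-matching.
    if word.endswith(("ss", "us", "is")):
        return word
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"
    if word.endswith("s") and len(word) > 1:
        return word[:-1]
    return word


def names_match(target: str, source: str, mapping: Optional[Dict[str, str]] = None) -> bool:
    # An explicit mapping entry (target name -> source name) always wins.
    if mapping and target in mapping:
        return mapping[target] == source
    return singular(to_snake(target)) == singular(to_snake(source))
```

With no mapping, `names_match("User", "users")` holds and `names_match("address", "addres")` does not; `user_id` and `customer_identifier` only match once `{"user_id": "customer_identifier"}` is supplied.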

🐍 Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

dbt = DBTExtractor(project_path="./dbt-project")
fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")

validator = ContractValidator(
    source_extractor=dbt,
    target_extractor=fastapi,
    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
)
result = validator.validate()

if not result.success:
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

🪝 CI / pre-commit integration

GitHub Actions

contract-validator init generates a workflow for you. Minimal version:

name: 🛡️ Data Contract Validation
on:
  pull_request:
    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: "3.11" }
      - run: pip install data-contract-validator
      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
      - run: contract-validator validate --output github
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Pre-commit

contract-validator setup-precommit --install-hooks

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/OGsiji/data-contract-validator
    rev: v1.1.0
    hooks:
      - id: contract-validation

🧪 Output formats

contract-validator validate --output terminal   # human-friendly (default)
contract-validator validate --output json        # machine-readable for CI
contract-validator validate --output github       # GitHub Actions annotations

🚀 Supported frameworks

Source: dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …). Target: FastAPI (Pydantic v2 + SQLModel).

The extractor architecture is intentionally pluggable (BaseExtractor → Dict[str, Schema] with canonical types), so additional sources/targets can be added without touching the validator. Open an issue to request one.

🛠️ Development & testing

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"     # or: pip install -e ".[test]"

# Run the suite
pytest

# Lint / format
black data_contract_validator tests

The test suite covers the canonical type system (tests/test_core/test_types.py), the tiered dbt extractor including sqlglot CTE handling and catalog.json (tests/test_extractors/test_dbt.py), and the confidence/mapping behavior of the validator (tests/test_core/test_validator.py).

Adding an extractor

from data_contract_validator.extractors.base import BaseExtractor
from data_contract_validator.core.types import CanonicalType

class MyExtractor(BaseExtractor):
    def extract_schemas(self):
        # return Dict[str, Schema]; use self._make_column(...) so each column
        # carries a canonical_type the validator can compare.
        ...

🗺️ Roadmap

  • Real compatibility semantics (nullability, additive vs. breaking changes)
  • Reporter/logging abstraction (quiet/embeddable core)
  • A canonical, language-neutral contract artifact + baseline/snapshot diffing
  • More targets (Django, SQLAlchemy, GraphQL, OpenAPI)

📄 License

MIT — see LICENSE.

🆘 Support

If this saves you a production incident, please ⭐ the repo.
