Validate data contracts between dbt models and FastAPI/Pydantic APIs with accurate, low-false-positive schema checks
🛡️ Data Contract Validator
Catch breaking changes between your dbt models and your FastAPI/Pydantic APIs — before they hit production.
🎯 What it solves
Your analytics team changes a dbt model. Your API team's FastAPI service still expects the old shape. Nobody notices until production 500s at 2 AM.
This tool sits on that boundary. It extracts the schema your dbt models produce and the schema your Pydantic models expect, compares them, and fails CI when the data side can no longer satisfy the API side.
dbt models (what the pipeline produces) ──▶ Data Contract Validator (extract → normalize → compare) ◀── FastAPI / Pydantic (what the API expects)
↓
critical issues block the build
Built for trust
A check that gates a deploy is only useful if it doesn't cry wolf. v1.1 re-architected extraction around that principle:
- Canonical types: dbt `varchar` and Pydantic `str` are understood to be the same thing, so you don't get drowned in fake "type mismatch" warnings.
- A real SQL parser (`sqlglot`) instead of regex: CTEs, `||` concatenation, window functions and quoted identifiers are parsed correctly.
- Confidence-aware: if the tool can't fully resolve a model's columns (e.g. `SELECT *`), it will warn rather than falsely block your build.
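The canonical-type idea can be sketched as a lookup that collapses warehouse and Python type names into one shared vocabulary before comparing. This is a minimal illustration, not the library's implementation: the `CANONICAL` table and the `to_canonical` and `types_compatible` helpers are hypothetical names, and the set of types covered is an assumption.

```python
import re

# Hypothetical mapping from raw type names to a shared canonical vocabulary.
CANONICAL = {
    "varchar": "string", "text": "string", "string": "string", "str": "string",
    "int": "integer", "integer": "integer", "bigint": "integer",
    "float": "float", "double": "float",
    "bool": "boolean", "boolean": "boolean",
}


def to_canonical(raw_type: str) -> str:
    """Normalize a raw type such as 'VARCHAR(255)' or 'str' to a canonical name."""
    # Drop any length/precision suffix, then compare case-insensitively.
    base = re.sub(r"\(.*\)", "", raw_type).strip().lower()
    return CANONICAL.get(base, "unknown")


def types_compatible(source_type: str, target_type: str) -> bool:
    """Treat 'unknown' as compatible so an unresolved type never raises a false alarm."""
    a, b = to_canonical(source_type), to_canonical(target_type)
    return "unknown" in (a, b) or a == b
```

Treating unresolved types as compatible is the design choice that keeps the check quiet when it lacks information; a stricter tool could surface them as warnings instead.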
⚡ Quick start
pip install data-contract-validator
# Initialize config + CI workflow in your dbt project
contract-validator init --interactive
# Sanity-check the setup
contract-validator test
# Validate
contract-validator validate
One-off validation (no config file)
# Local dbt project against a local Pydantic models file or directory
contract-validator validate \
--dbt-project ./my-dbt-project \
--fastapi-local ./my-api/app/models.py
# dbt project against models in another GitHub repo (microservices)
contract-validator validate \
--dbt-project . \
--fastapi-repo "my-org/my-api" \
--fastapi-path "app/models.py"
🔍 How extraction works (and why it's accurate)
dbt side — tiered, best-source-wins
| Tier | Source | Types | Confidence | Notes |
|---|---|---|---|---|
| 1 | `target/catalog.json` | Real warehouse types | high | Produced by `dbt docs generate`. Most accurate. |
| 2 | `sqlglot` SQL parse | Inferred (often unknown) | medium | Trusted column names; enriched with documented types from `manifest.json`. Detects `SELECT *`. |
| 3 | regex parse | Guessed | low | Last resort. Never used to hard-fail a build. |
The tool auto-detects what's available and degrades gracefully — so it works offline in pre-commit and with full type fidelity in a warehouse-connected CI job.
💡 Tip: run `dbt docs generate` in CI before validating to unlock Tier 1 (real types). Without it, you still get accurate column-presence checks from Tier 2.
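The "best source wins" selection can be pictured roughly as follows. This is a sketch under stated assumptions rather than the tool's actual code: `columns_from_catalog` and `choose_source` are hypothetical helpers, and the layout assumed for `catalog.json` (`nodes` → `metadata.name` and `columns`) reflects the dbt catalog artifact as commonly documented.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Tuple

Schemas = Dict[str, Dict[str, str]]  # model name -> {column name: warehouse type}


def columns_from_catalog(project_path: str) -> Optional[Schemas]:
    """Read column types from target/catalog.json, or return None if it is absent."""
    catalog = Path(project_path) / "target" / "catalog.json"
    if not catalog.is_file():
        return None  # caller falls back to a lower tier
    nodes = json.loads(catalog.read_text()).get("nodes", {})
    return {
        node["metadata"]["name"]: {
            col["name"]: col["type"] for col in node["columns"].values()
        }
        for node in nodes.values()
    }


def choose_source(project_path: str) -> Tuple[str, Schemas]:
    """Return (confidence, schemas) from the best source available on disk."""
    schemas = columns_from_catalog(project_path)
    if schemas is not None:
        return "high", schemas
    # Placeholder: a SQL-parsing tier would fill in column names here.
    return "medium", {}
```

Because the fallback only needs files on disk, the same entry point can run in an offline pre-commit hook and in a warehouse-connected CI job.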
FastAPI side
Pydantic / SQLModel classes are parsed from source with Python's ast (no
imports executed). Optional[...] controls whether a field is required;
table=True SQLModel classes (DB tables, not API contracts) are skipped.
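To illustrate parsing models from source without importing them, here is a minimal sketch using the standard-library `ast` module. It is not the extractor's real code: `extract_fields` and the `SOURCE` sample are hypothetical, every class is treated as a model, and only a literal `Optional[...]` annotation is recognized as optional.

```python
import ast
from typing import Dict

SOURCE = '''
from typing import Optional
from pydantic import BaseModel

class UserAnalytics(BaseModel):
    user_id: str
    total_orders: int
    last_seen: Optional[str] = None
'''


def extract_fields(source: str) -> Dict[str, Dict[str, bool]]:
    """Map class name -> {field name: required?} without executing any imports."""
    schemas: Dict[str, Dict[str, bool]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Skip SQLModel classes declared with table=True (DB tables, not contracts).
        if any(
            kw.arg == "table"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        ):
            continue
        fields: Dict[str, bool] = {}
        for stmt in node.body:
            # Annotated assignments such as `name: type = default` are the fields.
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                annotation = ast.unparse(stmt.annotation)
                fields[stmt.target.id] = not annotation.startswith("Optional[")
        schemas[node.name] = fields
    return schemas
```

Working on the syntax tree means the API's dependencies do not need to be installed where the check runs.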
🚦 What gets flagged
| Severity | Meaning | Example |
|---|---|---|
| 🚨 Critical | Blocks the build | API requires a column the dbt model no longer produces |
| ⚠️ Warning | Worth a look, non-blocking | A real type mismatch, or a missing column on a model we couldn't fully resolve |
$ contract-validator validate
🛡️ Data Contract Validation Results:
Status: ❌ FAILED
Critical: 1 | Warnings: 0
🚨 Critical Issues (Must Fix):
💥 user_analytics
Column: total_orders
Problem: Target REQUIRES column 'total_orders' but source doesn't provide it
🔧 Fix: Add column 'total_orders' to source model for table 'user_analytics'
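The severity rules above can be sketched as a small comparison function. This is an assumption-laden illustration, not the validator's implementation: the `Issue` dataclass, the `compare` function, and the `fully_resolved` flag are hypothetical names standing in for the confidence handling described earlier.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Issue:
    severity: str  # "critical" or "warning"
    table: str
    column: str
    message: str


def compare(
    table: str,
    source_columns: Set[str],
    target_fields: Dict[str, bool],  # column name -> required?
    fully_resolved: bool,
) -> List[Issue]:
    """Flag target columns the source does not provide."""
    issues: List[Issue] = []
    for column, required in target_fields.items():
        if column in source_columns:
            continue
        # Only block the build when the column is required AND the source
        # schema was fully resolved (e.g. no SELECT * left unexpanded).
        severity = "critical" if required and fully_resolved else "warning"
        verb = "REQUIRES" if required else "expects optional"
        issues.append(
            Issue(
                severity,
                table,
                column,
                f"Target {verb} column '{column}' but source doesn't provide it",
            )
        )
    return issues
```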
🔧 Configuration (.retl-validator.yml)
version: "1.0"
name: "my-project-contracts"
source:
  dbt:
    project_path: "."
    auto_compile: true
    # Force Tier 2/3 SQL parsing even if catalog/manifest exist:
    disable_manifest: false
target:
  fastapi:
    # GitHub repo:
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    # ...or local:
    # type: "local"
    # path: "../my-api/app/models.py"
# Optional: explicit mapping for when names don't line up by convention.
mapping:
  tables:
    # target (Pydantic) table : source (dbt) model
    user_analytics: user_analytics_summary
  columns:
    user_analytics:
      # target column : source column
      userId: user_id
validation:
  fail_on: ["missing_tables", "missing_required_columns"]
  warn_on: ["type_mismatches", "missing_optional_columns"]
When do I need mapping?
Most of the time you don't. Names are matched automatically across:
- snake_case / camelCase casing: `UserAnalytics` → `user_analytics`, `userId` → `user_id`
- plural ↔ singular: dbt's plural `users` matches Pydantic's `User` (→ `user`) with no config (and it won't over-match: `address` is never confused with `addres`).
Reach for mapping only when a model or column is named so differently that
convention can't bridge it (e.g. Pydantic user_id ↔ dbt customer_identifier).
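A rough sketch of convention-based matching, to make the rules above concrete. The helpers `to_snake`, `singular`, and `names_match` are hypothetical, and the singularization rule is deliberately naive; the tool's real matching logic may differ.

```python
import re


def to_snake(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit meets an uppercase letter.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def singular(name: str) -> str:
    """Drop one trailing 's', but leave 'ss' endings alone (address stays address)."""
    if name.endswith("s") and not name.endswith("ss"):
        return name[:-1]
    return name


def names_match(source_name: str, target_name: str) -> bool:
    """Match names across casing and simple plural/singular differences."""
    a, b = to_snake(source_name), to_snake(target_name)
    return a == b or singular(a) == singular(b)
```

For example, `names_match("users", "User")` holds, while `names_match("customer_identifier", "user_id")` does not, which is the kind of pair an explicit `mapping` entry is meant for.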
🐍 Python API
from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

dbt = DBTExtractor(project_path="./dbt-project")
fastapi = FastAPIExtractor.from_github_repo("my-org/my-api", "app/models.py")

validator = ContractValidator(
    source_extractor=dbt,
    target_extractor=fastapi,
    mapping={"tables": {"user_analytics": "user_analytics_summary"}},  # optional
)

result = validator.validate()
if not result.success:
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")
🪝 CI / pre-commit integration
GitHub Actions
contract-validator init generates a workflow for you. Minimal version:
name: 🛡️ Data Contract Validation
on:
  pull_request:
    paths: ["models/**/*.sql", "dbt_project.yml", "**/*models*.py"]
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: "3.11" }
      - run: pip install data-contract-validator
      # Optional: `dbt docs generate` here for real warehouse types (Tier 1)
      - run: contract-validator validate --output github
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Pre-commit
contract-validator setup-precommit --install-hooks
repos:
  - repo: https://github.com/OGsiji/data-contract-validator
    rev: v1.1.0
    hooks:
      - id: contract-validation
🧪 Output formats
contract-validator validate --output terminal # human-friendly (default)
contract-validator validate --output json # machine-readable for CI
contract-validator validate --output github # GitHub Actions annotations
🚀 Supported frameworks
Source: dbt (all adapters — Snowflake, BigQuery, Redshift, Postgres, …). Target: FastAPI (Pydantic v2 + SQLModel).
The extractor architecture is intentionally pluggable (BaseExtractor →
Dict[str, Schema] with canonical types), so additional sources/targets can be
added without touching the validator. Open an issue
to request one.
🛠️ Development & testing
git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]" # or: pip install -e ".[test]"
# Run the suite
pytest
# Lint / format
black data_contract_validator tests
The test suite covers the canonical type system (tests/test_core/test_types.py),
the tiered dbt extractor including sqlglot CTE handling and catalog.json
(tests/test_extractors/test_dbt.py), and the confidence/mapping behavior of
the validator (tests/test_core/test_validator.py).
Adding an extractor
from data_contract_validator.extractors.base import BaseExtractor
from data_contract_validator.core.types import CanonicalType


class MyExtractor(BaseExtractor):
    def extract_schemas(self):
        # return Dict[str, Schema]; use self._make_column(...) so each column
        # carries a canonical_type the validator can compare.
        ...
🗺️ Roadmap
- Real compatibility semantics (nullability, additive vs. breaking changes)
- Reporter/logging abstraction (quiet/embeddable core)
- A canonical, language-neutral contract artifact + baseline/snapshot diffing
- More targets (Django, SQLAlchemy, GraphQL, OpenAPI)
📄 License
MIT — see LICENSE.
🆘 Support
- 🐛 Issues: https://github.com/OGsiji/data-contract-validator/issues
- 📧 Email: ogunniransiji@gmail.com
If this saves you a production incident, please ⭐ the repo.