Prevent production API breaks by validating data contracts between DBT models and API frameworks
Data Contract Validator
Prevent production API breaks by validating data contracts between your data pipelines and API frameworks
What This Solves
Ever deployed a DBT model change only to break your FastAPI service in production? This tool prevents that by validating the data contracts between your data pipelines and your APIs before deployment.
DBT Models            Contract            FastAPI Models
(What data            Validator           (What APIs
 produces)   -->      VALIDATES    <--     expect)
     |                    |                    |
     v                    v                    v
  Schema               Finds                Schema
 Extraction          Mismatches           Extraction
Quick Start
Installation
pip install data-contract-validator
30-Second Setup
# 1. Initialize in your project
contract-validator init --interactive
# 2. Test setup
contract-validator test
# 3. Validate contracts
contract-validator validate
# 4. Commit and push - you're protected! 🛡️
Basic Usage
# Validate local DBT project against FastAPI models
contract-validator validate \
--dbt-project ./my-dbt-project \
--fastapi-local ./my-api/models.py
# Validate across repositories (microservices)
contract-validator validate \
--dbt-project . \
--fastapi-repo "my-org/my-api-repo" \
--fastapi-path "app/models.py"
Real Example: Production Validation
Actual output from a production analytics project:
$ contract-validator validate
Starting contract validation...
Extracting source schemas...
✅ Found 14 DBT models (user_analytics_summary: 54 columns)
Extracting target schemas...
✅ Found 3 FastAPI models
Validating schema compatibility...
🛡️ Results:
✅ PASSED - 0 critical issues (no production breaks!)
⚠️ 42 warnings (type mismatches to review)
Issues caught:
⚠️ user_analytics_summary.age_years: source 'varchar' vs target 'integer'
⚠️ user_analytics_summary.is_verified: source 'varchar' vs target 'boolean'
⚠️ user_analytics_summary.user_created_at: source 'varchar' vs target 'timestamp'
Your API contracts are protected!
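To make the type-mismatch warnings above more concrete, here is a minimal sketch of how such a check could work. It is not the tool's actual implementation: the `SQL_TO_PYTHON` table and the `find_type_mismatches` helper are hypothetical names invented for this illustration, and the sketch reports Python type names (`int`, `bool`) rather than the SQL names the tool prints.

```python
# Hypothetical lookup: the Python type each SQL type is assumed to map to.
SQL_TO_PYTHON = {
    "varchar": str,
    "integer": int,
    "boolean": bool,
    "float": float,
}


def find_type_mismatches(source_columns, target_fields):
    """Return one warning string per column whose SQL type maps to a
    different Python type than the API model expects."""
    warnings = []
    for name, expected_type in target_fields.items():
        sql_type = source_columns.get(name)
        if sql_type is None:
            continue  # missing columns belong to a separate check
        actual_type = SQL_TO_PYTHON.get(sql_type.lower())
        if actual_type is not expected_type:
            warnings.append(
                f"{name}: source '{sql_type}' vs target '{expected_type.__name__}'"
            )
    return warnings


# Column types as the warehouse reports them (illustrative values).
source = {"age_years": "varchar", "is_verified": "varchar", "email": "varchar"}
# Field types as the API model declares them (illustrative values).
target = {"age_years": int, "is_verified": bool, "email": str}

mismatches = find_type_mismatches(source, target)
for line in mismatches:
    print(line)
```

In this sketch `email` passes because `varchar` maps to `str`, while the other two columns are flagged.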
What It Prevents
Before Data Contract Validation:
-- Analytics team changes DBT model
select
    user_id,
    email,
    -- total_orders,  ← REMOVED this column
    revenue
from users
# API team's FastAPI model (unchanged)
from pydantic import BaseModel
class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # ← Still expects this!
    revenue: float
Result: 💥 Production API breaks, angry customers, 2AM debugging
After Data Contract Validation:
$ git push
❌ VALIDATION FAILED
💥 user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
🔧 Fix: Add 'total_orders' back to DBT model or update FastAPI model
# Push blocked until fixed ✅
Result: 🛡️ Production protected, issues caught in CI/CD
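The missing-column failure above boils down to a set comparison between the columns a DBT model produces and the fields an API model requires. The sketch below illustrates that idea with the standard library only. It uses a plain annotated class as a stand-in for the Pydantic model, and `required_fields`, `missing_required_columns`, and the `nickname` field are hypothetical names added for the example, not part of the tool.

```python
from typing import Optional, Union, get_args, get_origin, get_type_hints


class UserAnalytics:
    """Plain-class stand-in for the Pydantic model in the example."""
    user_id: str
    email: str
    total_orders: int
    revenue: float
    nickname: Optional[str] = None  # hypothetical optional field


def required_fields(model):
    """Treat every field whose annotation does not allow None as required."""
    required = []
    for name, hint in get_type_hints(model).items():
        is_optional = get_origin(hint) is Union and type(None) in get_args(hint)
        if not is_optional:
            required.append(name)
    return required


def missing_required_columns(dbt_columns, model):
    """Required API fields that the DBT model no longer produces."""
    return [field for field in required_fields(model) if field not in dbt_columns]


# Columns left in the DBT model after total_orders was removed.
dbt_columns = {"user_id", "email", "revenue"}

missing = missing_required_columns(dbt_columns, UserAnalytics)
print(missing)
```

Optional fields such as `nickname` are skipped here, which mirrors the distinction between required and optional columns in the configuration.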
Pre-commit Integration
Automatic Setup (Recommended)
# Initialize with pre-commit support
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# Now every commit validates contracts automatically! 🛡️
Manual Setup
If you prefer manual setup:
- Install pre-commit:
pip install pre-commit
- Add to .pre-commit-config.yaml:
repos:
  - repo: https://github.com/OGsiji/data-contract-validator
    rev: v1.0.0
    hooks:
      - id: contract-validation
        name: Validate Data Contracts
        files: '^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
- Install hooks:
pre-commit install
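The `files` pattern in the hook configuration decides which staged paths trigger validation. pre-commit treats it as a Python regular expression, so you can check a path against it with the `re` module. The sketch below does that for a few illustrative paths; the `triggers_hook` helper and the sample paths are made up for the example.

```python
import re

# The `files` pattern from the hook configuration, copied verbatim.
FILES_PATTERN = re.compile(
    r"^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$"
)


def triggers_hook(path):
    """True if a staged file at `path` would cause the hook to run."""
    return FILES_PATTERN.search(path) is not None


candidates = [
    "models/user_analytics.sql",
    "app/models.py",
    "dbt_project.yml",
    ".retl-validator.yml",
    "macros/helpers.sql",
    "README.md",
]

triggered = [path for path in candidates if triggers_hook(path)]
print(triggered)
```

Note that a SQL file outside any path containing `models` (such as `macros/helpers.sql`) does not match, so changes there would not run the hook.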
How It Works
$ git add models/user_analytics.sql
$ git commit -m "update user analytics model"
# Pre-commit automatically runs:
Validating Data Contracts...
✅ Contract validation passed
[main abc1234] update user analytics model
On Validation Failure
$ git commit -m "remove important column"
Validating Data Contracts...
❌ CRITICAL: user_analytics.total_revenue missing
💡 Fix the issue before committing
# Commit blocked until fixed! 🛡️
Skip Validation (Emergency Only)
# Only for emergencies!
git commit -m "emergency fix" --no-verify
Benefits of Pre-commit Integration
- ✅ Catches issues before they reach CI/CD
- ✅ Faster feedback loop (seconds, not minutes)
- ✅ No broken commits in your git history
- ✅ Team protection - everyone gets validation
- ✅ Zero configuration after setup
GitHub Actions Integration
Add this to .github/workflows/validate-contracts.yml:
name: 🛡️ Data Contract Validation
on:
  pull_request:
    paths:
      - 'models/**/*.sql'
      - 'dbt_project.yml'
      - '**/*models*.py'
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Validate contracts
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install data-contract-validator
          contract-validator validate
This workflow is auto-generated when you run contract-validator init.
Configuration
Auto-Generated Config (.retl-validator.yml)
version: '1.0'
name: 'my-project-contracts'
source:
  dbt:
    project_path: '.'
    auto_compile: true
target:
  fastapi:
    # For GitHub repos
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    # For local files
    # type: "local"
    # path: "../my-api/models.py"
validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']
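The `fail_on` and `warn_on` lists in the `validation` section decide whether a finding blocks the run or only produces a warning. As a rough illustration of that split, and not the tool's own code, the sketch below sorts a list of issue kinds into the two buckets. The `classify` function and the shape of its return value are assumptions made for this example.

```python
# The validation section of the config, written as a Python dict.
VALIDATION_CONFIG = {
    "fail_on": ["missing_tables", "missing_required_columns"],
    "warn_on": ["type_mismatches", "missing_optional_columns"],
}


def classify(issue_kinds, config):
    """Split issue kinds into critical and warning buckets.

    The run counts as successful only when no critical issue is present.
    """
    critical = [kind for kind in issue_kinds if kind in config["fail_on"]]
    warnings = [kind for kind in issue_kinds if kind in config["warn_on"]]
    return {"success": not critical, "critical": critical, "warnings": warnings}


# Hypothetical findings from one validation run.
outcome = classify(["type_mismatches", "missing_required_columns"], VALIDATION_CONFIG)
print(outcome)
```

Moving `type_mismatches` from `warn_on` to `fail_on` would, under this reading, turn the warnings in the earlier example output into blocking failures.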
Command Line Options
# --dbt-project:  DBT project path
# --fastapi-repo: GitHub repo
# --fastapi-path: path to the models file
# --github-token: token for private repos
# --output:       json, terminal, or github
contract-validator validate \
  --dbt-project ./dbt-project \
  --fastapi-repo "org/repo" \
  --fastapi-path "app/models.py" \
  --github-token "$GITHUB_TOKEN" \
  --output json
Supported Frameworks
Data Sources ✅
- DBT (all adapters: Snowflake, BigQuery, Redshift, etc.)
API Frameworks ✅
- FastAPI (Pydantic + SQLModel)
Coming Soon
- Django, Flask-SQLAlchemy
- Databricks, Airflow
- Request other frameworks
Output Formats
Terminal (Default)
🛡️ Data Contract Validation Results:
Status: ✅ PASSED
Critical: 0 | Warnings: 5
⚠️ Warnings:
user_analytics.age: Type mismatch (varchar vs integer)
user_analytics.country: Type mismatch (integer vs varchar)
Your API contracts are protected!
JSON (for CI/CD)
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
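A CI step can consume the JSON report with nothing more than the standard `json` module. The sketch below parses a report shaped like the sample above and derives an exit code from it. The `summarize` helper is a hypothetical name, and the field names are taken from the sample output rather than from a published schema, so treat them as assumptions.

```python
import json

# A report shaped like the sample JSON output.
REPORT = """
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
"""


def summarize(report_text):
    """Return (exit_code, lines): 0 when the run passed, 1 otherwise."""
    report = json.loads(report_text)
    lines = [
        f"{issue['severity'].upper()} {issue['table']}.{issue['column']}: {issue['message']}"
        for issue in report["issues"]
    ]
    passed = report["success"] and report["critical_issues"] == 0
    return (0 if passed else 1), lines


exit_code, lines = summarize(REPORT)
for line in lines:
    print(line)
print("exit code:", exit_code)
```

In a real pipeline you would pass `exit_code` to `sys.exit` so that a failed report fails the job.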
GitHub Actions
::warning::user_analytics.age: Type mismatch detected
✅ Contract validation passed - no critical issues
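The `::warning::` line above is a GitHub Actions workflow command: text printed in that form shows up as an annotation on the run. As an illustration of how issues could be turned into such lines, here is a small sketch. The `to_workflow_commands` helper is hypothetical, and mapping a `critical` severity to `::error::` is an assumption, since the sample output only shows warnings.

```python
# Assumed mapping from issue severity to GitHub Actions annotation level.
LEVELS = {"warning": "warning", "critical": "error"}


def to_workflow_commands(issues):
    """Format each issue as a GitHub Actions workflow command line."""
    return [
        f"::{LEVELS[issue['severity']]}::{issue['table']}.{issue['column']}: {issue['message']}"
        for issue in issues
    ]


# Illustrative issues; the second one is invented for the example.
issues = [
    {"severity": "warning", "table": "user_analytics", "column": "age",
     "message": "Type mismatch detected"},
    {"severity": "critical", "table": "user_analytics", "column": "total_orders",
     "message": "Required column missing"},
]

commands = to_workflow_commands(issues)
for command in commands:
    print(command)
```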
Architecture
Simple Python API
from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor
# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor.from_github_repo('my-org/my-api', 'app/models.py')
# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()
if not result.success:
    print(f"❌ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")
CLI Interface
# Interactive setup
contract-validator init --interactive
# Test configuration
contract-validator test
# Run validation
contract-validator validate
# Setup pre-commit hooks
contract-validator setup-precommit --install-hooks
# Multiple output formats
contract-validator validate --output json
Development Workflow
With Pre-commit (Recommended)
# Team workflow with automated validation
git clone your-dbt-project
cd your-dbt-project
# One-time setup for new team members
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# Protected development workflow:
# 1. Make changes to DBT models
# 2. git add models/my_model.sql
# 3. git commit -m "update model" # ← Validation runs here automatically
# 4. If validation passes → commit succeeds
# 5. If validation fails → fix issues first
# 6. git push # ← CI/CD validation as backup
Manual Workflow
# Traditional workflow
# 1. Make changes
# 2. contract-validator validate # Manual validation
# 3. git commit
# 4. git push
Contributing
We welcome contributions! This tool is actively used in production.
Development Setup
git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest
Adding New Extractors
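from typing import Dict
from retl_validator.extractors import BaseExtractor
# Schema is the package's schema type; import it from wherever the package exposes it.
class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, Schema]:
        # Your implementation: build a mapping of table name -> Schema
        schemas: Dict[str, Schema] = {}
        return schemas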
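To show the shape of an extractor without depending on the package, the sketch below defines its own stand-ins. `BaseExtractor` and `Schema` here are simplified local substitutes, not the package's real classes, and `InMemoryExtractor` is a hypothetical example that normalizes table and column names to lower case.

```python
from abc import ABC, abstractmethod
from typing import Dict

# Stand-in schema type: column name -> type name.
Schema = Dict[str, str]


class BaseExtractor(ABC):
    """Local stand-in for the package's base class."""

    @abstractmethod
    def extract_schemas(self) -> Dict[str, Schema]:
        """Return a mapping of table name -> schema."""


class InMemoryExtractor(BaseExtractor):
    """Hypothetical extractor that reads schemas from a plain dict."""

    def __init__(self, tables: Dict[str, Schema]) -> None:
        self._tables = tables

    def extract_schemas(self) -> Dict[str, Schema]:
        # Lower-case every name so comparisons are case-insensitive.
        return {
            table.lower(): {col.lower(): typ.lower() for col, typ in columns.items()}
            for table, columns in self._tables.items()
        }


extractor = InMemoryExtractor({"User_Analytics": {"User_ID": "VARCHAR", "Revenue": "FLOAT"}})
schemas = extractor.extract_schemas()
print(schemas)
```

A real extractor would replace the dict with whatever the framework exposes (model classes, migration files, a catalog API) and keep the same return shape.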
Reporting Issues
- Bugs: GitHub Issues
- Features: GitHub Discussions
Documentation
- Quick Start Guide - Get running in 2 minutes
- Configuration Reference - All config options
- GitHub Actions Setup - CI/CD integration
- Examples - Real-world usage
- Pre-commit Integration - Automated validation
Real-World Usage
This tool is actively preventing production incidents in:
- Analytics pipelines with 50+ DBT models
- Microservices architectures with multiple APIs
- Data engineering teams using Snowflake, BigQuery, Redshift
- Cross-repository validation in large organizations
Proven to catch:
- ✅ Type mismatches (varchar vs integer)
- ✅ Missing columns (API expects columns DBT doesn't provide)
- ✅ Schema drift (gradual model changes)
- ✅ Breaking changes before they reach production
Multiple Layers of Protection
- Pre-commit hooks: Immediate feedback (fastest)
- CI/CD validation: Team protection (backup)
- Manual validation: Development testing
- Configuration files: Team standards
This creates a comprehensive safety net for your data contracts.
License
MIT License - see LICENSE for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: ogunniransiji@gmail.com
Star the Project
If this tool helps you prevent production incidents, please ⭐ star the repository!
🛡️ Built by data engineers, for data engineers. Stop breaking production with data changes!
Get Started Now
pip install data-contract-validator
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# 2 minutes to production protection with automated validation!
File details
Details for the file data_contract_validator-1.0.1.tar.gz.
File metadata
- Download URL: data_contract_validator-1.0.1.tar.gz
- Upload date:
- Size: 27.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4cf0a62710928771d56eecc5459b0908867878651e7898410d1d914458b5164 |
| MD5 | 364215d837c1802451fcc94a39a25b00 |
| BLAKE2b-256 | b8a4ae9caba7f1bdc9f0518433c8a34f2838ed475e79e382cb6a6a5d1dc5f5fa |
File details
Details for the file data_contract_validator-1.0.1-py3-none-any.whl.
File metadata
- Download URL: data_contract_validator-1.0.1-py3-none-any.whl
- Upload date:
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c23d975c9f1c684dbf02c983f5a183c70e08862fc9d7afca173102a926f4742e |
| MD5 | d81fc34b440d1b36531cf50d5f2c39ff |
| BLAKE2b-256 | 475453f3f26f2c5c6585c7afa0a44f4440eb9e398faaea83a62c15195f064d04 |