Skip to main content

Prevent production API breaks by validating data contracts between DBT models and API frameworks

Project description

๐Ÿ›ก๏ธ Data Contract Validator

Prevent production API breaks by validating data contracts between your data pipelines and API frameworks

PyPI version Tests License: MIT

๐ŸŽฏ What This Solves

Ever deployed a DBT model change only to break your FastAPI in production? This tool prevents that by validating data contracts between your data pipelines and APIs before deployment.

DBT Models          Contract           FastAPI Models
(What data          Validator          (What APIs
 produces)          โ†•๏ธ VALIDATES โ†•๏ธ      expect)
     โ†“                   โ†“                   โ†“
   Schema              Finds              Schema
 Extraction          Mismatches         Extraction

โšก Quick Start

Installation

pip install data-contract-validator

Basic Usage

# Validate local DBT project against FastAPI models
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-models ./my-api/models.py

# Validate across repositories (perfect for microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api-repo" \
  --fastapi-path "app/models.py"

GitHub Actions Integration

# .github/workflows/validate-contracts.yml
name: Validate Data Contracts
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Install validator
      run: pip install data-contract-validator
    
    - name: Validate contracts
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        contract-validator validate \
          --dbt-project . \
          --fastapi-repo "my-org/my-api" \
          --github-token "$GITHUB_TOKEN"

๐Ÿ” What It Validates

โŒ Critical Issues (Block Deployment)

  • Missing tables: API expects user_analytics but DBT doesn't provide it
  • Missing required columns: API requires total_revenue but DBT model doesn't have it

โš ๏ธ Warnings (Non-blocking)

  • Type mismatches: DBT provides varchar but API expects integer
  • Missing optional columns: API can handle missing optional fields

โ„น๏ธ Info (Good to Know)

  • Extra columns: DBT provides columns that API doesn't use

๐ŸŽฏ Real-World Example

Before (Production Breaks) ๐Ÿ’ฅ

-- DBT model changes
select
    user_id,
    email,
    -- total_orders,  โŒ REMOVED this column
    revenue
from users
# FastAPI model (unchanged)
class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # โŒ Still expects this!
    revenue: float

Result: API breaks in production ๐Ÿ’€

After (Caught by Validator) โœ…

โŒ VALIDATION FAILED
๐Ÿ’ฅ user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
๐Ÿ”ง Fix: Add 'total_orders' back to DBT model or update FastAPI model

Result: Issue caught in CI/CD, production safe! ๐Ÿ›ก๏ธ

๐Ÿš€ Supported Frameworks

Data Sources

  • โœ… DBT (dbt-core, all adapters)
  • ๐Ÿ”„ Databricks (coming soon)
  • ๐Ÿ”„ Airflow (coming soon)

API Frameworks

  • โœ… FastAPI (Pydantic + SQLModel)
  • ๐Ÿ”„ Django (coming soon)
  • ๐Ÿ”„ Flask-SQLAlchemy (coming soon)

Want to add support for your framework? See extending guide

๐Ÿ“ฆ Installation Options

Option 1: PyPI (Recommended)

pip install data-contract-validator

Option 2: From Source

git clone https://github.com/your-org/data-contract-validator
cd data-contract-validator
pip install -e .

Option 3: GitHub Actions Only

- name: Validate Contracts
  uses: your-org/data-contract-validator@v1
  with:
    dbt-project: '.'
    fastapi-repo: 'my-org/my-api'

๐Ÿ”ง Configuration

Command Line

contract-validator validate \
  --dbt-project ./dbt-project \           # DBT project path
  --fastapi-repo "org/repo" \             # GitHub repo
  --fastapi-path "app/models.py" \        # Path to models
  --github-token "$GITHUB_TOKEN" \        # For private repos
  --output json                           # Output format

Configuration File

# .contract-validator.yml
version: '1.0'
sources:
  dbt:
    project_path: './dbt-project'
    auto_update_schemas: true

targets:
  fastapi:
    repo: 'my-org/my-api'
    path: 'app/models.py'
    
validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']

๐Ÿ“Š Output Formats

Terminal (Default)

๐Ÿ” Contract Validation Results:

โŒ CRITICAL ISSUES:
  ๐Ÿ’ฅ user_analytics.total_revenue: FastAPI expects this column but DBT doesn't provide it
     ๐Ÿ”ง Fix: Add 'total_revenue' to your DBT model

โœ… VALIDATION PASSED (with warnings)

GitHub Actions

::error::user_analytics.total_revenue: Missing required column
::warning::user_analytics.age: Type mismatch (varchar vs integer)

JSON

{
  "success": false,
  "issues": [
    {
      "severity": "error",
      "table": "user_analytics", 
      "column": "total_revenue",
      "message": "FastAPI expects column but DBT doesn't provide it",
      "suggestion": "Add 'total_revenue' to your DBT model"
    }
  ]
}

๐Ÿ—๏ธ Architecture

# Simple, extensible architecture
from data_contract_validator import ContractValidator
from data_contract_validator.extractors import DBTExtractor, FastAPIExtractor

# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor(repo='my-org/my-api', path='app/models.py')

# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()

if not result.success:
    print(f"โŒ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"๐Ÿ’ฅ {issue.table}.{issue.column}: {issue.message}")

๐Ÿค Contributing

We love contributions! See CONTRIBUTING.md for guidelines.

Quick Setup

git clone https://github.com/your-org/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest

Adding New Extractors

from data_contract_validator.extractors import BaseExtractor

class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, Schema]:
        # Your implementation
        return schemas

๐ŸŽ‰ Success Stories

"We prevented 15 production incidents in our first month using this tool. It's now required in all our data pipeline PRs."
โ€” Data Engineering Team, TechCorp

"Finally! A tool that validates the contract between our DBT models and FastAPI services. No more surprise 500 errors."
โ€” Platform Team, StartupCo

๐Ÿ“š Documentation

๐Ÿ“„ License

MIT License - see LICENSE file for details.

๐Ÿ†˜ Support

โญ Star History

If this tool helps you prevent production incidents, please star the repo! โญ


Built with โค๏ธ by data engineers, for data engineers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data_contract_validator-1.0.0.tar.gz (24.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

data_contract_validator-1.0.0-py3-none-any.whl (23.7 kB view details)

Uploaded Python 3

File details

Details for the file data_contract_validator-1.0.0.tar.gz.

File metadata

  • Download URL: data_contract_validator-1.0.0.tar.gz
  • Upload date:
  • Size: 24.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for data_contract_validator-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c107c8d04f9936468b4f751f0b5b402f46e4bc8fb9b0c8b1b932f74ae9e83a34
MD5 98d6bcd62181f85e32d513272cdaa7d9
BLAKE2b-256 61345fdda55ed5ad3cbc6d731039cb42e60223c11a1025c0f647d1f7c71121ba

See more details on using hashes here.

File details

Details for the file data_contract_validator-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for data_contract_validator-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dcca25302a0734484756f67af4539224feb0fc339a33718bf5439678df27be66
MD5 c2d4c60ecfca82e4705551e4560af661
BLAKE2b-256 e3885db7cd62a574fbba2591c48dac32d7882114e02a869e98d5177d2172abdd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page