🛡️ Data Contract Validator

Prevent production API breaks by validating data contracts between your data pipelines and API frameworks

🎯 What This Solves

Ever deployed a DBT model change only to break your FastAPI service in production? This tool prevents that by validating the data contracts between your data pipelines and your APIs before deployment.

DBT Models          Contract           FastAPI Models
(What data          Validator          (What APIs
 produces)          ↕️ VALIDATES ↕️      expect)
     ↓                   ↓                   ↓
   Schema              Finds              Schema
 Extraction          Mismatches         Extraction
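
To make the "Schema Extraction" step on the API side more concrete, here is a minimal sketch of how field names and types could be read from a models file without importing it. It is not the tool's actual extractor: `MODEL_SOURCE` and `extract_model_fields` are hypothetical names, the sketch treats every class in the file as a model, and it assumes a field without a default value is required.

```python
import ast

# Hypothetical stand-in for the contents of an API models file.
MODEL_SOURCE = '''
from typing import Optional
from pydantic import BaseModel

class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int
    revenue: Optional[float] = None
'''

def extract_model_fields(source: str) -> dict:
    """Map each class name to {field name: (annotation text, required?)}."""
    schemas = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        fields = {}
        for stmt in node.body:
            # Annotated assignments such as `user_id: str` describe fields.
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                annotation = ast.unparse(stmt.annotation)  # needs Python 3.9+
                # Assumption: a field without a default value is required.
                fields[stmt.target.id] = (annotation, stmt.value is None)
        schemas[node.name] = fields
    return schemas

schemas = extract_model_fields(MODEL_SOURCE)
for name, (annotation, required) in schemas["UserAnalytics"].items():
    print(f"{name}: {annotation} ({'required' if required else 'optional'})")
```

Parsing the file with `ast` instead of importing it means the API project's dependencies do not have to be installed where validation runs, which is one plausible reason to prefer this approach for cross-repository checks.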

⚡ Quick Start

Installation

pip install data-contract-validator

30-Second Setup

# 1. Initialize in your project
contract-validator init --interactive

# 2. Test setup
contract-validator test

# 3. Validate contracts
contract-validator validate

# 4. Commit and push - you're protected! 🛡️

Basic Usage

# Validate local DBT project against FastAPI models
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/models.py

# Validate across repositories (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api-repo" \
  --fastapi-path "app/models.py"

🔍 Real Example: Production Validation

Actual output from a production analytics project:

$ contract-validator validate

🔍 Starting contract validation...
📊 Extracting source schemas...
   ✅ Found 14 DBT models (user_analytics_summary: 54 columns)
🎯 Extracting target schemas...
   ✅ Found 3 FastAPI models
🔍 Validating schema compatibility...

🛡️ Results:
✅ PASSED - 0 critical issues (no production breaks!)
⚠️  42 warnings (type mismatches to review)

Issues caught:
⚠️  user_analytics_summary.age_years: source 'varchar' vs target 'integer'
⚠️  user_analytics_summary.is_verified: source 'varchar' vs target 'boolean'
⚠️  user_analytics_summary.user_created_at: source 'varchar' vs target 'timestamp'

🎉 Your API contracts are protected!

🚨 What It Prevents

Before Data Contract Validation:

-- Analytics team changes DBT model
select
    user_id,
    email,
    -- total_orders,  ❌ REMOVED this column
    revenue
from users

# API team's FastAPI model (unchanged)
from pydantic import BaseModel

class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # ❌ Still expects this!
    revenue: float

Result: 💥 Production API breaks, angry customers, 2AM debugging

After Data Contract Validation:

$ git push

❌ VALIDATION FAILED
💥 user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
🔧 Fix: Add 'total_orders' back to DBT model or update FastAPI model

# Push blocked until fixed ✋

Result: 🛡️ Production protected, issues caught in CI/CD
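
The check behind that failure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the validator's real implementation: `compare_schemas` is a hypothetical helper, both schemas are assumed to be already extracted into plain dictionaries, and type names are assumed to be normalized to the same vocabulary on both sides.

```python
def compare_schemas(source_columns: dict, target_fields: dict) -> list:
    """Return (severity, column, message) tuples for one table.

    source_columns: {column name: SQL type} produced by the data model.
    target_fields:  {field name: (expected type, required?)} from the API model.
    """
    issues = []
    for name, (expected_type, required) in target_fields.items():
        if name not in source_columns:
            # A missing required column would break the API; optional ones only warn.
            severity = "critical" if required else "warning"
            issues.append((severity, name, "column missing from source"))
        elif source_columns[name] != expected_type:
            issues.append((
                "warning", name,
                f"source '{source_columns[name]}' vs target '{expected_type}'",
            ))
    return issues

# The scenario above: total_orders was dropped from the DBT model.
dbt_columns = {"user_id": "varchar", "email": "varchar", "revenue": "float"}
api_fields = {
    "user_id": ("varchar", True),
    "email": ("varchar", True),
    "total_orders": ("integer", True),
    "revenue": ("float", True),
}

issues = compare_schemas(dbt_columns, api_fields)
for severity, column, message in issues:
    print(f"{severity}: user_analytics.{column}: {message}")
```

The comparison iterates over the target fields only. Extra columns in the data model are harmless to the API, so the sketch does not report them.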

🛠️ Pre-commit Integration

Automatic Setup (Recommended)

# Initialize with pre-commit support
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Now every commit validates contracts automatically! 🛡️

Manual Setup

If you prefer manual setup:

  1. Install pre-commit:

    pip install pre-commit
    
  2. Add to .pre-commit-config.yaml:

    repos:
      - repo: https://github.com/OGsiji/data-contract-validator
        rev: v1.0.0
        hooks:
          - id: contract-validation
            name: Validate Data Contracts
            files: '^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
    
  3. Install hooks:

    pre-commit install
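
To see which staged files would trigger the hook, the `files` pattern from step 2 can be tried against a few paths. This is a small sketch: the candidate paths are made-up examples, and it relies on pre-commit matching `files` against each path with Python's `re.search`.

```python
import re

# The `files` pattern from the hook configuration above.
FILES_PATTERN = re.compile(
    r'^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
)

# Hypothetical staged paths, relative to the repository root.
candidates = [
    "models/user_analytics.sql",
    "app/models.py",
    "dbt_project.yml",
    ".retl-validator.yml",
    "README.md",
    "macros/helper.sql",
]

# Keep only the paths that would cause the hook to run.
triggering = [path for path in candidates if FILES_PATTERN.search(path)]
print(triggering)
```

Because the pattern is anchored with `^` and `$`, a `dbt_project.yml` in a subdirectory would not match. Adjust the pattern if your DBT project does not live at the repository root.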
    

How It Works

$ git add models/user_analytics.sql
$ git commit -m "update user analytics model"

# Pre-commit automatically runs:
🔍 Validating Data Contracts...
✅ Contract validation passed
[main abc1234] update user analytics model

On Validation Failure

$ git commit -m "remove important column"

🔍 Validating Data Contracts...
❌ CRITICAL: user_analytics.total_revenue missing
💡 Fix the issue before committing

# Commit blocked until fixed! 🛡️

Skip Validation (Emergency Only)

# Only for emergencies!
git commit -m "emergency fix" --no-verify

Benefits of Pre-commit Integration

  • ✅ Catches issues before they reach CI/CD
  • ✅ Faster feedback loop (seconds, not minutes)
  • ✅ No broken commits in your git history
  • ✅ Team protection - everyone gets validation
  • ✅ Zero configuration after setup

📦 GitHub Actions Integration

Add this to .github/workflows/validate-contracts.yml:

name: 🛡️ Data Contract Validation

on:
  pull_request:
    paths:
      - 'models/**/*.sql'
      - 'dbt_project.yml'
      - '**/*models*.py'

jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Validate contracts
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        pip install data-contract-validator
        contract-validator validate

This workflow file is generated automatically when you run contract-validator init.

🔧 Configuration

Auto-Generated Config (.retl-validator.yml)

version: '1.0'
name: 'my-project-contracts'

source:
  dbt:
    project_path: '.'
    auto_compile: true

target:
  fastapi:
    # For GitHub repos
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    
    # For local files
    # type: "local"
    # path: "../my-api/models.py"

validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']
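
One way to read the `validation` block: each kind of issue is either fatal, a warning, or not listed. The sketch below illustrates that reading in plain Python. `classify` and `should_fail` are hypothetical helpers, and treating unlisted issue kinds as "ignored" is an assumption, not documented behavior.

```python
# Mirrors the `validation` block of the config above as a plain dict.
VALIDATION_CONFIG = {
    "fail_on": ["missing_tables", "missing_required_columns"],
    "warn_on": ["type_mismatches", "missing_optional_columns"],
}

def classify(issue_kind: str, config: dict) -> str:
    """Return 'critical', 'warning', or 'ignored' for an issue kind."""
    if issue_kind in config.get("fail_on", []):
        return "critical"
    if issue_kind in config.get("warn_on", []):
        return "warning"
    # Assumption: kinds listed in neither bucket are not reported.
    return "ignored"

def should_fail(issue_kinds: list, config: dict) -> bool:
    """A run fails when at least one issue is classified as critical."""
    return any(classify(kind, config) == "critical" for kind in issue_kinds)

found = ["type_mismatches", "missing_optional_columns"]
print([classify(kind, VALIDATION_CONFIG) for kind in found])
print(should_fail(found, VALIDATION_CONFIG))
print(should_fail(found + ["missing_required_columns"], VALIDATION_CONFIG))
```

Moving `type_mismatches` from `warn_on` to `fail_on` would make the strictest setup, at the cost of blocking commits on mismatches that the API may tolerate.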

Command Line Options

# --dbt-project    DBT project path
# --fastapi-repo   GitHub repo
# --fastapi-path   Path to models
# --github-token   For private repos
# --output         json, terminal, github
contract-validator validate \
  --dbt-project ./dbt-project \
  --fastapi-repo "org/repo" \
  --fastapi-path "app/models.py" \
  --github-token "$GITHUB_TOKEN" \
  --output json

🚀 Supported Frameworks

Data Sources ✅

  • DBT (all adapters: Snowflake, BigQuery, Redshift, etc.)

API Frameworks ✅

  • FastAPI (Pydantic + SQLModel)

Coming Soon 🔄

🎯 Output Formats

Terminal (Default)

🛡️ Data Contract Validation Results:
Status: ✅ PASSED
Critical: 0 | Warnings: 5

⚠️  Warnings:
  user_analytics.age: Type mismatch (varchar vs integer)
  user_analytics.country: Type mismatch (integer vs varchar)

🎉 Your API contracts are protected!

JSON (for CI/CD)

{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics", 
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
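
A CI step can consume this report with nothing more than the standard library. The sketch below assumes the report has exactly the fields shown in the sample above. `summarize` is a hypothetical helper, and the rule for choosing the exit code is one reasonable choice rather than the tool's documented behavior.

```python
import json

# Sample report copied from the JSON output above.
REPORT = '''
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
'''

def summarize(report_text: str) -> tuple:
    """Return (exit_code, lines) for a report in the format shown above."""
    report = json.loads(report_text)
    lines = [
        f"{issue['severity']}: {issue['table']}.{issue['column']}: {issue['message']}"
        for issue in report["issues"]
    ]
    # Non-zero exit code when the run failed or reported critical issues.
    exit_code = 0 if report["success"] and report["critical_issues"] == 0 else 1
    return exit_code, lines

exit_code, lines = summarize(REPORT)
print("\n".join(lines))
print("exit code:", exit_code)
```

In a real pipeline the report text would come from the output of `contract-validator validate --output json` instead of an inline string.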

GitHub Actions

::warning::user_analytics.age: Type mismatch detected
✅ Contract validation passed - no critical issues
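
Lines such as `::warning::` are GitHub Actions workflow commands, which the runner turns into annotations on the pull request. As a rough sketch of how issues could be rendered in that format: `to_workflow_command` is a hypothetical helper, the issue dictionaries reuse the field names from the JSON sample, and mapping critical issues to `::error::` is an assumption.

```python
def to_workflow_command(issue: dict) -> str:
    """Render one issue as a GitHub Actions workflow command line."""
    # Assumption: critical issues become errors, everything else a warning.
    level = "error" if issue["severity"] == "critical" else "warning"
    return f"::{level}::{issue['table']}.{issue['column']}: {issue['message']}"

# Hypothetical issues, using the field names from the JSON sample.
issues = [
    {"severity": "warning", "table": "user_analytics", "column": "age",
     "message": "Type mismatch detected"},
    {"severity": "critical", "table": "user_analytics", "column": "total_orders",
     "message": "Required column missing"},
]

commands = [to_workflow_command(issue) for issue in issues]
for command in commands:
    print(command)
```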

🏗️ Architecture

Simple Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor.from_github_repo('my-org/my-api', 'app/models.py')

# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()

if not result.success:
    print(f"❌ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

CLI Interface

# Interactive setup
contract-validator init --interactive

# Test configuration
contract-validator test

# Run validation
contract-validator validate

# Setup pre-commit hooks
contract-validator setup-precommit --install-hooks

# Multiple output formats
contract-validator validate --output json

🔄 Development Workflow

With Pre-commit (Recommended)

# Team workflow with automated validation
git clone your-dbt-project
cd your-dbt-project

# One-time setup for new team members
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Protected development workflow:
# 1. Make changes to DBT models
# 2. git add models/my_model.sql
# 3. git commit -m "update model"  # ← Validation runs here automatically
# 4. If validation passes → commit succeeds
# 5. If validation fails → fix issues first
# 6. git push  # ← CI/CD validation as backup

Manual Workflow

# Traditional workflow
# 1. Make changes
# 2. contract-validator validate  # Manual validation
# 3. git commit
# 4. git push

🤝 Contributing

We welcome contributions! This tool is actively used in production.

Development Setup

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest

Adding New Extractors

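from typing import Dict

from retl_validator.extractors import BaseExtractor

class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, "Schema"]:
        # Your implementation: map each table or model name to its Schema
        schemas: Dict[str, "Schema"] = {}
        return schemas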

Reporting Issues

📚 Documentation

🎉 Real-World Usage

This tool is actively preventing production incidents in:

  • Analytics pipelines with 50+ DBT models
  • Microservices architectures with multiple APIs
  • Data engineering teams using Snowflake, BigQuery, Redshift
  • Cross-repository validation in large organizations

Proven to catch:

  • ✅ Type mismatches (varchar vs integer)
  • ✅ Missing columns (API expects columns DBT doesn't provide)
  • ✅ Schema drift (gradual model changes)
  • ✅ Breaking changes before they reach production

🛡️ Multiple Layers of Protection

  1. Pre-commit hooks: Immediate feedback (fastest)
  2. CI/CD validation: Team protection (backup)
  3. Manual validation: Development testing
  4. Configuration files: Team standards

This creates a comprehensive safety net for your data contracts.

📄 License

MIT License - see LICENSE for details.

🆘 Support

⭐ Star the Project

If this tool helps you prevent production incidents, please ⭐ star the repository!


🛡️ Built by data engineers, for data engineers. Stop breaking production with data changes!

🚀 Get Started Now

pip install data-contract-validator
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# 2 minutes to production protection with automated validation!
