🛡️ Data Contract Validator

Prevent production API breaks by validating data contracts between your data pipelines and API frameworks

🎯 What This Solves

Ever deployed a DBT model change only to break your FastAPI service in production? This tool prevents that by validating data contracts between your data pipelines and APIs before deployment.

DBT Models          Contract           FastAPI Models
(What data          Validator          (What APIs
 produces)          ↕️ VALIDATES ↕️      expect)
     ↓                   ↓                   ↓
   Schema              Finds              Schema
 Extraction          Mismatches         Extraction
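
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual implementation: the `compare_schemas` function and the plain-dict schema shape (column name mapped to a type string) are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical schema shape: column name -> type name, one dict per side.
Schema = Dict[str, str]


def compare_schemas(source: Schema, target: Schema) -> Tuple[List[str], List[str]]:
    """Return (critical, warnings) for one table.

    A column the target expects but the source lacks is critical;
    a column present on both sides with different types is a warning.
    """
    critical: List[str] = []
    warnings: List[str] = []
    for column, expected in target.items():
        if column not in source:
            critical.append(f"{column}: expected by target, missing in source")
        elif source[column] != expected:
            warnings.append(
                f"{column}: source '{source[column]}' vs target '{expected}'"
            )
    return critical, warnings


# Illustrative data only: what the pipeline produces vs. what the API expects.
dbt_columns = {"user_id": "varchar", "email": "varchar", "revenue": "varchar"}
api_fields = {"user_id": "varchar", "total_orders": "integer", "revenue": "float"}

critical, warnings = compare_schemas(dbt_columns, api_fields)
print(critical)
print(warnings)
```

Extra columns on the source side (here `email`) are ignored in this sketch, since the API does not depend on them.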

⚡ Quick Start

Installation

pip install data-contract-validator

30-Second Setup

# 1. Initialize in your project
contract-validator init --interactive

# 2. Test setup
contract-validator test

# 3. Validate contracts
contract-validator validate

# 4. Commit and push - you're protected! 🛡️

Basic Usage

# Validate local DBT project against FastAPI models
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/models.py

# Validate across repositories (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api-repo" \
  --fastapi-path "app/models.py"

🔍 Real Example: Production Validation

Actual output from a production analytics project:

$ contract-validator validate

🔍 Starting contract validation...
📊 Extracting source schemas...
   ✅ Found 14 DBT models (user_analytics_summary: 54 columns)
🎯 Extracting target schemas...
   ✅ Found 3 FastAPI models
🔍 Validating schema compatibility...

🛡️ Results:
✅ PASSED - 0 critical issues (no production breaks!)
⚠️  42 warnings (type mismatches to review)

Issues caught:
⚠️  user_analytics_summary.age_years: source 'varchar' vs target 'integer'
⚠️  user_analytics_summary.is_verified: source 'varchar' vs target 'boolean'
⚠️  user_analytics_summary.user_created_at: source 'varchar' vs target 'timestamp'

🎉 Your API contracts are protected!

🚨 What It Prevents

Before Data Contract Validation:

-- Analytics team changes DBT model
select
    user_id,
    email,
    -- total_orders,  ❌ REMOVED this column
    revenue
from users

# API team's FastAPI model (unchanged)
from pydantic import BaseModel

class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # ❌ Still expects this!
    revenue: float

Result: 💥 Production API breaks, angry customers, 2AM debugging

After Data Contract Validation:

$ git push

โŒ VALIDATION FAILED
๐Ÿ’ฅ user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
๐Ÿ”ง Fix: Add 'total_orders' back to DBT model or update FastAPI model

# Push blocked until fixed โœ‹

Result: ๐Ÿ›ก๏ธ Production protected, issues caught in CI/CD
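
To make the "FastAPI REQUIRES column" check concrete, here is a rough sketch of how required columns could be derived from a model's type annotations. It uses a plain class as a stand-in for the Pydantic model so that it runs with the standard library only. The `expected_columns` helper, the `PY_TO_SQL` mapping, and the `nickname` field are all hypothetical and are not taken from the tool.

```python
from typing import Dict, Optional, Union, get_args, get_origin, get_type_hints

# Assumed mapping from Python annotations to warehouse-style type names.
PY_TO_SQL = {str: "varchar", int: "integer", float: "float", bool: "boolean"}


class UserAnalytics:
    """Plain-class stand-in for the Pydantic model above (annotations only)."""
    user_id: str
    email: str
    total_orders: int
    revenue: float
    nickname: Optional[str] = None  # hypothetical optional field


def expected_columns(model: type) -> Dict[str, dict]:
    """Map each annotated field to its assumed SQL type and a required flag."""
    columns = {}
    for name, annotation in get_type_hints(model).items():
        required = True
        # Optional[X] is Union[X, None]; treat it as a non-required column.
        if get_origin(annotation) is Union and type(None) in get_args(annotation):
            required = False
            annotation = next(a for a in get_args(annotation) if a is not type(None))
        columns[name] = {
            "type": PY_TO_SQL.get(annotation, "unknown"),
            "required": required,
        }
    return columns


dbt_columns = {"user_id", "email", "revenue"}  # total_orders was removed
missing_required = [
    name
    for name, spec in expected_columns(UserAnalytics).items()
    if spec["required"] and name not in dbt_columns
]
print(missing_required)
```

In this sketch a missing optional field such as `nickname` is not reported as a required-column failure; whether it produces a warning instead would depend on the configured rules.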

🛠️ Pre-commit Integration

Automatic Setup (Recommended)

# Initialize with pre-commit support
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Now every commit validates contracts automatically! 🛡️

Manual Setup

If you prefer manual setup:

  1. Install pre-commit:

    pip install pre-commit
    
  2. Add to .pre-commit-config.yaml:

    repos:
      - repo: https://github.com/OGsiji/data-contract-validator
        rev: v1.0.0
        hooks:
          - id: contract-validation
            name: Validate Data Contracts
            files: '^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
    
  3. Install hooks:

    pre-commit install
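
To see which changed files would trigger the hook, the `files` pattern from step 2 can be tried out directly. pre-commit applies this pattern as a Python regular expression to each path relative to the repository root; the sample paths below are made up for illustration.

```python
import re

# The `files` pattern copied from the hook configuration above.
FILES_PATTERN = re.compile(
    r"^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$"
)

# Illustrative paths, relative to the repository root.
paths = [
    "models/user_analytics.sql",
    "app/models.py",
    ".retl-validator.yml",
    "dbt_project.yml",
    "README.md",
    "macros/helpers.sql",
    "analytics/dbt_project.yml",
]

# Keep only the paths the pattern matches.
triggers = [p for p in paths if FILES_PATTERN.search(p)]
print(triggers)
```

Because the pattern is anchored with `^`, a `dbt_project.yml` inside a subdirectory (the last sample path) does not match; if your DBT project is not at the repository root, the pattern would need adjusting.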
    

How It Works

$ git add models/user_analytics.sql
$ git commit -m "update user analytics model"

# Pre-commit automatically runs:
🔍 Validating Data Contracts...
✅ Contract validation passed
[main abc1234] update user analytics model

On Validation Failure

$ git commit -m "remove important column"

🔍 Validating Data Contracts...
❌ CRITICAL: user_analytics.total_revenue missing
💡 Fix the issue before committing

# Commit blocked until fixed! 🛡️

Skip Validation (Emergency Only)

# Only for emergencies!
git commit -m "emergency fix" --no-verify

Benefits of Pre-commit Integration

  • ✅ Catches issues before they reach CI/CD
  • ✅ Faster feedback loop (seconds, not minutes)
  • ✅ No broken commits in your git history
  • ✅ Team protection - everyone gets validation
  • ✅ Zero configuration after setup

📦 GitHub Actions Integration

Add this to .github/workflows/validate-contracts.yml:

name: 🛡️ Data Contract Validation

on:
  pull_request:
    paths:
      - 'models/**/*.sql'
      - 'dbt_project.yml'
      - '**/*models*.py'

jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Validate contracts
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        pip install data-contract-validator
        contract-validator validate

Auto-generated when you run contract-validator init!

๐Ÿ”ง Configuration

Auto-Generated Config (.retl-validator.yml)

version: '1.0'
name: 'my-project-contracts'

source:
  dbt:
    project_path: '.'
    auto_compile: true

target:
  fastapi:
    # For GitHub repos
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    
    # For local files
    # type: "local"
    # path: "../my-api/models.py"

validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']
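
One way to read the `fail_on` and `warn_on` lists is as a severity mapping: issue kinds in `fail_on` fail the run, and kinds in `warn_on` are only reported. The sketch below illustrates that reading. The `classify` function is hypothetical, and the "ignored" bucket for unlisted kinds is an assumption, since the config above does not say how the tool treats them.

```python
from typing import Dict, List

# The validation section of the config above, as a plain dict.
validation = {
    "fail_on": ["missing_tables", "missing_required_columns"],
    "warn_on": ["type_mismatches", "missing_optional_columns"],
}


def classify(issue_kinds: List[str], rules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Sort issue kinds into critical / warning / ignored buckets."""
    buckets: Dict[str, List[str]] = {"critical": [], "warning": [], "ignored": []}
    for kind in issue_kinds:
        if kind in rules["fail_on"]:
            buckets["critical"].append(kind)
        elif kind in rules["warn_on"]:
            buckets["warning"].append(kind)
        else:
            # Assumption: kinds listed in neither rule are not reported.
            buckets["ignored"].append(kind)
    return buckets


found = ["type_mismatches", "missing_required_columns"]
result = classify(found, validation)
exit_code = 1 if result["critical"] else 0  # non-zero exit blocks the commit or CI job
print(result, exit_code)
```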

Command Line Options

# --dbt-project    DBT project path
# --fastapi-repo   GitHub repo
# --fastapi-path   Path to models
# --github-token   For private repos
# --output         json, terminal, github
contract-validator validate \
  --dbt-project ./dbt-project \
  --fastapi-repo "org/repo" \
  --fastapi-path "app/models.py" \
  --github-token "$GITHUB_TOKEN" \
  --output json

🚀 Supported Frameworks

Data Sources ✅

  • DBT (all adapters: Snowflake, BigQuery, Redshift, etc.)

API Frameworks ✅

  • FastAPI (Pydantic + SQLModel)

Coming Soon 🔄

🎯 Output Formats

Terminal (Default)

🛡️ Data Contract Validation Results:
Status: ✅ PASSED
Critical: 0 | Warnings: 5

⚠️  Warnings:
  user_analytics.age: Type mismatch (varchar vs integer)
  user_analytics.country: Type mismatch (integer vs varchar)

🎉 Your API contracts are protected!

JSON (for CI/CD)

{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics", 
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
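
The JSON report is straightforward to consume from a CI script. The sketch below parses the sample above and applies a stricter, team-defined gate. The `should_block` helper and its `max_warnings` budget are hypothetical and are not options of the tool.

```python
import json

# Sample report, copied from the JSON output shown above.
report_text = """
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
"""

report = json.loads(report_text)


def should_block(report: dict, max_warnings: int) -> bool:
    """Block when the run failed or warnings exceed a team-chosen budget."""
    return (not report["success"]) or report["warnings"] > max_warnings


# One line per issue, in table.column: message form.
lines = [f'{i["table"]}.{i["column"]}: {i["message"]}' for i in report["issues"]]
print("\n".join(lines))
print(should_block(report, max_warnings=10))
```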

GitHub Actions

::warning::user_analytics.age: Type mismatch detected
✅ Contract validation passed - no critical issues
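
The `::warning::` line above is a GitHub Actions workflow command, which GitHub renders as an annotation on the run. As a rough sketch of how issues could be turned into such lines: the `to_workflow_commands` helper is hypothetical, and the "critical" severity value is an assumption, since the sample JSON only shows "warning".

```python
from typing import Dict, List


def to_workflow_commands(issues: List[Dict[str, str]]) -> List[str]:
    """Render issues as GitHub Actions workflow-command lines."""
    # Assumed severity names; unknown severities fall back to a notice.
    level_for = {"critical": "error", "warning": "warning"}
    lines = []
    for issue in issues:
        level = level_for.get(issue["severity"], "notice")
        lines.append(
            f"::{level}::{issue['table']}.{issue['column']}: {issue['message']}"
        )
    return lines


issues = [
    {
        "severity": "warning",
        "table": "user_analytics",
        "column": "age",
        "message": "Type mismatch detected",
    }
]
commands = to_workflow_commands(issues)
for line in commands:
    print(line)
```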

🏗️ Architecture

Simple Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor.from_github_repo('my-org/my-api', 'app/models.py')

# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()

if not result.success:
    print(f"❌ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

CLI Interface

# Interactive setup
contract-validator init --interactive

# Test configuration
contract-validator test

# Run validation
contract-validator validate

# Setup pre-commit hooks
contract-validator setup-precommit --install-hooks

# Multiple output formats
contract-validator validate --output json

🔄 Development Workflow

With Pre-commit (Recommended)

# Team workflow with automated validation
git clone your-dbt-project
cd your-dbt-project

# One-time setup for new team members
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Protected development workflow:
# 1. Make changes to DBT models
# 2. git add models/my_model.sql
# 3. git commit -m "update model"  # ← Validation runs here automatically
# 4. If validation passes → commit succeeds
# 5. If validation fails → fix issues first
# 6. git push  # ← CI/CD validation as backup

Manual Workflow

# Traditional workflow
# 1. Make changes
# 2. contract-validator validate  # Manual validation
# 3. git commit
# 4. git push

🤝 Contributing

We welcome contributions! This tool is actively used in production.

Development Setup

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest

Adding New Extractors

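from typing import Dict

# Other examples here import from data_contract_validator; use whichever
# module path your installed version provides. Schema comes from the same package.
from retl_validator.extractors import BaseExtractor

class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, Schema]:
        schemas: Dict[str, Schema] = {}
        # Your implementation: populate schemas here
        return schemas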

Reporting Issues

📚 Documentation

🎉 Real-World Usage

This tool is actively preventing production incidents in:

  • Analytics pipelines with 50+ DBT models
  • Microservices architectures with multiple APIs
  • Data engineering teams using Snowflake, BigQuery, Redshift
  • Cross-repository validation in large organizations

Proven to catch:

  • ✅ Type mismatches (varchar vs integer)
  • ✅ Missing columns (API expects columns DBT doesn't provide)
  • ✅ Schema drift (gradual model changes)
  • ✅ Breaking changes before they reach production

🛡️ Multiple Layers of Protection

  1. Pre-commit hooks: Immediate feedback (fastest)
  2. CI/CD validation: Team protection (backup)
  3. Manual validation: Development testing
  4. Configuration files: Team standards

This creates a comprehensive safety net for your data contracts.

📄 License

MIT License - see LICENSE for details.

🆘 Support

⭐ Star the Project

If this tool helps you prevent production incidents, please ⭐ star the repository!

🛡️ Built by data engineers, for data engineers. Stop breaking production with data changes!

🚀 Get Started Now

pip install data-contract-validator
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# 2 minutes to production protection with automated validation!
