
Critical bug fixes: DBT extractor return paths, function signatures, error handling, and GitHub API rate limiting


🛡️ Data Contract Validator

Prevent production API breaks by validating data contracts between your data pipelines and API frameworks


🎯 What This Solves

Ever deployed a DBT model change only to break your FastAPI service in production? This tool prevents that by validating the data contracts between your data pipelines and your APIs before deployment.

DBT Models          Contract           FastAPI Models
(What data          Validator          (What APIs
 produces)          ↕️ VALIDATES ↕️      expect)
     ↓                   ↓                   ↓
   Schema              Finds              Schema
 Extraction          Mismatches         Extraction
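Conceptually, the validator compares two extracted schemas and classifies each difference. A minimal sketch of that comparison, assuming both sides have already been reduced to plain dicts of table name to column types. The Issue class, compare_schemas, and the choice to treat missing tables and columns as critical and type mismatches as warnings mirror the outputs shown on this page but are illustrative, not the tool's code:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical shape: table name -> {column name -> type name}
SchemaMap = Dict[str, Dict[str, str]]


@dataclass
class Issue:
    severity: str  # "critical" or "warning"
    table: str
    column: str
    message: str


def compare_schemas(source: SchemaMap, target: SchemaMap) -> List[Issue]:
    """Check that every table/column the target (API) expects exists in the source (DBT)."""
    issues: List[Issue] = []
    for table, target_cols in target.items():
        source_cols = source.get(table)
        if source_cols is None:
            # The API expects a whole table the pipeline does not produce
            issues.append(Issue("critical", table, "*", "table missing from source"))
            continue
        for column, target_type in target_cols.items():
            if column not in source_cols:
                issues.append(Issue("critical", table, column, "column missing from source"))
            elif source_cols[column] != target_type:
                issues.append(Issue(
                    "warning", table, column,
                    f"Type mismatch: source '{source_cols[column]}' vs target '{target_type}'",
                ))
    return issues


issues = compare_schemas(
    {"user_analytics": {"user_id": "varchar", "age": "varchar"}},
    {"user_analytics": {"user_id": "varchar", "age": "integer", "total_orders": "integer"}},
)
for issue in issues:
    print(f"{issue.severity}: {issue.table}.{issue.column}: {issue.message}")
```

Extra columns in the DBT model are ignored in this sketch, since a consumer that never reads a column is not broken when it appears.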

⚡ Quick Start

Installation

pip install data-contract-validator

30-Second Setup

# 1. Initialize in your project
contract-validator init --interactive

# 2. Test setup
contract-validator test

# 3. Validate contracts
contract-validator validate

# 4. Commit and push - you're protected! 🛡️

Basic Usage

# Validate local DBT project against FastAPI models
contract-validator validate \
  --dbt-project ./my-dbt-project \
  --fastapi-local ./my-api/models.py

# Validate across repositories (microservices)
contract-validator validate \
  --dbt-project . \
  --fastapi-repo "my-org/my-api-repo" \
  --fastapi-path "app/models.py"

🔍 Real Example: Production Validation

Actual output from a production analytics project:

$ contract-validator validate

🔍 Starting contract validation...
📊 Extracting source schemas...
   ✅ Found 14 DBT models (user_analytics_summary: 54 columns)
🎯 Extracting target schemas...
   ✅ Found 3 FastAPI models
🔍 Validating schema compatibility...

🛡️ Results:
✅ PASSED - 0 critical issues (no production breaks!)
⚠️  42 warnings (type mismatches to review)

Issues caught:
⚠️  user_analytics_summary.age_years: source 'varchar' vs target 'integer'
⚠️  user_analytics_summary.is_verified: source 'varchar' vs target 'boolean'
⚠️  user_analytics_summary.user_created_at: source 'varchar' vs target 'timestamp'

🎉 Your API contracts are protected!
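The warnings above compare a SQL type reported by DBT with the type implied by a Python annotation on the API model. A minimal sketch of that kind of check, assuming a hand-written mapping between Python types and SQL type names (PYTHON_TO_SQL, SQL_ALIASES, and type_warning are illustrative, not the tool's actual tables or functions):

```python
import datetime
from typing import Dict, Optional

# Assumed mapping from Python annotations (as used in Pydantic models) to the
# SQL type names that appear in the validator's output.
PYTHON_TO_SQL: Dict[type, str] = {
    str: "varchar",
    int: "integer",
    float: "float",
    bool: "boolean",
    datetime.datetime: "timestamp",
    datetime.date: "date",
}

# Assumed aliases so adapter-specific spellings compare equal.
SQL_ALIASES = {
    "text": "varchar",
    "string": "varchar",
    "character varying": "varchar",
    "int": "integer",
    "bigint": "integer",
    "bool": "boolean",
}


def normalize_sql_type(sql_type: str) -> str:
    t = sql_type.strip().lower()
    return SQL_ALIASES.get(t, t)


def type_warning(table: str, column: str, source_sql_type: str, target_py_type: type) -> Optional[str]:
    """Return a warning string if the DBT column type differs from what the API field implies."""
    expected = PYTHON_TO_SQL.get(target_py_type)
    actual = normalize_sql_type(source_sql_type)
    if expected is None or expected == actual:
        return None  # unmapped annotation or compatible types: nothing to report
    return f"{table}.{column}: source '{actual}' vs target '{expected}'"


print(type_warning("user_analytics_summary", "age_years", "VARCHAR", int))
# user_analytics_summary.age_years: source 'varchar' vs target 'integer'
```

Folding spellings such as bigint into integer is a deliberate simplification here; a stricter check might also warn on width changes.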

🚨 What It Prevents

Before Data Contract Validation:

-- Analytics team changes DBT model
select
    user_id,
    email,
    -- total_orders,  ❌ REMOVED this column
    revenue
from users

# API team's FastAPI model (unchanged)
from pydantic import BaseModel

class UserAnalytics(BaseModel):
    user_id: str
    email: str
    total_orders: int  # ❌ Still expects this!
    revenue: float

Result: 💥 Production API breaks, angry customers, 2AM debugging

After Data Contract Validation:

$ git push

❌ VALIDATION FAILED
💥 user_analytics.total_orders: FastAPI REQUIRES column but DBT removed it
🔧 Fix: Add 'total_orders' back to DBT model or update FastAPI model

# Push blocked until fixed ✋

Result: 🛡️ Production protected, issues caught in CI/CD
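The failure above boils down to a set difference: the fields the API model declares, minus the columns the DBT select list still produces. A rough sketch of that check on the same example, assuming a naive regex-based parse of a simple select list (real DBT models would need something sturdier, such as working from compiled output) and a plain annotated class standing in for the Pydantic model; with Pydantic v2, UserAnalytics.model_fields would supply the field names instead. selected_columns and missing_columns are hypothetical helpers:

```python
import re
from typing import List, Set

# The DBT model from above, after the analytics team's change.
DBT_SQL = """
select
    user_id,
    email,
    -- total_orders,
    revenue
from users
"""


class UserAnalytics:
    """Plain stand-in for the FastAPI/Pydantic model above (no pydantic needed)."""
    user_id: str
    email: str
    total_orders: int
    revenue: float


def selected_columns(sql: str) -> Set[str]:
    """Naively read column names between SELECT and FROM, skipping `--` comments.

    Only handles simple select lists: 'expr as alias' yields the alias,
    and subqueries or nested commas are not supported."""
    no_comments = re.sub(r"--[^\n]*", "", sql)
    match = re.search(r"\bselect\b(.*?)\bfrom\b", no_comments, re.IGNORECASE | re.DOTALL)
    if not match:
        return set()
    columns = set()
    for item in match.group(1).split(","):
        item = item.strip()
        if item:
            columns.add(re.split(r"\s+as\s+", item, flags=re.IGNORECASE)[-1].strip())
    return columns


def missing_columns(sql: str, model: type) -> List[str]:
    """Fields the API model declares that the DBT select list no longer produces."""
    return sorted(set(model.__annotations__) - selected_columns(sql))


print(missing_columns(DBT_SQL, UserAnalytics))  # ['total_orders']
```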

🛠️ Pre-commit Integration

Automatic Setup (Recommended)

# Initialize with pre-commit support
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Now every commit validates contracts automatically! 🛡️

Manual Setup

If you prefer manual setup:

  1. Install pre-commit:

    pip install pre-commit
    
  2. Add to .pre-commit-config.yaml:

    repos:
      - repo: https://github.com/OGsiji/data-contract-validator
        rev: v1.0.0
        hooks:
          - id: contract-validation
            name: Validate Data Contracts
            files: '^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$'
    
  3. Install hooks:

    pre-commit install
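To check which paths the files pattern in step 2 actually matches, you can run it through Python's re module: pre-commit applies this pattern with re.search to each staged file's path relative to the repository root. A quick check using the pattern exactly as written above (triggers_hook is a hypothetical helper):

```python
import re

# The `files` pattern from the .pre-commit-config.yaml example above.
FILES_PATTERN = re.compile(r"^(.*models.*\.(sql|py)|\.retl-validator\.yml|dbt_project\.yml)$")


def triggers_hook(path: str) -> bool:
    """True if a staged path (relative to the repo root) would run the hook."""
    return FILES_PATTERN.search(path) is not None


for path in [
    "models/user_analytics.sql",  # matched
    "app/models.py",              # matched
    "dbt_project.yml",            # matched
    ".retl-validator.yml",        # matched
    "models/schema.yml",          # not matched: .yml is not sql|py
    "analytics/dbt_project.yml",  # not matched: anchored to the repo root
    "README.md",                  # not matched
]:
    print(f"{path:30} -> {triggers_hook(path)}")
```

With this pattern, a dbt_project.yml outside the repository root and DBT schema.yml files under models/ do not trigger the hook; widen the pattern if those edits should also run validation.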
    

How It Works

$ git add models/user_analytics.sql
$ git commit -m "update user analytics model"

# Pre-commit automatically runs:
🔍 Validating Data Contracts...
✅ Contract validation passed
[main abc1234] update user analytics model

On Validation Failure

$ git commit -m "remove important column"

🔍 Validating Data Contracts...
❌ CRITICAL: user_analytics.total_revenue missing
💡 Fix the issue before committing

# Commit blocked until fixed! 🛡️

Skip Validation (Emergency Only)

# Only for emergencies!
git commit -m "emergency fix" --no-verify

Benefits of Pre-commit Integration

  • ✅ Catches issues before they reach CI/CD
  • ✅ Faster feedback loop (seconds, not minutes)
  • ✅ No broken commits in your git history
  • ✅ Team protection - everyone gets validation
  • ✅ Zero configuration after setup

📦 GitHub Actions Integration

Add this to .github/workflows/validate-contracts.yml:

name: 🛡️ Data Contract Validation

on:
  pull_request:
    paths:
      - 'models/**/*.sql'
      - 'dbt_project.yml'
      - '**/*models*.py'

jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Validate contracts
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        pip install data-contract-validator
        contract-validator validate

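This workflow is auto-generated when you run contract-validator init. Note that the built-in GITHUB_TOKEN is scoped to the repository the workflow runs in; to validate against a FastAPI model in a different private repository, store a token with read access to that repository as a secret and pass it instead.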

🔧 Configuration

Auto-Generated Config (.retl-validator.yml)

version: '1.0'
name: 'my-project-contracts'

source:
  dbt:
    project_path: '.'
    auto_compile: true

target:
  fastapi:
    # For GitHub repos
    type: "github"
    repo: "my-org/my-api"
    path: "app/models.py"
    
    # For local files
    # type: "local"
    # path: "../my-api/models.py"

validation:
  fail_on: ['missing_tables', 'missing_required_columns']
  warn_on: ['type_mismatches', 'missing_optional_columns']
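One reasonable reading of the validation section: issue kinds listed under fail_on make the run fail, while kinds under warn_on are reported without failing. A small sketch of that rule, assuming the YAML has already been parsed into a dict (the classify function, and ignoring kinds that appear in neither list, are illustrative assumptions rather than the tool's documented behavior):

```python
from typing import Dict, List, Tuple

# Mirrors the `validation:` section of the config above, as parsed YAML would give it.
VALIDATION: Dict[str, List[str]] = {
    "fail_on": ["missing_tables", "missing_required_columns"],
    "warn_on": ["type_mismatches", "missing_optional_columns"],
}


def classify(issue_kinds: List[str], validation: Dict[str, List[str]]) -> Tuple[bool, List[str], List[str]]:
    """Split detected issue kinds into failures and warnings; the run passes if there are no failures.

    Kinds listed in neither fail_on nor warn_on are ignored in this sketch."""
    failures = [k for k in issue_kinds if k in validation.get("fail_on", [])]
    warnings = [k for k in issue_kinds if k in validation.get("warn_on", [])]
    return (not failures, failures, warnings)


passed, failures, warnings = classify(["type_mismatches", "missing_required_columns"], VALIDATION)
print(passed, failures, warnings)
# False ['missing_required_columns'] ['type_mismatches']
```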

Command Line Options

# --dbt-project   DBT project path
# --fastapi-repo  GitHub repo (org/repo)
# --fastapi-path  Path to the models file in that repo
# --github-token  Token for private repos
# --output        Output format: json, terminal, or github
contract-validator validate \
  --dbt-project ./dbt-project \
  --fastapi-repo "org/repo" \
  --fastapi-path "app/models.py" \
  --github-token "$GITHUB_TOKEN" \
  --output json
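For cross-repository validation, the FastAPI models file has to be fetched from GitHub, which is where --github-token comes in. The sketch below shows one way to build such a request with only the standard library, using GitHub's REST "get repository content" endpoint and its raw media type; whether the validator fetches files exactly this way is an assumption, and github_file_request is a hypothetical helper:

```python
import urllib.parse
import urllib.request
from typing import Optional


def github_file_request(repo: str, path: str, token: Optional[str] = None,
                        ref: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST request for one file's raw contents.

    `repo` is "owner/name"; `path` is relative to the repo root, e.g. "app/models.py".
    """
    url = f"https://api.github.com/repos/{repo}/contents/{urllib.parse.quote(path)}"
    if ref:
        url += "?" + urllib.parse.urlencode({"ref": ref})
    req = urllib.request.Request(url)
    # Ask for the raw file body instead of the base64-encoded JSON envelope
    req.add_header("Accept", "application/vnd.github.raw+json")
    if token:
        # Required for private repos
        req.add_header("Authorization", f"Bearer {token}")
    return req


req = github_file_request("my-org/my-api", "app/models.py", token="example-token")
print(req.full_url)  # https://api.github.com/repos/my-org/my-api/contents/app/models.py
# To actually fetch: urllib.request.urlopen(req).read().decode("utf-8")
```

Authenticated requests also get a much higher GitHub API rate limit than anonymous ones, which matters when validating against several repositories.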

🚀 Supported Frameworks

Data Sources ✅

  • DBT (all adapters: Snowflake, BigQuery, Redshift, etc.)

API Frameworks ✅

  • FastAPI (Pydantic + SQLModel)

Coming Soon 🔄

🎯 Output Formats

Terminal (Default)

🛡️ Data Contract Validation Results:
Status: ✅ PASSED
Critical: 0 | Warnings: 5

⚠️  Warnings:
  user_analytics.age: Type mismatch (varchar vs integer)
  user_analytics.country: Type mismatch (integer vs varchar)

🎉 Your API contracts are protected!

JSON (for CI/CD)

{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics", 
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
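With --output json, a CI step can gate on the report itself. A sketch that reads a report in the shape shown above (only the success and issues fields are used) and turns it into a process exit code; summarize is a hypothetical helper, and piping the CLI's stdout into a script assumes the JSON report is the only thing printed there:

```python
import json

# The sample report from above.
SAMPLE_REPORT = """
{
  "success": true,
  "critical_issues": 0,
  "warnings": 5,
  "issues": [
    {
      "severity": "warning",
      "table": "user_analytics",
      "column": "age",
      "message": "Type mismatch: source 'varchar' vs target 'integer'",
      "suggested_fix": "Update target to expect 'varchar' or fix source type"
    }
  ]
}
"""


def summarize(report_json: str) -> int:
    """Print one line per issue and return 0 if the report passed, 1 otherwise."""
    report = json.loads(report_json)
    for issue in report.get("issues", []):
        print(f"[{issue['severity']}] {issue['table']}.{issue['column']}: {issue['message']}")
    return 0 if report.get("success") else 1


exit_code = summarize(SAMPLE_REPORT)
# In CI, a script ending in sys.exit(summarize(sys.stdin.read())) could read
# the output of: contract-validator validate --output json
```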

GitHub Actions

::warning::user_analytics.age: Type mismatch detected
✅ Contract validation passed - no critical issues
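The GitHub output format relies on Actions workflow commands: a line such as ::warning::message printed to stdout becomes an annotation on the run. A sketch of emitting one, assuming critical issues map to ::error:: and warnings to ::warning:: (annotation and escape_data are hypothetical helpers); the message is percent-encoded the way GitHub's toolkit escapes command data (%, carriage return, newline):

```python
def escape_data(value: str) -> str:
    """Escape a workflow-command message: %, CR and LF must be percent-encoded."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def annotation(severity: str, message: str) -> str:
    """Format one issue as a GitHub Actions workflow command line."""
    level = "error" if severity == "critical" else "warning"  # assumed mapping
    return f"::{level}::{escape_data(message)}"


print(annotation("warning", "user_analytics.age: Type mismatch detected"))
# ::warning::user_analytics.age: Type mismatch detected
```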

🏗️ Architecture

Simple Python API

from data_contract_validator import ContractValidator, DBTExtractor, FastAPIExtractor

# Initialize extractors
dbt = DBTExtractor(project_path='./dbt-project')
fastapi = FastAPIExtractor.from_github_repo('my-org/my-api', 'app/models.py')

# Run validation
validator = ContractValidator(source=dbt, target=fastapi)
result = validator.validate()

if not result.success:
    print(f"❌ {len(result.critical_issues)} critical issues found")
    for issue in result.critical_issues:
        print(f"💥 {issue.table}.{issue.column}: {issue.message}")

CLI Interface

# Interactive setup
contract-validator init --interactive

# Test configuration
contract-validator test

# Run validation
contract-validator validate

# Setup pre-commit hooks
contract-validator setup-precommit --install-hooks

# Multiple output formats
contract-validator validate --output json

🔄 Development Workflow

With Pre-commit (Recommended)

# Team workflow with automated validation
git clone your-dbt-project
cd your-dbt-project

# One-time setup for new team members
contract-validator init --interactive
contract-validator setup-precommit --install-hooks

# Protected development workflow:
# 1. Make changes to DBT models
# 2. git add models/my_model.sql
# 3. git commit -m "update model"  # ← Validation runs here automatically
# 4. If validation passes → commit succeeds
# 5. If validation fails → fix issues first
# 6. git push  # ← CI/CD validation as backup

Manual Workflow

# Traditional workflow
# 1. Make changes
# 2. contract-validator validate  # Manual validation
# 3. git commit
# 4. git push

🤝 Contributing

We welcome contributions! This tool is actively used in production.

Development Setup

git clone https://github.com/OGsiji/data-contract-validator
cd data-contract-validator
pip install -e ".[dev]"
pytest

Adding New Extractors

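from typing import Dict
from retl_validator.extractors import BaseExtractor

class MyFrameworkExtractor(BaseExtractor):
    def extract_schemas(self) -> Dict[str, "Schema"]:
        # Map each model/table name to its Schema object
        schemas: Dict[str, "Schema"] = {}
        # Your implementation: populate `schemas` from your framework's models
        return schemas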
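To make the extractor contract concrete without depending on the package's BaseExtractor or Schema classes, here is a standalone sketch of an extractor for a hypothetical framework whose models are Python dataclasses. It keeps the extract_schemas() name from above but returns plain dicts (table name to column-to-type mapping) in place of Schema objects; DataclassExtractor, table_name, and the type-name mapping are all assumptions for illustration:

```python
import dataclasses
import datetime
import re
import typing
from typing import Dict, List

# Assumed Python -> SQL type-name mapping; unknown types fall back to "unknown".
TYPE_NAMES = {
    str: "varchar",
    int: "integer",
    float: "float",
    bool: "boolean",
    datetime.datetime: "timestamp",
}


def table_name(cls: type) -> str:
    """Derive a snake_case table name from the class name: UserAnalytics -> user_analytics."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()


class DataclassExtractor:
    """Hypothetical extractor reading column names and types from @dataclass models."""

    def __init__(self, models: List[type]) -> None:
        self.models = models

    def extract_schemas(self) -> Dict[str, Dict[str, str]]:
        schemas: Dict[str, Dict[str, str]] = {}
        for model in self.models:
            hints = typing.get_type_hints(model)  # resolves string annotations too
            schemas[table_name(model)] = {
                f.name: TYPE_NAMES.get(hints[f.name], "unknown")
                for f in dataclasses.fields(model)
            }
        return schemas


@dataclasses.dataclass
class UserAnalytics:
    user_id: str
    total_orders: int
    revenue: float


print(DataclassExtractor([UserAnalytics]).extract_schemas())
# {'user_analytics': {'user_id': 'varchar', 'total_orders': 'integer', 'revenue': 'float'}}
```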

Reporting Issues

📚 Documentation

🎉 Real-World Usage

This tool is actively preventing production incidents in:

  • Analytics pipelines with 50+ DBT models
  • Microservices architectures with multiple APIs
  • Data engineering teams using Snowflake, BigQuery, Redshift
  • Cross-repository validation in large organizations

Proven to catch:

  • ✅ Type mismatches (varchar vs integer)
  • ✅ Missing columns (API expects columns DBT doesn't provide)
  • ✅ Schema drift (gradual model changes)
  • ✅ Breaking changes before they reach production

🛡️ Multiple Layers of Protection

  1. Pre-commit hooks: Immediate feedback (fastest)
  2. CI/CD validation: Team protection (backup)
  3. Manual validation: Development testing
  4. Configuration files: Team standards

This creates a comprehensive safety net for your data contracts.

📄 License

MIT License - see LICENSE for details.

🆘 Support

⭐ Star the Project

If this tool helps you prevent production incidents, please ⭐ star the repository!


🛡️ Built by data engineers, for data engineers. Stop breaking production with data changes!

🚀 Get Started Now

pip install data-contract-validator
contract-validator init --interactive
contract-validator setup-precommit --install-hooks
# 2 minutes to production protection with automated validation!


Download files


Source Distribution

data_contract_validator-1.0.5.tar.gz (31.7 kB)

Uploaded Source

Built Distribution


data_contract_validator-1.0.5-py3-none-any.whl (29.0 kB)

Uploaded Python 3

File details

Details for the file data_contract_validator-1.0.5.tar.gz.

File metadata

  • Download URL: data_contract_validator-1.0.5.tar.gz
  • Upload date:
  • Size: 31.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for data_contract_validator-1.0.5.tar.gz
Algorithm Hash digest
SHA256 1dd5d22548b24dd5e54efb1a7ab079d86aee81243c0410f5ee990ece6bcdce40
MD5 6353b5d1f498e9d84d3da998bcd74e68
BLAKE2b-256 b5e9ca2bfb83880c196b3222df1feeafecfe188e92a24a67644dae7d85143795


File details

Details for the file data_contract_validator-1.0.5-py3-none-any.whl.

File metadata

File hashes

Hashes for data_contract_validator-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 71058ec054eedc21bd8a54f40c7f616234ba6f3e3fc63eceea5572bef6c78c0f
MD5 b5864eb7d352673d90e0a0f08198e797
BLAKE2b-256 5a1bdd635c405485ee5363aab0ed1e2d2f1c64fa39f31be102f36eaa7519450b

