Lightweight schema drift detection and data contract enforcement tool

Schema Drift Guard

Schema Drift Guard is a lightweight CLI tool that detects schema drift and enforces data contracts in data pipelines.

It helps data teams detect unexpected schema changes before they break pipelines, dashboards, or downstream systems.

The tool can automatically:

  • Detect schema drift
  • Detect column type changes
  • Profile dataset columns
  • Suggest schema tests
  • Update YAML schema files
  • Maintain schema version history
  • Enforce checks in CI/CD pipelines

Installation

Install from PyPI:

pip install schema-drift-guard

Or install locally for development:

git clone https://github.com/smohsin46/schema-drift-guard.git
cd schema-drift-guard
pip install -e .

Quick Start

Check a dataset against its schema definition:

schema-guard check \
  --source-type csv \
  --source data/orders.csv \
  --schema schemas/orders.yml

Example output:

⚠️ Schema drift detected

New columns detected:
  + discount

Updating schema YAML...

➕ Adding column to schema: discount
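
The comparison behind output like this can be pictured as a set difference between the columns listed in the schema and the columns found in the data. The following is a minimal sketch of that idea, not the tool's actual implementation; the function names `detect_column_drift` and `read_csv_header` and the sample data are hypothetical.

```python
import csv
import io


def read_csv_header(text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(text)))


def detect_column_drift(expected_columns, actual_columns):
    """Return columns added to and removed from the dataset, preserving order."""
    expected = set(expected_columns)
    actual = set(actual_columns)
    added = [c for c in actual_columns if c not in expected]
    removed = [c for c in expected_columns if c not in actual]
    return {"added": added, "removed": removed}


# Hypothetical sample: the data has a `discount` column the schema lacks.
sample_csv = "order_id,user_id,price,created_at,discount\n1,7,19.99,2024-01-01,0.1\n"
schema_columns = ["order_id", "user_id", "price", "created_at"]

drift = detect_column_drift(schema_columns, read_csv_header(sample_csv))
print(drift)  # {'added': ['discount'], 'removed': []}
```

Keeping the result as ordered lists rather than raw sets makes the report stable between runs, which helps when the output is diffed in code review.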

Supported Data Sources

Current connectors:

Source Type   Description
csv           Local CSV files
snowflake     Snowflake warehouse tables

Example Snowflake command:

schema-guard check \
  --source-type snowflake \
  --source orders \
  --schema schemas/orders.yml \
  --account <account> \
  --user <user> \
  --password <password> \
  --warehouse <warehouse> \
  --database <database> \
  --schema-name <schema>

Schema YAML Format

Example schema definition:

columns:
  - name: order_id
    type: integer

  - name: user_id
    type: integer

  - name: price
    type: float

  - name: created_at
    type: string

When new columns are detected, the tool can automatically update the schema.
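
Detecting a column type change requires inferring a type from the data and comparing it with the type declared in the schema. The sketch below shows one way this could work for CSV input, where every value arrives as a string. It is an illustration under stated assumptions: the type names match the example schema above, empty strings are treated as nulls, and the helpers `infer_type` and `detect_type_changes` are hypothetical rather than part of the tool.

```python
def infer_type(values):
    """Infer 'integer', 'float', or 'string' from raw CSV strings.

    Assumption: empty strings represent nulls and are ignored.
    """
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "string"
    # Try the narrowest type first; fall through on the first bad value.
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for v in non_null:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def detect_type_changes(declared, columns):
    """Map column name to (declared_type, inferred_type) where they differ."""
    changes = {}
    for name, values in columns.items():
        inferred = infer_type(values)
        if name in declared and declared[name] != inferred:
            changes[name] = (declared[name], inferred)
    return changes


# Hypothetical data: `price` now contains a non-numeric value.
declared = {"order_id": "integer", "price": "float", "created_at": "string"}
columns = {
    "order_id": ["1", "2", "3"],
    "price": ["2.5", "abc", ""],
    "created_at": ["2024-01-01", "2024-01-02", ""],
}

type_changes = detect_type_changes(declared, columns)
print(type_changes)  # {'price': ('float', 'string')}
```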


Column Profiling

The tool profiles dataset columns and reports statistics such as:

  • null percentage
  • distinct count
  • minimum values
  • maximum values

Example:

Column: price
  null_percent: 0.0
  distinct_count: 152
  min: 2.5
  max: 500.0
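
The statistics listed above are simple to compute from a column's values. A minimal sketch follows, assuming nulls are represented as `None` and that `null_percent` is on a 0 to 100 scale; the source does not state which scale the tool uses, and `profile_column` is a hypothetical name.

```python
def profile_column(values):
    """Compute basic profile statistics for one column.

    Assumptions: nulls are None, and null_percent is on a 0-100 scale.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_percent = 100.0 * (total - len(non_null)) / total if total else 0.0
    return {
        "null_percent": round(null_percent, 2),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


# Hypothetical sample with one null and one repeated value.
prices = [2.5, 19.99, None, 500.0, 19.99]
profile = profile_column(prices)
print(profile)
# {'null_percent': 20.0, 'distinct_count': 3, 'min': 2.5, 'max': 500.0}
```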

Automatic Test Generation

Schema Drift Guard can generate useful tests automatically.

Examples:

Column Name   Generated Tests
id            not_null, unique
email         not_null
price         not_null, accepted_range

Example generated YAML:

- name: price
  tests:
    - not_null
    - accepted_range:
        min: 0
        max: 500
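
One plausible way to derive such tests is from a column's profile. The rules in this sketch are assumptions chosen to reproduce the table above, not the tool's documented heuristics: a column with no nulls gets `not_null`, a column whose distinct count equals the row count gets `unique`, and any other numeric column gets `accepted_range` from its observed minimum and maximum. The function name and the row count of 1000 are hypothetical.

```python
from numbers import Number


def suggest_tests(profile, row_count):
    """Suggest tests from a column profile using simple, assumed heuristics."""
    tests = []
    if profile["null_percent"] == 0:
        tests.append("not_null")
    is_unique = row_count > 0 and profile["distinct_count"] == row_count
    if is_unique:
        tests.append("unique")
    elif isinstance(profile["min"], Number) and isinstance(profile["max"], Number):
        # Observed bounds only; a real tool might widen or round them.
        tests.append(
            {"accepted_range": {"min": profile["min"], "max": profile["max"]}}
        )
    return tests


# Hypothetical profiles for a 1000-row dataset.
id_profile = {"null_percent": 0.0, "distinct_count": 1000, "min": 1, "max": 1000}
email_profile = {"null_percent": 0.0, "distinct_count": 800, "min": "a@x.com", "max": "z@x.com"}
price_profile = {"null_percent": 0.0, "distinct_count": 152, "min": 2.5, "max": 500.0}

print(suggest_tests(id_profile, 1000))     # ['not_null', 'unique']
print(suggest_tests(email_profile, 1000))  # ['not_null']
print(suggest_tests(price_profile, 1000))
# ['not_null', {'accepted_range': {'min': 2.5, 'max': 500.0}}]
```

Note that the generated YAML above uses a minimum of 0 while the profiled minimum is 2.5, so the tool may adjust the lower bound in a way this sketch does not attempt to reproduce.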

CI/CD Pipeline Enforcement

Schema Drift Guard can fail pipelines when drift is detected.

schema-guard check \
  --source-type csv \
  --source data/orders.csv \
  --schema schemas/orders.yml \
  --fail-on-drift

If drift is detected:

❌ Schema drift detected. Failing pipeline.

This allows teams to enforce data contracts in automated workflows.
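
CI systems generally treat a non-zero process exit code as a failed step, so a flag like `--fail-on-drift` most likely works by controlling the exit code. The sketch below illustrates that mechanism only; the specific exit code value and the function name `exit_code_for` are assumptions, not taken from the tool.

```python
def exit_code_for(drift_detected: bool, fail_on_drift: bool) -> int:
    """Return a process exit code: non-zero only when drift should fail the run."""
    if drift_detected and fail_on_drift:
        return 1  # Assumed value; any non-zero code fails a CI step.
    return 0


# Drift without the flag only warns; drift with the flag fails the step.
print(exit_code_for(drift_detected=True, fail_on_drift=False))  # 0
print(exit_code_for(drift_detected=True, fail_on_drift=True))   # 1
```

A CLI entry point would pass the result to `sys.exit`, which is what lets a pull request check go red when the data no longer matches its contract.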


Example GitHub Actions Workflow

name: Schema Check

on: [pull_request]

jobs:
  schema_guard:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Install tool
        run: pip install schema-drift-guard

      - name: Run schema check
        run: |
          schema-guard check \
            --source-type csv \
            --source data/orders.csv \
            --schema schemas/orders.yml \
            --fail-on-drift

Features

  ✔ Schema drift detection
  ✔ Column type drift detection
  ✔ Automatic schema updates
  ✔ Column profiling
  ✔ Automatic range test generation
  ✔ Schema version history
  ✔ Pluggable connectors
  ✔ Installable CLI tool
  ✔ CI/CD pipeline enforcement
  ✔ Snowflake warehouse support


Project Structure

schema-drift-guard/
  cli/
  connectors/
  core/
  detectors/
  generators/
  schemas/
  tests/
  README.md
  pyproject.toml

Roadmap

Future improvements:

  • BigQuery connector
  • Postgres connector
  • dbt project integration
  • Metadata-only warehouse scanning
  • AI-assisted schema suggestions

Contributing

Contributions are welcome.

To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

MIT License


Author

Mohsin Shaikh
