Lightweight schema drift detection and data contract enforcement tool
Schema Drift Guard
Schema Drift Guard is a lightweight CLI tool to detect schema drift and enforce data contracts in data pipelines.
It helps data teams detect unexpected schema changes before they break pipelines, dashboards, or downstream systems.
The tool can automatically:
- Detect schema drift
- Detect column type changes
- Profile dataset columns
- Suggest schema tests
- Update YAML schema files
- Maintain schema version history
- Enforce checks in CI/CD pipelines
Installation
Install from PyPI:
pip install schema-drift-guard
Or install locally for development:
git clone https://github.com/smohsin46/schema-drift-guard.git
cd schema-drift-guard
pip install -e .
Quick Start
Run a schema check against a dataset and a schema definition:
schema-guard check \
  --source-type csv \
  --source data/orders.csv \
  --schema schemas/orders.yml
Example output:
⚠️ Schema drift detected
New columns detected:
+ discount
Updating schema YAML...
➕ Adding column to schema: discount
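The comparison behind a report like this can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `detect_column_drift` and the in-memory sample data are assumptions made for the example.

```python
import csv
import io


def detect_column_drift(schema_columns, actual_columns):
    """Compare declared column names with the columns found in the data.

    Returns (added, removed), each in the order the columns were seen.
    """
    declared = set(schema_columns)
    actual = set(actual_columns)
    added = [c for c in actual_columns if c not in declared]
    removed = [c for c in schema_columns if c not in actual]
    return added, removed


# Hypothetical stand-in for data/orders.csv; only the header row matters here.
sample_csv = io.StringIO(
    "order_id,user_id,price,created_at,discount\n"
    "1,10,19.99,2024-01-01,0.1\n"
)
header = next(csv.reader(sample_csv))

# Column names as declared in the schema YAML.
declared_columns = ["order_id", "user_id", "price", "created_at"]

added, removed = detect_column_drift(declared_columns, header)
for name in added:
    print(f"+ {name}")
for name in removed:
    print(f"- {name}")
```

With this sample data, the only difference reported is the new `discount` column.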
Supported Data Sources
Current connectors:
| Source Type | Description |
|---|---|
| csv | Local CSV files |
| snowflake | Snowflake warehouse tables |
Example Snowflake command:
schema-guard check \
  --source-type snowflake \
  --source orders \
  --schema schemas/orders.yml \
  --account <account> \
  --user <user> \
  --password <password> \
  --warehouse <warehouse> \
  --database <database> \
  --schema-name <schema>
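The feature list describes connectors as pluggable. One common way to structure that is a small registry keyed by the `--source-type` value. The sketch below is an assumption about how such a design could look; `Connector`, `CsvConnector`, `register`, and `get_connector` are hypothetical names, not the package's API.

```python
import csv
import io
from typing import Callable, Dict, List, Protocol


class Connector(Protocol):
    """Anything that can report the column names of a data source."""

    def column_names(self) -> List[str]:
        ...


# Maps a source type such as "csv" to a factory that builds a connector.
_REGISTRY: Dict[str, Callable[..., Connector]] = {}


def register(source_type):
    """Class decorator that adds a connector to the registry."""
    def decorator(factory):
        _REGISTRY[source_type] = factory
        return factory
    return decorator


@register("csv")
class CsvConnector:
    def __init__(self, source):
        # `source` is any text file object; a real connector would open a path.
        self._source = source

    def column_names(self):
        # The first row of the CSV is treated as the header.
        return next(csv.reader(self._source))


def get_connector(source_type, *args, **kwargs):
    """Look up a connector by source type and construct it."""
    try:
        factory = _REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"Unsupported source type: {source_type}") from None
    return factory(*args, **kwargs)


connector = get_connector("csv", io.StringIO("order_id,user_id,price\n1,2,3.5\n"))
print(connector.column_names())
```

Under this design, adding a new source means writing one class and registering it under a new key, without touching the code that runs the checks.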
Schema YAML Format
Example schema definition:
columns:
  - name: order_id
    type: integer
  - name: user_id
    type: integer
  - name: price
    type: float
  - name: created_at
    type: string
When new columns are detected, the tool can add them to the schema YAML automatically.
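As a rough sketch of what such an update involves, the snippet below appends entries for unseen columns to an in-memory dictionary that mirrors the YAML layout above. The helpers `infer_type` and `add_new_columns` are hypothetical, and the type inference rule (try integer, then float, else string) is an assumption rather than the tool's documented behavior.

```python
def infer_type(values):
    """Infer a coarse type name from string values read from a CSV column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall through when any value fails to parse.
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def add_new_columns(schema, rows):
    """Append an entry for every column present in `rows` but not in `schema`.

    Returns the names of the columns that were added.
    """
    known = {col["name"] for col in schema["columns"]}
    column_names = list(rows[0]) if rows else []
    added = []
    for name in column_names:
        if name not in known:
            values = [row[name] for row in rows]
            schema["columns"].append({"name": name, "type": infer_type(values)})
            added.append(name)
    return added


# In-memory equivalent of the example schema definition.
schema = {
    "columns": [
        {"name": "order_id", "type": "integer"},
        {"name": "user_id", "type": "integer"},
        {"name": "price", "type": "float"},
        {"name": "created_at", "type": "string"},
    ]
}

# Hypothetical rows as csv.DictReader would yield them (all values are strings).
rows = [
    {"order_id": "1", "user_id": "10", "price": "19.99",
     "created_at": "2024-01-01", "discount": "0.1"},
    {"order_id": "2", "user_id": "11", "price": "5",
     "created_at": "2024-01-02", "discount": ""},
]

for name in add_new_columns(schema, rows):
    print(f"Adding column to schema: {name}")
```

Writing the updated dictionary back to the YAML file would be a separate step with a YAML library of your choice.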
Column Profiling
The tool profiles dataset columns and reports statistics such as:
- null percentage
- distinct count
- minimum values
- maximum values
Example:
Column: price
null_percent: 0.0
distinct_count: 152
min: 2.5
max: 500.0
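These statistics are simple to compute. The following sketch shows one way to do it with the standard library; `profile_column` is a hypothetical helper, and it assumes that missing values are represented as `None` and that `null_percent` is expressed on a 0 to 100 scale.

```python
def profile_column(values):
    """Compute basic statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "null_percent": 100.0 * (total - len(present)) / total if total else 0.0,
        "distinct_count": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical sample values for a price column, including one missing value.
prices = [2.5, 10.0, 10.0, None, 500.0]

profile = profile_column(prices)
print("Column: price")
for key, value in profile.items():
    print(f"{key}: {value}")
```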
Automatic Test Generation
Schema Drift Guard can generate tests such as not_null, unique, and accepted_range automatically.
Examples:
| Column Name | Generated Tests |
|---|---|
| id | not_null, unique |
| | not_null |
| price | not_null, accepted_range |
Example generated YAML:
- name: price
  tests:
    - not_null
    - accepted_range:
        min: 0
        max: 500
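One plausible way to derive such tests is from a column profile. The rules in this sketch are assumptions chosen to match the examples in the table above, not the tool's actual heuristics, and `suggest_tests` is a hypothetical name. In particular, the sketch uses the observed minimum and maximum as range bounds, whereas the generated YAML above shows a lower bound of 0, so the tool evidently picks its bounds differently.

```python
def suggest_tests(profile, row_count):
    """Suggest tests for a column from its profile (illustrative rules only)."""
    tests = []
    if profile["null_percent"] == 0:
        tests.append("not_null")

    is_unique = row_count > 0 and profile["distinct_count"] == row_count
    if is_unique:
        tests.append("unique")

    low, high = profile["min"], profile["max"]
    is_numeric = isinstance(low, (int, float)) and isinstance(high, (int, float))
    # Unique columns are treated as identifiers, so no range is suggested for them.
    if is_numeric and not is_unique:
        tests.append({"accepted_range": {"min": low, "max": high}})
    return tests


# Hypothetical profiles for a dataset with 1000 rows.
price_profile = {"null_percent": 0.0, "distinct_count": 152, "min": 2.5, "max": 500.0}
id_profile = {"null_percent": 0.0, "distinct_count": 1000, "min": 1, "max": 1000}

print(suggest_tests(price_profile, row_count=1000))
print(suggest_tests(id_profile, row_count=1000))
```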
CI/CD Pipeline Enforcement
Pass --fail-on-drift to make the check fail the pipeline when drift is detected:
schema-guard check \
  --source-type csv \
  --source data/orders.csv \
  --schema schemas/orders.yml \
  --fail-on-drift
If drift is detected:
❌ Schema drift detected. Failing pipeline.
This allows teams to enforce data contracts in automated workflows.
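CI systems generally mark a step as failed when its command exits with a nonzero status. The sketch below shows how a flag like `--fail-on-drift` could map to an exit status. It assumes the tool follows that convention; `run_check` and the messages for the non-failing cases are hypothetical.

```python
def run_check(drift_detected, fail_on_drift):
    """Return the exit status a check could report for the given outcome."""
    if not drift_detected:
        print("No schema drift detected")
        return 0
    if fail_on_drift:
        print("Schema drift detected. Failing pipeline.")
        return 1
    # Drift was found, but the caller did not ask for a hard failure.
    print("Schema drift detected")
    return 0


status = run_check(drift_detected=True, fail_on_drift=True)
print(f"exit status: {status}")
# A real command-line entry point would finish with sys.exit(status).
```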
Example GitHub Actions Workflow
name: Schema Check
on: [pull_request]
jobs:
  schema_guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install tool
        run: pip install schema-drift-guard
      - name: Run schema check
        run: |
          schema-guard check \
            --source-type csv \
            --source data/orders.csv \
            --schema schemas/orders.yml \
            --fail-on-drift
Features
- ✔ Schema drift detection
- ✔ Column type drift detection
- ✔ Automatic schema updates
- ✔ Column profiling
- ✔ Automatic range test generation
- ✔ Schema version history
- ✔ Pluggable connectors
- ✔ Installable CLI tool
- ✔ CI/CD pipeline enforcement
- ✔ Snowflake warehouse support
Project Structure
schema-drift-guard/
  cli/
  connectors/
  core/
  detectors/
  generators/
  schemas/
  tests/
  README.md
  pyproject.toml
Roadmap
Future improvements:
- BigQuery connector
- Postgres connector
- dbt project integration
- Metadata-only warehouse scanning
- AI-assisted schema suggestions
Contributing
Contributions are welcome.
To contribute:
- Fork the repository
- Create a feature branch
- Submit a pull request
License
MIT License
Author
Mohsin Shaikh