Detect schema drift before your SQL queries blow up.
Project description
sqldrift
Detect schema drift before your SQL queries blow up.
sqldrift validates SQL queries against a live database schema to catch missing tables before execution. Built on sqlglot for robust, multi-dialect SQL parsing.
Why?
When AI agents or automated pipelines generate SQL, they may reference tables that have been renamed, dropped, or migrated. Running those queries blindly causes production failures. sqldrift catches the problem before it happens.
Features
- ๐ Schema drift detection โ identifies missing tables before query execution
- ๐ง CTE-aware โ correctly ignores Common Table Expressions
- ๐ค Case-insensitive matching (configurable)
- ๐ Schema-qualified name support โ handles
schema.tablereferences - โก Optimized for scale โ class-based validator with O(1) lookups for 4,000+ tables
- ๐๏ธ LRU caching โ up to 282x speedup for repeated queries
- ๐๏ธ Multi-dialect โ PostgreSQL, MySQL, BigQuery, and more via sqlglot
Installation
pip install .
For development:
pip install -e ".[dev]"
Quick Start
Simple one-off validation
from sqldrift import validate_query
live_tables = ["users", "orders", "products"]
success, message = validate_query(
"SELECT * FROM users JOIN orders ON users.id = orders.user_id",
live_tables,
)
# (True, "Query is safe to execute.")
success, message = validate_query(
"SELECT * FROM deleted_table",
live_tables,
)
# (False, "Schema Drift Detected: The following tables were not found: ['deleted_table']")
Reusable validator (recommended for large schemas)
from sqldrift import SchemaValidator
validator = SchemaValidator(live_tables)
# Validate many queries โ the table set is built once
success, message = validator.validate("SELECT * FROM users")
Cached validator (best for repeated queries)
from sqldrift import CachedSchemaValidator
cached = CachedSchemaValidator(live_tables, cache_size=256)
success, message = cached.validate(query) # subsequent identical calls are cached
Where Do Table Lists Come From?
sqldrift does not connect to a database โ you provide the list of tables that currently exist in your schema. This keeps the library database-agnostic and flexible.
Common ways to source your table list:
# PostgreSQL via psycopg2
cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
live_tables = [row[0] for row in cur.fetchall()]
# MySQL via mysql-connector
cur.execute("SHOW TABLES")
live_tables = [row[0] for row in cur.fetchall()]
# SQLAlchemy (any backend)
from sqlalchemy import inspect
live_tables = inspect(engine).get_table_names()
# Static config (YAML, JSON, etc.)
import json
with open("schema.json") as f:
live_tables = json.load(f)["tables"]
Then pass the list to any validator:
validator = SchemaValidator(live_tables)
success, msg = validator.validate("SELECT * FROM users JOIN orders ON ...")
Integration Examples
Detailed guides with runnable scripts for specific database integrations:
| Integration | Guide |
|---|---|
| AWS Glue Catalog + Athena | examples/glue_athena/ |
| GCP BigQuery | examples/bigquery/ |
| Azure Synapse Analytics | examples/azure_synapse/ |
| AWS Redshift (Data API) | examples/redshift/ |
| PostgreSQL | examples/postgresql/ |
| MySQL | examples/mysql/ |
API Reference
validate_query(sql_query, live_tables, *, dialect=None)
Standalone function for one-off validation.
| Parameter | Type | Description |
|---|---|---|
sql_query |
str |
The SQL query to validate |
live_tables |
list[str] |
Tables that exist in the schema |
dialect |
str | None |
SQL dialect ("postgres", "mysql", etc.) |
Returns: tuple[bool, str]
SchemaValidator(live_tables, *, case_sensitive=False, preserve_schema=False)
Class-based validator with pre-computed table lookups.
| Option | Default | Description |
|---|---|---|
case_sensitive |
False |
Enable case-sensitive table name matching |
preserve_schema |
False |
Match full schema.table names instead of base name |
Methods:
| Method | Description |
|---|---|
validate(sql_query, dialect) |
Validate a query against the schema |
update_schema(live_tables) |
Hot-swap the schema at runtime |
table_exists(table_name) |
Check if a specific table exists |
get_table_count() |
Return the number of registered tables |
CachedSchemaValidator
Extends SchemaValidator with LRU caching. Accepts an additional cache_size parameter (default: 128).
Additional methods: clear_cache(), get_cache_info()
Project Structure
sqldrift/
โโโ pyproject.toml
โโโ LICENSE
โโโ README.md
โโโ src/
โ โโโ sqldrift/
โ โโโ __init__.py # Public API
โ โโโ validator.py # Core validate_query function
โ โโโ optimized.py # SchemaValidator & CachedSchemaValidator
โโโ tests/
โ โโโ test_validator.py # pytest suite (20 tests)
โโโ examples/
โ โโโ usage_examples.py
โโโ benchmarks/
โโโ benchmark.py
Performance
Benchmarked with 4,000 tables:
| Method | Speedup vs function |
|---|---|
| SchemaValidator | ~2.6x faster |
| CachedValidator | ~282x faster |
python benchmarks/benchmark.py
Testing
pip install -e ".[dev]"
pytest
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqldrift-0.1.0.tar.gz.
File metadata
- Download URL: sqldrift-0.1.0.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c15c8b64e06d17b8527887ae8cd07a88bc5de404535041bf3917b0886ae00b14
|
|
| MD5 |
3aab16039eb452e514c45fd823e8f3c7
|
|
| BLAKE2b-256 |
02b8db9d820fcdb30a2486520ec3ce54da8ca824f1a11923b3b3fd6818445aa2
|
Provenance
The following attestation bundles were made for sqldrift-0.1.0.tar.gz:
Publisher:
publish.yml on aruncse01/sqldrift
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sqldrift-0.1.0.tar.gz -
Subject digest:
c15c8b64e06d17b8527887ae8cd07a88bc5de404535041bf3917b0886ae00b14 - Sigstore transparency entry: 954492289
- Sigstore integration time:
-
Permalink:
aruncse01/sqldrift@ca3a7c93c29112ad79a08b43d97a397c0d39a303 -
Branch / Tag:
refs/tags/0.1.0 - Owner: https://github.com/aruncse01
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ca3a7c93c29112ad79a08b43d97a397c0d39a303 -
Trigger Event:
release
-
Statement type:
File details
Details for the file sqldrift-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sqldrift-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bea47ef7d16f959d098e2473a21b2d90499bea995c190d09e13ae46b51fb7f39
|
|
| MD5 |
1cf96b4dc02b3a1fe7fec67ffadb41b0
|
|
| BLAKE2b-256 |
7a0c2ee90688dab19e66a8611557cf7bb1f2eb0df5e93cb4b7deeebc479eeba3
|
Provenance
The following attestation bundles were made for sqldrift-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on aruncse01/sqldrift
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sqldrift-0.1.0-py3-none-any.whl -
Subject digest:
bea47ef7d16f959d098e2473a21b2d90499bea995c190d09e13ae46b51fb7f39 - Sigstore transparency entry: 954492290
- Sigstore integration time:
-
Permalink:
aruncse01/sqldrift@ca3a7c93c29112ad79a08b43d97a397c0d39a303 -
Branch / Tag:
refs/tags/0.1.0 - Owner: https://github.com/aruncse01
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ca3a7c93c29112ad79a08b43d97a397c0d39a303 -
Trigger Event:
release
-
Statement type: