Skip to main content

Detect schema drift before your SQL queries blow up.

Project description

sqldrift

Detect schema drift before your SQL queries blow up.

sqldrift validates SQL queries against a live database schema to catch missing tables before execution. Built on sqlglot for robust, multi-dialect SQL parsing.

Why?

When AI agents or automated pipelines generate SQL, they may reference tables that have been renamed, dropped, or migrated. Running those queries blindly causes production failures. sqldrift catches the problem before it happens.

Features

  • ๐Ÿ” Schema drift detection โ€” identifies missing tables before query execution
  • ๐Ÿง  CTE-aware โ€” correctly ignores Common Table Expressions
  • ๐Ÿ”ค Case-insensitive matching (configurable)
  • ๐Ÿ“› Schema-qualified name support โ€” handles schema.table references
  • โšก Optimized for scale โ€” class-based validator with O(1) lookups for 4,000+ tables
  • ๐Ÿ—„๏ธ LRU caching โ€” up to 282x speedup for repeated queries
  • ๐Ÿ—ƒ๏ธ Multi-dialect โ€” PostgreSQL, MySQL, BigQuery, and more via sqlglot

Installation

pip install .

For development:

pip install -e ".[dev]"

Quick Start

Simple one-off validation

from sqldrift import validate_query

live_tables = ["users", "orders", "products"]

success, message = validate_query(
    "SELECT * FROM users JOIN orders ON users.id = orders.user_id",
    live_tables,
)
# (True, "Query is safe to execute.")

success, message = validate_query(
    "SELECT * FROM deleted_table",
    live_tables,
)
# (False, "Schema Drift Detected: The following tables were not found: ['deleted_table']")

Reusable validator (recommended for large schemas)

from sqldrift import SchemaValidator

validator = SchemaValidator(live_tables)

# Validate many queries โ€” the table set is built once
success, message = validator.validate("SELECT * FROM users")

Cached validator (best for repeated queries)

from sqldrift import CachedSchemaValidator

cached = CachedSchemaValidator(live_tables, cache_size=256)
success, message = cached.validate(query)  # subsequent identical calls are cached

Where Do Table Lists Come From?

sqldrift does not connect to a database โ€” you provide the list of tables that currently exist in your schema. This keeps the library database-agnostic and flexible.

Common ways to source your table list:

# PostgreSQL via psycopg2
cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
live_tables = [row[0] for row in cur.fetchall()]

# MySQL via mysql-connector
cur.execute("SHOW TABLES")
live_tables = [row[0] for row in cur.fetchall()]

# SQLAlchemy (any backend)
from sqlalchemy import inspect
live_tables = inspect(engine).get_table_names()

# Static config (YAML, JSON, etc.)
import json
with open("schema.json") as f:
    live_tables = json.load(f)["tables"]

Then pass the list to any validator:

validator = SchemaValidator(live_tables)
success, msg = validator.validate("SELECT * FROM users JOIN orders ON ...")

Integration Examples

Detailed guides with runnable scripts for specific database integrations:

Integration Guide
AWS Glue Catalog + Athena examples/glue_athena/
GCP BigQuery examples/bigquery/
Azure Synapse Analytics examples/azure_synapse/
AWS Redshift (Data API) examples/redshift/
PostgreSQL examples/postgresql/
MySQL examples/mysql/

API Reference

validate_query(sql_query, live_tables, *, dialect=None)

Standalone function for one-off validation.

Parameter Type Description
sql_query str The SQL query to validate
live_tables list[str] Tables that exist in the schema
dialect str | None SQL dialect ("postgres", "mysql", etc.)

Returns: tuple[bool, str]

SchemaValidator(live_tables, *, case_sensitive=False, preserve_schema=False)

Class-based validator with pre-computed table lookups.

Option Default Description
case_sensitive False Enable case-sensitive table name matching
preserve_schema False Match full schema.table names instead of base name

Methods:

Method Description
validate(sql_query, dialect) Validate a query against the schema
update_schema(live_tables) Hot-swap the schema at runtime
table_exists(table_name) Check if a specific table exists
get_table_count() Return the number of registered tables

CachedSchemaValidator

Extends SchemaValidator with LRU caching. Accepts an additional cache_size parameter (default: 128).

Additional methods: clear_cache(), get_cache_info()

Project Structure

sqldrift/
โ”œโ”€โ”€ pyproject.toml
โ”œโ”€โ”€ LICENSE
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ src/
โ”‚   โ””โ”€โ”€ sqldrift/
โ”‚       โ”œโ”€โ”€ __init__.py        # Public API
โ”‚       โ”œโ”€โ”€ validator.py       # Core validate_query function
โ”‚       โ””โ”€โ”€ optimized.py       # SchemaValidator & CachedSchemaValidator
โ”œโ”€โ”€ tests/
โ”‚   โ””โ”€โ”€ test_validator.py      # pytest suite (20 tests)
โ”œโ”€โ”€ examples/
โ”‚   โ””โ”€โ”€ usage_examples.py
โ””โ”€โ”€ benchmarks/
    โ””โ”€โ”€ benchmark.py

Performance

Benchmarked with 4,000 tables:

Method Speedup vs function
SchemaValidator ~2.6x faster
CachedValidator ~282x faster
python benchmarks/benchmark.py

Testing

pip install -e ".[dev]"
pytest

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqldrift-0.1.0.tar.gz (10.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqldrift-0.1.0-py3-none-any.whl (8.8 kB view details)

Uploaded Python 3

File details

Details for the file sqldrift-0.1.0.tar.gz.

File metadata

  • Download URL: sqldrift-0.1.0.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqldrift-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c15c8b64e06d17b8527887ae8cd07a88bc5de404535041bf3917b0886ae00b14
MD5 3aab16039eb452e514c45fd823e8f3c7
BLAKE2b-256 02b8db9d820fcdb30a2486520ec3ce54da8ca824f1a11923b3b3fd6818445aa2

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqldrift-0.1.0.tar.gz:

Publisher: publish.yml on aruncse01/sqldrift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqldrift-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sqldrift-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqldrift-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bea47ef7d16f959d098e2473a21b2d90499bea995c190d09e13ae46b51fb7f39
MD5 1cf96b4dc02b3a1fe7fec67ffadb41b0
BLAKE2b-256 7a0c2ee90688dab19e66a8611557cf7bb1f2eb0df5e93cb4b7deeebc479eeba3

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqldrift-0.1.0-py3-none-any.whl:

Publisher: publish.yml on aruncse01/sqldrift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page