Detect schema drift before your SQL queries blow up.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

aruncse01

These details have not been verified by PyPI

Project description

sqldrift

Detect schema drift before your SQL queries blow up.

sqldrift validates SQL queries against a live database schema to catch missing tables before execution. Built on sqlglot for robust, multi-dialect SQL parsing.

Why?

When AI agents or automated pipelines generate SQL, they may reference tables that have been renamed, dropped, or migrated. Running those queries blindly causes production failures. sqldrift catches the problem before it happens.

Features

Schema drift detection — identifies missing tables before query execution
CTE-aware — correctly ignores Common Table Expressions
Case-insensitive matching (configurable)
Schema-qualified name support — handles schema.table references
Optimized for scale — class-based validator with O(1) lookups for 4,000+ tables
LRU caching — up to 282x speedup for repeated queries
Multi-dialect — PostgreSQL, MySQL, BigQuery, and more via sqlglot

Installation

pip install .

For development:

pip install -e ".[dev]"

Quick Start

Simple one-off validation

from sqldrift import validate_query

live_tables = ["users", "orders", "products"]

success, message = validate_query(
    "SELECT * FROM users JOIN orders ON users.id = orders.user_id",
    live_tables,
)
# (True, "Query is safe to execute.")

success, message = validate_query(
    "SELECT * FROM deleted_table",
    live_tables,
)
# (False, "Schema Drift Detected: The following tables were not found: ['deleted_table']")

Reusable validator (recommended for large schemas)

from sqldrift import SchemaValidator

validator = SchemaValidator(live_tables)

# Validate many queries — the table set is built once
success, message = validator.validate("SELECT * FROM users")

Column-Level Drift Detection

While the standard validator checks for missing tables, ColumnValidator checks for missing columns. This requires providing a schema definition with column names.

from sqldrift import ColumnValidator

# Define your schema (table -> columns)
schema = {
    "users": {
        "columns": ["id", "name", "email", "created_at"],
        "types": ["INTEGER", "VARCHAR", "VARCHAR", "TIMESTAMP"], # optional
    },
    "orders": {
        "columns": ["id", "user_id", "total"],
    },
}

validator = ColumnValidator(schema)

# Valid query
success, msg = validator.validate("SELECT name FROM users")
# (True, "All columns exist.")

# Invalid query (column 'tier' missing)
success, msg = validator.validate("SELECT tier FROM users")
# (False, "Column Drift Detected: The following columns were not found: ['tier']")

Features

Qualified names: Handles table.column and alias references (u.name) correctly.
Suggestions: Offers specific column names if a mismatch is close (e.g. user_id vs userid).
Caching: Use CachedColumnValidator for high-performance repeated validation.

Cached validator (best for repeated queries)

from sqldrift import CachedSchemaValidator

cached = CachedSchemaValidator(live_tables, cache_size=256)
success, message = cached.validate(query)  # subsequent identical calls are cached

Where Do Table Lists Come From?

sqldrift does not connect to a database — you provide the list of tables that currently exist in your schema. This keeps the library database-agnostic and flexible.

Common ways to source your table list:

# PostgreSQL via psycopg2
cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
live_tables = [row[0] for row in cur.fetchall()]

# MySQL via mysql-connector
cur.execute("SHOW TABLES")
live_tables = [row[0] for row in cur.fetchall()]

# SQLAlchemy (any backend)
from sqlalchemy import inspect
live_tables = inspect(engine).get_table_names()

# Static config (YAML, JSON, etc.)
import json
with open("schema.json") as f:
    live_tables = json.load(f)["tables"]

Then pass the list to any validator:

validator = SchemaValidator(live_tables)
success, msg = validator.validate("SELECT * FROM users JOIN orders ON ...")

Integration Examples

Detailed guides with runnable scripts for specific database integrations:

Integration	Guide
AWS Glue Catalog + Athena	examples/glue_athena/
GCP BigQuery	examples/bigquery/
Azure Synapse Analytics	examples/azure_synapse/
AWS Redshift (Data API)	examples/redshift/
PostgreSQL	examples/postgresql/
MySQL	examples/mysql/

API Reference

`validate_query(sql_query, live_tables, *, dialect=None)`

Standalone function for one-off validation.

Parameter	Type	Description
`sql_query`	`str`	The SQL query to validate
`live_tables`	`list[str]`	Tables that exist in the schema
`dialect`	`str \| None`	SQL dialect (`"postgres"`, `"mysql"`, etc.)

Returns: tuple[bool, str]

`SchemaValidator(live_tables, *, case_sensitive=False, preserve_schema=False)`

Class-based validator with pre-computed table lookups.

Option	Default	Description
`case_sensitive`	`False`	Enable case-sensitive table name matching
`preserve_schema`	`False`	Match full `schema.table` names instead of base name

Methods:

Method	Description
`validate(sql_query, dialect)`	Validate a query against the schema
`update_schema(live_tables)`	Hot-swap the schema at runtime
`table_exists(table_name)`	Check if a specific table exists
`get_table_count()`	Return the number of registered tables

`CachedSchemaValidator`

Extends SchemaValidator with LRU caching. Accepts an additional cache_size parameter (default: 128).

Additional methods: clear_cache(), get_cache_info()

Project Structure

sqldrift/
├── pyproject.toml
├── LICENSE
├── README.md
├── src/
│   └── sqldrift/
│       ├── __init__.py        # Public API
│       ├── validator.py       # Core validate_query function
│       └── optimized.py       # SchemaValidator & CachedSchemaValidator
├── tests/
│   └── test_validator.py      # pytest suite (20 tests)
├── examples/
│   └── usage_examples.py
└── benchmarks/
    └── benchmark.py

Performance

Benchmarked with 4,000 tables:

Method	Speedup vs function
SchemaValidator	~2.6x faster
CachedValidator	~282x faster

python benchmarks/benchmark.py

Testing

pip install -e ".[dev]"
pytest

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

aruncse01

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.5

Feb 16, 2026

0.1.4

Feb 16, 2026

This version

0.1.2

Feb 16, 2026

0.1.1

Feb 16, 2026

0.1.0

Feb 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqldrift-0.1.2.tar.gz (16.8 kB view details)

Uploaded Feb 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sqldrift-0.1.2-py3-none-any.whl (13.0 kB view details)

Uploaded Feb 16, 2026 Python 3

File details

Details for the file sqldrift-0.1.2.tar.gz.

File metadata

Download URL: sqldrift-0.1.2.tar.gz
Upload date: Feb 16, 2026
Size: 16.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqldrift-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`9a4893c7986924a2e129075a563b434620242e5682e682202bdd4ffb162b4d6c`
MD5	`3c7a6a47eea83bbeb302bb342e2dbbdc`
BLAKE2b-256	`69234a6f084eb1520953764edf53fc2254456b4be5b4ab8fb2ce86f986e47a74`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqldrift-0.1.2.tar.gz:

Publisher: publish.yml on aruncse01/sqldrift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sqldrift-0.1.2.tar.gz
- Subject digest: 9a4893c7986924a2e129075a563b434620242e5682e682202bdd4ffb162b4d6c
- Sigstore transparency entry: 954813602
- Sigstore integration time: Feb 16, 2026
Source repository:
- Permalink: aruncse01/sqldrift@3c5d4bc6e5fd4ec6938d44ae4367a5f919b60750
- Branch / Tag:
- Owner: https://github.com/aruncse01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3c5d4bc6e5fd4ec6938d44ae4367a5f919b60750
- Trigger Event: release

File details

Details for the file sqldrift-0.1.2-py3-none-any.whl.

File metadata

Download URL: sqldrift-0.1.2-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 13.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sqldrift-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4b66853e2cd3fe7f1542d947f9fbe391983514a6b30c296eacf43f4a734fe066`
MD5	`c18d890f1bd3a96077ded609f75aa402`
BLAKE2b-256	`5559c9bd48ce4b22317143ec67dfe93e94faa33df44e4c10d6b64288d68a82cb`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqldrift-0.1.2-py3-none-any.whl:

Publisher: publish.yml on aruncse01/sqldrift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sqldrift-0.1.2-py3-none-any.whl
- Subject digest: 4b66853e2cd3fe7f1542d947f9fbe391983514a6b30c296eacf43f4a734fe066
- Sigstore transparency entry: 954813606
- Sigstore integration time: Feb 16, 2026
Source repository:
- Permalink: aruncse01/sqldrift@3c5d4bc6e5fd4ec6938d44ae4367a5f919b60750
- Branch / Tag:
- Owner: https://github.com/aruncse01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3c5d4bc6e5fd4ec6938d44ae4367a5f919b60750
- Trigger Event: release

sqldrift 0.1.2

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

sqldrift

Why?

Features

Installation

Quick Start

Simple one-off validation

Reusable validator (recommended for large schemas)

Column-Level Drift Detection

Features

Cached validator (best for repeated queries)

Where Do Table Lists Come From?

Integration Examples

API Reference

validate_query(sql_query, live_tables, *, dialect=None)

SchemaValidator(live_tables, *, case_sensitive=False, preserve_schema=False)

CachedSchemaValidator

Project Structure

Performance

Testing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`validate_query(sql_query, live_tables, *, dialect=None)`

`SchemaValidator(live_tables, *, case_sensitive=False, preserve_schema=False)`

`CachedSchemaValidator`