Generate SQL assertions from unique join markers.
Project description
sqlassert
sqlassert is a Python library for adding safety checks to SQL before you run it.
The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add sqlassert to your test suite and validate important queries offline, making them more resilient independent of the current contents of your database.
pip install sqlassert
Alpha warning: Today sqlassert supports only one check: /**UNIQUE**/ joins. It is also only tested on duckdb.
Features
Unique Join
Joins often accidentally multiply rows. A query may look correct against today’s data but silently break when the RHS relation later contains multiple matching rows.
sqlassert lets you mark joins that are expected to be unique. That is, the result of the join must never 'grow' the number of rows with respect to the LHS.
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. sqlassert reads the query separately and validates that the RHS is provably unique for the join keys.
Usage
Run validation offline, before your application or analytics job executes the query:
import duckdb
from sqlassert import validate_unique_joins
con = duckdb.connect("warehouse.duckdb")
query = """
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id
"""
result = validate_unique_joins(con, query)
assert result.valid, result.reason
For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:
def test_query_join_contract(con):
query = load_query("models/session_enrichment.sql")
result = validate_unique_joins(con, query)
assert result.valid, result.reason
result.checks contains one check per marker:
for check in result.checks:
print(check.valid)
print(check.reason)
print(check.inferred_key_columns)
print(check.constrained_key_columns)
Details
Unique Join Syntax
Place /**UNIQUE**/ immediately before the join that should be uniqueness-checked:
select *
from lhs
/**UNIQUE**/ left join rhs
on lhs.rhs_id = rhs.id;
ON and USING are both supported:
select *
from users
/**UNIQUE**/ join user_profiles
using (id);
The marker applies to the next join after the comment.
Proofs, Not Data Checks
sqlassert does not validate by querying actual table data. It will not run count(*), search for duplicates, or sample rows.
Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:
in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique
Supported uniqueness proofs today:
- RHS
PRIMARY KEYandUNIQUEconstraints from db metadata. - RHS
GROUP BYsubqueries, when the join covers the grouping keys. - RHS
SELECT DISTINCTsubqueries, when the join covers the selected distinct columns. - RHS
QUALIFY row_number() over (partition by ...) = 1subqueries, when the join covers the partition keys. - Simple projection views and subqueries that preserve one of the proofs above.
Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness; computed expressions, joins inside views, unions, and arbitrary subquery semantics are not guessed.
Examples:
-- Proved by primary key.
select *
from sessions
/**UNIQUE**/ join users
on sessions.user_id = users.id;
-- Proved by composite primary key plus RHS-only filter.
select *
from sessions
/**UNIQUE**/ join orders
on sessions.user_id = orders.user_id
and orders.order_id = 1;
-- Proved by GROUP BY.
with latest_session as (
select user_id, max(ts) as max_ts
from sessions
group by user_id
)
select *
from users
/**UNIQUE**/ join latest_session
on users.id = latest_session.user_id;
-- Proved by QUALIFY row_number() = 1.
with sessions_ranked as (
select user_id, *
from sessions
qualify row_number() over (partition by user_id order by ts) = 1
)
select *
from users
/**UNIQUE**/ join sessions_ranked
on users.id = sessions_ranked.user_id;
More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlassert-0.1.0.tar.gz.
File metadata
- Download URL: sqlassert-0.1.0.tar.gz
- Upload date:
- Size: 11.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f7255c1869a5dd8ded8ce70b77d9f8a7ad88b0ce705689bccb5c3d67b09fb78f
|
|
| MD5 |
af8ea29a74224d81bc42860c67fd33cc
|
|
| BLAKE2b-256 |
6cbba6907ae6540a1f9af1581fa43b4111edee3c210c64f03322a2c3e80cafb4
|
File details
Details for the file sqlassert-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sqlassert-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7a7219e2c659ae960844a4991cb92236d84706095a1a835685492b3b528ec742
|
|
| MD5 |
81ed9bb6df4bf7f4c14a0b08603d68c6
|
|
| BLAKE2b-256 |
2c0f69d67482c6d64db766629617506a533529ca5e6b4a0ca0491d3c6fd0808e
|