Skip to main content

Generate SQL assertions from unique join markers.

Project description

sqlassert

sqlassert is a Python library for adding safety checks to SQL before you run it.

The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add sqlassert to your test suite and validate important queries offline, making them more resilient independent of the current contents of your database.

pip install sqlassert

Alpha warning: Today sqlassert supports only one check: /**UNIQUE**/ joins. It is also only tested on duckdb.

Features

Unique Join

Joins often accidentally multiply rows. A query may look correct against today’s data but silently break when the RHS relation later contains multiple matching rows.

sqlassert lets you mark joins that are expected to be unique. That is, the result of the join must never 'grow' the number of rows with respect to the LHS.

select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id;

The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. sqlassert reads the query separately and validates that the RHS is provably unique for the join keys.

Usage

Run validation offline, before your application or analytics job executes the query:

import duckdb
from sqlassert import validate_unique_joins

con = duckdb.connect("warehouse.duckdb")

query = """
select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id
"""

result = validate_unique_joins(con, query)

assert result.valid, result.reason

For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:

def test_query_join_contract(con):
    query = load_query("models/session_enrichment.sql")
    result = validate_unique_joins(con, query)

    assert result.valid, result.reason

result.checks contains one check per marker:

for check in result.checks:
    print(check.valid)
    print(check.reason)
    print(check.inferred_key_columns)
    print(check.constrained_key_columns)

Details

Unique Join Syntax

Place /**UNIQUE**/ immediately before the join that should be uniqueness-checked:

select *
from lhs
/**UNIQUE**/ left join rhs
  on lhs.rhs_id = rhs.id;

ON and USING are both supported:

select *
from users
/**UNIQUE**/ join user_profiles
  using (id);

The marker applies to the next join after the comment.

Proofs, Not Data Checks

sqlassert does not validate by querying actual table data. It will not run count(*), search for duplicates, or sample rows.

Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:

in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique

Supported uniqueness proofs today:

  • RHS PRIMARY KEY and UNIQUE constraints from db metadata.
  • RHS GROUP BY subqueries, when the join covers the grouping keys.
  • RHS SELECT DISTINCT subqueries, when the join covers the selected distinct columns.
  • RHS QUALIFY row_number() over (partition by ...) = 1 subqueries, when the join covers the partition keys.
  • Simple projection views and subqueries that preserve one of the proofs above.

Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness; computed expressions, joins inside views, unions, and arbitrary subquery semantics are not guessed.

Examples:

-- Proved by primary key.
select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id;
-- Proved by composite primary key plus RHS-only filter.
select *
from sessions
/**UNIQUE**/ join orders
  on sessions.user_id = orders.user_id
 and orders.order_id = 1;
-- Proved by GROUP BY.
with latest_session as (
  select user_id, max(ts) as max_ts
  from sessions
  group by user_id
)
select *
from users
/**UNIQUE**/ join latest_session
  on users.id = latest_session.user_id;
-- Proved by QUALIFY row_number() = 1.
with sessions_ranked as (
    select user_id, *
    from sessions
    qualify row_number() over (partition by user_id order by ts) = 1
)
select *
from users
/**UNIQUE**/ join sessions_ranked
  on users.id = sessions_ranked.user_id;

More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlassert-0.1.1.tar.gz (11.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlassert-0.1.1-py3-none-any.whl (7.6 kB view details)

Uploaded Python 3

File details

Details for the file sqlassert-0.1.1.tar.gz.

File metadata

  • Download URL: sqlassert-0.1.1.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sqlassert-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d7cc852b445235f97f72e5ac6ea7674e8e76312665734bc3eae8209727afc37a
MD5 b0cdc2a472121cbc6455da542e34a7ad
BLAKE2b-256 8e3931c01a6265bcb3deb99134cf39e99e64b868ed4c0b54bacf4b81e4731bd6

See more details on using hashes here.

File details

Details for the file sqlassert-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sqlassert-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sqlassert-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 46ec8c84b3b15b5186495b141910ed3140ee7c5096e64286dc4b0952d7d4c027
MD5 27bef3c1541c0895b602a0e72d90a960
BLAKE2b-256 bd18d31b06974b7a7f00bba15c719aa4fc55040e686746e6a9ad89059a574d55

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page