Skip to main content

Generate SQL assertions from unique join markers.

Project description

sqlassert

sqlassert is a Python library for adding safety checks to SQL before you run it.

The goal is to catch common query mistakes at test time or build time, using fast static and metadata-backed proofs instead of scanning production data. You can add sqlassert to your test suite and validate important queries offline, making them more resilient independent of the current contents of your database.

pip install sqlassert

Alpha warning: Today sqlassert supports only one check: /**UNIQUE**/ joins. It is also only tested on duckdb.

Features

Unique Join

Joins often accidentally multiply rows. A query may look correct against today’s data but silently break when the RHS relation later contains multiple matching rows.

sqlassert lets you mark joins that are expected to be unique. That is, the result of the join must never 'grow' the number of rows with respect to the LHS.

select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id;

The marker is just a SQL comment. Your SQL remains valid SQL and can still run normally. sqlassert reads the query separately and validates that the RHS is provably unique for the join keys.

Usage

Run validation offline, before your application or analytics job executes the query:

import duckdb
from sqlassert import validate_unique_joins

con = duckdb.connect("warehouse.duckdb")

query = """
select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id
"""

result = validate_unique_joins(con, query)

assert result.valid, result.reason

For a test suite, keep your model/query SQL as strings or load them from files, then validate them against a db connection that has the relevant schema:

def test_query_join_contract(con):
    query = load_query("models/session_enrichment.sql")
    result = validate_unique_joins(con, query)

    assert result.valid, result.reason

result.checks contains one check per marker:

for check in result.checks:
    print(check.valid)
    print(check.reason)
    print(check.inferred_key_columns)
    print(check.constrained_key_columns)

Details

Unique Join Syntax

Place /**UNIQUE**/ immediately before the join that should be uniqueness-checked:

select *
from lhs
/**UNIQUE**/ left join rhs
  on lhs.rhs_id = rhs.id;

ON and USING are both supported:

select *
from users
/**UNIQUE**/ join user_profiles
  using (id);

The marker applies to the next join after the comment.

Proofs, Not Data Checks

sqlassert does not validate by querying actual table data. It will not run count(*), search for duplicates, or sample rows.

Instead, it proves uniqueness using fast information available from the SQL and database metadata. If uniqueness cannot be proven, validation fails with a reason that names the join and RHS column:

in join "INNER JOIN events ON sessions.event_id = events.id", we can't prove that RHS column id is unique

Supported uniqueness proofs today:

  • RHS PRIMARY KEY and UNIQUE constraints from db metadata.
  • RHS GROUP BY subqueries, when the join covers the grouping keys.
  • RHS SELECT DISTINCT subqueries, when the join covers the selected distinct columns.
  • RHS QUALIFY row_number() over (partition by ...) = 1 subqueries, when the join covers the partition keys.
  • Simple projection views and subqueries that preserve one of the proofs above.

Views can inherit uniqueness when they are simple projections over a source relation with a supported proof. Filters preserve uniqueness; computed expressions, joins inside views, unions, and arbitrary subquery semantics are not guessed.

Examples:

-- Proved by primary key.
select *
from sessions
/**UNIQUE**/ join users
  on sessions.user_id = users.id;
-- Proved by composite primary key plus RHS-only filter.
select *
from sessions
/**UNIQUE**/ join orders
  on sessions.user_id = orders.user_id
 and orders.order_id = 1;
-- Proved by GROUP BY.
with latest_session as (
  select user_id, max(ts) as max_ts
  from sessions
  group by user_id
)
select *
from users
/**UNIQUE**/ join latest_session
  on users.id = latest_session.user_id;
-- Proved by QUALIFY row_number() = 1.
with sessions_ranked as (
    select user_id, *
    from sessions
    qualify row_number() over (partition by user_id order by ts) = 1
)
select *
from users
/**UNIQUE**/ join sessions_ranked
  on users.id = sessions_ranked.user_id;

More compile-time SQL checks can be added under the same model: explicit syntax, fast validation, and clear reasons when a proof is missing.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlassert-0.1.0.tar.gz (11.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlassert-0.1.0-py3-none-any.whl (7.6 kB view details)

Uploaded Python 3

File details

Details for the file sqlassert-0.1.0.tar.gz.

File metadata

  • Download URL: sqlassert-0.1.0.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sqlassert-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f7255c1869a5dd8ded8ce70b77d9f8a7ad88b0ce705689bccb5c3d67b09fb78f
MD5 af8ea29a74224d81bc42860c67fd33cc
BLAKE2b-256 6cbba6907ae6540a1f9af1581fa43b4111edee3c210c64f03322a2c3e80cafb4

See more details on using hashes here.

File details

Details for the file sqlassert-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sqlassert-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for sqlassert-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7a7219e2c659ae960844a4991cb92236d84706095a1a835685492b3b528ec742
MD5 81ed9bb6df4bf7f4c14a0b08603d68c6
BLAKE2b-256 2c0f69d67482c6d64db766629617506a533529ca5e6b4a0ca0491d3c6fd0808e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page