Validate a semql Catalog against a live database — catches missing tables, dropped columns, and broken join predicates before a deploy.

semql-validate-db

Pre-deploy drift checker for semql catalogs. It runs cheap probe queries against a live database and surfaces the class of bugs the compiler can't see: missing tables, dropped columns, broken join predicates, and base-predicate drift.

semql is intentionally pure (PHILOSOPHY: "the compiler has no I/O"). That keeps the compiler simple, but it also means a catalog can pass every compile-time check and still blow up at query time because upstream renamed a column. semql-validate-db is the out-of-band gate that catches it.

Install

pip install semql-validate-db

The package is driver-agnostic. Bring your own DB-API 2.0 connection:

pip install psycopg              # Postgres
pip install clickhouse-connect   # ClickHouse
pip install duckdb               # DuckDB

Quick start

import duckdb
from semql import Backend, Catalog, Cube, Dimension, Measure, TimeDimension
from semql_validate_db import validate_against_db

orders = Cube(
    name="orders",
    backend=Backend.DUCKDB,
    table="orders",
    alias="o",
    measures=[Measure(name="revenue", sql="{o}.amount", agg="sum")],
    dimensions=[Dimension(name="region", sql="{o}.region", type="string")],
    time_dimensions=[TimeDimension(name="created_at", sql="{o}.created_at")],
)
catalog = Catalog([orders])

conn = duckdb.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (amount DOUBLE, region TEXT, created_at TIMESTAMP)"
)

errors = validate_against_db(catalog, connection=conn)
for e in errors:
    print(f"{e.code}: {e.cube}.{e.field or ''}: {e.message}")

A clean run returns an empty list. Drift (a missing column, a renamed table) yields one DbValidationError per finding so a single run gives the full picture instead of bailing on the first failure.
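Because every finding comes back in one list, the result can be regrouped into a per-cube report. The sketch below assumes only the four attributes the quick start prints (`code`, `cube`, `field`, `message`). `Finding` is a hypothetical stand-in for `DbValidationError` so the snippet runs without a database, and the sample messages are illustrative, not real output from the package.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Finding:
    """Hypothetical stand-in for DbValidationError (same four attributes)."""
    code: str
    cube: str
    field: Optional[str]
    message: str


def group_by_cube(errors: Iterable[Finding]) -> Dict[str, List[str]]:
    """Collect one formatted line per finding, keyed by cube name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for e in errors:
        # Table-level findings carry no field, so fall back to the cube name.
        target = f"{e.cube}.{e.field}" if e.field else e.cube
        grouped[e.cube].append(f"{e.code}: {target}: {e.message}")
    return dict(grouped)


# Illustrative findings; a real run would pass the list returned by
# validate_against_db(catalog, connection=conn).
sample = [
    Finding("missing_column", "orders", "revenue", "column amount not found"),
    Finding("missing_table", "customers", None, "table customers not found"),
]
report = group_by_cube(sample)
for cube, lines in report.items():
    print(cube)
    for line in lines:
        print("  " + line)
```

Grouping by cube keeps a CI log readable when one upstream rename breaks several fields of the same cube at once.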

What it catches

  • missing_table: cube.table doesn't exist or the connection's role can't see it.
  • missing_column: a measure / dimension / time-dimension SQL fragment references a column that no longer exists.
  • base_predicate_invalid: cube.base_predicate doesn't execute.
  • join_predicate_invalid: a Join.on predicate references columns that aren't there, or compares incompatible types.
  • required_filter_dimension_missing: static catalog check; the named required_filters entry has no matching Dimension.

What it doesn't catch

  • Semantic drift (a column exists but means something different now). Schema is necessary, not sufficient.
  • Cross-table referential integrity. The probes are LIMIT 0; they parse, they don't sample.
  • Backend-specific feature drift (a function got dialect-renamed). Use the compiler's snapshot tests for that.

Why LIMIT 0?

Every probe runs SELECT … LIMIT 0. The query planner type-checks identifiers and predicates but does no row work, so the cost is microseconds per probe — fine for a per-cube fan-out in CI. The trade-off is that purely runtime drift (e.g. an enum value that got dropped from a check constraint) won't surface here.
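The idea can be tried with nothing but the standard library. The sketch below uses `sqlite3` purely as a stand-in database; the `probe` helper and the probe SQL are assumptions for illustration, not the statements the package actually issues. It shows that a `LIMIT 0` query still fails on an unknown column or table while returning no rows on success.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, region TEXT, created_at TEXT)")
conn.execute("INSERT INTO orders VALUES (10.0, 'emea', '2024-01-01')")


def probe(connection: sqlite3.Connection, fragment: str, table: str) -> Optional[str]:
    """Return None if the fragment resolves against the table, else the error text.

    Hypothetical helper: identifiers are interpolated directly, so only pass
    trusted catalog fragments, never user input.
    """
    try:
        connection.execute(f"SELECT {fragment} FROM {table} LIMIT 0").fetchall()
    except sqlite3.Error as exc:
        return str(exc)
    return None


ok = probe(conn, "amount", "orders")         # column exists
dropped = probe(conn, "discount", "orders")  # column was never created
missing = probe(conn, "amount", "refunds")   # table does not exist

# LIMIT 0 returns no rows even though the table holds one.
rows = conn.execute("SELECT amount FROM orders LIMIT 0").fetchall()

print(ok, dropped, missing, rows, sep="\n")
```

Name resolution happens when the statement is prepared, which is why the probe can report drift without reading any data.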

CLI

The package is library-first and ships no CLI of its own. Put the command-line wrapper in your deploy script, where the connection, DSN, and role are already known.
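A deploy-script wrapper mostly needs to turn the findings list into an exit code. The following is a minimal sketch of that step, not part of the package: `run_gate` is a hypothetical helper, and the validator is passed in as a callable so the snippet runs without a database. In a real script the callable would wrap the `validate_against_db(catalog, connection=conn)` call from the quick start.

```python
import io
import sys
from types import SimpleNamespace
from typing import Callable, Iterable, TextIO


def run_gate(validate: Callable[[], Iterable], out: TextIO = sys.stderr) -> int:
    """Run the validator, print each finding, and return 0 if clean, 1 on drift."""
    findings = list(validate())
    for e in findings:
        print(f"{e.code}: {e.cube}.{e.field or ''}: {e.message}", file=out)
    return 1 if findings else 0


# Assumed wiring in a deploy script:
#   sys.exit(run_gate(lambda: validate_against_db(catalog, connection=conn)))

# Demonstration with a stand-in finding (attributes and message are illustrative).
fake = [
    SimpleNamespace(
        code="missing_table", cube="orders", field=None, message="illustrative message"
    )
]
buf = io.StringIO()
clean_code = run_gate(lambda: [], out=buf)
drift_code = run_gate(lambda: fake, out=buf)
print(clean_code, drift_code)
```

Returning the code instead of calling `sys.exit` inside the helper keeps it easy to test and leaves the exit decision to the calling script.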

Status

Phase A: probe-by-fragment shape. Drift findings are accurate; performance is "fine for CI, not for runtime gates."
