Validate a semql Catalog against a live database — catches missing tables, dropped columns, and broken join predicates before a deploy.

semql-validate-db

Pre-deploy drift checker for semql catalogs. Runs cheap probe queries against a live database and surfaces the class of bugs the compiler can't see — missing tables, dropped columns, broken join predicates, base-predicate drift.

semql is intentionally pure (PHILOSOPHY: "the compiler has no I/O"). That keeps the compiler simple, but it also means a catalog can pass every compile-time check and still blow up at query time because upstream renamed a column. semql-validate-db is the out-of-band gate that catches it.

Install

pip install semql-validate-db

The package is driver-agnostic. Bring your own DB-API 2.0 connection:

pip install psycopg              # Postgres
pip install clickhouse-connect   # ClickHouse
pip install duckdb               # DuckDB

Quick start

import duckdb
from semql import Backend, Catalog, Cube, Dimension, Measure, TimeDimension
from semql_validate_db import validate_against_db

orders = Cube(
    name="orders",
    backend=Backend.DUCKDB,
    table="orders",
    alias="o",
    measures=[Measure(name="revenue", sql="{o}.amount", agg="sum")],
    dimensions=[Dimension(name="region", sql="{o}.region", type="string")],
    time_dimensions=[TimeDimension(name="created_at", sql="{o}.created_at")],
)
catalog = Catalog([orders])

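# An in-memory database whose schema matches the catalog, so this run is clean.
conn = duckdb.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (amount DOUBLE, region TEXT, created_at TIMESTAMP)"
)

# Probe every cube; the result is a list of findings (empty when nothing drifted).
errors = validate_against_db(catalog, connection=conn)
for e in errors:
    print(f"{e.code}: {e.cube}.{e.field or ''}: {e.message}")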

A clean run returns an empty list. Drift (a missing column, a renamed table) yields one DbValidationError per finding so a single run gives the full picture instead of bailing on the first failure.

What it catches

  • missing_table: cube.table doesn't exist or the connection's role can't see it.
  • missing_column: a measure / dimension / time-dimension SQL fragment references a column that no longer exists.
  • base_predicate_invalid: cube.base_predicate doesn't execute.
  • join_predicate_invalid: a Join.on predicate references columns that aren't there, or compares incompatible types.
  • required_filter_dimension_missing: static catalog check; the named required_filters entry has no matching Dimension.
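
To make the finding codes above concrete, here is a minimal sketch of how probes of this kind could be classified. It runs against the standard-library sqlite3 module rather than semql-validate-db itself. The Finding tuple and the run_probe / check_cube helpers are hypothetical names invented for illustration; they are not the package's internals, and the real probes may be shaped differently.

```python
import sqlite3
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    # Hypothetical stand-in for DbValidationError; field names mirror the quick start.
    code: str
    cube: str
    field: Optional[str]
    message: str


def run_probe(conn, sql):
    """Run one probe; return None on success or the driver's error text."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def check_cube(conn, cube, table, alias, fragments, base_predicate=None):
    """Collect every finding for one cube instead of stopping at the first."""
    src = f"FROM {table} AS {alias}"
    err = run_probe(conn, f"SELECT 1 {src} LIMIT 0")
    if err:
        # Without the table, column probes would only repeat the same error.
        return [Finding("missing_table", cube, None, err)]
    findings = []
    for name, sql in fragments.items():
        # Simplification: any failure of a fragment probe is reported as missing_column.
        err = run_probe(conn, f"SELECT {sql} {src} LIMIT 0")
        if err:
            findings.append(Finding("missing_column", cube, name, err))
    if base_predicate:
        err = run_probe(conn, f"SELECT 1 {src} WHERE {base_predicate} LIMIT 0")
        if err:
            findings.append(Finding("base_predicate_invalid", cube, None, err))
    return findings


conn = sqlite3.connect(":memory:")
# Simulated drift: the table exists, but "region" has been dropped upstream.
conn.execute("CREATE TABLE orders (amount REAL, created_at TEXT)")

findings = check_cube(
    conn,
    cube="orders",
    table="orders",
    alias="o",
    fragments={"revenue": "o.amount", "region": "o.region", "created_at": "o.created_at"},
    base_predicate="o.deleted_at IS NULL",
)
for f in findings:
    print(f.code, f.cube, f.field, f.message)
```

Each probe failure becomes one finding, so the dropped column and the broken base predicate are both reported in the same run.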

What it doesn't catch

  • Semantic drift (a column exists but means something different now). Schema is necessary, not sufficient.
  • Cross-table referential integrity. The probes are LIMIT 0; they parse, they don't sample.
  • Backend-specific feature drift (a function got dialect-renamed). Use the compiler's snapshot tests for that.

Why LIMIT 0?

Every probe runs SELECT … LIMIT 0. The query planner type-checks identifiers and predicates but does no row work, so the cost is microseconds per probe — fine for a per-cube fan-out in CI. The trade-off is that purely runtime drift (e.g. an enum value that got dropped from a check constraint) won't surface here.
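
The behaviour described above can be observed with any SQL engine. The sketch below uses the standard-library sqlite3 module as a stand-in; the table and column names are made up for the demonstration. Note that SQLite only resolves identifiers when it prepares a statement, while stricter engines such as Postgres also type-check predicates, so treat this as an illustration of the idea rather than of any particular backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " amount REAL,"
    " status TEXT CHECK (status IN ('open', 'paid')))"
)
conn.execute("INSERT INTO orders VALUES (10.0, 'paid')")

# A valid probe: identifiers resolve, no rows come back, and the cursor
# still exposes result-column metadata.
cur = conn.execute(
    "SELECT o.amount AS revenue FROM orders AS o "
    "WHERE o.status = 'refunded' LIMIT 0"
)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

# Schema drift: the statement fails while being prepared, before any row work.
try:
    conn.execute("SELECT o.region FROM orders AS o LIMIT 0")
    drift = None
except sqlite3.OperationalError as exc:
    drift = str(exc)

print("columns:", columns)
print("rows:", rows)
print("drift:", drift)
```

The first probe also shows the trade-off: 'refunded' is not a value the CHECK constraint allows, yet the probe passes, because value-level drift of this kind is only visible when real rows are read or written.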

CLI

The package is library-first; a CLI lives in callers' deploy scripts where the connection / DSN / role are already known.
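
A deploy script mostly needs to turn the list of findings into log lines and a process exit code. The helper below is a minimal sketch of that step; report is a hypothetical name, and the SimpleNamespace objects only stand in for DbValidationError instances, using the code, cube, field, and message attributes shown in the quick start.

```python
import io
import sys
from types import SimpleNamespace


def report(errors, out=sys.stderr):
    """Print one line per finding and return a process exit code (0 = clean)."""
    for e in errors:
        # Cube-level findings (e.g. a missing table) carry no field name.
        target = f"{e.cube}.{e.field}" if e.field else e.cube
        print(f"{e.code}: {target}: {e.message}", file=out)
    return 1 if errors else 0


# Stand-in findings so the helper can be exercised without a database.
sample = [
    SimpleNamespace(code="missing_column", cube="orders", field="region",
                    message="no such column"),
    SimpleNamespace(code="missing_table", cube="users", field=None,
                    message="no such table"),
]
buf = io.StringIO()
exit_code = report(sample, out=buf)
print(buf.getvalue(), end="")
print("exit code:", exit_code)
```

In a real script the call would look like sys.exit(report(validate_against_db(catalog, connection=conn))), with the catalog and connection built from whatever DSN and role the deploy environment already provides.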

Status

Phase A: probe-by-fragment shape. Drift findings are accurate; performance is "fine for CI, not for runtime gates."
