Skip to main content

Batch SQL transformation framework

Project description

SQLBuild

Typed, test-first SQL pipelines with change-aware incremental rebuilds.

SQLBuild is a framework for building batch SQL transformation pipelines where correctness and extensibility are first-class concerns.

Key features

  • SQL unit tests that chain across models - Mock your sources, assert on the model you care about, and SQLBuild resolves every intermediate model automatically. One test file can be a full integration test across your pipeline.
  • Audits that block bad data - Audits run before data reaches the target table. For full table builds, SQLBuild materializes into a staging table and only promotes if audits pass. For incremental models, delta-phase audits validate each batch before DML.
  • Python macros, not Jinja - Macros are real Python functions. Testable, debuggable, and composable with standard tooling.
  • Change-aware incremental rebuilds - Fingerprint-based query change detection, schema diff tracking, and configurable backfill policies with automatic cascade through the DAG.
  • Cursor-based incremental processing - Automatic gap detection and resume. If a model fails for several runs, the next build replays from where it left off. Microbatch mode splits large ranges into configurable batches.
  • User-defined functions - SQL and Python UDFs managed as project resources, with table functions for predicate-pushdown-friendly alternatives to final-layer views.
  • Environment diffs - Compare schemas and row-level data between environments with sqb diff prod:dev.
  • Zero-copy cloning - Branch environments instantly with sqb clone without duplicating data. No manifest.json required.
  • Custom materializations - Write materialization logic in Python with full framework integration, including audit hooks, schema change signals, and query change detection.
  • Path-between selectors - --select fact_orders~daily_activity_rollup selects every model on the shortest path between two nodes.

Quick start

pip install sqlbuild
# or
uv add sqlbuild

Clone the repo and run the waffle shop example:

git clone https://github.com/chio-labs/sqlbuild.git
cd sqlbuild
uv sync
sqb --project-dir examples/waffle_shop plan
sqb --project-dir examples/waffle_shop build

How it works

  1. Define your models as SQL files with MODEL() headers that declare configuration, schema, and audits inline
  2. Compile to resolve references, validate SQL (with SQLGlot), and expand Python macros
  3. Plan what needs to change based on fingerprints, schema diffs, and backfill policies
  4. Build by executing the plan: materializing models, validating data before promotion, and ensuring bad data never reaches production
  5. Iterate with first-class support for chained unit tests, zero-copy cloning, and deferred builds - fast feedback without rebuilding the world

Example

A simple staging model:

MODEL (
  materialized view,
  tags [staging],
);

SELECT
  id AS order_id,
  customer_id,
  ordered_at,
  status
FROM __source("raw_orders")

An incremental model with microbatch processing:

MODEL (
  materialized incremental,
  incremental_strategy delete_insert,
  cursor activity_hour,
  cursor_type timestamp,
  cursor_grain hour,
  cursor_inputs (
    fact_orders ordered_at,
  ),
  incremental_mode microbatch,
  batch_size 1d,
  tags [marts],
);

SELECT
  DATE_TRUNC('hour', o.ordered_at) AS activity_hour,
  COUNT(*) AS orders_placed,
  SUM(o.quantity) AS waffles_ordered
FROM __ref("fact_orders") o
GROUP BY DATE_TRUNC('hour', o.ordered_at)

A chained unit test:

TEST();

WITH
__source__raw_orders AS (
  @mock_orders()
),
__source__raw_payments AS (
  SELECT 1 AS payment_id, 1 AS order_id, 1500 AS amount_cents, 'credit_card' AS method
),
__expected__fact_orders AS (
  SELECT 1 AS order_id, 100 AS customer_id, 1500 AS total_cents,
         'credit_card' AS payment_method
)
SELECT 1

Documentation

Full documentation is available at docs.sqlbuild.com.

Contributing

We welcome contributions. Please see CONTRIBUTING.md for guidelines.

License

SQLBuild is licensed under the Apache License 2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlbuild-0.2.1.tar.gz (891.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlbuild-0.2.1-py3-none-any.whl (398.5 kB view details)

Uploaded Python 3

File details

Details for the file sqlbuild-0.2.1.tar.gz.

File metadata

  • Download URL: sqlbuild-0.2.1.tar.gz
  • Upload date:
  • Size: 891.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sqlbuild-0.2.1.tar.gz
Algorithm Hash digest
SHA256 6d48bc19be0ee1917596cd5c67695b3fec908ec1b759469cf92f6cbf5d54b2e4
MD5 1feaf766f4697df74ab44993e1d7f4d5
BLAKE2b-256 b682c3b79924cb35f057f16c00f136e44cad5fa3c8bd93c4f3b306c9be5fafbe

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlbuild-0.2.1.tar.gz:

Publisher: publish.yml on chio-labs/sqlbuild

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlbuild-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: sqlbuild-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 398.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sqlbuild-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fe617f077dd8fec8acb6b976e024ab1e0553c4485b0210e0570cbba7fc593c43
MD5 2a4900962499e01f7263e2a8ec2dc5e7
BLAKE2b-256 2f33d2f42883b09016d75ea69220d34752ac3f4f99f3a6e7d14b4611d28d8ecb

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlbuild-0.2.1-py3-none-any.whl:

Publisher: publish.yml on chio-labs/sqlbuild

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page