Typed, test-first SQL pipelines with local E2E testing
Project description
Change-aware SQL pipelines: only rebuild what changed, with all state in the warehouse.
SQLBuild is a SQL-first framework for building reliable warehouse pipelines. Every build is change-aware by default -- models, seeds, functions, and Python nodes are fingerprinted, source freshness is tracked, and unchanged work (including audits that already passed) is skipped automatically. All state is persisted as append-only tables in the warehouse alongside your data. No external state database, no manifest files, no paid add-on.
It keeps a low, dbt-like floor for SQL models and can run alongside an existing dbt project. It adds ingestion, Python nodes, providers, and opt-in virtual environments for more advanced use cases, letting you expand scope as your project naturally grows.
Key features
- Change-aware builds by default -- Fingerprint-based tracking for models, seeds, functions, and Python nodes. Source freshness observation. Audits skipped when the model version hasn't changed. Cascade propagation with configurable replay windows (
replay_on_change). Pass--forceto override and run everything selected. - Warehouse-native state -- All change-tracking state lives in append-only tables (
_sqlbuild_fingerprints,_sqlbuild_source_freshness) in your warehouse schemas. No external state database, no state machine, no corruption risk. The planner reads the latest row per identity, compares, and appends after successful builds. - Reuse from production -- Dev targets can opt into
reuse_fromto clone or copy unchanged relations from another target (e.g. prod) instead of rebuilding. Zero compute for models that match. - Audits that block bad data -- Audits run before data reaches the target table. For full table builds, SQLBuild materializes into a staging table and only promotes if audits pass. For incremental models, delta-phase audits validate each batch before DML.
- SQL-first models with compile-time validation -- Define models as SQL files with
MODEL()headers while SQLBuild resolves references, validates SQL, infers columns, checks contracts, and computes lineage before anything runs. - Cursor-based incremental processing -- Automatic gap detection and resume. If a model fails for several runs, the next build replays from where it left off. Microbatch mode splits large ranges into configurable batches.
- Source freshness -- Track whether external source data has changed via adapter metadata, column queries, or custom SQL. Lag tolerance prevents jitter from triggering unnecessary rebuilds.
sqb freshnessobserves freshness without running a build;--fail-on-stalegates CI pipelines. - Source loaders -- Load external data into source tables with Python
@loaderfunctions. Supports incremental write strategies (table, append, delete_insert, merge), cursor-based loading, and concurrent execution. Loaders run automatically during builds. - Python nodes -- Tasks (
@task), assets (@asset), checks (@check), and factories (@factory) as first-class DAG nodes alongside SQL models. Identity-tracked with source and dependency diffs shown in the plan. - Providers -- Shared runtime services (API clients, connections, config) injected into Python nodes and hooks by parameter name. Backed by pydantic-settings for validation and environment variable support.
- Python lifecycle hooks -- Typed
sql("...")/python("hook_name")hooks with compile-time validation and aHookContextAPI. Provider injection supported. - Python macros, not Jinja -- Macros are real Python functions. Testable, debuggable, and composable with standard tooling.
- User-defined functions -- SQL and Python UDFs managed as project resources, with table functions for predicate-pushdown-friendly alternatives to final-layer views.
- Custom materializations -- Write materialization logic in Python with full framework integration, including audit hooks, schema change signals, and query change detection.
- SQL unit tests that chain across models -- Mock your sources, assert on the model you care about, and SQLBuild resolves every intermediate model automatically. One test file can be a full integration test across your pipeline.
- End-to-end scenarios with local replay -- Define coherent fixture worlds, run the real project graph in an isolated warehouse slice, capture JSONL snapshots, and replay them locally through DuckDB for fast CI feedback.
- Data diffs -- Compare schemas and row-level data between targets with
sqb diff prod:dev. - Zero-copy cloning -- Branch targets instantly with
sqb clonewithout duplicating data. Nomanifest.jsonrequired. - Path-between selectors --
--select fact_orders~daily_activity_rollupselects every model on the shortest path between two nodes. - Virtual environments (alpha) -- Opt-in state-backed workflows for versioned model outputs, zero-copy branching, instant promotion and rollback, per-PR preview environments. Seeds are versioned alongside models. State stored in PostgreSQL or DuckDB, scoped per environment.
- Python you can read, Rust where it counts -- The framework is Python. For SQL parsing, validation, column inference, lineage, and transpilation, SQLBuild uses Polyglot, a Rust reimplementation of SQLGlot's SQL analysis capabilities (MIT, 32+ dialects).
- Dagster and Rivers integrations -- Models, loaders, tasks, assets, and checks map to Dagster/Rivers assets with dependency edges preserved. Python checks become asset checks.
Quick start
pip install sqlbuild
# or
uv pip install sqlbuild
Create and run the included playground project:
sqb playground waffle-shop
cd waffle-shop
sqb plan
sqb build
sqb test
sqb scenario test
How it works
- Define your models as SQL files with
MODEL()headers that declare configuration, schema, and audits inline - Compile to resolve references, validate SQL, infer column types, check contracts, and compute column lineage -- all offline
- Plan what needs to change by comparing fingerprints, source freshness, seed content, and Python node identities against the warehouse state. Unchanged models, seeds, audits, and Python nodes are skipped. Production relations can optionally be reused when version identities match.
- Build by executing the plan: materializing only what changed, validating data before promotion, and ensuring bad data never reaches production
- Test with chained unit tests, E2E scenario tests, and local replay through DuckDB -- no warehouse required
Example
A simple staging model:
MODEL (
materialized view,
tags [staging],
);
SELECT
id AS order_id,
customer_id,
ordered_at,
status
FROM __source("raw__orders")
An incremental model with microbatch processing:
MODEL (
materialized incremental,
incremental_strategy delete_insert,
cursor activity_hour,
cursor_type timestamp,
cursor_grain hour,
cursor_inputs (
fact_orders ordered_at,
),
incremental_mode microbatch,
batch_size 1d,
replay_on_change full,
tags [marts],
post_hooks [sql('GRANT SELECT ON @@CTX:destination.qualified TO analyst_role')],
);
SELECT
DATE_TRUNC('hour', o.ordered_at) AS activity_hour,
COUNT(*) AS orders_placed,
SUM(o.quantity) AS waffles_ordered
FROM __ref("fact_orders") o
GROUP BY DATE_TRUNC('hour', o.ordered_at)
A chained unit test:
TEST();
WITH
__source__raw__orders AS (
@mock_orders()
),
__source__raw__payments AS (
SELECT 1 AS payment_id, 1 AS order_id, 1500 AS amount_cents, 'credit_card' AS method
),
__expected__fact_orders AS (
SELECT 1 AS order_id, 100 AS customer_id, 1500 AS total_cents,
'credit_card' AS payment_method
),
__assert__no_negative_totals AS (
SELECT * FROM __ref("fact_orders") WHERE total_cents < 0
)
SELECT 1
An end-to-end scenario:
SCENARIO (
description "Customer refund updates daily revenue correctly",
tags [revenue, refund],
);
WITH
__source__raw__orders AS (
SELECT 1 AS order_id, DATE '2026-01-01' AS order_date, 100.00 AS amount
),
__source__raw__refunds AS (
SELECT 1 AS refund_id, 1 AS order_id, DATE '2026-01-01' AS refund_date, 25.00 AS amount
),
__expected__daily_revenue AS (
SELECT DATE '2026-01-01' AS order_date, 75.00 AS revenue
),
__assert__all_refunds_linked AS (
SELECT * FROM __ref("fact_refunds") WHERE order_id IS NULL
)
SELECT 1
Scenario files live under tests/scenarios/**/*.sql. Run them in the target warehouse with:
sqb scenario test
sqb scenario test --select revenue__customer_refund --retain
sqb scenario test --select tests/scenarios/revenue --exclude revenue__slow_refund
Capture local replay snapshots as JSONL under tests/_scenario_snapshots/<scenario_name>/:
sqb scenario capture --select revenue__customer_refund
sqb scenario capture --select-file changed_scenarios.txt --exclude revenue__slow_refund
sqb scenario test --select revenue__customer_refund --local
sqb scenario test --local --sync-snapshots
sqb scenario test --local --refresh
Snapshots are committable test data. Review them for sensitive values before committing.
A source loader:
from sqlbuild.loaders import loader
from sqlbuild.executor.load.models import LoaderContext
@loader
def raw_orders(ctx: LoaderContext) -> list[dict[str, object]]:
if ctx.current_cursor_value is None:
return fetch_all_orders()
return fetch_orders_since(ctx.current_cursor_value)
Python loaders should return rows for SQLBuild to write, such as a list of dictionaries
or another supported tabular row object. Self-managed loaders that write their own data
can return None.
Bound to a source in sources/*.yml:
sources:
- name: raw_orders
managed: true
write_strategy: delete_insert
cursor_column: ordered_at
columns:
- name: id
type: INTEGER
- name: ordered_at
type: TIMESTAMP
Supported adapters
| Adapter | Status |
|---|---|
| DuckDB | Supported |
| MotherDuck | Supported |
| Snowflake | Supported |
| BigQuery | Supported |
| Databricks | Supported |
| PostgreSQL | Supported |
| SQL Server | Supported |
Documentation
Full documentation is available at docs.sqlbuild.com.
Contributing
We welcome contributions. Please see CONTRIBUTING.md for guidelines.
License
SQLBuild is licensed under the Apache License 2.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlbuild-0.33.0.tar.gz.
File metadata
- Download URL: sqlbuild-0.33.0.tar.gz
- Upload date:
- Size: 3.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f432a353bb418d9168508b1ff1a2c1ad34c3b439ab6c9916322617462ab312d7
|
|
| MD5 |
b8a04d6b8def3fd5dbba313ad78851e2
|
|
| BLAKE2b-256 |
ca9b5be6f5e49e05f7a2c102c4d6cb5671f0c845302ba0af1bb7a2c7b8494ccb
|
Provenance
The following attestation bundles were made for sqlbuild-0.33.0.tar.gz:
Publisher:
publish.yml on chio-labs/sqlbuild
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sqlbuild-0.33.0.tar.gz -
Subject digest:
f432a353bb418d9168508b1ff1a2c1ad34c3b439ab6c9916322617462ab312d7 - Sigstore transparency entry: 1807965465
- Sigstore integration time:
-
Permalink:
chio-labs/sqlbuild@e2d729b88bb7777662464cfcaefc2ec1cb97d7e1 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/chio-labs
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e2d729b88bb7777662464cfcaefc2ec1cb97d7e1 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file sqlbuild-0.33.0-py3-none-any.whl.
File metadata
- Download URL: sqlbuild-0.33.0-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
47acef9c038780d8a84e36c32ef911b4264b7f5567826808f3ac85e0e5d52fa7
|
|
| MD5 |
aea657c12e45865c91f7d4b51d298604
|
|
| BLAKE2b-256 |
8d81b9b87e0d7d34c87565457c1afe0925a9dd433fca5f6276fc830a273ee938
|
Provenance
The following attestation bundles were made for sqlbuild-0.33.0-py3-none-any.whl:
Publisher:
publish.yml on chio-labs/sqlbuild
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sqlbuild-0.33.0-py3-none-any.whl -
Subject digest:
47acef9c038780d8a84e36c32ef911b4264b7f5567826808f3ac85e0e5d52fa7 - Sigstore transparency entry: 1807965466
- Sigstore integration time:
-
Permalink:
chio-labs/sqlbuild@e2d729b88bb7777662464cfcaefc2ec1cb97d7e1 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/chio-labs
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e2d729b88bb7777662464cfcaefc2ec1cb97d7e1 -
Trigger Event:
workflow_dispatch
-
Statement type: