A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).
Project description
duckrun is a dbt adapter that runs your model SQL in
DuckDB and materializes the results to Delta Lake using
delta_rs (the deltalake Python package).
It is a thin layer on top of dbt-duckdb: you
get everything dbt-duckdb offers (views, seeds, sources, tests, snapshots, the full
plugin ecosystem) plus a delta materialization that writes real Delta tables —
the one thing dbt-duckdb doesn't do out of the box. Delta writes use ZSTD compression
and row-group sizing tuned for Power BI / DirectLake.
Note: versions ≤ 0.2.x of
duckrunwere a Microsoft Fabric/OneLake helper library. From 0.1.0 of this line onward, duckrun is a dbt adapter. The old API lives in git history onmain.
Why
DuckDB is a fantastic query engine; Delta Lake is a great open table format. dbt is the
right tool to orchestrate the DAG. duckrun simply wires the three together: DuckDB
executes, delta_rs materializes, dbt orchestrates.
Install
pip install duckrun
That single install pulls dbt-duckdb (and therefore duckdb) and deltalake.
Configure your profile
# ~/.dbt/profiles.yml
my_project:
target: dev
outputs:
dev:
type: duckrun
# DuckDB runs in-memory by default — the Delta tables are the only state.
# Default Delta location for models that don't set config(location=...)
root_path: './warehouse' # local path, or abfss://.../Tables, s3://..., gs://...
# storage_options: {} # passed through to deltalake for remote stores
In a notebook where the storage secret is already provided to DuckDB, leave
storage_options empty.
Remote stores (Microsoft Fabric OneLake / ADLS / S3 / GCS)
Point root_path at the warehouse location and pass credentials via storage_options
— these flow straight to deltalake for writes/merges. If storage_options carries a
bearer_token (or token / access_token), the adapter also auto-creates a matching
DuckDB Azure secret so delta_scan() reads work, no extra config.
onelake:
type: duckrun
schema: dbo
root_path: "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables"
storage_options:
bearer_token: "{{ env_var('ONELAKE_TOKEN') }}" # az account get-access-token --resource https://storage.azure.com
use_fabric_endpoint: "true"
Tables are written as root_path/<schema>/<model> (e.g. …/Tables/dbo/orders).
Verified end-to-end against real Fabric OneLake: table overwrite, incremental merge,
and delta_scan reads/tests.
Materializations
| materialized | backed by | notes |
|---|---|---|
table |
Delta (overwrite) | DuckDB runs the SQL, delta_rs writes the table fresh each run. |
incremental |
Delta (merge / append) | First run overwrites; later runs apply incremental_strategy. |
view |
in-memory DuckDB | Ephemeral staging within a run (inherited from dbt-duckdb). |
seed |
in-memory DuckDB | CSV fixtures (inherited). |
delta |
Delta | Alias for table; honors incremental=true. Kept for convenience. |
The persisted materializations (table, incremental, delta) write to
<root_path>/<schema>/<model> by default, or to config(location=...), and register a
delta_scan view so downstream ref() works.
table
-- models/orders.sql
{{ config(materialized='table') }}
select status, count(*) as n, sum(amount) as total
from {{ ref('stg_orders') }}
group by status
incremental
{{ config(materialized='incremental', unique_key='order_id', incremental_strategy='merge') }}
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
First run (or --full-refresh, or missing table) overwrites. Later runs apply
incremental_strategy:
incremental_strategy |
behavior | requires |
|---|---|---|
merge (default w/ unique_key) |
upsert — update matched, insert new | unique_key |
insert |
insert only new keys (idempotent append / dedupe) | unique_key |
append (default w/o unique_key) |
blind append | — |
Config options (table / incremental / delta)
| option | description |
|---|---|
location |
Delta path. Defaults to <root_path>/<schema>/<id>. |
incremental_strategy |
merge | insert | append (incremental only). |
unique_key |
column(s) to merge on. |
partition_by |
Delta partition column(s). |
merge_schema |
allow schema evolution on write. |
storage_options |
per-model override forwarded to deltalake. |
Reading existing Delta tables as sources
sources:
- name: lake
tables:
- name: customers
meta:
plugin: duckrun
delta_table_path: 's3://bucket/lake/customers'
How it works
- dbt compiles your model SQL.
- The materialization stages it as a DuckDB view.
- A
dbt-duckdbplugin (store()hook) hands that relation to deltalake via the Arrow C-stream interface (__arrow_c_stream__) — no pyarrow — whichwrite_deltalake/DeltaTable.mergeconsume natively. - The model relation becomes a
delta_scanview over the new Delta table.
The adapter is a thin subclass of dbt-duckdb declaring dependencies=['duckdb'], so
view, seed, tests, etc. are inherited directly; table and incremental are
overridden to write Delta.
Development
The integration_tests/ directory is a small dbt project exercised by CI
(.github/workflows/integration.yml): dbt build (twice) against a local Delta
./warehouse — a seed, a view, a table, and an incremental model — where the
second build exercises the incremental merge. Verified to run with pyarrow not
installed, on the minimum supported duckdb and deltalake.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file duckrun-0.3.0.tar.gz.
File metadata
- Download URL: duckrun-0.3.0.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6d998eedff7e65274cdc1261e86ec700f8b4e101c70200efe3dec1f374b1a033
|
|
| MD5 |
131ac950b3e6fb44564d0bb5f1d12ed3
|
|
| BLAKE2b-256 |
522a5ad112985bcdde00519dc22a7c044850a688319095871565aa4e7aa940a8
|
File details
Details for the file duckrun-0.3.0-py3-none-any.whl.
File metadata
- Download URL: duckrun-0.3.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4cbe3eb6811f7c683a5cdc7ccd506b1b423edefe5343633082705414b7eecad7
|
|
| MD5 |
ea1887d21604a51d88ad973b356ddad2
|
|
| BLAKE2b-256 |
3996cb733fd5ef7ebbe8e22589ff30c3e93f5dc09707d9a7f9d317eef5a8f055
|