Skip to main content

A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).

Project description

duckrun

PyPI version

duckrun is a dbt adapter that runs your model SQL in DuckDB and materializes the results to Delta Lake using delta_rs (the deltalake Python package).

It is a thin layer on top of dbt-duckdb: you get everything dbt-duckdb offers (views, seeds, sources, tests, snapshots, the full plugin ecosystem) plus a delta materialization that writes real Delta tables — the one thing dbt-duckdb doesn't do out of the box. Delta writes use ZSTD compression and row-group sizing tuned for Power BI / DirectLake.

Note: versions ≤ 0.2.x of duckrun were a Microsoft Fabric/OneLake helper library. From 0.1.0 of this line onward, duckrun is a dbt adapter. The old API lives in git history on main.

Why

DuckDB is a fantastic query engine; Delta Lake is a great open table format. dbt is the right tool to orchestrate the DAG. duckrun simply wires the three together: DuckDB executes, delta_rs materializes, dbt orchestrates.

Install

pip install duckrun

That single install pulls dbt-duckdb (and therefore duckdb) and deltalake.

Configure your profile

# ~/.dbt/profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: duckrun
      # DuckDB runs in-memory by default — the Delta tables are the only state.
      # Default Delta location for models that don't set config(location=...)
      root_path: './warehouse'  # local path, or abfss://.../Tables, s3://..., gs://...
      # storage_options: {}     # passed through to deltalake for remote stores

In a notebook where the storage secret is already provided to DuckDB, leave storage_options empty.

Remote stores (Microsoft Fabric OneLake / ADLS / S3 / GCS)

Point root_path at the warehouse location and pass credentials via storage_options — these flow straight to deltalake for writes/merges. If storage_options carries a bearer_token (or token / access_token), the adapter also auto-creates a matching DuckDB Azure secret so delta_scan() reads work, no extra config.

    onelake:
      type: duckrun
      schema: dbo
      root_path: "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables"
      storage_options:
        bearer_token: "{{ env_var('ONELAKE_TOKEN') }}"   # az account get-access-token --resource https://storage.azure.com
        use_fabric_endpoint: "true"

Tables are written as root_path/<schema>/<model> (e.g. …/Tables/dbo/orders). Verified end-to-end against real Fabric OneLake: table overwrite, incremental merge, and delta_scan reads/tests.

Materializations

materialized backed by notes
table Delta (overwrite) DuckDB runs the SQL, delta_rs writes the table fresh each run.
incremental Delta (merge / append) First run overwrites; later runs apply incremental_strategy.
view in-memory DuckDB Ephemeral staging within a run (inherited from dbt-duckdb).
seed in-memory DuckDB CSV fixtures (inherited).
delta Delta Alias for table; honors incremental=true. Kept for convenience.

The persisted materializations (table, incremental, delta) write to <root_path>/<schema>/<model> by default, or to config(location=...), and register a delta_scan view so downstream ref() works.

table

-- models/orders.sql
{{ config(materialized='table') }}

select status, count(*) as n, sum(amount) as total
from {{ ref('stg_orders') }}
group by status

incremental

{{ config(materialized='incremental', unique_key='order_id', incremental_strategy='merge') }}

select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

First run (or --full-refresh, or missing table) overwrites. Later runs apply incremental_strategy:

incremental_strategy behavior requires
merge (default w/ unique_key) upsert — update matched, insert new unique_key
insert insert only new keys (idempotent append / dedupe) unique_key
append (default w/o unique_key) blind append

Config options (table / incremental / delta)

option description
location Delta path. Defaults to <root_path>/<schema>/<id>.
incremental_strategy merge | insert | append (incremental only).
unique_key column(s) to merge on.
partition_by Delta partition column(s).
merge_schema allow schema evolution on write.
storage_options per-model override forwarded to deltalake.

Reading existing Delta tables as sources

sources:
  - name: lake
    tables:
      - name: customers
        meta:
          plugin: duckrun
          delta_table_path: 's3://bucket/lake/customers'

How it works

  1. dbt compiles your model SQL.
  2. The materialization stages it as a DuckDB view.
  3. A dbt-duckdb plugin (store() hook) hands that relation to deltalake via the Arrow C-stream interface (__arrow_c_stream__) — no pyarrow — which write_deltalake / DeltaTable.merge consume natively.
  4. The model relation becomes a delta_scan view over the new Delta table.

The adapter is a thin subclass of dbt-duckdb declaring dependencies=['duckdb'], so view, seed, tests, etc. are inherited directly; table and incremental are overridden to write Delta.

Development

The integration_tests/ directory is a small dbt project exercised by CI (.github/workflows/integration.yml): dbt build (twice) against a local Delta ./warehouse — a seed, a view, a table, and an incremental model — where the second build exercises the incremental merge. Verified to run with pyarrow not installed, on the minimum supported duckdb and deltalake.

License

MIT

Project details


Release history Release notifications | RSS feed

This version

0.3.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckrun-0.3.0.tar.gz (16.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

duckrun-0.3.0-py3-none-any.whl (18.3 kB view details)

Uploaded Python 3

File details

Details for the file duckrun-0.3.0.tar.gz.

File metadata

  • Download URL: duckrun-0.3.0.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for duckrun-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6d998eedff7e65274cdc1261e86ec700f8b4e101c70200efe3dec1f374b1a033
MD5 131ac950b3e6fb44564d0bb5f1d12ed3
BLAKE2b-256 522a5ad112985bcdde00519dc22a7c044850a688319095871565aa4e7aa940a8

See more details on using hashes here.

File details

Details for the file duckrun-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: duckrun-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 18.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for duckrun-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4cbe3eb6811f7c683a5cdc7ccd506b1b423edefe5343633082705414b7eecad7
MD5 ea1887d21604a51d88ad973b356ddad2
BLAKE2b-256 3996cb733fd5ef7ebbe8e22589ff30c3e93f5dc09707d9a7f9d317eef5a8f055

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page