duckrun

A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Project description

Disclaimer: This is a personal project, built and maintained in my own time. It is not affiliated with, endorsed by, or supported by any employer or vendor. No warranty — use it at your own risk.

duckrun is a dbt adapter that runs your model SQL in DuckDB and writes the results to Delta Lake using delta_rs (the deltalake Python package). duckrun itself is just glue — it owns none of the heavy lifting. The real work is done by DuckDB (executes the SQL), delta-rs (writes the Delta table), Arrow (the zero-copy (kind of) bridge that hands query results from DuckDB to delta-rs), and dbt (orchestrates the DAG). DuckDB is here for convenience as the SQL engine; the materialization is all delta-rs and Arrow.

It is a thin wrapper around dbt-duckdb. You keep everything dbt-duckdb gives you — views, seeds, sources, tests, snapshots, the full plugin ecosystem — and gain one extra thing: a Delta-backed table / incremental materialization that writes real Delta tables.

The design rationale — why delta_rs and not DuckDB's native Delta writer, why Delta and not Iceberg, why a separate adapter — lives in docs/design_document.md.

How it fits together

DuckDB is a great query engine, Delta Lake is a great open table format, and dbt is the right tool to orchestrate the DAG. duckrun wires the three together:

DuckDB executes · delta_rs materializes · dbt orchestrates.

Install

pip install duckrun

That single install pulls in dbt-duckdb (and therefore duckdb) plus deltalake.

Configure your profile

# ~/.dbt/profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: duckrun
      # No `threads:` needed — duckrun always runs single-threaded.
      # DuckDB runs in-memory by default — the Delta tables are the only state.
      # Default Delta location for models that don't set config(location=...).
      root_path: './warehouse'   # local path, or s3://..., gs://..., abfss://...
      # storage_options: {}      # passed through to deltalake for remote stores

Persisted models are written to <root_path>/<schema>/<model> (e.g. ./warehouse/dbo/orders), or to an explicit config(location=...).

Fabric Lakehouse without a schema

A schema-less Lakehouse (tables straight under Tables/, no Tables/<schema>/ grouping) is a bad pattern — you lose the namespace that keeps a warehouse organized — but if you're stuck with one, no special config is needed. Drop the trailing Tables from root_path and let the schema fill that slot:

      schema: Tables
      root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>.Lakehouse"

Since models are written to <root_path>/<schema>/<model>, this lands them at <lh>.Lakehouse/Tables/<model> — exactly the flat layout the schema-less Lakehouse expects. Prefer a schema-enabled Lakehouse (root_path: .../Tables, real schemas) whenever you can.

Remote stores (S3 / GCS / ADLS)

Point root_path at the warehouse location and pass credentials through storage_options — these flow straight to deltalake for writes and merges.

On Azure-backed stores, if storage_options carries a bearer_token (or token / access_token), the adapter also auto-creates a matching DuckDB Azure secret, so delta_scan() reads work with no extra config. In a notebook where the storage secret is already provided to DuckDB, you can leave storage_options empty.

    remote:
      type: duckrun
      schema: dbo
      root_path: "s3://my-bucket/warehouse"   # or abfss://... , gs://...
      storage_options:
        aws_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
        aws_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"

Verified end-to-end against real remote object storage: table overwrite, incremental merge, and delta_scan reads / tests.

Materializations

materialized	backed by	notes
`table`	Delta (overwrite)	DuckDB runs the SQL; delta_rs writes the table fresh each run.
`incremental`	Delta (merge / append)	First run overwrites; later runs apply `incremental_strategy`.
`view`	in-memory DuckDB	Ephemeral staging within a run (inherited from dbt-duckdb).
`seed`	in-memory DuckDB	CSV fixtures (inherited from dbt-duckdb).
`delta`	Delta	Alias for `table`; honors `incremental=true`. Kept for convenience.

The persisted materializations (table, incremental, delta) register a delta_scan view over the new Delta table, so downstream ref() works.

`table`

-- models/orders.sql
{{ config(materialized='table') }}

select status, count(*) as n, sum(amount) as total
from {{ ref('stg_orders') }}
group by status

`incremental`

{{ config(materialized='incremental', unique_key='order_id', incremental_strategy='merge') }}

select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

The first run (or --full-refresh, or a missing table) overwrites. Later runs apply the incremental_strategy:

`incremental_strategy`	behavior	requires
`merge` (default with `unique_key`)	upsert — update matched, insert new	`unique_key`
`insert`	insert only new keys (idempotent append)	`unique_key`
`append` (default without `unique_key`)	blind append	—
`safeappend`	append, but only if the table is unchanged since the model read it (else fail) — cheap, no dedup scan	—

`safeappend`

A cheap append for the common "load only what's new" pattern — when your model SQL already guarantees no duplicates and you don't want to pay for a merge.

{{ config(materialized='incremental', incremental_strategy='safeappend') }}

select * from read_csv(getvariable('new_files'))
{% if is_incremental() %}
  -- the dedup is your SQL's job: only load files not already in the table
  where file not in (select distinct file from {{ this }})
{% endif %}

Why, reason 1 — performance. merge / insert scan the target and join on the key to find what's new — expensive on a large table. If the SQL above already excludes rows that are present, that work is redundant. safeappend is a plain append: no target data scan, no key join, and DuckDB keeps its full memory budget (the merge memory split is never applied — same as append / overwrite). The only thing it reads from the target is one Delta log entry to get the version.

Why, reason 2 — a concurrency guard a blind append doesn't have. Because the dedup is done in SQL against {{ this }}, a plain append is unsafe under concurrency: if another writer commits between your not in (... from {{ this }}) read and your write, the file it added isn't excluded and you get a duplicate. safeappend closes that gap — it commits only if the table version is unchanged since the model started (captured before it reads {{ this }}); if anything committed in between, it fails with CommitFailedError so the run re-runs against the new state. No duplicate slips in.

This is optimistic concurrency control — it never locks the table or blocks other writers; it appends, then validates at commit with a compare-and-swap on the version and aborts on a mismatch. Its policy is the strictest of the strategies (abort on any concurrent change, rather than reconcile like merge or auto-rebase like append), but the mechanism is optimistic, not pessimistic. Re-running is safe and idempotent: the SQL dedup simply excludes whatever the previous attempt already loaded.

First run (or --full-refresh, or a missing table) overwrites to create the table; safeappend applies on later runs. A real example is the AEMO fct_scada model — the project's largest table, which loads only not-yet-seen files and so uses safeappend instead of an expensive merge.

Config options (`table` / `incremental` / `delta`)

option	description
`location`	Delta path. Defaults to `<root_path>/<schema>/<id>`.
`incremental_strategy`	`merge` \| `insert` \| `append` \| `safeappend` (incremental only).
`unique_key`	column(s) to merge on.
`merge_update_columns`	merge: update only these columns on match (others untouched).
`merge_exclude_columns`	merge: update all columns except these on match.
`merge_max_spill_size`	merge: memory ceiling in bytes for delta_rs's merge pool (not a disk budget). Defaults to ~60% of the effective limit — `min(physical RAM, container/cgroup limit, currently-free RAM)` — beyond which delta_rs spills the merge join to disk (like DuckDB's `memory_limit`). The other big consumer, DuckDB itself, is separately pinned to ~30% of the same effective limit on the merge path (it produces the merge source in the same process), so the two budgets sum under the cgroup cap; both log their chosen value at run start. Set `0` to disable. It bounds the merge pool, not the whole process (the Arrow source, read buffers, and spill-file page cache sit outside it), so on a tight container with a huge source the total can still exceed the cap — lower it if needed. A cap below the join's minimum (~hundreds of MB) makes the merge raise `Resources exhausted` instead of spilling. Requires deltalake 1.5.0 (pinned).
`incremental_predicates`	merge: extra predicates AND-ed into the merge condition (use `target.`/`source.`, or dbt's `DBT_INTERNAL_DEST`/`DBT_INTERNAL_SOURCE`).
`on_schema_change`	`ignore` (default) \| `append_new_columns` \| `fail`. (`sync_all_columns` only adds — delta_rs can't drop columns.)
`partition_by`	Delta partition column(s).
`merge_schema`	allow schema evolution on write.
`storage_options`	per-model override forwarded to deltalake.

Reading existing tables/files as sources

A source routed to the duckrun plugin can be a Delta table, a CSV, or a Parquet file. delta_table_path always reads Delta; otherwise the path comes from location and the format is taken from format (csv | parquet | delta) or inferred from the extension.

sources:
  - name: lake
    tables:
      - name: customers           # Delta table
        meta:
          plugin: duckrun
          delta_table_path: 's3://bucket/lake/customers'
      - name: events              # CSV (read_csv_auto)
        meta:
          plugin: duckrun
          format: csv
          location: 's3://bucket/raw/events.csv'
      - name: metrics             # Parquet
        meta:
          plugin: duckrun
          format: parquet
          location: 's3://bucket/raw/metrics.parquet'

How it works

dbt compiles your model SQL.
The materialization stages it as a DuckDB view.
A dbt-duckdb plugin (a store() hook) hands that relation to deltalake over the Arrow C-stream interface (__arrow_c_stream__) — no pyarrow required — which write_deltalake / DeltaTable.merge consume natively.
The model relation becomes a delta_scan view over the new Delta table.

The adapter is a thin subclass of dbt-duckdb declaring dependencies=['duckdb'], so view, seed, tests, and the rest are inherited directly; only table and incremental are overridden to write Delta.

Table maintenance (compaction & vacuum)

duckrun maintains your Delta tables automatically — no configuration, no scheduled job, no separate OPTIMIZE/VACUUM run to remember. It happens inline on every write.

This matters because delta_rs has no automatic, post-commit maintenance of its own — and it ignores Databricks-style auto-optimize table properties (delta.autoOptimize.*). Left alone, an incremental table fragments into many small Parquet files and keeps every superseded file version forever. duckrun runs the maintenance for you, right after each write:

write	maintenance
`table` / overwrite	`vacuum` + metadata cleanup every run
`append`	`optimize.compact` + `vacuum` + cleanup once the table exceeds 100 files
`merge` / `insert`	same threshold-gated `compact` + `vacuum` + cleanup after the merge
`microbatch` / delete+insert	same threshold-gated maintenance

Every vacuum uses delta_rs's safe default retention (7 days / 168h), so files a concurrent reader might still be reading are never deleted out from under it. The trade-off is that a superseded file version lingers for the retention window before it can be reclaimed — duckrun favors read-safety over immediate disk savings.

Connection API (notebook)

Besides the dbt adapter, duckrun ships a storage-neutral, PySpark-shaped duckrun.connect() for interactive/notebook use (local, S3, GCS, ADLS, OneLake). conn.sql(...) is read-only (including time travel — delta_scan('…', version => N)); writes go through the Spark surface: a DataFrame with .write…saveAsTable() (modes overwrite / append / safeappend / ignore) and a DeltaTable handle (conn.delta_table(name) / DeltaTable.forName) with .merge(...), .delete(), .update(), .replaceWhere(), .version(), plus conn.read and conn.catalog.

merge is snapshot-pinned by default — Spark's single-snapshot MERGE, with no extra arguments: the target version is captured and the commit is validated against it, so a concurrent writer fails the commit loudly instead of silently interleaving. mode("safeappend") is the same optimistic, fail-loud append as the dbt safeappend strategy: it commits only if the table is unchanged since the call, else raises CommitFailedError.

import duckrun
conn = duckrun.connect("abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Tables/dbo")
conn.sql("select * from orders").write.mode("overwrite").saveAsTable("orders_copy")
conn.table("orders_copy").show()

conn.delta_table("orders").delete("region = 'eu'")   # delete / update / replaceWhere

# upsert — pinned automatically, nothing to pass
src = conn.sql("select * from updates")
conn.delta_table("orders").merge(src, "target.id = source.id") \
    .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

See docs/connection-api.md for the full per-method scorecard.

Building with an AI assistant

duckrun ships a guide for AI coding assistants so they get the adapter's defaults right (several differ from other dbt adapters). If you use Claude Code, install it once and it loads on demand when you ask a duckrun question:

/plugin marketplace add djouallah/duckrun
/plugin install duckrun-projects@duckrun

Using a different assistant (Cursor, Copilot, Codex, …) or just have the repo checked out? It reads the AGENTS.md at the repo root automatically, which points to the full guide in plugins/duckrun-projects/skills/duckrun-projects/SKILL.md. None of this is required to use duckrun — pip install duckrun is unaffected.

Docs & test results

Doc	What's in it
Design document	Why delta_rs (not DuckDB's native Delta writer), why Delta (not Iceberg), why a separate adapter.
Connection API	The `duckrun.connect()` notebook API + the live per-method scorecard.
dbt adapter conformance	Official `dbt-tests-adapter` results, regenerated on every push to `main`.
Incremental MERGE benchmark	~120M-row TPCH merge / append / overwrite scorecard — the release gate.

Testing. tests/integration_tests/aemo/ is a small dbt project built against OneLake, and tests/integration_tests/coffee/ is the connection-API coffee-shop scenario / stress test (CI: integration.yml); tests/conformance/ runs the official suite (above); tests/correctness/ proves the concurrency guarantees. The cards in those docs are rendered live by CI, so they always reflect the latest main.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

mimoune.djouallah

Release history Release notifications | RSS feed

0.3.24

Jun 23, 2026

0.3.23

Jun 23, 2026

0.3.22

Jun 23, 2026

0.3.21

Jun 22, 2026

0.3.20

Jun 22, 2026

0.3.19

Jun 21, 2026

0.3.18

Jun 21, 2026

0.3.17

Jun 21, 2026

0.3.17.dev7 pre-release

Jun 18, 2026

0.3.17.dev6 pre-release

Jun 17, 2026

0.3.17.dev5 pre-release

Jun 15, 2026

0.3.17.dev4 pre-release

Jun 15, 2026

This version

0.3.17.dev3 pre-release

Jun 14, 2026

0.3.17.dev2 pre-release

Jun 14, 2026

0.3.17.dev1 pre-release

Jun 14, 2026

0.3.16

Jun 12, 2026

0.3.15

Jun 11, 2026

0.3.14

Jun 10, 2026

0.3.13

Jun 9, 2026

0.3.12

Jun 8, 2026

0.3.11

Jun 8, 2026

0.3.9

Jun 8, 2026

0.3.8

Jun 8, 2026

0.3.7

Jun 7, 2026

0.3.6

Jun 7, 2026

0.3.5

Jun 7, 2026

0.3.4

Jun 7, 2026

0.3.3

Jun 6, 2026

0.3.2

Jun 6, 2026

0.3.1

Jun 6, 2026

0.3.0

Jun 5, 2026

0.2.26

Jan 12, 2026

0.2.25

Jan 12, 2026

0.2.24

Jan 12, 2026

0.2.23

Jan 12, 2026

0.2.22.dev2 pre-release

Nov 24, 2025

0.2.22.dev0 pre-release

Nov 22, 2025

0.2.21

Nov 21, 2025

0.2.21.dev2 pre-release

Nov 21, 2025

0.2.21.dev1 pre-release

Nov 21, 2025

0.2.20

Nov 17, 2025

0.2.20.dev5 pre-release

Nov 16, 2025

0.2.20.dev3 pre-release

Nov 16, 2025

0.2.20.dev2 pre-release

Nov 15, 2025

0.2.20.dev1 pre-release

Nov 15, 2025

0.2.20.dev0 pre-release

Nov 15, 2025

0.2.19

Nov 12, 2025

0.2.19.dev8 pre-release

Nov 11, 2025

0.2.19.dev7 pre-release

Nov 10, 2025

0.2.19.dev6 pre-release

Nov 10, 2025

0.2.19.dev5 pre-release

Nov 10, 2025

0.2.19.dev4 pre-release

Nov 10, 2025

0.2.19.dev3 pre-release

Nov 9, 2025

0.2.19.dev2 pre-release

Nov 9, 2025

0.2.19.dev1 pre-release

Nov 5, 2025

0.2.19.dev0 pre-release

Nov 5, 2025

0.2.18

Nov 4, 2025

0.2.18.dev5 pre-release

Nov 3, 2025

0.2.18.dev4 pre-release

Nov 3, 2025

0.2.18.dev3 pre-release

Nov 2, 2025

0.2.18.dev2 pre-release

Nov 2, 2025

0.2.18.dev1 pre-release

Nov 2, 2025

0.2.17

Nov 1, 2025

0.2.16.dev2 pre-release

Nov 1, 2025

0.2.16.dev1 pre-release

Nov 1, 2025

0.2.16.dev0 pre-release

Nov 1, 2025

0.2.15

Oct 31, 2025

0.2.14.dev40 pre-release

Oct 31, 2025

0.2.14.dev3 pre-release

Oct 30, 2025

0.2.14.dev2 pre-release

Oct 30, 2025

0.2.14.dev1 pre-release

Oct 30, 2025

0.2.14.dev0 pre-release

Oct 30, 2025

0.2.13

Oct 22, 2025

0.2.13.dev0 pre-release

Oct 22, 2025

0.2.12

Oct 21, 2025

0.2.11

Oct 21, 2025

0.2.11.dev0 pre-release

Oct 21, 2025

0.2.10

Oct 16, 2025

0.2.9

Oct 15, 2025

0.2.7

Oct 14, 2025

0.2.6

Oct 14, 2025

0.2.4

Oct 9, 2025

0.2.3

Oct 9, 2025

0.2.2

Oct 8, 2025

0.2.1

Oct 7, 2025

0.2.0

Oct 7, 2025

0.1.9

Oct 6, 2025

0.1.8

Oct 6, 2025

0.1.7

Oct 5, 2025

0.1.6.3

Oct 5, 2025

0.1.6.2

Oct 5, 2025

0.1.6.1

Oct 5, 2025

0.1.6

Oct 5, 2025

0.1.5.6

Oct 5, 2025

0.1.5.5

Oct 5, 2025

0.1.5.4

Oct 5, 2025

0.1.5.3

Oct 5, 2025

0.1.5.2

Oct 4, 2025

0.1.5.1

Oct 4, 2025

0.1.5

Oct 4, 2025

0.1.4

Oct 4, 2025

0.1.3

Oct 4, 2025

0.1.2

Oct 4, 2025

0.1.1

Oct 3, 2025

0.0.0

Oct 3, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckrun-0.3.17.dev3.tar.gz (76.3 kB view details)

Uploaded Jun 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

duckrun-0.3.17.dev3-py3-none-any.whl (78.2 kB view details)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file duckrun-0.3.17.dev3.tar.gz.

File metadata

Download URL: duckrun-0.3.17.dev3.tar.gz
Upload date: Jun 14, 2026
Size: 76.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.17.dev3.tar.gz
Algorithm	Hash digest
SHA256	`a388d385936e0cbd82b5b198bc88783d277c46d3a95d46bf3f8277974b471064`
MD5	`8c5662740a5e852620d67bed38609389`
BLAKE2b-256	`123682721e5362678a004591be33b8dd367dc63d3e1115924c96b72f07225954`

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.17.dev3.tar.gz:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckrun-0.3.17.dev3.tar.gz
- Subject digest: a388d385936e0cbd82b5b198bc88783d277c46d3a95d46bf3f8277974b471064
- Sigstore transparency entry: 1820751170
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: djouallah/duckrun@4766746d58e4750a20ee15f408b0289b94b46867
- Branch / Tag: refs/tags/v0.3.17.dev3
- Owner: https://github.com/djouallah
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4766746d58e4750a20ee15f408b0289b94b46867
- Trigger Event: push

File details

Details for the file duckrun-0.3.17.dev3-py3-none-any.whl.

File metadata

Download URL: duckrun-0.3.17.dev3-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 78.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.17.dev3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`942751601a002e7fdf975516f89a18cfd9f343c4145971cc8d1e4952b48668b6`
MD5	`cdcffc13b169c6bfb9a9a80da0aeecc9`
BLAKE2b-256	`1539287ee7fa90fef842f6df42b5f3e7fe18c9c84a917b516ab71649dda989cb`

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.17.dev3-py3-none-any.whl:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duckrun-0.3.17.dev3-py3-none-any.whl
- Subject digest: 942751601a002e7fdf975516f89a18cfd9f343c4145971cc8d1e4952b48668b6
- Sigstore transparency entry: 1820751172
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: djouallah/duckrun@4766746d58e4750a20ee15f408b0289b94b46867
- Branch / Tag: refs/tags/v0.3.17.dev3
- Owner: https://github.com/djouallah
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4766746d58e4750a20ee15f408b0289b94b46867
- Trigger Event: push

duckrun 0.3.17.dev3

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Project description

How it fits together

Install

Configure your profile

Fabric Lakehouse without a schema

Remote stores (S3 / GCS / ADLS)

Materializations

table

incremental

safeappend

Config options (table / incremental / delta)

Reading existing tables/files as sources

How it works

Table maintenance (compaction & vacuum)

Connection API (notebook)

Building with an AI assistant

Docs & test results

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`table`

`incremental`

`safeappend`

Config options (`table` / `incremental` / `delta`)