A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).

duckrun

Disclaimer: This is a personal project, built and maintained in my own time. It is not affiliated with, endorsed by, or supported by any employer or vendor. No warranty — use it at your own risk.

duckrun runs SQL in DuckDB and reads/writes Delta Lake via delta-rs — locally or on OneLake / S3 / GCS / ADLS. It's just glue: DuckDB executes · delta-rs materializes · Arrow bridges · dbt orchestrates. Two ways to use it:

  • connect() — a notebook helper to query and write Delta straight from SQL (this page);
  • a dbt adapter that materializes models as Delta tables.

Concurrent writers are first-class: every write is snapshot-pinned and fails loudly rather than silently interleaving with another writer.
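To make "snapshot-pinned" concrete, here is a minimal in-memory sketch of the general idea. It is a toy model, not duckrun's or delta-rs's implementation: `ToyTable`, `CommitConflict`, `begin_write`, and `commit` are hypothetical names invented for illustration, and real Delta conflict detection is more nuanced than a single version comparison.

```python
class CommitConflict(Exception):
    """Raised when the table moved past the version a writer pinned."""


class ToyTable:
    """In-memory stand-in for a versioned table (not a real Delta log)."""

    def __init__(self):
        self.version = 0
        self.rows = []

    def begin_write(self):
        # Pin the snapshot: remember which version this writer read.
        return self.version

    def commit(self, pinned_version, new_rows):
        # Refuse to commit if another writer advanced the table meanwhile.
        if pinned_version != self.version:
            raise CommitConflict(
                f"pinned version {pinned_version}, table is now at {self.version}"
            )
        self.rows = list(new_rows)
        self.version += 1
        return self.version


table = ToyTable()
writer_a = table.begin_write()  # both writers pin version 0
writer_b = table.begin_write()

table.commit(writer_a, [{"id": 1}])  # first commit wins, table moves to version 1

try:
    table.commit(writer_b, [{"id": 2}])  # stale snapshot: rejected, nothing is written
    conflict = None
except CommitConflict as exc:
    conflict = str(exc)
```

The second writer gets an exception instead of a partially applied or interleaved result, which is the behavior the sentence above describes.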

Install

pip install duckrun

In a Microsoft Fabric notebook, upgrade duckrun and restart the kernel. duckrun needs duckdb ≥ 1.5.4, which is newer than the bundled stable build; with an older duckdb, connect() fails loudly:

!pip install duckrun --upgrade
notebookutils.session.restartPython()
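If you want to check the installed duckdb version yourself before calling connect(), a small comparison helper is enough. This is a sketch only: `parse_version` and `meets_minimum` are hypothetical helpers, not part of duckrun, and the parsing is deliberately naive (it keeps the leading integers and ignores suffixes such as `.dev12`).

```python
def parse_version(text):
    """Turn '1.5.4' or '1.5.4.dev12' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev12'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="1.5.4"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# In a notebook you would pass the real version string:
#   import duckdb
#   meets_minimum(duckdb.__version__)
old_build_ok = meets_minimum("1.4.0")
new_build_ok = meets_minimum("1.10.0")
```

Comparing tuples of integers avoids the string-comparison trap where "1.10.0" sorts before "1.5.4".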

Quickstart — OneLake in a notebook

import duckrun

# Read-only by default — explore a lakehouse safely, no chance of an accidental write.
# Use the workspace + lakehouse GUIDs (friendly names hit an upstream OneLake read bug for now).
conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")

conn.sql("SHOW TABLES").show()
conn.sql("select status, count(*) from orders group by status").show()
df = conn.table("orders").toPandas()          # or .toArrow() for a streaming RecordBatchReader

# Time travel: list the versions, then read one
from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").history()   # newest-first commits: version, timestamp, operation, …
conn.read.format("delta").option("versionAsOf", 0).load(".../Tables/dbo/orders").show()
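Because friendly names currently hit the upstream read bug mentioned above, it can help to build the connection string from GUIDs and reject anything else early. The helper below is a hedged sketch: `onelake_tables_path` is a hypothetical function, not part of duckrun, and it assumes the path layout shown in the quickstart. It normalizes GUIDs to lowercase hyphenated form, which is an illustrative choice.

```python
import uuid

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_tables_path(workspace_id, lakehouse_id, schema="dbo"):
    """Build an abfss:// Tables path from workspace and lakehouse GUIDs."""
    # uuid.UUID raises ValueError for anything that is not a GUID,
    # so a friendly name such as "Sales Workspace" is rejected here.
    workspace = str(uuid.UUID(workspace_id))
    lakehouse = str(uuid.UUID(lakehouse_id))
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}/Tables/{schema}"


# Placeholder GUIDs for illustration only.
path = onelake_tables_path(
    "11111111-2222-3333-4444-555555555555",
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
)

try:
    onelake_tables_path("Sales Workspace", "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
    rejected_friendly_name = False
except ValueError:
    rejected_friendly_name = True
```

The resulting string can be passed to duckrun.connect() in place of the hand-written path.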

Need to write? Opt in with read_only=False:

conn = duckrun.connect("abfss://…/Tables/dbo", read_only=False)

# write Delta straight from SQL
conn.sql("select * from orders where amount > 0") \
    .write.mode("overwrite").saveAsTable("clean_orders")

# raw DML routes to delta-rs (insert / update / delete / create table as / alter / drop)
conn.sql("delete from clean_orders where amount = 0")

# upsert — snapshot-pinned automatically, nothing extra to pass
from duckrun import DeltaTable
src = conn.sql("select * from updates")
DeltaTable.forName(conn, "clean_orders").merge(src, "target.id = source.id") \
    .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

conn.stop()
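Since a conflicting write raises instead of interleaving, callers that expect contention may want to retry. The wrapper below is a generic sketch, not a duckrun feature. `retry_on_conflict` is a hypothetical helper, and because this page does not name the exception duckrun raises on a conflict, the caller supplies an `is_conflict` predicate. The demo uses a made-up `FakeConflict` in place of a real write.

```python
import time


def retry_on_conflict(operation, is_conflict, attempts=3, base_delay=0.0):
    """Call operation(); retry with exponential backoff while is_conflict(exc) is true."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise anything that is not a conflict, or the final failure.
            if not is_conflict(exc) or attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


class FakeConflict(Exception):
    """Stand-in for whatever a real conflicting write would raise."""


calls = {"n": 0}


def flaky_merge():
    # Simulates a write that loses the race twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeConflict("stale snapshot")
    return "committed"


result = retry_on_conflict(flaky_merge, lambda exc: isinstance(exc, FakeConflict))
```

In real use, the operation passed in should presumably rebuild the source query and the merge on each attempt so that the retry works from a fresh snapshot rather than the stale one.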

Multiple catalogs: attach more lakehouses and read or join across them by three-part name. In Fabric, a Warehouse is just a write-locked Lakehouse, so attach it with read_only=True next to a writable one:

conn.attach("abfss://…/warehouse.Warehouse/Tables", name="warehouse", read_only=True)
conn.attach("/data/reference", name="local")
conn.sql("select * from warehouse.mart.facts f join local.dbo.lookup l on l.id = f.id").show()

Works the same against a local path, s3://, gs://, or az://. Full method map: Connection API · Spark/Delta coverage · live multi-catalog demo.

dbt adapter

duckrun is also a dbt adapter — a thin wrapper around dbt-duckdb that adds Delta-backed table / incremental materializations (everything else dbt-duckdb gives you is inherited). Point a profile at a lakehouse and dbt run:

# ~/.dbt/profiles.yml
my_project:
  outputs:
    dev:
      type: duckrun
      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"

Profiles, materializations, incremental strategies (incl. safeappend), sources, and automatic compaction/vacuum are all in docs/dbt-adapter.md.

Building with an AI assistant

duckrun ships a guide so AI coding assistants get the adapter's defaults right (several differ from other dbt adapters). For Claude Code:

/plugin marketplace add djouallah/duckrun
/plugin install duckrun-projects@duckrun

Other assistants read the AGENTS.md at the repo root, which points to the full guide. None of this is required to use duckrun.

Docs

| Doc | What's in it |
| --- | --- |
| Connection API | The duckrun.connect() notebook API + the live per-method scorecard. |
| Spark / Delta coverage | What the connect() surface maps to in PySpark / Delta. |
| dbt adapter | Profiles, materializations, incremental strategies, sources, maintenance, limitations. |
| Design document | Why delta-rs (not DuckDB's native Delta writer), why Delta (not Iceberg), why a separate adapter. |
| dbt adapter conformance | Official dbt-tests-adapter results, regenerated on every push to main. |
| Incremental MERGE benchmark | ~60M-row TPCH merge / append / overwrite scorecard; the release gate. |

License

MIT
