Skip to main content

A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).

Project description

duckrun

PyPI Downloads Downloads/month Python License

Disclaimer: This is a personal project, built and maintained in my own time. It is not affiliated with, endorsed by, or supported by any employer or vendor. No warranty — use it at your own risk.

duckrun runs SQL in DuckDB and reads/writes Delta Lake via delta-rs — locally or on OneLake / S3 / GCS / ADLS. It's just glue: DuckDB executes · delta-rs materializes · Arrow bridges · dbt orchestrates. Two ways to use it:

  • connect() — a notebook helper to query and write Delta straight from SQL (this page);
  • a dbt adapter that materializes models as Delta tables.

Concurrent writers are first-class: every write is snapshot-pinned and fails loud rather than silently interleaving.

Install

pip install duckrun

In a Microsoft Fabric notebook, upgrade and restart the kernel (duckrun needs duckdb ≥ 1.5.4, which is newer than the bundled stable build; it fails loud at connect() otherwise):

!pip install duckrun --upgrade
notebookutils.session.restartPython()

Quickstart — OneLake in a notebook

import duckrun

# Read-only by default — explore a lakehouse safely, no chance of an accidental write.
# Use the workspace + lakehouse GUIDs (friendly names hit an upstream OneLake read bug for now).
conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")

conn.sql("SHOW TABLES").show()
conn.sql("select status, count(*) from orders group by status").show()
df = conn.table("orders").toPandas()          # or .toArrow() for a streaming RecordBatchReader

# Time travel: list the versions, then read one
from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").history()   # newest-first commits: version, timestamp, operation, …
conn.read.format("delta").option("versionAsOf", 0).load(".../Tables/dbo/orders").show()

Need to write? Opt in with read_only=False:

conn = duckrun.connect("abfss://…/Tables/dbo", read_only=False)

# write Delta straight from SQL
conn.sql("select * from orders where amount > 0") \
    .write.mode("overwrite").saveAsTable("clean_orders")

# raw DML routes to delta-rs (insert / update / delete / create table as / alter / drop)
conn.sql("delete from clean_orders where amount = 0")

# upsert — snapshot-pinned automatically, nothing extra to pass
from duckrun import DeltaTable
src = conn.sql("select * from updates")
DeltaTable.forName(conn, "clean_orders").merge(src, "target.id = source.id") \
    .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

conn.stop()

Multiple catalogs — attach more lakehouses and read/join across them by three-part name. In Fabric a Warehouse is just a write-locked Lakehouse, so attach it read_only=True next to a writable one:

conn.attach("abfss://…/warehouse.Warehouse/Tables", name="warehouse", read_only=True)
conn.attach("/data/reference", name="local")
conn.sql("select * from warehouse.mart.facts f join local.dbo.lookup l on l.id = f.id").show()

Works the same against a local path, s3://, gs://, or az://. Full method map: Connection API · Spark/Delta coverage · live multi-catalog demo.

dbt adapter

duckrun is also a dbt adapter — a thin wrapper around dbt-duckdb that adds Delta-backed table / incremental materializations (everything else dbt-duckdb gives you is inherited). Point a profile at a lakehouse and dbt run:

# ~/.dbt/profiles.yml
my_project:
  outputs:
    dev:
      type: duckrun
      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"

Profiles, materializations, incremental strategies (incl. append_if_unchanged), sources, and automatic compaction/vacuum are all in docs/dbt-adapter.md.

Building with an AI assistant

duckrun ships a guide so AI coding assistants get the adapter's defaults right (several differ from other dbt adapters). For Claude Code:

/plugin marketplace add djouallah/duckrun
/plugin install duckrun-projects@duckrun

Other assistants read the AGENTS.md at the repo root, which points to the full guide. None of this is required to use duckrun.

Docs

Doc What's in it
Connection API The duckrun.connect() notebook API + the live per-method scorecard.
Spark / Delta coverage What the connect() surface maps to in PySpark / Delta.
dbt adapter Profiles, materializations, incremental strategies, sources, maintenance, limitations.
Design document Why delta-rs (not DuckDB's native Delta writer), why Delta (not Iceberg), why a separate adapter.
Snapshot isolation How a read-modify-write is fenced to the version you read, and how it compares to delta-rs/Spark/SQL Server.
dbt adapter conformance Official dbt-tests-adapter results, regenerated on every push to main.
Incremental MERGE benchmark ~120M-row TPCH merge / append / overwrite scorecard — the release gate.

License

MIT

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckrun-0.3.22.tar.gz (98.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

duckrun-0.3.22-py3-none-any.whl (104.1 kB view details)

Uploaded Python 3

File details

Details for the file duckrun-0.3.22.tar.gz.

File metadata

  • Download URL: duckrun-0.3.22.tar.gz
  • Upload date:
  • Size: 98.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.22.tar.gz
Algorithm Hash digest
SHA256 0c78024e052e8dcfcc99a9065280ff20a80cbe65f76dd4a63ad5ad7cfdc698c5
MD5 8bb87ebd557c6dcc211f3bb0eb7f960d
BLAKE2b-256 1fd2b17e130c417f624b2ab2c02fb24bd0d9518606f809b3e2c522f5cd12de9c

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.22.tar.gz:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file duckrun-0.3.22-py3-none-any.whl.

File metadata

  • Download URL: duckrun-0.3.22-py3-none-any.whl
  • Upload date:
  • Size: 104.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.22-py3-none-any.whl
Algorithm Hash digest
SHA256 8198c25c56280f0e7be14bf485e776740611b93a17947a3b27cf167ec72cb7c1
MD5 f417350bdeeecfd48f359dc7c1971e6e
BLAKE2b-256 621f0bf5447cce5021329a0e0169ca13ab1b7e018502a437d8be5b75710b8fa3

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.22-py3-none-any.whl:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page