Skip to main content

A dbt adapter that runs SQL in DuckDB and materializes to Delta Lake (delta_rs).

Project description

duckrun

PyPI Downloads Downloads/month Python License

Disclaimer: This is a personal project, built and maintained in my own time. It is not affiliated with, endorsed by, or supported by any employer or vendor. No warranty — use it at your own risk.

duckrun runs SQL in DuckDB and reads/writes Delta Lake via delta-rs — locally or on OneLake / S3 / GCS / ADLS. It's just glue: DuckDB executes · delta-rs materializes · Arrow bridges · dbt orchestrates. Two ways to use it:

  • connect() — a notebook helper to query and write Delta straight from SQL (this page);
  • a dbt adapter that materializes models as Delta tables.

Concurrent writers are first-class: every write is snapshot-pinned and fails loud rather than silently interleaving.

Install

pip install duckrun

In a Microsoft Fabric notebook, upgrade and restart the kernel (duckrun needs duckdb ≥ 1.5.4, which is newer than the bundled stable build; it fails loud at connect() otherwise):

!pip install duckrun --upgrade
notebookutils.session.restartPython()

Quickstart — OneLake in a notebook

import duckrun

# Read-only by default — explore a lakehouse safely, no chance of an accidental write.
# Use the workspace + lakehouse GUIDs (friendly names hit an upstream OneLake read bug for now).
conn = duckrun.connect("abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/dbo")

conn.sql("SHOW TABLES").show()
conn.sql("select status, count(*) from orders group by status").show()
df = conn.table("orders").toPandas()          # or .toArrow() for a streaming RecordBatchReader

# Time travel: list the versions, then read one
from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").history()   # newest-first commits: version, timestamp, operation, …
conn.read.format("delta").option("versionAsOf", 0).load(".../Tables/dbo/orders").show()

Need to write? Opt in with read_only=False:

conn = duckrun.connect("abfss://…/Tables/dbo", read_only=False)

# write Delta straight from SQL
conn.sql("select * from orders where amount > 0") \
    .write.mode("overwrite").saveAsTable("clean_orders")

# raw DML routes to delta-rs (insert / update / delete / create table as / alter / drop)
conn.sql("delete from clean_orders where amount = 0")

# upsert — snapshot-pinned automatically, nothing extra to pass
from duckrun import DeltaTable
src = conn.sql("select * from updates")
DeltaTable.forName(conn, "clean_orders").merge(src, "target.id = source.id") \
    .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

conn.stop()

Multiple catalogs — attach more lakehouses and read/join across them by three-part name. In Fabric a Warehouse is just a write-locked Lakehouse, so attach it read_only=True next to a writable one:

conn.attach("abfss://…/warehouse.Warehouse/Tables", name="warehouse", read_only=True)
conn.attach("/data/reference", name="local")
conn.sql("select * from warehouse.mart.facts f join local.dbo.lookup l on l.id = f.id").show()

Works the same against a local path, s3://, gs://, or az://. Full method map: Connection API · Spark/Delta coverage · live multi-catalog demo.

dbt adapter

duckrun is also a dbt adapter — a thin wrapper around dbt-duckdb that adds Delta-backed table / incremental materializations (everything else dbt-duckdb gives you is inherited). Point a profile at a lakehouse and dbt run:

# ~/.dbt/profiles.yml
my_project:
  outputs:
    dev:
      type: duckrun
      root_path: "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables"

Profiles, materializations, incremental strategies (incl. append_if_unchanged), sources, and automatic compaction/vacuum are all in docs/dbt-adapter.md.

Building with an AI assistant

duckrun ships a guide so AI coding assistants get the adapter's defaults right (several differ from other dbt adapters). For Claude Code:

/plugin marketplace add djouallah/duckrun
/plugin install duckrun-projects@duckrun

Other assistants read the AGENTS.md at the repo root, which points to the full guide. None of this is required to use duckrun.

Docs

Doc What's in it
Connection API The duckrun.connect() notebook API + the live per-method scorecard.
Spark / Delta coverage What the connect() surface maps to in PySpark / Delta.
dbt adapter Profiles, materializations, incremental strategies, sources, maintenance, limitations.
Design document Why delta-rs (not DuckDB's native Delta writer), why Delta (not Iceberg), why a separate adapter.
Snapshot isolation How a read-modify-write is fenced to the version you read, and how it compares to delta-rs/Spark/SQL Server.
dbt adapter conformance Official dbt-tests-adapter results, regenerated on every push to main.
Incremental MERGE benchmark ~60M-row TPCH merge / append / overwrite scorecard — the release gate.

License

MIT

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckrun-0.3.23.tar.gz (98.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

duckrun-0.3.23-py3-none-any.whl (104.0 kB view details)

Uploaded Python 3

File details

Details for the file duckrun-0.3.23.tar.gz.

File metadata

  • Download URL: duckrun-0.3.23.tar.gz
  • Upload date:
  • Size: 98.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.23.tar.gz
Algorithm Hash digest
SHA256 fda34a43ed6193fb7d45245d874246d7e4f13dc1fd5d45d95ce6d47c3afb9a9f
MD5 ed0de0de9db969a2064c2cc901f9c113
BLAKE2b-256 c51046137af9fc9968100edd1de177a96f9983e152b623e7af93d071bf066bd4

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.23.tar.gz:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file duckrun-0.3.23-py3-none-any.whl.

File metadata

  • Download URL: duckrun-0.3.23-py3-none-any.whl
  • Upload date:
  • Size: 104.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for duckrun-0.3.23-py3-none-any.whl
Algorithm Hash digest
SHA256 832aa6092093cc2ede0140a7149ba0ff118d2468bb29354d116a0df63ba73ed1
MD5 9b502fb4fbe32982f200785cda2bee35
BLAKE2b-256 7bf21c7bbbeed7f2edfeb20304539e019064f0c18abc7c61c19faeb74684c779

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckrun-0.3.23-py3-none-any.whl:

Publisher: publish.yml on djouallah/duckrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page