
The duckdb adapter plugin for dbt (data build tool)


dbt-duckdb

DuckDB is an embedded database, similar to SQLite, but designed for OLAP-style analytics. It is crazy fast and allows you to read and write data stored in CSV and Parquet files directly, without requiring you to load them into the database first.
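
For example, DuckDB can query a Parquet file in place simply by referencing its path in SQL (the file path here is purely illustrative):

SELECT count(*) FROM 'data/events.parquet';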

dbt is the best way to manage a collection of data transformations written in SQL or Python for analytics and data science. dbt-duckdb is the project that ties DuckDB and dbt together, allowing you to create a Modern Data Stack In A Box or a simple and powerful data lakehouse, with no Java or Scala required.

Installation

This project is hosted on PyPI, so you should be able to install it and the necessary dependencies via:

pip3 install dbt-duckdb

The latest supported version targets dbt-core 1.3.x and DuckDB 0.5.x, but we work hard to ensure that newer versions of DuckDB will continue to work with the adapter as they are released. If you would like to use our new (and experimental!) support for persisting the tables that DuckDB creates to the AWS Glue Catalog, you should install dbt-duckdb[glue] in order to get the AWS dependencies as well.
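
For example, to pull in the optional AWS Glue dependencies along with the adapter:

pip3 install "dbt-duckdb[glue]"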

Configuring Your Profile

A minimal dbt-duckdb profile only needs two settings, type and path:

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
  target: dev

The path field should normally be the path to a local DuckDB file on your filesystem, but it can also be set to :memory: if you would like to run dbt-duckdb entirely in memory. Keep in mind that, in that case, any models you want to keep from the dbt run will need to be persisted using one of the external materialization strategies described below.

dbt-duckdb also supports standard profile settings, including threads (to control how many models dbt will run concurrently) and schema (to control the default schema that models will be materialized in).
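
For example, a profile that runs four models at a time and materializes into a non-default schema might look like this (the schema name is just an illustration):

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      threads: 4
      schema: analytics
  target: dev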

DuckDB Extensions and Settings

As of version 1.2.3, you can load any supported DuckDB extensions by listing them in the extensions field in your profile. You can also set any additional DuckDB configuration options via the settings field, including options that are supported by any loaded extensions. For example, to connect to S3 and read/write Parquet files using an AWS access key and secret, your profile would look something like this:

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev

External Materializations and Sources

One of DuckDB's most powerful features is its ability to read and write CSV and Parquet files directly, without needing to load them into or export them out of the database first. In dbt-duckdb, we support creating models that are backed by external files via the external materialization strategy:

{{ config(materialized='external', location='local/directory/file.parquet') }}
SELECT m.*, s.id IS NOT NULL as has_source_id
FROM {{ ref('upstream_model') }} m
LEFT JOIN {{ source('upstream', 'source') }} s USING (id)

The external materialization supports the following options (shown with their defaults):

  • location (default: {{ name }}.{{ format }}): The path to write the external materialization to. See below for more details.
  • format (default: parquet): The format of the external file, either parquet or csv.
  • delimiter (default: ,): For CSV files, the delimiter to use for fields.
  • glue_register (default: false): If true, try to register the file created by this model with the AWS Glue Catalog.
  • glue_database (default: default): The name of the AWS Glue database to register the model with.
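
For instance, a model that writes a pipe-delimited CSV file and attempts to register it with AWS Glue might be configured like this (the location and Glue database name are illustrative):

{{ config(
    materialized='external',
    location='local/directory/file.csv',
    format='csv',
    delimiter='|',
    glue_register=true,
    glue_database='my_glue_db'
) }}
SELECT * FROM {{ ref('upstream_model') }}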

If no location argument is specified, then the external file will be named after the model.sql (or model.py) file that defined it with an extension that matches the file format (either .parquet or .csv). By default, external materializations are created relative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the external_root setting in your DuckDB profile:

default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
      external_root: "s3://my-bucket/my-prefix-path/"
  target: dev

dbt-duckdb also includes support for referencing external CSV and Parquet files as dbt sources via the external_location meta option:

sources:
  - name: external_source
    meta:
      external_location: "s3://my-bucket/my-sources/{name}.parquet"
    tables:
      - name: source1
      - name: source2

Here, the meta option on external_source defines external_location as an f-string pattern that can express the location of any of the tables defined for that source. So a dbt model like:

SELECT *
FROM {{ source('external_source', 'source1') }}

will be compiled as:

SELECT *
FROM 's3://my-bucket/my-sources/source1.parquet'

If one of the source tables deviates from the pattern or needs some other special handling, then the external_location can also be set on the meta options for the table itself, for example:

sources:
  - name: external_source
    meta:
      external_location: "s3://my-bucket/my-sources/{name}.parquet"
    tables:
      - name: source1
      - name: source2
        meta:
          external_location: "read_parquet(['s3://my-bucket/my-sources/source2a.parquet', 's3://my-bucket/my-sources/source2b.parquet'])"

Python Support

dbt added support for Python models in version 1.3.0. For most data platforms, dbt packages up the Python code defined in a .py file and ships it off to be executed in whatever Python environment that data platform supports. In dbt-duckdb, however, the local machine is the data platform, so we support executing any Python code that will run on your machine via an exec call. The value of the dbt.ref and dbt.source functions will be a DuckDB Relation object that can be easily converted into a Pandas DataFrame or Arrow Table, and the return value of the def model function can be a DuckDB Relation, a Pandas DataFrame, or an Arrow Table.
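
As an illustration, a minimal Python model for dbt-duckdb might look like the following sketch; the upstream model name 'upstream_model' and the id column are hypothetical:

# models/my_python_model.py

def model(dbt, session):
    # dbt.ref() returns a DuckDB Relation when running against dbt-duckdb
    upstream = dbt.ref("upstream_model")

    # Convert the Relation to a Pandas DataFrame for local, in-process work
    df = upstream.df()

    # Any Python that runs on your machine is fair game here
    df["id_is_even"] = df["id"] % 2 == 0

    # Return a Pandas DataFrame, DuckDB Relation, or Arrow Table
    return df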

Roadmap

Things that we would like to add in the near future:

  • Support for Delta and Iceberg external table formats (both as sources and destinations)
  • Make dbt's incremental models and snapshots work with external materializations
  • Make AWS Glue registration a first-class concept and add support for Snowflake/BigQuery registrations

