The Presto adapter plugin for dbt (data build tool)

dbt-presto

Documentation

For more information on using Presto with dbt, consult the dbt documentation:

Installation

This plugin can be installed via pip:

$ pip install dbt-presto

Configuring your profile

A dbt profile can be configured to run against Presto using the following configuration:

| Option | Description | Required? | Example |
| --- | --- | --- | --- |
| method | The Presto authentication method to use | Optional (default is none) | none or kerberos |
| user | Username for authentication | Required | drew |
| password | Password for authentication | Optional (required if method is ldap or kerberos) | none or abc123 |
| http_headers | HTTP headers to send alongside requests to Presto, specified as a YAML dictionary of (header, value) pairs | Optional | X-Presto-Routing-Group: my-cluster |
| http_scheme | The HTTP scheme to use for requests to Presto | Optional (default is http, or https for method: kerberos and method: ldap) | https or http |
| database | Specify the database to build models into | Required | analytics |
| schema | Specify the schema to build models into. Note: it is not recommended to use upper or mixed case schema names | Required | dbt_drew |
| host | The hostname to connect to | Required | 127.0.0.1 |
| port | The port to connect to the host on | Required | 8080 |
| threads | How many threads dbt should use | Optional (default is 1) | 8 |

Example profiles.yml entry:

my-presto-db:
  target: dev
  outputs:
    dev:
      type: presto
      user: drew
      host: 127.0.0.1
      port: 8080
      database: analytics
      schema: dbt_drew
      threads: 8
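
The rules in the options table can be checked before dbt ever connects. The following is a minimal sketch, not part of dbt-presto: `check_profile` and `effective_http_scheme` are hypothetical helper names, and the profile output is assumed to be already loaded into a plain Python dict (for example by a YAML parser). The option names, required flags, and defaults are taken from the table above.

```python
# Option names and requirements are copied from the options table;
# the helper functions are hypothetical and not provided by dbt-presto.
REQUIRED = ("user", "database", "schema", "host", "port")
PASSWORD_METHODS = ("ldap", "kerberos")


def check_profile(output):
    """Return a list of problems found in one profiles.yml output."""
    problems = [f"missing required option: {key}"
                for key in REQUIRED if key not in output]
    method = output.get("method", "none")
    if method in PASSWORD_METHODS and "password" not in output:
        problems.append(f"password is required when method is {method}")
    schema = str(output.get("schema", ""))
    if schema != schema.lower():
        problems.append("schema should be lower case")
    return problems


def effective_http_scheme(output):
    """Apply the documented default: https for ldap/kerberos, else http."""
    if "http_scheme" in output:
        return output["http_scheme"]
    method = output.get("method", "none")
    return "https" if method in PASSWORD_METHODS else "http"


# The "dev" output from the example profiles.yml entry, as a dict.
dev = {
    "type": "presto",
    "user": "drew",
    "host": "127.0.0.1",
    "port": 8080,
    "database": "analytics",
    "schema": "dbt_drew",
    "threads": 8,
}

print(check_profile(dev))
print(effective_http_scheme(dev))
```

Passing the same dict with `method` set to `ldap` and no `password` would report the missing password, which mirrors the "required if method is ldap or kerberos" note in the table.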

Usage Notes

Supported Functionality

Due to the nature of Presto, not all core dbt functionality is supported. The following features of dbt are not implemented on Presto:

  • Archival
  • Incremental models

Also, note that upper or mixed case schema names will cause catalog queries to fail. Please only use lower case schema names with this adapter.

If you are interested in helping to add support for this functionality in dbt on Presto, please open an issue!

Required configuration

dbt fundamentally works by dropping and creating tables and views in databases. As such, the following Presto configs must be set for dbt to work properly on Presto:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true

Use table properties to configure connector specifics

Trino/Presto connectors use table properties to configure connector specifics.

Check the Presto/Trino connector documentation for more information.

{{
  config(
    materialized='table',
    properties={
      "format": "'PARQUET'",
      "partitioning": "ARRAY['bucket(id, 2)']",
    }
  )
}}
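
Note that the string value for `format` carries its own inner single quotes. A plausible reading, and it is only an assumption about how the adapter renders the dict, is that each value is pasted verbatim into a SQL `WITH (...)` clause, so it must already be a valid SQL literal or expression. The sketch below illustrates that idea in Python; `render_with_clause` is a hypothetical function, not the adapter's macro.

```python
def render_with_clause(properties):
    """Join key/value pairs into a WITH (...) clause, inserting values verbatim."""
    if not properties:
        return ""
    pairs = ",\n".join(f"  {key} = {value}" for key, value in properties.items())
    return f"WITH (\n{pairs}\n)"


# Same properties as in the config() call above.
properties = {
    "format": "'PARQUET'",
    "partitioning": "ARRAY['bucket(id, 2)']",
}

print(render_with_clause(properties))
```

Under this assumption, writing `"format": "PARQUET"` without the inner quotes would produce `format = PARQUET`, which SQL would read as an identifier rather than a string literal.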

Reporting bugs and contributing code

  • Want to report a bug or request a feature? Let us know on Slack, or open an issue.

Running tests

Build dbt container locally:

./docker/dbt/build.sh

Run a Presto server locally:

./docker/init.bash

If you see errors about "inconsistent state" while bringing up Presto, you may need to drop and re-create the public schema in the Hive metastore:

# Example error

Initialization script hive-schema-2.3.0.postgres.sql
Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

Solution: Drop (or rename) the public schema to allow the init script to recreate the metastore from scratch. Only run this against a test Presto deployment. Do not run this in production!

-- run this against the hive metastore (port forwarded to 10005 by default)
-- DO NOT RUN THIS IN PRODUCTION!

drop schema public cascade;
create schema public;

You probably should be slightly less reckless than this.

Run tests against Presto:

./docker/run_tests.bash

Run the locally-built docker image (from docker/dbt/build.sh):

export DBT_PROJECT_DIR=$HOME/... # wherever the dbt project you want to run is
docker run -it --mount "type=bind,source=$HOME/.dbt/,target=/home/dbt_user/.dbt" --mount="type=bind,source=$DBT_PROJECT_DIR,target=/usr/app" --network dbt-net dbt-presto /bin/bash

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.
