The Teradata adapter plugin for dbt (data build tool)
Project description
dbt-teradata
This plugin ports dbt functionality to Teradata Vantage.
Installation
pip install dbt-teradata
If you are new to dbt on Teradata see dbt with Teradata Vantage tutorial.
Sample profile
Here is a working example of a dbt-teradata
profile:
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: localhost
user: dbc
password: dbc
schema: dbt_test
tmode: ANSI
At a minimum, you need to specify host
, user
, password
, schema
(database), tmode
.
Python compatibility
Plugin version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
---|---|---|---|---|---|
0.19.0.x | ✅ | ✅ | ✅ | ❌ | ❌ |
0.20.0.x | ✅ | ✅ | ✅ | ✅ | ❌ |
0.21.1.x | ✅ | ✅ | ✅ | ✅ | ❌ |
1.0.0.x | ❌ | ✅ | ✅ | ✅ | ❌ |
1.1.0.x | ❌ | ✅ | ✅ | ✅ | ✅ |
1.2.0.x | ❌ | ✅ | ✅ | ✅ | ✅ |
1.3.0.x | ❌ | ✅ | ✅ | ✅ | ✅ |
dbt dependent packages version compatibility
dbt-teradta | dbt-core | dbt-teradata-util | dbt-util |
---|---|---|---|
1.2.x | 1.2.x | 0.1.0 | 0.9.x or below |
Optional profile configurations
Logmech
The logon mechanism for Teradata jobs that dbt executes can be configured with the logmech
configuration in your Teradata profile. The logmech
field can be set to: TD2
, LDAP
, KRB5
, TDNEGO
. For more information on authentication options, go to Teradata Vantage authentication documentation.
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
user: <user>
password: <password>
schema: dbt_test
tmode: ANSI
logmech: LDAP
Logdata
The logon mechanism for Teradata jobs that dbt executes can be configured with the logdata
configuration in your Teradata profile. Addtional data like secure token, distinguished Name, or a domain/realm name can be set in your Teradata profile using logdata
. The logdata
field can be set to: JWT
, LDAP
, KRB5
, TDNEGO
. logdata
is not used with the TD2 mechanism.
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
schema: dbt_test
tmode: ANSI
logmech: LDAP
logdata: 'authcid=username password=password'
port: <port>
For more information on authentication options, go to Teradata Vantage authentication documentation
Stored Password Protection
Stored Password Protection enables an application to provide a connection password in encrypted form to the driver. The plugin supports Stored Password Protection feature through prefix ENCRYPTED_PASSWORD(
either in password
connection parameter or in logdata
connection parameter.
password
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
user: <user>
password: ENCRYPTED_PASSWORD(file:PasswordEncryptionKeyFileName,file:EncryptedPasswordFileName)
schema: dbt_test
tmode: ANSI
port: <port>
logdata
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
schema: dbt_test
tmode: ANSI
logmech: LDAP
logdata: 'authcid=username password=ENCRYPTED_PASSWORD(file:PasswordEncryptionKeyFileName,file:EncryptedPasswordFileName)'
port: <port>
For full description of Stored Password Protection see https://github.com/Teradata/python-driver#StoredPasswordProtection.
Port
If your Teradata database runs on port different than the default (1025), you can specify a custom port in your dbt profile using port
configuration.
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
user: <user>
password: <password>
schema: dbt_test
tmode: ANSI
port: <port>
Retries
Allows an adapter to automatically try again when the attempt to open a new connection on the database has a transient, infrequent error. This option can be set using the retries
configuration. Default value is 0. The default wait period between connection attempts is one second. retry_timeout
(seconds) option allows us to adjust this waiting period.
If retries
is set to 3, the adapter will try to establish a new connection three times if an error occurs.
my-teradata-db-profile:
target: dev
outputs:
dev:
type: teradata
host: <host>
user: <user>
password: <password>
schema: dbt_test
tmode: ANSI
retries: 3
retry_timeout: 10
Other Teradata connection parameters
The plugin also supports the following Teradata connection parameters:
- account
- column_name
- cop
- coplast
- encryptdata
- fake_result_sets
- field_quote
- field_sep
- lob_support
- log
- logdata
- max_message_body
- partition
- sip_support
- teradata_values
For full description of the connection parameters see https://github.com/Teradata/python-driver#connection-parameters.
Supported Features
Materializations
view
table
ephemeral
incremental
Incremental Materialization
The following incremental materialization strategies are supported:
append
(default)delete+insert
merge
To learn more about dbt incremental strategies please check the dbt incremental strategy documentation.
Commands
All dbt commands are supported.
Custom configurations
General
-
Enable view column types in docs - Teradata Vantage has a dbscontrol configuration flag called
DisableQVCI
(QVCI - Queryable View Column Index). This flag instructs the database to buildDBC.ColumnsJQV
with view column type definitions.:information_source: Existing customers, please see KB0022230 for more information about enabling QVCI.
To enable this functionality you need to:
- Enable QVCI mode in Vantage. Use
dbscontrol
utility and then restart Teradata. Run these commands as a privileged user on a Teradata node:# option 551 is DisableQVCI. Setting it to false enables QVCI. dbscontrol << EOF M internal 551=false W EOF # restart Teradata tpareset -y Enable QVCI
- Instruct
dbt
to useQVCI
mode. Include the following variable in yourdbt_project.yml
:vars: use_qvci: true
For example configuration, seetest/catalog/with_qvci/dbt_project.yml
.
- Enable QVCI mode in Vantage. Use
Models
Table
The following options apply to table, snapshots and seed materializations.
-
table_kind
- define the table kind. Legal values areMULTISET
(default for ANSI transaction mode required bydbt-teradata
) andSET
, e.g.:- in sql materialization definition file:
{{ config( materialized="table", table_kind="SET" ) }}
- in seed configuration:
seeds: <project-name>: table_kind: "SET"
For details, see CREATE TABLE documentation.
- in sql materialization definition file:
-
table_option
- define table options. Legal values are:{ MAP = map_name [COLOCATE USING colocation_name] | [NO] FALLBACK [PROTECTION] | WITH JOURNAL TABLE = table_specification | [NO] LOG | [ NO | DUAL ] [BEFORE] JOURNAL | [ NO | DUAL | LOCAL | NOT LOCAL ] AFTER JOURNAL | CHECKSUM = { DEFAULT | ON | OFF } | FREESPACE = integer [PERCENT] | mergeblockratio | datablocksize | blockcompression | isolated_loading }
where:
- mergeblockratio:
{ DEFAULT MERGEBLOCKRATIO | MERGEBLOCKRATIO = integer [PERCENT] | NO MERGEBLOCKRATIO }
- datablocksize:
DATABLOCKSIZE = { data_block_size [ BYTES | KBYTES | KILOBYTES ] | { MINIMUM | MAXIMUM | DEFAULT } DATABLOCKSIZE }
- blockcompression:
BLOCKCOMPRESSION = { AUTOTEMP | MANUAL | ALWAYS | NEVER | DEFAULT } [, BLOCKCOMPRESSIONALGORITHM = { ZLIB | ELZS_H | DEFAULT } ] [, BLOCKCOMPRESSIONLEVEL = { value | DEFAULT } ]
- isolated_loading:
WITH [NO] [CONCURRENT] ISOLATED LOADING [ FOR { ALL | INSERT | NONE } ]
Examples:
- in sql materialization definition file:
{{ config( materialized="table", table_option="NO FALLBACK" ) }}
{{ config( materialized="table", table_option="NO FALLBACK, NO JOURNAL" ) }}
{{ config( materialized="table", table_option="NO FALLBACK, NO JOURNAL, CHECKSUM = ON, NO MERGEBLOCKRATIO, WITH CONCURRENT ISOLATED LOADING FOR ALL" ) }}
- in seed configuration:
seeds: <project-name>: table_option:"NO FALLBACK"
seeds: <project-name>: table_option:"NO FALLBACK, NO JOURNAL"
seeds: <project-name>: table_option: "NO FALLBACK, NO JOURNAL, CHECKSUM = ON, NO MERGEBLOCKRATIO, WITH CONCURRENT ISOLATED LOADING FOR ALL"
For details, see CREATE TABLE documentation.
- mergeblockratio:
-
with_statistics
- should statistics be copied from the base table, e.g.:{{ config( materialized="table", with_statistics="true" ) }}
This option is not available for seeds as seeds do not use
CREATE TABLE ... AS
syntax.For details, see CREATE TABLE documentation.
-
index
- defines table indices:[UNIQUE] PRIMARY INDEX [index_name] ( index_column_name [,...] ) | NO PRIMARY INDEX | PRIMARY AMP [INDEX] [index_name] ( index_column_name [,...] ) | PARTITION BY { partitioning_level | ( partitioning_level [,...] ) } | UNIQUE INDEX [ index_name ] [ ( index_column_name [,...] ) ] [loading] | INDEX [index_name] [ALL] ( index_column_name [,...] ) [ordering] [loading] [,...]
where:
- partitioning_level:
{ partitioning_expression | COLUMN [ [NO] AUTO COMPRESS | COLUMN [ [NO] AUTO COMPRESS ] [ ALL BUT ] column_partition ] } [ ADD constant ]
- ordering:
ORDER BY [ VALUES | HASH ] [ ( order_column_name ) ]
- loading:
WITH [NO] LOAD IDENTITY
e.g.:
- in sql materialization definition file:
{{ config( materialized="table", index="UNIQUE PRIMARY INDEX ( GlobalID )" ) }}
:information_source: Note, unlike in
table_option
, there are no commas between index statements!{{ config( materialized="table", index="PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH)" ) }}
{{ config( materialized="table", index="PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH) INDEX index_attrA (attrA) WITH LOAD IDENTITY" ) }}
- in seed configuration:
seeds: <project-name>: index: "UNIQUE PRIMARY INDEX ( GlobalID )"
:information_source: Note, unlike in
table_option
, there are no commas between index statements!seeds: <project-name>: index: "PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH)"
seeds: <project-name>: index: "PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH) INDEX index_attrA (attrA) WITH LOAD IDENTITY"
- partitioning_level:
Seeds
Seeds, in addition to the above materialization modifiers, have the following options:
use_fastload
- use fastload when handlingdbt seed
command. The option will likely speed up loading when your seed files have hundreds of thousands of rows. You can set this seed configuration option in yourproject.yml
file, e.g.:seeds: <project-name>: +use_fastload: true
Grants
Grants are supported in dbt-teradata adapter with release version 1.2.0 and above. You can use grants to manage access to the datasets you're producing with dbt. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your dbt_project.yml
, and define model-specific grants within each model's SQL or YAML file.
for e.g. : models/schema.yml
models:
- name: model_name
config:
grants:
select: ['user_a', 'user_b']
Another e.g. for adding multiple grants:
models:
- name: model_name
config:
materialized: table
grants:
select: ["user_b"]
insert: ["user_c"]
:information_source:
copy_grants
is not supported in Teradata.
More on Grants can be found at https://docs.getdbt.com/reference/resource-configs/grants
Cross DB macros
Starting with release 1.3, some macros were migrated from teradata-dbt-utils dbt package to the connector. See the table below for the macros supported from the connector.
For using cross DB macros, teradata-utils as a macro namespace will not be used, as cross DB macros have been migrated from teradata-utils to Dbt-Teradata.
Compatibility
Macro Group | Macro Name | Status | Comment |
---|---|---|---|
Cross-database macros | current_timestamp | :white_check_mark: | custom macro provided |
Cross-database macros | dateadd | :white_check_mark: | custom macro provided |
Cross-database macros | datediff | :white_check_mark: | custom macro provided, see compatibility note |
Cross-database macros | split_part | :white_check_mark: | custom macro provided |
Cross-database macros | date_trunc | :white_check_mark: | custom macro provided |
Cross-database macros | hash | :white_check_mark: | custom macro provided, see compatibility note |
Cross-database macros | replace | :white_check_mark: | custom macro provided |
Cross-database macros | type_string | :white_check_mark: | custom macro provided |
Cross-database macros | last_day | :white_check_mark: | no customization needed, see compatibility note |
Cross-database macros | width_bucket | :white_check_mark: | no customization |
examples for cross DB macros
Replace: {{ dbt.replace("string_text_column", "old_chars", "new_chars") }} {{ replace('abcgef', 'g', 'd') }}
Date truncate: {{ dbt.date_trunc("date_part", "date") }} {{ dbt.date_trunc("DD", "'2018-01-05 12:00:00'") }}
datediff
datediff
macro in teradata supports difference between dates. Differece between timestamps is not supported.
hash
Hash
macro needs an md5
function implementation. Teradata doesn't support md5
natively. You need to install a User Defined Function (UDF):
- Download the md5 UDF implementation from Teradata (registration required): https://downloads.teradata.com/download/extensibility/md5-message-digest-udf.
- Unzip the package and go to
src
directory. - Start up
bteq
and connect to your database. - Create database
GLOBAL_FUNCTIONS
that will host the UDF. You can't change the database name as it's hardcoded in the macro:CREATE DATABASE GLOBAL_FUNCTIONS AS PERMANENT = 60e6, SPOOL = 120e6;
- Create the UDF. Replace
<CURRENT_USER>
with your current database user:GRANT CREATE FUNCTION ON GLOBAL_FUNCTIONS TO <CURRENT_USER>; DATABASE GLOBAL_FUNCTIONS; .run file = hash_md5.btq
- Grant permissions to run the UDF with grant option.
GRANT EXECUTE FUNCTION ON GLOBAL_FUNCTIONS TO PUBLIC WITH GRANT OPTION;
last_day
last_day
in teradata_utils
, unlike the corresponding macro in dbt_utils
, doesn't support quarter
datepart.
Common Teradata-specific tasks
- collect statistics - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. It can be done using
COLLECT STATISTICS
command. You can perform this step using dbt'spost-hooks
, e.g.:{{ config( post_hook=[ "COLLECT STATISTICS ON {{ this }} COLUMN (column_1, column_2 ...);" ] )}}
See Collecting Statistics documentation for more information.
Support for dbt-utils
package
dbt-utils
package is supported through teradata/teradata_utils
dbt package. The package provides a compatibility layer between dbt_utils
and dbt-teradata
. See teradata_utils package for install instructions.
Limitations
Transaction mode
Only ANSI transaction mode is supported.
Credits
The adapter was originally created by Doug Beatty. Teradata took over the adapter in January 2022. We are grateful to Doug for founding the project and accelerating the integration of dbt + Teradata.
License
The adapter is published using Apache-2.0 License. Please see the license for terms and conditions, such as creating derivative work and the support model.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dbt-teradata-1.3.3.1.tar.gz
.
File metadata
- Download URL: dbt-teradata-1.3.3.1.tar.gz
- Upload date:
- Size: 36.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9017e692b5bb92df1c6f200701b1f23d604e486449d0fed6dc8fc76ed7636468 |
|
MD5 | 4c7c06cf0e333897a1f960b266be2da3 |
|
BLAKE2b-256 | e5596fd8d8400a384d9c57f1548e5a219f4831262333e0992f484cbf58e67809 |
File details
Details for the file dbt_teradata-1.3.3.1-py3-none-any.whl
.
File metadata
- Download URL: dbt_teradata-1.3.3.1-py3-none-any.whl
- Upload date:
- Size: 35.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | addc0d083c49a55c70d63adb0c6260492dd5556c8d989d4cec940cedd81f40bc |
|
MD5 | ba912146f27ed70f4fe97abfe4a21465 |
|
BLAKE2b-256 | 7913ad7f22c910f81b89eb61c3c044d5326101672c254f7636187a0b39608109 |