Skip to main content

The Decodable adapter plugin for DBT

Project description

dbt-decodable

dbt adapter for Decodable.

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Decodable is a fully managed stream processing service, based on [Apache Flink®] and using SQL as the primary means of defining data streaming pipelines.

Installation

dbt-decodable is available on PyPI. To install the latest version via pip (optionally using a virtual environment), run:

python3 -m venv dbt-venv         # create the virtual environment
source dbt-venv/bin/activate     # activate the virtual environment
pip install dbt-decodable        # install the adapter

Getting Started

Once you've installed dbt in a virtual environment, we recommend trying out the example project provided by decodable:

# clone the example project
git clone https://github.com/decodableco/dbt-decodable.git
cd dbt-decodable/example_project/example/

# Ensure you can connect to decodable via the decodable CLI:
# If you don't have installed the decodable CLI,
# install it following these instructions: https://docs.decodable.co/docs/setup#install-the-cli-command-line-interface
decodable connection list

# Ensure you have a  ~/.dbt/profiles.yml file:
cat ~/.dbt/profiles.yml
dbt-decodable: # this name must match the 'profile' from dbt_project.yml
  outputs:
    dev:
      account_name: <fill in your decodable account name>
      profile_name: default # fill in any profile defined in ~/.decodable/config
      type: decodable
      database: db
      schema: demo
  target: dev

# This will launch the example project
dbt run

Configuring your profile

Profiles in dbt describe a set of configurations specific to a connection with the underlying data warehouse. Each dbt project should have a corresponding profile (though profiles can be reused for different project). Within a profile, multiple targets can be described to further control dbt's behavior. For example, it's very common to have a dev target for development and a prod target for production related configurations.

Most of the profile configuration options available can be found inside the dbt documentation. Additionally, dbt-decodable defines a few adapter-specific ones that can be found below.

dbt-decodable:        # the name of the profile
  target: dev         # the default target to run commands with
  outputs:            # the list of all defined targets under this profile
    dev:              # the name of the target
      type: decodable
      database: None  # Ignored by this adapter, but required properties
      schema: None    # Ignored by this adapter, but required properties

      # decodable specific settings
      account_name: [your account]          # Decodable account name
      profile_name: [name of the profile]   # Decodable profile name
      materialize_tests: [true | false]     # whether to materialize tests as a pipeline/stream pair, default is `false`
      timeout: [ms]                         # maximum accumulative time a preview request should run for, default is `60000`
      preview_start: [earliest | latest]    # whether preview should be run with `earliest` or `latest` start position, default is `earliest`
      local_namespace: [namespace prefix]   # prefix added to all entities created on Decodable, default is `None`, meaning no prefix gets added.

dbt looks for the profiles.yml file in the ~/.dbt directory. This file contains all user profiles.

Supported Features

Materializations

Only table materialization is supported for dbt models at the moment. A dbt table model translates to a pipeline/stream pair on Decodable, both sharing the same name. Pipelines for models are automatically activated upon materialization.

To materialize your models simply run the dbt run command, which will perform the following steps for each model:

  1. Create a stream with the model's name and schema inferred by Decodable from the model's SQL.

  2. Create a pipeline that inserts the SQL's results into the newly created stream.

  3. Activate the pipeline.

By default, the adapter will not tear down and recreate the model on Decodable if no changes to the model have been detected. Invoking dbt with the --full-refresh flag set, or setting that configuration option for a specific model will cause the corresponding resources on Decodable to be destroyed and built from scratch. See the docs for more information on using this option.

Custom model configuration

A watermark option can be configured to specify the watermark to be set for the model's respective Decodable stream.

More on specifying configuration options per model can be found here.

Seeds

dbt seed will perform the following steps for each specified seed:

  1. Create a REST connection and an associated stream with the same name (reflecting the seed's name).

  2. Activate the connection.

  3. Send the data stored in the seed's .csv file to the connection as events.

  4. Deactivate the connection.

After these steps are completed, you can access the seed's data on the newly created stream.

Sources

Sources in dbt correspond to Decodable's source connections. However, dbt source command is not supported at the moment.

Documentation

dbt docs is not supported at the moment. You can check your Decodable account for details about your models.

Testing

Based on the materialize_tests option set for the current target, dbt test will behave differently:

  • materialize_tests = false will cause dbt to run the specified tests as previews return the results after they finish. The exact time the preview runs for, as well as whether they run starting positions should be set to earliest or latest can be changed using the timeout and preview_start target configurations respectively.

  • materialize_tests = true will cause dbt to persist the specified tests as pipeline/stream pairs on Decodable. This configuration is designed to allow continous testing of your models. You can then run a preview on the created stream (for example using Decodable CLI) to monitor the results.

Snapshots

Neither the [dbt snapshot] command nor the notion of snapshots are supported at the moment.

Additional Operations

dbt-decodable provides a set of commands for managing the project's resources on Decodable. Those commands can be run using dbt run-operation {name} --args {args}.

Example invocation of the delete_streams operation detailed below:

$ dbt run-operation delete_streams --args '{streams: [stream1, stream2], skip_errors: True}'

stop_pipelines(pipelines)

pipelines : Optional list of names. Default value is None.

Deactivate pipelines for resources defined within the project. If the pipelines arg is provided, the command only considers the listed resources. Otherwise, it deactivates all pipelines associated with the project.


delete_pipelines(pipelines)

pipelines : Optional list of names. Default value is None.

Delete pipelines for resources defined within the project. If the pipelines arg is provided, the command only considers the listed resources. Otherwise, it deletes all pipelines associated with the project.


delete_streams(streams, skip_errors)

streams : Optional list of names. Default value is None.
skip_errors : Whether to treat errors as warnings. Default value is true.

Delete streams for resources defined within the project. Note that it does not delete pipelines associated with those streams, failing to remove a stream if one exists. For a complete removal of stream/pipeline pairs, see the cleanup operation.
If the streams arg is provided, the command only considers the listed resources. Otherwise, it attempts to delete all streams associated with the project.
If skip_errors is set to true, failure to delete a stream (e.g. due to an associated pipeline) will be reported as a warning. Otherwise, the operation stops upon the first error encountered.


cleanup(list, models, seeds, tests)

list : Optional list of names. Default value is None.
models : Whether to include models during cleanup. Default value is true.
seeds : Whether to include seeds during cleanup. Default value is true.
tests : Whether to include tests during cleanup. Default value is true.

Delete all Decodable entities resulting from the materialization of the project's resources, i.e. connections, streams and pipelines.
If the list arg is provided, the command only considers the listed resources. Otherwise, it deletes all entities associated with the project.
The models, seeds and tests arguments specify whether those resource types should be included in the cleanup. Note that cleanup does nothing for tests that have not been materialized.

Contributions

Contributions to this repository are more than welcome. Please create any pull requests against the [main] branch.

Each release is maintained in a release/ branch, such as release/v1.3.2, and there's a tag for it.

License

This code base is available under the Apache License, version 2.

Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_decodable-1.3.3.tar.gz (30.6 kB view details)

Uploaded Source

Built Distribution

dbt_decodable-1.3.3-py3-none-any.whl (52.4 kB view details)

Uploaded Python 3

File details

Details for the file dbt_decodable-1.3.3.tar.gz.

File metadata

  • Download URL: dbt_decodable-1.3.3.tar.gz
  • Upload date:
  • Size: 30.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for dbt_decodable-1.3.3.tar.gz
Algorithm Hash digest
SHA256 3ee6445159cd8f81fc78f0b5cafc5b41eb426d1399c9d5c327fc5f95e63df9ce
MD5 1fdee9e2f9575194a7f6908aca64f1b8
BLAKE2b-256 89c9c3948bd04cda75beec41714c629a29043134b3bc325d09a185dc89428491

See more details on using hashes here.

File details

Details for the file dbt_decodable-1.3.3-py3-none-any.whl.

File metadata

File hashes

Hashes for dbt_decodable-1.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5efdd3f328ee40b56a2cc680df3efb1d3255d9da64004f909d650ca24619027a
MD5 d439a8b1b82bbf7c170607d02daf4534
BLAKE2b-256 a448496eebf3be48c5030d9b063e36091a11b31660bafcd7198296ee6d7cc2f2

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page