Utilities for dbt and Ascend

Project description

Utility to convert a dbt project with compiled SQL models into Ascend transforms

Overview

This utility allows you to convert a dbt project with compiled SQL models into Ascend transforms. It uses the compiled SQL files and the manifest.json produced by dbt compile to create the transforms.

Usage

Run ascend_dbt_utils --help to see the usage:

Usage: ascend_dbt_utils [OPTIONS] COMMAND [ARGS]...

  Collection of utilities to help convert dbt projects into Ascend dataflows.

Options:
  --help  Show this message and exit.

Commands:
  dbt-model   Operations to perform on an individual dbt model.
  dbt-models  Operations to perform on a collection of dbt models together.
  dbt-tests   Operations to perform on dbt tests.

Prerequisites

It is recommended to run the utility inside a Python virtual environment (venv):

python3 -m venv .venv
source .venv/bin/activate

Clone the GitHub repository and install the required modules:

pip install -r requirements.txt

Depending on your environment, you may need to install additional modules that are not listed in requirements.txt. The following modules are optional and only needed if you want to use the Ascend CLI or run dbt compilation locally:

pip install ascend-io-cli
pip install dbt-core
pip install dbt-snowflake

Place the dbt project in a subdirectory and compile it with dbt compile. Ensure the target directory contains manifest.json and that the compiled SQL files for each model are present under target/compiled.
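
For example, assuming dbt-core and the appropriate adapter (such as dbt-snowflake) are installed locally, the compile step might look like the following sketch; the project directory name is hypothetical:

cd my_dbt_project        # hypothetical directory containing the dbt project
dbt compile              # writes manifest.json to target/ and compiled SQL to target/compiled/
ls target/manifest.json  # confirm the manifest was generated
ls target/compiled       # confirm the compiled model SQL files are present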

.env file support

To reduce command-line typing, the utility supports a .env file for specifying parameters instead of passing them on the command line. The .env file should be placed in the directory from which ascend_dbt_utils is run. The following environment variables are supported:

ASCEND_HOSTNAME=myhost.ascend.io
ASCEND_DATA_SERVICE=My_Snowflake_Service
ASCEND_DATAFLOW=dbt_model_flow
DBT_MANIFEST_FILE=target/manifest.json
ASCEND_LANGUAGE=bigquery-sql # optional; defaults to "snowflake-sql" when omitted

Workflow

As described in the previous section, ascend_dbt_utils reads its parameters (hostname, data service, dataflow, and manifest path) from a .env file, so they do not have to be repeated on every command.

Use the dbt-models show command first to see the dependencies, seeds and models

The dbt-models show command will display the list of seeds and models defined in the dbt project. Seeds have to be pre-created in the Ascend dataflow so they can be used as transform inputs; models will be converted into transforms. The following output is an example of the dbt-models show command:

Node: seed.acme.subscription_periods
  -> Depends on: 
Node: model.acme.util_months
  -> Depends on: 
Node: model.acme.customer_revenue_by_month
  -> Depends on: seed.acme.subscription_periods
  -> Depends on: model.acme.util_months
Node: model.acme.customer_churn_month
  -> Depends on: model.acme.customer_revenue_by_month
Node: model.acme.mrr
  -> Depends on: model.acme.customer_revenue_by_month
  -> Depends on: model.acme.customer_churn_month
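
The listing above can be produced with an invocation like the following (a sketch, assuming the connection parameters and manifest path are supplied via the .env file):

# Show seeds, models, and their dependencies as read from manifest.json
ascend_dbt_utils dbt-models show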

The output above requires us to have an Ascend dataflow with a connector or a data share connection called subscription_periods so that the transforms can connect to it. The transforms will be created in the dataflow with the following names:

util_months
customer_revenue_by_month
customer_churn_month
mrr

It is useful to check the following before creating the transforms:

  1. The transforms will be created in the dataflow with the same names as the models. If there are any name conflicts, the transforms will not be created.
  2. The utility does not manage the schema: if a transform's SQL relies on certain fields, those fields must be present in the seed connector and in the other transforms. Some manual cleanup may be required after the transforms are created.
  3. dbt allows models without inputs, but Ascend does not. Ensure that every generated SQL file references other models or seeds. If needed, add a dummy SQL statement connecting the model to a seed or another model.
  4. The utility determines the correct order in which to create the model transforms based on manifest.json. If any dependencies are missing, the transforms will not be created.

Use the dbt-models validate command to ensure all the required seeds are present in the flow

The dbt-models validate command will validate that all the seeds required by the transforms are present in the dataflow. The following output is an example of the dbt-models validate command:

Nodes present in the dataflow:
  raw_customers
  raw_orders
  raw_payments
Nodes absent in the dataflow:
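
A check like the one above might be run as follows (again a sketch, with the parameters assumed to come from the .env file):

# Verify that all seeds required by the transforms exist in the dataflow
ascend_dbt_utils dbt-models validate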

Use the dbt-models merge command to create the transforms or merge the transforms into an existing flow

The dbt-models merge command will merge the transforms into the dataflow. The following output is an example of the dbt-models merge command:

I1021 12:52:07.756062 75551 applier.py:350] Apply Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:08.019690 75551 applier.py:360] Update Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:09.084092 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:09.356551 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:10.335402 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:10.617655 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:11.593106 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:11.856432 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:12.824820 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test mrr)
I1021 12:52:13.091333 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test mrr)

The dbt-models merge command requires a pointer to the manifest.json file that contains the required information. The transforms will be created in the dataflow with the same names as the models; if there are any name conflicts, the transforms will not be created. The --default-seed option allows you to specify the default seed to use for transforms that are not connected to anything, but this requires their SQL files to be edited to reference the default seed as XXX.YYY.subscription_periods. If a transform's SQL relies on certain fields, those fields must be present in the seed connector and in the other transforms. Some manual cleanup may be required after the transforms are created.
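
A merge run like the one shown above might be invoked as in the following sketch; the connection parameters are assumed to come from the .env file, and the seed name passed to --default-seed is hypothetical:

# Create the model transforms in the dataflow (or merge them into an existing flow)
ascend_dbt_utils dbt-models merge

# Optionally specify a default seed for models that are not connected to anything
ascend_dbt_utils dbt-models merge --default-seed subscription_periods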

Use the dbt-models update-sql command to only update the SQL statements of the existing transforms

The dbt-models update-sql command will only update the SQL statements of the existing transforms and leave the rest of the configuration as-is. This is useful for updating, in place, the SQL of existing transforms that were created from dbt models.
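
An in-place SQL refresh might look like the following sketch, with the parameters assumed to come from the .env file:

# Update only the SQL statements of transforms previously created from dbt models
ascend_dbt_utils dbt-models update-sql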

If needed, use the dbt-models delete command

The dbt-models delete command will delete the transforms created from dbt models from the dataflow. Make sure to delete the deployed tests first with dbt-tests delete-tests, otherwise deleting the transforms will fail. The following output is an example of the dbt-models delete command:

Deleting node mrr
Deleting node customer_churn_month
Deleting node customer_revenue_by_month
Deleting node util_months
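
A teardown like the one above might be run as in the following sketch; the deployed tests are removed first, as described above:

# Remove the deployed dbt tests, then delete the model transforms
ascend_dbt_utils dbt-tests delete-tests
ascend_dbt_utils dbt-models delete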

Testing

To run dbt tests in Ascend, you first need to deploy them. Once deployed, the tests appear as grouped transform components in the dataflow, run automatically, and keep running continuously. This means the dbt tests can be used not only to validate the deployment but also as continuously running data quality checks that verify the correct operation of your dataflow. The dbt-tests deploy-tests command deploys the tests to the dataflow, the dbt-tests delete-tests command removes the deployed tests, and the dbt-tests check-test-results command checks the test results. The following output is an example of the dbt-tests check-test-results command:

Checking test results based on the number of records returned by tests...
Test unique_customers_customer_id produced 0 records. Test result is 'Ok'.
Test not_null_customers_customer_id produced 0 records. Test result is 'Ok'.
Test unique_orders_order_id produced 0 records. Test result is 'Ok'.
Test not_null_orders_order_id produced 0 records. Test result is 'Ok'.
...

Checking test results based on data quality check results...
Test unique_customers_customer_id produced 'passed'
Test not_null_customers_customer_id produced 'passed'
Test unique_orders_order_id produced 'passed'
Test not_null_orders_order_id produced 'passed'
...
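
A typical testing sequence might look like the following sketch, with the connection parameters again assumed to come from the .env file:

# Deploy the dbt tests as grouped transform components in the dataflow
ascend_dbt_utils dbt-tests deploy-tests

# Check the results of the deployed tests
ascend_dbt_utils dbt-tests check-test-results

# Remove the deployed tests when they are no longer needed
ascend_dbt_utils dbt-tests delete-tests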

Creating individual transforms

The dbt-model create-component command will create a single transform from the model. This command can also create the associated tests and group them separately per model.

Updating the SQL of individually created transforms

The dbt-model update-sql command will update the SQL of the transform created from the model.
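
The exact arguments for selecting an individual model are not shown here; assuming the subcommands expose the same --help flag as the top-level command, their options can be listed as follows:

# List the options for creating a single transform (and its tests) from one model
ascend_dbt_utils dbt-model create-component --help

# List the options for updating the SQL of an individually created transform
ascend_dbt_utils dbt-model update-sql --help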

