Utilities for dbt and Ascend
Utility to convert a dbt project with compiled SQL models into Ascend transforms
Overview
This utility converts a dbt project with compiled SQL models into Ascend transforms. It uses the SQL files produced by dbt compile, in conjunction with the manifest.json generated by the same compile step, to create the transforms.
Usage
Run ascend_dbt_utils --help to see the usage:
Usage: ascend_dbt_utils [OPTIONS] COMMAND [ARGS]...

  Collection of utilities to help convert dbt projects into Ascend dataflows.

Options:
  --help  Show this message and exit.

Commands:
  dbt-model   Operations to perform on an individual dbt model.
  dbt-models  Operations to perform on a collection of dbt models together.
  dbt-tests   Operations to perform on dbt tests.
Prerequisites
It is recommended to run the utility in a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Clone the GitHub repository and install the required modules:
pip install -r requirements.txt
Depending on your environment, you may need to install additional modules that are not listed in the requirements.txt file. The following modules are optional; install them if you want to use the Ascend CLI or run the dbt compilation locally:
pip install ascend-io-cli
pip install dbt-core
pip install dbt-snowflake
Place the dbt project in a subdirectory and compile it with dbt compile. Ensure the target/compiled directory is present: it should contain the SQL files for each model, and the manifest.json file should be present in the target directory.
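For example, a minimal compile-and-verify sequence (the project directory name is hypothetical):
cd my_dbt_project            # hypothetical dbt project subdirectory
dbt compile                  # writes manifest.json and compiled SQL under target/
ls target/manifest.json      # the manifest consumed by this utility
ls target/compiled           # compiled SQL files, one per model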
.env file support
To reduce the amount of typing on the command line, the utility supports a .env file that specifies the parameters instead of passing them as options. The .env file should be placed in the directory from which the ascend_dbt_utils utility is run. The following environment variables are supported:
ASCEND_HOSTNAME=myhost.ascend.io
ASCEND_DATA_SERVICE=My_Snowflake_Service
ASCEND_DATAFLOW=dbt_model_flow
DBT_MANIFEST_FILE=target/manifest.json
ASCEND_LANGUAGE=bigquery-sql # Optional; defaults to "snowflake-sql" when omitted.
Workflow
The examples below assume the connection parameters are provided via the .env file described above.
Use the dbt-models show command first to see the dependencies, seeds, and models
The dbt-models show command displays the list of seeds and models defined in the dbt project. Seeds must be pre-created in the Ascend dataflow so they can serve as transform inputs; models will be converted into transforms.
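Assuming the connection parameters come from the .env file, a minimal invocation is (a sketch; run the command with --help for the exact options):
ascend_dbt_utils dbt-models show
The following output is an example of the dbt-models show command: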
Node: seed.acme.subscription_periods
-> Depends on:
Node: model.acme.util_months
-> Depends on:
Node: model.acme.customer_revenue_by_month
-> Depends on: seed.acme.subscription_periods
-> Depends on: model.acme.util_months
Node: model.acme.customer_churn_month
-> Depends on: model.acme.customer_revenue_by_month
Node: model.acme.mrr
-> Depends on: model.acme.customer_revenue_by_month
-> Depends on: model.acme.customer_churn_month
The output above requires an Ascend dataflow with a connector or a data share connection called subscription_periods so that the transforms can connect to it. The transforms will be created in the dataflow with the following names:
util_months
customer_revenue_by_month
customer_churn_month
mrr
It is useful to check the following before creating the transforms:
- The transforms will be created in the dataflow with the same names as the models. If there are any name conflicts, the transforms will not be created.
- The utility does not manage the schema: if a transform's SQL relies on certain fields, those fields must be present in the seed connector and in the upstream transforms. Some manual cleanup may be required after the transforms are created.
- dbt allows models without inputs, but Ascend does not. Ensure that all the generated SQL files reference other models or seeds; if needed, add a dummy SQL statement connecting the model to a seed or another model (see the sketch after this list).
- The utility determines the right order of transform creation based on manifest.json. If any dependencies are missing, the transforms will not be created.
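As a minimal sketch of the dummy-statement workaround for a model with no inputs (the file path, database, and schema names here are hypothetical, and this assumes the model's compiled SQL does not already begin with a WITH clause):
# prepend an unused CTE that references an existing seed, so the model
# gets wired to that seed without changing the query's result
mv target/compiled/acme/models/my_orphan_model.sql /tmp/my_orphan_model.sql
{
  echo "WITH seed_anchor AS (SELECT 1 FROM MYDB.MYSCHEMA.subscription_periods)"
  cat /tmp/my_orphan_model.sql
} > target/compiled/acme/models/my_orphan_model.sql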
Use the dbt-models validate command to ensure all the required seeds are present in the flow
The dbt-models validate command validates that all the seeds required by the transforms are present in the dataflow.
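Again assuming the .env file is in place, a minimal invocation is:
ascend_dbt_utils dbt-models validate
The following output is an example of the dbt-models validate command: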
Nodes present in the dataflow:
raw_customers
raw_orders
raw_payments
Nodes absent in the dataflow:
Use the dbt-models merge command to create the transforms or merge them into an existing flow
The dbt-models merge command merges the transforms into the dataflow.
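A minimal invocation (a sketch; parameters are taken from the .env file):
ascend_dbt_utils dbt-models merge
The following output is an example of the dbt-models merge command: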
I1021 12:52:07.756062 75551 applier.py:350] Apply Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:08.019690 75551 applier.py:360] Update Dataflow: (ds=Mikes_Snowflake_demos df=dbt_test)
I1021 12:52:09.084092 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:09.356551 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test util_months)
I1021 12:52:10.335402 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:10.617655 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_revenue_by_month)
I1021 12:52:11.593106 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:11.856432 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test customer_churn_month)
I1021 12:52:12.824820 75551 applier.py:756] Apply Component: (ds=Mikes_Snowflake_demos df=dbt_test mrr)
I1021 12:52:13.091333 75551 applier.py:697] Create Transform: (ds=Mikes_Snowflake_demos df=dbt_test mrr)
The dbt-models merge command requires a pointer to the manifest.json file that contains the required information (see DBT_MANIFEST_FILE above). The transforms are created in the dataflow with the same names as the models; if there are any name conflicts, the transforms will not be created. The --default-seed option allows you to specify the default seed to use for transforms that are not connected to anything, but this requires their SQL files to be edited to reference the default seed as XXX.YYY.subscription_periods. If a transform's SQL relies on certain fields, those fields must be present in the seed connector and in the upstream transforms. Some manual cleanup may be required after the transforms are created.
Use the dbt-models update-sql command to only update the SQL statements of the existing transforms
The dbt-models update-sql command updates only the SQL statements of the existing transforms and leaves the rest of the configuration as-is. This is useful for updating, in place, the SQL of existing transforms that were created from the dbt models.
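Assuming the .env file is in place, a minimal invocation is:
ascend_dbt_utils dbt-models update-sql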
If needed, use the dbt-models delete command
The dbt-models delete command deletes from the dataflow the transforms that were created from dbt models. Make sure to delete the deployed tests first with dbt-tests delete-tests; otherwise the deletion of the transforms will fail.
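A sketch of the two-step teardown, assuming the .env file is in place:
ascend_dbt_utils dbt-tests delete-tests
ascend_dbt_utils dbt-models delete
The following output is an example of the dbt-models delete command: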
Deleting node mrr
Deleting node customer_churn_month
Deleting node customer_revenue_by_month
Deleting node util_months
Testing
To run dbt tests in Ascend, you first need to deploy the tests. Once deployed, the tests appear as grouped transform components in the dataflow, run automatically, and remain running continuously in the flow. Hence, the dbt tests can be used not only to validate the deployment but also as continuously running data quality checks that validate the correct operation of your dataflow.
The dbt-tests deploy-tests command deploys the tests to the dataflow, the dbt-tests delete-tests command removes the deployed tests, and the dbt-tests check-test-results command checks the test results.
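A sketch of a deploy-and-check sequence, assuming the connection parameters are set in the .env file:
ascend_dbt_utils dbt-tests deploy-tests
ascend_dbt_utils dbt-tests check-test-results
The following output is an example of the dbt-tests check-test-results command: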
Checking test results based on the number of records returned by tests...
Test unique_customers_customer_id produced 0 records. Test result is 'Ok'.
Test not_null_customers_customer_id produced 0 records. Test result is 'Ok'.
Test unique_orders_order_id produced 0 records. Test result is 'Ok'.
Test not_null_orders_order_id produced 0 records. Test result is 'Ok'.
...
Checking test results based on data quality check results...
Test unique_customers_customer_id produced 'passed'
Test not_null_customers_customer_id produced 'passed'
Test unique_orders_order_id produced 'passed'
Test not_null_orders_order_id produced 'passed'
...
Creating individual transforms
The dbt-model create-component command creates a single transform from a model. It can also create the associated tests and group them separately per model.
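The exact options for selecting the model are not reproduced here; consult the built-in help:
ascend_dbt_utils dbt-model create-component --help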
Updating the SQL of individually created transforms
The dbt-model update-sql command updates the SQL of a transform created from a model.
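Likewise, the available options can be inspected with:
ascend_dbt_utils dbt-model update-sql --help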