
Generate and process base models for dbt


dbt-generator

This package generates dbt base models and transforms them in bulk. For sources with 10+ models, it saves a lot of time by generating the base models in one pass and applying the same transforms to common fields. It is a good way to start modeling or to onboard a new source.

Installation

To use this package, you need dbt installed with a configured profile. You also need to install the codegen package from dbt Hub. Add the following to the packages.yml file in your dbt repo and run dbt deps to install the dependencies.

packages:
  - package: fishtown-analytics/codegen
    version: 0.3.2

Install the package in the same environment as your dbt installation by running:

pip install dbt-generator

This package should be executed inside your dbt repo.

Generate base models

To generate base models, use the dbt-generator generate command. It is a wrapper around the codegen command that generates the base models. This is especially useful when you have a lot of models and want to generate them all at once.

Usage: dbt-generator generate [OPTIONS]

  Gennerate base models based on a .yml source

Options:
  -s, --source-yml PATH   Source .yml file to be used
  -o, --output-path PATH  Path to write generated models
  --source-index INTEGER  Index of the source to generate base models for
  --help                  Show this message and exit.

Example

dbt-generator generate -s ./models/source.yml -o ./models/staging/source_name/

This will read in the source.yml file and generate the base models in the staging/source_name folder. If you have multiple sources defined in your yml file, use the --source-index flag to specify which source you want to generate base models for.
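To make the role of --source-index concrete, the sketch below lists the sources of an already-parsed source file by position. It assumes the index is zero-based and follows the order of the sources list, which the help text above does not state. The dictionary stands in for a parsed source.yml, the source and table names are made up, and list_sources is a hypothetical helper, not part of dbt-generator.

```python
# Hypothetical stand-in for a parsed source.yml that defines two sources.
parsed_source_yml = {
    "version": 2,
    "sources": [
        {"name": "facebook_ads", "tables": [{"name": "ads"}, {"name": "campaigns"}]},
        {"name": "stripe", "tables": [{"name": "charges"}]},
    ],
}


def list_sources(parsed):
    """Return (index, source name, table names) for each source, in file order."""
    return [
        (index, source["name"], [table["name"] for table in source.get("tables", [])])
        for index, source in enumerate(parsed.get("sources", []))
    ]


# Print one line per source so you can pick the position to pass as the index.
for index, name, tables in list_sources(parsed_source_yml):
    print(f"{index}: {name} ({', '.join(tables)})")
```

Under that zero-based assumption, the second source here would be selected with --source-index 1.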

Process base models

Within a single source, tables often follow consistent naming conventions. For example, the created_at and modified_at fields are often named the same way in every table. Renaming these fields to common values across different sources is a best practice. However, doing that by hand for all the date columns in 10+ tables is a pain.

With this package you can write a transforms.yml file (the .yml file can be named anything) that lists the transforms to apply to all the base models. A transform can simply rename a field, or it can apply a custom SQL expression to the field and name the result.

Usage: dbt-generator transform [OPTIONS]

  Transform base models in a directory using a transforms.yml file

Options:
  -m, --model-path PATH       The path to models
  -t, --transforms-path PATH  Path to a .yml file containing transformations
  -o, --output-path PATH      Path to write transformed models to
  --drop-metadata BOOLEAN     The drop metadata flag
  --case-sensitive BOOLEAN    The case sensitive flag
  --help                      Show this message and exit.

Example

ID:
  name: ID
  sql: CAST(ID as INT64)
CREATED_TIME:
  name: CREATED_AT
UPDATED_TIME:
  name: MODIFIED_AT
DATE_START:
  name: START_AT
DATE_STOP:
  name: STOP_AT

When applied to all models in the staging/source_name folder, this .yml file casts every ID field to INT64 and renames the date columns to the value of their name key. For example, CREATED_TIME is renamed to CREATED_AT and DATE_START is renamed to START_AT. If no sql is provided, the package just renames the field. If sql is provided, the package uses that SQL expression for the field and names the result using the name key.
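The rename-versus-sql rule described above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the package's actual implementation: build_select_list is a hypothetical helper, the SPEND column is made up, and the exact SQL that dbt-generator writes may differ.

```python
def build_select_list(columns, transforms):
    """Build one SELECT expression per column from a transforms mapping."""
    expressions = []
    for column in columns:
        rule = transforms.get(column)
        if rule is None:
            # No rule for this column: it passes through unchanged.
            expressions.append(column)
        elif "sql" in rule:
            # A sql key is present: use the expression and alias it with name.
            expressions.append(f"{rule['sql']} AS {rule['name']}")
        else:
            # Only a name key: plain rename.
            expressions.append(f"{column} AS {rule['name']}")
    return expressions


# Same shape as the example transforms file, written as a Python dict.
transforms = {
    "ID": {"name": "ID", "sql": "CAST(ID as INT64)"},
    "CREATED_TIME": {"name": "CREATED_AT"},
    "DATE_START": {"name": "START_AT"},
}

for expression in build_select_list(["ID", "CREATED_TIME", "DATE_START", "SPEND"], transforms):
    print(expression)
```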

dbt-generator transform -m ./models/staging/source_name/ -t ./transforms.yml

This will transform all models in the staging/source_name folder using the transforms.yml file. You can also drop metadata columns (columns whose names start with _) by setting the --drop-metadata flag to true. The --case-sensitive flag determines whether field names in the transforms file are matched case-sensitively.
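As a rough illustration of what the two flags mean, the sketch below filters and matches column names in plain Python. It rests on assumptions: that metadata columns are exactly those starting with an underscore, and that case-insensitive matching compares lower-cased names. filter_and_match and the _fivetran_synced column are hypothetical and not taken from the package.

```python
def filter_and_match(columns, transforms, drop_metadata=False, case_sensitive=True):
    """Return (column, matching rule or None) pairs after optional filtering."""
    if not case_sensitive:
        # Compare on lower-cased names so "created_time" matches "CREATED_TIME".
        transforms = {key.lower(): rule for key, rule in transforms.items()}
    matched = []
    for column in columns:
        if drop_metadata and column.startswith("_"):
            continue  # skip metadata columns such as _fivetran_synced
        key = column if case_sensitive else column.lower()
        matched.append((column, transforms.get(key)))
    return matched


columns = ["created_time", "_fivetran_synced"]
rules = {"CREATED_TIME": {"name": "CREATED_AT"}}

# Case-insensitive matching with metadata columns dropped.
print(filter_and_match(columns, rules, drop_metadata=True, case_sensitive=False))
# Defaults: nothing is dropped, and the lower-case column finds no rule.
print(filter_and_match(columns, rules))
```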

Limitations

Here are some of the limitations of the current release. If you want to contribute, please open an issue or a pull request.

  • Transforms only work with models generated with the codegen package.
  • You cannot transform a model that has already been transformed.
  • You cannot use wildcards in field selection for transforms (e.g. *_id).
  • No tests yet
  • No error handling yet

