Skip to main content

dagrules - dbt DAG rule creator and validator

Project description

dagrules

dagrules is a tool that allows you to write your own dbt dag rules and check that your dbt project is conforming to those rules.

Overview

While the dbt community has established some excellent guidelines for how to structure dbt projects, those conventions are not automatically enforced. Those conventions are simply guidelines, and each team may decide on a slightly different set of conventions that work best for their particular set up. dagrules was developed to allow you to write your own conventions in a simple yaml document, and have those conventions enforced via your CI system.

To use dagrules, all you need is a dbt project and a dagrules.yml file located in the root of the dbt project (e.g., dbt/dagrules.yml). The yaml file should look like (for a more complete example, see tests/dagrules.yml:

---
version: '1'
rules:
  - name: The name of my rule
    subject:
       ... # How to select nodes to check that they satisfy the rules
    must:
       ... # Define the rules that must be followed
  - name: Another one of my rules
    ...

Installation and running rules

dagrules can be installed using pip:

pip install dagrules

And then run dagrules with the --check argument from your dbt project root:

dagrules --check

dagrules assumes that it is being executed from the dbt project root and that there is a target/manifest.json file already present (so the dbt project must be compiled any time the dag is changed before dagrules can be run). These defaults can be overridden by setting the DBT_ROOT and DAGRULES_YAML environment variable to point to other locations.

Subjects

For every rule, a subject should be declared that defines how to select nodes of the dbt dag to use for rule validation. Omitting the subject means that the rule will be applied to every dbt model. dagrules currently supports two ways to select subjects: 1) by node type (source, snapshot, model) and 2) by tags. For example, the follow subject includes all models that are tagged "staging":

rules:
  - name: All staging models must ...
    subject:
      type: model
      tags: staging
    must:
      ...

Tag selection

Tag selection applies both to subject and must section of the dagrules yaml spec. Tags can be defined several ways.

Single string - Selecting with a single tag can be expressed as a simple string

tags: staging

List of tags: match any - A list of tags can also be specified, and dagrules will match nodes with any of the tags in the list. The example below will match nodes having either base or intermediate tags.

tags:
  - base
  - intermediate

Include: match all with exclusions - When you need to select nodes that match all tags in a list, and possibly exclude nodes with some tags as well, you can use include/exclude. The example below will select any nodes that have both "staging" and "finance" tags, but that don't also have the base tag.

tags:
  include:
    - staging
    - finance
  exclude:
    - base

The arguments to include and exclude can either be a list or single strings.

Combine any/all - We can also combine any and all syntaxes at once. The following will select all nodes that are either "non-base staging", "core", or "mart" models.:

tags:
  - include: staging
    exclude: base
  - core
  - mart

Musts

"Musts" define the rules that must be adhered to by the subjects defined in the subject section. Multiple "musts" may be included in a rule definition, and all must be satisfied for the rule to pass.

Match name - The match-name rule requires that each subject adhere to a particular naming pattern. dagrules currently only supports regular expression matching. For example, the following rule enforces that all snapshot models must be named with a snap_ prefix:

rules:
  - name: Snapshot must be prefixed with snap_
    subject:
      type: snapshot
    must:
      match-name: /snap_.*/

Have tags - The have-tags-any rule requires that all selected models must have one of any of the listed tags. The following example specifies that all nodes in the dag must have at least one of the tags listed:

rules:
  - name: All models must be tagged either snapshot, base, staging, intermediate, core, mart
    # Omit subject to include all nodes
    must:
      have-tags-any:
        - snaphost
        - base
        - staging
        - intermediate
        - core
        - mart

Have parent or child relationship - The have-child-relationship and have-parent-relationship rules require that the subjects have a certain kind of relationship to either their immediate children or parents. The types of relationship can involve:

  • cardinality - The cardinality of the relationship between a subject and its child/parent can either be one_to_one or one_to_many (default). If one_to_one is selected, that a subject may only have one child/parent.
  • required - Indicates whether a child/parent relationship is required or not. The default is True, meaning that if a relationship is defined, all subject must have at least one child or parent node. If False, then a subject may have 0 children/parents.
  • require-tags-any - Contains a list of tags that the parent/child must have (with syntax defined in the "Tag selection" section above).
  • require-node-type - Indicates the node type (source, snapshot, model) that the child/parent must be in order to pass.
  • select-tags-any - Contains a list of tags that restricts the selection of parents/children involved in the rule.
  • select-node-type - Indicates that only the parents/children with the specified node type are to be considered when checking the rule.

For example,

rules:
  - name: Snapshots must have 0 or 1 children, which must all be base models
    subject:
      type: snapshot
    must:
      have-child-relationship:
        cardinality: one_to_one
        required: false
        require-tags-any:
          - base

  - name: Intermediate models may only depend on non-base staging, core, mart, or other intermediate models
    subject:
      tags:
        include: intermediate
    must:
      have-parent-relationship:
        require-tags-any:
          - include: staging
            exclude: base
          - core
          - mart
          - intermediate

Contributing

We welcome contributors! Please submit any suggests or pull requests in Github.

Developer setup

Create an appropriate python environment. I like miniconda, but use whatever you like:

conda create --name dagrules python=3.9
conda activate dagrules

Then install pip packages

pip install pip-tools
pip install --ignore-installed -r requirements.txt

run tests via

inv test

and the linter via

inv lint

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dagrules-0.1.0.tar.gz (12.6 kB view details)

Uploaded Source

File details

Details for the file dagrules-0.1.0.tar.gz.

File metadata

  • Download URL: dagrules-0.1.0.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for dagrules-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3e99bcd08c2fb44a98d2ab4ebef931af6f60e5fcb2c19ae0008ef8844f8f1b52
MD5 6075c4fe346e21c9f6fcc60aa0bae1bd
BLAKE2b-256 b1c2de466574e0cea551e87a1d9f4b2bbb127e3afd241943f3ae8768cb915868

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page