Pre-commit hooks for dbt projects
dbt Review Assistant
A collection of CLI tools designed to make reviewing dbt projects quicker and easier
dbt-review-assistant is a Python-based CLI tool which helps dbt developers with ensuring their projects are well
documented, comprehensively tested and consistent.
Maintaining dbt projects can be challenging, especially when the projects get large, complex and have lots of
contributors. dbt-review-assistant aims to help developers and reviewers to focus on what matters, by taking care of
the most boring checklist items automatically.
There are 21 checks available in this package, each usable both as a standalone CLI command and as a pre-commit hook:
Model checks:
- models-have-descriptions: Check if models have descriptions
- models-have-tags: Check if models have tags. Optionally specify a set from which models must have all tags, or from which they must have at least one tag
- models-have-contracts: Check if models have contracts enabled
- models-have-constraints: Check if models have constraints configured
- models-have-data-tests: Check if models have data tests
- models-have-unit-tests: Check if models have unit tests
- models-have-properties-file: Check if models have a corresponding properties YAML file
- model-columns-have-descriptions: Check if model columns have descriptions
- model-columns-have-types: Check if model columns have data types documented
- model-column-names-match-manifest-vs-catalog: Check if model column names match between the manifest.json and the catalog.json
- model-column-types-match-manifest-vs-catalog: Check if model column data types match between the manifest.json and the catalog.json
- model-column-descriptions-are-consistent: Check if all instances of a column have the same description across different models
Source checks:
- sources-have-descriptions: Check if sources have descriptions
- sources-have-data-tests: Check if sources have data tests
- source-columns-have-descriptions: Check if source columns have descriptions
- source-columns-have-types: Check if source columns have data types documented
- source-column-names-match-manifest-vs-catalog: Check if source column names match between the manifest.json and the catalog.json
- source-column-types-match-manifest-vs-catalog: Check if source column data types match between the manifest.json and the catalog.json
Macro checks:
- macros-have-descriptions: Check if macros have descriptions
- macro-arguments-have-descriptions: Check if macro arguments have descriptions
- macro-arguments-match-manifest-vs-sql: Check if macro arguments match between the manifest.json and the macro SQL code
Usage
Supported Check Arguments
The following arguments may be used globally, or per-check:
--project-dir: Optional - path to the dbt project directory (where the dbt_project.yml file is located). Defaults to
the current working directory.
--manifest-dir: Optional - path to the dbt manifest.json file (usually in the dbt project's target directory).
Defaults to the target directory underneath the dbt project directory.
--catalog-dir: Optional - path to the dbt catalog.json file (usually in the dbt project's target directory).
Defaults to the target directory underneath the dbt project directory.
--include-materializations: Optional - list of materializations to include models by. Only models materialized as one
of these values will be considered in-scope for the check(s).
--exclude-materializations: Optional - list of materializations to exclude models by. Only models not materialized as
one of these values will be considered in-scope for the check(s).
--include-packages: Optional - list of dbt package names to include nodes by. Only nodes in one of these packages will
be considered in-scope for the check(s).
--exclude-packages: Optional - list of dbt package names to exclude nodes by. Only nodes not in one of these packages
will be considered in-scope for the check(s).
--include-tags: Optional - list of tags to include nodes by. Only nodes having at least one of these tags will
be considered in-scope for the check(s).
--exclude-tags: Optional - list of tags to exclude nodes by. Nodes which have at least one of these tags will be
considered out-of-scope for the check(s).
--include-node-paths: Optional - list of node paths to include nodes by. Nodes not under any of these paths will be
considered out-of-scope for the check(s).
--exclude-node-paths: Optional - list of node paths to exclude nodes by. Nodes under any of these paths will be
considered out-of-scope for the check(s).
--must-have-all-constraints-from: Optional - List of constraint names, from which objects must have the full set.
--must-have-any-constraint-from: Optional - List of constraint names, from which objects must have at least one value.
--must-have-all-data-tests-from: Optional - List of data test names, from which objects must have the full set.
--must-have-any-data-test-from: Optional - List of data test names, from which objects must have at least one value.
--must-have-all-tags-from: Optional - List of tags, from which objects must have the full set.
--must-have-any-tag-from: Optional - List of tags, from which objects must have at least one value.
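The include/exclude and must-have arguments above boil down to simple set operations. The sketch below illustrates that logic in Python; it is not the tool's actual implementation. The function names `in_scope` and `satisfies` are hypothetical, and the rule that an exclusion wins when a node matches both filters is an assumption.

```python
from typing import Iterable, Optional


def in_scope(
    node_tags: Iterable[str],
    include_tags: Optional[Iterable[str]] = None,
    exclude_tags: Optional[Iterable[str]] = None,
) -> bool:
    """Decide whether a node is in scope, in the spirit of --include-tags / --exclude-tags."""
    tags = set(node_tags)
    # Assumption: a node carrying any excluded tag is out of scope, even if it is also included.
    if exclude_tags is not None and tags & set(exclude_tags):
        return False
    # When an include list is given, the node needs at least one matching tag.
    if include_tags is not None and not tags & set(include_tags):
        return False
    return True


def satisfies(
    actual: Iterable[str],
    must_have_all_from: Optional[Iterable[str]] = None,
    must_have_any_from: Optional[Iterable[str]] = None,
) -> bool:
    """Evaluate the 'full set' and 'at least one value' requirements against actual values."""
    actual_set = set(actual)
    # "must have all": the required set must be a subset of what the object has.
    if must_have_all_from is not None and not set(must_have_all_from) <= actual_set:
        return False
    # "must have any": the object must share at least one value with the allowed set.
    if must_have_any_from is not None and not actual_set & set(must_have_any_from):
        return False
    return True
```

The same two-step pattern (filter to the in-scope nodes, then evaluate the requirement) would apply equally to packages, materializations, constraints and data tests.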
Running checks individually
To run individual checks using the CLI, run the dbt-review-assistant command followed by the name of a check, and any
arguments required, for example:
dbt-review-assistant models-have-descriptions --include-packages my_dbt_project
Running several checks together
The intended usage of this tool is running several checks all together. This way users ensure the integrity of their
whole dbt project with one single command, and several checks can be written to complement each other and give wide
coverage. There may also be a performance advantage, as dbt-review-assistant can cache data between checks to avoid
having to re-load data from the manifest and catalog files.
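To illustrate why sharing loaded data between checks can help, here is a minimal sketch of in-process caching with `functools.lru_cache`. It is an assumption about the general technique, not a description of how dbt-review-assistant caches internally; `load_artifact`, `count_models` and `count_sources` are hypothetical names.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_artifact(path: str) -> dict:
    """Parse a JSON artifact once; later calls with the same path reuse the parsed result."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_models(manifest_path: str) -> int:
    """First hypothetical check: reads the manifest through the cache."""
    manifest = load_artifact(manifest_path)
    return sum(
        1
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    )


def count_sources(manifest_path: str) -> int:
    """Second hypothetical check: hits the cache instead of re-reading the file."""
    manifest = load_artifact(manifest_path)
    return len(manifest.get("sources", {}))
```

A cache like this only lives for the lifetime of one process, which is why separately launched commands cannot benefit from it. Note that the cached dictionary is shared, so checks should treat it as read-only.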
The config file allows users to configure any number of checks in YAML. Simply create a file named
.dbt-review-assistant.yaml and place it anywhere in the repo. Here is an example of a basic config file, defining two
checks:
# .dbt-review-assistant.yaml
global_arguments:
  arguments: [
    "--project-dir",
    "my_dbt_project",
    "--include-packages",
    "my_dbt_project",
  ]
per_check_arguments:
  - check_id: models-have-descriptions
    description: We love descriptions! Everything should have descriptions
  - check_id: models-have-constraints
    description: >
      Primary Key constraints are great, but we only want them on tables
    arguments: [
      "--must-have-all-constraints-from",
      "primary_key",
      "--include-materializations",
      "table",
      "incremental"
    ]
To run all the checks specified in the config file, use all-checks as the check id, and include the --config-dir or -c
argument to tell dbt-review-assistant where the config file is:
dbt-review-assistant all-checks --config-dir ./my_dbt_project
Note - if using all-checks then any arguments other than --config-dir are ignored, in favour of arguments specified
in the config file.
global_arguments
The global_arguments section sets default arguments which will be passed to every check. These arguments will be overridden by individual checks, if they also define the same arguments with different values.
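One simple way to get this override behaviour is to parse the global arguments followed by the per-check arguments, because `argparse` keeps the last occurrence of a repeated option. The sketch below demonstrates that idea under stated assumptions: it is not necessarily how dbt-review-assistant merges arguments, and `build_parser` and `resolve_arguments` are hypothetical helpers covering only three of the supported options.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """A cut-down parser with a few of the documented options, for illustration only."""
    parser = argparse.ArgumentParser(prog="example-check")
    parser.add_argument("--project-dir", default=".")
    parser.add_argument("--include-packages", nargs="+", default=[])
    parser.add_argument("--include-materializations", nargs="+", default=[])
    return parser


def resolve_arguments(global_args: List[str], check_args: List[str]) -> argparse.Namespace:
    """Combine global and per-check arguments so that per-check values win."""
    # argparse stores the last value seen for a repeated option, so placing the
    # per-check arguments after the global ones lets them override the defaults.
    return build_parser().parse_args([*global_args, *check_args])
```

With the global list `["--include-packages", "my_dbt_project"]` and a per-check list `["--include-packages", "other_package"]`, the resolved value of `include_packages` is `["other_package"]`, while any global option the check does not mention is inherited unchanged.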
per_check_arguments
The per_check_arguments section sets the arguments for each check instance. Each check instance must specify a
check_id, and may optionally set an array of string arguments. Note that check_id does not need to be unique -
the same check can be used any number of times, so that it can be run with different arguments. The description
key is optional, but adding one is recommended: it tells other developers what the specific check is and why it is
needed, so all contributors to your project know what the expectations are.
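The rules for per_check_arguments entries can be expressed as a small validation routine. This is an illustrative sketch only: `validate_config` is a hypothetical helper, it operates on a dictionary such as a YAML parser would produce, and the tool's own validation and error messages may differ.

```python
from typing import List


def validate_config(config: dict) -> List[str]:
    """Return a list of problems found in the per_check_arguments section."""
    errors: List[str] = []
    for index, entry in enumerate(config.get("per_check_arguments", [])):
        # Every entry must name the check it configures.
        if not isinstance(entry, dict) or not entry.get("check_id"):
            errors.append(f"entry {index}: missing check_id")
            continue
        # Arguments are optional, but when present must be a list of strings.
        arguments = entry.get("arguments", [])
        if not isinstance(arguments, list) or not all(isinstance(a, str) for a in arguments):
            errors.append(f"entry {index} ({entry['check_id']}): arguments must be a list of strings")
        # Duplicated check_id values are deliberately allowed, and description is optional.
    return errors
```

An empty result means every entry has a check_id and well-formed arguments; repeated use of the same check_id is not reported as a problem.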
Running checks using pre-commit
All checks may be run as pre-commit hooks, either individually, or as one single entry encompassing one or more checks.
For example, to run all hooks, use the all-checks hook and point it to the config file directory:
repos:
  - repo: https://github.com/sambloom92/dbt-review-assistant
    rev: <latest tag>
    hooks:
      - id: all-checks
        args: [ "--config-dir", "my_dbt_project" ]
Or to run individual checks as standalone hooks:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/sambloom92/dbt-review-assistant
    rev: <latest tag>
    hooks:
      - id: models-have-descriptions
        pass_filenames: false
        args: [ "--include-packages", "my_dbt_project" ]
      - id: models-have-constraints
        pass_filenames: false
        args: [
          "--must-have-all-constraints-from",
          "primary_key",
          "--include-materializations",
          "table",
          "incremental"
        ]
Note that the recommended option is to use the single entry version, because this can benefit from improved performance
by allowing dbt-review-assistant to cache data in memory between checks. Running checks individually forces them to be
run in separate environments, so they cannot share cached data.
Using pass_filenames
pre-commit hooks have an option called pass_filenames, which defaults to true. This instructs pre-commit to pass all
filenames that are staged for commit into the hook entry command as positional arguments.
dbt-review-assistant does not support pass_filenames: true, so all hooks ship with pass_filenames: false
by default, and this should not be overridden. Be aware that using these hooks with repo: local changes the
default value back to pass_filenames: true, so the examples here explicitly include the correct setting, even though
it is not always strictly necessary.
Disabling pass_filenames for hooks is a deliberate design choice, which greatly simplifies how the tool works.
Although it can be helpful to only run checks on files that have changed, this is very complicated to do correctly in
practice, due to the complex dependencies between files within dbt projects. A more 'slim' option might be developed as
a future improvement, but for now the entire project is checked (unless nodes are excluded by specific arguments),
regardless of which files are staged for commit.
Refreshing dbt artifacts
All checks rely on the data in the dbt manifest.json file, and some checks have an additional dependency on the dbt
catalog.json file. As such, these files need to be refreshed whenever any change is made to the dbt project, otherwise
dbt-review-assistant will not have the most up-to-date view of your project. dbt-review-assistant does not look at
any SQL or YAML files in your project at all, or connect to your database, or even run any dbt commands - the manifest
and catalog JSON files are its only source of truth.
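Because the manifest is plain JSON, a check of this kind can be sketched in a few lines of Python. The example below is a simplified illustration, not the tool's own code: `models_missing_descriptions` is a hypothetical helper, and it relies only on the `nodes`, `resource_type`, `package_name` and `description` keys of the dbt manifest.

```python
from typing import List, Optional


def models_missing_descriptions(manifest: dict, package: Optional[str] = None) -> List[str]:
    """List the unique IDs of models whose description is empty in a parsed manifest.json."""
    missing = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # The nodes section also holds tests, seeds and snapshots; keep models only.
        if node.get("resource_type") != "model":
            continue
        # Optionally restrict the check to one dbt package.
        if package is not None and node.get("package_name") != package:
            continue
        # Treat a missing, empty or whitespace-only description as undocumented.
        if not (node.get("description") or "").strip():
            missing.append(unique_id)
    return sorted(missing)


# Typical usage, assuming the manifest lives in the default target directory:
#   import json, pathlib
#   manifest = json.loads(pathlib.Path("target/manifest.json").read_text())
#   print(models_missing_descriptions(manifest, package="my_dbt_project"))
```

If the manifest on disk is stale, a sketch like this reports on the old state of the project, which is exactly why the artifacts must be refreshed before the checks run.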
This table shows which checks require which dbt artifacts:
| check-id | manifest | catalog |
|---|---|---|
| models-have-descriptions | ✅ | ❌ |
| models-have-tags | ✅ | ❌ |
| models-have-contracts | ✅ | ❌ |
| models-have-constraints | ✅ | ❌ |
| models-have-data-tests | ✅ | ❌ |
| models-have-unit-tests | ✅ | ❌ |
| models-have-properties-file | ✅ | ❌ |
| model-columns-have-descriptions | ✅ | ❌ |
| model-columns-have-types | ✅ | ❌ |
| model-column-names-match-manifest-vs-catalog | ✅ | ✅ |
| model-column-types-match-manifest-vs-catalog | ✅ | ✅ |
| model-column-descriptions-are-consistent | ✅ | ❌ |
| sources-have-descriptions | ✅ | ❌ |
| sources-have-data-tests | ✅ | ❌ |
| source-columns-have-descriptions | ✅ | ❌ |
| source-columns-have-types | ✅ | ❌ |
| source-column-names-match-manifest-vs-catalog | ✅ | ✅ |
| source-column-types-match-manifest-vs-catalog | ✅ | ✅ |
| macros-have-descriptions | ✅ | ❌ |
| macro-arguments-have-descriptions | ✅ | ❌ |
| macro-arguments-match-manifest-vs-sql | ✅ | ❌ |
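The checks that need both artifacts compare what is documented (manifest) with what exists in the warehouse (catalog). A rough sketch of such a comparison for model column names follows. It is illustrative rather than the tool's implementation: `column_name_mismatches` is a hypothetical helper, and the case-insensitive comparison is an assumption made because some warehouses report upper-case column names.

```python
from typing import Dict, List


def column_name_mismatches(manifest: dict, catalog: dict) -> Dict[str, Dict[str, List[str]]]:
    """Compare documented model columns (manifest) with built columns (catalog)."""
    mismatches: Dict[str, Dict[str, List[str]]] = {}
    catalog_nodes = catalog.get("nodes", {})
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        catalog_node = catalog_nodes.get(unique_id)
        if catalog_node is None:
            # The model has not been built, so there is nothing to compare against.
            continue
        # Assumption: compare names case-insensitively.
        documented = {name.casefold() for name in node.get("columns", {})}
        actual = {name.casefold() for name in catalog_node.get("columns", {})}
        if documented != actual:
            mismatches[unique_id] = {
                "only_in_manifest": sorted(documented - actual),
                "only_in_catalog": sorted(actual - documented),
            }
    return mismatches
```

An empty result means every built model has exactly the documented set of columns. A comparison of data types could follow the same shape, reading `data_type` from the manifest columns and `type` from the catalog columns.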
These JSON files are typically in the .gitignore, so they are not tracked in git, and are often cleaned up when
running dbt clean, so knowing how to generate them is important.
To refresh the manifest, run:
dbt parse
To refresh the catalog, run:
dbt docs generate --no-compile
To ensure the manifest and/or catalog are refreshed automatically by pre-commit, simply add dbt commands as locally installed entries to your existing pre-commit configuration, before the checks:
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: refresh-manifest
        name: Refresh dbt Manifest
        entry: dbt parse
        args: [
          "--project-dir",
          "./my_dbt_project",
          "--profiles-dir",
          "./my_dbt_project"
        ]
        language: python
        pass_filenames: false
      - id: refresh-catalog
        name: Refresh dbt Catalog
        entry: dbt docs generate
        args: [
          "--project-dir",
          "./my_dbt_project",
          "--profiles-dir",
          "./my_dbt_project",
          "--no-compile"
        ]
        language: python
        pass_filenames: false
  - repo: https://github.com/sambloom92/dbt-review-assistant
    rev: <latest tag>
    hooks:
      - id: models-have-descriptions
        args: [ "--include-packages", "my_dbt_project" ]
      - id: models-have-constraints
        args: [
          "--must-have-all-constraints-from",
          "primary_key",
          "--include-materializations",
          "table",
          "incremental"
        ]
The refresh-manifest and refresh-catalog hooks demonstrated above are not part of dbt-review-assistant, and rely
on your project's own local dbt installation. Add whichever arguments you would normally include when running dbt
commands within your project. To use these in a CI environment such as GitHub Actions, ensure that the worker has
the dbt adapter installed and, if refreshing the catalog, has permission to connect to your database.
Acknowledgements
This tool was inspired by the popular dbt-checkpoint pre-commit hooks by DataCoves (formerly pre-commit-dbt). I have found these hooks immensely useful for my own dbt projects, and I am very grateful to them for contributing it. That said, there were a number of ways in which I believed it could be improved and simplified, so I decided to try writing my own tool. While there may be similarities in some of the checks, all code in this repository is written by myself, with nothing taken from any other projects or AI tools.