Environment diff tool for dbt
Recce
Recce is a PR review tool designed for dbt projects. It compares results between two environments, such as development and production, and helps identify any differences.
Features
- Lineage diff
- Schema diff
- Row count diff
- Ad-Hoc Query and Query Diff
- Profile Diff
- Value Diff
- Checklist
Use cases
Recce is primarily designed for PR review, but it also fits the following use cases:
- During development, you can verify new results by comparing them with production before pushing the changes.
- While reviewing a PR, you can grasp the extent of the changes and their impact before merging.
- For troubleshooting, you can execute ad-hoc diff queries to pinpoint the root causes.
Usage
Prerequisites
You need at least two environments in your dbt project, for example one for development and another for production. You can prepare two targets with separate schemas in your dbt profile. Here is an example profiles.yml:

    jaffle_shop:
      target: dev
      outputs:
        dev:
          type: duckdb
          path: jaffle_shop.duckdb
          schema: dev
        prod:
          type: duckdb
          path: jaffle_shop.duckdb
          schema: main

Getting Started
5-minute walkthrough with the jaffle shop example
- Installation

      pip install recce

- Go to your dbt project

      cd your-dbt-project/

- Prepare artifacts for the base environment in the target-base/ folder

      git checkout main
      # Generate artifacts for base environment to 'target-base'
      dbt docs generate --target prod --target-path target-base/

- Prepare artifacts for the current working environment

      git checkout feature/my-awesome-feature
      # Run dbt and generate artifacts for current working environments
      dbt run
      dbt docs generate

- Run the recce server.

      recce server
      # or with persistent state
      # recce server issue-123.json

Recce diffs the environments described by target/ and target-base/.
Recce uses dbt artifacts, which are generated on every dbt invocation. You can find these files in the target/ folder.
Artifact | dbt command |
---|---|
manifest.json | dbt docs generate, dbt run, ... |
catalog.json | dbt docs generate |
[!TIP] You do not need to regenerate catalog.json after every dbt run. Regenerate it only when models or columns are added or updated.
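Before starting the server, it can help to confirm that both artifact folders are populated. The following is a minimal sketch, not part of Recce itself: the helper name missing_artifacts is hypothetical, and it assumes that you want both manifest.json and catalog.json in both target/ and target-base/.

```python
from pathlib import Path

# Artifact names taken from the table above. Whether every feature needs
# both files in both folders is an assumption of this sketch.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def missing_artifacts(project_dir, folders=("target", "target-base")):
    """Return the relative paths of expected artifacts that do not exist."""
    root = Path(project_dir)
    missing = []
    for folder in folders:
        for name in REQUIRED_ARTIFACTS:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling missing_artifacts(".") from the project root returns an empty list when all four files are present. Otherwise it lists the files to regenerate with the commands in the table above.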
Lineage Diff
Ad-Hoc Query and Query Diff
You can use any dbt macros installed in your project.
select * from {{ ref("mymodel") }}
A query diff compares the results from two environments, so the browser has to pull all of the result data to the client side. Keep the result set small (for example by selecting only the needed columns or aggregating) for efficiency and performance.
[!TIP] Hotkeys
- Cmd + Enter: Run query
- Cmd + Shift + Enter: Run query diff
Schema diff
To use schema diff, make sure that both environments have catalog.json.
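To illustrate why both catalogs are needed, here is a minimal sketch of a column-level comparison between two catalog.json files. It is not Recce's implementation. It assumes the usual dbt catalog layout, where each entry under "nodes" has a "columns" mapping with a "type" field. The function names load_columns and schema_diff are hypothetical.

```python
import json
from pathlib import Path


def load_columns(catalog_path):
    """Map each node id in a dbt catalog.json to {column_name: column_type}."""
    catalog = json.loads(Path(catalog_path).read_text())
    return {
        node_id: {
            name: column.get("type")
            for name, column in node.get("columns", {}).items()
        }
        for node_id, node in catalog.get("nodes", {}).items()
    }


def schema_diff(base_columns, current_columns):
    """Compare one model's columns between the base and current environments."""
    base_names = set(base_columns)
    current_names = set(current_columns)
    return {
        "added": sorted(current_names - base_names),
        "removed": sorted(base_names - current_names),
        "type_changed": sorted(
            name
            for name in base_names & current_names
            if base_columns[name] != current_columns[name]
        ),
    }
```

For a single model you would call load_columns on target-base/catalog.json and on target/catalog.json, then pass the two column mappings for that model's node id to schema_diff.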
Row count diff
Profile diff
Profile diff uses the get_profile macro from dbt_profiler. Make sure that this package is installed in your project.

    packages:
      - package: data-mie/dbt_profiler
        version: <version>

Please refer to dbt-profiler for the definition of each profiling statistic.
Value diff
- Added: Newly added PKs.
- Removed: Removed PKs.
- Matched: For a column, the number of common PKs whose values match in both environments.
- Matched %: For a column, the ratio of matched values to common PKs.
PK: Primary key
Value diff uses the compare_column_values macro from audit-helper. Make sure that this package is installed in your project.

    packages:
      - package: dbt-labs/audit_helper
        version: <version>

Value diff requires you to select a column as the primary key. The catalog.json file is required to list the available columns.
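The Added, Removed, Matched, and Matched % figures defined above can be illustrated with a small in-memory sketch. This is not how Recce computes them (it delegates to the audit-helper macro in the warehouse). The function name value_diff and the list-of-dicts row format are assumptions made for illustration.

```python
def value_diff(base_rows, current_rows, primary_key):
    """Summarize a value diff between two lists of row dicts keyed by primary_key."""
    base = {row[primary_key]: row for row in base_rows}
    current = {row[primary_key]: row for row in current_rows}
    common = base.keys() & current.keys()

    summary = {
        "added": len(current.keys() - base.keys()),    # PKs only in current
        "removed": len(base.keys() - current.keys()),  # PKs only in base
        "columns": {},
    }

    # Every non-PK column seen in either environment.
    columns = sorted(
        {name for row in base_rows + current_rows for name in row} - {primary_key}
    )
    for column in columns:
        matched = sum(
            1 for pk in common if base[pk].get(column) == current[pk].get(column)
        )
        summary["columns"][column] = {
            "matched": matched,
            # Percentage of common PKs whose value matches; None if no common PKs.
            "matched_pct": 100 * matched / len(common) if common else None,
        }
    return summary
```

A column with a low Matched % on a large set of common PKs is usually the first place to look when reviewing a change.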
Checklist
When a query is worth recording, you can add it to the checklist, give it a title, and leave a description. This is helpful when you later post the results in PR comments.
Q&A
Q: How does recce connect to my data warehouse? Does recce support my data warehouse?
recce uses the dbt adapter to connect to your warehouse, so it should work with your data warehouse.
Q: What credentials does recce use to connect to the two environments?
Recce uses the same target in the profile to connect to your warehouse. If you use the default target dev, it uses that target's credentials to connect to both environments. Please make sure that these credentials can access both environments.
Download files
Source Distribution
File details
Details for the file recce-0.6.0.tar.gz.
File metadata
- Download URL: recce-0.6.0.tar.gz
- Upload date:
- Size: 674.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 714c2c9587cc918b91424cab805624deab0f22ea5523a917227e3f0109d6d857 |
MD5 | 11b447f3601b632c03392080153274c5 |
BLAKE2b-256 | 300caa77092e97c421a1a3d36df4ec335232dd9f57c0cfcaed79549033ef0903 |
Built Distribution
File details
Details for the file recce-0.6.0-py3-none-any.whl.
File metadata
- Download URL: recce-0.6.0-py3-none-any.whl
- Upload date:
- Size: 687.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 088b5621fb7d897416fbd8de24a43b86212284f4588f40cad7b59554bf709982 |
MD5 | 1ab18b52025fa04dc4309f054ddb6817 |
BLAKE2b-256 | 9ae46ba7d9a2e0e896ad98336f9c647fcc81bdd2b49a4f3233eec72d31497a62 |