
Environment diff tool for dbt


Recce


Recce is a PR review tool designed for dbt projects. It compares results between two environments, such as development and production, and helps you identify the differences.

Features

Use cases

Recce is primarily designed for PR review. However, it also fits the following use cases:

  1. During development, you can verify new results by comparing them with production before pushing your changes.
  2. While reviewing a PR, you can grasp the extent of the changes and their impact before merging.
  3. For troubleshooting, you can execute ad-hoc diff queries to pinpoint root causes.

Usage

Prerequisites

You need at least two environments in your dbt project, for example one for development and another for production. You can prepare two targets with separate schemas in your dbt profile. Here is an example profiles.yml:

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: dev
    prod:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: main
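
As a quick sanity check of the prerequisite above, the following is a minimal sketch, not part of Recce. It assumes the profile has already been parsed into a plain Python dict with the same shape as the YAML example; the helper name `has_separate_schemas` is hypothetical.

```python
def has_separate_schemas(profile):
    """Return True if the profile has at least two targets, each with its own schema.

    `profile` is assumed to be the parsed content of one profile entry,
    e.g. the value under `jaffle_shop:` in the example profiles.yml.
    """
    outputs = profile.get("outputs", {})
    schemas = [output.get("schema") for output in outputs.values()]
    # At least two targets, no missing schema, and no schema shared by two targets.
    return (
        len(schemas) >= 2
        and all(schema is not None for schema in schemas)
        and len(set(schemas)) == len(schemas)
    )


if __name__ == "__main__":
    # Mirrors the profiles.yml example above.
    jaffle_shop = {
        "target": "dev",
        "outputs": {
            "dev": {"type": "duckdb", "path": "jaffle_shop.duckdb", "schema": "dev"},
            "prod": {"type": "duckdb", "path": "jaffle_shop.duckdb", "schema": "main"},
        },
    }
    print(has_separate_schemas(jaffle_shop))
```

The check only looks at schema names because that is how the example separates environments; projects that separate environments differently would need a different rule.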

Getting Started

A 5-minute walkthrough using the jaffle shop example

  1. Installation

    pip install recce
    
  2. Go to your dbt project

    cd your-dbt-project/
    
  3. Prepare artifacts for base environment in target-base/ folder

    git checkout main
    
    # Generate artifacts for base environment to 'target-base'
    dbt docs generate --target prod --target-path target-base/
    
  4. Prepare artifacts for current working environment

    git checkout feature/my-awesome-feature
    
    # Run dbt and generate artifacts for the current working environment
    dbt run
    dbt docs generate
    
  5. Run the recce server.

    recce server
    
    # or with persistent state
    # recce server issue-123.json
    

    Recce diffs the environments described by target/ and target-base/.

Recce uses dbt artifacts, which are generated on every dbt invocation. You can find these files in the target/ folder.

| artifacts | dbt command |
| --- | --- |
| manifest.json | dbt docs generate, dbt run, .. |
| catalog.json | dbt docs generate |

[!TIP] You do not need to regenerate catalog.json after every dbt run. Regenerate it only when models or columns are added or updated.
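
Before starting the server, it can help to confirm that both artifact folders are populated. The following is a minimal sketch, not a Recce feature. The helper name `missing_artifacts` is hypothetical, and the folder names `target` and `target-base` are taken from the walkthrough above.

```python
from pathlib import Path

# Artifact file names listed in the table above.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def missing_artifacts(project_dir, folders=("target", "target-base")):
    """Return relative paths of the artifacts that are not present on disk."""
    root = Path(project_dir)
    missing = []
    for folder in folders:
        for name in REQUIRED_ARTIFACTS:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    # Run from the root of the dbt project.
    for path in missing_artifacts("."):
        print(f"missing: {path}")
```

An empty result means both environments have a manifest.json and a catalog.json; any path reported can be regenerated with the dbt commands from the table.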

Lineage Diff

Ad-Hoc Query and Query Diff

You can use any dbt macros installed in your project.

select * from {{ ref("mymodel") }}

In the query diff, which involves comparing the results from two different environments, the browser is required to pull all result data to the client side. Consequently, minimizing the data volume in the query results is essential for efficiency and performance.

[!TIP] Hotkeys
  Cmd + Enter: Run query
  Cmd + Shift + Enter: Run query diff

Schema diff

To use schema diff, make sure that both environments have catalog.json.
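
To illustrate why catalog.json is needed on both sides, here is a minimal sketch of a column-level comparison. It is not Recce's implementation. It assumes each catalog has been loaded into a dict whose `nodes` entries carry a `columns` mapping with a `type` per column; the function names and the sample node ID are hypothetical.

```python
def column_types(catalog, node_id):
    """Map column name -> type for one node of a parsed catalog.json."""
    node = catalog.get("nodes", {}).get(node_id, {})
    return {name: col.get("type") for name, col in node.get("columns", {}).items()}


def schema_diff(base_catalog, current_catalog, node_id):
    """Compare the columns of one node between the base and current catalogs."""
    base = column_types(base_catalog, node_id)
    current = column_types(current_catalog, node_id)
    shared = base.keys() & current.keys()
    return {
        "added": sorted(current.keys() - base.keys()),
        "removed": sorted(base.keys() - current.keys()),
        "type_changed": sorted(name for name in shared if base[name] != current[name]),
    }


if __name__ == "__main__":
    node_id = "model.jaffle_shop.customers"  # hypothetical node ID
    base = {"nodes": {node_id: {"columns": {
        "customer_id": {"type": "INTEGER"},
        "first_name": {"type": "VARCHAR"},
    }}}}
    current = {"nodes": {node_id: {"columns": {
        "customer_id": {"type": "BIGINT"},
        "lifetime_value": {"type": "DOUBLE"},
    }}}}
    print(schema_diff(base, current, node_id))
```

If a node is missing from either catalog, the sketch treats it as having no columns, which is why a stale or absent catalog.json would show up as spurious added or removed columns.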

Row count diff

Profile diff

Profile diff uses the get_profile macro from dbt_profiler. Make sure that this package is installed in your project.

packages:
  - package: data-mie/dbt_profiler
    version: <version>

Refer to dbt-profiler for the definition of each profiling statistic.

Value diff

  1. Added: Newly added PKs.
  2. Removed: Removed PKs.
  3. Matched: For a column, the count of matching values among the common PKs.
  4. Matched %: For a column, the ratio of matched values to common PKs.

PK: Primary key
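
The four statistics above can be sketched on in-memory data as follows. This is only an illustration of the definitions, not how Recce computes them. It assumes each environment's rows are held in a dict keyed by primary key; the function name `value_diff` and the sample data are hypothetical.

```python
def value_diff(base_rows, current_rows, columns):
    """Summarize differences between two row sets keyed by primary key.

    base_rows / current_rows: dict mapping PK -> dict of column -> value.
    """
    common = base_rows.keys() & current_rows.keys()
    summary = {
        "added": len(current_rows.keys() - base_rows.keys()),    # PKs only in current
        "removed": len(base_rows.keys() - current_rows.keys()),  # PKs only in base
        "columns": {},
    }
    for column in columns:
        matched = sum(
            1 for pk in common
            if base_rows[pk].get(column) == current_rows[pk].get(column)
        )
        # Matched % is left as None when there are no common PKs to compare.
        ratio = matched / len(common) if common else None
        summary["columns"][column] = {"matched": matched, "matched_pct": ratio}
    return summary


if __name__ == "__main__":
    base = {
        1: {"status": "placed", "amount": 10},
        2: {"status": "shipped", "amount": 20},
        3: {"status": "returned", "amount": 30},
    }
    current = {
        2: {"status": "shipped", "amount": 25},
        3: {"status": "returned", "amount": 30},
        4: {"status": "placed", "amount": 40},
    }
    print(value_diff(base, current, ["status", "amount"]))
```

In this sample, PK 4 is added, PK 1 is removed, and PKs 2 and 3 are common: status matches for both, while amount matches for only one of them.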

Value diff uses the compare_column_values macro from audit_helper. Make sure that this package is installed in your project.

packages:
  - package: dbt-labs/audit_helper
    version: <version>

Value diff requires you to select a column as the primary key. The catalog.json is required to list the available columns.

Checklist

When a query is worth recording, you can add it to the checklist and give it a title and a description. This is helpful when you later post the results in PR comments.

Q&A

Q: How does Recce connect to my data warehouse? Does Recce support my data warehouse?

Recce uses the dbt adapter to connect to your warehouse, so it should work with the warehouse your dbt project already uses.

Q: Which credentials does Recce use to connect to the two environments?

Recce uses a single target from your profile to connect to your warehouse. If you use the default target, dev, Recce uses that target's credentials to connect to both environments, so make sure those credentials can access both.
