recce

recce is an environment diff tool for dbt.

Features

  1. Supports both a Web UI and a CLI.
  2. Provides multiple diff tools, including lineage diff, schema diff, and query diff, with more planned.
  3. Uses the dbt-core adapter framework to connect to your data warehouse. No additional configuration is required.

Use cases

  1. During development, you can verify new results by comparing them with production before pushing your changes.
  2. While reviewing a PR, you can see the extent of the changes and their impact before merging.
  3. While troubleshooting, you can run ad-hoc diff queries to pinpoint root causes.

Usage

Prerequisites

You need at least two environments in your dbt project, for example one for development and another for production. You can prepare two targets with separate schemas in your dbt profile. Here is an example profiles.yml:

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: dev
    prod:
      type: duckdb
      path: jaffle_shop.duckdb
      schema: main

Getting Started

  1. Installation

    pip install recce
    
  2. Recce uses dbt artifacts to interact with your dbt project. You need to prepare the artifacts for the base environment.

    # transform the data in the data warehouse
    dbt run --target prod
    
    # generate the catalog.json
    dbt docs generate --target prod
    

    The artifacts are generated in the target/ directory. Copy them into the target-base/ directory, which serves as the base state for the diff.

    mkdir -p target-base/
    cp -R target/. target-base/
    
  3. Develop your awesome features

    # transform the data in the data warehouse
    dbt run
    
    # generate the catalog.json
    dbt docs generate
    
  4. Run the recce server

    recce server
    

    Then open the URL link in your browser.

  5. Check the lineage diff to see the modified nodes. Click a node to see its schema difference.

  6. Switch to the Query tab, then write and run a query diff. Recce runs the query against both environments and diffs the results.

    select * from {{ ref("mymodel") }}
    

    where ref is a Jinja function to reference a model name.
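
To make the schema difference in step 5 more concrete, here is a minimal sketch of comparing the columns of one node between two catalog.json files. It is not recce's implementation. The helper names `columns_of` and `schema_diff` are hypothetical, and the two catalogs are hand-written fragments standing in for target-base/catalog.json and target/catalog.json.

```python
from typing import Dict

Columns = Dict[str, str]  # column name -> data type


def columns_of(catalog: dict, node_id: str) -> Columns:
    """Extract {column name: type} for one node from a parsed catalog.json."""
    node = catalog.get("nodes", {}).get(node_id, {})
    return {name: meta.get("type", "") for name, meta in node.get("columns", {}).items()}


def schema_diff(base: Columns, current: Columns) -> Dict[str, list]:
    """Report columns that were added, removed, or changed type."""
    added = sorted(c for c in current if c not in base)
    removed = sorted(c for c in base if c not in current)
    retyped = sorted(
        (c, base[c], current[c])
        for c in base
        if c in current and base[c] != current[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical fragments; real catalogs would be loaded with json.load().
base_catalog = {"nodes": {"model.jaffle_shop.customers": {"columns": {
    "customer_id": {"type": "INTEGER"},
    "first_name": {"type": "VARCHAR"},
    "total": {"type": "INTEGER"},
}}}}
current_catalog = {"nodes": {"model.jaffle_shop.customers": {"columns": {
    "customer_id": {"type": "INTEGER"},
    "total": {"type": "DOUBLE"},
    "email": {"type": "VARCHAR"},
}}}}

node = "model.jaffle_shop.customers"
result = schema_diff(columns_of(base_catalog, node), columns_of(current_catalog, node))
print(result)
```

In this example the diff reports `email` as added, `first_name` as removed, and `total` as changed from INTEGER to DOUBLE.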

Query Diff

You can run a query diff in both the Web UI and the CLI.

  • Web UI: Go to the Query tab

    select * from {{ ref("mymodel") }}
    
  • CLI:

    recce diff --sql 'select * from {{ ref("mymodel") }}'
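
As a rough illustration of what running one query on both sides means, the sketch below resolves the same `ref` call against two schemas. It assumes the `dev` and `main` schemas from the example profiles.yml. The regular expression is a deliberately tiny stand-in for dbt's Jinja rendering, and `render_for_schema` is a hypothetical helper, not part of recce or dbt.

```python
import re

# Matches {{ ref("name") }} or {{ ref('name') }}.
# A simplified stand-in for dbt's Jinja rendering, for illustration only.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*["']([^"']+)["']\s*\)\s*\}\}""")


def render_for_schema(sql: str, schema: str) -> str:
    """Replace each ref() call with a schema-qualified relation name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


sql = 'select * from {{ ref("mymodel") }}'

base_sql = render_for_schema(sql, "main")    # production side
current_sql = render_for_schema(sql, "dev")  # development side

print(base_sql)
print(current_sql)
```

The two rendered statements differ only in the schema they read from, which is why a single query can be compared across environments.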
    

Primary key

In a query diff, the primary key columns identify each record uniquely, so that rows from the two sides can be matched with each other.

  • Web UI: In the query result, click the key icon in a column header to add the column to, or remove it from, the primary key list.

  • CLI: Use the --primary-keys option to specify the primary keys. For a compound key, separate the columns with commas.

    recce diff --primary-keys event_id --sql 'select * from {{ ref("events") }} order by 1'
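
The role of the primary key can be sketched as follows: rows from each side are indexed by their key values, and the two indexes are then compared. This is an assumption about the general technique, not recce's actual code. The function names and the sample `events` rows are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def index_by_key(rows: Iterable[Row], primary_keys: Sequence[str]) -> Dict[Tuple, Row]:
    """Map each row to a tuple of its primary key values."""
    indexed: Dict[Tuple, Row] = {}
    for row in rows:
        key = tuple(row[k] for k in primary_keys)
        if key in indexed:
            raise ValueError(f"duplicate primary key: {key}")
        indexed[key] = row
    return indexed


def diff_by_key(base: Iterable[Row], current: Iterable[Row],
                primary_keys: Sequence[str]) -> Dict[str, List[Tuple]]:
    """Classify keys as added, removed, or changed between the two sides."""
    base_idx = index_by_key(base, primary_keys)
    cur_idx = index_by_key(current, primary_keys)
    return {
        "added": sorted(k for k in cur_idx if k not in base_idx),
        "removed": sorted(k for k in base_idx if k not in cur_idx),
        "changed": sorted(
            k for k in base_idx if k in cur_idx and base_idx[k] != cur_idx[k]
        ),
    }


# Hypothetical query results from the base and current environments.
base_rows = [
    {"event_id": 1, "type": "click"},
    {"event_id": 2, "type": "view"},
    {"event_id": 3, "type": "click"},
]
current_rows = [
    {"event_id": 2, "type": "purchase"},
    {"event_id": 3, "type": "click"},
    {"event_id": 4, "type": "view"},
]

changes = diff_by_key(base_rows, current_rows, ["event_id"])
print(changes)
```

Without a key, the two result sets could only be compared by position, so a single inserted row would make every following row look different.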
    

Q&A

Q: How does recce connect to my data warehouse? Does recce support my data warehouse?

recce uses the dbt adapter to connect to your warehouse, so it should work with your data warehouse.

Q: Which credentials does recce use to connect to the two environments?

Recce uses the same target in the profile to connect to your warehouse. If you use the default target dev, recce uses that target's credentials to connect to both environments, so make sure those credentials can access both.
