dbt relation-export wrapper for DataForge data-quality repair audits.

dataforge-dbt

Run DataForge checks from a dbt model post-hook and keep the audit artifacts in your dbt target/ directory.

python -m pip install -e ".[dev]"

Quick start

1. Install the Python dependency

The PyPI package is not published yet. Install from this source checkout:

python -m pip install -e ".[dev]"

Once the package is published to PyPI, pip install dataforge_07_dbt will install the integration along with dbt-duckdb, the supported adapter for the free local integration-test warehouse.

2. Add the dbt package

In packages.yml:

packages:
  - package: dataforge/dataforge
    version: 0.1.0

During local development from a checkout, use:

packages:
  - local: ../dataforge-dbt

Then run:

dbt deps

3. Add a model post-hook

{{ config(post_hook="{{ dataforge.dataforge_repair('column_x', mode='dry_run') }}") }}

select *
from {{ ref('my_source_model') }}

Supported modes:

  • dry_run: logs proposed DataForge findings as warnings and writes no transaction artifact.
  • apply: writes a JSONL audit artifact under target/dataforge_txns/.
  • refuse: fails the hook if DataForge detects any UNSAFE issue.
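The three modes can be read as a small decision table. The sketch below is one way to picture it, not the package's implementation: the Issue class, the enforce_mode function, and the example.jsonl file name are all hypothetical; only the mode names, the target/dataforge_txns/ location, and the DATAFORGE_DBT log prefix come from this README.

```python
import json
import logging
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger("dataforge_dbt_sketch")


@dataclass
class Issue:
    """Hypothetical stand-in for a single DataForge finding."""
    column: str
    message: str
    unsafe: bool = False


def enforce_mode(mode, issues, target_path="target"):
    """Mimic the documented behavior of each mode.

    Returns the artifact path for 'apply', otherwise None.
    """
    if mode == "dry_run":
        # Findings are only logged as warnings; nothing is written.
        for issue in issues:
            logger.warning("DATAFORGE_DBT %s: %s", issue.column, issue.message)
        return None
    if mode == "apply":
        # One JSON object per line under <target>/dataforge_txns/.
        txn_dir = Path(target_path) / "dataforge_txns"
        txn_dir.mkdir(parents=True, exist_ok=True)
        artifact = txn_dir / "example.jsonl"  # hypothetical file name
        with artifact.open("a", encoding="utf-8") as handle:
            for issue in issues:
                handle.write(json.dumps(vars(issue)) + "\n")
        return artifact
    if mode == "refuse":
        # Any UNSAFE finding aborts the run.
        unsafe = [issue for issue in issues if issue.unsafe]
        if unsafe:
            raise RuntimeError(f"{len(unsafe)} UNSAFE issue(s) detected")
        return None
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch, refuse raises an exception so that the calling process exits non-zero; how the real hook reports failure is not specified here.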

4. Run dbt through the dispatcher

dataforge-dbt \
  --relation main.my_model \
  --column column_x \
  --mode dry_run \
  --target-path target \
  --project-dir . \
  --run-dbt

The macro emits a stable log line showing the dispatcher command for the relation and column. The Python dispatcher is the execution boundary: it can run dbt, export the relation through the adapter, run DataForge detectors, and write audit artifacts under target/. The macro itself stays SQL-only because dbt post-hooks offer no portable, safe way to execute shell commands.
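For automation, the dispatcher command can be assembled as an argument list rather than a shell string. The helper below is a hypothetical convenience, not part of the package; it uses only the flags shown in the command above.

```python
import shlex


def build_dispatch_command(relation, column, mode="dry_run",
                           target_path="target", project_dir=".",
                           run_dbt=False):
    """Return the dataforge-dbt invocation as an argv list."""
    argv = [
        "dataforge-dbt",
        "--relation", relation,
        "--column", column,
        "--mode", mode,
        "--target-path", target_path,
        "--project-dir", project_dir,
    ]
    if run_dbt:
        # Flag taken from the example above; it carries no value.
        argv.append("--run-dbt")
    return argv


command = build_dispatch_command("main.my_model", "column_x", run_dbt=True)
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) avoids shell quoting problems with unusual relation or column names.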

5. Verify transaction artifacts

For mode='apply', verify that the target directory contains a DataForge transaction artifact:

ls target/dataforge_txns/

Each file is JSONL and includes the dbt relation, inspected column, UTC timestamp, and serialized DataForge issues.
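To inspect the artifacts programmatically, each line can be parsed as an independent JSON object. This is a minimal sketch: the read_transactions name and the .jsonl file extension are assumptions, and no field names are assumed because the exact keys are not listed here.

```python
import json
from pathlib import Path


def read_transactions(target_path="target"):
    """Load every JSONL record found under <target>/dataforge_txns/."""
    txn_dir = Path(target_path) / "dataforge_txns"
    if not txn_dir.is_dir():
        return []
    records = []
    # Assumes artifacts end in .jsonl; adjust the pattern if they do not.
    for artifact in sorted(txn_dir.glob("*.jsonl")):
        with artifact.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    records.append(json.loads(line))
    return records


for record in read_transactions("target"):
    print(sorted(record))  # show which keys each record carries
```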

Python dispatch entrypoint

The package also ships a Python command used by tests and automation:

dataforge-dbt \
  --relation main.example_with_dirty_data \
  --column column_x \
  --mode dry_run \
  --input-csv integration_tests/dbt_project/seeds/dirty_decimal_shift.csv \
  --target-path target

This command reads the CSV with dtype=str, runs DataForge detectors, logs any findings with the prefix DATAFORGE_DBT, and enforces the selected mode.
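Reading every value as a string matters because numeric coercion discards the formatting that anomaly detectors may rely on. The illustration below uses the standard-library csv module, which also yields strings, in place of the dtype=str option; the sample data is invented and is not the contents of the seed file.

```python
import csv
import io

# Invented sample; leading zeros and trailing zeros are deliberate.
SAMPLE = "id,column_x\n001,12.50\n002,1250\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

raw = [row["column_x"] for row in rows]    # values exactly as written
coerced = [float(value) for value in raw]  # what numeric parsing keeps

print(raw)      # ['12.50', '1250']
print(coerced)  # [12.5, 1250.0]
```

After coercion the trailing zero in 12.50 is gone, and the id 001 would become 1 if parsed as an integer, so the original text can no longer be audited.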

Configuration block

You may add an optional dataforge block to profiles.yml to set conservative defaults:

my_profile:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ./dev.duckdb
  dataforge:
    mode: dry_run
    target_path: target

Macro arguments remain the source of truth for the common case. The profile block is useful for central defaults in CI; a mode passed explicitly in the hook or through the dispatcher's --mode flag always overrides dataforge.mode from profiles.yml.
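The precedence rule can be sketched as a small lookup. The resolve_mode function and the dry_run fallback used when nothing is configured are assumptions for illustration; only the rule that an explicit mode beats dataforge.mode comes from the text above.

```python
def resolve_mode(explicit_mode=None, profile=None, fallback="dry_run"):
    """Pick the effective mode: explicit value first, then the profile block."""
    if explicit_mode is not None:
        return explicit_mode
    # profile is the parsed mapping for one profile, e.g. my_profile above.
    dataforge_block = (profile or {}).get("dataforge") or {}
    return dataforge_block.get("mode", fallback)


profile = {"target": "dev", "dataforge": {"mode": "refuse", "target_path": "target"}}

print(resolve_mode("apply", profile))  # apply
print(resolve_mode(None, profile))     # refuse
```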

Free-tier test setup

The integration tests use DuckDB only. No Snowflake, BigQuery, or paid warehouse account is required.

python -m pip install -e ".[dev]"
python -m pytest

The test project includes a seed CSV with a known decimal-shift anomaly and asserts that DataForge logs the issue.

Publishing

The package is not published yet. The intended release path is PyPI Trusted Publishing after repository ownership is configured; do not add PyPI API tokens to GitHub Secrets.

When this is the wrong tool

Use dbt tests or warehouse constraints first for deterministic rules such as not_null, unique, accepted values, and referential integrity. Use dataforge-dbt when you want DataForge's anomaly detectors and repair audit trail around messy values that are hard to express as static SQL tests.
