dbt relation-export wrapper for DataForge data-quality repair audits.
dataforge-dbt
Run DataForge checks from a dbt model post-hook and keep the audit artifacts in your dbt target/ directory.
python -m pip install -e ".[dev]"
Quick start
1. Install the Python dependency
The PyPI package is not published yet. Install from this source checkout:
python -m pip install -e ".[dev]"
Once the package is published to PyPI, pip install dataforge_07_dbt will
install the integration together with the supported dbt-duckdb adapter, which
backs the free local integration-test warehouse.
2. Add the dbt package
In packages.yml:
packages:
  - package: dataforge/dataforge
    version: 0.1.0
During local development from a checkout, use:
packages:
  - local: ../dataforge-dbt
Then run:
dbt deps
3. Add a model post-hook
{{ config(post_hook="{{ dataforge.dataforge_repair('column_x', mode='dry_run') }}") }}
select *
from {{ ref('my_source_model') }}
Supported modes:
- dry_run: logs proposed DataForge findings as warnings and writes no transaction artifact.
- apply: writes a JSONL audit artifact under target/dataforge_txns/.
- refuse: fails the hook if DataForge detects any UNSAFE issue.
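The three modes can be pictured as one small dispatch function. The sketch below is illustrative only: the Issue class, the enforce_mode function, the severity labels, and the artifact file name are assumptions made for this example, not the package's actual API.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Issue:
    """Hypothetical stand-in for a DataForge finding (not the real class)."""
    column: str
    severity: str  # assumed labels, e.g. "SAFE" or "UNSAFE"
    message: str


def enforce_mode(mode: str, issues: List[Issue], target_path: Path) -> Optional[Path]:
    """Mimic the documented mode semantics; returns the artifact path for 'apply'."""
    if mode == "dry_run":
        # Log findings as warnings and write nothing to disk.
        for issue in issues:
            print(f"WARNING DATAFORGE_DBT {issue.column}: {issue.message}")
        return None
    if mode == "apply":
        # Append one JSON object per finding to a JSONL file.
        out_dir = target_path / "dataforge_txns"
        out_dir.mkdir(parents=True, exist_ok=True)
        artifact = out_dir / "example_txn.jsonl"  # hypothetical file name
        with artifact.open("a", encoding="utf-8") as fh:
            for issue in issues:
                fh.write(json.dumps(asdict(issue)) + "\n")
        return artifact
    if mode == "refuse":
        # Fail as soon as any UNSAFE finding is present.
        unsafe = [i for i in issues if i.severity == "UNSAFE"]
        if unsafe:
            raise RuntimeError(f"{len(unsafe)} UNSAFE issue(s) detected")
        return None
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch only apply touches the filesystem, which matches the description of dry_run as writing no transaction artifact.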
4. Run dbt through the dispatcher
dataforge-dbt \
--relation main.my_model \
--column column_x \
--mode dry_run \
--target-path target \
--project-dir . \
--run-dbt
The macro emits a stable log line showing the dispatcher command for the
relation and column. The Python dispatcher is the execution boundary: it can
run dbt, export the relation through the adapter, run DataForge detectors, and
write audit artifacts under target/. The macro itself remains SQL-only
because dbt post-hooks do not provide a portable, safe shell execution surface.
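For automation, the dispatcher invocation from step 4 can be assembled programmatically. This is a minimal sketch that only builds the argument list from the flags shown above; build_dispatch_command is a hypothetical helper name, and nothing here is taken from the package's internals.

```python
import shlex
from typing import List


def build_dispatch_command(
    relation: str,
    column: str,
    mode: str = "dry_run",
    target_path: str = "target",
    project_dir: str = ".",
    run_dbt: bool = True,
) -> List[str]:
    """Assemble the dataforge-dbt argv using the flags from the quick start."""
    cmd = [
        "dataforge-dbt",
        "--relation", relation,
        "--column", column,
        "--mode", mode,
        "--target-path", target_path,
        "--project-dir", project_dir,
    ]
    if run_dbt:
        cmd.append("--run-dbt")
    return cmd


cmd = build_dispatch_command("main.my_model", "column_x")
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to hand to subprocess.run(cmd, check=True) without shell quoting concerns.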
5. Verify transaction artifacts
For mode='apply', verify that the target directory contains a DataForge transaction artifact:
ls target/dataforge_txns/
Each file is JSONL and includes the dbt relation, inspected column, UTC timestamp, and serialized DataForge issues.
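Because each artifact is JSONL, it can be inspected with a few lines of Python. The sketch below makes no assumption about field names or file extensions; read_transactions is a hypothetical helper that simply parses every non-empty line of every file in the directory.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def read_transactions(target_path: str = "target") -> List[Tuple[str, int, Dict[str, Any]]]:
    """Return (file name, line number, parsed record) for every JSONL line."""
    txn_dir = Path(target_path) / "dataforge_txns"
    records = []
    for path in sorted(p for p in txn_dir.glob("*") if p.is_file()):
        with path.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                records.append((path.name, line_no, json.loads(line)))
    return records
```

Calling read_transactions("target") after an apply run and printing the keys of each record is a quick way to confirm what a given artifact actually contains.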
Python dispatch entrypoint
The package also ships a Python command used by tests and automation:
dataforge-dbt \
--relation main.example_with_dirty_data \
--column column_x \
--mode dry_run \
--input-csv integration_tests/dbt_project/seeds/dirty_decimal_shift.csv \
--target-path target
This command reads the CSV with dtype=str, runs DataForge detectors, logs any findings with the prefix DATAFORGE_DBT, and enforces the selected mode.
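Reading every column as a string keeps the original text of each value available to the detectors, whereas numeric coercion can erase formatting details. The sketch below illustrates that effect with the standard-library csv module, which always yields strings; the sample rows are invented for this example and are not the project's seed data.

```python
import csv
import io

# Invented sample data; only the column name comes from the examples above.
SAMPLE = "id,column_x\n1,12.50\n2,1250\n3,012.5\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# String values preserve trailing zeros, leading zeros, and missing decimal points.
raw = [row["column_x"] for row in rows]

# Numeric coercion collapses distinct spellings into the same number.
coerced = [float(value) for value in raw]

print(raw)      # ['12.50', '1250', '012.5']
print(coerced)  # [12.5, 1250.0, 12.5]
```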
Configuration block
You may add a conservative integration block to profiles.yml:
my_profile:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ./dev.duckdb
  dataforge:
    mode: dry_run
    target_path: target
Macro arguments remain the source of truth for the common case. The profile
block is useful for central defaults in CI; an explicit hook or dispatcher
--mode always overrides dataforge.mode from profiles.yml.
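The precedence rule can be expressed as a small resolver. This is a sketch under stated assumptions: resolve_mode is a hypothetical name, and the fallback to dry_run when neither source sets a mode is an assumption of this example rather than documented behavior.

```python
from typing import Any, Dict, Optional

VALID_MODES = {"dry_run", "apply", "refuse"}
DEFAULT_MODE = "dry_run"  # assumed fallback, not confirmed by the package


def resolve_mode(explicit: Optional[str], profile_block: Optional[Dict[str, Any]]) -> str:
    """Explicit hook or --mode value wins over dataforge.mode from profiles.yml."""
    profile_mode = (profile_block or {}).get("mode")
    mode = explicit or profile_mode or DEFAULT_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode


print(resolve_mode("refuse", {"mode": "dry_run"}))  # refuse
print(resolve_mode(None, {"mode": "apply"}))        # apply
```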
Free-tier test setup
The integration tests use DuckDB only. No Snowflake, BigQuery, or paid warehouse account is required.
python -m pip install -e ".[dev]"
python -m pytest
The test project includes a seed CSV with a known decimal-shift anomaly and asserts that DataForge logs the issue.
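To give a feel for what a decimal-shift anomaly is, the sketch below flags values that differ from the column median by roughly a power of ten. It is a toy heuristic written for this explanation; find_decimal_shifts and its tolerance parameter are hypothetical and do not describe how DataForge's detectors work.

```python
import math
import statistics
from typing import List, Tuple


def find_decimal_shifts(values: List[float], tolerance: float = 0.1) -> List[Tuple[int, int]]:
    """Return (index, shift) pairs where value is close to median * 10**shift."""
    positives = [v for v in values if v > 0]
    if not positives:
        return []
    median = statistics.median(positives)
    flagged = []
    for index, value in enumerate(values):
        if value <= 0:
            continue  # the log ratio is undefined for non-positive values
        exponent = math.log10(value / median)
        shift = round(exponent)
        # Flag only non-zero shifts that sit close to a whole power of ten.
        if shift != 0 and abs(exponent - shift) <= tolerance:
            flagged.append((index, shift))
    return flagged


print(find_decimal_shifts([12.5, 13.1, 1250.0, 12.9]))  # [(2, 2)]
```

Here the third value is about 100 times the median, so it is reported with a shift of 2 decimal places.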
Publishing
The package is not published yet. The intended release path is PyPI Trusted Publishing after repository ownership is configured; do not add PyPI API tokens to GitHub Secrets.
When this is the wrong tool
Use dbt tests or warehouse constraints first for deterministic rules such as not_null, unique, accepted values, and referential integrity. Use dataforge-dbt when you want DataForge's anomaly detectors and repair audit trail around messy values that are hard to express as static SQL tests.
File details
Details for the file dataforge_07_dbt-0.1.0.tar.gz.
File metadata
- Download URL: dataforge_07_dbt-0.1.0.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ddbf623c788acb04383b97d311e3e4725d3ee2dfbceee5e94db01d1b5163c873 |
| MD5 | f9d391244d0d5dc62fbcf251a0550ee7 |
| BLAKE2b-256 | 8a0443df72e3f9c32574f90fff4e8a9ffed2c0c7d486ce3a229987dc34ff8776 |
Provenance
The following attestation bundles were made for dataforge_07_dbt-0.1.0.tar.gz:
Publisher: publish-dataforge-dbt.yml on Aegis15/dataforge
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dataforge_07_dbt-0.1.0.tar.gz
- Subject digest: ddbf623c788acb04383b97d311e3e4725d3ee2dfbceee5e94db01d1b5163c873
- Sigstore transparency entry: 1807565107
- Sigstore integration time:
- Permalink: Aegis15/dataforge@d498b656734241e343673fafe1b11676b475bf60
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Aegis15
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-dataforge-dbt.yml@d498b656734241e343673fafe1b11676b475bf60
- Trigger Event: workflow_dispatch
File details
Details for the file dataforge_07_dbt-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dataforge_07_dbt-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26f6ba9a1642ab8f8951321bb8c80ef63f85e239ce38f5d4a9011dc2a4a9955f |
| MD5 | fa8409969943c555f62c25940a27d7f1 |
| BLAKE2b-256 | c27b9efb59ac9d7162d8e65dbb2ab4cd2a0c417a03061a1d87256b2fa80ed329 |
Provenance
The following attestation bundles were made for dataforge_07_dbt-0.1.0-py3-none-any.whl:
Publisher: publish-dataforge-dbt.yml on Aegis15/dataforge
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dataforge_07_dbt-0.1.0-py3-none-any.whl
- Subject digest: 26f6ba9a1642ab8f8951321bb8c80ef63f85e239ce38f5d4a9011dc2a4a9955f
- Sigstore transparency entry: 1807565148
- Sigstore integration time:
- Permalink: Aegis15/dataforge@d498b656734241e343673fafe1b11676b475bf60
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Aegis15
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-dataforge-dbt.yml@d498b656734241e343673fafe1b11676b475bf60
- Trigger Event: workflow_dispatch