
dbt-dqlens


Data quality for dbt, without writing tests.

dbt-dqlens brings auto-generated data quality checks into your dbt project. It profiles your models, detects problems (null spikes, schema drift, empty strings, row count anomalies), and runs checks as native dbt tests.

You don't write tests. DQLens writes them for you.

Quick Start

1. Install

Add to your packages.yml:

packages:
  - package: vahid110/dbt_dqlens
    version: 0.2.0

Then:

dbt deps

2. Profile your models

dbt run --select dqlens_profile_results

This profiles every table in your schema (null rates, distinct counts, empty strings, value ranges) and stores the results in your warehouse.
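The per-column metrics above are all computable with plain SQL. A minimal sketch against SQLite (one of the supported databases) of what a null-rate / distinct-count / empty-string profile looks like — the `users` table and `email` column are invented for the example, not part of the package:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [("a@x.com",), ("b@x.com",), ("",), (None,)])

# Per-column profile: null rate, distinct count, empty-string count
row = conn.execute("""
    SELECT
        1.0 * SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS null_rate,
        COUNT(DISTINCT email)                                           AS distinct_count,
        SUM(CASE WHEN email = '' THEN 1 ELSE 0 END)                     AS empty_strings
    FROM users
""").fetchone()
print(row)  # (0.25, 3, 1)
```

Note the last two rows: the `''` value inflates the apparent non-null count, which is exactly why the empty-string metric is profiled separately.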

3. Run checks

dbt test --select tag:dqlens

The catch-all test compares the current profile against the previous one and flags what changed. No configuration needed.

That's it.

Two dbt commands. No CLI. No Python. No YAML to write. Everything stays inside dbt.

Alternative: CLI approach

If you prefer a CLI workflow (e.g., for CI pipelines outside dbt), the full sequence is:

pip install dbt-dqlens
dqlens-dbt profile        # profiles models using your profiles.yml
dqlens-dbt generate-tests # outputs _dqlens_tests.yml
dbt test --select tag:dqlens

1. Install

pip install dbt-dqlens

2. Profile your models

After dbt run, profile your warehouse:

dqlens-dbt profile

This reads your profiles.yml, connects to the same warehouse dbt uses, profiles every model, and stores baselines.

3. Generate tests

dqlens-dbt generate-tests

This creates a _dqlens_tests.yml file with auto-generated tests for every model. Review it, commit it, done.
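The exact contents depend on what profiling found. As a rough illustration of the shape (the model and column names are placeholders, and only the standard dbt `not_null` test is shown — the package's own generated test names may differ):

```yaml
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - not_null:
              tags: [dqlens]
```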

4. Run tests

dbt test --select tag:dqlens

Your auto-generated tests run as native dbt tests. Failures show up in dbt docs, dbt Cloud, and your CI pipeline.

What it detects

| Check | What it catches |
| --- | --- |
| Null drift | Null rate increased significantly from baseline |
| Schema drift | Columns added, removed, or type changed |
| Orphaned records | FK references to non-existent rows |
| Empty strings | Columns full of `''` that look non-null but aren't |
| Outliers | Values beyond 1.5x IQR bounds |
| Row count anomalies | Unusual growth or shrinkage |
| Freshness | Data that hasn't been updated recently |
| Pattern violations | Values that don't match detected patterns (email, UUID, etc.) |
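The 1.5x IQR rule is the classic Tukey fence: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged. A quick pure-Python illustration of the bounds it implies (not the package's actual implementation):

```python
def iqr_bounds(values):
    """Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers."""
    xs = sorted(values)

    def quantile(q):
        # Linear interpolation between the two closest ranks
        pos = q * (len(xs) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [10, 12, 11, 13, 12, 11, 95]  # 95 is the obvious outlier
low, high = iqr_bounds(data)
outliers = [x for x in data if x < low or x > high]
print(outliers)  # [95]
```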

How it works

dbt run --select dqlens_profile_results   (profiles all tables, stores in warehouse)
    |
dbt test --select tag:dqlens              (compares current vs baseline, flags changes)

On the first run, it profiles and stores a baseline. On subsequent runs, it compares against the previous profile and flags drift: null spikes, schema changes, row count anomalies, empty strings.
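The baseline comparison can be pictured as a self-join on the stored profile table. A simplified sketch in SQLite — the `profile_results` table and its columns are invented for illustration, not the package's internal layout:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE profile_results
                (run_id INTEGER, table_name TEXT, column_name TEXT, null_rate REAL)""")
conn.executemany("INSERT INTO profile_results VALUES (?, ?, ?, ?)", [
    (1, "orders", "customer_id", 0.01),  # baseline run
    (2, "orders", "customer_id", 0.30),  # current run: null spike
])

# Flag columns whose null rate jumped by more than 0.1 since the previous run
findings = conn.execute("""
    SELECT cur.table_name, cur.column_name, base.null_rate, cur.null_rate
    FROM profile_results cur
    JOIN profile_results base
      ON base.table_name  = cur.table_name
     AND base.column_name = cur.column_name
     AND base.run_id      = cur.run_id - 1
    WHERE cur.null_rate - base.null_rate > 0.1
""").fetchall()
print(findings)  # [('orders', 'customer_id', 0.01, 0.3)]
```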

No external tools. No file writing. Everything lives in your warehouse.

The dqlens_findings model

Every profiling run materializes a dqlens_findings table in your warehouse:

| column | type | description |
| --- | --- | --- |
| finding_id | text | Unique identifier |
| table_name | text | Which model |
| column_name | text | Which column (null for table-level findings) |
| severity | text | HIGH / MEDIUM / LOW |
| category | text | null_anomaly, schema_change, fk_mismatch, etc. |
| message | text | Human-readable description |
| detail | text | Why it was flagged |
| current_value | text | Current metric value |
| baseline_value | text | Previous metric value |
| detected_at | timestamp | When the finding was detected |

Query it in your BI tool, build alerts on it, or just SELECT * FROM dqlens.dqlens_findings WHERE severity = 'HIGH'.
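Because the schema is a plain table, alerting is just a query. A runnable sketch against SQLite with fabricated findings rows (only a subset of the documented columns, for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dqlens_findings
                (finding_id TEXT, table_name TEXT, column_name TEXT,
                 severity TEXT, category TEXT, message TEXT)""")
conn.executemany("INSERT INTO dqlens_findings VALUES (?, ?, ?, ?, ?, ?)", [
    ("f1", "orders", "customer_id", "HIGH", "null_anomaly", "Null rate rose from 1% to 30%"),
    ("f2", "orders", None,          "LOW",  "row_count",    "Row count grew 4%"),
])

# Same filter as the inline example: only HIGH-severity findings
high = conn.execute(
    "SELECT finding_id, message FROM dqlens_findings WHERE severity = 'HIGH'"
).fetchall()
print(high)  # [('f1', 'Null rate rose from 1% to 30%')]
```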

Configuration

In your dbt_project.yml:

vars:
  dqlens:
    dqlens_schema: "dqlens"        # where findings table lives
    min_severity: "MEDIUM"          # only store MEDIUM+ findings
    exclude_tables: ["staging_*"]   # skip these models
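The `exclude_tables` entries look like shell-style globs. Assuming they behave that way (an assumption — check the package docs for the exact matching rules), the filtering is equivalent to stdlib `fnmatch`:

```python
from fnmatch import fnmatch

# Hypothetical model names; "staging_*" mirrors the config example above
exclude = ["staging_*"]
models = ["staging_users", "orders", "staging_events", "customers"]

profiled = [m for m in models
            if not any(fnmatch(m, pat) for pat in exclude)]
print(profiled)  # ['orders', 'customers']
```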

vs other dbt quality packages

| | dbt_expectations | elementary | dbt-dqlens |
| --- | --- | --- | --- |
| Auto-generates tests | No | Partial | Yes |
| Requires writing config | Yes (per column) | Yes (YAML) | No |
| Drift detection | No | Yes (paid) | Yes (free) |
| Baseline comparison | No | Yes (paid) | Yes (free) |
| Outlier detection | No | Yes (paid) | Yes (free) |
| Pricing | Free | Free + paid cloud | Free |

Requirements

  • dbt-core >= 1.0.0
  • Python with dqlens installed (pip install dqlens[duckdb] for DuckDB)
  • Supported databases: PostgreSQL, DuckDB, SQLite, MySQL (Snowflake, BigQuery coming soon)

License

MIT
