dbt-dqlens
Data quality for dbt, without writing tests.
dbt-dqlens brings auto-generated data quality checks into your dbt project. It profiles your models, detects problems (null spikes, schema drift, empty strings, row count anomalies), and runs checks as native dbt tests.
You don't write tests. DQLens writes them for you.
Quick Start
1. Install
Add to your packages.yml:
packages:
  - package: vahid110/dbt_dqlens
    version: 0.2.0
Then:
dbt deps
2. Profile your models
dbt run --select dqlens_profile_results
This profiles every table in your schema (null rates, distinct counts, empty strings, value ranges) and stores the results in your warehouse.
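Conceptually, profiling a column reduces to a handful of aggregate metrics. A minimal Python sketch of the idea (illustrative only; DQLens computes these in-warehouse via dbt models, and the metric names here are not its internals):

```python
# Sketch of per-column profiling: null rate, distinct count,
# empty-string count, and value range. Illustrative only --
# DQLens computes equivalents in-warehouse, not in Python.

def profile_column(values):
    """Return basic data-quality metrics for a list of column values."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    non_null = [v for v in values if v is not None]
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    return {
        "row_count": total,
        "null_rate": nulls / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "empty_string_count": sum(1 for v in non_null if v == ""),
        "min_value": min(numeric) if numeric else None,
        "max_value": max(numeric) if numeric else None,
    }

profile_column([1, 2, 2, None, 5])
# one null in five rows -> null_rate 0.2; three distinct non-null values
```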
3. Run checks
dbt test --select tag:dqlens
The catch-all test compares the current profile against the previous one and flags what changed. No configuration needed.
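The comparison itself is conceptually simple: diff each metric against its baseline and flag large moves. A hedged Python sketch of that logic (the 50% relative-change threshold is illustrative, not DQLens's actual default, and DQLens runs this as a dbt test, not in Python):

```python
# Compare a current profile against a baseline and flag metrics
# whose relative change exceeds a threshold. Illustrative sketch only.

def flag_drift(baseline, current, rel_threshold=0.5):
    """Return (metric, baseline_value, current_value) tuples whose
    relative change from baseline exceeds rel_threshold."""
    findings = []
    for metric, base_val in baseline.items():
        cur_val = current.get(metric)
        if base_val is None or cur_val is None:
            continue
        denom = abs(base_val) if base_val else 1.0
        if abs(cur_val - base_val) / denom > rel_threshold:
            findings.append((metric, base_val, cur_val))
    return findings

# A null-rate jump from 1% to 12% is flagged; a ~1% row-count move is not.
flag_drift({"null_rate": 0.01, "row_count": 1000},
           {"null_rate": 0.12, "row_count": 1010})
```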
That's it.
Two dbt commands. No CLI. No Python. No YAML to write. Everything stays inside dbt.
Alternative: CLI approach
If you prefer a CLI workflow (e.g., for CI pipelines outside dbt):
1. Install
pip install dbt-dqlens
2. Profile your models
After dbt run, profile your warehouse:
dqlens-dbt profile
This reads your profiles.yml, connects to the same warehouse dbt uses, profiles every model, and stores baselines.
3. Generate tests
dqlens-dbt generate-tests
This creates a _dqlens_tests.yml file with auto-generated tests for every model. Review it, commit it, done.
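The generated file uses standard dbt schema-test syntax. Its exact contents depend on your models; the shape below is a plausible sketch (the model and column names are hypothetical, and the specific tests DQLens emits may differ):

```yaml
# _dqlens_tests.yml -- illustrative shape only; actual output
# depends on your models and what profiling detected.
version: 2
models:
  - name: orders            # hypothetical model name
    columns:
      - name: order_id      # hypothetical column name
        tests:
          - not_null:
              config:
                tags: ["dqlens"]
          - unique:
              config:
                tags: ["dqlens"]
```

Because every generated test carries the dqlens tag, dbt test --select tag:dqlens picks them all up.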
4. Run tests
dbt test --select tag:dqlens
Your auto-generated tests run as native dbt tests. Failures show up in dbt docs, dbt Cloud, and your CI pipeline.
What it detects
| Check | What it catches |
|---|---|
| Null drift | Null rate increased significantly from baseline |
| Schema drift | Columns added, removed, or types changed |
| Orphaned records | FK references to non-existent rows |
| Empty strings | Columns full of '' that look non-null but aren't |
| Outliers | Values beyond 1.5x IQR bounds |
| Row count anomalies | Unusual growth or shrinkage |
| Freshness | Data that hasn't been updated recently |
| Pattern violations | Values that don't match detected patterns (email, UUID, etc.) |
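The outlier check above is the standard Tukey fence: values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] are flagged. A minimal sketch of that rule (not DQLens's actual implementation, which runs in-warehouse):

```python
# Flag values outside the 1.5x IQR (Tukey) fences.
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside [q1 - k*iqr, q3 + k*iqr]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

iqr_outliers([10, 11, 12, 11, 10, 12, 11, 95])  # 95 is flagged
```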
How it works
dbt run --select dqlens_profile_results    (profiles all tables, stores results in warehouse)
                   |
                   v
dbt test --select tag:dqlens               (compares current vs baseline, flags changes)
On the first run, it profiles and stores a baseline. On subsequent runs, it compares against the previous profile and flags drift: null spikes, schema changes, row count anomalies, empty strings.
No external tools. No file writing. Everything lives in your warehouse.
The dqlens_findings model
Every profiling run materializes a dqlens_findings table in your warehouse:
| column | type | description |
|---|---|---|
| finding_id | text | Unique identifier |
| table_name | text | Which model |
| column_name | text | Which column (null for table-level) |
| severity | text | HIGH / MEDIUM / LOW |
| category | text | null_anomaly, schema_change, fk_mismatch, etc. |
| message | text | Human-readable description |
| detail | text | Why it was flagged |
| current_value | text | Current metric value |
| baseline_value | text | Previous metric value |
| detected_at | timestamp | When the finding was detected |
Query it in your BI tool, build alerts on it, or just SELECT * FROM dqlens.dqlens_findings WHERE severity = 'HIGH'.
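For example, a daily alert query over the findings table (column names as documented above; the INTERVAL syntax varies by warehouse) might look like:

```sql
-- Count new HIGH-severity findings per model in the last day.
SELECT table_name,
       COUNT(*) AS high_findings
FROM dqlens.dqlens_findings
WHERE severity = 'HIGH'
  AND detected_at >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY table_name
ORDER BY high_findings DESC;
```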
Configuration
In your dbt_project.yml:
vars:
  dqlens:
    dqlens_schema: "dqlens"        # where findings table lives
    min_severity: "MEDIUM"         # only store MEDIUM+ findings
    exclude_tables: ["staging_*"]  # skip these models
vs other dbt quality packages
| | dbt_expectations | elementary | dbt-dqlens |
|---|---|---|---|
| Auto-generates tests | No | Partial | Yes |
| Requires writing config | Yes (per column) | Yes (YAML) | No |
| Drift detection | No | Yes (paid) | Yes (free) |
| Baseline comparison | No | Yes (paid) | Yes (free) |
| Outlier detection | No | Yes (paid) | Yes (free) |
| Pricing | Free | Free + paid cloud | Free |
Requirements
- dbt-core >= 1.0.0
- Python with dqlens installed (pip install dqlens[duckdb] for DuckDB)
- Supported databases: PostgreSQL, DuckDB, SQLite, MySQL (Snowflake, BigQuery coming soon)
License
MIT