
Auto-generated data quality testing: find data problems automatically, no config, no test writing.


DQLens


Find data problems automatically. No config, no test writing.

DQLens auto-generates data quality tests by profiling your database. No YAML, no hand-written Python tests, no configuration files to maintain. Point it at your database and get immediate visibility into data quality issues.

Quick Start

pip install dqlens

# Initialize (stores connection config)
dqlens init postgres://localhost/mydb --schema public

# Profile your database (auto-generates tests)
dqlens profile

# Run checks and see problems
dqlens run

What It Does

DQLens connects to your database, profiles every table, and automatically generates tests based on what it finds:

  • Null anomalies: columns with unexpected null rates or null rate drift
  • Uniqueness violations: duplicate values in columns that should be unique
  • Foreign key mismatches: orphaned rows referencing non-existent records
  • Pattern violations: values that don't match detected patterns (email, UUID, URL, etc.)
  • Row count anomalies: unusual growth or shrinkage compared to baseline
  • Freshness checks: data that hasn't been updated recently
  • Distribution shifts: value range changes between profiles
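Two of these checks map directly onto plain SQL. The sketch below uses Python's built-in sqlite3 module and a made-up customers/orders schema to show one common way to measure a column's null rate and to count orphaned foreign-key rows with a LEFT JOIN. The `null_rate` and `orphan_count` helpers are hypothetical; this illustrates the general technique, not DQLens's internal implementation.

```python
import sqlite3


def null_rate(conn, table, column):
    """Fraction of rows in `table` where `column` IS NULL (0.0 for an empty table)."""
    # Identifiers are interpolated directly, so only pass trusted table/column names.
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    return (nulls or 0) / total if total else 0.0


def orphan_count(conn, child, fk_col, parent, pk_col):
    """Rows in `child` whose non-null `fk_col` has no matching `pk_col` in `parent`."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk_col} = p.{pk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL"
    ).fetchone()
    return count


# Hypothetical example schema: orders.customer_id is meant to reference customers.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers (id, email) VALUES (1, 'a@example.com'), (2, NULL);
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
""")

print(null_rate(conn, "customers", "email"))                          # 0.5
print(orphan_count(conn, "orders", "customer_id", "customers", "id"))  # 1 (customer 99 is missing)
```

A NULL foreign key is excluded from the orphan count here, since "no reference" is usually a null-rate concern rather than a referential one.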

Signal Over Coverage

DQLens shows problems first, not 20 green checkmarks:

public.orders: 14 tests, 11 passed, 3 PROBLEMS FOUND

  PROBLEMS:
  HIGH   customer_id: 142 rows reference non-existent customers (FK mismatch)
  HIGH   email: 3.2% null (was 0.1% in baseline), 32x increase
  MEDIUM orders grew 47% today (usual daily growth: 2-5%)

  ✓ 11 checks passed (use --verbose to see all)

Every finding includes:

  • Severity level (HIGH / MEDIUM / LOW)
  • Explanation of why it was flagged
  • Baseline comparison when available
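The report above leads with problems and quantifies drift against the baseline (3.2% null against a 0.1% baseline is a 32x increase). A minimal sketch of that presentation logic follows; the `Finding` class, its fields, the severity ordering, and the `render` helper are hypothetical stand-ins for illustration, not DQLens's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity ranking: lower number sorts first.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass
class Finding:
    """Hypothetical finding record (not DQLens's real class)."""
    severity: str
    subject: str
    explanation: str
    current: Optional[float] = None   # e.g. current null rate, in percent
    baseline: Optional[float] = None  # same metric from the saved baseline

    def drift_ratio(self) -> Optional[float]:
        """How many times larger the current value is than the baseline, if both are known."""
        if self.current is None or not self.baseline:
            return None
        return self.current / self.baseline


def render(findings):
    """Return report lines with the most severe findings first."""
    lines = []
    for f in sorted(findings, key=lambda x: SEVERITY_ORDER[x.severity]):
        line = f"{f.severity:<6} {f.subject}: {f.explanation}"
        ratio = f.drift_ratio()
        if ratio is not None:
            line += f", {ratio:.0f}x increase"
        lines.append(line)
    return lines


findings = [
    Finding("MEDIUM", "orders", "grew 47% today (usual daily growth: 2-5%)"),
    Finding("HIGH", "email", "3.2% null (was 0.1% in baseline)", current=3.2, baseline=0.1),
]
for line in render(findings):
    print(line)
```

Sorting by severity before printing is what keeps a HIGH finding at the top even when it was detected after a MEDIUM one.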

Commands

Command                      Description
dqlens init <url>            Initialize config with a database connection
dqlens profile               Profile tables and save a baseline
dqlens profile --quick       Quick mode: profile a data sample, under 5 seconds
dqlens run                   Run checks and show problems
dqlens run --verbose         Show all checks, including passing ones
dqlens run --focus high      Show only HIGH severity findings
dqlens run --ci              Exit with code 1 on failure (for CI/CD)
dqlens run --json-output     Output results as JSON
dqlens diff                  Compare the two most recent profiles
dqlens diff --json-output    Output the diff as JSON
dqlens ignore <key>          Suppress a known finding
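In a pipeline, `dqlens run --ci` reports failure through its exit code. If you drive it from a Python script, a minimal sketch might look like the following. The `quality_gate` and `main` helpers are hypothetical, and the sketch assumes only what the table states: a nonzero exit code (1) when checks fail.

```python
import subprocess
import sys


def quality_gate(cmd=("dqlens", "run", "--ci")):
    """Run the check command and return True if it exited with status 0.

    Output is not captured, so findings stay visible in the CI log.
    """
    completed = subprocess.run(list(cmd), check=False)
    return completed.returncode == 0


def main():
    """Exit with the gate's status so the surrounding CI job fails when checks fail."""
    sys.exit(0 if quality_gate() else 1)
```

If `dqlens` is not on the PATH, `subprocess.run` raises `FileNotFoundError` instead of returning a status, which also fails the job loudly.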

Python API

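import dqlens

# Profile the "public" schema and auto-generate tests from what is found
suite = dqlens.profile("postgres://localhost/mydb", schema="public")
# Execute the generated checks
results = suite.run()

# Print every failing check as table.column: message
for table in results:
    for test in table.tests:
        if test.failed:
            print(f"{table.name}.{test.column}: {test.message}")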
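To act on results programmatically (for example, to post a summary or fail a script), you can reshape them with ordinary Python. The sketch below assumes only the attributes used in the example above (`table.name`, `table.tests`, `test.failed`, `test.column`, `test.message`). The `failures_by_table` helper is hypothetical, and `SimpleNamespace` objects stand in for real DQLens results so it runs without a database.

```python
from collections import defaultdict
from types import SimpleNamespace


def failures_by_table(results):
    """Map each table name to a list of 'column: message' strings for failed tests."""
    grouped = defaultdict(list)
    for table in results:
        for test in table.tests:
            if test.failed:
                grouped[table.name].append(f"{test.column}: {test.message}")
    return dict(grouped)


# Stand-in objects shaped like the results above (not real DQLens types).
fake_results = [
    SimpleNamespace(name="orders", tests=[
        SimpleNamespace(column="customer_id", failed=True,
                        message="142 rows reference non-existent customers"),
        SimpleNamespace(column="id", failed=False, message="unique"),
    ]),
    SimpleNamespace(name="customers", tests=[
        SimpleNamespace(column="email", failed=False, message="ok"),
    ]),
]

summary = failures_by_table(fake_results)
print(summary)
```

Tables with no failures are left out of the summary, so an empty dict means every check passed.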

Supported Databases

  • PostgreSQL
  • SQLite
  • MySQL
  • Parquet, CSV (coming soon)

dbt Integration

Using dbt? dbt-dqlens auto-generates native dbt test YAML from profiling results. No more writing not_null and unique by hand.

pip install dbt-dqlens
dqlens-dbt profile        # profiles models using your profiles.yml
dqlens-dbt generate-tests # outputs _dqlens_tests.yml
dbt test --select tag:dqlens

Development

# Clone and install
git clone https://github.com/vahid110/dqlens.git
cd dqlens
pip install -e ".[dev]"

# Run unit tests (no database needed)
pytest tests/ -k "unit" -v

# Run integration tests (needs PostgreSQL, see .env.example)
pytest tests/ -k "integration" -v

# Run all tests
pytest tests/ -v

Demo

See demo/README.md for a 5-minute walkthrough with a local PostgreSQL database.

License

MIT

Download files


Source Distribution

dqlens-0.4.0.tar.gz (86.6 kB)

Built Distribution


dqlens-0.4.0-py3-none-any.whl (64.8 kB)

File details

Details for the file dqlens-0.4.0.tar.gz.

File metadata

  • Download URL: dqlens-0.4.0.tar.gz
  • Upload date:
  • Size: 86.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dqlens-0.4.0.tar.gz
Algorithm Hash digest
SHA256 dade75d7ea8210c4561c94359ec13862ac1d958192eea30b6e7a89f2ad808099
MD5 dc1de720c56ba3db384d517ab5afb961
BLAKE2b-256 70f4475f93f2dc39a4ae27b8c49f4a7faf5b83ca7e39b88c971e842dfbe6662a
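To check a downloaded archive against the SHA256 digest listed above, you can hash the file in chunks with Python's standard hashlib module. The `sha256_of` and `verify` helper names are illustrative; the expected digest is the one published for dqlens-0.4.0.tar.gz.

```python
import hashlib

# SHA256 published above for dqlens-0.4.0.tar.gz
EXPECTED_SDIST_SHA256 = "dade75d7ea8210c4561c94359ec13862ac1d958192eea30b6e7a89f2ad808099"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Example, after downloading the sdist into the current directory:
# print(verify("dqlens-0.4.0.tar.gz"))
```

Reading in chunks keeps memory use flat regardless of file size; pip can enforce the same check automatically with `--require-hashes`.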


File details

Details for the file dqlens-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: dqlens-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 64.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dqlens-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 af4bd63d15eac4cf66f6c433ba4c54f05f3b19cf49a11f8aa98a3e9ef4229148
MD5 e4426c61d49b771f772279459773f59e
BLAKE2b-256 d4766c67f50e46720aa9246263bdd4214a146805f5bcca05eb04f9bd109e26ec

