
DQBench

The standard benchmark for data quality and validation tools.

PyPI · Python 3.11+ · Tests · License: MIT

The ImageNet of data quality — standardized benchmarks for validation tools.

Why DQBench?

Every data validation tool claims to be the best. But there's no standard way to compare them. DQBench fixes that with:

  • Three difficulty tiers — basics, realistic, and adversarial
  • Ground truth — every planted issue is documented with affected rows
  • Fair scoring — recall AND precision matter (no gaming by flagging everything)
  • One number — DQBench Score (0-100) for easy comparison
  • 20-line integration — implement one method to benchmark any tool

Install

pip install dqbench

Quick Start

# Run with built-in GoldenCheck adapter
pip install goldencheck
dqbench run goldencheck

# Run with a custom adapter
dqbench run --adapter my_adapter.py

Tiers

  • Tier 1 — Basics: 5,000 rows, 20 columns, customer DB. Obvious errors; baseline difficulty.
  • Tier 2 — Realistic: 50,000 rows, 30 columns, e-commerce. Subtle issues plus false-positive traps.
  • Tier 3 — Adversarial: 100,000 rows, 50 columns, healthcare. Encoding traps, semantic errors, cross-column logic.

Each tier has columns WITH planted issues and columns WITHOUT (false positive traps). Tools that flag clean columns lose precision points.

Scoring

  • Recall: % of planted-issue columns detected
  • Precision: % of flagged columns that actually have issues
  • F1: harmonic mean of recall and precision
  • FPR: % of clean columns incorrectly flagged (WARNING/ERROR findings only)
  • DQBench Score: Tier 1 F1 × 20% + Tier 2 F1 × 40% + Tier 3 F1 × 40%, on a 0–100 scale
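The scoring scheme can be sketched in a few lines of Python. The column names and per-tier results below are hypothetical; the real harness derives them from the planted ground truth:

```python
# A minimal sketch of column-level F1 and the weighted DQBench Score.
def f1(planted: set, flagged: set) -> float:
    """Harmonic mean of column-level recall and precision."""
    true_pos = len(planted & flagged)
    recall = true_pos / len(planted) if planted else 0.0
    precision = true_pos / len(flagged) if flagged else 0.0
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# Per-tier results (illustrative only, not DQBench's actual columns):
tier_f1 = {
    1: f1({"email", "age", "zip"}, {"email", "age"}),       # misses "zip"
    2: f1({"price", "sku"}, {"price", "sku", "country"}),   # flags a clean column
    3: f1({"icd_code"}, {"icd_code"}),                      # perfect on tier 3
}

# DQBench Score = Tier1 F1 x 20% + Tier2 F1 x 40% + Tier3 F1 x 40%, on a 0-100 scale.
score = 100 * (0.2 * tier_f1[1] + 0.4 * tier_f1[2] + 0.4 * tier_f1[3])
print(round(score, 1))  # prints 88.0
```

Note how flagging the clean column in tier 2 costs precision (and thus F1) even though recall is perfect: that is the "no gaming by flagging everything" property.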

Write Your Own Adapter

Implement one class to benchmark any tool:

from dqbench.adapters.base import DQBenchAdapter
from dqbench.models import DQBenchFinding
from pathlib import Path

class MyToolAdapter(DQBenchAdapter):
    @property
    def name(self) -> str:
        return "MyTool"

    @property
    def version(self) -> str:
        return "1.0.0"

    def validate(self, csv_path: Path) -> list[DQBenchFinding]:
        # Run your tool on the CSV
        # Return a list of DQBenchFinding objects
        return [
            DQBenchFinding(
                column="email",
                severity="error",      # "error", "warning", or "info"
                check="format",         # what kind of issue
                message="Invalid email format",
                confidence=0.9,         # optional, 0.0-1.0
            )
        ]

Then run:

dqbench run --adapter my_adapter.py

CLI Reference

  • dqbench run <adapter>: run the full benchmark
  • dqbench run --adapter <path>: run with a custom adapter file
  • dqbench run <adapter> --tier 2: run a specific tier only
  • dqbench run <adapter> --json: emit results as JSON
  • dqbench generate: generate and cache the datasets
  • dqbench generate --force: regenerate the datasets

Built-in Adapters

  • goldencheck (GoldenCheck): pip install goldencheck

Want to add your tool? See CONTRIBUTING.md.

Reproducibility

  • Datasets are generated deterministically (random.seed(42), stdlib only)
  • Canonical datasets committed as release artifacts
  • Version-locked: published benchmark versions are immutable
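The deterministic-generation idea can be illustrated with a stdlib-only sketch. generate_rows and its two-field schema are hypothetical, not DQBench's actual generator:

```python
import random

# Illustrative only: the determinism principle is that a fixed seed
# yields byte-identical data on every run.
def generate_rows(n: int, seed: int = 42) -> list:
    rng = random.Random(seed)  # per-instance RNG; avoids global-state surprises
    return [(i, round(rng.gauss(100.0, 15.0), 2)) for i in range(n)]

# Two independent runs produce identical rows, so canonical datasets can be
# regenerated and verified against release artifacts rather than trusted blindly.
assert generate_rows(5) == generate_rows(5)
```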

License

MIT


From the maker of GoldenCheck and GoldenMatch.
