Prebuilt dq-prof binary wrapper
Project description
dq-prof 0.1.8
Fast, zero-config data sanity checks for pipelines.
Run one command and instantly see if your data is broken. Like Ruff, but for datasets.
Quick example
dq-prof data.parquet
DATA HEALTH: WARN
CRITICAL
- created_at: stale timestamps (last value 2025-03-01)
WARNING
- revenue: nulls 12% (expected <5%)
- region: skew (US = 78%)
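The freshness finding compares a column's newest timestamp with the current time. Below is a minimal Python sketch of that idea. The function name check_freshness and the max_age threshold are hypothetical, because the README does not say how dq-prof decides that a timestamp is stale.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(timestamps, max_age, now=None):
    """Flag a timestamp column whose newest non-null value is older than max_age.

    Returns None when the column is fresh, otherwise a report-style message.
    max_age is a hypothetical threshold, not a documented dq-prof default.
    """
    non_null = [t for t in timestamps if t is not None]
    if not non_null:
        return "no non-null timestamps"
    if now is None:
        now = datetime.now(timezone.utc)
    latest = max(non_null)
    if now - latest > max_age:
        return f"stale timestamps (last value {latest.date().isoformat()})"
    return None


if __name__ == "__main__":
    created_at = [
        datetime(2025, 2, 27, tzinfo=timezone.utc),
        datetime(2025, 3, 1, tzinfo=timezone.utc),
        None,
    ]
    # Pin "now" so the example is reproducible.
    print(check_freshness(created_at, timedelta(days=7), now=datetime(2025, 6, 1, tzinfo=timezone.utc)))
```

With "now" pinned to 2025-06-01 and a 7-day window, the sketch prints the same message as the quick example.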
Titanic (real public data):
./examples/download_titanic.sh
dq-prof examples/titanic.csv --fail-on warning
DATA HEALTH: FAIL
Rows: 891 sampled of 891 (mode=Head)
CRITICAL
- age: high null ratio 19.87% (obs=0.1987, exp=< 0.05)
- deck: high null ratio 77.22% (obs=0.7722, exp=< 0.05)
WARNING
- sibsp: outlier ratio 3.37% (obs=0.0337, exp=< 0.03)
- survived: distinct ratio very low (obs=0.0022, exp=> 0.01)
- pclass: distinct ratio very low (obs=0.0034, exp=> 0.01)
- sex: distinct ratio very low (obs=0.0022, exp=> 0.01)
- sibsp: distinct ratio very low (obs=0.0079, exp=> 0.01)
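Each Titanic finding pairs an observed ratio (obs) with the threshold it broke (exp). The sketch below recomputes three of those ratios in plain Python and reuses the thresholds printed in the report: null ratio < 0.05, outlier ratio < 0.03, and distinct ratio > 0.01. It illustrates the checks and is not dq-prof's code. The |z| > 3 outlier rule and the use of total rows as the distinct-ratio denominator are assumptions, and every name is hypothetical.

```python
import statistics

# Thresholds taken from the "exp=" values in the Titanic report.
MAX_NULL_RATIO = 0.05
MIN_DISTINCT_RATIO = 0.01
MAX_OUTLIER_RATIO = 0.03
# Assumption: an outlier is a value more than 3 population standard deviations from the mean.
Z_CUTOFF = 3.0


def null_ratio(values):
    """Fraction of rows that are None."""
    return sum(v is None for v in values) / len(values)


def distinct_ratio(values):
    """Distinct non-null values divided by total rows (the denominator is an assumption)."""
    return len({v for v in values if v is not None}) / len(values)


def outlier_ratio(values, z_cutoff=Z_CUTOFF):
    """Share of non-null numeric values with |z| > z_cutoff; 0.0 for non-numeric columns."""
    nums = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in nums)
    if len(nums) < 2 or not numeric:
        return 0.0
    mean = statistics.fmean(nums)
    std = statistics.pstdev(nums)
    if std == 0:
        return 0.0
    return sum(abs(v - mean) / std > z_cutoff for v in nums) / len(nums)


def check_column(name, values):
    """Return (severity, message) pairs in the same shape as the report lines."""
    if not values:
        return []
    findings = []
    nr = null_ratio(values)
    if nr > MAX_NULL_RATIO:
        findings.append(("CRITICAL", f"{name}: high null ratio {nr:.2%} (obs={nr:.4f}, exp=< {MAX_NULL_RATIO})"))
    orat = outlier_ratio(values)
    if orat > MAX_OUTLIER_RATIO:
        findings.append(("WARNING", f"{name}: outlier ratio {orat:.2%} (obs={orat:.4f}, exp=< {MAX_OUTLIER_RATIO})"))
    dr = distinct_ratio(values)
    if dr < MIN_DISTINCT_RATIO:
        findings.append(("WARNING", f"{name}: distinct ratio very low (obs={dr:.4f}, exp=> {MIN_DISTINCT_RATIO})"))
    return findings
```

For example, a sibsp column of 891 rows holding seven distinct values has a distinct ratio of 7/891 ≈ 0.0079, which is the figure in the report.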
Baseline vs drift example:
dq-prof examples/clean_sales.csv --full-scan --save-baseline baseline_clean.json
dq-prof examples/drift_sales.csv --baseline baseline_clean.json --fail-on warning --color never
DATA HEALTH: WARN
CRITICAL
- region: top value dominates 100.0% of rows (obs=1.0000, exp=< 0.75)
WARNING
- region: top value share increased by 40.0pp vs baseline (obs=1.0000, exp=0.6000)
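The drift run reports two related numbers for region. The first is the top value's share in the new file (1.0). The second is the change in that share against the saved baseline (0.6), expressed in percentage points. The Python sketch below approximates that comparison. The top-share limit of 0.75 comes from the report. The baseline layout ({"top_share": ...}) and the 20pp drift threshold are made up for illustration and do not reflect dq-prof's real baseline JSON.

```python
import json
from collections import Counter

# From "exp=< 0.75" in the drift report.
MAX_TOP_SHARE = 0.75
# Hypothetical drift threshold in percentage points; dq-prof's actual value is not documented here.
MAX_SHARE_DRIFT_PP = 20.0


def top_value_share(values):
    """Return (most common non-null value, its share of non-null rows)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None, 0.0
    value, count = Counter(non_null).most_common(1)[0]
    return value, count / len(non_null)


def build_baseline(columns):
    """Profile each column; this {"top_share": ...} layout is a hypothetical stand-in for baseline.json."""
    return {name: {"top_share": top_value_share(values)[1]} for name, values in columns.items()}


def compare_to_baseline(columns, baseline):
    """Flag dominant top values and top-share increases versus the baseline."""
    findings = []
    for name, values in columns.items():
        _, share = top_value_share(values)
        if share > MAX_TOP_SHARE:
            findings.append(("CRITICAL", f"{name}: top value dominates {share:.1%} of rows (obs={share:.4f}, exp=< {MAX_TOP_SHARE})"))
        base = baseline.get(name, {}).get("top_share")
        if base is not None:
            delta_pp = (share - base) * 100
            if delta_pp > MAX_SHARE_DRIFT_PP:
                findings.append(("WARNING", f"{name}: top value share increased by {delta_pp:.1f}pp vs baseline (obs={share:.4f}, exp={base:.4f})"))
    return findings


if __name__ == "__main__":
    clean = {"region": ["US"] * 3 + ["EU"] * 2}  # US share 0.60
    drifted = {"region": ["US"] * 5}  # US share 1.00
    baseline = json.loads(json.dumps(build_baseline(clean)))  # round-trip as if saved to disk
    for severity, message in compare_to_baseline(drifted, baseline):
        print(severity, message)
```

Run as a script, the sketch produces the same two messages as the drift example.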
Why dq-prof
- Zero config – no YAML, no expectations to write
- Fast – runs in seconds on sampled data
- Catches real issues – null spikes, skew, outliers, freshness, schema drift
- Baseline-aware – detect changes vs previous runs
- CI-friendly – fail pipelines when data looks wrong
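In CI, a pipeline step usually passes or fails based on the exit status of the command it runs. The README says dq-prof can fail pipelines but does not document its exit codes. The Python wrapper below therefore assumes that a non-zero status means the --fail-on threshold was hit. run_gate is a hypothetical helper name.

```python
import subprocess
import sys


def run_gate(cmd):
    """Run a check command, echo its report into the CI log, and return True only on exit status 0."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode == 0
```

Under that assumption, run_gate(["dq-prof", "data.csv", "--baseline", "baseline.json", "--fail-on", "warning"]) would return False when the check fails. The caller could then exit non-zero to stop the pipeline.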
Install
Download a release binary and run:
chmod +x dq-prof
./dq-prof data.parquet
Or install from PyPI with pip:
pip install dq-prof
dq-prof --help
Examples
Inspect a file:
dq-prof examples/clean_sales.csv
Compare against a baseline (saving a baseline requires --full-scan):
dq-prof data.csv --full-scan --save-baseline baseline.json
dq-prof data.csv --baseline baseline.json --fail-on warning
Postgres:
dq-prof public.sales \
--pg-url postgres://user:pass@host/db \
--sample-rows 50000
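--sample-rows caps how many rows are profiled, and the Titanic report names its sampling mode (mode=Head). Assuming Head means "take the first N rows", the sketch below contrasts it with reservoir sampling (Algorithm R), a standard one-pass way to draw a uniform random sample. Reservoir sampling is included only for comparison. Nothing in the README says dq-prof implements it.

```python
import itertools
import random


def head_sample(rows, n):
    """Take the first n rows (what a Head mode presumably does)."""
    return list(itertools.islice(rows, n))


def reservoir_sample(rows, n, seed=None):
    """Uniform random sample of n rows in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)
        else:
            # Keep row i with probability n / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = row
    return reservoir
```

A head sample is cheap, but it is biased when the source is ordered (for example, by insertion time), because later rows are never seen. That bias may be one reason saving a baseline requires --full-scan.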
Output
Reports are printed as text or JSON, and every finding carries a severity:
- CRITICAL – likely broken data
- WARNING – suspicious change
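The text reports above all use the same layout: a DATA HEALTH line, followed by findings grouped under CRITICAL and WARNING. The sketch below renders (severity, message) pairs in that layout and also as JSON. The JSON keys (status, findings, severity, message) are invented for illustration and are not dq-prof's documented schema. The overall status is passed in as an argument because the README does not state how WARN or FAIL is chosen.

```python
import json

SEVERITY_ORDER = ("CRITICAL", "WARNING")


def render_report(status, findings):
    """Group (severity, message) pairs under severity headings, most severe first."""
    lines = [f"DATA HEALTH: {status}"]
    for severity in SEVERITY_ORDER:
        messages = [msg for sev, msg in findings if sev == severity]
        if messages:
            lines.append(severity)
            lines.extend(f"- {msg}" for msg in messages)
    return "\n".join(lines)


def to_json(status, findings):
    """Serialize the same findings; the key names are hypothetical."""
    return json.dumps({
        "status": status,
        "findings": [{"severity": sev, "message": msg} for sev, msg in findings],
    })
```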
Philosophy
dq-prof is not a data observability platform. It’s a fast sanity check you run inline — a linter for data — before or after a pipeline step to catch issues immediately.
Download files
Built Distributions
File details
Details for the file dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 20.5 MB
- Tags: Python 3, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e4b72f233ce504c9bef67155caa09f54eb17620b4d36e22a89a64c7da2e451f |
| MD5 | 7e3fa967100d5298d8069244bc579950 |
| BLAKE2b-256 | 977bf5915908862c140f5608d9c3ea15cb59754c95be13336b1e394efb39b53f |
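To confirm that a downloaded wheel matches the published digests, hash it locally and compare the results. The sketch below uses Python's hashlib, where PyPI's BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest. file_digests and verify_wheel are hypothetical helper names. pip can also enforce this check on its own in hash-checking mode, using --hash=sha256:... entries in a requirements file.

```python
import hashlib


def digests(chunks):
    """Hash an iterable of byte chunks with the algorithms PyPI lists (SHA256, BLAKE2b-256)."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit (32-byte) digest
    for chunk in chunks:
        sha256.update(chunk)
        blake2b_256.update(chunk)
    return {"SHA256": sha256.hexdigest(), "BLAKE2b-256": blake2b_256.hexdigest()}


def file_digests(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks so a 20 MB wheel is never loaded into memory at once."""
    with open(path, "rb") as f:
        return digests(iter(lambda: f.read(chunk_size), b""))


def verify_wheel(path, expected_sha256):
    """True if the file's SHA256 matches the published hex digest."""
    return file_digests(path)["SHA256"] == expected_sha256.strip().lower()
```

For an intact download, verify_wheel("dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", "6e4b72f233ce504c9bef67155caa09f54eb17620b4d36e22a89a64c7da2e451f") should return True.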
Provenance
The following attestation bundles were made for dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: release.yml on kraftaa/dq-prof
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  - Subject digest: 6e4b72f233ce504c9bef67155caa09f54eb17620b4d36e22a89a64c7da2e451f
- Sigstore transparency entry: 1140483716
- Sigstore integration time:
- Permalink: kraftaa/dq-prof@cc564857091caab60833182c518c4eb601106d71
- Branch / Tag: refs/tags/v0.1.19
- Owner: https://github.com/kraftaa
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cc564857091caab60833182c518c4eb601106d71
- Trigger Event: push
File details
Details for the file dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 19.6 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80ffa9bad44e7ba5e0a8798375ad4a2d110f08afd721282a554d11f32e686bff |
| MD5 | 2c59705918cbc5dfd9685cb3dc174872 |
| BLAKE2b-256 | 07638bd93528fae032121c3b49d641a8c8c037f7f17ce7fa7ffa17327810cd61 |
Provenance
The following attestation bundles were made for dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl:
- Publisher: release.yml on kraftaa/dq-prof
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl
  - Subject digest: 80ffa9bad44e7ba5e0a8798375ad4a2d110f08afd721282a554d11f32e686bff
- Sigstore transparency entry: 1140483585
- Sigstore integration time:
- Permalink: kraftaa/dq-prof@cc564857091caab60833182c518c4eb601106d71
- Branch / Tag: refs/tags/v0.1.19
- Owner: https://github.com/kraftaa
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cc564857091caab60833182c518c4eb601106d71
- Trigger Event: push