Prebuilt dq-prof binary wrapper

Project description

dq-prof 0.1.19

Fast, zero-config data sanity checks for pipelines.

Run one command and instantly see if your data is broken. Like Ruff, but for datasets.

Quick example

dq-prof data.parquet
DATA HEALTH: WARN

CRITICAL
- created_at: stale timestamps (last value 2025-03-01)

WARNING
- revenue: nulls 12% (expected <5%)
- region: skew (US = 78%)

Titanic (real public data):

./examples/download_titanic.sh
dq-prof examples/titanic.csv --fail-on warning

DATA HEALTH: FAIL
Rows: 891 sampled of 891 (mode=Head)

CRITICAL
- age: high null ratio 19.87% (obs=0.1987, exp=< 0.05)
- deck: high null ratio 77.22% (obs=0.7722, exp=< 0.05)

WARNING
- sibsp: outlier ratio 3.37% (obs=0.0337, exp=< 0.03)
- survived: distinct ratio very low (obs=0.0022, exp=> 0.01)
- pclass: distinct ratio very low (obs=0.0034, exp=> 0.01)
- sex: distinct ratio very low (obs=0.0022, exp=> 0.01)
- sibsp: distinct ratio very low (obs=0.0079, exp=> 0.01)
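
The obs values in the report above read as plain ratios over the row count. The sketch below reproduces two of them under that assumption; how dq-prof computes them internally is not documented here, and the helper names and stand-in columns are hypothetical. The counts are chosen so that 177 of 891 ages are missing (0.1987 × 891 ≈ 177) and the sex column has two distinct values.

```python
from typing import Optional, Sequence


def null_ratio(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def distinct_ratio(values: Sequence[Optional[object]]) -> float:
    """Distinct non-null values divided by the total row count (assumed definition)."""
    if not values:
        return 0.0
    return len({v for v in values if v is not None}) / len(values)


# Hypothetical stand-in columns shaped like the Titanic report:
# 891 rows, 177 missing ages, and a two-valued "sex" column.
age = [None] * 177 + [30.0] * 714
sex = ["male"] * 577 + ["female"] * 314

print(f"age null ratio:     {null_ratio(age):.4f}")      # 0.1987
print(f"sex distinct ratio: {distinct_ratio(sex):.4f}")  # 0.0022
```

Both values cross the thresholds shown in the report (exp=< 0.05 for nulls, exp=> 0.01 for distinct ratio), which is why the columns are flagged.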

Baseline vs drift example:

dq-prof examples/clean_sales.csv --full-scan --save-baseline baseline_clean.json
dq-prof examples/drift_sales.csv --baseline baseline_clean.json --fail-on warning --color never

DATA HEALTH: WARN
CRITICAL
- region: top value dominates 100.0% of rows (obs=1.0000, exp=< 0.75)

WARNING
- region: top value share increased by 40.0pp vs baseline (obs=1.0000, exp=0.6000)
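
The "40.0pp" figure in the warning above is a percentage-point difference between the observed and baseline top-value shares: (1.0000 - 0.6000) × 100. A minimal sketch of that arithmetic follows; the function names and sample columns are hypothetical and do not reflect dq-prof's internals.

```python
from collections import Counter
from typing import Hashable, Sequence


def top_value_share(values: Sequence[Hashable]) -> float:
    """Share of rows taken by the most frequent value."""
    if not values:
        return 0.0
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)


def share_delta_pp(observed: float, baseline: float) -> float:
    """Change in share, expressed in percentage points."""
    return round((observed - baseline) * 100, 1)


# Hypothetical region columns: 60% "US" in the baseline, 100% "US" after drift.
baseline_region = ["US"] * 6 + ["EU"] * 4
drifted_region = ["US"] * 10

obs = top_value_share(drifted_region)
exp = top_value_share(baseline_region)
print(f"obs={obs:.4f}, exp={exp:.4f}, delta={share_delta_pp(obs, exp)}pp")
# obs=1.0000, exp=0.6000, delta=40.0pp
```

The same observed share triggers both findings: the CRITICAL line compares it to a fixed limit (exp=< 0.75), while the WARNING line compares it to the saved baseline.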

Why dq-prof

  • Zero config – no YAML, no expectations to write
  • Fast – runs in seconds on sampled data
  • Catches real issues – null spikes, skew, outliers, freshness, schema drift
  • Baseline-aware – detect changes vs previous runs
  • CI-friendly – fail pipelines when data looks wrong

Install

Download a release binary and run:

chmod +x dq-prof
./dq-prof data.parquet

Or pip:

pip install dq-prof
dq-prof --help

Examples

Inspect a file:

dq-prof examples/clean_sales.csv

Save a baseline (requires --full-scan), then compare a later run against it:

dq-prof data.csv --full-scan --save-baseline baseline.json
dq-prof data.csv --baseline baseline.json --fail-on warning
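
To gate a pipeline step from Python, the comparison command can be wrapped with subprocess. This is a sketch only: it assumes dq-prof exits with a non-zero status when the --fail-on threshold is reached, which this page implies but does not state. The script name gate.py and both function names are hypothetical; the flags are the ones shown above.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(path: str, baseline: Optional[str] = None,
                  fail_on: Optional[str] = None) -> List[str]:
    """Assemble a dq-prof invocation using only flags shown in this README."""
    cmd = ["dq-prof", path]
    if baseline is not None:
        cmd += ["--baseline", baseline]
    if fail_on is not None:
        cmd += ["--fail-on", fail_on]
    return cmd


def gate(path: str, baseline: str) -> int:
    """Run the check and return its exit code.

    Assumption: dq-prof exits non-zero when the --fail-on threshold is hit.
    """
    result = subprocess.run(build_command(path, baseline, "warning"))
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python gate.py data.csv baseline.json
    sys.exit(gate(sys.argv[1], sys.argv[2]))
```

Passing the arguments as a list avoids shell quoting problems with file paths, and propagating the return code lets the surrounding CI job fail when the check does.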

Postgres:

dq-prof public.sales \
  --pg-url postgres://user:pass@host/db \
  --sample-rows 50000

Output

Reports are emitted as text or JSON. Each finding carries a severity:

  • CRITICAL – likely broken data
  • WARNING – suspicious change
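
If a script needs to act on the findings, the text report can be split by severity. The parser below is a rough sketch that assumes the layout seen in the examples on this page (a "DATA HEALTH:" line, severity headers, and "- column: message" items); the function name is hypothetical. The JSON output is likely the more robust input for tooling, but its schema is not shown here, so this sketch sticks to the text form.

```python
from typing import Dict, List, Optional, Tuple

SEVERITIES = ("CRITICAL", "WARNING")


def parse_report(text: str) -> Tuple[str, Dict[str, List[Tuple[str, str]]]]:
    """Return (health status, {severity: [(column, message), ...]})."""
    status = ""
    findings: Dict[str, List[Tuple[str, str]]] = {s: [] for s in SEVERITIES}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DATA HEALTH:"):
            status = line.split(":", 1)[1].strip()
        elif line in SEVERITIES:
            current = line  # following "- " items belong to this severity
        elif line.startswith("- ") and current is not None:
            column, _, message = line[2:].partition(": ")
            findings[current].append((column, message))
    return status, findings


# Sample text copied from the quick example on this page.
sample = """\
DATA HEALTH: WARN

CRITICAL
- created_at: stale timestamps (last value 2025-03-01)

WARNING
- revenue: nulls 12% (expected <5%)
- region: skew (US = 78%)
"""

status, findings = parse_report(sample)
print(status, len(findings["CRITICAL"]), len(findings["WARNING"]))  # WARN 1 2
```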

Philosophy

dq-prof is not a data observability platform. It’s a fast sanity check you run inline — a linter for data — before or after a pipeline step to catch issues immediately.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.5 MB view details)

Uploaded: Python 3, manylinux: glibc 2.17+ x86-64

dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl (19.6 MB view details)

Uploaded: Python 3, macOS 11.0+ ARM64

File details

Details for the file dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6e4b72f233ce504c9bef67155caa09f54eb17620b4d36e22a89a64c7da2e451f
MD5 7e3fa967100d5298d8069244bc579950
BLAKE2b-256 977bf5915908862c140f5608d9c3ea15cb59754c95be13336b1e394efb39b53f

Provenance

The following attestation bundles were made for dq_prof-0.1.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on kraftaa/dq-prof

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 80ffa9bad44e7ba5e0a8798375ad4a2d110f08afd721282a554d11f32e686bff
MD5 2c59705918cbc5dfd9685cb3dc174872
BLAKE2b-256 07638bd93528fae032121c3b49d641a8c8c037f7f17ce7fa7ffa17327810cd61

Provenance

The following attestation bundles were made for dq_prof-0.1.19-py3-none-macosx_11_0_arm64.whl:

Publisher: release.yml on kraftaa/dq-prof

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
