Prebuilt dq-prof binary wrapper

dq-prof 0.1.8

Fast, zero-config data sanity checks for pipelines.

Run one command and instantly see if your data is broken. Like Ruff, but for datasets.

Quick example

dq-prof data.parquet
DATA HEALTH: WARN

CRITICAL
- created_at: stale timestamps (last value 2025-03-01)

WARNING
- revenue: nulls 12% (expected <5%)
- region: skew (US = 78%)

Titanic (real public data):

./examples/download_titanic.sh
dq-prof examples/titanic.csv --fail-on warning

DATA HEALTH: FAIL
Rows: 891 sampled of 891 (mode=Head)

CRITICAL
- age: high null ratio 19.87% (obs=0.1987, exp=< 0.05)
- deck: high null ratio 77.22% (obs=0.7722, exp=< 0.05)

WARNING
- sibsp: outlier ratio 3.37% (obs=0.0337, exp=< 0.03)
- survived: distinct ratio very low (obs=0.0022, exp=> 0.01)
- pclass: distinct ratio very low (obs=0.0034, exp=> 0.01)
- sex: distinct ratio very low (obs=0.0022, exp=> 0.01)
- sibsp: distinct ratio very low (obs=0.0079, exp=> 0.01)
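
The observed ratios in this report can be checked by hand. The following is a minimal sketch, assuming the distinct ratio is the number of distinct values divided by the row count and the null ratio is the number of nulls divided by the row count. The report does not state these definitions; they are inferred from the numbers shown. `ratio` is a hypothetical helper, not part of dq-prof, and the null count of 177 is the value implied by obs=0.1987 over 891 rows rather than something the report prints.

```shell
# ratio NUMERATOR DENOMINATOR: print the quotient to four decimals.
# Hypothetical helper for illustration; not part of dq-prof.
ratio() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.4f\n", n / d }'
}

ratio 2 891     # survived, sex: 2 distinct values in 891 rows -> 0.0022
ratio 3 891     # pclass: 3 distinct values in 891 rows -> 0.0034
ratio 177 891   # age: 177 nulls (count implied by the report) -> 0.1987
```

Under these assumptions, each result matches the `obs=` value in the report and falls on the wrong side of the corresponding `exp=` threshold, which is why the column is flagged.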

Baseline vs drift example:

dq-prof examples/clean_sales.csv --full-scan --save-baseline baseline_clean.json
dq-prof examples/drift_sales.csv --baseline baseline_clean.json --fail-on warning --color never

DATA HEALTH: WARN
CRITICAL
- region: top value dominates 100.0% of rows (obs=1.0000, exp=< 0.75)

WARNING
- region: top value share increased by 40.0pp vs baseline (obs=1.0000, exp=0.6000)
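
The two findings above follow from simple arithmetic on the share of the most frequent value. The sketch below is illustrative only: `top_share` and `pp_change` are hypothetical helpers, the region values are made up, and dq-prof's own computation may differ.

```shell
# top_share: read one value per line on stdin and print the share of the
# most frequent value. Hypothetical helper, not part of dq-prof.
top_share() {
  awk '
    { count[$0]++ }
    END {
      for (v in count) if (count[v] > max) max = count[v]
      if (NR) printf "%.4f\n", max / NR
    }
  '
}

# pp_change BASELINE_SHARE CURRENT_SHARE: difference in percentage points.
pp_change() {
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (c - b) * 100 }'
}

# Made-up columns: 6 of 10 rows share one value in the baseline, 10 of 10 now.
baseline=$(printf 'US\nUS\nUS\nUS\nUS\nUS\nEU\nEU\nEU\nEU\n' | top_share)
current=$(printf 'US\nUS\nUS\nUS\nUS\nUS\nUS\nUS\nUS\nUS\n' | top_share)

echo "baseline=$baseline current=$current"   # baseline=0.6000 current=1.0000
pp_change "$baseline" "$current"             # 40.0
```

A current share of 1.0000 exceeds the 0.75 limit on its own (the CRITICAL line), and the 40.0pp move from the 0.6000 baseline is what the WARNING line reports.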

Why dq-prof

  • Zero config – no YAML, no expectations to write
  • Fast – runs in seconds on sampled data
  • Catches real issues – null spikes, skew, outliers, freshness, schema drift
  • Baseline-aware – detect changes vs previous runs
  • CI-friendly – fail pipelines when data looks wrong

Install

Download a release binary and run:

chmod +x dq-prof
./dq-prof data.parquet

Or pip:

pip install dq-prof
dq-prof --help

Examples

Inspect a file:

dq-prof examples/clean_sales.csv

Compare to baseline (full scan required to save):

dq-prof data.csv --full-scan --save-baseline baseline.json
dq-prof data.csv --baseline baseline.json --fail-on warning
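
In a CI job, the second command can act as a gate. The sketch below assumes that dq-prof exits with a non-zero status when the `--fail-on` threshold is reached; the text above implies this but does not state it, so verify it against `dq-prof --help` before relying on it. `gate` is a hypothetical wrapper name.

```shell
# gate CMD [ARGS...]: run a check command and report pass/fail,
# preserving the command's exit status for the calling pipeline.
gate() {
  if "$@"; then
    echo "data check passed"
  else
    status=$?
    echo "data check failed (exit $status)" >&2
    return "$status"
  fi
}

# Usage (assumes dq-prof is on PATH and signals findings via its exit status):
#   gate dq-prof data.csv --baseline baseline.json --fail-on warning
```

Returning the original status, rather than a fixed 1, keeps any distinction the tool makes between exit codes visible to the CI system.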

Postgres:

dq-prof public.sales \
  --pg-url postgres://user:pass@host/db \
  --sample-rows 50000

JSON output:

dq-prof data.parquet --format json

Output

Output is text or JSON. Each finding carries a severity:

  • CRITICAL – likely broken data
  • WARNING – suspicious change
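
The text report is regular enough to summarize with standard tools. The sketch below counts findings per severity, assuming the layout seen in the sample reports on this page: a bare `CRITICAL` or `WARNING` heading followed by `- column: message` lines. That layout is inferred from the samples only, and `count_findings` is a hypothetical helper. For anything beyond a quick count, `--format json` is probably the better input, though its schema is not shown here.

```shell
# count_findings SEVERITY: read a text report on stdin and print the number
# of "- column: message" lines under the given severity heading.
count_findings() {
  awk -v sev="$1" '
    /^[A-Z]+$/ { in_section = ($0 == sev); next }  # bare uppercase line starts a section
    in_section && /^- / { n++ }
    END { print n + 0 }
  '
}

# Usage (assumes the report is written to stdout):
#   dq-prof data.parquet --color never | count_findings CRITICAL
```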

Philosophy

dq-prof is not a data observability platform. It’s a fast sanity check you run inline — a linter for data — before or after a pipeline step to catch issues immediately.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

dq_prof-0.1.20-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.6 MB)

Python 3, manylinux: glibc 2.17+ x86-64

dq_prof-0.1.20-py3-none-macosx_11_0_arm64.whl (19.6 MB)

Python 3, macOS 11.0+ ARM64

File details

Details for the file dq_prof-0.1.20-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dq_prof-0.1.20-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ec4d885fc045a9667dcc90b7c71eba4afaa1cc697bffeb515ce7b47471e047e3
MD5 a4b5a7c3e92dfa765b3cec7da0616254
BLAKE2b-256 6ac315229b17844e68edb336cdd650332049fac37ad46f84a8a5b37f515ad621

Provenance

The following attestation bundles were made for dq_prof-0.1.20-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on kraftaa/dq-prof

File details

Details for the file dq_prof-0.1.20-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for dq_prof-0.1.20-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 54843962cc78f47623187029d6ab1079406cdffc381563a72b99633cc697cfec
MD5 dc026c95056997bb4f5fa783c44dd87f
BLAKE2b-256 63af16c4f104ec70f7844b1386f013d14bcbb7690738ffe472cbb40e247f339b

Provenance

The following attestation bundles were made for dq_prof-0.1.20-py3-none-macosx_11_0_arm64.whl:

Publisher: release.yml on kraftaa/dq-prof
