A lightweight data quality toolkit for pandas: profiling, validation schemas, and a zero-config scan.

dqscore

A lightweight data quality toolkit for pandas — profile any DataFrame, declare expectations with a fluent schema, or run a zero-config scan. No heavy dependencies, no config files required.

dqscore helps you catch the boring-but-costly data problems — nulls where there shouldn't be any, duplicate keys, out-of-range values, malformed strings — before they reach a model, a dashboard, or a stakeholder.

Why this exists?

Data quality issues are the silent killers of analytics and ML work. A null in the wrong column, a duplicate primary key, a value outside its expected range: these don't crash your pipeline. They quietly corrupt your output, and you find out three weeks later in a stakeholder meeting.

The Python ecosystem already has excellent tools for this. Great Expectations is comprehensive and battle-tested. Pandera offers powerful schema-based validation. ydata-profiling produces rich exploratory reports. If you're building a long-lived production data platform, those are the right answers.

But they leave a gap at the lightweight end. When an analyst gets a fresh CSV and wants a fast read on whether it's trustworthy, the existing tools ask for a lot upfront: a schema, a config, a project structure, sometimes a framework integration. The lightest possible question ("is this data OK?") doesn't have a one-line answer in any of them. And once you do set up checks, getting a single number you can put on a dashboard, or a non-zero exit code you can wire into CI, often needs custom code on top.

dqscore is built for that middle ground. It has one dependency (pandas) and three things to learn: profile a DataFrame, declare a schema with a fluent API, or run a zero-config scan that infers sensible defaults. Every validation produces a 0–100 quality score and a report that exports to HTML, Markdown, or JSON. The CLI returns exit code 1 on failure, so `dqscore scan data.csv` drops straight into a CI pipeline or a pre-commit hook with no glue code.

It's not a replacement for Great Expectations or pandera. It's the tool you reach for at the start of a project, or when reviewing a new dataset, or when you want a simple quality gate in CI without standing up a whole framework. That's the gap, and I think it's a useful one to fill, especially for individuals, smaller teams, and educators where the ceremony of heavier tools is the actual barrier to checking data at all.

The package is MIT-licensed and feedback is welcome. If a check is missing, a report format would be useful, or the auto-scan heuristics could be smarter for your data, open an issue.

Why dqscore?

Tiny surface area. Three things to learn: profile, Schema, auto_scan.
Readable reports. Every result exports to dict, JSON, Markdown, or styled HTML.
Scoreable. Each validation produces a 0–100 quality score for dashboards/CI.
CLI included. dqscore scan data.csv returns a non-zero exit code on failure, so it drops straight into a pipeline or pre-commit hook.
One dependency: pandas.

Installation

pip install dqscore

Or install the latest from source:

git clone https://github.com/dgvj-work/dqscore.git
cd dqscore
pip install -e ".[dev]"

Quick start

1. Profile a DataFrame

import pandas as pd
import dqscore as dq

df = pd.read_csv("customers.csv")
profile = dq.profile(df)

print(profile.to_markdown())   # per-column stats
profile.to_html("profile.html")

2. Validate against a schema

schema = dq.Schema("customers")
schema.column("id").not_null().unique()
schema.column("age").in_range(0, 120)
schema.column("email").matches(r"^[^@]+@[^@]+\.[^@]+$")
schema.column("country").in_set(["US", "CA", "MX"])
schema.no_duplicate_rows()

result = schema.validate(df)

print(result.summary())        # human-readable report
print("Quality score:", result.score)
result.to_html("dq_report.html")

if not result.passed:
    raise SystemExit("Data quality checks failed")
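
Before wiring a regex into a schema, it can help to try it on a few values with the standard `re` module. The sketch below reuses the email pattern from the schema above; the sample addresses are made up for illustration, and this only shows which values the pattern rejects, not how dqscore applies it internally.

```python
import re

# Same pattern as in the schema example above.
EMAIL_PATTERN = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

# Made-up sample values for illustration.
samples = [
    "ana@example.com",
    "no-at-sign.example.com",
    "two@@example.com",
    "bob@localhost",
]

# Values that do not match the pattern are the ones a regex check would flag.
flagged = [s for s in samples if EMAIL_PATTERN.match(s) is None]
print(flagged)
```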

3. Zero-config scan

When you just want a quick read on a new file:

result = dq.auto_scan(df)       # checks nulls, duplicate keys, duplicate rows
print(result.summary())
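
To make the three checks named in the comment above concrete (nulls, duplicate keys, duplicate rows), here is a minimal pandas-only sketch of what such a scan might count. It is not dqscore's implementation: the function name `quick_scan_sketch`, the explicit `key` parameter, and the shape of the returned dict are all hypothetical.

```python
import pandas as pd


def quick_scan_sketch(df: pd.DataFrame, key: str) -> dict:
    """Hypothetical helper, NOT part of dqscore: count three common problems."""
    return {
        # Percentage of null cells in each column.
        "null_pct": (df.isna().mean() * 100).round(1).to_dict(),
        # Extra occurrences of a non-null key value.
        "duplicate_keys": int(df[key].dropna().duplicated().sum()),
        # Rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Small made-up frame: one repeated id and one missing email.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 4],
        "email": ["a@x.com", None, "b@x.com", "c@x.com"],
    }
)
report = quick_scan_sketch(df, key="id")
print(report)
```

A real zero-config scan has to guess which column is the key; the sketch sidesteps that by asking for it.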

Command line

# Profile every column
dqscore profile data.csv --html profile.html

# Quick quality scan (exit code 1 if it fails — great for CI)
dqscore scan data.csv --json report.json
dqscore scan data.csv --max-null-pct 5
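
Because the scan reports failure through its exit code, a pipeline step only needs to inspect the return code. The sketch below shows one way to do that from Python with the standard `subprocess` module. The helper name `gate` is hypothetical, and the demo calls use the Python interpreter as a stand-in command so the snippet runs even where dqscore is not installed.

```python
import subprocess
import sys


def gate(command: list[str]) -> bool:
    """Hypothetical helper: run a command and return True if it exited with 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Stand-in commands so the sketch is runnable anywhere. In a real pipeline
# the command would be ["dqscore", "scan", "data.csv"], as shown above.
ok = gate([sys.executable, "-c", "raise SystemExit(0)"])
failed = gate([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
```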

Available checks

Method	Fails when…
`not_null()`	value is null / NaN / NaT
`unique()`	a non-null value occurs more than once
`in_range(min, max, inclusive)`	numeric value is outside the bounds
`in_set([...])`	value is not one of the allowed values
`matches(pattern, full_match)`	string does not match the regex
`is_numeric()` / `is_integer()`	value can't be parsed as a number / integer
`is_datetime(fmt)`	value can't be parsed as a date/time
`string_length(min_len, max_len)`	string length is out of bounds
`custom(fn, name)`	your function returns `True` for a row
`Schema.no_duplicate_rows(subset)`	rows are exact duplicates

Checks chain on a column and most let nulls pass, so not_null() stays the single source of truth for missing values:

schema.column("score").not_null().is_numeric().in_range(0, 100)
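
The null-handling rule can be illustrated with plain pandas. In this sketch (an assumption about the behavior described above, not dqscore's code), the range check flags only non-null values outside the bounds, so the missing value is reported by the not-null check alone.

```python
import pandas as pd

# Made-up column: one missing value and one out-of-range value.
scores = pd.Series([85, None, 140, 20], name="score")

# A not-null check flags only the missing entry.
null_failures = scores.isna()

# A null-tolerant range check: missing values are skipped, not failed.
range_failures = scores.notna() & ~scores.between(0, 100)

print(scores[null_failures].index.tolist())
print(scores[range_failures].index.tolist())
```

Keeping the two masks separate means a missing value is counted once, under the check that is actually about missingness.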

Reports & scoring

A ValidationResult gives you:

result.passed — True/False
result.score — percentage of checks passed (0–100)
result.failures — only the failing checks (with sample failing values & indices)
result.summary() / to_markdown() / to_json() / to_html(path)
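
Taking the description of `result.score` literally (percentage of checks passed), the arithmetic can be sketched as below. The function name and the treatment of an empty check list are assumptions; dqscore may weight or round its score differently.

```python
def score_sketch(check_results: list[bool]) -> float:
    """Hypothetical helper: percentage of checks that passed, from 0 to 100."""
    if not check_results:
        # Assumption: with nothing to check, nothing failed.
        return 100.0
    return 100.0 * sum(check_results) / len(check_results)


# Five checks, one of which failed.
score = score_sketch([True, True, False, True, True])
print(score)
```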

Contributing

Contributions and feedback are very welcome — see CONTRIBUTING.md. Found a bug or want a new check? Open an issue.

License

MIT

Project details

This version

0.1.0

Jun 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dqscore-0.1.0.tar.gz (17.7 kB)

Uploaded Jun 28, 2026 Source

Built Distribution

dqscore-0.1.0-py3-none-any.whl (16.3 kB)

Uploaded Jun 28, 2026 Python 3

File details

Details for the file dqscore-0.1.0.tar.gz.

File metadata

Download URL: dqscore-0.1.0.tar.gz
Upload date: Jun 28, 2026
Size: 17.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for dqscore-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c02667529041adee7b30fb18987e20ae5fedc7362ddfd45ffeea0e7455158816`
MD5	`8e4e1f84c84d74d390a9ebbf37479d20`
BLAKE2b-256	`ba1f411991e67c431eaf5a429f3c0a2e0f3666e7b89b815388276b80756aae52`

File details

Details for the file dqscore-0.1.0-py3-none-any.whl.

File metadata

Download URL: dqscore-0.1.0-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 16.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for dqscore-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b699e482b96588942919230816e938cea013adf6bdb52552d3ccf5700f209d6a`
MD5	`0552c58adc4987023ef82fde054d4b51`
BLAKE2b-256	`057bdbf3887065029c65f0861e787a0393b5f5f5b2bb7275cc1bfcc97072b24f`
