Developer-first data quality engine
Kontra
Fast data quality validation for files, databases, and DataFrames.
Kontra validates data against declarative rules. It stays fast on large datasets by resolving checks from metadata when possible, then running the rest via batched SQL pushdown (DuckDB / PostgreSQL / SQL Server).
pip install kontra
Quick Start
import kontra
from kontra import rules
result = kontra.validate("users.parquet", rules=[
    rules.not_null("user_id"),
    rules.unique("email"),
    rules.range("age", min=0, max=120),
])
result.passed     # True
result.to_dict()  # Structured output for CI/services
result.to_llm()   # Token-optimized summary for agents
DataFrames work too:
result = kontra.validate(df, rules=[...]) # Polars or pandas
CLI
kontra profile users.parquet --draft > contract.yml
kontra validate contract.yml
✅ users — PASSED (4 of 4 rules)
✅ COL:user_id:not_null [metadata]
✅ COL:age:range [metadata]
✅ COL:email:unique [sql]
✅ COL:status:allowed_values [sql]
Execution
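Kontra first runs a metadata pass (preplan) that resolves every rule it can prove from metadata alone, such as Parquet statistics. Remaining rules run via SQL pushdown when available, or locally in Polars otherwise. Both preplan and pushdown are configurable.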
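To make "batched SQL pushdown" concrete, here is a minimal sketch of the idea: several rules are compiled into one SELECT so the table is scanned once. This is not Kontra's implementation. The `RULES` list, the `run_batched` helper, and the SQL expressions are assumptions for illustration, and the standard-library `sqlite3` module stands in for DuckDB, PostgreSQL, or SQL Server.

```python
import sqlite3

# Hypothetical rule specs: (rule_id, SQL aggregate that counts violations).
RULES = [
    ("COL:user_id:not_null", "SUM(CASE WHEN user_id IS NULL THEN 1 ELSE 0 END)"),
    ("COL:email:unique", "COUNT(email) - COUNT(DISTINCT email)"),
    ("COL:age:range", "SUM(CASE WHEN age < 0 OR age > 120 THEN 1 ELSE 0 END)"),
]


def run_batched(conn, table, rules):
    """Evaluate all rules in one query instead of one query per rule."""
    select_list = ", ".join(
        f"{expr} AS c{i}" for i, (_, expr) in enumerate(rules)
    )
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    # SUM over an empty table yields NULL, so fall back to 0.
    return {rule_id: int(count or 0) for (rule_id, _), count in zip(rules, row)}


# Small in-memory table with one null id, one duplicate email, two bad ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a@x.io", 30),
        (2, "b@x.io", 130),
        (None, "a@x.io", 41),
        (4, "c@x.io", -5),
    ],
)

violations = run_batched(conn, "users", RULES)
print(violations)
```

Batching matters most on remote databases, where each extra query costs a round trip and often another full scan.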
Contracts
Rules can also be defined in YAML:
name: users
datasource: users.parquet
rules:
  - name: not_null
    params: { column: user_id }
  - name: unique
    params: { column: email }
    severity: warning
  - name: allowed_values
    params:
      column: status
      values: [active, inactive, pending]
  - name: range
    params: { column: age, min: 0, max: 120 }
What You Get
- 18 built-in rules for nulls, uniqueness, ranges, regex, freshness, and more (reference)
- Fast execution: metadata analysis + batched SQL pushdown
- Multiple sources: Parquet, CSV, PostgreSQL, SQL Server, S3, Azure ADLS Gen2
- Agent-friendly: structured, token-optimized summaries via .to_llm()
- Debuggable failures: collect failing rows during validation, fetch more later on demand
- Track drift: save runs and compare over time with kontra diff
Fail Fast vs Exact Counts
By default, Kontra runs in fail-fast mode: it stops at the first violation per rule and reports failed_count: 1 as a lower bound. This enables early termination and metadata-only resolution — large Parquet tables can validate in milliseconds when Parquet statistics are sufficient to prove a rule passes.
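The sketch below illustrates how column statistics can prove a rule without reading any rows. It is a simplified model, not Kontra's code: `ColumnStats`, `prove_not_null`, and `prove_range` are hypothetical names, and the statistics are hand-written stand-ins for what a Parquet footer may record per row group. Each function returns `True` (proven to pass), `False` (proven to fail), or `None` (statistics are insufficient, so the rule must run elsewhere).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    """Hypothetical per-row-group statistics for one column."""
    null_count: Optional[int]      # None when no statistic was recorded
    min_value: Optional[float]
    max_value: Optional[float]


def prove_not_null(groups: Sequence[ColumnStats]) -> Optional[bool]:
    """Decide a not_null rule from null counts alone, if possible."""
    if any(g.null_count for g in groups):          # some group records nulls
        return False
    if any(g.null_count is None for g in groups):  # a group has no statistic
        return None
    return True


def prove_range(groups: Sequence[ColumnStats], lo: float, hi: float) -> Optional[bool]:
    """Decide a range rule from min/max alone, if possible."""
    for g in groups:
        if g.min_value is not None and g.min_value < lo:
            return False
        if g.max_value is not None and g.max_value > hi:
            return False
    if any(g.min_value is None or g.max_value is None for g in groups):
        return None
    return True


clean = [ColumnStats(0, 18, 64), ColumnStats(0, 21, 97)]
partial = clean + [ColumnStats(None, None, None)]   # one group lacks statistics
dirty = clean + [ColumnStats(3, -5, 140)]           # nulls and out-of-range values

print(prove_not_null(clean), prove_range(clean, 0, 120))
print(prove_not_null(partial), prove_range(partial, 0, 120))
print(prove_not_null(dirty), prove_range(dirty, 0, 120))
```

The cost of these checks depends on the number of row groups rather than the number of rows, which is why a large file can be resolved quickly when its statistics are complete.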
When you need exact counts, enable tally:
result = kontra.validate("users.parquet", rules=[...], tally=True)
Or per-rule in YAML:
rules:
  - name: not_null
    params: { column: user_id }
    tally: true  # scan all rows, count all violations
Results:
- default (fail fast) → failed_count: 1 (≥1 violation exists)
- tally: true → failed_count: 23741 (exact)
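The difference between the two modes can be sketched in plain Python. This is a toy model of the counting semantics described above, not Kontra's execution path. `count_violations` and `is_null_id` are hypothetical helpers, and the second return value (rows scanned) is included only to show the early exit.

```python
from typing import Callable, Iterable, Tuple


def count_violations(
    rows: Iterable[dict],
    is_violation: Callable[[dict], bool],
    tally: bool = False,
) -> Tuple[int, int]:
    """Return (failed_count, rows_scanned) for one rule."""
    failed = 0
    scanned = 0
    for row in rows:
        scanned += 1
        if is_violation(row):
            failed += 1
            if not tally:
                break  # fail fast: one violation is enough to fail the rule
    return failed, scanned


def is_null_id(row: dict) -> bool:
    return row["user_id"] is None


rows = [{"user_id": 1}, {"user_id": None}, {"user_id": None}, {"user_id": 4}]

fast = count_violations(rows, is_null_id)               # stops at the first null
exact = count_violations(rows, is_null_id, tally=True)  # scans every row
print(fast, exact)
```

In fail-fast mode the reported count is only a lower bound, so treat `failed_count: 1` as "at least one", not "exactly one".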
Failure Samples
# Collect samples during validation
result = kontra.validate("users.parquet", rules=[...], sample=5)
# Access what was collected
for rule in result.rules:
    if not rule.passed and rule.samples:
        print(rule.rule_id, rule.samples)
# Need more? Fetch on demand
result.sample_failures("COL:user_id:not_null", n=20)
Install Extras
pip install "kontra[postgres]" # PostgreSQL
pip install "kontra[sqlserver]" # SQL Server
pip install "kontra[s3]" # S3 / MinIO
Documentation
| Doc | Audience |
|---|---|
| Getting Started | New users |
| Python API | Library users |
| Rules Reference | All 18 rules |
| Configuration | Project setup |
| Advanced Topics | Agents, state, performance |
| Architecture | Contributors |
License
MIT