Developer-first data quality engine

Kontra

Fast data quality validation for files, databases, and DataFrames.

Kontra validates data against declarative rules. It stays fast on large datasets by resolving checks from metadata when possible, then running the rest via batched SQL pushdown (DuckDB / PostgreSQL / SQL Server).

pip install kontra

Quick Start

import kontra
from kontra import rules

result = kontra.validate("users.parquet", rules=[
    rules.not_null("user_id"),
    rules.unique("email"),
    rules.range("age", min=0, max=120),
])

result.passed        # True
result.to_dict()     # Structured output for CI/services
result.to_llm()      # Token-optimized summary for agents

DataFrames work too:

result = kontra.validate(df, rules=[...])  # Polars or pandas
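In a CI job, the `result.passed` flag from the Quick Start can drive the process exit code. The following is a minimal sketch, not part of Kontra: `exit_code_for` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for a real result so the snippet runs without a dataset.

```python
from types import SimpleNamespace


def exit_code_for(result) -> int:
    """Map a validation result to a process exit code (0 = pass, 1 = fail)."""
    # Only `passed` is read here; it is the attribute shown in the Quick Start.
    return 0 if result.passed else 1


# Stand-ins for the object returned by kontra.validate(...).
ok = SimpleNamespace(passed=True)
bad = SimpleNamespace(passed=False)

print(exit_code_for(ok), exit_code_for(bad))
```

In a real script the return value would typically be passed to `sys.exit(...)` after calling `kontra.validate(...)`.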

CLI

kontra profile users.parquet --draft > contract.yml
kontra validate contract.yml
✅ users — PASSED (4 of 4 rules)
  ✅ COL:user_id:not_null [metadata]
  ✅ COL:age:range [metadata]
  ✅ COL:email:unique [sql]
  ✅ COL:status:allowed_values [sql]

Execution

The metadata stage (preplan) resolves every rule it can prove from metadata alone, such as Parquet statistics. The remaining rules run via SQL pushdown when available, or locally with Polars. Both preplan and pushdown are configurable.
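The idea behind a metadata-first stage can be sketched in plain Python. This is an illustration under stated assumptions, not Kontra's implementation: `ColumnStats`, `preplan_not_null`, and `preplan_range` are hypothetical names, and the statistics are hand-written rather than read from a file. Each check returns "pass" or "fail" only when the statistics prove the outcome, and "unknown" when the rule would have to fall through to SQL pushdown or local execution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column summary, similar to what file metadata can carry."""
    null_count: Optional[int] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def preplan_not_null(stats: ColumnStats) -> str:
    # Without a null count nothing can be proven from metadata.
    if stats.null_count is None:
        return "unknown"
    return "pass" if stats.null_count == 0 else "fail"


def preplan_range(stats: ColumnStats, lo: float, hi: float) -> str:
    if stats.min_value is None or stats.max_value is None:
        return "unknown"
    # If the recorded bounds sit inside [lo, hi], every value does too.
    if stats.min_value >= lo and stats.max_value <= hi:
        return "pass"
    # Recorded bounds may be looser than the actual values, so this
    # conservative sketch defers instead of declaring a failure.
    return "unknown"


stats = {
    "user_id": ColumnStats(null_count=0),
    "age": ColumnStats(null_count=0, min_value=0, max_value=97),
    "email": ColumnStats(),  # no statistics available
}

print(preplan_not_null(stats["user_id"]))    # proven from metadata
print(preplan_range(stats["age"], 0, 120))   # proven from metadata
print(preplan_not_null(stats["email"]))      # must run in a later stage
```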

Contracts

Rules can also be defined in YAML:

name: users
datasource: users.parquet

rules:
  - name: not_null
    params: { column: user_id }

  - name: unique
    params: { column: email }
    severity: warning

  - name: allowed_values
    params:
      column: status
      values: [active, inactive, pending]

  - name: range
    params: { column: age, min: 0, max: 120 }

What You Get

  • 18 built-in rules for nulls, uniqueness, ranges, regex, freshness, and more (reference)
  • Fast execution: metadata analysis + batched SQL pushdown
  • Multiple sources: Parquet, CSV, PostgreSQL, SQL Server, S3, Azure ADLS Gen2
  • Agent-friendly: structured, token-optimized summaries via .to_llm()
  • Debuggable failures: collect failing rows during validation, fetch more later on demand
  • Track drift: save runs and compare over time with kontra diff

Fail Fast vs Exact Counts

By default, Kontra runs in fail-fast mode: it stops at the first violation per rule and reports failed_count: 1, which is a lower bound rather than a true count. This allows early termination and metadata-only resolution, so large Parquet tables can validate in milliseconds when Parquet statistics are sufficient to prove a rule passes.

When you need exact counts, enable tally:

result = kontra.validate("users.parquet", rules=[...], tally=True)

Or per-rule in YAML:

rules:
  - name: not_null
    params: { column: user_id }
    tally: true      # scan all rows, count all violations

Results:

  • default (fail fast) → failed_count: 1 (≥1 violation exists)
  • tally: true → failed_count: 23741 (exact)
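The difference between the two modes can be illustrated with a small, library-independent sketch. `count_violations` is a hypothetical function written for this example and does not reflect how Kontra executes rules internally; it only shows why a fail-fast count of 1 means "at least one".

```python
from typing import Callable, Iterable


def count_violations(
    values: Iterable,
    is_valid: Callable[[object], bool],
    tally: bool = False,
) -> int:
    """Count rule violations, stopping at the first one unless tally is set."""
    failed = 0
    for value in values:
        if not is_valid(value):
            failed += 1
            if not tally:
                break  # fail fast: one violation is enough to fail the rule
    return failed


ages = [34, -1, 250, 41, -7]


def in_range(age) -> bool:
    return 0 <= age <= 120


fast = count_violations(ages, in_range)               # lower bound
exact = count_violations(ages, in_range, tally=True)  # full scan

print(fast, exact)
```

Here the fail-fast call stops after the second element, while the tally call scans all five values and counts the three that fall outside the range.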

Failure Samples

# Collect samples during validation
result = kontra.validate("users.parquet", rules=[...], sample=5)

# Access what was collected
for rule in result.rules:
    if not rule.passed and rule.samples:
        print(rule.rule_id, rule.samples)

# Need more? Fetch on demand
result.sample_failures("COL:user_id:not_null", n=20)
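Bounded sampling of this kind can be sketched with a lazy generator, so that collection stops as soon as enough failing rows are found. This is an illustration only: `take_failing_rows` and the in-memory `rows` list are hypothetical and are not Kontra's API.

```python
from itertools import islice


def take_failing_rows(rows, is_valid, n):
    """Return at most n rows that violate the rule, scanning lazily."""
    failing = (row for row in rows if not is_valid(row))
    # islice stops pulling from the generator once n rows are collected.
    return list(islice(failing, n))


rows = [
    {"user_id": 1},
    {"user_id": None},
    {"user_id": 3},
    {"user_id": None},
    {"user_id": None},
]


def has_user_id(row) -> bool:
    return row["user_id"] is not None


first_two = take_failing_rows(rows, has_user_id, 2)
print(first_two)
```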

Install Extras

pip install "kontra[postgres]"     # PostgreSQL
pip install "kontra[sqlserver]"    # SQL Server
pip install "kontra[s3]"           # S3 / MinIO

Documentation

Doc               Audience
Getting Started   New users
Python API        Library users
Rules Reference   All 18 rules
Configuration     Project setup
Advanced Topics   Agents, state, performance
Architecture      Contributors

License

MIT
