Developer-first data quality engine

Kontra

Fast data quality validation for files, databases, and DataFrames.

Kontra validates data against declarative rules. It stays fast on large datasets by resolving checks from metadata when possible, then running the rest via batched SQL pushdown (DuckDB / PostgreSQL / SQL Server).

pip install kontra

Quick Start

import kontra
from kontra import rules

result = kontra.validate("users.parquet", rules=[
    rules.not_null("user_id"),
    rules.unique("email"),
    rules.range("age", min=0, max=120),
])

result.passed        # True
result.to_dict()     # Structured output for CI/services
result.to_llm()      # Token-optimized summary for agents
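
For CI, the `passed` flag can be turned into a process exit code. The following is a minimal sketch: `exit_code` is a hypothetical helper, not part of Kontra, and the `SimpleNamespace` objects stand in for the value returned by `kontra.validate(...)`.

```python
from types import SimpleNamespace


def exit_code(result) -> int:
    """Map a validation result to a shell exit code: 0 on pass, 1 on failure.

    Hypothetical helper; it only relies on the `passed` attribute shown above.
    """
    return 0 if result.passed else 1


# Stand-ins for real results, so the sketch runs without a dataset.
print(exit_code(SimpleNamespace(passed=True)))
print(exit_code(SimpleNamespace(passed=False)))
```

In a real pipeline the return value would be passed to `sys.exit(...)` after calling `kontra.validate`.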

DataFrames work too:

result = kontra.validate(df, rules=[...])  # Polars or pandas

CLI

kontra profile users.parquet --draft > contract.yml
kontra validate contract.yml
✅ users — PASSED (4 of 4 rules)
  ✅ COL:user_id:not_null [metadata]
  ✅ COL:age:range [metadata]
  ✅ COL:email:unique [sql]
  ✅ COL:status:allowed_values [sql]

Execution

Metadata (preplan) resolves what it can prove. Remaining rules run via SQL pushdown when available, or locally (Polars). Preplan and pushdown are configurable.
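
To make the idea concrete, here is a rough sketch of how a preplan step could split rules into "proven from metadata" and "needs a scan". It is an illustration only, not Kontra's implementation: `ColumnStats`, `plan`, and the tuple-based rule format are hypothetical, and the statistics are assumed to be exact per-column values like those stored in Parquet footers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics (all optional, as metadata may be missing)."""
    null_count: Optional[int] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def plan(rules, stats_by_column):
    """Split rules into those provable from statistics and those needing a scan."""
    resolved, remaining = [], []
    for name, column, params in rules:
        stats = stats_by_column.get(column, ColumnStats())
        if name == "not_null":
            # Provable only when the null count is known and equal to zero.
            proven = stats.null_count == 0
        elif name == "range":
            # Provable only when both bounds are known and lie inside [min, max].
            proven = (
                stats.min_value is not None
                and stats.max_value is not None
                and params["min"] <= stats.min_value
                and stats.max_value <= params["max"]
            )
        else:
            # Rules such as uniqueness cannot be decided from these statistics.
            proven = False
        (resolved if proven else remaining).append((name, column))
    return resolved, remaining


rules = [
    ("not_null", "user_id", {}),
    ("range", "age", {"min": 0, "max": 120}),
    ("unique", "email", {}),
]
stats = {
    "user_id": ColumnStats(null_count=0),
    "age": ColumnStats(null_count=0, min_value=18, max_value=97),
}
resolved, remaining = plan(rules, stats)
print("metadata:", resolved)
print("scan:", remaining)
```

Anything the statistics cannot prove falls through to the scan list, which mirrors the `[metadata]` and `[sql]` tags in the CLI output above.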

Contracts

Rules can also be defined in YAML:

name: users
datasource: users.parquet

rules:
  - name: not_null
    params: { column: user_id }

  - name: unique
    params: { column: email }
    severity: warning

  - name: allowed_values
    params:
      column: status
      values: [active, inactive, pending]

  - name: range
    params: { column: age, min: 0, max: 120 }
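
To show what these four rules mean, the sketch below evaluates the same contract over a few in-memory rows. It is a toy interpreter under stated assumptions, not how Kontra executes contracts: `count_violations` and `evaluate` are hypothetical names, the contract is written as the dict a YAML parser would produce, `severity` is ignored, nulls are assumed to fail only `not_null`, and `unique` is assumed to count every row that shares a duplicated value. Rule IDs follow the `COL:<column>:<rule>` pattern seen in the CLI output.

```python
from collections import Counter

# The YAML contract above as a plain dict (what a YAML parser would return).
contract = {
    "name": "users",
    "datasource": "users.parquet",
    "rules": [
        {"name": "not_null", "params": {"column": "user_id"}},
        {"name": "unique", "params": {"column": "email"}, "severity": "warning"},
        {"name": "allowed_values",
         "params": {"column": "status", "values": ["active", "inactive", "pending"]}},
        {"name": "range", "params": {"column": "age", "min": 0, "max": 120}},
    ],
}


def count_violations(rule, rows):
    """Count violating rows for one rule (illustrative semantics only)."""
    params = rule["params"]
    values = [row.get(params["column"]) for row in rows]
    present = [v for v in values if v is not None]  # assumption: nulls only fail not_null
    name = rule["name"]
    if name == "not_null":
        return len(values) - len(present)
    if name == "unique":
        counts = Counter(present)
        return sum(n for n in counts.values() if n > 1)
    if name == "allowed_values":
        allowed = set(params["values"])
        return sum(v not in allowed for v in present)
    if name == "range":
        return sum(not (params["min"] <= v <= params["max"]) for v in present)
    raise ValueError(f"unsupported rule: {name}")


def evaluate(contract, rows):
    """Return {rule_id: violation_count} for every rule in the contract."""
    return {
        f"COL:{rule['params']['column']}:{rule['name']}": count_violations(rule, rows)
        for rule in contract["rules"]
    }


rows = [
    {"user_id": 1, "email": "a@example.com", "status": "active", "age": 34},
    {"user_id": 2, "email": "b@example.com", "status": "pending", "age": 130},
    {"user_id": None, "email": "a@example.com", "status": "archived", "age": 27},
]
for rule_id, failed in evaluate(contract, rows).items():
    print(rule_id, failed)
```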

What You Get

  • 18 built-in rules for nulls, uniqueness, ranges, regex, freshness, and more (reference)
  • Fast execution: metadata analysis + batched SQL pushdown
  • Multiple sources: Parquet, CSV, PostgreSQL, SQL Server, S3, Azure ADLS Gen2
  • Agent-friendly: structured, token-optimized summaries via .to_llm()
  • Debuggable failures: collect failing rows during validation, fetch more later on demand
  • Track drift: save runs and compare over time with kontra diff

Fail Fast vs Exact Counts

By default, Kontra runs in fail-fast mode: it stops at the first violation per rule and reports failed_count: 1, which is a lower bound rather than an exact count. This enables early termination and metadata-only resolution: large Parquet tables can validate in milliseconds when Parquet statistics are sufficient to prove a rule passes.

When you need exact counts, enable tally:

result = kontra.validate("users.parquet", rules=[...], tally=True)

Or per-rule in YAML:

rules:
  - name: not_null
    params: { column: user_id }
    tally: true      # scan all rows, count all violations

Results:

  • default (fail fast) → failed_count: 1 (≥1 violation exists)
  • tally: true → failed_count: 23741 (exact)
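
The trade-off between the two modes can be sketched in plain Python. This is only an illustration of early termination versus a full scan, not Kontra's engine: `count_failures` is a hypothetical function, and the `tally` parameter simply mirrors the option above.

```python
from typing import Any, Callable, Iterable, Tuple


def count_failures(
    values: Iterable[Any],
    is_violation: Callable[[Any], bool],
    tally: bool = False,
) -> Tuple[int, int]:
    """Return (failed_count, rows_scanned) for one rule."""
    failed = 0
    scanned = 0
    for value in values:
        scanned += 1
        if is_violation(value):
            failed += 1
            if not tally:
                break  # fail fast: one violation is enough to fail the rule
    return failed, scanned


def is_null(value: Any) -> bool:
    return value is None


values = [1, None, 3, None, None]
print(count_failures(values, is_null))              # (1, 2)
print(count_failures(values, is_null, tally=True))  # (3, 5)
```

In fail-fast mode the scan stops after two rows and reports a count of 1, which only says that at least one violation exists. With `tally=True` every row is read and the count is exact.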

Failure Samples

# Collect samples during validation
result = kontra.validate("users.parquet", rules=[...], sample=5)

# Access what was collected
for rule in result.rules:
    if not rule.passed and rule.samples:
        print(rule.rule_id, rule.samples)

# Need more? Fetch on demand
result.sample_failures("COL:user_id:not_null", n=20)

Install Extras

pip install "kontra[postgres]"     # PostgreSQL
pip install "kontra[sqlserver]"    # SQL Server
pip install "kontra[s3]"           # S3 / MinIO

Documentation

Doc               Audience
Getting Started   New users
Python API        Library users
Rules Reference   All 18 rules
Configuration     Project setup
Advanced Topics   Agents, state, performance
Architecture      Contributors

License

MIT
