Developer-first data quality engine

Kontra

Fast data quality validation for files, databases, and DataFrames.

Kontra validates data against declarative rules. It stays fast on large datasets by resolving checks from metadata when possible, then running the rest via batched SQL pushdown (DuckDB / PostgreSQL / SQL Server).

pip install kontra

Quick Start

import kontra
from kontra import rules

result = kontra.validate("users.parquet", rules=[
    rules.not_null("user_id"),
    rules.unique("email"),
    rules.range("age", min=0, max=120),
])

result.passed        # True
result.to_dict()     # Structured output for CI/services
result.to_llm()      # Token-optimized summary for agents

DataFrames work too:

result = kontra.validate(df, rules=[...])  # Polars or pandas

CLI

kontra profile users.parquet --draft > contract.yml
kontra validate contract.yml
✅ users — PASSED (4 of 4 rules)
  ✅ COL:user_id:not_null [metadata]
  ✅ COL:age:range [metadata]
  ✅ COL:email:unique [sql]
  ✅ COL:status:allowed_values [sql]

Execution

The metadata stage (preplan) resolves every rule it can prove without scanning data, for example from Parquet statistics. Remaining rules run via SQL pushdown when available, or locally in Polars. Both preplan and pushdown are configurable.
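To make the preplan idea concrete, here is a minimal sketch of how column statistics alone can decide some rules. It is not Kontra's implementation: `ColumnStats`, `preplan_not_null`, and `preplan_range` are hypothetical names, and the statistics are assumed to be exact, like the null count and min/max that Parquet can store per column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics, similar to a Parquet footer."""
    null_count: Optional[int]    # None when the writer did not record it
    min_value: Optional[float]
    max_value: Optional[float]


def preplan_not_null(stats: ColumnStats) -> Optional[bool]:
    """Return True/False if metadata decides the rule, None if a scan is needed."""
    if stats.null_count is None:
        return None
    return stats.null_count == 0


def preplan_range(stats: ColumnStats, lo: float, hi: float) -> Optional[bool]:
    """Decide a range rule from min/max alone (assumes exact statistics)."""
    if stats.min_value is None or stats.max_value is None:
        return None
    # min and max bound every non-null value in the column.
    return lo <= stats.min_value and stats.max_value <= hi


age = ColumnStats(null_count=0, min_value=18, max_value=97)
print(preplan_not_null(age))                                  # True
print(preplan_range(age, 0, 120))                             # True
print(preplan_range(ColumnStats(None, None, None), 0, 120))   # None
```

A `None` result is the case where the rule falls through to SQL pushdown or local execution.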

Contracts

Rules can also be defined in YAML:

name: users
datasource: users.parquet

rules:
  - name: not_null
    params: { column: user_id }

  - name: unique
    params: { column: email }
    severity: warning

  - name: allowed_values
    params:
      column: status
      values: [active, inactive, pending]

  - name: range
    params: { column: age, min: 0, max: 120 }
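As a rough illustration of what a contract like this expresses, the sketch below evaluates the same four rules over in-memory rows. It is an assumption-laden toy, not how Kontra executes contracts: the contract is written as a Python dict mirroring the YAML, and `CHECKS`, `run_contract`, and the check functions are hypothetical names. The `severity` key is ignored here.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def not_null(rows: List[Row], column: str) -> bool:
    return all(r.get(column) is not None for r in rows)


def unique(rows: List[Row], column: str) -> bool:
    values = [r.get(column) for r in rows]
    return len(values) == len(set(values))


def allowed_values(rows: List[Row], column: str, values: List[Any]) -> bool:
    allowed = set(values)
    return all(r.get(column) in allowed for r in rows)


def in_range(rows: List[Row], column: str, min: float, max: float) -> bool:
    # Parameter names mirror the contract keys so params can be splatted in.
    return all(min <= r[column] <= max for r in rows)


# Hypothetical registry mapping rule names to check functions.
CHECKS: Dict[str, Callable[..., bool]] = {
    "not_null": not_null,
    "unique": unique,
    "allowed_values": allowed_values,
    "range": in_range,
}

contract = {
    "name": "users",
    "rules": [
        {"name": "not_null", "params": {"column": "user_id"}},
        {"name": "unique", "params": {"column": "email"}, "severity": "warning"},
        {"name": "allowed_values",
         "params": {"column": "status", "values": ["active", "inactive", "pending"]}},
        {"name": "range", "params": {"column": "age", "min": 0, "max": 120}},
    ],
}


def run_contract(contract: Dict[str, Any], rows: List[Row]) -> Dict[str, bool]:
    """Run every rule and key the outcome by a COL:<column>:<rule> id."""
    results = {}
    for rule in contract["rules"]:
        params = rule["params"]
        rule_id = f"COL:{params['column']}:{rule['name']}"
        results[rule_id] = CHECKS[rule["name"]](rows, **params)
    return results


rows = [
    {"user_id": 1, "email": "a@example.com", "status": "active", "age": 34},
    {"user_id": 2, "email": "b@example.com", "status": "pending", "age": 51},
]
results = run_contract(contract, rows)
print(results)
```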

What You Get

  • 18 built-in rules for nulls, uniqueness, ranges, regex, freshness, and more (reference)
  • Fast execution: metadata analysis + batched SQL pushdown
  • Multiple sources: Parquet, CSV, PostgreSQL, SQL Server, S3, Azure ADLS Gen2
  • Agent-friendly: structured, token-optimized summaries via .to_llm()
  • Debuggable failures: collect failing rows during validation, fetch more later on demand
  • Track drift: save runs and compare over time with kontra diff

Fail Fast vs Exact Counts

By default, Kontra runs in fail-fast mode: it stops at the first violation per rule and reports failed_count: 1 as a lower bound. This allows early termination and metadata-only resolution, so large Parquet tables can validate in milliseconds when their statistics are sufficient to prove a rule passes.

When you need exact counts, enable tally:

result = kontra.validate("users.parquet", rules=[...], tally=True)

Or per-rule in YAML:

rules:
  - name: not_null
    params: { column: user_id }
    tally: true      # scan all rows, count all violations

Results:

  • default (fail fast) → failed_count: 1 (≥1 violation exists)
  • tally: true → failed_count: 23741 (exact)
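The difference between the two modes can be sketched in a few lines of plain Python. This is only an illustration of the counting behavior described above, not Kontra's code; `count_violations` and `is_present` are hypothetical names.

```python
from typing import Any, Callable, Iterable


def count_violations(
    values: Iterable[Any],
    is_valid: Callable[[Any], bool],
    tally: bool = False,
) -> int:
    """Count failing values; without tally, stop at the first one."""
    failed = 0
    for value in values:
        if not is_valid(value):
            failed += 1
            if not tally:
                break  # fail fast: one violation is enough to fail the rule
    return failed


def is_present(value: Any) -> bool:
    return value is not None


user_ids = [1, None, 3, None, None]
print(count_violations(user_ids, is_present))              # 1 (lower bound)
print(count_violations(user_ids, is_present, tally=True))  # 3 (exact)
```

In fail-fast mode the loop never reads past the first violation, which is why the reported count is only a lower bound.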

Failure Samples

# Collect samples during validation
result = kontra.validate("users.parquet", rules=[...], sample=5)

# Access what was collected
for rule in result.rules:
    if not rule.passed and rule.samples:
        print(rule.rule_id, rule.samples)

# Need more? Fetch on demand
result.sample_failures("COL:user_id:not_null", n=20)
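One way to picture the "collect a few now, fetch more later" pattern is a lazy filter over the rows that is cut off after n matches. The sketch below assumes in-memory rows and a not_null rule; `failing_rows` and `collect_samples` are hypothetical helpers, not part of Kontra's API.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def failing_rows(rows: Iterable[Row], column: str) -> Iterator[Row]:
    """Lazily yield rows that violate a not_null rule on `column`."""
    return (row for row in rows if row.get(column) is None)


def collect_samples(rows: Iterable[Row], column: str, n: int) -> List[Row]:
    """Keep at most n failing rows; stops reading once n are found."""
    return list(islice(failing_rows(rows, column), n))


rows = [
    {"user_id": 1},
    {"user_id": None, "email": "a@example.com"},
    {"user_id": None, "email": "b@example.com"},
    {"user_id": 4},
    {"user_id": None, "email": "c@example.com"},
]

# A small sample during validation, a larger one on demand afterwards.
print(collect_samples(rows, "user_id", n=2))
print(len(collect_samples(rows, "user_id", n=20)))  # 3
```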

Install Extras

pip install "kontra[postgres]"     # PostgreSQL
pip install "kontra[sqlserver]"    # SQL Server
pip install "kontra[s3]"           # S3 / MinIO

Documentation

Doc               Audience
Getting Started   New users
Python API        Library users
Rules Reference   All 18 rules
Configuration     Project setup
Advanced Topics   Agents, state, performance
Architecture      Contributors

License

MIT
