Lightning-fast local data contract validation via Polars & Pydantic V2.

polaguard

Lightning-fast local data contract validation for CSV, Parquet, and JSON files.
Built to protect local pipelines, development loops, and CI/CD runners before data reaches the cloud.

⚡ Why Polaguard?

Most data contract engines are database-centric, slow to connect, and heavy. Polaguard shifts data quality left:

  • Zero Infrastructure: No cloud dependencies, database logins, or heavy configuration.
  • Blazing Fast: Vectorized execution handling millions of rows in milliseconds using Polars.
  • Pipeline Native: Designed to block Git commits and GitHub Actions pipelines through process exit codes.

🚀 Quick Start

1. Install

pip install polaguard

2. Auto-Generate a Contract Schema

Point Polaguard at a clean file. It infers column structure, formats, uniqueness, and null ratios automatically.

polaguard init --file data/baseline.parquet --output contract.yaml

3. Check Incoming Batches

Instantly check incoming files against your established standards:

polaguard check --file data/new_batch.csv --contract contract.yaml
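
For scripts that wrap the CLI, the check can be gated on the process exit code. The sketch below assumes `polaguard check` exits non-zero when validation fails, which the "automated exit codes" wording suggests but this page does not spell out. `run_check` is a hypothetical helper, not part of Polaguard.

```python
import subprocess
import sys
from typing import Sequence


def run_check(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface whatever the tool printed so the failure is visible in logs.
        sys.stderr.write(completed.stdout + completed.stderr)
    return completed.returncode == 0


# Example (requires polaguard on PATH; assumes non-zero exit on failure):
# ok = run_check(["polaguard", "check",
#                 "--file", "data/new_batch.csv",
#                 "--contract", "contract.yaml"])
# sys.exit(0 if ok else 1)
```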

4. Use the Python API

from pathlib import Path
from polaguard import validate_file

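# Validate one file against a contract; both arguments are paths.
result = validate_file(Path("data/new_batch.csv"), Path("contract.yaml"))
if not result.is_valid:
    # errors holds the reported validation failures.
    print(result.errors)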
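
When several files share one contract, the same call can be looped. This is a minimal sketch that assumes only what the snippet above shows: `validate_file(path, contract_path)` returns an object with `is_valid` and `errors`. `collect_failures` is a hypothetical helper, and the validator is passed in as an argument so the loop can be exercised without Polaguard installed.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def collect_failures(
    files: Iterable[Path],
    contract: Path,
    validate: Callable[[Path, Path], Any],
) -> Dict[str, Any]:
    """Map each failing file to the errors reported for it."""
    failures: Dict[str, Any] = {}
    for path in files:
        result = validate(path, contract)
        if not result.is_valid:
            # The shape of `errors` is whatever the validator returns;
            # it is passed through untouched.
            failures[str(path)] = result.errors
    return failures
```

With Polaguard installed, this could be called as `collect_failures(Path("data").glob("*.csv"), Path("contract.yaml"), validate_file)`.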

5. Check CLI Version

polaguard --version

🛠️ Automated Integrations

Pre-Commit Hooks

Catch structure-breaking data changes before they are committed. Add this to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/osadose/polaguard
    rev: v0.2.0
    hooks:
      - id: polaguard
        args: ["check", "--file", "data/raw_inputs.csv", "--contract", "contract.yaml"]

📄 Schema Configuration & Constraints

Polaguard YAML contracts support dataset-level constraints, column validations, and custom SQL expression assertions.

Dataset-level Constraints

  • min_columns (integer): Minimum number of columns required.
  • min_rows (integer): Minimum number of rows required.
  • allow_extra_columns (boolean): Whether the dataset may contain columns that are not defined in the contract.
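
To make the three dataset-level constraints concrete, here is a plain-Python sketch of what each one checks. It is not Polaguard's implementation: `check_dataset` and its error messages are hypothetical, and it assumes that setting allow_extra_columns to false makes undeclared columns fail the check.

```python
from typing import List, Optional, Sequence


def check_dataset(
    columns: Sequence[str],
    row_count: int,
    declared: Sequence[str],
    min_columns: Optional[int] = None,
    min_rows: Optional[int] = None,
    allow_extra_columns: bool = True,
) -> List[str]:
    """Return one message per violated dataset-level constraint."""
    errors: List[str] = []
    if min_columns is not None and len(columns) < min_columns:
        errors.append(f"expected at least {min_columns} columns, found {len(columns)}")
    if min_rows is not None and row_count < min_rows:
        errors.append(f"expected at least {min_rows} rows, found {row_count}")
    if not allow_extra_columns:
        # Columns present in the data but absent from the contract.
        extra = [name for name in columns if name not in declared]
        if extra:
            errors.append(f"undeclared columns: {extra}")
    return errors
```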

Column-level Validations

Under columns.<column_name>:

  • type: One of int, float, str, bool, date, datetime.
  • required (boolean): Fails the check if the column is absent.
  • unique (boolean): Whether all values must be distinct.
  • null_threshold (float between 0.0 and 1.0): Maximum permitted ratio of null values (e.g. 0.2 allows up to 20% nulls).
  • regex (string): Regular expression that values in str columns must match.
  • allowed_values (list): Defines an enum of permitted values.
  • min_value / max_value (any): Lower and upper bounds (for numeric, date, and datetime columns).
  • min_length / max_length (integer): Length limits for character strings.
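
The null_threshold arithmetic can be worked through in a few lines. This is an illustrative sketch, not Polaguard's code: `null_ratio` and `within_null_threshold` are hypothetical names, `None` stands in for a null, and the comparison is assumed to be inclusive, following the "allows up to 20% nulls" wording.

```python
from typing import Optional, Sequence


def null_ratio(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are null (None); 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def within_null_threshold(values: Sequence[Optional[object]], null_threshold: float) -> bool:
    """True when the null ratio does not exceed the threshold (assumed inclusive)."""
    return null_ratio(values) <= null_threshold


column = ["a", None, "b", "c", "d"]         # 1 null out of 5 values
print(null_ratio(column))                   # 0.2
print(within_null_threshold(column, 0.2))   # True
print(within_null_threshold(column, 0.1))   # False
```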

Custom SQL Expressions

Under expressions, define a list of SQL expressions that Polars evaluates against the dataset:

expressions:
  - "age >= 18"
  - "start_date < end_date"
  - "revenue - cost > 0"
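
Each expression reads as a boolean condition that every row is expected to satisfy. The sketch below illustrates that row-level reading with Python's built-in sqlite3 module instead of Polars, purely so that it runs without extra dependencies. Polaguard itself evaluates expressions in Polars, and its SQL dialect and null handling may differ. `count_violations` and the sample rows are hypothetical.

```python
import sqlite3
from typing import Dict, Sequence, Tuple


def count_violations(
    columns: Sequence[str],
    rows: Sequence[Tuple],
    expressions: Sequence[str],
) -> Dict[str, int]:
    """Count, per expression, the rows for which the expression is false."""
    conn = sqlite3.connect(":memory:")
    try:
        col_list = ", ".join(f'"{name}"' for name in columns)
        conn.execute(f"CREATE TABLE batch ({col_list})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO batch VALUES ({placeholders})", rows)
        # Expressions are interpolated as-is, so they must come from a
        # trusted contract file, never from untrusted input.
        return {
            expr: conn.execute(
                f"SELECT COUNT(*) FROM batch WHERE NOT ({expr})"
            ).fetchone()[0]
            for expr in expressions
        }
    finally:
        conn.close()


sample_rows = [(25, 100.0, 40.0), (16, 50.0, 60.0)]  # made-up data
print(count_violations(["age", "revenue", "cost"], sample_rows,
                       ["age >= 18", "revenue - cost > 0"]))
# {'age >= 18': 1, 'revenue - cost > 0': 1}
```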

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.
