Lightning-fast local data contract validation via Polars & Pydantic V2.
polaguard
Lightning-fast local data contract validation for CSV, Parquet, and JSON files.
Built to protect local pipelines, software loops, and CI/CD runners before data hits the cloud.
⚡ Why Polaguard?
Most data contract engines are database-centric, slow to connect, and heavy. Polaguard shifts data quality left:
- Zero Infrastructure: No cloud dependencies, database logins, or heavy configuration.
- Blazing Fast: Vectorized execution handling millions of rows in milliseconds using Polars.
- Pipeline Native: Designed to block git commits and fail GitHub Actions pipelines through its exit codes.
🚀 Quick Start
1. Install
pip install polaguard
2. Auto-Generate a Contract Schema
Point Polaguard at a clean file and it infers column structure, formats, uniqueness, and null distributions automatically.
polaguard init --file data/baseline.parquet --output contract.yaml
3. Check Incoming Batches
Check incoming files against the generated contract:
polaguard check --file data/new_batch.csv --contract contract.yaml
4. Use the Python API
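from pathlib import Path

from polaguard import validate_file

# Validate one batch file against the contract.
result = validate_file(Path("data/new_batch.csv"), Path("contract.yaml"))
if not result.is_valid:
    # Report the recorded errors when the file violates the contract.
    print(result.errors)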
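The same call extends to a folder of incoming files. The sketch below is one possible wrapper and is not part of Polaguard: `validate_batches` and its `validator` parameter are hypothetical names, and the code assumes only what the snippet above shows, namely that `validate_file(path, contract)` returns an object exposing `is_valid` and `errors`.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def validate_batches(
    files: Iterable[Path],
    contract: Path,
    validator: Optional[Callable] = None,
) -> Dict[Path, object]:
    """Validate each file against one contract and collect the failures.

    Returns a mapping of file path -> errors for every file that failed.
    `validator` is a hypothetical hook for swapping in a stub during tests;
    by default polaguard.validate_file is used.
    """
    if validator is None:
        # Imported lazily so the helper can be exercised without polaguard installed.
        from polaguard import validate_file as validator

    failures = {}
    for path in files:
        result = validator(path, contract)
        if not result.is_valid:
            failures[path] = result.errors
    return failures
```

A caller might pass `sorted(Path("data").glob("*.csv"))` as `files` and exit non-zero when the returned mapping is not empty.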
5. Check CLI Version
polaguard --version
🛠️ Automated Integrations
Pre-Commit Hooks
Catch structure-breaking data changes before they are committed. Add this to your .pre-commit-config.yaml:
repos:
  - repo: https://github.com/osadose/polaguard
    rev: v0.2.0
    hooks:
      - id: polaguard
        args: ["check", "--file", "data/raw_inputs.csv", "--contract", "contract.yaml"]
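For runners that do not use pre-commit, the same gate can be scripted. The following is a minimal sketch, assuming `polaguard check` exits with a non-zero status when validation fails (this README describes blocking via exit codes but does not list the codes); `build_check_command` and `gate` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_check_command(data_file: str, contract: str) -> List[str]:
    """Assemble the `polaguard check` invocation shown in the Quick Start."""
    return ["polaguard", "check", "--file", data_file, "--contract", contract]


def gate(data_file: str, contract: str) -> int:
    """Run the check and return its exit code so the caller can propagate it."""
    # Assumption: a non-zero return code means the file violated the contract.
    completed = subprocess.run(build_check_command(data_file, contract))
    return completed.returncode
```

A CI step could end with `sys.exit(gate("data/raw_inputs.csv", "contract.yaml"))`, so the job fails whenever the check does.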
📄 Schema Configuration & Constraints
Polaguard YAML contracts support dataset-level constraints, column validations, and custom SQL expression assertions.
Dataset-level Constraints
- min_columns (integer): Minimum number of columns required.
- min_rows (integer): Minimum number of rows required.
- allow_extra_columns (boolean): Whether the dataset may contain columns not defined in the contract. When extra columns are not allowed, their presence fails the check.
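To make the three settings concrete, here is a plain-Python sketch of the checks they describe. It illustrates the semantics only and is not Polaguard's implementation, which runs on Polars. `check_dataset`, its message strings, and the default of allowing extra columns are all assumptions made for the example.

```python
from typing import List, Optional, Set


def check_dataset(
    column_names: List[str],
    row_count: int,
    contract_columns: Set[str],
    min_columns: Optional[int] = None,
    min_rows: Optional[int] = None,
    allow_extra_columns: bool = True,  # assumed default, not confirmed by the README
) -> List[str]:
    """Return a list of violations; an empty list means the dataset passes."""
    errors = []
    if min_columns is not None and len(column_names) < min_columns:
        errors.append(f"expected at least {min_columns} columns, found {len(column_names)}")
    if min_rows is not None and row_count < min_rows:
        errors.append(f"expected at least {min_rows} rows, found {row_count}")
    if not allow_extra_columns:
        # Any column missing from the contract counts as "extra".
        extra = [name for name in column_names if name not in contract_columns]
        if extra:
            errors.append(f"columns not defined in the contract: {extra}")
    return errors


# A two-row file with an undeclared "debug" column fails two of the checks.
print(check_dataset(["id", "age", "debug"], 2, {"id", "age"},
                    min_rows=5, allow_extra_columns=False))
```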
Column-level Validations
Under columns.<column_name>:
- type: One of int, float, str, bool, date, datetime.
- required (boolean): Whether the check fails when the column is absent.
- unique (boolean): Whether all values must be distinct.
- null_threshold (float between 0.0 and 1.0): Maximum permitted ratio of null values (e.g. 0.2 allows up to 20% nulls).
- regex (string): For str columns, a regular expression that values must match.
- allowed_values (list): The set of permitted values (an enum).
- min_value / max_value (any): Lower and upper bounds, for numeric, date, and datetime columns.
- min_length / max_length (integer): Length limits for string values.
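The rules above can be read as simple per-column predicates. The sketch below spells out a subset of them in plain Python to show the intended semantics; it is not how Polaguard evaluates them. `check_column` is a hypothetical helper, and three details are assumptions: nulls are ignored by every rule except null_threshold, the regex must match the whole value, and the min/max bounds are inclusive.

```python
import re
from typing import Any, Dict, List


def check_column(values: List[Any], rules: Dict[str, Any]) -> List[str]:
    """Apply a subset of the column rules to a list of values; None stands for null."""
    errors = []
    present = [v for v in values if v is not None]

    # null_threshold: the share of nulls must not exceed the configured ratio.
    if "null_threshold" in rules and values:
        null_ratio = (len(values) - len(present)) / len(values)
        if null_ratio > rules["null_threshold"]:
            errors.append(f"null ratio {null_ratio:.2f} exceeds {rules['null_threshold']}")

    # unique: no non-null value may appear twice.
    if rules.get("unique") and len(set(present)) != len(present):
        errors.append("duplicate values found")

    # regex: every non-null string must match the whole pattern (assumption).
    if "regex" in rules:
        pattern = re.compile(rules["regex"])
        if any(pattern.fullmatch(v) is None for v in present):
            errors.append("values not matching regex")

    # allowed_values: enum membership.
    if "allowed_values" in rules:
        allowed = set(rules["allowed_values"])
        if any(v not in allowed for v in present):
            errors.append("values outside allowed_values")

    # min_value / max_value: treated as inclusive bounds (assumption).
    if "min_value" in rules and any(v < rules["min_value"] for v in present):
        errors.append("values below min_value")
    if "max_value" in rules and any(v > rules["max_value"] for v in present):
        errors.append("values above max_value")
    return errors
```

For example, `check_column([1, 2, None, 4, 5], {"null_threshold": 0.2, "unique": True})` returns an empty list, because one null in five values sits exactly at the 20% limit.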
Custom SQL Expressions
Under expressions, define a list of SQL checks that Polars evaluates against the dataset:
expressions:
  - "age >= 18"
  - "start_date < end_date"
  - "revenue - cost > 0"
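Each expression reads as a row-level predicate: a row that does not satisfy it is a violation. The sketch below illustrates that reading with Python's built-in sqlite3 module, swapped in here purely so the example runs without extra dependencies. Polaguard itself evaluates the expressions in Polars, and `count_violations` is a hypothetical helper, not part of its API.

```python
import sqlite3
from typing import Dict, List


def count_violations(rows: List[dict], expressions: List[str]) -> Dict[str, int]:
    """Count, per expression, the rows for which the predicate is false."""
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE batch ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO batch VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    counts = {}
    for expr in expressions:
        # Expressions come from a trusted contract file, so they are inlined as-is.
        (violations,) = conn.execute(
            f"SELECT COUNT(*) FROM batch WHERE NOT ({expr})"
        ).fetchone()
        counts[expr] = violations
    conn.close()
    return counts


rows = [
    {"age": 34, "revenue": 120.0, "cost": 80.0},
    {"age": 16, "revenue": 50.0, "cost": 70.0},
]
print(count_violations(rows, ["age >= 18", "revenue - cost > 0"]))
# → {'age >= 18': 1, 'revenue - cost > 0': 1}
```

In this SQLite version, a row where the expression evaluates to NULL is not counted as a violation; whether Polaguard treats nulls the same way is not stated here.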
📜 License
This project is licensed under the MIT License — see the LICENSE file for details.
File details
Details for the file polaguard-0.2.0.tar.gz.
File metadata
- Download URL: polaguard-0.2.0.tar.gz
- Upload date:
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 124642650f079f748513d6f9f41cb86ea98b6e1fc29a22d91d18e8d5f5bcf2ab |
| MD5 | e584f8c9f9ff5d3a59e2c130ada4e629 |
| BLAKE2b-256 | f93ece496b5a58787c2d094840bb7170da7f2bee7a7646e08b39fe6bb49ec9e8 |
Provenance
The following attestation bundles were made for polaguard-0.2.0.tar.gz:
Publisher: publish.yml on osadose01/polaguard
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polaguard-0.2.0.tar.gz
  - Subject digest: 124642650f079f748513d6f9f41cb86ea98b6e1fc29a22d91d18e8d5f5bcf2ab
- Sigstore transparency entry: 1972245496
- Sigstore integration time:
- Permalink: osadose01/polaguard@757f3fc37a29be764c6fd716b18d8cf7290c2fc5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/osadose01
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@757f3fc37a29be764c6fd716b18d8cf7290c2fc5
- Trigger Event: push
File details
Details for the file polaguard-0.2.0-py3-none-any.whl.
File metadata
- Download URL: polaguard-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd12673f03056a5c1f268088f80e00298e92041a71f603d7254873a240e4077e |
| MD5 | 1195050aa919617a5c74dbc997fddcfe |
| BLAKE2b-256 | f902f77108995c90dd622ba568a78f0de8ff62b0988cf50c6c0e32705088889e |
Provenance
The following attestation bundles were made for polaguard-0.2.0-py3-none-any.whl:
Publisher: publish.yml on osadose01/polaguard
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polaguard-0.2.0-py3-none-any.whl
  - Subject digest: cd12673f03056a5c1f268088f80e00298e92041a71f603d7254873a240e4077e
- Sigstore transparency entry: 1972245645
- Sigstore integration time:
- Permalink: osadose01/polaguard@757f3fc37a29be764c6fd716b18d8cf7290c2fc5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/osadose01
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@757f3fc37a29be764c6fd716b18d8cf7290c2fc5
- Trigger Event: push