Data validation library for audit pipelines using Polars
lokryn-pipe-audit
Data validation library for audit pipelines using Polars. Define validation contracts in TOML and validate data from local files, S3, GCS, or Azure Blob Storage.
Installation
```bash
pip install lokryn-pipe-audit
```
Quick Start
```python
from lokryn_pipe_audit import load_contract, validate_dataframe, get_driver

# Load a validation contract
contract = load_contract("contracts/users.toml")

# Load data with the driver for the file's format; the file is closed afterwards
driver = get_driver("csv")
with open("users.csv", "rb") as f:
    df = driver.load(f.read())

# Validate the DataFrame against the contract
outcome = validate_dataframe(df, contract)
if outcome.passed:
    print("Validation passed!")
else:
    for failure in outcome.failures:
        print(f"Failed: {failure.rule} on {failure.column}")
```
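In a CI job or scheduled audit, the outcome usually has to become a process exit status. The sketch below is a hypothetical helper, `summarize_outcome`, that is not part of lokryn-pipe-audit; it relies only on the `passed`, `failures`, `rule` and `column` attributes used in the Quick Start. A `SimpleNamespace` stands in for a real outcome object so the example runs without the library installed.

```python
from types import SimpleNamespace


def summarize_outcome(outcome) -> tuple[int, list[str]]:
    """Hypothetical helper: map a validation outcome to (exit_code, messages).

    Assumes only the attributes shown in the Quick Start:
    outcome.passed, outcome.failures, failure.rule and failure.column.
    """
    if outcome.passed:
        return 0, ["Validation passed!"]
    messages = [f"Failed: {f.rule} on {f.column}" for f in outcome.failures]
    return 1, messages


# Stand-in outcome so the sketch runs without lokryn_pipe_audit installed.
demo = SimpleNamespace(
    passed=False,
    failures=[SimpleNamespace(rule="not_null", column="email")],
)
code, messages = summarize_outcome(demo)
```

In a real job you would print the messages and then call `sys.exit(code)`, so a failed contract fails the pipeline step.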
Contracts
Define validation rules in TOML:
```toml
[contract]
name = "users"
version = "1.0"
format = "csv"

[[columns]]
name = "email"
rules = [
    { rule = "not_null" },
    { rule = "unique" },
    { rule = "pattern", pattern = "^[\\w.-]+@[\\w.-]+\\.\\w+$" }
]

[[columns]]
name = "age"
rules = [
    { rule = "not_null" },
    { rule = "range", min = 0, max = 150 }
]

[[columns]]
name = "status"
rules = [
    { rule = "in_set", values = ["active", "inactive", "pending"] }
]
```
Built-in Validators
| Validator | Description | Parameters |
|---|---|---|
| `not_null` | No null values | - |
| `unique` | All values unique | - |
| `pattern` | Regex match | `pattern` |
| `range` | Numeric range | `min`, `max` |
| `in_set` | Value in allowed set | `values` |
| `completeness` | % non-null above threshold | `threshold` |
| `mean_between` | Column mean in range | `min`, `max` |
| `row_count` | Row count in range | `min`, `max` |
| `compound_unique` | Unique across columns | `columns` |
| `date_format` | Date string format | `format` |
| `outlier_sigma` | No outliers beyond N sigma | `sigma` |
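To make the table concrete, here is a plain-Python sketch of what several of these checks test, written over lists rather than Polars columns. It is not the library's implementation, and several semantics are assumptions: nulls (`None`) are skipped except by `not_null`, `completeness`, and `unique`; `range` bounds are inclusive; `threshold` is a fraction in [0, 1]; and `outlier_sigma` uses the population standard deviation.

```python
import re
import statistics
from collections.abc import Iterable


def _present(values: Iterable) -> list:
    """Drop nulls (None); most checks below ignore them (assumption)."""
    return [v for v in values if v is not None]


def not_null(values) -> bool:
    return all(v is not None for v in values)


def unique(values) -> bool:
    vals = list(values)
    return len(vals) == len(set(vals))


def pattern(values, regex: str) -> bool:
    compiled = re.compile(regex)
    return all(compiled.search(v) for v in _present(values))


def value_range(values, lo, hi) -> bool:
    """The `range` rule; inclusive bounds are an assumption."""
    return all(lo <= v <= hi for v in _present(values))


def in_set(values, allowed) -> bool:
    allowed = set(allowed)
    return all(v in allowed for v in _present(values))


def completeness(values, threshold: float) -> bool:
    """Share of non-null values at or above `threshold` (fraction, assumed)."""
    vals = list(values)
    if not vals:
        return True
    return len(_present(vals)) / len(vals) >= threshold


def mean_between(values, lo, hi) -> bool:
    present = _present(values)
    return bool(present) and lo <= statistics.fmean(present) <= hi


def outlier_sigma(values, sigma: float) -> bool:
    """True if no value lies more than `sigma` population std devs from the mean."""
    present = _present(values)
    if not present:
        return True
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)
    return all(abs(v - mean) <= sigma * std for v in present)
```

For example, ten values of `10` plus one `100` put the outlier about 3.16 standard deviations from the mean, so it fails `outlier_sigma` with `sigma = 3` but passes with `sigma = 3.5`.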
Storage Connectors
Local
```python
import asyncio
from lokryn_pipe_audit import LocalConnector

connector = LocalConnector()
data = asyncio.run(connector.fetch("/path/to/file.csv"))  # or `await` inside async code
```
S3
```python
import asyncio
from lokryn_pipe_audit import S3Connector, load_profiles, get_profile

profiles = load_profiles("profiles.toml")
profile = get_profile(profiles, "s3_profile")  # section name in profiles.toml
url = "s3://bucket/data.csv"
connector = S3Connector.from_profile_and_url(profile, url)
data = asyncio.run(connector.fetch(url))  # or `await` inside async code
```
GCS
```python
import asyncio
from lokryn_pipe_audit import GCSConnector, load_profiles, get_profile

profiles = load_profiles("profiles.toml")
profile = get_profile(profiles, "gcs_profile")  # section name in profiles.toml
url = "gs://bucket/data.csv"
connector = GCSConnector.from_profile_and_url(profile, url)
data = asyncio.run(connector.fetch(url))  # or `await` inside async code
```
Azure Blob Storage
```python
import asyncio
from lokryn_pipe_audit import AzureConnector, load_profiles, get_profile

profiles = load_profiles("profiles.toml")
profile = get_profile(profiles, "azure_profile")  # section name in profiles.toml
url = "https://account.blob.core.windows.net/container/blob"
connector = AzureConnector.from_profile_and_url(profile, url)
data = asyncio.run(connector.fetch(url))  # or `await` inside async code
```
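The connectors are built the same way, so a pipeline that accepts arbitrary locations still has to decide which one to use. One option is to dispatch on the shape of the URL. The helper below, `provider_for_url`, is hypothetical and not part of lokryn-pipe-audit; its return values mirror the `provider` values in the Profiles section, with `"local"` added for plain POSIX-style paths (an assumption).

```python
from urllib.parse import urlparse


def provider_for_url(url: str) -> str:
    """Hypothetical helper: pick a storage provider name from a location string."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return "s3"
    if parsed.scheme == "gs":
        return "gcs"
    host = parsed.hostname or ""
    if parsed.scheme == "https" and host.endswith(".blob.core.windows.net"):
        return "azure"
    if parsed.scheme in ("", "file"):
        return "local"  # plain paths; Windows drive letters are not handled
    raise ValueError(f"unsupported location: {url}")
```

A caller could then map `"s3"` to `S3Connector`, `"gcs"` to `GCSConnector`, and so on, choosing the matching profile in the same step.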
Profiles
Configure storage credentials in profiles.toml:
```toml
[s3_profile]
provider = "s3"
region = "us-east-1"
access_key = "${AWS_ACCESS_KEY_ID}"
secret_key = "${AWS_SECRET_ACCESS_KEY}"

[gcs_profile]
provider = "gcs"
service_account_json = "${GCS_SERVICE_ACCOUNT_JSON}"

[azure_profile]
provider = "azure"
connection_string = "${AZURE_STORAGE_CONNECTION_STRING}"
```
Environment variables in ${VAR} format are automatically expanded.
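The expansion can be pictured as a regex substitution over each string value of a profile section. The sketch below is not the library's implementation: `expand_env` and `expand_profile` are hypothetical helpers, and leaving unknown variables untouched (rather than raising an error) is an assumption about how missing values are handled.

```python
import os
import re
from collections.abc import Mapping
from typing import Optional

# Matches ${NAME} where NAME is a conventional environment variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${VAR} with its value; unknown variables are left as-is."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def expand_profile(section: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Expand every string value in one profile section; other types pass through."""
    return {k: expand_env(v, env) if isinstance(v, str) else v for k, v in section.items()}
```

Passing an explicit `env` mapping keeps the helper testable without touching the real process environment.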
File Formats
- CSV (`.csv`)
- Parquet (`.parquet`)
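When locations come from configuration, the format name passed to `get_driver` can be derived from the file extension. The helper below, `format_for_path`, is hypothetical; only `"csv"` is confirmed as a driver name by the Quick Start, so `"parquet"` is an assumption. Query strings (for example on signed blob URLs) are dropped before the extension is read.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping; "parquet" as a driver name is an assumption.
FORMAT_BY_SUFFIX = {".csv": "csv", ".parquet": "parquet"}


def format_for_path(location: str) -> str:
    """Return a format name for a local path or object URL, based on its extension."""
    path = urlparse(location).path or location  # strips scheme, host and query
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix or '(none)'}") from None
```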
License
AGPL-3.0 - See LICENSE for details.