Declarative data validation for pandas DataFrames — 33 built-in rules, powered by Rust and Polars.
Project description
invr
Declarative data validation engine for Rust.
Define invariants (validation rules) and evaluate them against a dataset using a typed execution engine.
Features
- 33 built-in invariant types (nullability, uniqueness, numeric, string, date, relational, statistical, …)
- Lazy Polars execution backend
- Load specs from YAML
- Engine-agnostic core — bring your own backend
- Fully typed: no stringly-typed rule names
Installation
[dependencies]
invr = { version = "0.2", features = ["polars"] }
To also load specs from YAML:
invr = { version = "0.2", features = ["polars", "yaml"] }
Quick start
Programmatic spec
use invr::prelude::*;
use polars::prelude::*;
let df = df![
"age" => [25, 30, 45],
"email" => ["a@b.com", "c@d.com", "e@f.com"],
]?;
let spec = Spec::from_invariants(vec![
Invariant::new(
InvariantId::new("age_not_null")?,
PolarsKind::NotNull,
Scope::column("age"),
),
Invariant::new(
InvariantId::new("row_count_min")?,
PolarsKind::RowCountMin,
Scope::Dataset,
)
.with_param_value("min", "1"),
]);
let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;
if report.failed() {
for v in report.errors() {
eprintln!("violation: {}", v.reason());
}
}
YAML spec
# spec.yaml
invariants:
- id: age_not_null
kind: not_null
scope:
type: column
name: age
- id: email_unique
kind: unique
scope:
type: column
name: email
severity: error
- id: row_count_check
kind: row_count_min
scope:
type: dataset
params:
min: "10"
use invr::prelude::*;
let yaml = std::fs::read_to_string("spec.yaml")?;
let spec = spec_from_str(&yaml)?;
let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;
Invariant types
| Category | Kinds |
|---|---|
| Nullability | not_null, null_ratio_max |
| Uniqueness | unique, composite_unique, duplicate_ratio_max |
| Row count | row_count_min, row_count_max, row_count_between |
| Structure | column_exists, column_missing, dtype_is, schema_equals |
| Numeric | value_min, value_max, value_between, mean_between, stddev_max, sum_between |
| Date / Time | date_between, no_future_dates, monotonic_increasing, no_gaps_in_sequence |
| String | regex_match, string_length_min, string_length_max, string_length_between |
| Domain | allowed_values, forbidden_values |
| Statistical | outlier_ratio_max, percentile_between |
| Relational | foreign_key, column_equals, conditional_not_null |
| Custom | custom_expr |
Parameters reference
All param values are strings. Numeric values are passed as string literals (e.g. "42", "0.05").
| Kind | Scope | Required params |
|---|---|---|
not_null |
Column | — |
null_ratio_max |
Column | max_ratio — float 0.0–1.0 |
unique |
Column | — |
composite_unique |
Dataset | columns — comma-sep column list e.g. "a,b" |
duplicate_ratio_max |
Column | max_ratio — float 0.0–1.0 |
row_count_min |
Dataset | min |
row_count_max |
Dataset | max |
row_count_between |
Dataset | min, max |
column_exists |
Column | — |
column_missing |
Column | — |
dtype_is |
Column | dtype — e.g. "Int64", "Utf8", "Float64" |
schema_equals |
Dataset | schema — comma-sep col:dtype pairs e.g. "a:Int64,b:Utf8" |
value_min |
Column | min |
value_max |
Column | max |
value_between |
Column | min, max |
mean_between |
Column | min, max |
stddev_max |
Column | max |
sum_between |
Column | min, max |
date_between |
Column | start, end — ISO 8601 e.g. "2024-01-01" |
no_future_dates |
Column | — |
monotonic_increasing |
Column | — |
no_gaps_in_sequence |
Column | — |
regex_match |
Column | pattern — regex string |
string_length_min |
Column | min |
string_length_max |
Column | max |
string_length_between |
Column | min, max |
allowed_values |
Column | values — comma-sep list e.g. "A,B,C" |
forbidden_values |
Column | values — comma-sep list |
outlier_ratio_max |
Column | z — Z-score threshold, max_ratio — float 0.0–1.0 |
percentile_between |
Column | p — percentile 0.0–1.0, min, max |
foreign_key |
Column | allowed_values — comma-sep valid FK values |
column_equals |
Column | other_column — column name to compare against |
conditional_not_null |
Column | condition_column, condition_value |
custom_expr |
Column | column — column name |
Report API
report.failed() // true if any Error-severity violation exists
report.violations() // all violations
report.errors() // iterator over Error violations
report.warnings() // iterator over Warn violations
report.error_count() // number of Error violations
report.metrics() // execution_time_ms, total_invariants, violations
Severity
Each invariant defaults to Error. Override with:
invariant.with_severity(Severity::Warn)
Or in YAML:
severity: warn # info | warn | error
Feature flags
| Feature | Description |
|---|---|
polars |
Enables the Polars execution engine |
yaml |
Enables loading specs from YAML strings |
License
Apache-2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file invr-0.2.3-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: invr-0.2.3-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 22.6 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c8a1dfa1ebbd452943f9f29ef21b69477dc2895693c49b99ecb09b78064b9d58
|
|
| MD5 |
b721f3bec0d2daffa3a759de98ae3317
|
|
| BLAKE2b-256 |
abe63f3ebf89b6c38994453eace7480a4ab477840ae5b9375f65bbd24ea0b812
|