Skip to main content

Declarative data validation for pandas DataFrames — powered by Rust.

Project description

invr

Declarative data validation engine for Rust.

Define invariants (validation rules) and evaluate them against a dataset using a typed execution engine.

Features

  • 33 built-in invariant types (nullability, uniqueness, numeric, string, date, relational, statistical, …)
  • Lazy Polars execution backend
  • Load specs from YAML
  • Engine-agnostic core — bring your own backend
  • Fully typed: no stringly-typed rule names

Installation

[dependencies]
invr = { version = "0.2", features = ["polars"] }

To also load specs from YAML:

invr = { version = "0.2", features = ["polars", "yaml"] }

Quick start

Programmatic spec

use invars::prelude::*;
use polars::prelude::*;

let df = df![
    "age" => [25, 30, 45],
    "email" => ["a@b.com", "c@d.com", "e@f.com"],
]?;

let spec = Spec::from_invariants(vec![
    Invariant::new(
        InvariantId::new("age_not_null")?,
        PolarsKind::NotNull,
        Scope::Column { name: "age".into() },
    ),
    Invariant::new(
        InvariantId::new("row_count_min")?,
        PolarsKind::RowCountMin,
        Scope::Dataset,
    )
    .with_param_value("min", "1"),
]);

let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

if report.failed() {
    for v in report.errors() {
        eprintln!("violation: {}", v.reason());
    }
}

YAML spec

# spec.yaml
invariants:
  - id: age_not_null
    kind: not_null
    scope:
      type: column
      name: age

  - id: email_unique
    kind: unique
    scope:
      type: column
      name: email
    severity: error

  - id: row_count_check
    kind: row_count_min
    scope:
      type: dataset
    params:
      min: "10"
use invars::prelude::*;

let yaml = std::fs::read_to_string("spec.yaml")?;
let spec = spec_from_str(&yaml)?;

let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

Invariant types

Category Kinds
Nullability not_null, null_ratio_max
Uniqueness unique, composite_unique, duplicate_ratio_max
Row count row_count_min, row_count_max, row_count_between
Structure column_exists, column_missing, dtype_is, schema_equals
Numeric value_min, value_max, value_between, mean_between, stddev_max, sum_between
Date / Time date_between, no_future_dates, monotonic_increasing, no_gaps_in_sequence
String regex_match, string_length_min, string_length_max, string_length_between
Domain allowed_values, forbidden_values
Statistical outlier_ratio_max, percentile_between
Relational foreign_key, column_equals, conditional_not_null
Custom custom_expr

Report API

report.failed()          // true if any Error-severity violation exists
report.violations()      // all violations
report.errors()          // iterator over Error violations
report.warnings()        // iterator over Warn violations
report.error_count()     // number of Error violations
report.metrics()         // execution_time_ms, total_invariants, violations

Severity

Each invariant defaults to Error. Override with:

invariant.with_severity(Severity::Warn)

Or in YAML:

severity: warn   # info | warn | error

Feature flags

Feature Description
polars Enables the Polars execution engine
yaml Enables loading specs from YAML strings

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

invr-0.2.1-cp313-cp313-macosx_11_0_arm64.whl (22.6 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

File details

Details for the file invr-0.2.1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for invr-0.2.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7c56854f01a64a47beefa1f47294f3eaafce93ba7a2974ab768f573bc2cb56be
MD5 3348ff564b9aed87e0f0b78bccae332d
BLAKE2b-256 3eb01ec47aec54f8149001cd1c931d99ab88b6c8350999ffebb47440bac51caa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page