Declarative data validation for pandas DataFrames — powered by Rust.

invr

Declarative data validation engine for Rust.

Define invariants (validation rules) and evaluate them against a dataset using a typed execution engine.

Features

  • 33 built-in invariant types (nullability, uniqueness, numeric, string, date, relational, statistical, …)
  • Lazy Polars execution backend
  • Load specs from YAML
  • Engine-agnostic core — bring your own backend
  • Fully typed: no stringly-typed rule names
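To make "engine-agnostic" and "fully typed" concrete, here is a minimal sketch of how a validation core can be decoupled from its backend. Every name in it (`Kind`, `Engine`, `VecEngine`, `run_all`) is hypothetical and chosen for illustration only; it is not invr's actual trait or type layout, and it uses only the standard library.

```rust
// Hypothetical sketch: none of these names come from invr's actual API.

/// Rules are enum variants rather than strings, so a typo is a compile error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    NotNull,
    RowCountMin(usize),
}

/// A backend evaluates a rule against its own dataset type.
pub trait Engine {
    type Data;

    /// Returns `Ok(())` if the rule holds, or a human-readable reason if not.
    fn check(&self, data: &Self::Data, kind: Kind) -> Result<(), String>;
}

/// Toy backend over a single nullable integer column.
pub struct VecEngine;

impl Engine for VecEngine {
    type Data = Vec<Option<i64>>;

    fn check(&self, data: &Self::Data, kind: Kind) -> Result<(), String> {
        match kind {
            Kind::NotNull => {
                let nulls = data.iter().filter(|v| v.is_none()).count();
                if nulls == 0 {
                    Ok(())
                } else {
                    Err(format!("{nulls} null value(s)"))
                }
            }
            Kind::RowCountMin(min) => {
                if data.len() >= min {
                    Ok(())
                } else {
                    Err(format!("{} row(s), expected at least {min}", data.len()))
                }
            }
        }
    }
}

/// Generic runner: works with any backend that implements `Engine`
/// and collects the reasons of all failed rules.
pub fn run_all<E: Engine>(engine: &E, data: &E::Data, kinds: &[Kind]) -> Vec<String> {
    kinds
        .iter()
        .filter_map(|k| engine.check(data, *k).err())
        .collect()
}
```

The runner only knows about the trait, so swapping the toy `VecEngine` for a DataFrame-backed engine would not change `run_all`.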

Installation

[dependencies]
invr = { version = "0.2", features = ["polars"] }

To also load specs from YAML:

invr = { version = "0.2", features = ["polars", "yaml"] }

Quick start

Programmatic spec

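use invars::prelude::*;
use polars::prelude::*;

// Build a small DataFrame to validate.
let df = df![
    "age" => [25, 30, 45],
    "email" => ["a@b.com", "c@d.com", "e@f.com"],
]?;

// Each invariant has an id, a kind, and a scope (one column or the whole dataset).
let spec = Spec::from_invariants(vec![
    Invariant::new(
        InvariantId::new("age_not_null")?,
        PolarsKind::NotNull,
        Scope::Column { name: "age".into() },
    ),
    Invariant::new(
        InvariantId::new("row_count_min")?,
        PolarsKind::RowCountMin,
        Scope::Dataset,
    )
    // Parameters are passed as string key/value pairs.
    .with_param_value("min", "1"),
]);

// Evaluate the spec against the DataFrame with the Polars engine.
let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

// Print the reason for every Error-severity violation.
if report.failed() {
    for v in report.errors() {
        eprintln!("violation: {}", v.reason());
    }
}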

YAML spec

# spec.yaml
invariants:
  - id: age_not_null
    kind: not_null
    scope:
      type: column
      name: age

  - id: email_unique
    kind: unique
    scope:
      type: column
      name: email
    severity: error

  - id: row_count_check
    kind: row_count_min
    scope:
      type: dataset
    params:
      min: "10"

Load the spec and run it against a DataFrame (here `df`, as built in the programmatic example):

use invars::prelude::*;

let yaml = std::fs::read_to_string("spec.yaml")?;
let spec = spec_from_str(&yaml)?;

let runner = RunSpec::new(EnginePolarsDataFrame);
let report = runner.run(&df, &spec)?;

Invariant types

Category      Kinds
Nullability   not_null, null_ratio_max
Uniqueness    unique, composite_unique, duplicate_ratio_max
Row count     row_count_min, row_count_max, row_count_between
Structure     column_exists, column_missing, dtype_is, schema_equals
Numeric       value_min, value_max, value_between, mean_between, stddev_max, sum_between
Date / Time   date_between, no_future_dates, monotonic_increasing, no_gaps_in_sequence
String        regex_match, string_length_min, string_length_max, string_length_between
Domain        allowed_values, forbidden_values
Statistical   outlier_ratio_max, percentile_between
Relational    foreign_key, column_equals, conditional_not_null
Custom        custom_expr
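
As a rough illustration of what a few of these kinds check, the sketch below implements plain-Rust versions of `null_ratio_max`, `unique`, and `value_between` over a nullable integer column. The exact semantics are assumptions made for this example (nulls are ignored by the uniqueness and range checks, bounds are inclusive, and an empty column has a null ratio of 0.0); invr's own definitions may differ.

```rust
use std::collections::HashSet;

/// Fraction of `None` entries. Assumption: an empty column has ratio 0.0.
pub fn null_ratio(col: &[Option<i64>]) -> f64 {
    if col.is_empty() {
        return 0.0;
    }
    col.iter().filter(|v| v.is_none()).count() as f64 / col.len() as f64
}

/// Holds when the null ratio does not exceed `max` (inclusive).
pub fn null_ratio_max(col: &[Option<i64>], max: f64) -> bool {
    null_ratio(col) <= max
}

/// Holds when no non-null value occurs twice. Assumption: nulls are ignored.
pub fn unique(col: &[Option<i64>]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false on the first repeated value, which ends `all` early.
    col.iter().flatten().all(|v| seen.insert(*v))
}

/// Holds when every non-null value lies in `min..=max` (inclusive bounds).
pub fn value_between(col: &[Option<i64>], min: i64, max: i64) -> bool {
    col.iter().flatten().all(|v| (min..=max).contains(v))
}
```

For the column `[Some(1), None, Some(3), Some(3)]`, one of four values is null, so `null_ratio_max` holds for a maximum of 0.25 but not 0.1, and `unique` fails because 3 appears twice.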

Report API

report.failed()          // true if any Error-severity violation exists
report.violations()      // all violations
report.errors()          // iterator over Error violations
report.warnings()        // iterator over Warn violations
report.error_count()     // number of Error violations
report.metrics()         // execution_time_ms, total_invariants, violations
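
The relationship between these accessors can be modelled with a small stand-in type. The `Report`, `Violation`, and `Severity` definitions below are hypothetical and exist only to show the behaviour described in the comments above, in particular that `failed()` depends on Error-severity violations alone; they are not invr's real types.

```rust
// Hypothetical model of the report semantics; not invr's actual types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warn,
    Error,
}

#[derive(Debug, Clone)]
pub struct Violation {
    pub id: String,
    pub severity: Severity,
    pub reason: String,
}

#[derive(Debug, Default)]
pub struct Report {
    violations: Vec<Violation>,
}

impl Report {
    pub fn new(violations: Vec<Violation>) -> Self {
        Self { violations }
    }

    /// All violations, regardless of severity.
    pub fn violations(&self) -> &[Violation] {
        &self.violations
    }

    /// Only the Error-severity violations.
    pub fn errors(&self) -> impl Iterator<Item = &Violation> + '_ {
        self.violations
            .iter()
            .filter(|v| v.severity == Severity::Error)
    }

    /// Only the Warn-severity violations.
    pub fn warnings(&self) -> impl Iterator<Item = &Violation> + '_ {
        self.violations
            .iter()
            .filter(|v| v.severity == Severity::Warn)
    }

    pub fn error_count(&self) -> usize {
        self.errors().count()
    }

    /// A run fails only if at least one Error-severity violation exists.
    pub fn failed(&self) -> bool {
        self.error_count() > 0
    }
}
```

Under this model, a report that contains only warnings still appears in `violations()` and `warnings()` but does not count as failed.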

Severity

Each invariant defaults to Error. Override with:

invariant.with_severity(Severity::Warn)

Or in YAML:

severity: warn   # info | warn | error

Feature flags

Feature   Description
polars    Enables the Polars execution engine
yaml      Enables loading specs from YAML strings

License

Apache-2.0
