A Great Expectations-inspired data validation library powered entirely by Ibis.

Data Quality Engine

Data Quality Engine is a modern, lightweight data validation library inspired by Great Expectations. It provides a completely stateless, embeddable, and high-performance validation engine powered by Ibis.

Because all validation rules compile to deferred Ibis expressions, dqe evaluates checks identically across 20+ execution backends, including:

  • DuckDB
  • Polars
  • Pandas
  • PySpark
  • Snowflake
  • BigQuery
  • PostgreSQL

Features

  • Zero Boilerplate: No deeply nested project directories or complex CLI initialization required. It's just a Python library you import.
  • SQL Pushdown: Complex validations execute directly where your data lives (in the database or dataframe engine). Maximize speed by evaluating everything in a single SQL pass!
  • Pure Pydantic & YAML: Validation suites are built entirely with strict Pydantic models. We support seamless definition via YAML configuration files.
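The single-pass idea behind SQL pushdown can be illustrated independently of dqe with plain SQL (a conceptual sketch using the stdlib sqlite3 module, not the SQL dqe actually generates): several row-level checks collapse into CASE aggregates inside one SELECT, so the table is scanned once rather than once per expectation.

```python
import sqlite3

# Conceptual sketch: push three checks into a single aggregate query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 25), (2, 30), (3, None)])

row = conn.execute(
    """
    SELECT
        COUNT(*)                                               AS n_rows,
        SUM(CASE WHEN id IS NOT NULL THEN 1 ELSE 0 END)        AS id_not_null,
        SUM(CASE WHEN age BETWEEN 0 AND 100 THEN 1 ELSE 0 END) AS age_in_range
    FROM users
    """
).fetchone()

# One table scan answers all three checks; NULL ages fail the BETWEEN test.
print(row)  # (3, 3, 2)
```

The same shape works on any SQL backend, which is what makes deferred-expression compilation attractive for warehouse-scale validation.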

Installation

pip install dqe

Note: Depending on your choice of execution environment, you'll need the corresponding client installed (e.g. pip install duckdb pandas polars).

Quick Start (Python API)

You can construct everything directly as pure Python objects, which makes embedding validations into DAG workflows (like Airflow or Dagster) effortless.

import pandas as pd
import dqe

# Create a Context and register your data sources
context = dqe.Context()

# Example: Feed it a pandas dataframe (automatically routed to rapid in-memory DuckDB!)
df = pd.DataFrame({"id": [1, 2, 3], "age": [25, 30, None]})
context.add_data_source("my_db", backend="pandas", dictionary={"users": df})

# Let's say we instead wanted to connect straight to a Postgres or Snowflake warehouse:
# context.add_data_source("my_db", backend="postgres", host=..., database=...)

# Read the table from the Context
table = context.get_table("my_db", "users")

# Build the Expectation Suite
suite = dqe.ExpectationSuite(
    name="users_suite",
    expectations=[
        dqe.BaseExpectation(type="expect_column_to_exist", kwargs={"column": "id"}),
        dqe.BaseExpectation(
            type="expect_column_values_to_be_between", 
            kwargs={"column": "age", "min_value": 0, "max_value": 100}
        )
    ]
)

# Validate!
results = context.validate(table, suite)
print(f"Validation Success: {results.success}")

Quick Start (YAML Configuration)

The real power comes from abstracting rules (and backend connection definitions) out of your code entirely and into maintainable YAML files.

my_validations.yaml:

name: "my_suite"

# Optional: You can declare the Data Source connection directly in the YAML
data_sources:
  - name: "primary_warehouse"
    backend: "duckdb"
    kwargs:
      database: "my_data.db"

# Define the rules
expectations:
  - type: "expect_column_to_exist"
    kwargs:
      column: "id"
  - type: "expect_column_values_to_not_be_null"
    kwargs:
      column: "id"
  - type: "expect_column_values_to_be_between"
    kwargs:
      column: "age"
      min_value: 0
      max_value: 100

app.py:

import dqe

context = dqe.Context()
suite = dqe.ExpectationSuite.from_yaml("my_validations.yaml")

# Automatically provision the duckdb "primary_warehouse" source defined in the YAML!
context.add_data_source_from_suite(suite)

# Read the table and evaluate
table = context.get_table("primary_warehouse", "users")
results = context.validate(table, suite)

print(results.model_dump_json(indent=2))
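Because suites are plain YAML backed by strict Pydantic models, a suite file can be sanity-checked with any YAML loader before it ever reaches dqe. The snippet below uses PyYAML on an inline copy of the suite above and touches nothing from dqe itself:

```python
import yaml

# An inline copy of the expectations from my_validations.yaml above.
suite_yaml = """
name: "my_suite"
expectations:
  - type: "expect_column_to_exist"
    kwargs:
      column: "id"
  - type: "expect_column_values_to_be_between"
    kwargs:
      column: "age"
      min_value: 0
      max_value: 100
"""

doc = yaml.safe_load(suite_yaml)

# Each expectation is a {type, kwargs} mapping, mirroring the
# type/kwargs fields shown on dqe.BaseExpectation in the Python quick start.
for exp in doc["expectations"]:
    assert {"type", "kwargs"} <= exp.keys()

print(doc["name"], len(doc["expectations"]))  # my_suite 2
```

A check like this is handy in CI, where a malformed suite file should fail fast with a readable error rather than at validation time.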

CLI Interface

You can quickly generate boilerplate validation suites and run them directly from the command line:

# Profile an existing table to automatically generate a baseline validation suite!
dqe profile --backend duckdb --kwargs '{"database": "my_data.db"}' --table users --out baseline.yaml

# Generate a starter my_validations.yaml and run_validations.py script
dqe init

# Validate an existing suite YAML
dqe validate my_validations.yaml

Available Expectations

  • Table Structure:
    • expect_table_row_count_to_be_between(min_value=None, max_value=None)
    • expect_table_columns_to_match_set(column_set, exact_match=True)
    • expect_table_columns_to_match_ordered_list(column_list)
  • Column Structure:
    • expect_column_to_exist(column)
    • expect_column_values_to_be_unique(column)
  • Column Map (Row-level):
    • expect_column_values_to_be_null(column, mostly=1.0)
    • expect_column_values_to_not_be_null(column, mostly=1.0)
    • expect_column_values_to_be_between(column, min_value=None, max_value=None, mostly=1.0)
    • expect_column_values_to_be_in_set(column, value_set, mostly=1.0)
    • expect_column_values_to_not_be_in_set(column, value_set, mostly=1.0)
    • expect_column_values_to_match_regex(column, regex, mostly=1.0)
    • expect_column_value_lengths_to_be_between(column, min_value=None, max_value=None, mostly=1.0)
    • expect_column_values_to_be_of_type(column, type_, mostly=1.0)
  • Column Pair Map (Row-level):
    • expect_column_pair_values_a_to_be_greater_than_b(column_A, column_B, or_equal=False, mostly=1.0)
  • Cross-Table (Reconciliation):
    • expect_column_values_to_exist_in_other_table(column, other_table_name, other_column, other_data_source=None, mostly=1.0)
    • expect_table_row_count_to_equal_other_table(other_table_name, other_data_source=None)
  • Custom Logic Expressions:
    • expect_custom_condition(condition, compiler="ibis", mostly=1.0)
  • Column Aggregate:
    • expect_column_max_to_be_between(column, min_value=None, max_value=None)
    • expect_column_min_to_be_between(column, min_value=None, max_value=None)
    • expect_column_mean_to_be_between(column, min_value=None, max_value=None)
    • expect_column_stdev_to_be_between(column, min_value=None, max_value=None)
    • expect_column_median_to_be_between(column, min_value=None, max_value=None)
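The mostly parameter on the row-level expectations above sets a pass threshold: presumably the expectation succeeds when at least that fraction of rows satisfies the condition. A plain-Python sketch of that semantic (the function name and signature here are illustrative, not dqe internals):

```python
def mostly_passes(values, predicate, mostly=1.0):
    """Return True if at least `mostly` of the values satisfy `predicate`."""
    checked = [predicate(v) for v in values]
    if not checked:
        return True  # vacuously true on an empty column
    return sum(checked) / len(checked) >= mostly

ages = [25, 30, 101, 47]
in_range = lambda a: 0 <= a <= 100

# 3 of 4 rows fall in [0, 100]: passes at mostly=0.75, fails at mostly=1.0.
print(mostly_passes(ages, in_range, mostly=0.75))  # True
print(mostly_passes(ages, in_range, mostly=1.0))   # False
```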

Because expectations compile to deferred Ibis expressions, new ones can be created quickly via the @register_expectation decorator pattern.
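The registration idiom behind such a decorator can be sketched generically. This is the general pattern, not dqe's actual @register_expectation signature, and the expectation body below evaluates eagerly over plain dicts (in dqe it would instead build a deferred Ibis boolean expression):

```python
# A minimal registry: each decorated function is stored under its
# expectation name and looked up by name when a suite runs.
EXPECTATIONS = {}

def register_expectation(name):
    def decorator(fn):
        EXPECTATIONS[name] = fn
        return fn
    return decorator

@register_expectation("expect_column_values_to_be_positive")
def values_positive(rows, column):
    # Hypothetical custom check: every value in `column` is > 0.
    return all(r[column] > 0 for r in rows)

check = EXPECTATIONS["expect_column_values_to_be_positive"]
print(check([{"age": 25}, {"age": 30}], "age"))  # True
```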
