A Great Expectations-inspired data validation library powered entirely by Ibis.
Data Quality Engine
Data Quality Engine is a modern, lightweight data validation library inspired by Great Expectations. It provides a completely stateless, embeddable, and high-performance validation engine powered by Ibis.
Because all validation rules are compiled into deferred Ibis expressions, dqe evaluates checks identically across 20+ execution backends, including:
- DuckDB
- Polars
- Pandas
- PySpark
- Snowflake
- BigQuery
- PostgreSQL
Features
- Zero Boilerplate: No deeply nested project directories or complex CLI initialization required. It's just a Python library you import.
- SQL Pushdown: Complex validations execute directly where your data lives (in the database or dataframe engine). Maximize speed by evaluating everything in a single SQL pass!
- Pure Pydantic & YAML: Validation suites are built entirely with strict Pydantic models. We support seamless definition via YAML configuration files.
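To illustrate the single-pass idea behind SQL pushdown: several independent checks can be folded into one aggregation query so the table is scanned only once. The sketch below uses the standard library's `sqlite3` purely for illustration; dqe itself generates backend-specific SQL through Ibis, and this query shape is an assumption, not dqe's actual output.

```python
import sqlite3

# Three checks, one table scan: row count, null count, out-of-range count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 25), (2, 30), (3, None)])

row = conn.execute("""
    SELECT
        COUNT(*)                                                    AS row_count,
        SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END)                AS age_nulls,
        SUM(CASE WHEN age NOT BETWEEN 0 AND 100 THEN 1 ELSE 0 END)  AS age_out_of_range
    FROM users
""").fetchone()

print(row)  # (3, 1, 0)
```

Each expectation contributes one aggregate expression, so adding more checks does not add more passes over the data.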
Installation
```shell
pip install dqe
```
Note: Depending on your choice of execution environment, you'll need the corresponding client installed (e.g. `pip install duckdb pandas polars`).
Quick Start (Python API)
You can construct everything directly as pure Python objects. This makes embedding validations into DAG workflows (like Airflow or Dagster) effortless!
```python
import pandas as pd
import dqe

# Create a Context and register your data sources
context = dqe.Context()

# Example: feed it a pandas DataFrame (automatically routed to a rapid in-memory DuckDB backend!)
df = pd.DataFrame({"id": [1, 2, 3], "age": [25, 30, None]})
context.add_data_source("my_db", backend="pandas", dictionary={"users": df})

# To connect straight to a Postgres or Snowflake warehouse instead:
# context.add_data_source("my_db", backend="postgres", host=..., database=...)

# Read the table from the Context
table = context.get_table("my_db", "users")

# Build the expectation suite
suite = dqe.ExpectationSuite(
    name="users_suite",
    expectations=[
        dqe.BaseExpectation(type="expect_column_to_exist", kwargs={"column": "id"}),
        dqe.BaseExpectation(
            type="expect_column_values_to_be_between",
            kwargs={"column": "age", "min_value": 0, "max_value": 100},
        ),
    ],
)

# Validate!
results = context.validate(table, suite)
print(f"Validation Success: {results.success}")
```
Quick Start (YAML Configuration)
The real power comes from moving rules (and backend connection definitions) out of your code entirely and into maintainable YAML files.
`my_validations.yaml`:

```yaml
name: "my_suite"

# Optional: you can declare the data source connection directly in the YAML
data_sources:
  - name: "primary_warehouse"
    backend: "duckdb"
    kwargs:
      database: "my_data.db"

# Define the rules
expectations:
  - type: "expect_column_to_exist"
    kwargs:
      column: "id"
  - type: "expect_column_values_to_not_be_null"
    kwargs:
      column: "id"
  - type: "expect_column_values_to_be_between"
    kwargs:
      column: "age"
      min_value: 0
      max_value: 100
```
`app.py`:

```python
import dqe

context = dqe.Context()
suite = dqe.ExpectationSuite.from_yaml("my_validations.yaml")

# Automatically provision the "primary_warehouse" DuckDB source defined in the YAML!
context.add_data_source_from_suite(suite)

# Read the table and evaluate
table = context.get_table("primary_warehouse", "users")
results = context.validate(table, suite)
print(results.model_dump_json(indent=2))
```
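To make the YAML-to-model mapping concrete, here is a minimal sketch of the suite shape implied by the file above. The README says dqe uses strict Pydantic models; plain dataclasses are substituted here so the sketch has no dependencies, and the `suite_from_config` helper is hypothetical, not part of dqe's API.

```python
from dataclasses import dataclass, field

@dataclass
class Expectation:
    # Mirrors one entry under `expectations:` in the YAML
    type: str
    kwargs: dict = field(default_factory=dict)

@dataclass
class Suite:
    name: str
    expectations: list

def suite_from_config(cfg: dict) -> Suite:
    """Build a suite from an already-parsed YAML/JSON mapping."""
    return Suite(
        name=cfg["name"],
        expectations=[Expectation(**e) for e in cfg["expectations"]],
    )

cfg = {
    "name": "my_suite",
    "expectations": [
        {"type": "expect_column_to_exist", "kwargs": {"column": "id"}},
        {"type": "expect_column_values_to_be_between",
         "kwargs": {"column": "age", "min_value": 0, "max_value": 100}},
    ],
}
suite = suite_from_config(cfg)
print(suite.name, len(suite.expectations))  # my_suite 2
```

Because each expectation is just a `type` plus a `kwargs` mapping, the same structure round-trips cleanly between YAML, JSON, and Python objects.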
CLI Interface
You can quickly generate boilerplate validation suites and run them directly from the command line:
```shell
# Profile an existing table to automatically generate a baseline validation suite!
dqe profile --backend duckdb --kwargs '{"database": "my_data.db"}' --table users --out baseline.yaml

# Generate a starter my_validations.yaml and run_validations.py script
dqe init

# Validate an existing suite YAML
dqe validate my_validations.yaml
```
Available Expectations
- Table Structure:
  - `expect_table_row_count_to_be_between(min_value=None, max_value=None)`
  - `expect_table_columns_to_match_set(column_set, exact_match=True)`
  - `expect_table_columns_to_match_ordered_list(column_list)`
- Column Structure:
  - `expect_column_to_exist(column)`
  - `expect_column_values_to_be_unique(column)`
- Column Map (Row-level):
  - `expect_column_values_to_be_null(column, mostly=1.0)`
  - `expect_column_values_to_not_be_null(column, mostly=1.0)`
  - `expect_column_values_to_be_between(column, min_value=None, max_value=None, mostly=1.0)`
  - `expect_column_values_to_be_in_set(column, value_set, mostly=1.0)`
  - `expect_column_values_to_not_be_in_set(column, value_set, mostly=1.0)`
  - `expect_column_values_to_match_regex(column, regex, mostly=1.0)`
  - `expect_column_value_lengths_to_be_between(column, min_value=None, max_value=None, mostly=1.0)`
  - `expect_column_values_to_be_of_type(column, type_, mostly=1.0)`
- Column Pair Map (Row-level):
  - `expect_column_pair_values_a_to_be_greater_than_b(column_A, column_B, or_equal=False, mostly=1.0)`
- Cross-Table (Reconciliation):
  - `expect_column_values_to_exist_in_other_table(column, other_table_name, other_column, other_data_source=None, mostly=1.0)`
  - `expect_table_row_count_to_equal_other_table(other_table_name, other_data_source=None)`
- Custom Logic Expressions:
  - `expect_custom_condition(condition, compiler="ibis", mostly=1.0)`
- Column Aggregate:
  - `expect_column_max_to_be_between(column, min_value=None, max_value=None)`
  - `expect_column_min_to_be_between(column, min_value=None, max_value=None)`
  - `expect_column_mean_to_be_between(column, min_value=None, max_value=None)`
  - `expect_column_stdev_to_be_between(column, min_value=None, max_value=None)`
  - `expect_column_median_to_be_between(column, min_value=None, max_value=None)`
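The `mostly` parameter appears on every row-level expectation. Assuming it follows the Great Expectations convention (the README does not spell this out), a check passes when at least that fraction of evaluated rows satisfies the condition. A minimal sketch of that reading:

```python
def mostly_passes(values, predicate, mostly=1.0):
    """Pass when at least `mostly` of the rows satisfy `predicate`
    (semantics modeled on Great Expectations, assumed here for dqe)."""
    results = [predicate(v) for v in values]
    if not results:
        return True
    return sum(results) / len(results) >= mostly

ages = [25, 30, 150, 40, 55]
in_range = lambda v: 0 <= v <= 100

print(mostly_passes(ages, in_range))              # False: 4/5 rows pass, below 1.0
print(mostly_passes(ages, in_range, mostly=0.8))  # True: 4/5 >= 0.8
```

This is why `mostly=1.0` (the default everywhere above) means "every row must pass," while lowering it tolerates a bounded fraction of bad rows.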
Because expectations compile to deferred Ibis expressions, new ones can be created quickly via the `@register_expectation` decorator pattern.
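The `@register_expectation` name comes from the text above, but dqe's actual signature and registry internals are not documented here, so the following is an assumed sketch of the general registry-decorator pattern, with a toy predicate standing in for a deferred Ibis expression:

```python
# Hypothetical registry; dqe's real internals may differ.
EXPECTATION_REGISTRY = {}

def register_expectation(name):
    """Register a check function under a lookup name (sketch)."""
    def decorator(fn):
        EXPECTATION_REGISTRY[name] = fn
        return fn
    return decorator

@register_expectation("expect_column_values_to_be_positive")
def expect_positive(column_values):
    # In dqe this would build a deferred Ibis expression instead
    # of evaluating eagerly over a Python list.
    return all(v is None or v > 0 for v in column_values)

check = EXPECTATION_REGISTRY["expect_column_values_to_be_positive"]
print(check([1, 2, None, 5]))  # True
```

The registry keyed by expectation name is also what makes the string `type:` fields in the YAML suite resolvable to concrete check logic.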