iki-dq-check

A production-grade Python library and CLI tool for validating data quality across 25 checks, organized into 3 progressive tiers — Lite, Standard, and Advanced.

Use it from the CLI, import it directly as a library, or call it from a Jupyter notebook via the facade, which accepts the formats data engineers commonly work with: pandas, Polars, PyArrow, DuckDB, Parquet, CSV, JSON, SQLAlchemy, and SQLite.

Config is Python-native: a typed DQConfig dataclass with full IDE autocomplete and real lambda rules. No YAML is required (legacy YAML files remain supported through an optional extra), and the core has no third-party dependencies.

Requirements

Python 3.10+
pytest (for running tests)

pip install pytest

The core framework runs on Python stdlib only. PyYAML is no longer required.

Install

# Core only
pip install iki-dq-check

Optional — facade input formats

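The facade (src/Iki_DQ_Check/facade.py) uses lazy imports, so install only what your stack needs. The pip install -e ".[name]" variants below assume you are installing from a source checkout of this repository.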

Format	Install
pandas DataFrame	`pip install pandas` or `pip install -e ".[pandas]"`
Polars DataFrame / LazyFrame	`pip install polars` or `pip install -e ".[polars]"`
PyArrow Table	`pip install pyarrow` or `pip install -e ".[pyarrow]"`
Parquet files	`pip install pyarrow`
DuckDB relation	`pip install duckdb` or `pip install -e ".[duckdb]"`
SQLAlchemy	`pip install sqlalchemy` or `pip install -e ".[sqlalchemy]"`
SQLite	stdlib — no install needed
Jupyter HTML rendering	`pip install ipython` or `pip install -e ".[jupyter]"`
Legacy YAML config	`pip install pyyaml` or `pip install -e ".[yaml]"`

Project Structure

iki-dq-check/
│
├── src/
│   └── Iki_DQ_Check/
│       ├── core/
│       │   ├── __init__.py            # Re-exports public API
│       │   ├── base.py                # DataCheck, CheckResult, Severity, CheckTier, QualityReport
│       │   └── pipeline.py            # DataQualityPipeline, REGISTRY, TIER_MAP
│       │
│       ├── checks/
│       │   ├── __init__.py            # Imports all check classes
│       │   ├── lite.py                # NullCheck, PrimaryKeyCheck, DuplicateRowCheck,
│       │   │                          #   DataTypeCheck, NumericRangeCheck
│       │   ├── standard.py            # RegexCheck, DomainCheck, BusinessRuleCheck,
│       │   │                          #   CrossColumnCheck, FreshnessCheck, VolumeCheck,
│       │   │                          #   OutlierCheck, ReferentialIntegrityCheck
│       │   └── advanced.py            # SchemaDriftCheck, DuplicateFileIngestionCheck,
│       │                              #   HierarchyCheck, AuditColumnCheck,
│       │                              #   CrossSystemConsistencyCheck, ReferenceDataCheck,
│       │                              #   ChecksumCheck, DistributionCheck,
│       │                              #   NegativeValueCheck, PercentageTotalCheck,
│       │                              #   StringLengthCheck, CompletenessCheck
│       │
│       ├── cli/
│       │   ├── __init__.py
│       │   ├── args.py                # build_parser()
│       │   ├── loaders.py             # load_data(), load_config(), coerce(),
│       │   │                          #   resolve_config(), safe_eval_rule()
│       │   ├── output.py              # print_summary(), print_list(), save_report(),
│       │   │                          #   ANSI color helpers
│       │   └── runner.py              # build_pipeline(), die(), main()
│       │
│       ├── config.py                  # DQConfig dataclass — Python-native config
│       ├── facade.py                  # Universal input facade — check(), normalize(),
│       │                              #   check_lite/standard/advanced(), RichQualityReport
│       ├── app.py                     # CLI entry point — delegates to cli/runner.py
│       └── __init__.py                # Top-level public re-exports
│
├── tests/
│   ├── conftest.py                    # Shared fixtures, helpers, sample datasets
│   ├── test_lite.py                   # Lite tier checks (5 checks)
│   ├── test_standard.py               # Standard tier checks (8 checks)
│   ├── test_advanced.py               # Advanced tier checks (12 checks)
│   ├── test_pipeline.py               # Pipeline orchestration and QualityReport
│   ├── test_registry.py               # REGISTRY, TIER_MAP, and check metadata
│   ├── test_loaders.py                # Data/config loading and rule compilation
│   ├── test_facade.py                 # Facade normalizers and check() entrypoint
│   └── test_cli.py                    # CLI integration tests (subprocess)
│
├── sample_config.py                   # Reference Python config (replaces config.yaml)
├── dq_facade_demo.ipynb               # Jupyter notebook — facade across all formats
├── sample_data.json                   # Sample JSON dataset
├── sample_data.csv                    # Sample CSV dataset
├── pyproject.toml
└── README.MD

Quick Start

# See all available checks
iki-dq-check --list

# Run Lite tier
iki-dq-check --tier lite --file data.json --config sample_config.py

# Run Standard tier (includes Lite)
iki-dq-check --tier standard --file data.json --config sample_config.py

# Run Advanced tier (includes Lite + Standard)
iki-dq-check --tier advanced --file data.json --config sample_config.py

# Run a single check
iki-dq-check --check NullCheck --file data.json --config sample_config.py

# Run multiple specific checks
iki-dq-check --check NullCheck --check RegexCheck --check ChecksumCheck \
             --file data.json --config sample_config.py

# Save a JSON report
iki-dq-check --tier advanced --file data.json --config sample_config.py \
             --output report.json

# Stop on first critical failure
iki-dq-check --tier lite --file data.json --config sample_config.py --fail-fast

# Use a CSV file instead
iki-dq-check --tier standard --file data.csv --config sample_config.py

# Custom pipeline name
iki-dq-check --tier lite --file data.json --config sample_config.py \
             --pipeline-name orders_daily

Tiers

Tiers are cumulative — each tier includes everything below it. --tier accepts exactly one value per run.

--tier lite       →  5 checks   (Lite only)
--tier standard   → 13 checks   (Lite + Standard)
--tier advanced   → 25 checks   (Lite + Standard + Advanced)
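The cumulative behaviour can be pictured with a small sketch. The names TIER_ORDER, TIER_CHECKS, and resolve_tier are hypothetical and chosen for illustration; the library's real TIER_MAP may be laid out differently.

```python
# Hypothetical illustration of cumulative tiers; not the library's real TIER_MAP.
TIER_ORDER = ["lite", "standard", "advanced"]

TIER_CHECKS = {
    "lite": [
        "NullCheck", "PrimaryKeyCheck", "DuplicateRowCheck",
        "DataTypeCheck", "NumericRangeCheck",
    ],
    "standard": [
        "RegexCheck", "DomainCheck", "BusinessRuleCheck", "CrossColumnCheck",
        "FreshnessCheck", "VolumeCheck", "OutlierCheck",
        "ReferentialIntegrityCheck",
    ],
    "advanced": [
        "SchemaDriftCheck", "DuplicateFileIngestionCheck", "HierarchyCheck",
        "AuditColumnCheck", "CrossSystemConsistencyCheck", "ReferenceDataCheck",
        "ChecksumCheck", "DistributionCheck", "NegativeValueCheck",
        "PercentageTotalCheck", "StringLengthCheck", "CompletenessCheck",
    ],
}


def resolve_tier(tier):
    """Return the check names for `tier` plus every tier below it."""
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier!r}")
    names = []
    for name in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        names.extend(TIER_CHECKS[name])
    return names


for tier in TIER_ORDER:
    print(tier, len(resolve_tier(tier)))
```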

Lite — 5 checks

The foundation. Catches the most common data problems.

Check	What It Catches
`NullCheck`	NULL or None values in any column
`PrimaryKeyCheck`	Duplicate or null primary keys
`DuplicateRowCheck`	Fully identical rows
`DataTypeCheck`	Values that can't be cast to expected type
`NumericRangeCheck`	Numbers outside `[min, max]` bounds
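As a rough idea of what a primary key check involves, the sketch below finds duplicated and missing keys in list[dict] rows. The function find_pk_violations and the null_rows key in its result are assumptions made for this example, not the library's implementation of PrimaryKeyCheck.

```python
from collections import Counter


def find_pk_violations(rows, pk_column="id"):
    """Return duplicated key values and the indexes of rows with no key."""
    null_rows = [i for i, row in enumerate(rows) if row.get(pk_column) is None]
    counts = Counter(
        row[pk_column] for row in rows if row.get(pk_column) is not None
    )
    # Counter keeps first-seen order, so duplicates are reported in that order.
    duplicates = [value for value, n in counts.items() if n > 1]
    return {"duplicate_values": duplicates, "null_rows": null_rows}


rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": None}]
violations = find_pk_violations(rows)
print(violations)
```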

Standard — 8 additional checks (13 total)

For production pipelines with SLAs and business rules.

Check	What It Catches
`RegexCheck`	Values that fail a regex pattern (e.g. email format)
`DomainCheck`	Values outside an allowed set (e.g. status codes)
`BusinessRuleCheck`	Row-level business logic violations
`CrossColumnCheck`	Relationships between columns (e.g. end > start)
`FreshnessCheck`	Data arriving outside expected time window
`VolumeCheck`	Row counts outside expected range
`OutlierCheck`	Statistical outliers via IQR method
`ReferentialIntegrityCheck`	Foreign key values not in parent table
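The IQR method mentioned for OutlierCheck can be sketched with the standard library. The 1.5 multiplier and the quartile interpolation (the default of statistics.quantiles) are assumptions here; the README does not say which thresholds the library uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


salaries = [10, 12, 11, 13, 12, 11, 100]
outliers = iqr_outliers(salaries)
print(outliers)
```

A larger k flags fewer values; k=3 is a common choice when only extreme outliers matter.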

Advanced — 12 additional checks (25 total)

For compliance, financial, and cross-system critical pipelines.

Check	What It Catches
`SchemaDriftCheck`	Added or removed columns vs expected schema
`DuplicateFileIngestionCheck`	Same file loaded more than once
`HierarchyCheck`	Parent → child hierarchy violations
`AuditColumnCheck`	Missing `created_by`, `updated_at`, etc.
`CrossSystemConsistencyCheck`	Row count mismatch between source and target
`ReferenceDataCheck`	Unknown codes in master / reference data
`ChecksumCheck`	SHA-256 hash mismatch between source and target
`DistributionCheck`	Mean, median, stddev report (informational)
`NegativeValueCheck`	Negative values where not allowed
`PercentageTotalCheck`	Percentages that don't sum to 100
`StringLengthCheck`	Strings outside min/max length bounds
`CompletenessCheck`	Missing expected partition keys or dates
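The SHA-256 comparison behind a check like ChecksumCheck can be sketched as follows. The helper names sha256_hex and payloads_match are hypothetical, and encoding the payload as UTF-8 is an assumption.

```python
import hashlib


def sha256_hex(payload: str) -> str:
    """Hash a text payload and return the 64-character hex digest."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def payloads_match(source_payload: str, target_payload: str) -> bool:
    """True when both payloads hash to the same SHA-256 digest."""
    return sha256_hex(source_payload) == sha256_hex(target_payload)


print(payloads_match("snapshot-v1", "snapshot-v1"))
print(payloads_match("snapshot-v1", "snapshot-v2"))
```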

Configuration — Python Mode

Config is a typed DQConfig dataclass defined in src/Iki_DQ_Check/config.py. No YAML, no string parsing — just Python with full IDE autocomplete on every field.

Minimal config

from Iki_DQ_Check.config import DQConfig

config = DQConfig(pk_column="id")

Full reference config (`sample_config.py`)

from datetime import datetime, timezone
from Iki_DQ_Check.config import DQConfig

config = DQConfig(

    # ── LITE ────────────────────────────────────────────────────────────
    pk_column="id",

    schema={
        "age":    "int",
        "salary": "float",
    },

    ranges={
        "age":    (0, 120),
        "salary": (0, 1_000_000),
    },

    # ── STANDARD ────────────────────────────────────────────────────────
    patterns={
        "email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    },

    allowed={
        "status": ["active", "inactive"],
    },

    rules={
        "salary_positive": lambda r: (r.get("salary") or 0) > 0,
        "name_not_empty":  lambda r: bool(r.get("name")),
    },

    cross_rules={
        "working_age": lambda r: 18 <= (r.get("age") or 0) <= 65,
    },

    columns=["salary", "age"],
    expected_min=1,
    expected_max=10_000,

    fk_column="dept",
    reference_values=["Eng", "HR", "Fin", "Ops"],

    latest_timestamp=datetime.now(timezone.utc),
    max_delay_hours=24.0,

    # ── ADVANCED ────────────────────────────────────────────────────────
    expected_columns=["id", "name", "age", "salary", "email", "status", "dept"],
    audit_columns=["created_by", "created_at", "updated_by", "updated_at"],

    source_count=1000,
    target_count=998,

    source_payload="snapshot-v1",
    target_payload="snapshot-v1",

    code_column="status",
    valid_codes=["active", "inactive", "pending"],

    percentage_column="pct",

    length_rules={
        "name":  (1, 50),
        "email": (5, 100),
    },

    partition_column="dept",
    expected_partitions=["Eng", "HR", "Fin"],

    valid_hierarchy={
        "Asia":   ["Japan", "India", "China"],
        "Europe": ["Germany", "France", "UK"],
    },
)

DQConfig field reference

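Every field is optional. Most fields default to None, which causes the corresponding check to skip gracefully. The exceptions are pk_column, max_delay_hours, tolerance_pct, expected_total, and file_name_column, which carry the non-None defaults shown in the tables below.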

Lite fields

Field	Type	Default	Used by	Description
`pk_column`	`str`	`"id"`	`PrimaryKeyCheck`	Primary key column name
`columns`	`list[str]`	`None`	`NullCheck`, `OutlierCheck`, `NegativeValueCheck`, `DistributionCheck`	Columns to inspect. When `None`, `NullCheck` checks all columns
`schema`	`dict[str, str]`	`None`	`DataTypeCheck`	Expected Python type per column. Supported: `"int"`, `"float"`, `"str"`, `"bool"`
`ranges`	`dict[str, tuple]`	`None`	`NumericRangeCheck`	Numeric bounds `(min, max)` per column. Use `None` for open bounds: `(0, None)`
`key_columns`	`list[str]`	`None`	`DuplicateRowCheck`	Columns to use for duplicate detection. Defaults to all columns when `None`
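The open-bound convention for ranges can be illustrated with a small helper. The function in_range is hypothetical, and treating both bounds as inclusive is an assumption based on the `[min, max]` notation used for NumericRangeCheck.

```python
def in_range(value, bounds):
    """Check value against (low, high); a None bound means unbounded."""
    low, high = bounds
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


print(in_range(250_000, (0, None)))  # no upper limit
print(in_range(-1, (0, None)))       # below the lower bound
print(in_range(120, (0, 120)))       # bounds treated as inclusive
```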

Standard fields

Field	Type	Default	Used by	Description
`patterns`	`dict[str, str]`	`None`	`RegexCheck`	Regex pattern per column, e.g. `{"email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"}`
`allowed`	`dict[str, list]`	`None`	`DomainCheck`	Allowed value set per column, e.g. `{"status": ["active", "inactive"]}`
`rules`	`dict[str, Callable]`	`None`	`BusinessRuleCheck`	Row-level predicates. Each callable receives a row dict and returns `True` (pass) or `False` (fail)
`cross_rules`	`dict[str, Callable]`	`None`	`CrossColumnCheck`	Cross-column predicates. Same signature as `rules` but for multi-column logic
`latest_timestamp`	`datetime`	`None`	`FreshnessCheck`	Timestamp of the most recent data record. Pass `datetime.now(timezone.utc)` for "fresh right now"
`max_delay_hours`	`float`	`24.0`	`FreshnessCheck`	Maximum acceptable data delay in hours
`expected_min`	`int`	`None`	`VolumeCheck`	Minimum acceptable row count
`expected_max`	`int`	`None`	`VolumeCheck`	Maximum acceptable row count
`fk_column`	`str`	`None`	`ReferentialIntegrityCheck`	Foreign key column name
`reference_values`	`list`	`None`	`ReferentialIntegrityCheck`	Valid foreign key values (parent table values)
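How latest_timestamp and max_delay_hours could combine is sketched below. The function is_fresh and its now parameter are assumptions for illustration; the actual FreshnessCheck logic is not shown in this README.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_timestamp, max_delay_hours=24.0, now=None):
    """True when the newest record is at most max_delay_hours old."""
    now = now or datetime.now(timezone.utc)
    return now - latest_timestamp <= timedelta(hours=max_delay_hours)


# A fixed "now" keeps the example deterministic.
now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(now - timedelta(hours=3), now=now)
stale = is_fresh(now - timedelta(hours=30), now=now)
print(fresh, stale)
```

Both datetimes should be timezone-aware; Python raises TypeError when an aware datetime is subtracted from a naive one.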

Advanced fields

Field	Type	Default	Used by	Description
`expected_columns`	`list[str]`	`None`	`SchemaDriftCheck`	Expected column names. Added or removed columns are reported as drift
`file_name_column`	`str`	`"file_name"`	`DuplicateFileIngestionCheck`	Column that records the ingested file name
`parent_column`	`str`	`None`	`HierarchyCheck`	Parent column name for hierarchy validation
`child_column`	`str`	`None`	`HierarchyCheck`	Child column name for hierarchy validation
`valid_hierarchy`	`dict[str, list]`	`None`	`HierarchyCheck`	Valid parent → children mapping, e.g. `{"Asia": ["Japan", "India"]}`
`audit_columns`	`list[str]`	`None`	`AuditColumnCheck`	Columns that must be present and non-null, e.g. `["created_by", "created_at"]`
`source_count`	`int`	`None`	`CrossSystemConsistencyCheck`	Source system row count
`target_count`	`int`	`None`	`CrossSystemConsistencyCheck`	Target system row count
`tolerance_pct`	`float`	`0.01`	`CrossSystemConsistencyCheck`	Acceptable count mismatch as a fraction. `0.01` = 1%
`source_payload`	`str`	`None`	`ChecksumCheck`	Source payload string for SHA-256 hashing
`target_payload`	`str`	`None`	`ChecksumCheck`	Target payload string for SHA-256 hashing
`code_column`	`str`	`None`	`ReferenceDataCheck`	Column containing reference codes
`valid_codes`	`list`	`None`	`ReferenceDataCheck`	Valid code values for `code_column`
`percentage_column`	`str`	`None`	`PercentageTotalCheck`	Column whose values must sum to 100
`expected_total`	`float`	`100.0`	`PercentageTotalCheck`	Expected percentage total
`length_rules`	`dict[str, tuple]`	`None`	`StringLengthCheck`	String length bounds `(min, max)` per column, e.g. `{"name": (1, 50)}`
`partition_column`	`str`	`None`	`CompletenessCheck`	Column that identifies data partitions (e.g. region, date)
`expected_partitions`	`list`	`None`	`CompletenessCheck`	All partition values that must be present in the data
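Because tolerance_pct is a fraction rather than a percentage, a worked example may help. The sketch assumes the mismatch is measured relative to the source count; the function counts_consistent is hypothetical and the library may compute it differently.

```python
def counts_consistent(source_count, target_count, tolerance_pct=0.01):
    """True when the relative count difference is within tolerance_pct."""
    if source_count == 0:
        return target_count == 0
    drift = abs(source_count - target_count) / source_count
    return drift <= tolerance_pct


# 1000 vs 998 is a 0.2% difference, inside the default 1% tolerance.
print(counts_consistent(1000, 998))
# 1000 vs 980 is a 2% difference, outside it.
print(counts_consistent(1000, 980))
```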

Using the config

CLI — pass the .py file path directly:

iki-dq-check --tier advanced --file data.json --config sample_config.py

Library — pass the instance to check():

from Iki_DQ_Check import check
from sample_config import config

report = check(df, tier="advanced", config=config)

Inline — no file needed:

from Iki_DQ_Check import check, DQConfig

cfg = DQConfig(
    pk_column="order_id",
    ranges={"amount": (0, None)},
    allowed={"status": ["pending", "fulfilled", "cancelled"]},
    rules={
        "amount_positive": lambda r: (r.get("amount") or 0) > 0,
    },
)

report = check(df, tier="standard", config=cfg)

to_kwargs()

DQConfig.to_kwargs() returns a plain dict of all non-None fields, ready to unpack into pipeline.run() or check(). Always-included fields (pk_column, max_delay_hours, tolerance_pct, expected_total, file_name_column) are included even when at their defaults. Callables (rules, cross_rules) are passed through as-is.

cfg = DQConfig(pk_column="id", ranges={"salary": (0, None)})

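# These are equivalent (the last line assumes an existing DataQualityPipeline
# named `pipeline` and `data` already in list[dict] form):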
report = check(df, tier="lite", config=cfg)
report = check(df, tier="lite", **cfg.to_kwargs())
report = pipeline.run(data, **cfg.to_kwargs())
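The "drop None fields, keep defaults" behaviour described above can be sketched with a tiny dataclass. MiniConfig is a hypothetical stand-in with only four fields; it is not the real DQConfig.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MiniConfig:
    """Hypothetical, cut-down stand-in for DQConfig."""
    pk_column: str = "id"
    ranges: Optional[dict] = None
    rules: Optional[dict] = None
    max_delay_hours: float = 24.0

    def to_kwargs(self):
        # Keep every field whose value is not None; callables pass through as-is.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


cfg = MiniConfig(ranges={"salary": (0, None)})
kwargs = cfg.to_kwargs()
print(kwargs)
```

Fields with non-None defaults (pk_column and max_delay_hours in this sketch) always appear in the result, while unset fields such as rules are left out.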

Convenience factory functions

from Iki_DQ_Check.config import lite_config, standard_config, advanced_config

cfg = lite_config("order_id", ranges={"amount": (0, None)})

cfg = standard_config(
    "order_id",
    patterns={"email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    allowed={"status": ["active", "inactive"]},
)

cfg = advanced_config(
    "order_id",
    expected_columns=["order_id", "amount", "status"],
    audit_columns=["created_by", "created_at"],
    source_count=10_000,
    target_count=9_995,
)

Rule expressions from strings

If you prefer string expressions over lambdas (e.g. when loading rules from a database or config store), use the helper methods. Both return self for chaining.

cfg = DQConfig(pk_column="id").with_rules_from_expr(
    salary_positive="salary > 0",
    name_not_empty="name != ''",
).with_cross_rules_from_expr(
    end_after_start="end > start",
)

Expressions are compiled with Python's ast module — no eval() is ever called.

Supported operators: ==, !=, <, >, <=, >=, and, or, not
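One way to compile such expressions with the ast module and no eval() is sketched below. It is an illustration under assumptions, not the library's code: compile_rule is a hypothetical name, bare names are treated as column lookups on the row dict, and comparisons that raise TypeError (for example a missing column compared with a number) count as a failed rule.

```python
import ast
import operator

# Comparison operators this sketch accepts, mapped to their implementations.
_COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.Gt: operator.gt,
    ast.LtE: operator.le, ast.GtE: operator.ge,
}
# Every node type allowed to appear anywhere in the parsed expression.
_ALLOWED = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Name, ast.Load, ast.Constant, *_COMPARATORS,
)


def compile_rule(expr):
    """Turn 'salary > 0' into a predicate over a row dict, without eval()."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")

    def evaluate(node, row):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return row.get(node.id)        # bare names are column lookups
        if isinstance(node, ast.UnaryOp):  # only `not` passes validation
            return not evaluate(node.operand, row)
        if isinstance(node, ast.BoolOp):
            results = (evaluate(v, row) for v in node.values)
            return all(results) if isinstance(node.op, ast.And) else any(results)
        # Remaining case: ast.Compare, possibly chained (a < b <= c).
        left = evaluate(node.left, row)
        for op, right_node in zip(node.ops, node.comparators):
            right = evaluate(right_node, row)
            if not _COMPARATORS[type(op)](left, right):
                return False
            left = right
        return True

    def predicate(row):
        try:
            return bool(evaluate(tree.body, row))
        except TypeError:                  # e.g. None > 0 for a missing column
            return False

    return predicate


rule = compile_rule("salary > 0 and name != ''")
print(rule({"salary": 50_000, "name": "Ada"}))
print(rule({"salary": 0, "name": "Ada"}))
```

Validating the whole tree up front means that function calls, attribute access, and arithmetic are rejected when the rule is compiled, before any row is evaluated.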

Config source types accepted by `load_config()` and `check(config=...)`

Source	How it's resolved
`DQConfig` instance	`.to_kwargs()` called directly
`"my_config.py"` path	File is imported; `config` or `cfg` variable extracted
`"config.yaml"` path	Legacy YAML load (requires `pip install -e ".[yaml]"`)
`dict`	Passed through `resolve_config()` as-is
`None`	Returns `{}` — all checks use their defaults
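Resolving a .py path to a config object can be done with importlib, roughly as below. This is a sketch under assumptions: load_py_config is a hypothetical helper, and a plain dict stands in for a DQConfig instance so the example runs without the package installed.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_py_config(path):
    """Import a .py file by path and return its `config` (or `cfg`) variable."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    for name in ("config", "cfg"):
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"{path} defines neither 'config' nor 'cfg'")


# Demo with a throwaway file; a plain dict stands in for a DQConfig instance.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "my_config.py"
    cfg_file.write_text('config = {"pk_column": "order_id"}\n', encoding="utf-8")
    loaded = load_py_config(cfg_file)

print(loaded)
```

Importing a config file executes it, so only point --config at files you trust.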

Legacy YAML config (deprecated, still supported)

YAML config files still work if you have existing ones. Pass the .yaml path to --config or config=:

iki-dq-check --tier lite --file data.json --config config.yaml

pip install -e ".[yaml]"   # pyyaml is now optional

Output

Terminal

══════════════════════════════════════════════════════════════
  Pipeline  : dq_pipeline
  Ran at    : 2026-05-25 11:48:52 UTC
  Total     : 5 checks
  Passed    : 2 ✅
  Failed    : 3 ❌
  Pass rate : 40%
──────────────────────────────────────────────────────────────
  ❌ [LITE][CRITICAL] NullCheck: Nulls in 1 column(s)
       ↳ null_columns: {'age': [2]}
  ❌ [LITE][CRITICAL] PrimaryKeyCheck: PK 'id' violations found
       ↳ duplicate_values: [2]
  ✅ [LITE][CRITICAL] DuplicateRowCheck: No duplicate rows found
  ✅ [LITE][CRITICAL] DataTypeCheck: All columns pass type check
  ❌ [LITE][CRITICAL] NumericRangeCheck: Range violations in 2 column(s)
       ↳ violations: {'salary': [{'row': 2, 'value': -5000}]}
══════════════════════════════════════════════════════════════

JSON Report (`--output report.json`)

{
	"pipeline_name": "dq_pipeline",
	"ran_at": "2026-05-25T11:48:52+00:00",
	"success_rate": 0.4,
	"total": 5,
	"passed": 2,
	"failed": 3,
	"results": [
		{
			"check": "NullCheck",
			"tier": "LITE",
			"passed": false,
			"severity": "CRITICAL",
			"message": "Nulls in 1 column(s)",
			"details": {
				"null_columns": { "age": [2] },
				"total": 1
			}
		}
	]
}

Exit Codes

Code	Meaning
`0`	All checks passed (or only WARNING / INFO failures)
`1`	At least one CRITICAL check failed

Use in CI/CD pipelines:

# Without the explicit `exit 1`, echo succeeds and the step would report success
iki-dq-check --tier lite --file data.json --config sample_config.py \
  || { echo "❌ Quality gate failed — pipeline blocked"; exit 1; }

Severity Levels

Each check has a fixed severity that controls the exit code:

Severity	Checks	Exit on failure
`CRITICAL`	Most checks — data integrity issues	Yes — exits `1`
`WARNING`	Domain, regex, outlier, volume checks	No — exits `0`
`INFO`	`DistributionCheck` (stats only)	No — exits `0`
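The mapping from severities to an exit code can be summarized in a few lines. The function exit_code and the (severity, passed) tuple shape are assumptions for this sketch; the CLI's internal representation is not documented here.

```python
def exit_code(results):
    """Return 1 if any CRITICAL check failed, otherwise 0.

    `results` is an iterable of (severity, passed) pairs.
    """
    critical_failure = any(
        severity == "CRITICAL" and not passed for severity, passed in results
    )
    return 1 if critical_failure else 0


print(exit_code([("WARNING", False), ("INFO", False), ("CRITICAL", True)]))
print(exit_code([("WARNING", True), ("CRITICAL", False)]))
```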

Running Tests

Tests are split by concern and live in tests/. Each file mirrors the module it covers.

# Run all tests
pytest tests/

# Verbose output
pytest tests/ -v

# Filter by keyword
pytest tests/ -k null
pytest tests/ -k checksum
pytest tests/ -k cli

# Run a single file
pytest tests/test_lite.py
pytest tests/test_cli.py

# Skip CLI integration tests (faster)
pytest tests/ --ignore=tests/test_cli.py

# Run only CLI integration tests
pytest tests/test_cli.py

Test file reference

File	What it covers
`conftest.py`	Shared fixtures, assertion helpers, sample datasets
`test_lite.py`	`NullCheck`, `PrimaryKeyCheck`, `DuplicateRowCheck`, `DataTypeCheck`, `NumericRangeCheck`
`test_standard.py`	`RegexCheck`, `DomainCheck`, `BusinessRuleCheck`, `CrossColumnCheck`, `FreshnessCheck`, `VolumeCheck`, `OutlierCheck`, `ReferentialIntegrityCheck`
`test_advanced.py`	All 12 Advanced tier checks
`test_pipeline.py`	`DataQualityPipeline`, `QualityReport`, fail-fast, error resilience
`test_registry.py`	`REGISTRY`, `TIER_MAP`, tier/severity assignments, check metadata
`test_loaders.py`	`load_data()`, `load_config()`, `coerce()`, `resolve_config()`, `safe_eval_rule()`
`test_facade.py`	`normalize()` for all input formats, `check()`, `RichQualityReport`, `DQConfig` loading
`test_cli.py`	Full CLI integration via subprocess (exit codes, flags, output)

Expected output:

tests/test_lite.py       ........  PASSED
tests/test_standard.py   ........  PASSED
tests/test_advanced.py   ............  PASSED
tests/test_pipeline.py   ...........  PASSED
tests/test_registry.py   .........  PASSED
tests/test_loaders.py    ...............  PASSED
tests/test_facade.py     ................  PASSED
tests/test_cli.py        ....................  PASSED

Facade — Library & Notebook API

src/Iki_DQ_Check/facade.py is the single-entry-point API for using the framework as a library. It accepts the input formats listed below and normalizes each one to the core's list[dict] format automatically.

Supported input formats

Format	Example
`pandas.DataFrame`	`check(df, tier="lite")`
`polars.DataFrame`	`check(pl_df, tier="lite")`
`polars.LazyFrame`	`check(pl.scan_parquet("data.parquet"), tier="lite")`
`pyarrow.Table`	`check(arrow_table, tier="lite")`
`duckdb.DuckDBPyRelation`	`check(conn.sql("SELECT * FROM t"), tier="lite")`
Parquet file path	`check("data.parquet", tier="lite")`
CSV file path	`check("data.csv", tier="lite")`
JSON file path	`check("data.json", tier="lite")`
SQL + SQLAlchemy engine	`check("SELECT * FROM t", engine=engine, tier="lite")`
SQL + SQLite path	`check("SELECT * FROM t", db="mydb.sqlite", tier="lite")`
`list[dict]` (native)	`check([{"id": 1, ...}], tier="lite")`
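For file paths, normalizing to list[dict] can be approximated with the standard library alone. The helper rows_from_path is hypothetical and covers only CSV and JSON; the real normalize() handles more formats and may coerce types differently.

```python
import csv
import json
import tempfile
from pathlib import Path


def rows_from_path(path):
    """Read a CSV or JSON file into a list of row dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))  # every value arrives as a string
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            rows = json.load(f)
        if not isinstance(rows, list):
            raise ValueError("expected a JSON array of objects")
        return rows
    raise ValueError(f"unsupported file type: {suffix}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("id,name\n1,Alice\n", encoding="utf-8")
    json_path = Path(tmp) / "data.json"
    json_path.write_text('[{"id": 1, "name": "Alice"}]', encoding="utf-8")
    csv_rows = rows_from_path(csv_path)
    json_rows = rows_from_path(json_path)

print(csv_rows)
print(json_rows)
```

Note that CSV values stay strings in this sketch, which is why a type coercion step (such as the coerce() helper listed under cli/loaders.py) is needed before numeric checks.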

Import

# Top-level shortcut (re-exported from __init__.py)
from Iki_DQ_Check import check, check_lite, check_standard, check_advanced, normalize
from Iki_DQ_Check import DQConfig

# Explicit module import
from Iki_DQ_Check.facade import check, normalize, RichQualityReport
from Iki_DQ_Check.config import DQConfig

check()

check(
    data,                        # any supported format (see table above)
    tier="lite",                 # "lite" | "standard" | "advanced"
    # -- or --
    checks=["NullCheck", ...],   # run specific checks instead of a full tier
    pipeline_name="my_pipeline", # shown in the report (default: "dq_pipeline")
    fail_fast=False,             # stop after first CRITICAL failure
    config=cfg,                  # DQConfig instance, .py path, .yaml path, or dict
    # SQL sources
    engine=engine,               # SQLAlchemy engine (when data is a SQL string)
    db="mydb.sqlite",            # SQLite path / ":memory:" (when data is SQL)
    # any check kwargs passed directly (merged with config)
    pk_column="id",
    ranges={"salary": (0, None)},
)

Tier shortcuts

check_lite(data, **kwargs)      # 5 checks
check_standard(data, **kwargs)  # 13 checks
check_advanced(data, **kwargs)  # 25 checks

Examples

pandas with DQConfig

import pandas as pd
from Iki_DQ_Check import check, DQConfig

df = pd.read_csv("orders.csv")

cfg = DQConfig(
    pk_column="order_id",
    patterns={"email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    allowed={"status": ["pending", "fulfilled", "cancelled"]},
    ranges={"amount": (0, None)},
)

report = check(df, tier="standard", config=cfg)
report.show()

Polars LazyFrame — reads Parquet without loading into memory first

import polars as pl
from Iki_DQ_Check import check_lite

report = check_lite(
    pl.scan_parquet("warehouse/orders/*.parquet"),
    pk_column="order_id",
)
print(report.success_rate)

DuckDB — query directly from a relation

import duckdb
from Iki_DQ_Check import check
from sample_config import config

conn = duckdb.connect()
report = check(
    conn.sql("SELECT * FROM read_parquet('data/orders.parquet') WHERE dt = '2026-05-26'"),
    tier="advanced",
    config=config,
)
report.show()

SQL via SQLAlchemy — works with PostgreSQL, MySQL, BigQuery, Snowflake

from sqlalchemy import create_engine
from Iki_DQ_Check import check, DQConfig

engine = create_engine("postgresql://user:pass@host/db")

cfg = DQConfig(pk_column="order_id")
report = check(
    "SELECT * FROM public.orders WHERE created_at >= current_date",
    engine=engine,
    tier="standard",
    config=cfg,
)
report.show()

Specific checks instead of a tier

from Iki_DQ_Check import check, DQConfig

cfg = DQConfig(pk_column="id", ranges={"salary": (0, 1_000_000)})
report = check(
    df,
    checks=["NullCheck", "PrimaryKeyCheck", "NumericRangeCheck"],
    config=cfg,
)

CI/CD gate

from Iki_DQ_Check import check_lite, DQConfig

cfg = DQConfig(pk_column="id", ranges={"amount": (0, None)})
report = check_lite(df, config=cfg)

if report.success_rate < 1.0:
    failed = [r.check_name for r in report.failed]
    raise RuntimeError(f"Quality gate failed: {failed}")

Export to JSON

import json

with open("report.json", "w") as f:
    json.dump(report.to_dict(), f, indent=2, default=str)

Jupyter rendering

In a Jupyter notebook, returning report as the last expression in a cell automatically renders an HTML table with a pass-rate progress bar, color-coded tier and severity badges, and inline failure details.

# Auto-renders as HTML in Jupyter
report = check(df, tier="standard", config=cfg)
report

Call .show() to force rendering — it auto-detects the environment and prints ANSI text in a terminal.

report.show()   # HTML in Jupyter, ANSI text in terminal
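Jupyter's automatic rendering relies on the _repr_html_ method that IPython looks for on the last expression of a cell. The class below is a minimal sketch of that protocol; MiniReport is hypothetical and far simpler than RichQualityReport.

```python
from html import escape


class MiniReport:
    """Hypothetical report that renders as an HTML table in Jupyter."""

    def __init__(self, results):
        self.results = results  # list of (check_name, passed) pairs

    def _repr_html_(self):
        # IPython calls this method to obtain the rich HTML representation.
        rows = "".join(
            f"<tr><td>{escape(name)}</td><td>{'PASS' if ok else 'FAIL'}</td></tr>"
            for name, ok in self.results
        )
        return f"<table><tr><th>Check</th><th>Result</th></tr>{rows}</table>"

    def __repr__(self):
        # Plain-text fallback used in a terminal or by print().
        passed = sum(1 for _, ok in self.results if ok)
        return f"MiniReport({passed}/{len(self.results)} passed)"


report_html = MiniReport([("NullCheck", False), ("DataTypeCheck", True)])._repr_html_()
print(report_html)
```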

A full demo covering every supported format is in dq_facade_demo.ipynb.

normalize()

Converts any supported format to list[dict] — useful for inspecting what the facade feeds into the pipeline:

from Iki_DQ_Check import normalize

rows = normalize("orders.parquet")
rows = normalize(pl_df)
rows = normalize("SELECT * FROM t", engine=engine)

print(rows[0])  # {'id': 1, 'name': 'Alice', ...}

Introspection helpers

from Iki_DQ_Check.facade import list_checks, supported_formats

list_checks()        # prints all 25 checks grouped by tier
supported_formats()  # prints the full format support table

Adding a Custom Check

# my_checks.py
from Iki_DQ_Check.core.base import DataCheck, CheckTier, Severity

class CorporateEmailCheck(DataCheck):
    tier     = CheckTier.STANDARD
    severity = Severity.CRITICAL

    ALLOWED_DOMAINS = {"corp.com", "subsidiary.io"}

    def run(self, data, email_column="email", **_):
        bad = [
            {"row": i, "value": r.get(email_column)}
            for i, r in enumerate(data)
            if "@" not in str(r.get(email_column, ""))
            or str(r.get(email_column, "")).split("@")[-1]
               not in self.ALLOWED_DOMAINS
        ]
        if bad:
            return self._fail(f"{len(bad)} non-corporate email(s)", violations=bad)
        return self._pass("All emails from approved domains")

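# Register the check (REGISTRY and TIER_MAP live in core/pipeline.py)
from Iki_DQ_Check.core.pipeline import REGISTRY, TIER_MAP
from my_checks import CorporateEmailCheck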

REGISTRY["CorporateEmailCheck"] = CorporateEmailCheck
TIER_MAP["standard"].append("CorporateEmailCheck")

Then use it like any built-in:

iki-dq-check --check CorporateEmailCheck --file data.json --config sample_config.py

Using the Core Pipeline Directly

For full control without the facade — custom orchestration, Airflow tasks, programmatic pipelines:

from Iki_DQ_Check.core.pipeline import DataQualityPipeline
from Iki_DQ_Check.checks.lite import NullCheck, PrimaryKeyCheck
from Iki_DQ_Check.checks.standard import RegexCheck
from Iki_DQ_Check.config import DQConfig

cfg = DQConfig(
    pk_column="order_id",
    patterns={"email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
)

pipeline = (
    DataQualityPipeline("orders_daily")
    .add(NullCheck())
    .add(PrimaryKeyCheck())
    .add(RegexCheck())
)

# `data` is the rows to validate, in the core's list[dict] format
report = pipeline.run(data, **cfg.to_kwargs())

print(report.summary())

if report.success_rate < 1.0:
    raise RuntimeError("Data quality gate failed")
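The chainable .add() calls and the fail-fast option can be pictured with a toy pipeline. MiniPipeline and the two check functions are hypothetical; they only illustrate the pattern and do not mirror DataQualityPipeline's internals.

```python
class MiniPipeline:
    """Hypothetical stand-in for a chainable pipeline with fail-fast."""

    def __init__(self, name, fail_fast=False):
        self.name = name
        self.fail_fast = fail_fast
        self._checks = []

    def add(self, check):
        self._checks.append(check)
        return self  # returning self is what enables .add().add()

    def run(self, data, **kwargs):
        results = []
        for check in self._checks:
            severity, passed = check(data, **kwargs)
            results.append((check.__name__, severity, passed))
            if self.fail_fast and severity == "CRITICAL" and not passed:
                break  # skip the remaining checks
        return results


def no_empty_rows(data, **_):
    return "CRITICAL", all(data)


def has_rows(data, **_):
    return "WARNING", len(data) > 0


rows = [{}, {"id": 1}]
stopped = MiniPipeline("demo", fail_fast=True).add(no_empty_rows).add(has_rows).run(rows)
full = MiniPipeline("demo").add(no_empty_rows).add(has_rows).run(rows)
print(stopped)
print(full)
```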

License

MIT

This version: 0.1.0 (released Jun 18, 2026)
