Open, audit-grade agentic data quality framework with portable industry packs


Aegis DQ


Aegis DQ Demo

Open-source agentic data quality: validate, diagnose, and explain data failures — with an LLM that tells you exactly why.


```
$ aegis run rules.yaml

Aegis DQ — loading rules from rules.yaml
Loaded 3 rules
LLM: Anthropic (claude-haiku-4-5-20251001)

╭─────────────────────────────────────────────────╮
│         Aegis Validation Report                 │
├──────────────────┬──────────────────────────────┤
│ Metric           │ Value                        │
├──────────────────┼──────────────────────────────┤
│ Rules checked    │ 3                            │
│ Passed           │ 2                            │
│ Failed           │ 1                            │
│ Pass rate        │ 66.67%                       │
│ LLM cost         │ $0.000183                    │
╰──────────────────┴──────────────────────────────╯

Failures:

  orders_no_nulls (critical) — orders
  Rows failed: 47 / 10,000
  Explanation:  47 rows have NULL order_id, violating the completeness rule.
  Likely cause: ETL pipeline failed to populate order_id for orders placed via
                the mobile API between 2024-01-14 02:00–04:00 UTC.
  Action:       Re-run the mobile-api ingestion job for that window and
                backfill the missing order_ids from the events table.
```

Why Aegis?

| Capability | Aegis DQ | Great Expectations / Soda | Monte Carlo / Anomalo |
|---|---|---|---|
| Open source | ✅ Apache 2.0 | ✅ | ❌ Commercial |
| Agentic LLM diagnosis + RCA | ✅ | | Proprietary |
| Audit trail (per-decision log) | ✅ | Partial | Proprietary |
| Pluggable LLM (Anthropic, OpenAI, Ollama) | ✅ | | |
| dbt integration | ✅ | Partial | |
| Portable open rule standard | ✅ | Partial | |

Install

```bash
pip install aegis-dq
```

| Extra | What it adds |
|---|---|
| `aegis-dq[bigquery]` | BigQuery adapter |
| `aegis-dq[databricks]` | Databricks adapter |
| `aegis-dq[athena]` | AWS Athena adapter |
| `aegis-dq[openai]` | OpenAI LLM provider |
| `aegis-dq[ollama]` | Ollama (local) LLM provider |
| `aegis-dq[airflow]` | Airflow `AegisOperator` |
| `aegis-dq[mcp]` | MCP server for Claude Desktop |

5-minute quickstart

```bash
pip install aegis-dq
```

Seed a demo DuckDB database:

```python
import duckdb

con = duckdb.connect("demo.db")
con.execute("""
    CREATE TABLE orders AS
    SELECT i AS order_id, 'placed' AS status, i * 9.99 AS revenue
    FROM range(1, 10001) t(i)
""")
# introduce some bad data
con.execute("UPDATE orders SET order_id = NULL WHERE order_id % 200 = 0")
con.execute("UPDATE orders SET revenue = -5.00 WHERE order_id % 500 = 0")
con.close()
```

Generate a starter rules file and run:

```bash
# create rules.yaml
aegis init

# edit rules.yaml — set warehouse: duckdb and table: orders
# then run validation
export ANTHROPIC_API_KEY=sk-ant-...
aegis run rules.yaml --db demo.db
```

Run without an API key (validation only, no LLM diagnosis):

```bash
aegis run rules.yaml --db demo.db --no-llm
```
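
For the demo table above, a minimal `rules.yaml` might look like the following. The layout follows the rule example later in this README; the exact starter file `aegis init` emits may differ, and the `column` field on the `not_null` rule is an assumed name:

```yaml
rules:
  - apiVersion: aegis.dev/v1
    kind: DataQualityRule
    metadata:
      id: orders_no_nulls
      severity: critical
    scope:
      warehouse: duckdb
      table: orders
    logic:
      type: not_null
      column: order_id   # assumed field name for this rule type
  - apiVersion: aegis.dev/v1
    kind: DataQualityRule
    metadata:
      id: orders_revenue_non_negative
      severity: critical
    scope:
      warehouse: duckdb
      table: orders
    logic:
      type: sql_expression
      expression: "revenue >= 0"
```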

Pipeline

Every aegis run passes your data through a 7-node LangGraph pipeline:

```
rules.yaml
    │
    ▼
  plan → execute → reconcile → classify → diagnose → rca → report
           │                       │           │        │       │
        28 rule               heuristic    LLM asks  lineage  JSON +
        types                  + LLM       "why?"    context  Slack
```

- **plan** — parse and validate `rules.yaml`, build an execution graph
- **execute** — run all 28 rule types against your warehouse
- **reconcile** — compare results against expected thresholds
- **classify** — heuristic triage (severity, category, affected rows)
- **diagnose** — LLM writes a plain-English explanation per failure
- **rca** — root-cause analysis using lineage context and run history
- **report** — structured JSON + optional Slack notification
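
Aegis wires these nodes up with LangGraph, but conceptually a run reduces to threading one shared state object through the seven nodes in order. A dependency-free sketch of that shape (all function names and state fields here are illustrative, not the real Aegis internals):

```python
from typing import Callable

State = dict  # stand-in for the shared pipeline state


def plan(s: State) -> State:
    # parse rules into executable check ids
    s["checks"] = [r["metadata"]["id"] for r in s["rules"]]
    return s


def execute(s: State) -> State:
    # run each check; outcomes are pre-seeded here instead of querying a warehouse
    s["results"] = {c: s["outcomes"].get(c, True) for c in s["checks"]}
    return s


def reconcile(s: State) -> State:
    # collect checks whose result missed the expected threshold
    s["failures"] = [c for c, ok in s["results"].items() if not ok]
    return s


def classify(s: State) -> State:
    # heuristic triage; real logic would weigh severity, category, affected rows
    s["severity"] = {c: "critical" for c in s["failures"]}
    return s


def diagnose(s: State) -> State:
    # where Aegis would ask the LLM "why?"; stubbed out here
    s["explanations"] = {c: f"{c} failed" for c in s["failures"]}
    return s


def rca(s: State) -> State:
    # root-cause analysis would use lineage context and run history
    s["root_causes"] = dict.fromkeys(s["failures"], "unknown")
    return s


def report(s: State) -> State:
    # final structured summary
    s["report"] = {"checked": len(s["checks"]), "failed": len(s["failures"])}
    return s


NODES: list[Callable[[State], State]] = [
    plan, execute, reconcile, classify, diagnose, rca, report,
]


def run(state: State) -> State:
    # a linear graph: each node's output state feeds the next node
    for node in NODES:
        state = node(state)
    return state
```

A linear chain like this is the degenerate case of a graph; LangGraph earns its keep once nodes branch (e.g. skipping `diagnose`/`rca` under `--no-llm`).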

Rule types (28 total)

| Category | Types |
|---|---|
| Completeness | `not_null` `not_empty_string` `null_percentage_below` |
| Uniqueness | `unique` `composite_unique` `duplicate_percentage_below` |
| Validity | `sql_expression` `between` `min_value_check` `max_value_check` `regex_match` `accepted_values` `not_accepted_values` `no_future_dates` `column_exists` |
| Referential | `foreign_key` `conditional_not_null` |
| Statistical | `mean_between` `stddev_below` `column_sum_between` |
| Timeliness | `freshness` `date_order` |
| Volume | `row_count` `row_count_between` `custom_sql` |
| Cross-table | `row_count_match` `column_sum_match` `set_inclusion` `set_equality` |

Example rule:

```yaml
rules:
  - apiVersion: aegis.dev/v1
    kind: DataQualityRule
    metadata:
      id: orders_revenue_non_negative
      severity: critical
      owner: revenue-team
      tags: [revenue, validity]
    scope:
      warehouse: duckdb
      table: orders
    logic:
      type: sql_expression
      expression: "revenue >= 0"
```
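
A `sql_expression` rule ultimately boils down to counting rows that violate the predicate. A minimal sketch of that evaluation using stdlib `sqlite3` in place of a warehouse adapter (the helper name is hypothetical, and whether NULL predicate results count as failures is a design choice — this sketch counts them):

```python
import sqlite3


def failing_rows(con: sqlite3.Connection, table: str, expression: str) -> int:
    # Rows where the predicate evaluates to false OR NULL both count as
    # failures, hence NOT COALESCE(expr, 0) rather than plain NOT expr.
    sql = f"SELECT COUNT(*) FROM {table} WHERE NOT COALESCE({expression}, 0)"
    return con.execute(sql).fetchone()[0]


# tiny fixture mirroring the quickstart's orders table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, revenue REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.99), (2, -5.00), (3, None)],
)
```

Here `failing_rows(con, "orders", "revenue >= 0")` returns 2: the negative row plus the NULL-revenue row, whose predicate is neither true nor false.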

Warehouse adapters

| Adapter | Install | Status |
|---|---|---|
| DuckDB | built-in | ✅ |
| BigQuery | `aegis-dq[bigquery]` | ✅ |
| Databricks | `aegis-dq[databricks]` | ✅ |
| AWS Athena | `aegis-dq[athena]` | ✅ |
| Snowflake | `aegis-dq[snowflake]` | 🚧 coming v1.0 |
| Postgres / Redshift | `aegis-dq[postgres]` | 🚧 v1.0 |

LLM providers

| Provider | Install | Default model |
|---|---|---|
| Anthropic (Claude) | built-in | `claude-haiku-4-5` |
| OpenAI | `aegis-dq[openai]` | `gpt-4o-mini` |
| Ollama (local) | `aegis-dq[ollama]` | `llama3.2` |

Switch providers at the CLI:

```bash
aegis run rules.yaml --llm openai --llm-model gpt-4o
aegis run rules.yaml --llm ollama --llm-model llama3.2
```

Integrations

| Integration | What it does |
|---|---|
| `aegis-dq[airflow]` | `AegisOperator` — drop-in Airflow task |
| `aegis-dq[mcp]` | MCP server for Claude Desktop / tool use |
| `aegis dbt generate` | Convert dbt `manifest.json` to Aegis rules |
| GitHub Action (#27) | CI/CD gate on PRs (coming v1.0) |

CLI reference

| Command | Description |
|---|---|
| `aegis init` | Generate a starter `rules.yaml` |
| `aegis validate <config>` | Check YAML syntax + schema (no warehouse needed) |
| `aegis run <config>` | Run validation, diagnose failures, produce a report |
| `aegis rules list` | Browse built-in rule templates |
| `aegis audit trajectory <run-id>` | Inspect the LLM decision trail for a past run |
| `aegis audit search <query>` | Full-text search across audit logs (FTS5) |
| `aegis dbt generate <manifest>` | Convert a dbt manifest to Aegis rules |
| `aegis mcp serve` | Start the MCP server for Claude Desktop |
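
`aegis audit search` leans on SQLite's FTS5 extension. A minimal sketch of how full-text search over an audit log works with stdlib `sqlite3` — the table and column names are hypothetical, not Aegis's actual schema:

```python
import sqlite3

# in-memory stand-in for the audit database
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE audit_log USING fts5(run_id, node, message)")
con.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?)",
    [
        ("run-001", "diagnose", "47 rows have NULL order_id"),
        ("run-001", "rca", "mobile API ingestion gap suspected"),
        ("run-002", "report", "all rules passed"),
    ],
)


def search(query: str) -> list:
    # MATCH is FTS5's full-text operator; bm25() ranks by relevance
    return con.execute(
        "SELECT run_id, node, message FROM audit_log "
        "WHERE audit_log MATCH ? ORDER BY bm25(audit_log)",
        (query,),
    ).fetchall()
```

For example, `search("ingestion")` pulls back only the RCA entry from `run-001`.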

aegis run flags:

| Flag | Default | Description |
|---|---|---|
| `--db` | `:memory:` | DuckDB file path |
| `--llm` | `anthropic` | LLM provider: `anthropic` \| `openai` \| `ollama` |
| `--llm-model` | (provider default) | Override model name |
| `--no-llm` | `false` | Skip LLM diagnosis entirely |
| `--output-json` | (none) | Write full JSON report to file |
| `--notify` | (none) | Slack webhook URL |
| `--notify-on` | `failures` | When to notify: `all` \| `failures` \| `critical` |
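
`--output-json` makes it easy to gate a pipeline on the report. A sketch of such a gate, mirroring the `--notify-on` levels — the report's field names below are assumptions for illustration, not a documented schema:

```python
import json

# hypothetical shape of a report written by `aegis run ... --output-json`
sample = json.loads("""
{
  "rules_checked": 3,
  "passed": 2,
  "failed": 1,
  "failures": [{"id": "orders_no_nulls", "severity": "critical"}]
}
""")


def should_block(report: dict, gate_on: str = "critical") -> bool:
    """Decide whether CI should fail, using the --notify-on levels."""
    if gate_on == "all":
        return True
    if gate_on == "failures":
        return report["failed"] > 0
    # gate_on == "critical": block only on critical-severity failures
    return any(f["severity"] == "critical" for f in report["failures"])
```

With the sample report, `should_block(sample, "critical")` is true, so a CI job reading the JSON file would exit non-zero.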

Roadmap

| Phase | Version | Items | Status |
|---|---|---|---|
| Foundation | v0.1 | Core agent, DuckDB, CLI, audit trail | ✅ Done |
| Differentiate | v0.5 | BigQuery, Databricks, Athena, Airflow, Ollama, RCA, ShareGPT export, FTS5 search, dbt, MCP | ✅ Done |
| Mature | v1.0 | Postgres, REST API, GitHub Action, parallel subagents, ML anomaly detection, banking/healthcare packs | 🚧 In progress |

Full issue tracker: github.com/aegis-dq/aegis-dq/issues


Contributing

Contributions are welcome. See CONTRIBUTING.md to get started.

Good first issues: label:good first issue

License

Apache 2.0
