Open, audit-grade agentic data quality framework with portable industry packs
Project description
Aegis
Open, audit-grade agentic data quality framework with portable industry packs.
Aegis runs a LangGraph-orchestrated agent that validates your data, diagnoses failures with an LLM, and logs every decision to an audit trail — with every cost metered and every finding exportable.
Why Aegis?
| Aegis | Great Expectations / Soda | Monte Carlo / Anomalo | |
|---|---|---|---|
| Open source | ✅ Apache 2.0 | ✅ | ❌ Commercial |
| Agentic (LLM diagnosis + RCA) | ✅ | ❌ Rule execution only | ✅ Proprietary |
| Audit trail (per-decision log) | ✅ | Partial | ✅ Proprietary |
| Pluggable LLM (Anthropic, OpenAI) | ✅ | ❌ | ❌ |
| Industry packs | ✅ Planned | ❌ | ❌ |
| Portable open rule standard (ODCS-aligned) | ✅ | Partial | ❌ |
Install
pip install aegis-dq
For development:
git clone https://github.com/aegis-dq/aegis-dq
cd aegis-dq
pip install -e ".[dev]"
Optional extras:
pip install aegis-dq[openai] # OpenAI LLM provider
pip install aegis-dq[snowflake] # Snowflake warehouse adapter (coming in v0.5)
5-minute quickstart
# 1. Generate an example rules file
aegis init
# 2. Validate syntax before touching any warehouse (offline, no API key)
aegis validate rules.yaml
# 3. Run checks offline
aegis run rules.yaml --no-llm
# 4. Run with LLM diagnosis (Anthropic by default)
export ANTHROPIC_API_KEY=sk-ant-...
aegis run rules.yaml
# 5. Use OpenAI instead
export OPENAI_API_KEY=sk-...
aegis run rules.yaml --llm openai
# 6. Write a JSON report and notify Slack on failure
aegis run rules.yaml \
--output-json report.json \
--notify https://hooks.slack.com/services/...
Full walkthrough with real data: docs/getting-started.md
Architecture
┌──────────────────────────────────────────────────────────────┐
│ TIER 1 — INTERFACES │
│ CLI (aegis run/validate/init) • Python SDK • REST API │
│ Triggers: Airflow • dbt • Dagster • cron • webhook │
│ Outputs: Slack • Email • PagerDuty • Jira • file │
└───────────────────────────┬──────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ TIER 2 — AGENT CORE (LangGraph state machine) │
│ │
│ plan → execute → diagnose → report │
│ ↓ ↓ ↓ │
│ Rule Engine LLM Router Audit Logger │
│ 25 types Anthropic SQLite + ShareGPT export │
│ OpenAI │
│ │
│ Memory: run history • failure patterns • rule catalog │
└───────────────────────────┬──────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ TIER 3 — EXECUTION BACKENDS │
│ DuckDB (local/free) • Snowflake • BigQuery │
│ Databricks • Athena • Postgres • Redshift │
└──────────────────────────────────────────────────────────────┘
Rule format
rules:
- apiVersion: aegis.dev/v1
kind: DataQualityRule
metadata:
id: orders_revenue_non_negative
severity: critical # critical | high | medium | low | info
domain: retail
owner: revenue-team
tags: [revenue, validity]
description: Revenue must be >= 0
scope:
warehouse: duckdb
table: orders
logic:
type: sql_expression
expression: "revenue >= 0"
diagnosis:
common_causes:
- "Refund logic inverted the sign"
- "Currency conversion failure"
All 25 rule types with examples: docs/rule-schema-reference.md
Browse the 30 built-in templates:
aegis rules list
aegis rules list --category validity
CLI reference
| Command | Description |
|---|---|
aegis init |
Generate a starter rules.yaml |
aegis validate <config> |
Check YAML syntax + schema offline (no warehouse needed) |
aegis run <config> |
Run validation, diagnose failures, produce a report |
aegis rules list |
Browse the 30 built-in rule templates |
aegis audit trajectory <run-id> |
Inspect the LLM decision trail for a past run |
aegis run flags:
| Flag | Default | Description |
|---|---|---|
--db |
:memory: |
DuckDB file path |
--llm |
anthropic |
LLM provider: anthropic | openai |
--llm-model |
(provider default) | Override model name |
--no-llm |
false |
Skip LLM diagnosis entirely |
--output-json |
(none) | Write full JSON report to file |
--notify |
(none) | Slack webhook URL |
--notify-on |
failures |
When to notify: all | failures | critical |
Rule types (25 total)
| Category | Types |
|---|---|
| Completeness | not_null not_empty_string null_percentage_below |
| Uniqueness | unique composite_unique duplicate_percentage_below |
| Validity | sql_expression between min_value_check max_value_check regex_match accepted_values not_accepted_values no_future_dates column_exists |
| Referential | foreign_key conditional_not_null |
| Statistical | mean_between stddev_below column_sum_between |
| Timeliness | freshness date_order |
| Volume | row_count row_count_between custom_sql |
Roadmap
| Phase | Version | Status |
|---|---|---|
| Foundation | v0.1 | 🚧 In progress |
| Differentiate | v0.5 | Planned — RCA, reconciliation, BigQuery, Airflow, industry packs |
| Mature | v1.0 | Planned — ML rules, banking/healthcare packs, VS Code extension |
Full issue tracker: github.com/aegis-dq/aegis-dq/issues
Contributing
Contributions are welcome. See CONTRIBUTING.md to get started. Good first issues: label:good first issue
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aegis_dq-0.1.0.tar.gz.
File metadata
- Download URL: aegis_dq-0.1.0.tar.gz
- Upload date:
- Size: 45.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
58ce68eeed098f48d54b1c7cbe0678383ad5a0df1704bd8277ffe6ef261de81a
|
|
| MD5 |
359dce27199a354ef7bc29e155a85474
|
|
| BLAKE2b-256 |
2eb8a7b529d907d46ca06cbb2ecd98c50e9ed3df359147cc721f5e94e8c051e0
|
File details
Details for the file aegis_dq-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aegis_dq-0.1.0-py3-none-any.whl
- Upload date:
- Size: 35.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e531c8f02a06f50687cec15ad7352fb8c065d2b9c9deebcef679d070ce13eca4
|
|
| MD5 |
e921ae505c28ca299661a679700a7261
|
|
| BLAKE2b-256 |
6f10437826649eb44c063e222f0c6d4c06eaeb8e003f2550ff17b29ec01db09b
|