Autonomous Data Quality Agent: profiling, detection, explanation, and fix proposals for data quality issues.
ADQA: Autonomous Data Quality Agent
The intelligent, autonomous agent for high-performance data quality inspection, risk detection, and automated remediation.
🧐 Why ADQA?
In the era of data-centric AI, your models are only as good as your data. Yet data scientists are often estimated to spend up to 80% of their time cleaning data.
ADQA solves this by providing an autonomous loop:
- Observe: Deep multi-dimensional profiling of your dataset.
- Orient: Detect complex risks like PII leakage, statistical bias, and structural anomalies.
- Decide: Generate an execution plan with prioritized remediations.
- Act: Heal the data autonomously or with human oversight.
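To make the loop concrete, here is a minimal, standard-library-only sketch of the four stages applied to a tiny table with missing values. Every name in it (`observe`, `orient`, `decide`, `act`, the `max_null_ratio` threshold, the issue and action labels) is hypothetical and chosen for illustration; none of it is ADQA's actual API or internal logic.

```python
from statistics import median

# Hypothetical toy loop: illustrative names only, not ADQA's API.

def observe(rows):
    """Observe: profile the null ratio of each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}

def orient(profile, max_null_ratio=0.5):
    """Orient: turn the profile into a list of detected issues."""
    issues = []
    for column, ratio in profile.items():
        if ratio > max_null_ratio:
            issues.append({"column": column, "kind": "mostly_null", "severity": ratio})
        elif ratio > 0:
            issues.append({"column": column, "kind": "some_nulls", "severity": ratio})
    return issues

def decide(issues):
    """Decide: order issues by severity and map each to a remediation."""
    actions = {"mostly_null": "drop", "some_nulls": "impute"}
    ordered = sorted(issues, key=lambda i: i["severity"], reverse=True)
    return [(i["column"], actions[i["kind"]]) for i in ordered]

def act(rows, plan):
    """Act: apply the plan to a copy of the data."""
    healed = [dict(r) for r in rows]
    for column, action in plan:
        if action == "drop":
            for r in healed:
                del r[column]
        elif action == "impute":
            # Fill missing values with the median of the observed ones.
            fill = median(r[column] for r in healed if r[column] is not None)
            for r in healed:
                if r[column] is None:
                    r[column] = fill
    return healed

rows = [
    {"age": 31, "fax": None},
    {"age": None, "fax": None},
    {"age": 45, "fax": "555-0100"},
    {"age": 38, "fax": None},
]
plan = decide(orient(observe(rows)))
clean = act(rows, plan)
print(plan)   # [('fax', 'drop'), ('age', 'impute')]
print(clean)
```

In this toy run the mostly empty `fax` column is dropped first, and the single missing `age` is then filled with the median of the observed ages. A human-in-the-loop variant would simply pause between `decide` and `act` to let someone approve or edit `plan`.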
🚀 Vision
ADQA combines a robust Python backend for seamless pipeline integration with a high-performance Rust-based TUI for interactive observability. It bridges the gap between fully automated data engineering and the critical need for human intuition in data quality.
📚 Documentation
Detailed guides, architecture deep-dives, and full API references are available at: Mohammad-Talaat7.github.io/autonomous-data-quality-agent
✨ Key Features
- 🔍 Multi-Source Ingress: Direct support for CSV, Parquet, Excel, SQL (Postgres, MySQL, etc.), S3, and 300+ SaaS sources via Airbyte.
- 🧠 Intelligent Profiling:
  - Structural: Automated type inference and null-ratio analysis.
  - Behavioral: Outlier detection (Z-score/IQR), skewness, and cardinality.
  - Semantic: ML classifiers identify PII (Emails, SSNs, CCs) and domain-specific types.
- 🚨 Hybrid Risk Detection:
  - Rule-based: Deterministic checks for drift, range violations, and duplicates.
  - ML-based: Advanced anomaly detection via Isolation Forests and bias identification.
- 🛠️ Autonomous Remediation:
  - Advisory Mode: Generate audit-ready reports of what should be fixed.
  - Automatic Mode: Fully autonomous healing (impute, drop, clip, mask).
  - Human-in-the-Loop: Interactive approval of fixes via CLI or TUI.
- 📜 Full Traceability: Industry-standard data lineage and execution traces for every transformation.
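As a rough illustration of the behavioral checks and the clip remediation named above, the sketch below implements the textbook Z-score and IQR outlier rules with the Python standard library. The function names, the `k=1.5` fence multiplier, and the default threshold of 3 are conventional choices assumed here; how ADQA actually computes or tunes these is not described in this README.

```python
from statistics import mean, pstdev, quantiles

# Illustrative helpers only; not ADQA's implementation.

def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

def clip(values, low, high):
    """Clip remediation: pull out-of-range values back to the bounds."""
    return [min(max(v, low), high) for v in values]

values = [10, 12, 11, 13, 12, 11, 10, 95]
low, high = iqr_bounds(values)
iqr_flagged = [v for v in values if v < low or v > high]
clipped = clip(values, low, high)

print(low, high)                  # 6.5 16.5
print(iqr_flagged)                # [95]
print(zscore_outliers(values))    # []
print(clipped[-1])                # 16.5
```

The two rules can disagree: on this small sample the IQR fences flag 95, while the Z-score rule at threshold 3 flags nothing, because the outlier itself inflates the standard deviation. That is one reason a profiler might offer both.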
📦 Installation
Python Library & CLI
pip install adqa
# Or for full ML + Data Ingress capabilities:
pip install "adqa[all]"
Rust TUI
The TUI is distributed as a standalone binary. Install via cargo:
cargo install adqa-tui
Or download pre-compiled binaries from the Releases page.
🛠 Usage
Command Line Interface (CLI)
Quickly inspect any dataset:
adqa analyze my_data.parquet --mode advisory
Python API
Integrate into your training or ETL pipelines:
from adqa import ADQA, ADQAConfig
# High-performance profiling and detection
agent = ADQA.from_path("data.csv", config=ADQAConfig(execution_mode="automatic"))
result = agent.analyze()
# Access the healed dataframe immediately
clean_df = result.dataframe
print(result.summary())
🖥 Rust TUI
Monitor your agent's reasoning in real time. The Rust TUI provides a low-latency dashboard for exploring data lineages, inspecting trace events, and approving remediation plans.
adqa-tui
🤝 Contributing
We welcome contributions! Please see our Contributing Guide to get started with:
- Adding new Detectors.
- Improving the Scoring Engine.
- Enhancing the Rust TUI.
📄 License
ADQA is released under the MIT License. See LICENSE for details.
Download files
File details
Details for the file adqa-0.1.2.tar.gz.
File metadata
- Download URL: adqa-0.1.2.tar.gz
- Upload date:
- Size: 62.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e507b0680ac1b5b2598b29092f8c35e1406ab2aaea2bd4c0138299904bc56c60 |
| MD5 | 5cfd899981d34b89d4a45f58822e34f9 |
| BLAKE2b-256 | ab1a8143ed7b8b21200f3bf02bb1acc93362c06b268d04876e9eead642720755 |
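If you download the source distribution manually, you can check it against the SHA256 digest listed above. The following is a small sketch using only Python's `hashlib`; the helper names `sha256_of` and `verify` are made up for this example, and it assumes the file was saved locally under its published name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for adqa-0.1.2.tar.gz.
EXPECTED_SHA256 = "e507b0680ac1b5b2598b29092f8c35e1406ab2aaea2bd4c0138299904bc56c60"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256):
    """Return True only if the file exists and its digest matches."""
    return Path(path).is_file() and sha256_of(path) == expected.lower()

if __name__ == "__main__":
    # Assumes the archive sits in the current directory.
    print(verify("adqa-0.1.2.tar.gz"))
```

A `False` result means the file is missing or its contents differ from the published archive, in which case it should be downloaded again rather than installed.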
Provenance
The following attestation bundles were made for adqa-0.1.2.tar.gz:
Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: adqa-0.1.2.tar.gz
- Subject digest: e507b0680ac1b5b2598b29092f8c35e1406ab2aaea2bd4c0138299904bc56c60
- Sigstore transparency entry: 1259083693
- Sigstore integration time:
- Permalink: Mohammad-Talaat7/autonomous-data-quality-agent@c7b538036d47886304b411177814771c0414e9bf
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/Mohammad-Talaat7
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: Release-Backend.yml@c7b538036d47886304b411177814771c0414e9bf
- Trigger Event: push
File details
Details for the file adqa-0.1.2-py3-none-any.whl.
File metadata
- Download URL: adqa-0.1.2-py3-none-any.whl
- Upload date:
- Size: 105.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ed61785a6f5dc7f88661b58aefd1a3ce9aea2e343dd07b099890827520d315e |
| MD5 | 2eef4a3ecf438f750ee7d148db379753 |
| BLAKE2b-256 | f474f7993c58f6fac2032f3a6f901f71a0af6b19219fd5f562a607204c8707a7 |
Provenance
The following attestation bundles were made for adqa-0.1.2-py3-none-any.whl:
Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: adqa-0.1.2-py3-none-any.whl
- Subject digest: 2ed61785a6f5dc7f88661b58aefd1a3ce9aea2e343dd07b099890827520d315e
- Sigstore transparency entry: 1259083711
- Sigstore integration time:
- Permalink: Mohammad-Talaat7/autonomous-data-quality-agent@c7b538036d47886304b411177814771c0414e9bf
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/Mohammad-Talaat7
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: Release-Backend.yml@c7b538036d47886304b411177814771c0414e9bf
- Trigger Event: push