
Autonomous Data Quality Agent: profiling, detection, explanation, and fix proposals for data quality issues.


ADQA: Autonomous Data Quality Agent


The intelligent, autonomous agent for high-performance data quality inspection, risk detection, and automated remediation.



🧐 Why ADQA?

In the era of data-centric AI, your models are only as good as your data. Yet data scientists, by an often-cited estimate, spend up to 80% of their time cleaning data.

ADQA solves this by providing an autonomous loop:

  1. Observe: Deep multi-dimensional profiling of your dataset.
  2. Orient: Detect complex risks like PII leakage, statistical bias, and structural anomalies.
  3. Decide: Generate an execution plan with prioritized remediations.
  4. Act: Heal the data autonomously or with human oversight.
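The four steps above can be sketched in plain Python. This is an illustrative outline only, not ADQA's implementation: the names `Finding`, `observe`, `orient`, `decide`, and `act` are hypothetical, the only check performed is a null-ratio threshold, and the only fix is mean imputation on numeric columns. The input is assumed to be a non-empty list of dicts sharing the same keys.

```python
from dataclasses import dataclass
from statistics import mean

# All names here are hypothetical and chosen for illustration; none are ADQA's API.


@dataclass
class Finding:
    column: str
    issue: str
    severity: float  # higher means more urgent


def observe(rows):
    """Observe: profile the dataset by computing the null ratio of each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def orient(profile, max_null_ratio=0.2):
    """Orient: flag columns whose null ratio exceeds the threshold."""
    return [
        Finding(column, "high_null_ratio", ratio)
        for column, ratio in profile.items()
        if ratio > max_null_ratio
    ]


def decide(findings):
    """Decide: order the findings so the most severe is remediated first."""
    return sorted(findings, key=lambda f: f.severity, reverse=True)


def act(rows, plan, approve=lambda finding: True):
    """Act: mean-impute each flagged column, skipping findings that are not approved."""
    healed = [dict(r) for r in rows]  # copy so the input is left untouched
    for finding in plan:
        if not approve(finding):
            continue
        present = [r[finding.column] for r in healed if r[finding.column] is not None]
        if not present:
            continue  # nothing to impute from
        fill = mean(present)
        for r in healed:
            if r[finding.column] is None:
                r[finding.column] = fill
    return healed


rows = [
    {"age": 30, "score": 1.0},
    {"age": None, "score": 2.0},
    {"age": 50, "score": 3.0},
]
plan = decide(orient(observe(rows)))
healed = act(rows, plan)
```

The `approve` callback is where human oversight would plug in: passing a function that prompts a reviewer turns the fully automatic run into an interactive one.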

🚀 Vision

ADQA combines a robust Python backend for seamless pipeline integration with a high-performance Rust-based TUI for interactive observability. It bridges the gap between fully automated data engineering and the critical need for human intuition in data quality.

📚 Documentation

Detailed guides, architecture deep-dives, and full API references are available at: Mohammad-Talaat7.github.io/autonomous-data-quality-agent

✨ Key Features

  • 🔍 Multi-Source Ingress: Direct support for CSV, Parquet, Excel, SQL (Postgres, MySQL, etc.), S3, and 300+ SaaS sources via Airbyte.
  • 🧠 Intelligent Profiling:
    • Structural: Automated type inference and null-ratio analysis.
    • Behavioral: Outlier detection (Z-score/IQR), skewness, and cardinality.
    • Semantic: ML classifiers identify PII (email addresses, SSNs, credit card numbers) and domain-specific types.
  • 🚨 Hybrid Risk Detection:
    • Rule-based: Deterministic checks for drift, range violations, and duplicates.
    • ML-based: Advanced anomaly detection via Isolation Forests and bias identification.
  • 🛠️ Autonomous Remediation:
    • Advisory Mode: Generate audit-ready reports of what should be fixed.
    • Automatic Mode: Fully autonomous healing (impute, drop, clip, mask).
    • Human-in-the-Loop: Interactive approval of fixes via CLI or TUI.
  • 📜 Full Traceability: Industry-standard data lineage and execution traces for every transformation.
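To make the IQR outlier check and the "clip" remediation listed above concrete, here is a minimal sketch using only the standard library. It is not ADQA's code: the function names and the multiplier `k=1.5` (the conventional Tukey fence) are assumptions, and quartiles come from `statistics.quantiles` with its default "exclusive" method.

```python
from statistics import quantiles

# Hypothetical helper names; they illustrate the technique, not ADQA's internals.


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Detection: return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def clip(values, k=1.5):
    """Remediation: pull every value back inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


readings = [10, 12, 11, 13, 12, 11, 100]
outliers = find_outliers(readings)
clipped = clip(readings)
```

Clipping keeps the row and bounds the extreme value, whereas dropping removes the row entirely; which is appropriate depends on whether the outlier is a measurement error or a genuine rare observation.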

📦 Installation

Python Library & CLI

pip install adqa
# Or for full ML + Data Ingress capabilities:
pip install "adqa[all]"

Rust TUI

The TUI is distributed as a standalone binary. Install via cargo:

cargo install adqa-tui

Or download pre-compiled binaries from the Releases page.

🛠 Usage

Command Line Interface (CLI)

Quickly inspect any dataset:

adqa analyze my_data.parquet --mode advisory

Python API

Integrate into your training or ETL pipelines:

from adqa import ADQA, ADQAConfig

# High-performance profiling and detection
agent = ADQA.from_path("data.csv", config=ADQAConfig(execution_mode="automatic"))
result = agent.analyze()

# Access the healed dataframe immediately
clean_df = result.dataframe
print(result.summary())

🖥 Rust TUI

Monitor your agent's reasoning in real time. The Rust TUI provides a low-latency dashboard for exploring data lineage and trace events and for approving remediation plans.

adqa-tui

🤝 Contributing

We welcome contributions! Please see our Contributing Guide to get started with:

  • Adding new Detectors.
  • Improving the Scoring Engine.
  • Enhancing the Rust TUI.

📄 License

ADQA is released under the MIT License. See LICENSE for details.

