Autonomous Data Quality Agent: profiling, detection, explanation, and fix proposals for data quality issues.

Project description

ADQA: Autonomous Data Quality Agent

The intelligent, autonomous agent for high-performance data quality inspection, risk detection, and automated remediation.


🧐 Why ADQA?

In the era of data-centric AI, your models are only as good as your data. Yet by commonly cited estimates, data scientists spend up to 80% of their time cleaning data.

ADQA solves this by providing an autonomous loop:

  1. Observe: Deep multi-dimensional profiling of your dataset.
  2. Orient: Detect complex risks like PII leakage, statistical bias, and structural anomalies.
  3. Decide: Generate an execution plan with prioritized remediations.
  4. Act: Heal the data autonomously or with human oversight.
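
The four steps above can be sketched in plain Python. This is only an illustration of the loop's shape, not ADQA's implementation: the function names (`observe`, `orient`, `decide`, `act`, `run_loop`), the `Plan` class, the null-ratio threshold, and the single `drop_column` action are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]
Step = tuple[str, str]  # (column, action); illustrative only


@dataclass
class Plan:
    """Hypothetical execution plan: an ordered list of remediation steps."""
    steps: list[Step] = field(default_factory=list)


def observe(rows: list[Row]) -> dict[str, float]:
    """Observe: profile the dataset (here, only the null ratio per column)."""
    columns = {key for row in rows for key in row}
    return {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }


def orient(profile: dict[str, float], max_null_ratio: float = 0.5) -> list[str]:
    """Orient: flag columns whose null ratio exceeds the assumed threshold."""
    return sorted(col for col, ratio in profile.items() if ratio > max_null_ratio)


def decide(risky_columns: list[str]) -> Plan:
    """Decide: turn each detected risk into a planned remediation step."""
    return Plan(steps=[(col, "drop_column") for col in risky_columns])


def act(rows: list[Row], plan: Plan, approve: Callable[[Step], bool]) -> list[Row]:
    """Act: apply only the steps that the approval callback accepts."""
    dropped = {
        col for col, action in plan.steps
        if action == "drop_column" and approve((col, action))
    }
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


def run_loop(rows: list[Row],
             approve: Callable[[Step], bool] = lambda step: True) -> list[Row]:
    """Run one Observe -> Orient -> Decide -> Act pass over the rows."""
    plan = decide(orient(observe(rows)))
    return act(rows, plan, approve)
```

Passing an `approve` callback that always returns `True` corresponds to fully autonomous healing, while a callback that prompts a person corresponds to human oversight.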

🚀 Vision

ADQA combines a robust Python backend for seamless pipeline integration with a high-performance Rust-based TUI for interactive observability. It bridges the gap between fully automated data engineering and the critical need for human intuition in data quality.

📚 Documentation

Detailed guides, architecture deep-dives, and full API references are available at: Mohammad-Talaat7.github.io/autonomous-data-quality-agent

✨ Key Features

  • 🔍 Multi-Source Ingress: Direct support for CSV, Parquet, Excel, SQL (Postgres, MySQL, etc.), S3, and 300+ SaaS sources via Airbyte.
  • 🧠 Intelligent Profiling:
    • Structural: Automated type inference and null-ratio analysis.
    • Behavioral: Outlier detection (Z-score/IQR), skewness, and cardinality.
    • Semantic: ML classifiers identify PII (email addresses, Social Security numbers, credit card numbers) and domain-specific types.
  • 🚨 Hybrid Risk Detection:
    • Rule-based: Deterministic checks for drift, range violations, and duplicates.
    • ML-based: Advanced anomaly detection via Isolation Forests and bias identification.
  • 🛠️ Autonomous Remediation:
    • Advisory Mode: Generate audit-ready reports of what should be fixed.
    • Automatic Mode: Fully autonomous healing (impute, drop, clip, mask).
    • Human-in-the-Loop: Interactive approval of fixes via CLI or TUI.
  • 📜 Full Traceability: Industry-standard data lineage and execution traces for every transformation.
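
To make the behavioral profiling and the clip remediation mentioned above concrete, here is a minimal standard-library sketch of Z-score and IQR outlier detection followed by clipping. It is not taken from ADQA's code: the helper names (`iqr_bounds`, `zscore_outliers`, `clip`), the `k=1.5` fence multiplier, and the default Z-score threshold are assumptions chosen for illustration.

```python
import statistics


def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # every value is identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


def clip(values: list[float], lower: float, upper: float) -> list[float]:
    """Clip remediation: pull every value back inside [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    readings = [10, 12, 11, 13, 12, 11, 10, 100]
    lower, upper = iqr_bounds(readings)
    print("IQR fences:", lower, upper)
    print("Z-score outliers:", zscore_outliers(readings, threshold=2.5))
    print("Clipped:", clip(readings, lower, upper))
```

On small samples a single extreme value inflates the standard deviation, so the Z-score check can miss it at the usual threshold of 3 while the IQR fences still catch it. That is one reason a profiler might report both.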

📦 Installation

Python Library & CLI

pip install adqa
# Or for full ML + Data Ingress capabilities:
pip install "adqa[all]"

Rust TUI

The TUI is distributed as a standalone binary. Install via cargo:

cargo install adqa-tui

Or download pre-compiled binaries from the Releases page.

🛠 Usage

Command Line Interface (CLI)

Quickly inspect any dataset:

adqa analyze my_data.parquet --mode advisory
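
To run the same command over a folder of files, one option is to drive the CLI from Python. The sketch below uses only the `adqa analyze <file> --mode advisory` invocation shown above. The helper names (`build_analyze_command`, `analyze_all`) are hypothetical, and the meaning of the CLI's exit codes is not documented here, so the script only records them.

```python
import shutil
import subprocess
from pathlib import Path


def build_analyze_command(dataset, mode: str = "advisory") -> list[str]:
    """Build the argument list for `adqa analyze <dataset> --mode <mode>`."""
    return ["adqa", "analyze", str(dataset), "--mode", mode]


def analyze_all(directory, pattern: str = "*.parquet") -> dict[str, int]:
    """Run the CLI on every matching file and collect the exit codes."""
    if shutil.which("adqa") is None:
        raise RuntimeError("adqa CLI not found on PATH; install it with `pip install adqa`")
    exit_codes = {}
    for dataset in sorted(Path(directory).glob(pattern)):
        completed = subprocess.run(
            build_analyze_command(dataset),
            capture_output=True,
            text=True,
        )
        # Exit-code semantics are an assumption; inspect completed.stdout as needed.
        exit_codes[dataset.name] = completed.returncode
    return exit_codes
```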

Python API

Integrate into your training or ETL pipelines:

from adqa import ADQA, ADQAConfig

# Profile the dataset, detect risks, and apply fixes (execution_mode="automatic")
agent = ADQA.from_path("data.csv", config=ADQAConfig(execution_mode="automatic"))
result = agent.analyze()

# Access the healed dataframe immediately
clean_df = result.dataframe
print(result.summary())

🖥 Rust TUI

Monitor your agent's reasoning in real time. The Rust TUI provides a low-latency dashboard for exploring data lineage and trace events and for approving remediation plans.

adqa-tui

🤝 Contributing

We welcome contributions! Please see our Contributing Guide to get started with:

  • Adding new Detectors.
  • Improving the Scoring Engine.
  • Enhancing the Rust TUI.

📄 License

ADQA is released under the MIT License. See LICENSE for details.

Download files

Source Distribution

adqa-0.1.4.tar.gz (62.3 kB view details)

Uploaded Source

Built Distribution

adqa-0.1.4-py3-none-any.whl (105.8 kB view details)

Uploaded Python 3

File details

Details for the file adqa-0.1.4.tar.gz.

File metadata

  • Download URL: adqa-0.1.4.tar.gz
  • Upload date:
  • Size: 62.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adqa-0.1.4.tar.gz
Algorithm Hash digest
SHA256 2737a998996c61481251d3c3076ebbbc452f26af86a644a0dda78cd76038118d
MD5 0e21d51df5ea8e07e04a693c543a38b2
BLAKE2b-256 c9c735c7a790254b901cc3fb1d53a9994b801d7710dd085905294f17be5fa183
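
A downloaded file can be checked against the published SHA256 digest with Python's `hashlib`. The sketch below is a generic verification helper, not part of ADQA. The function names (`sha256_of`, `verify`) are made up for this example, and the digests are the ones listed on this page for the 0.1.4 source distribution and wheel.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published on this page for release 0.1.4.
EXPECTED_SHA256 = {
    "adqa-0.1.4.tar.gz":
        "2737a998996c61481251d3c3076ebbbc452f26af86a644a0dda78cd76038118d",
    "adqa-0.1.4-py3-none-any.whl":
        "549888cb11c0b9b0fcf43b115c6365fb9a73f49142f67154e807db5bb7526a04",
}


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify("adqa-0.1.4.tar.gz")` should return `True` for an intact download of the source distribution and `False` for a corrupted or unknown file.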

Provenance

The following attestation bundles were made for adqa-0.1.4.tar.gz:

Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file adqa-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: adqa-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 105.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adqa-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 549888cb11c0b9b0fcf43b115c6365fb9a73f49142f67154e807db5bb7526a04
MD5 caecd56181162892951c52ba8d0aed3e
BLAKE2b-256 d78f6d9f572f646480ccd5bdf09cb215b316e025b58ec64ac0f0b9379ec94bbe

Provenance

The following attestation bundles were made for adqa-0.1.4-py3-none-any.whl:

Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
