
Autonomous Data Quality Agent: profiling, detection, explanation, and fix proposals for data quality issues.

Project description

ADQA: Autonomous Data Quality Agent


The intelligent, autonomous agent for high-performance data quality inspection, risk detection, and automated remediation.



🧐 Why ADQA?

In the era of data-centric AI, models are only as good as their data. Yet data scientists are commonly estimated to spend up to 80% of their time cleaning data.

ADQA addresses this with an autonomous Observe-Orient-Decide-Act loop:

  1. Observe: Deep multi-dimensional profiling of your dataset.
  2. Orient: Detect complex risks like PII leakage, statistical bias, and structural anomalies.
  3. Decide: Generate an execution plan with prioritized remediations.
  4. Act: Heal the data autonomously or with human oversight.
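The four steps above can be sketched in plain Python. This is a minimal, hypothetical illustration of the loop pattern, not ADQA's implementation: the table is assumed to be a dict of column lists, the only profile metric is the null ratio, the only remediation is most-frequent-value imputation, and the names `observe`, `orient`, `decide`, `act`, and the `"human"` mode are invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


# Hypothetical, simplified type: NOT part of the adqa package.
@dataclass
class Issue:
    column: str
    kind: str
    severity: float  # here: fraction of null values in the column


def observe(table: dict[str, list]) -> dict[str, float]:
    """Observe: profile each column (only the null ratio, in this sketch)."""
    return {
        col: (sum(v is None for v in vals) / len(vals)) if vals else 0.0
        for col, vals in table.items()
    }


def orient(profile: dict[str, float], max_null_ratio: float = 0.1) -> list[Issue]:
    """Orient: flag columns whose null ratio exceeds a threshold."""
    return [
        Issue(col, "high_null_ratio", ratio)
        for col, ratio in profile.items()
        if ratio > max_null_ratio
    ]


def decide(issues: list[Issue]) -> list[Issue]:
    """Decide: order remediations so the most severe issue comes first."""
    return sorted(issues, key=lambda issue: issue.severity, reverse=True)


def act(
    table: dict[str, list],
    plan: list[Issue],
    mode: str = "advisory",
    approve: Callable[[Issue], bool] = lambda issue: True,
) -> tuple[dict[str, list], list[str]]:
    """Act: report only ("advisory"), or impute nulls with the column's most
    common value ("automatic", or hypothetical "human" with per-issue approval)."""
    if mode == "advisory":
        return table, [f"would fix {i.column}: {i.kind}" for i in plan]
    healed = {col: list(vals) for col, vals in table.items()}  # never mutate input
    log = []
    for issue in plan:
        if mode == "human" and not approve(issue):
            log.append(f"skipped {issue.column}")
            continue
        present = [v for v in healed[issue.column] if v is not None]
        if not present:
            continue  # nothing to impute from
        fill = Counter(present).most_common(1)[0][0]
        healed[issue.column] = [fill if v is None else v for v in healed[issue.column]]
        log.append(f"imputed {issue.column} with {fill!r}")
    return healed, log


if __name__ == "__main__":
    data = {"age": [31, None, 31, 40], "city": ["Cairo", "Giza", "Cairo", "Giza"]}
    plan = decide(orient(observe(data)))
    print(act(data, plan)[1])                        # advisory: report only
    healed, log = act(data, plan, mode="automatic")  # automatic: apply fixes
    print(healed["age"], log)
```

Keeping Decide separate from Act is what makes the advisory and human-in-the-loop modes cheap: the same prioritized plan can be printed, applied, or gated on approval.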

🚀 Vision

ADQA combines a robust Python backend for seamless pipeline integration with a high-performance Rust-based TUI for interactive observability. It bridges the gap between fully automated data engineering and the critical need for human intuition in data quality.

📚 Documentation

Detailed guides, architecture deep-dives, and full API references are available at: Mohammad-Talaat7.github.io/autonomous-data-quality-agent

✨ Key Features

  • 🔍 Multi-Source Ingress: Direct support for CSV, Parquet, Excel, SQL (Postgres, MySQL, etc.), S3, and 300+ SaaS sources via Airbyte.
  • 🧠 Intelligent Profiling:
    • Structural: Automated type inference and null-ratio analysis.
    • Behavioral: Outlier detection (Z-score/IQR), skewness, and cardinality.
    • Semantic: ML classifiers identify PII (email addresses, SSNs, credit card numbers) and domain-specific types.
  • 🚨 Hybrid Risk Detection:
    • Rule-based: Deterministic checks for drift, range violations, and duplicates.
    • ML-based: Anomaly detection with Isolation Forests, plus bias identification.
  • 🛠️ Autonomous Remediation:
    • Advisory Mode: Generate audit-ready reports of what should be fixed.
    • Automatic Mode: Fully autonomous healing (impute, drop, clip, mask).
    • Human-in-the-Loop: Interactive approval of fixes via CLI or TUI.
  • 📜 Full Traceability: Industry-standard data lineage and execution traces for every transformation.
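The Z-score and IQR outlier checks listed under Behavioral profiling can be illustrated with Python's standard `statistics` module. This is a minimal sketch assuming numeric data in a plain list and conventional defaults (|z| > 3, Tukey's fences at 1.5 × IQR); the helper names are hypothetical and it does not reflect how ADQA computes these internally. `clip` shows the matching "clip" remediation.

```python
import statistics


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Indices of values whose |z-score| (population std) exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: no outliers by this definition
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values: list[float], k: float = 1.5) -> list[int]:
    """Indices of values that fall outside Tukey's fences."""
    lo, hi = iqr_bounds(values, k)
    return [i for i, v in enumerate(values) if v < lo or v > hi]


def clip(values: list[float], lo: float, hi: float) -> list[float]:
    """Remediation: pull every value into [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 100]
    print(iqr_bounds(data))       # (8.75, 14.75)
    print(iqr_outliers(data))     # [6]
    print(zscore_outliers(data))  # []
    print(clip(data, *iqr_bounds(data)))
```

On small samples the two methods can disagree: here the value 100 has a z-score of only about 2.45, because the outlier itself inflates the standard deviation, so the |z| > 3 rule misses it while the IQR fences (8.75 to 14.75) catch it.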
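The README describes ML classifiers for PII. As a simpler, rule-based stand-in for illustration, the sketch below finds emails, U.S.-style SSNs, and card numbers with regular expressions (validating card candidates with the Luhn checksum) and then masks them, mirroring the "mask" remediation. It is not ADQA's classifier; the patterns and the function names `find_pii` and `mask_pii` are assumptions, and they will miss or over-match real-world data (roughly 1 in 10 arbitrary 13 to 19 digit numbers passes Luhn).

```python
import re

# Deliberately simple patterns; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number` (separators ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for each detected PII value."""
    found = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    found += [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    found += [
        ("credit_card", m.group())
        for m in CARD_RE.finditer(text)
        if luhn_valid(m.group())
    ]
    return found


def mask_pii(text: str) -> str:
    """Replace detected PII with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    return CARD_RE.sub(
        lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(), text
    )


if __name__ == "__main__":
    row = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
    print(find_pii(row))
    print(mask_pii(row))
```

The Luhn check is what keeps the card pattern from masking every long number, such as order IDs, that merely looks like a card.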

📦 Installation

Python Library & CLI

pip install adqa
# Or for full ML + Data Ingress capabilities:
pip install "adqa[all]"

Rust TUI

The TUI is distributed as a standalone binary. Install via cargo:

cargo install adqa-tui

Or download pre-compiled binaries from the Releases page.

🛠 Usage

Command Line Interface (CLI)

Quickly inspect any dataset:

adqa analyze my_data.parquet --mode advisory

Python API

Integrate into your training or ETL pipelines:

from adqa import ADQA, ADQAConfig

# Load the dataset; "automatic" mode applies fixes without manual approval
agent = ADQA.from_path("data.csv", config=ADQAConfig(execution_mode="automatic"))
result = agent.analyze()

# The remediated ("healed") dataframe
clean_df = result.dataframe
print(result.summary())

🖥 Rust TUI

Monitor your agent's reasoning in real time. The Rust TUI provides a low-latency terminal dashboard for exploring data lineage and trace events, and for approving remediation plans.

adqa-tui

🤝 Contributing

We welcome contributions! Please see our Contributing Guide to get started with:

  • Adding new Detectors.
  • Improving the Scoring Engine.
  • Enhancing the Rust TUI.

📄 License

ADQA is released under the MIT License. See LICENSE for details.



Download files

Download the file for your platform.

Source Distribution

adqa-0.1.2.tar.gz (62.2 kB)

Uploaded Source

Built Distribution


adqa-0.1.2-py3-none-any.whl (105.9 kB)

Uploaded Python 3

File details

Details for the file adqa-0.1.2.tar.gz.

File metadata

  • Download URL: adqa-0.1.2.tar.gz
  • Upload date:
  • Size: 62.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adqa-0.1.2.tar.gz
  • SHA256: e507b0680ac1b5b2598b29092f8c35e1406ab2aaea2bd4c0138299904bc56c60
  • MD5: 5cfd899981d34b89d4a45f58822e34f9
  • BLAKE2b-256: ab1a8143ed7b8b21200f3bf02bb1acc93362c06b268d04876e9eead642720755


Provenance

The following attestation bundles were made for adqa-0.1.2.tar.gz:

Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file adqa-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: adqa-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 105.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adqa-0.1.2-py3-none-any.whl
  • SHA256: 2ed61785a6f5dc7f88661b58aefd1a3ce9aea2e343dd07b099890827520d315e
  • MD5: 2eef4a3ecf438f750ee7d148db379753
  • BLAKE2b-256: f474f7993c58f6fac2032f3a6f901f71a0af6b19219fd5f562a607204c8707a7


Provenance

The following attestation bundles were made for adqa-0.1.2-py3-none-any.whl:

Publisher: Release-Backend.yml on Mohammad-Talaat7/autonomous-data-quality-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
