Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.

aidatapilot 🚀 — High-Impact Data Automation Engine

aidatapilot is an intelligent automation engine that transforms raw, messy datasets into production-ready signals. It is designed to bridge the gap between "Raw Data" and "Actionable Insights" by automating the most time-consuming parts of data engineering: profiling, cleaning, and preparation.


📖 Table of Contents

  1. 🚀 Quick Start
  2. 🏗️ Usage Levels
  3. 🧠 Intelligence Advisor
  4. 📊 Visualization Layer
  5. 🛠️ Built-in Processing Steps
  6. 🏗️ Technical Architecture
  7. 🔌 Extensibility: Custom Steps
  8. 📦 Installation

🚀 Quick Start

Get from a messy CSV to clean data with one import and a single function call:

import aidatapilot

# The "One-Line" Master Command
aidatapilot.auto_pipeline("messy_data.csv", "cleaned_data.csv", visualize=True)

This single command performs:

  1. Profiling: Detects whether your data is Transactional, Tabular, or Text.
  2. Analysis: Identifies nulls, outliers, and formatting errors.
  3. Execution: Builds and runs a custom cleaning pipeline.
  4. Reporting: Generates visual charts in the reports/ folder.
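
To make the Analysis stage more concrete, here is a minimal standard-library sketch of counting nulls and flagging outliers. It is not aidatapilot's implementation: the function name analyze_csv, the 1.5 * IQR outlier rule, and the sample data are assumptions chosen for illustration.

```python
import csv
import io
import statistics


def analyze_csv(text):
    """Count empty cells per column and flag numeric outliers (1.5 * IQR rule)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    report = {}
    for column in rows[0].keys():
        raw = [(row[column] or "").strip() for row in rows]
        nulls = sum(1 for value in raw if value == "")
        numbers = []
        for value in raw:
            try:
                numbers.append(float(value))
            except ValueError:
                pass  # empty or non-numeric cell, ignored for outlier detection
        outliers = []
        if len(numbers) >= 4:
            q1, _, q3 = statistics.quantiles(numbers, n=4)
            spread = q3 - q1
            low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
            outliers = [n for n in numbers if n < low or n > high]
        report[column] = {"nulls": nulls, "outliers": outliers}
    return report


# Hypothetical sample: one missing price and one extreme price.
SAMPLE = (
    "id,price\n"
    "1,10\n2,12\n3,\n4,11\n5,13\n"
    "6,12\n7,11\n8,10\n9,13\n10,500\n"
)
report = analyze_csv(SAMPLE)
print(report["price"])
```

A report like this is the kind of input a later stage could use to decide which cleaning steps to schedule.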

🏗️ Usage Levels

Level 1: Autonomous (auto_pilot)

Perfect for unknown or highly inconsistent datasets. The engine uses a Heuristic Rules Engine to decide which cleaning template to apply.

import aidatapilot
result = aidatapilot.auto_pilot("raw_data.csv")
print(f"Algorithm Selected: {result.state}")
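
As a rough illustration of how a heuristic rules engine can pick a cleaning template, the sketch below checks an ordered list of rules against a dataset profile and returns the first match. The profile keys, thresholds, and template names are hypothetical and are not taken from aidatapilot.

```python
def choose_template(profile):
    """Return the template of the first rule that matches the profile."""
    # Each rule is (predicate, template name); order encodes priority.
    rules = [
        (lambda p: p["avg_text_length"] > 200, "text_prep"),
        (lambda p: p["has_date_column"] and p["has_amount_column"],
         "transactional_cleaning"),
        (lambda p: True, "tabular_cleaning"),  # fallback rule always matches
    ]
    for predicate, template in rules:
        if predicate(profile):
            return template


# Hypothetical profiles for three kinds of datasets.
text_profile = {"avg_text_length": 850, "has_date_column": False, "has_amount_column": False}
sales_profile = {"avg_text_length": 12, "has_date_column": True, "has_amount_column": True}
plain_profile = {"avg_text_length": 8, "has_date_column": False, "has_amount_column": True}

print(choose_template(sales_profile))
```

Keeping the rules in an ordered list makes the priority explicit and lets new rules be added without touching the matching loop.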

Level 2: Simplified (Fast Actions)

For when you know what you want. Use opinionated scripts for specific domains:

| Command | Best For | Technical Features |
|---|---|---|
| auto_clean() | Daily Reporting | Null-filling, Deduplication, ID repair. |
| auto_ml_prep() | Model Training | Label-encoding, MinMax scaling, Outlier clipping. |
| auto_text_prep() | LLM & RAG | Contextual chunking, Text sanitization. |
| auto_analytics() | BI & Dashboards | Date formatting, KPI placeholders. |

Level 3: Professional (Fluent API)

For Data Engineers who need exact control over the execution DAG.

from aidatapilot import Pipeline

(
    Pipeline(template="analytics_cleaning")
    .set_source("sales_data.csv")
    .then("normalize_columns")
    .then("format_date", columns=["order_date"])
    .then("filter_rows", condition="price > 0")
    .set_output("ready_for_bi.csv")
    .run()
)
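
The chaining style above relies on each builder method returning the pipeline object itself. The following sketch shows that pattern on plain lists of dictionaries. MiniPipeline and its step functions are hypothetical stand-ins, not aidatapilot's Pipeline class.

```python
class MiniPipeline:
    """Toy fluent builder: collects steps, then applies them in order."""

    def __init__(self):
        self._steps = []

    def then(self, func, **params):
        self._steps.append((func, params))
        return self  # returning self is what makes method chaining possible

    def run(self, rows):
        for func, params in self._steps:
            rows = func(rows, **params)
        return rows


def normalize_columns(rows):
    """Lower-case column names and replace spaces with underscores."""
    return [
        {key.strip().lower().replace(" ", "_"): value for key, value in row.items()}
        for row in rows
    ]


def filter_rows(rows, column, minimum):
    """Keep only rows whose value in `column` is strictly above `minimum`."""
    return [row for row in rows if row[column] > minimum]


rows = [{"Order ID": 1, "Price": 9.5}, {"Order ID": 2, "Price": 0.0}]
result = (
    MiniPipeline()
    .then(normalize_columns)
    .then(filter_rows, column="price", minimum=0)
    .run(rows)
)
print(result)
```

Nothing executes until run() is called, so the same definition could be inspected or reordered before any data is touched.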

🧠 Intelligence Advisor

The Advisor is a proactive diagnostic tool. Instead of just cleaning data, it tells you why it needs cleaning.

from aidatapilot import Advisor

advisor = Advisor("data.csv")
print(f"Health Score: {advisor.get_readiness_score()}%")
print(f"Primary Insight: {advisor.get_primary_insight()}")

# Detailed JSON report
report = advisor.analyze()
print(report.diagnostics["null_map"])
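
How the Advisor computes its score is not documented here. One simple way to think about a readiness score is the percentage of cells that are filled in, paired with a per-column count of missing values. The sketch below implements that assumption; the functions readiness_score and null_map are illustrative and may differ from what the library does.

```python
def readiness_score(rows):
    """Percentage of cells that are neither None nor an empty string."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    filled = sum(
        1 for row in rows for value in row.values() if value not in (None, "")
    )
    return round(100.0 * filled / total, 1)


def null_map(rows):
    """Number of missing values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            missing = 1 if value in (None, "") else 0
            counts[column] = counts.get(column, 0) + missing
    return counts


# Hypothetical sample with two missing city values out of eight cells.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": ""},
    {"id": 3, "city": None},
    {"id": 4, "city": "Oslo"},
]
print(readiness_score(rows), null_map(rows))
```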

📊 Visualization Layer

Visual evidence of data health is critical for stakeholder communication. aidatapilot generates these automatically:

aidatapilot.visualize_dataset(df, report_dir="reports/")
  • Missing Data Heatmap: See exactly where gaps are clustering.
  • Correlation Matrix: Understand relationships between features.
  • Outlier Boxplots: Identify anomalies visually.

🛠️ Built-in Processing Steps

Every then() or add_step() call looks up its step name in an internal registry. Commonly used steps include:

  • normalize_columns: Standardizes headers to snake_case.
  • infer_types: Auto-detects Dates, Integers, and Floats.
  • handle_missing_data: Smart-fills based on column semantics.
  • interpolate_ids: Repairs broken or missing sequential IDs.
  • encode_categorical: Converts text labels to numeric codes.
  • scale_numeric: Scales data using MinMax or Standard (Z-Score) methods.
  • chunk_text: Splits long text for Vector DBs with sentence-boundary awareness.
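
For readers who want to see what a few of these steps amount to, here is a standard-library sketch of snake_case header normalization and the two scaling methods named above. The function names and edge-case handling (for example, constant columns) are assumptions, not the library's code.

```python
import re
import statistics


def to_snake_case(name):
    """Convert a header such as 'customerID' or 'Order Date' to snake_case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # collapse separators and symbols
    return name.strip("_").lower()


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(value - low) / (high - low) for value in values]


def z_score_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - mean) / spread for value in values]


headers = [to_snake_case(h) for h in ["Order Date", "customerID", " Unit-Price ($) "]]
scaled = min_max_scale([2, 4, 6])
standardized = z_score_scale([2, 4, 6])
print(headers, scaled)
```

MinMax keeps values in a fixed range, which suits bounded inputs, while Z-Score scaling is less sensitive to the exact minimum and maximum of a column.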

🏗️ Technical Architecture

aidatapilot is built on a modular "Factory" architecture:

  1. Connectors: Load data from CSV, Excel, or SQL (Registry-based).
  2. Compiler: Transforms your Pipeline definition into a Directed Acyclic Graph (DAG) of execution nodes.
  3. Runtime: Executes the nodes using a thread-safe engine with Memory Safety mode for large datasets.
  4. Publishers: Exports the final result to your destination (File, Cloud, or memory).
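
The Compiler stage can be pictured as building a dependency graph and then deriving a valid execution order from it. The sketch below uses Python's standard graphlib module (Python 3.9+) for the ordering. The graph layout is a hypothetical example, and aidatapilot's internal node representation is not shown in this document.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(graph):
    """Return one valid run order for a mapping of node -> set of prerequisites."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        raise ValueError(f"pipeline is not acyclic: {error.args[1]}") from error


# Hypothetical pipeline: two steps depend on infer_types, publish needs both.
graph = {
    "load": set(),
    "normalize_columns": {"load"},
    "infer_types": {"normalize_columns"},
    "handle_missing_data": {"infer_types"},
    "scale_numeric": {"infer_types"},
    "publish": {"handle_missing_data", "scale_numeric"},
}
order = execution_order(graph)
print(order)
```

Because handle_missing_data and scale_numeric do not depend on each other here, either may come first. A runtime could also schedule such independent nodes concurrently.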

🔌 Extensibility: Custom Steps

You can easily add your own logic to the engine using the @register_step decorator:

from aidatapilot.core.registry import register_step

@register_step("my_custom_cleanup")
def my_custom_cleanup(df, **params):
    # Your custom pandas logic here
    df['new_col'] = df['old_col'] * 2
    return df

# Now it's available in any pipeline (here, `pipeline` is an existing Pipeline instance)
pipeline.then("my_custom_cleanup")
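
For context on how a decorator-based registry generally works, the sketch below stores each decorated function in a dictionary under the given name and looks it up at run time. STEP_REGISTRY, run_step, and the duplicate-name check are assumptions for illustration. The register_step defined here is a local stand-in, not the one imported from aidatapilot.core.registry.

```python
STEP_REGISTRY = {}


def register_step(name):
    """Decorator factory that records a function under `name`."""
    def decorator(func):
        if name in STEP_REGISTRY:
            raise ValueError(f"step {name!r} is already registered")
        STEP_REGISTRY[name] = func
        return func  # the function stays usable on its own
    return decorator


@register_step("double_values")
def double_values(rows, column):
    """Return new rows with the chosen column doubled."""
    return [{**row, column: row[column] * 2} for row in rows]


def run_step(name, rows, **params):
    """Resolve a step by name and apply it."""
    return STEP_REGISTRY[name](rows, **params)


result = run_step("double_values", [{"x": 2}, {"x": 5}], column="x")
print(result)
```

Looking steps up by string name is what lets a pipeline definition refer to custom logic without importing it directly.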

📦 Installation

# Standard Install
pip install aidatapilot

# Development Install
git clone https://github.com/aidatapilot/aidatapilot.git
cd aidatapilot
pip install -e .

“The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency.” — Bill Gates

AIDataPilot | DHS IT Solutions
