Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.
aidatapilot 🚀 — High-Impact Data Automation Engine
aidatapilot is an intelligent automation engine that transforms raw, messy datasets into production-ready signals. It is designed to bridge the gap between "Raw Data" and "Actionable Insights" by automating the most time-consuming parts of data engineering: profiling, cleaning, and preparation.
📖 Table of Contents
- 🚀 Quick Start
- 🏗️ Usage Levels
- 🧠 Intelligence Advisor
- 📊 Visualization Layer
- 🛠️ Built-in Processing Steps
- 🏗️ Technical Architecture
- 🔌 Extensibility: Custom Steps
- 📦 Installation
🚀 Quick Start
Go from a messy CSV to clean data in three lines:
import aidatapilot
# The "One-Line" Master Command
aidatapilot.auto_pipeline("messy_data.csv", "cleaned_data.csv", visualize=True)
This single command performs:
- Profiling: Detects if your data is Transactional, Tabular, or Text.
- Analysis: Identifies nulls, outliers, and formatting errors.
- Execution: Builds and runs a custom cleaning pipeline.
- Reporting: Generates visual charts in the reports/ folder.
🏗️ Usage Levels
Level 1: Autonomous (auto_pilot)
Perfect for unknown or highly inconsistent datasets. The engine uses a Heuristic Rules Engine to decide which cleaning template to apply.
import aidatapilot
result = aidatapilot.auto_pilot("raw_data.csv")
print(f"Algorithm Selected: {result.state}")
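To make the idea of a heuristic rules engine more concrete, here is a minimal standalone sketch. It is not aidatapilot's actual implementation: the function name guess_dataset_kind, the hint-word list, and the 80-character threshold are all assumptions chosen for illustration. It only shows how a few ordered rules could label a dataset as Transactional, Text, or Tabular.

```python
from statistics import mean

# Hypothetical hint words; the real engine's rules are not documented here.
TRANSACTION_HINTS = ("order", "invoice", "transaction", "price", "amount")


def guess_dataset_kind(rows):
    """Classify a list of dict rows as 'Transactional', 'Text', or 'Tabular'."""
    if not rows:
        return "Tabular"
    columns = [name.lower() for name in rows[0]]

    # Rule 1: at least two column names look like sales or ledger fields.
    hits = sum(any(hint in col for hint in TRANSACTION_HINTS) for col in columns)
    if hits >= 2:
        return "Transactional"

    # Rule 2: string cells are long on average, which suggests free text.
    lengths = [len(v) for row in rows for v in row.values() if isinstance(v, str)]
    if lengths and mean(lengths) > 80:
        return "Text"

    # Fallback: treat everything else as generic tabular data.
    return "Tabular"


sales = [{"order_id": 1, "price": 9.5}, {"order_id": 2, "price": 3.0}]
docs = [{"body": "word " * 40}]
people = [{"name": "Ana", "age": 31}]
kinds = [guess_dataset_kind(d) for d in (sales, docs, people)]
print(kinds)
```

Rules are checked in priority order and the first match wins, so adding a new dataset kind only means inserting one more rule before the fallback.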
Level 2: Simplified (Fast Actions)
For when you know what you want. Use opinionated commands for specific domains:
| Command | Best For | Technical Features |
|---|---|---|
| auto_clean() | Daily Reporting | Null-filling, Deduplication, ID repair. |
| auto_ml_prep() | Model Training | Label-encoding, MinMax scaling, Outlier clipping. |
| auto_text_prep() | LLM & RAG | Contextual chunking, Text sanitization. |
| auto_analytics() | BI & Dashboards | Date formatting, KPI placeholders. |
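The three techniques listed for auto_ml_prep() are standard preprocessing steps. The sketch below shows what each one does on plain Python lists. The function names and the clipping bounds are hypothetical and say nothing about how aidatapilot implements them internally.

```python
def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def clip_outliers(values, lower, upper):
    """Clamp every value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def minmax_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


colors = ["red", "blue", "red", "green"]
ages = [20, 30, 40, 400]  # 400 is an obvious outlier

encoded = label_encode(colors)
# Clip first so the outlier does not compress every other value toward 0.
scaled = minmax_scale(clip_outliers(ages, 0, 60))
print(encoded)
print(scaled)
```

Clipping before scaling matters here: without it, the value 400 would become 1.0 and the other three ages would all land below 0.06.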
Level 3: Professional (Fluent API)
For Data Engineers who need exact control over the execution DAG.
from aidatapilot import Pipeline
(
    Pipeline(template="analytics_cleaning")
    .set_source("sales_data.csv")
    .then("normalize_columns")
    .then("format_date", columns=["order_date"])
    .then("filter_rows", condition="price > 0")
    .set_output("ready_for_bi.csv")
    .run()
)
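The chained calls above follow the fluent (method-chaining) pattern, where each method returns the object itself. As a rough illustration of that pattern only, here is a toy class that works on lists of dicts. MiniPipeline and both step functions are hypothetical stand-ins, not part of aidatapilot.

```python
class MiniPipeline:
    """Hypothetical stand-in that shows the fluent (method-chaining) pattern."""

    def __init__(self):
        self._steps = []

    def then(self, func, **params):
        # Record the step and return self so calls can be chained.
        self._steps.append((func, params))
        return self

    def run(self, rows):
        # Apply the recorded steps in the order they were added.
        for func, params in self._steps:
            rows = func(rows, **params)
        return rows


def normalize_columns(rows):
    """Lower-case keys and replace spaces with underscores."""
    return [{k.strip().lower().replace(" ", "_"): v for k, v in r.items()} for r in rows]


def filter_rows(rows, column, minimum):
    """Keep rows whose value in `column` is greater than `minimum`."""
    return [r for r in rows if r[column] > minimum]


raw = [
    {"Order Date": "2024-01-05", "Price": 12.0},
    {"Order Date": "2024-01-06", "Price": 0.0},
]
clean = (
    MiniPipeline()
    .then(normalize_columns)
    .then(filter_rows, column="price", minimum=0)
    .run(raw)
)
print(clean)
```

Nothing executes until run() is called, which is what lets a builder of this kind inspect or reorder the recorded steps first.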
🧠 Intelligence Advisor
The Advisor is a proactive diagnostic tool. Instead of just cleaning data, it tells you why it needs cleaning.
from aidatapilot import Advisor
advisor = Advisor("data.csv")
print(f"Health Score: {advisor.get_readiness_score()}%")
print(f"Primary Insight: {advisor.get_primary_insight()}")
# Detailed JSON report
report = advisor.analyze()
print(report.diagnostics["null_map"])
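How the Advisor computes its score is not documented here. As one plausible reading, the sketch below derives a per-column null map and a readiness percentage from the share of populated cells. The formula and both function names are assumptions, and a real score would likely weigh outliers and formatting errors as well.

```python
def null_map(rows):
    """Count missing (None or empty-string) values per column."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col in counts:
            if row.get(col) in (None, ""):
                counts[col] += 1
    return counts


def readiness_score(rows):
    """Percentage of cells that are populated, rounded to one decimal."""
    missing = sum(null_map(rows).values())
    total = len(rows) * len(rows[0])
    return round(100 * (1 - missing / total), 1)


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": None},
    {"id": 3, "city": ""},
    {"id": 4, "city": "Rome"},
]
nulls = null_map(rows)
score = readiness_score(rows)
print(nulls, score)
```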
📊 Visualization Layer
Visual evidence of data health is critical for stakeholder communication. aidatapilot generates these automatically:
aidatapilot.visualize_dataset(df, report_dir="reports/")
- Missing Data Heatmap: See exactly where gaps are clustering.
- Correlation Matrix: Understand relationships between features.
- Outlier Boxplots: Identify anomalies visually.
🛠️ Built-in Processing Steps
Every then() or add_step() call looks up its step by name in an internal registry. Commonly used steps include:
- normalize_columns: Standardizes headers to snake_case.
- infer_types: Auto-detects Dates, Integers, and Floats.
- handle_missing_data: Smart-fills based on column semantics.
- interpolate_ids: Repairs broken or missing sequential IDs.
- encode_categorical: Converts text labels to numeric codes.
- scale_numeric: Scales data using MinMax or Standard (Z-Score) methods.
- chunk_text: Splits long text for Vector DBs with sentence-boundary awareness.
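Two of these steps are easy to picture with a few lines of standard-library Python. The sketch below is an independent illustration of snake_case header normalization and sentence-aware chunking. The function names, the regular expressions, and the max_chars parameter are assumptions and may differ from the library's own behavior.

```python
import re


def to_snake_case(name):
    """Convert a header such as 'Order Date' or 'unitPrice' to snake_case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())  # split camelCase
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)  # runs of other characters -> _
    return name.strip("_").lower()


def chunk_text(text, max_chars):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


headers = [to_snake_case(h) for h in ("Order Date", "unitPrice", " Total (USD) ")]
chunks = chunk_text("One two. Three four! Five six? Seven.", max_chars=20)
print(headers)
print(chunks)
```

Splitting only after sentence-ending punctuation keeps each chunk readable on its own, which is the point of sentence-boundary awareness when the chunks are later embedded.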
🏗️ Technical Architecture
aidatapilot is built on a modular "Factory" architecture:
- Connectors: Load data from CSV, Excel, or SQL (Registry-based).
- Compiler: Transforms your Pipeline definition into a Directed Acyclic Graph (DAG) of execution nodes.
- Runtime: Executes the nodes using a thread-safe engine with Memory Safety mode for large datasets.
- Publishers: Exports the final result to your destination (file, cloud, or memory).
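To illustrate what compiling a pipeline into a DAG can mean, the following sketch turns an ordered list of step names into a dependency graph and asks Python's standard graphlib module for a valid execution order. The node names source and output and both helper functions are hypothetical, and the sketch assumes each step name appears only once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_linear(step_names):
    """Turn an ordered list of unique step names into a dependency graph.

    Each node maps to the set of nodes that must finish before it.
    """
    graph = {"source": set()}
    previous = "source"
    for name in step_names:
        graph[name] = {previous}
        previous = name
    graph["output"] = {previous}
    return graph


def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


dag = compile_linear(["normalize_columns", "format_date", "filter_rows"])
order = execution_order(dag)
print(order)
```

A linear chain has exactly one valid order, but the same graph structure also covers branching pipelines, where independent nodes could be scheduled in parallel by a runtime.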
🔌 Extensibility: Custom Steps
You can easily add your own logic to the engine using the @register_step decorator:
from aidatapilot.core.registry import register_step
@register_step("my_custom_cleanup")
def my_custom_cleanup(df, **params):
    # Your custom pandas logic here
    df['new_col'] = df['old_col'] * 2
    return df
# Now it's available in any pipeline (here, an existing Pipeline instance named pipeline)
pipeline.then("my_custom_cleanup")
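For readers curious how a name-based step registry of this kind typically works, here is a self-contained sketch of the decorator pattern. STEP_REGISTRY, run_step, and this local register_step are hypothetical re-creations for illustration. They are not the code in aidatapilot.core.registry, and they operate on lists of dicts rather than DataFrames.

```python
STEP_REGISTRY = {}  # hypothetical stand-in for the engine's internal registry


def register_step(name):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func):
        if name in STEP_REGISTRY:
            raise ValueError(f"step {name!r} is already registered")
        STEP_REGISTRY[name] = func
        return func  # hand the function back unchanged
    return decorator


@register_step("double_price")
def double_price(rows, column="price"):
    """Multiply the given column by two in every row."""
    return [{**row, column: row[column] * 2} for row in rows]


def run_step(name, rows, **params):
    """Look a step up by name and call it, as a pipeline runner might."""
    return STEP_REGISTRY[name](rows, **params)


result = run_step("double_price", [{"price": 4}, {"price": 5}])
print(result)
```

Because steps are looked up by string name at run time, a pipeline definition can be stored as plain data and still reach user-supplied functions.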
📦 Installation
# Standard Install
pip install aidatapilot
# Development Install
git clone https://github.com/aidatapilot/aidatapilot.git
cd aidatapilot
pip install -e .
“The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency.” — Bill Gates
AIDataPilot | DHS IT Solutions