Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.
aidatapilot 🚀 — Your Partner in Data Automation
"The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency." — Bill Gates
Welcome to aidatapilot. I'm here to guide you from raw, messy datasets to production-ready signals in just one line of code. aidatapilot isn't just a library; it's an intelligent engine designed to handle the heavy lifting of data engineering so you can focus on the insight.
🎓 The aidatapilot Way (Mentoring Guide)
As your guide, I recommend starting with the "Simple API." It's designed to give you professional-grade results without the complexity of manual boilerplate.
1. The "Master Brain": auto_pipeline (Adaptive)
The most powerful command in aidatapilot. It analyzes your data, detects quality issues, and dynamically constructs a custom pipeline without any manual configuration. It also prints a beautiful Automation Decision Report explaining its choices.
import aidatapilot
# One command to rule them all
result = aidatapilot.auto_pipeline("messy_raw_data.csv")
2. General Data Cleaning: auto_clean
The "Gold Standard" for standard tabular data. It normalizes your column names, infers data types, fills missing values (0 for ints, "null" for strings), and removes duplicates.
# Perfect for daily reporting and BI
aidatapilot.auto_clean("sales_raw.xlsx", "sales_final.csv")
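To make the described steps concrete, here is a minimal pure-Python sketch of what column normalization, missing-value filling (0 for ints, "null" for strings), and deduplication amount to. The function name `clean_rows` and the list-of-dicts layout are illustrative, not aidatapilot's internals:

```python
def clean_rows(rows, int_columns=()):
    # 1. Normalize column names: lowercase, underscores instead of spaces.
    rows = [{k.strip().lower().replace(" ", "_"): v for k, v in r.items()}
            for r in rows]
    # 2. Fill missing values: 0 for integer columns, "null" for the rest.
    for r in rows:
        for k, v in r.items():
            if v is None:
                r[k] = 0 if k in int_columns else "null"
    # 3. Remove exact duplicates while preserving order.
    seen, deduped = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(r)
    return deduped

raw = [
    {"Order ID": 1, "Region": "EU"},
    {"Order ID": 1, "Region": "EU"},     # exact duplicate
    {"Order ID": None, "Region": None},  # missing values
]
print(clean_rows(raw, int_columns={"order_id"}))
# [{'order_id': 1, 'region': 'EU'}, {'order_id': 0, 'region': 'null'}]
```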
3. Machine Learning Ready: auto_ml_prep
Preparing data for a model? This command does everything auto_clean does, plus Outlier Clipping, Categorical Encoding, and MinMax Scaling.
# From raw data to 'model.fit()' ready
aidatapilot.auto_ml_prep("users.csv", "training_data.csv")
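Outlier clipping is the step that distinguishes this from plain cleaning. A quantile-based sketch of the idea (the 1st/99th-percentile bounds are an assumption; aidatapilot may use different thresholds):

```python
def clip_outliers(values, lower_q=0.01, upper_q=0.99):
    # Clamp every value into the [lower_q, upper_q] quantile range.
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

data = [1, 2, 3, 4, 1000]   # 1000 is an extreme outlier
print(clip_outliers(data))  # [1, 2, 3, 4, 4]
```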
4. Specialized: auto_analytics & auto_text_prep
- Use `auto_analytics` for time-series and BI reports.
- Use `auto_text_prep` for LLM and RAG workflows (it handles text cleaning and chunking).
🧠 Intelligence Advisor
Before you clean, you might want to understand what's wrong. Run the Intelligence Advisor to get a proactive report on your data health:
aidatapilot.analyze_dataset("mysterious_data.csv")
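A data-health report of this kind typically summarizes null counts and observed types per column. The sketch below shows the concept in plain Python; it is not the actual output format of `analyze_dataset`:

```python
def dataset_report(rows):
    # Summarize per-column null counts and the Python types observed.
    report = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        nulls = sum(v is None for v in values)
        types = sorted({type(v).__name__ for v in values if v is not None})
        report[col] = {"nulls": nulls, "types": types}
    return report

rows = [{"age": 31, "name": "Ada"}, {"age": None, "name": "Bob"}]
print(dataset_report(rows))
# {'age': {'nulls': 1, 'types': ['int']}, 'name': {'nulls': 0, 'types': ['str']}}
```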
🛠 Becoming a Pro: The Pipeline Class
For those who need granular control, the Pipeline class is your cockpit. You can mix and match templates or define custom steps.
from aidatapilot import Pipeline
# Craft a custom journey
pipe = Pipeline(template="ml_preprocess")
pipe.set_source("raw.csv")
pipe.set_output("ready.csv")
# Execute with performance tracking
result = pipe.run()
📦 Installation
pip install aidatapilot
🏗 Why aidatapilot?
- Production-Ready: Built with registry patterns and robust error handling.
- Memory Safe: Designed to handle large datasets without crashing your environment.
- Intelligent: Heuristic-based suggestions that improve over time.
Happy Automating! Feel free to reach out if you need help navigating your data pipelines.
🔍 Deep Dive: Understanding the Operations
Every "auto" command in aidatapilot is carefully designed to handle specific business and data needs. Here is exactly what happens under the hood:
🚀 auto_pipeline (The Smart Choice)
- Components: `IntelligenceAdvisor` + recommended template.
- What it does: Dynamically analyzes data patterns (null counts, skewness, text length, dates) and selects the best cleaning path.
- Why it's useful: Eliminates guesswork. Ideal for unknown or messy data when you don't know where to start. It's the "set it and forget it" tool.
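The decision logic can be pictured as a set of heuristics over a dataset profile. The rules and template names below (other than `ml_preprocess`, which appears in the Pipeline example) are hypothetical, chosen only to illustrate the kind of branching an adaptive selector performs:

```python
def recommend_template(profile):
    """profile: a plain dict of dataset statistics (illustrative)."""
    if profile.get("avg_text_length", 0) > 200:
        return "text_prep"       # long free text -> LLM/RAG pipeline
    if profile.get("has_dates") and profile.get("time_series"):
        return "analytics"       # dated, ordered data -> BI pipeline
    if profile.get("target_column"):
        return "ml_preprocess"   # labeled data -> model preparation
    return "clean"               # default: general cleaning

print(recommend_template({"avg_text_length": 500}))                  # text_prep
print(recommend_template({"has_dates": True, "time_series": True}))  # analytics
```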
✨ auto_clean (The Gold Standard)
- Components: `normalize_columns`, `infer_types`, `handle_missing_data`, `deduplicate`.
- What it does: Cleans column names, casts types, interpolates IDs while filling other missing values (0 for integers, "null" for strings), and removes duplicates.
- Why it's useful: The perfect daily cleaning tool. Ensures your data is tidy and error-free for most general tasks.
🤖 auto_ml_prep (Model Readiness)
- Components: `auto_clean` steps + `outlier_detection`, `encode_categorical`, `scale_numeric`.
- What it does: Beyond basic cleaning, it clips numeric outliers, encodes text categories as numbers, and scales values into the 0–1 range (MinMax).
- Why it's useful: High-speed preparation for model training. Most ML models (Scikit-Learn, PyTorch) require numeric, scaled data with no missing values.
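The encoding and scaling steps can be sketched in a few lines of plain Python. These are conceptual stand-ins, not aidatapilot's actual encoders:

```python
def encode_categorical(values):
    # Map each distinct category to a small integer (first-seen order).
    mapping = {}
    return [mapping.setdefault(v, len(mapping)) for v in values]

def minmax_scale(values):
    # Scale numeric values into the 0-1 range.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on constant columns
    return [(v - lo) / span for v in values]

print(encode_categorical(["red", "blue", "red"]))  # [0, 1, 0]
print(minmax_scale([10, 20, 30]))                  # [0.0, 0.5, 1.0]
```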
📊 auto_analytics (BI & Reporting)
- Components: `normalize_columns`, `format_date`, `handle_missing_data`, `deduplicate`, `basic_aggregation`.
- What it does: Focuses on universal date parsing and deduplication, with optional aggregation for quick reporting.
- Why it's useful: Best for time-series data and business dashboards, where date consistency and low redundancy are critical.
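"Universal date parsing" generally means trying several formats and normalizing to one. A standard-library sketch of the idea (the format list is an assumption, not aidatapilot's):

```python
from datetime import datetime

FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

def parse_date(text):
    # Try each known format; normalize matches to ISO 8601.
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: leave for downstream handling

print(parse_date("2024-03-01"))    # 2024-03-01
print(parse_date("01/03/2024"))    # 2024-03-01
print(parse_date("Mar 01, 2024"))  # 2024-03-01
```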
📄 auto_text_prep (LLM & RAG)
- Components: `normalize_columns`, `clean_text`, `generate_metadata`, `chunk_text`.
- What it does: Cleans document text, computes word counts and lengths, and splits long text into overlapping chunks.
- Why it's useful: Essential for AI applications. Prepares documents for embedding and storage in Vector Databases (like Pinecone or Chroma).
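Overlapping chunking is the key step for embeddings: each chunk repeats the tail of the previous one so context is not cut at chunk boundaries. A minimal sketch, with illustrative default sizes (aidatapilot's actual defaults may differ):

```python
def chunk_text(text, size=50, overlap=10):
    # Slide a window of `size` characters, advancing by size - overlap.
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 120
print([len(c) for c in chunk_text(doc)])  # [50, 50, 40]
```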