Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.

aidatapilot 🚀 — Your Partner in Data Automation

"The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency." — Bill Gates

Welcome to aidatapilot. I'm here to guide you from raw, messy datasets to production-ready signals in just one line of code. aidatapilot isn't just a library; it's an intelligent engine designed to handle the heavy lifting of data engineering so you can focus on the insight.


🎓 The aidatapilot Way (Mentoring Guide)

As your guide, I recommend starting with the "Simple API." It's designed to give you professional-grade results without the complexity of manual boilerplate.

1. The "Master Brain": auto_pipeline (Adaptive)

The most powerful command in aidatapilot. It analyzes your data, detects quality issues, and dynamically constructs a custom pipeline without any manual configuration. It also prints a beautiful Automation Decision Report explaining its choices.

import aidatapilot

# One command to rule them all
result = aidatapilot.auto_pipeline("messy_raw_data.csv")

2. General Data Cleaning: auto_clean

The "Gold Standard" for everyday tabular data. It normalizes your column names, infers data types, fills missing values (0 for integers, "null" for strings), and removes duplicate rows.

import aidatapilot

# Perfect for daily reporting and BI
aidatapilot.auto_clean("sales_raw.xlsx", "sales_final.csv")

3. Machine Learning Ready: auto_ml_prep

Preparing data for a model? This command does everything auto_clean does, plus Outlier Clipping, Categorical Encoding, and MinMax Scaling.

import aidatapilot

# From raw data to 'model.fit()' ready
aidatapilot.auto_ml_prep("users.csv", "training_data.csv")

4. Specialized: auto_analytics & auto_text_prep

  • Use auto_analytics for time-series and BI reports.
  • Use auto_text_prep for LLM and RAG workflows (it handles text cleaning and chunking).

🧠 Intelligence Advisor

Before you clean, you might want to understand what's wrong. Run the Intelligence Advisor to get a proactive report on your data health:

import aidatapilot
aidatapilot.analyze_dataset("mysterious_data.csv")

🛠 Becoming a Pro: The Pipeline Class

For those who need granular control, the Pipeline class is your cockpit. You can mix and match templates or define custom steps.

from aidatapilot import Pipeline

# Craft a custom journey
pipe = Pipeline(template="ml_preprocess")
pipe.set_source("raw.csv")
pipe.set_output("ready.csv")

# Execute with performance tracking
result = pipe.run()

📦 Installation

pip install aidatapilot
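
Once the install finishes, a quick sanity check is to ask Python which version it can see. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is my own illustration and not part of aidatapilot.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="aidatapilot"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version if the install worked, otherwise None.
print(installed_version())
```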

🏗 Why aidatapilot?

  • Production-Ready: Built with registry patterns and robust error handling.
  • Memory Safe: Designed to handle large datasets without crashing your environment.
  • Intelligent: Heuristic-based suggestions that improve over time.
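
If the "registry pattern" is new to you, here is a minimal sketch of the general idea: each step is filed under a name, and a pipeline is just a list of names looked up at run time. This illustrates the pattern only and is not aidatapilot's source code; STEP_REGISTRY, register_step, run_steps, and both step names are hypothetical.

```python
# Hypothetical registry: maps a step name to the function that implements it.
STEP_REGISTRY = {}


def register_step(name):
    """Decorator that files a step function under a lookup name."""
    def decorator(func):
        STEP_REGISTRY[name] = func
        return func
    return decorator


@register_step("strip_whitespace")
def strip_whitespace(rows):
    """Trim surrounding whitespace from every string cell."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


@register_step("drop_empty_rows")
def drop_empty_rows(rows):
    """Remove rows in which every cell is None or an empty string."""
    return [row for row in rows if any(v not in (None, "") for v in row.values())]


def run_steps(rows, step_names):
    """Apply the named steps in order, failing early on an unknown name."""
    for name in step_names:
        if name not in STEP_REGISTRY:
            raise ValueError(
                f"Unknown step {name!r}. Registered: {sorted(STEP_REGISTRY)}"
            )
        rows = STEP_REGISTRY[name](rows)
    return rows
```

The early ValueError is the "robust error handling" half of the idea: a misspelled step name fails before any data is touched, and the message lists the names that do exist.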

Happy Automating! Feel free to reach out if you need help navigating your data pipelines.


🔍 Deep Dive: Understanding the Operations

Every "auto" command in aidatapilot is carefully designed to handle specific business and data needs. Here is exactly what happens under the hood:

🚀 auto_pipeline (The Smart Choice)

  • Components: IntelligenceAdvisor + Recommended Template.
  • What it does: Dynamically analyzes data patterns (null counts, skewness, text length, dates) and selects the best cleaning path.
  • Why it's useful: Eliminates guesswork. Ideal for unknown or messy data when you don't know where to start. It's the "set it and forget it" tool.
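
To make the "analyze, then choose" idea concrete, here is a rough sketch of what profiling a column and picking a template could look like. It is an assumption-laden illustration in plain Python, not the library's IntelligenceAdvisor: profile_column, suggest_template, the thresholds, and the template names "text_prep" and "clean" are all hypothetical (only "ml_preprocess" appears elsewhere in this guide).

```python
import statistics


def profile_column(values):
    """Summarize one column: null count, skewness (numeric), mean text length (text)."""
    present = [v for v in values if v is not None and v != ""]
    profile = {"nulls": len(values) - len(present), "skew": None, "mean_text_len": None}
    numbers = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if present and len(numbers) == len(present):
        # Population skewness: the mean of the cubed z-scores.
        if len(numbers) >= 3:
            mean = statistics.fmean(numbers)
            std = statistics.pstdev(numbers)
            if std > 0:
                profile["skew"] = sum(((x - mean) / std) ** 3 for x in numbers) / len(numbers)
    elif present and all(isinstance(v, str) for v in present):
        profile["mean_text_len"] = statistics.fmean(len(v) for v in present)
    return profile


def suggest_template(profiles, long_text_threshold=200, skew_threshold=1.0):
    """Pick a template name from per-column profiles (illustrative thresholds)."""
    if any((p["mean_text_len"] or 0) >= long_text_threshold for p in profiles.values()):
        return "text_prep"  # hypothetical name
    if any(p["skew"] is not None and abs(p["skew"]) > skew_threshold for p in profiles.values()):
        return "ml_preprocess"
    return "clean"  # hypothetical name
```

Date detection is left out to keep the sketch short; in a fuller version it would be one more field in the profile.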

auto_clean (The Gold Standard)

  • Components: normalize_columns, infer_types, handle_missing_data, deduplicate.
  • What it does: Cleans column names, casts types, interpolates IDs while filling other missing values (0 for integers, "null" for strings), and removes duplicates.
  • Why it's useful: The perfect daily cleaning tool. Ensures your data is tidy and error-free for most general tasks.
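
The four steps above can be sketched in a few lines of plain Python. Treat this as a teaching aid rather than the library's implementation: normalize_column, infer_value, and clean_rows are hypothetical helpers, ID interpolation is skipped, and filling float columns with 0 is my assumption (the guide only specifies integers and strings).

```python
import re


def normalize_column(name):
    """Lowercase a column name and collapse non-alphanumerics into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def infer_value(raw):
    """Cast a raw cell to int or float when possible; blank cells become None."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text == "":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def clean_rows(rows):
    """Normalize names, infer types, fill gaps, and drop exact duplicate rows."""
    typed = [{normalize_column(k): infer_value(v) for k, v in row.items()} for row in rows]
    columns = list(typed[0]) if typed else []
    # A column counts as numeric when every non-missing value is a number.
    is_numeric = {}
    for col in columns:
        present = [r.get(col) for r in typed if r.get(col) is not None]
        is_numeric[col] = bool(present) and all(isinstance(v, (int, float)) for v in present)
    seen, cleaned = set(), []
    for row in typed:
        filled = {
            col: row.get(col) if row.get(col) is not None else (0 if is_numeric[col] else "null")
            for col in columns
        }
        key = tuple(filled.items())
        if key not in seen:  # keep only the first copy of each row
            seen.add(key)
            cleaned.append(filled)
    return cleaned
```

Note the order: missing values are filled before deduplication, so two rows that differ only by a blank cell and an explicit 0 end up counted as duplicates.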

🤖 auto_ml_prep (Model Readiness)

  • Components: auto_clean steps + outlier_detection, encode_categorical, scale_numeric.
  • What it does: Beyond basic cleaning, it handles numeric outliers (via clipping), encodes text categories to numbers, and scales values (MinMax 0–1).
  • Why it's useful: High-speed preparation for model training. Most models built with scikit-learn or PyTorch expect numeric inputs with no missing values, and many train better on scaled features.
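
Here is a small sketch of the three extra steps on a single column each. The guide says outliers are handled "via clipping" but does not say where the bounds come from, so the 1.5 × IQR fence below is my assumption, and the function names are hypothetical. MinMax scaling follows the usual formula (x - min) / (max - min).

```python
import statistics


def clip_outliers(values, k=1.5):
    """Clip values to the fence [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def encode_categories(values):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def minmax_scale(values):
    """Rescale values linearly into the range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map it to 0.0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

Clipping before scaling matters: a single extreme value would otherwise squeeze every other point toward 0 after MinMax scaling.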

📊 auto_analytics (BI & Reporting)

  • Components: normalize_columns, format_date, handle_missing_data, deduplicate, basic_aggregation.
  • What it does: Special focus on Universal Date Parsing and deduplication. Includes optional aggregation for quick reporting.
  • Why it's useful: Best for time-series data and business dashboards where consistent dates and low redundancy are critical.
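
A common way to approach "universal" date parsing is to try a list of known formats in order, then aggregate on the normalized result. The sketch below shows that idea with the standard library only; DATE_FORMATS, parse_date, and monthly_totals are hypothetical names, and the format list is a small illustrative sample, not what aidatapilot actually supports.

```python
from collections import defaultdict
from datetime import datetime

# Tried in order; the first format that matches wins.
# Note: "05/03/2024" is read as day/month here, which is an assumption.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y", "%d.%m.%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def monthly_totals(rows, date_key, value_key):
    """Sum value_key per calendar month, skipping rows with unparseable dates."""
    totals = defaultdict(float)
    for row in rows:
        parsed = parse_date(row[date_key])
        if parsed is not None:
            totals[parsed.strftime("%Y-%m")] += row[value_key]
    return dict(totals)
```

Format order is a design choice worth making explicit, because ambiguous inputs such as 05/03/2024 parse differently under day-first and month-first conventions.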

📄 auto_text_prep (LLM & RAG)

  • Components: normalize_columns, clean_text, generate_metadata, chunk_text.
  • What it does: Cleans document text, calculates word counts/lengths, and splits long text into overlapping chunks.
  • Why it's useful: Essential for AI applications. Prepares documents for embedding and storage in Vector Databases (like Pinecone or Chroma).
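
Overlapping chunks are easiest to understand with a sliding window: each chunk starts a fixed step after the previous one, and the step is smaller than the chunk size so neighbors share some words. This is a minimal word-based sketch; clean_text and chunk_words are hypothetical helpers, and the default sizes are arbitrary, since the guide does not state what aidatapilot uses.

```python
import re


def clean_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = clean_text(text).split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(window),
            "word_count": len(window),  # simple per-chunk metadata
            "start_word": start,
        })
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality in RAG setups.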
