
Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.

Project description

aidatapilot 🚀 — Your Partner in Data Automation

"The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency." — Bill Gates

Welcome to aidatapilot. I'm here to guide you from raw, messy datasets to production-ready signals in just one line of code. aidatapilot isn't just a library; it's an intelligent engine designed to handle the heavy lifting of data engineering so you can focus on the insight.


🎓 The aidatapilot Way (Mentoring Guide)

As your guide, I recommend starting with the "Simple API." It's designed to give you professional-grade results without the complexity of manual boilerplate.

1. The "Master Brain": auto_pipeline (Adaptive)

The most powerful command in aidatapilot. It analyzes your data, detects quality issues, and dynamically constructs a custom pipeline without any manual configuration. It also prints a beautiful Automation Decision Report explaining its choices.

import aidatapilot

# One command to rule them all
result = aidatapilot.auto_pipeline("messy_raw_data.csv")

2. General Data Cleaning: auto_clean

The "Gold Standard" for standard tabular data. It normalizes your column names, infers data types, fills missing values (0 for ints, "null" for strings), and removes duplicates.

v0.2.0 Upgrade: Now includes Intelligent Numeric Cleaning (converts "four hundred" to 400, removes currency symbols, and extracts numbers from mixed text).

# Perfect for daily reporting and BI
aidatapilot.auto_clean("sales_raw.xlsx", "sales_final.csv")
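To picture what Intelligent Numeric Cleaning does, here is a rough standalone sketch in plain Python — not aidatapilot's actual implementation; the `clean_numeric` helper and its tiny word map are illustrative only:

```python
import re

# Tiny word-to-number map for illustration; the real feature presumably
# covers a much larger vocabulary.
_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
          "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
          "hundred": 100, "thousand": 1000}

def clean_numeric(value):
    """Coerce a messy cell into a float, or return None if hopeless."""
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip().lower()
    # Spelled-out numbers: "four hundred" -> 4 * 100 = 400
    tokens = text.split()
    if tokens and all(t in _WORDS for t in tokens):
        current = 0
        for t in tokens:
            n = _WORDS[t]
            current = max(current, 1) * n if n >= 100 else current + n
        return float(current)
    # Strip currency symbols and thousands separators, then pull out a number
    stripped = re.sub(r"[^\d.\-]", "", text.replace(",", ""))
    match = re.search(r"-?\d+(?:\.\d+)?", stripped)
    return float(match.group()) if match else None
```

So `clean_numeric("four hundred")` yields `400.0`, `clean_numeric("$1,200")` yields `1200.0`, and unparseable cells like `"n/a"` fall through as `None` for the missing-value handler.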

3. Machine Learning Ready: auto_ml_prep

Preparing data for a model? This command does everything auto_clean does, plus Outlier Clipping, Categorical Encoding, and MinMax Scaling. It also leverages the new Intelligent Numeric Cleaning to ensure your features are clean and ready for training.

# From raw data to 'model.fit()' ready
aidatapilot.auto_ml_prep("users.csv", "training_data.csv")
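Two of those ML-prep steps are easy to picture with a minimal standalone sketch (plain Python, not the library's code; the function names here are illustrative):

```python
def minmax_scale(values):
    """Scale numbers into the 0-1 range, as MinMax Scaling does."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant columns
    return [(v - lo) / span for v in values]

def encode_categorical(values):
    """Map each distinct category to a small integer (simple label encoding)."""
    mapping = {cat: i for i, cat in enumerate(dict.fromkeys(values))}
    return [mapping[v] for v in values], mapping
```

For example, `minmax_scale([10, 20, 30])` gives `[0.0, 0.5, 1.0]`, and `encode_categorical(["red", "blue", "red"])` gives `([0, 1, 0], {"red": 0, "blue": 1})` — numeric, scaled inputs of the kind `model.fit()` expects.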

4. Specialized: auto_analytics & auto_text_prep

  • Use auto_analytics for time-series and BI reports.
  • Use auto_text_prep for LLM and RAG workflows (it handles text cleaning and chunking).

🧠 Intelligence Advisor (Proactive Analysis)

Before you clean, you might want to understand what's wrong. The Advisor provides a high-precision diagnostic report on your data health (Nulls, Outliers, ID gaps):

from aidatapilot import Advisor

# Get actionable suggestions
report = Advisor().analyze("mysterious_data.csv")
print(report.suggestions)
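As a rough picture of the checks such a diagnostic runs, here is a standalone sketch of null-ratio and ID-gap detection (the helpers and report shape are illustrative, not the Advisor's real output):

```python
def null_ratio(column):
    """Fraction of missing entries (None or empty string) in a column."""
    missing = sum(1 for v in column if v is None or v == "")
    return missing / len(column)

def find_id_gaps(ids):
    """Return ID values missing from an otherwise consecutive sequence."""
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]
```

For instance, `find_id_gaps([1, 2, 4, 7])` returns `[3, 5, 6]` — the kind of signal the Advisor can turn into an actionable suggestion.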

🛠 The Fluent Builder: Custom Pipelines (v0.2.0)

For ultimate control, use the Fluent Builder API. You can now chain processing steps one-by-one using human-friendly names:

from aidatapilot import Pipeline

# Build a custom journey step-by-step
pipe = Pipeline()
pipe.set_source("raw.csv")

# Chain operations with semantic 'then()'
pipe.add_step("normalize") \
    .then("types") \
    .then("missing_nulls") \
    .then("outliers") \
    .then("analytics", group_by="source")

# Execute with performance tracking
result = pipe.run()

📦 Installation

pip install aidatapilot

🏗 Why aidatapilot?

  • Production-Ready: Built with registry patterns and robust error handling.
  • Memory Safe: Designed to handle large datasets without crashing your environment.
  • Intelligent: Heuristic-based suggestions that improve over time.

Happy Automating! Feel free to reach out if you need help navigating your data pipelines.


🔍 Deep Dive: Understanding the Operations

Every "auto" command in aidatapilot is carefully designed to handle specific business and data needs. Here is exactly what happens under the hood:

🚀 auto_pipeline (The Smart Choice)

  • Components: IntelligenceAdvisor + Recommended Template.
  • What it does: Dynamically analyzes data patterns (null counts, skewness, text length, dates) and selects the best cleaning path.
  • Why it's useful: Eliminates guesswork. Ideal for unknown or messy data when you don't know where to start. It's the "set it and forget it" tool.
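A toy version of that decision logic might look like this — the thresholds and template names are purely illustrative, not aidatapilot's internals:

```python
def pick_template(null_ratio, avg_text_len, has_dates):
    """Choose a cleaning path from a few coarse data-shape signals."""
    if avg_text_len > 200:
        return "text_prep"   # long documents -> LLM/RAG pipeline
    if has_dates:
        return "analytics"   # date columns -> BI/time-series pipeline
    if null_ratio > 0.3:
        return "clean"       # heavy missingness -> general cleaning first
    return "ml_prep"         # otherwise assume modeling is the goal
```

The real engine weighs more signals (skewness, duplicate rates, and so on), but the shape is the same: measure the data, then route it to the template that fits.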

auto_clean (The Gold Standard)

  • Components: normalize_columns, infer_types, handle_missing_data, deduplicate.
  • What it does: Cleans column names, casts types, interpolates IDs while filling other missing values (0 for integers, "null" for strings), and removes duplicates.
  • Why it's useful: The perfect daily cleaning tool. Ensures your data is tidy and error-free for most general tasks.
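The first and third components above can be sketched in a few lines of plain Python (illustrative helpers, not the library's code), showing column-name normalization and the documented fill defaults:

```python
import re

def normalize_column(name):
    """Lowercase, trim, and snake_case a column name."""
    return re.sub(r"[^\w]+", "_", name.strip().lower()).strip("_")

def fill_missing(value, dtype):
    """Apply the documented defaults: 0 for ints, 'null' for strings."""
    if value is not None:
        return value
    return 0 if dtype is int else "null"
```

So `normalize_column(" Total Sales ")` becomes `"total_sales"`, and missing cells are filled according to their inferred type.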

🤖 auto_ml_prep (Model Readiness)

  • Components: auto_clean steps + outlier_detection, encode_categorical, scale_numeric.
  • What it does: Beyond basic cleaning, it handles numeric outliers (via clipping), encodes text categories to numbers, and scales values (MinMax 0–1).
  • Why it's useful: High-speed preparation for model training. Most ML models (Scikit-Learn, PyTorch) require numeric, scaled data with no missing values.
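Outlier clipping is commonly done with an IQR rule; here is a minimal standalone sketch of that idea (the library's exact method isn't specified here, so treat this as one plausible approach):

```python
def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    ordered = sorted(values)
    def quantile(q):
        # Linear interpolation between the two nearest ranks
        pos = q * (len(ordered) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    q1, q3 = quantile(0.25), quantile(0.75)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]
```

For example, `clip_outliers([1, 2, 3, 4, 100])` pulls the extreme value down to the upper fence instead of dropping the row.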

📊 auto_analytics (BI & Reporting)

  • Components: normalize_columns, format_date, handle_missing_data, deduplicate, basic_aggregation.
  • What it does: Special focus on Universal Date Parsing and deduplication. Includes optional aggregation for quick reporting.
  • Why it's useful: Best for time-series data and business dashboards, where date consistency and low redundancy are critical.
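Universal Date Parsing can be approximated with a try-several-formats loop; this is a standard-library sketch, and the particular formats listed are assumptions, not aidatapilot's actual list:

```python
from datetime import datetime

FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%b %d, %Y"]

def parse_date(text):
    """Try a handful of common layouts and return an ISO date string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable -> leave for the missing-data handler
```

Note the ordering matters for ambiguous inputs like "12/03/2024": whichever format is tried first wins, which is why a real parser also weighs column-level evidence.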

📄 auto_text_prep (LLM & RAG)

  • Components: normalize_columns, clean_text, generate_metadata, chunk_text.
  • What it does: Cleans document text, calculates word counts/lengths, and splits long text into overlapping chunks.
  • Why it's useful: Essential for AI applications. Prepares documents for embedding and storage in Vector Databases (like Pinecone or Chroma).
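Overlapping chunking, the step that matters most for RAG, can be sketched in a few lines (the default sizes here are illustrative; the library's chunk sizing may differ):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into windows of `size` characters, each sharing
    `overlap` characters with the previous chunk so context
    survives the split point."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk — which is exactly what embedding-based retrieval needs.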

Project details

Download files

Source distribution: aidatapilot-0.2.3.tar.gz (61.8 kB), uploaded via twine/6.2.0 on CPython/3.11.0 (Trusted Publishing: no).

Hashes for aidatapilot-0.2.3.tar.gz

  • SHA256: a5c780607c8bcdaa22b1f7db20efcfafee8d29268e9562e95b63ce6e7ba1638e
  • MD5: ffa888a757dfe2a6f1dc8b507d40c2fb
  • BLAKE2b-256: 3a7991fdd116c4abf32b97d1ccdc3fcaa44c7a677b4448f4b05a8909b18d797a

Built distribution: aidatapilot-0.2.3-py3-none-any.whl (70.6 kB, Python 3), uploaded via twine/6.2.0 on CPython/3.11.0 (Trusted Publishing: no).

Hashes for aidatapilot-0.2.3-py3-none-any.whl

  • SHA256: 358f19edfb97b2604c662ff097b70d972307992dbce1308db97b842718f5d228
  • MD5: 314835080c286f95ca9383f8c6852632
  • BLAKE2b-256: 057fc45b02454e673f4534fce0684398feb9aa2a54ae6255ffde069f83bdb2bd
