
Lightweight Intelligent Data Automation Engine — plug-and-play pipelines for everyone.

DataPilot 🚀 — Your Partner in Data Automation

"The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency." — Bill Gates

Welcome to DataPilot. I'm here to guide you from raw, messy datasets to production-ready signals in just one line of code. DataPilot isn't just a library; it's an intelligent engine designed to handle the heavy lifting of data engineering so you can focus on the insight.


🎓 The DataPilot Way (Mentoring Guide)

As your guide, I recommend starting with the "Simple API." It's designed to give you professional-grade results without the complexity of manual boilerplate.

1. The "Master Brain": auto_pipeline (Adaptive)

The most powerful command in DataPilot. It analyzes your data, detects quality issues, and dynamically constructs a custom pipeline without any manual configuration. It also prints a beautiful Automation Decision Report explaining its choices.

import datapilot

# One command to rule them all
result = datapilot.auto_pipeline("messy_raw_data.csv")

2. General Data Cleaning: auto_clean

The "Gold Standard" for standard tabular data. It normalizes your column names, infers data types, fills missing values (0 for ints, "null" for strings), and removes duplicates.

# Perfect for daily reporting and BI
datapilot.auto_clean("sales_raw.xlsx", "sales_final.csv")

3. Machine Learning Ready: auto_ml_prep

Preparing data for a model? This command does everything auto_clean does, plus Outlier Clipping, Categorical Encoding, and MinMax Scaling.

# From raw data to 'model.fit()' ready
datapilot.auto_ml_prep("users.csv", "training_data.csv")

4. Specialized: auto_analytics & auto_text_prep

  • Use auto_analytics for time-series and BI reports.
  • Use auto_text_prep for LLM and RAG workflows (it handles text cleaning and chunking). Example calls for both are shown below.
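
Assuming they follow the same (source, output) call pattern as auto_clean (the file names here are placeholders):

# Time-series / BI preparation
datapilot.auto_analytics("events_raw.csv", "events_report.csv")

# Document preparation for embedding workflows
datapilot.auto_text_prep("docs.csv", "chunks.csv")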

🧠 Intelligence Advisor

Before you clean, you might want to understand what's wrong. Run the Intelligence Advisor to get a proactive report on your data health:

datapilot.analyze_dataset("mysterious_data.csv")

🛠 Becoming a Pro: The Pipeline Class

For those who need granular control, the Pipeline class is your cockpit. You can mix and match templates or define custom steps.

from datapilot import Pipeline

# Craft a custom journey
pipe = Pipeline(template="ml_preprocess")
pipe.set_source("raw.csv")
pipe.set_output("ready.csv")

# Execute with performance tracking
result = pipe.run()
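
The template above covers the standard path. To sketch what a custom step could look like, the add_step hook below is a hypothetical name, not a confirmed DataPilot API; check the Pipeline reference for the actual extension point:

# Hypothetical: add_step is an illustrative name, not a documented DataPilot method.
# The idea: a custom step is just a callable that takes and returns a DataFrame.
pipe.add_step(lambda df: df[df["age"].between(0, 120)])  # e.g. drop impossible ages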

📦 Installation

pip install aidatapilot

Note: the package is published on PyPI as aidatapilot, while the import name in code is datapilot.

🏗 Why DataPilot?

  • Production-Ready: Built with registry patterns and robust error handling.
  • Memory Safe: Designed to handle large datasets without crashing your environment.
  • Intelligent: Heuristic-based suggestions that improve over time.

Happy Automating! Feel free to reach out if you need help navigating your data pipelines.


🔍 Deep Dive: Understanding the Operations

Every "auto" command in DataPilot is carefully designed to handle specific business and data needs. Here is exactly what happens under the hood:

🚀 auto_pipeline (The Smart Choice)

  • Components: IntelligenceAdvisor + Recommended Template.
  • What it does: Dynamically analyzes data patterns (null counts, skewness, text length, dates) and selects the best cleaning path.
  • Why it's useful: Eliminates guesswork. Ideal for unknown or messy data when you don't know where to start. It's the "set it and forget it" tool.
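
To make the heuristics concrete, here is a purely illustrative sketch in plain pandas of how such a recommendation could be derived. It is not DataPilot's actual decision code; the thresholds are arbitrary, and the template names other than "ml_preprocess" are guesses:

import pandas as pd

def recommend_template(df: pd.DataFrame) -> str:
    text_cols = df.select_dtypes(include="object")
    # Long free-text columns suggest an LLM/RAG workload
    for col in text_cols:
        if text_cols[col].dropna().astype(str).str.len().mean() > 200:
            return "text_prep"
    # Columns that mostly parse as dates suggest a BI/time-series path
    for col in text_cols:
        parsed = pd.to_datetime(text_cols[col], errors="coerce")
        if parsed.notna().mean() > 0.8:
            return "analytics"
    # Heavily skewed numeric columns call for ML-style clipping and scaling
    numeric = df.select_dtypes(include="number")
    if not numeric.empty and (numeric.skew().abs() > 2).any():
        return "ml_preprocess"
    return "clean"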

✨ auto_clean (The Gold Standard)

  • Components: normalize_columns, infer_types, handle_missing_data, deduplicate.
  • What it does: Cleans column names, casts types, fills missing values (interpolating ID columns, 0 for other integers, "null" for strings), and removes duplicates.
  • Why it's useful: The perfect daily cleaning tool. Ensures your data is tidy and error-free for most general tasks.
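
In plain pandas, those four steps map roughly onto the following (a sketch for intuition, not DataPilot's implementation; the ID-interpolation detail is omitted):

import pandas as pd

df = pd.read_csv("sales_raw.csv")

# normalize_columns: trimmed, lowercase, snake_case headers
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# infer_types: let pandas re-infer tighter dtypes
df = df.convert_dtypes()

# handle_missing_data: 0 for numeric columns, the string "null" elsewhere
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(0)
    else:
        df[col] = df[col].fillna("null")

# deduplicate: drop exact duplicate rows
df = df.drop_duplicates()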

🤖 auto_ml_prep (Model Readiness)

  • Components: auto_clean steps + outlier_detection, encode_categorical, scale_numeric.
  • What it does: Beyond basic cleaning, it handles numeric outliers (via clipping), encodes text categories to numbers, and scales values (MinMax 0–1).
  • Why it's useful: High-speed preparation for model training. Most ML models (Scikit-Learn, PyTorch) require numeric, scaled data with no missing values.
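
The three extra steps correspond roughly to these pandas operations (again a sketch with arbitrary percentile bounds, not the library's exact behavior):

import pandas as pd

df = pd.read_csv("users.csv")
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# outlier_detection (clipping): cap values at the 1st/99th percentiles
for col in num_cols:
    lo, hi = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(lo, hi)

# encode_categorical: map each category to an integer code
for col in cat_cols:
    df[col] = df[col].astype("category").cat.codes

# scale_numeric: MinMax scaling into the 0-1 range
for col in num_cols:
    rng = df[col].max() - df[col].min()
    if rng > 0:  # skip constant or empty columns
        df[col] = (df[col] - df[col].min()) / rng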

📊 auto_analytics (BI & Reporting)

  • Components: normalize_columns, format_date, handle_missing_data, deduplicate, basic_aggregation.
  • What it does: Special focus on Universal Date Parsing and deduplication. Includes optional aggregation for quick reporting.
  • Why it's useful: Best for time-series data and business dashboards, where date consistency and low redundancy are critical.
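
A rough pandas equivalent of the analytics path (the column names order_date and revenue are illustrative placeholders):

import pandas as pd

df = pd.read_csv("events_raw.csv")

# format_date: coerce mixed date formats into a single datetime dtype
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# deduplicate before aggregating so repeated rows don't inflate totals
df = df.drop_duplicates()

# basic_aggregation: e.g. daily revenue for a dashboard
daily = df.groupby(df["order_date"].dt.date)["revenue"].sum().reset_index()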

📄 auto_text_prep (LLM & RAG)

  • Components: normalize_columns, clean_text, generate_metadata, chunk_text.
  • What it does: Cleans document text, calculates word counts/lengths, and splits long text into overlapping chunks.
  • Why it's useful: Essential for AI applications. Prepares documents for embedding and storage in Vector Databases (like Pinecone or Chroma).
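
The chunking idea is worth seeing in miniature: consecutive chunks share an overlap so that content falling on a boundary appears intact in at least one chunk. A minimal character-based sketch (DataPilot's real chunk_text and its parameters may differ, e.g. it may split on token or sentence boundaries):

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a window of `size` characters forward by (size - overlap) each step
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Example: a 1,000-character document yields chunks starting at 0, 450, and 900
chunks = chunk_text("x" * 1000)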

