🚀 AAPrepflow
AAPrepflow is an integrated preprocessing tool that combines an interactive UI with a modular Python library to simplify data cleaning and exploration. It supports both visual experimentation and direct implementation, so preprocessing workflows stay efficient, consistent, and scalable.
1. Installation & Import
pip install AAPrepflow
import pandas as pd
import numpy as np
from aaprepflow import AAPrepflowCrossSection, AAPrepflowTimeSeries, AAPrepflowPanel, AAPrepflowBase
2. API Function
A. Direct Method (Quick)
Use this method for quick exploration. It combines fit and transform in a single call on the data passed to the constructor.
# 1. Initialize Flow based on data type
flow_cs = AAPrepflowCrossSection(data_cs)
flow_ts = AAPrepflowTimeSeries(data_ts, col_time='date')
flow_panel = AAPrepflowPanel(data_panel, col_time='date', group_by='city')
# 2. Handle missing values (replace <method_name> with a name from the table in section 4)
data_cleaned = flow_cs.mv.clean_<method_name>()
# 3. Handle outliers (replace <method_name> with a name from the table in section 4)
data_cleaned = flow_ts.outlier.apply_<method_name>()
B. Fit/Transform Method (Production Ready)
Use this pattern for production pipelines so you can fit on training data and transform on both training and testing data separately.
# 1. Initialize Flow with TRAINING data
flow = AAPrepflowTimeSeries(data_train_ts, col_time='date')
# 2. FIT on TRAINING data
# Learn parameters from the training data (e.g., median, IQR bounds)
flow.mv.fit_<method_name>()
flow.outlier.fit_<method_name>()
# 3. TRANSFORM on Train and Test data
# Apply the learned parameters
# --- Transform Train Data ---
# (Using internal fitted data)
cleaned_train = flow.mv.transform()
cleaned_train = flow.outlier.transform_<handler_name>(cleaned_train)
# --- Transform Test Data ---
# (Using external test data)
cleaned_test = flow.mv.transform(test_data)
cleaned_test = flow.outlier.transform_<handler_name>(cleaned_test)
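To make the fit/transform idea concrete, here is a minimal plain-Python sketch that does not use AAPrepflow at all. It assumes a median fill for missing values and IQR capping for outliers; the helper names `fit_median_iqr` and `transform` are hypothetical and only illustrate the pattern of learning parameters once and reusing them.

```python
from statistics import median, quantiles

def fit_median_iqr(train, k=1.5):
    """Learn a fill value and IQR fences from the non-missing training values."""
    observed = [v for v in train if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return {"median": median(observed), "lower": q1 - k * iqr, "upper": q3 + k * iqr}

def transform(values, params):
    """Fill missing values with the training median, then cap to the training fences."""
    filled = [params["median"] if v is None else v for v in values]
    return [min(max(v, params["lower"]), params["upper"]) for v in filled]

train = [10.0, 12.0, None, 11.0, 13.0, 14.0]
test = [None, 12.5, 95.0]

# Parameters come from the training data only ...
params = fit_median_iqr(train)
# ... and are then applied unchanged to both splits.
cleaned_train = transform(train, params)
cleaned_test = transform(test, params)
```

Because the test split never influences `params`, no information leaks from test data into the preprocessing step.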
3. API Interactive Lab (Gradio)
You can launch an interactive lab to visually test strategies.
A. Lab with DataFrame
This method will directly load your data into the Gradio application.
# Initialize with data, group_by, and col_time
lab = AAPrepflowBase(data=df_panel, group_by='City', col_time='Date')
# Launch Interactive Lab
lab.flow_lab()
B. Lab without DataFrame
This method will launch an empty Gradio application, and you can upload a CSV file from the browser.
# Initialize AAPrepflowBase
lab = AAPrepflowBase()
# Launch Interactive Lab
lab.flow_lab()
4. API Reference <method_name>
Use the method names from the following table to replace <...> in the code examples above.
A. Missing Value (.mv)
| Data Type | Method (Quick) | Method (Fit) | Strategy Description |
|---|---|---|---|
| Cross-Section | clean_listwise_deletion | fit_listwise_deletion | Remove rows with NA. |
| Cross-Section | clean_mean_median_mode_imputer | fit_mean_median_mode_imputer | Fill with mean, median, or mode. |
| Cross-Section | clean_hot_deck_imputer | fit_hot_deck_imputer | Fill with a random sample. |
| Cross-Section | clean_regression_imputer | fit_regression_imputer | Fill with regression prediction. |
| Cross-Section | clean_knn_imputer | fit_knn_imputer | Fill with K-Nearest Neighbors. |
| Time-Series / Panel | clean_locf | fit_locf | Last Observation Carried Forward. |
| Time-Series / Panel | clean_nocb | fit_nocb | Next Observation Carried Backward. |
| Time-Series / Panel | clean_interpolation | fit_interpolation | Interpolation (e.g., linear). |
| Time-Series / Panel | clean_seasonal_imputer | fit_seasonal_imputer | Fill with seasonal average. |
| Time-Series / Panel | clean_arima_imputer | fit_arima_imputer | Fill with ARIMA prediction. |
| Time-Series / Panel | clean_kalman_imputer | fit_kalman_imputer | Fill with Kalman Filter. |
| Time-Series / Panel | clean_moving_average_imputer | fit_moving_average_imputer | Fill with moving average. |
| Time-Series / Panel | clean_cagr_imputer | fit_cagr_imputer | Fill with geometric interpolation. |
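For readers unfamiliar with the time-series strategies named in the table, the following plain-Python sketch shows what LOCF, NOCB, linear interpolation, and geometric (CAGR-style) interpolation typically do. It is an illustration under stated assumptions, not AAPrepflow's implementation: the function names are hypothetical, missing values are represented as `None`, and the geometric variant assumes positive endpoints.

```python
def locf(values):
    """Last Observation Carried Forward: reuse the most recent known value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # stays None until the first known value appears
    return out

def nocb(values):
    """Next Observation Carried Backward: LOCF applied to the reversed series."""
    return locf(values[::-1])[::-1]

def interpolate(values, geometric=False):
    """Fill interior gaps between two known points, linearly or geometrically."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            if geometric:
                # Constant growth rate between the endpoints (CAGR-style).
                out[i] = values[left] * (values[right] / values[left]) ** t
            else:
                out[i] = values[left] + (values[right] - values[left]) * t
    return out

series = [None, 100.0, None, None, 800.0, None]

filled_locf = locf(series)
filled_nocb = nocb(series)
filled_linear = interpolate(series)
filled_geometric = interpolate(series, geometric=True)
```

Note that each strategy leaves some edge gaps open in this sketch: LOCF cannot fill a leading gap, NOCB cannot fill a trailing one, and interpolation fills only between two known points.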
B. Outlier (.outlier)
Step 1: Detection (Quick apply_... or Fit fit_...)
| Data Type | Method (Quick) | Method (Fit) | Detection Description |
|---|---|---|---|
| Cross-Section | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Cross-Section | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Time-Series / Panel | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_rolling_iqr_... | fit_rolling_iqr | Rolling IQR (adaptive). |
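The three detection rules in the table can be sketched in plain Python as follows. This is a conceptual illustration only: the function names, the 1.5 multiplier, the thresholds, and the choice of a trailing window for the rolling variant are assumptions, and AAPrepflow's actual defaults may differ.

```python
from statistics import mean, pstdev, quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_flags(values, k=1.5):
    """Flag points outside fences computed from the whole series."""
    lower, upper = iqr_bounds(values, k)
    return [not (lower <= v <= upper) for v in values]

def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]

def rolling_iqr_flags(values, window=5, k=1.5):
    """Compare each point with IQR fences from the preceding `window` points."""
    flags = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history yet
            continue
        lower, upper = iqr_bounds(history, k)
        flags.append(not (lower <= v <= upper))
    return flags

values = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 50.0, 11.0]

by_iqr = iqr_flags(values)
# A single outlier among eight points cannot reach a z-score of 3,
# so a lower threshold is used for this small example.
by_zscore = zscore_flags(values, threshold=2.5)
by_rolling = rolling_iqr_flags(values, window=5)
```

All three rules flag only the value 50.0 here. The rolling variant adapts its fences to recent history, which is why it suits series whose level drifts over time.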
Step 2: Handling (Quick ..._capping or Transform transform_...)
| Method (Quick) | Method (Transform) | Handling Description |
|---|---|---|
| apply_..._capping | transform_capping | Replace outlier with the boundary value. |
| apply_..._imputation | transform_imputation | Replace outlier with 'mean'/'median'. |
| apply_..._locf | transform_locf | Replace outlier with the last valid value. |
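Once bounds are known, the three handling strategies differ only in what replaces a flagged value. The sketch below shows each one in plain Python. The function names and the fixed bounds are hypothetical, and computing the mean or median from in-range values only is an assumption about how imputation might work rather than a description of AAPrepflow's behavior.

```python
from statistics import mean, median

def cap(values, lower, upper):
    """Capping: clip each outlier to the nearest boundary value."""
    return [min(max(v, lower), upper) for v in values]

def impute(values, lower, upper, strategy="median"):
    """Imputation: replace outliers with the mean or median of the in-range values."""
    inliers = [v for v in values if lower <= v <= upper]
    fill = median(inliers) if strategy == "median" else mean(inliers)
    return [v if lower <= v <= upper else fill for v in values]

def replace_with_last_valid(values, lower, upper):
    """LOCF: replace each outlier with the most recent in-range value."""
    out, last = [], None
    for v in values:
        if lower <= v <= upper:
            last = v
        out.append(last)  # None if an outlier precedes any valid value
    return out

values = [10.0, 11.0, 50.0, 12.0, 1.0]
lower, upper = 8.0, 14.0  # e.g. fences learned in a detection step

capped = cap(values, lower, upper)
imputed = impute(values, lower, upper, strategy="median")
carried = replace_with_last_valid(values, lower, upper)
```

Capping keeps the direction of the extreme value, imputation pulls it to the center of the distribution, and LOCF preserves local continuity in ordered data.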
File details
Details for the file aa_prepflow-0.1.tar.gz.
File metadata
- Download URL: aa_prepflow-0.1.tar.gz
- Upload date:
- Size: 54.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bcd36cffac175bc136149c4867c8252f3464f1ddec1e5723c528c2ef5bfb6fc |
| MD5 | b5056425708c6e5e423bb9c8e7a13cb3 |
| BLAKE2b-256 | 62586c1ed33c06d20c22bb04a05026ff29e4525f73b8bb595a9c1883b3680e87 |
File details
Details for the file aa_prepflow-0.1-py3-none-any.whl.
File metadata
- Download URL: aa_prepflow-0.1-py3-none-any.whl
- Upload date:
- Size: 58.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d77e758e63f3ae660968512344a89c898954e2c49c401310ed5805bc83f30c9e |
| MD5 | 39972b2557f63460db15003372e92f9d |
| BLAKE2b-256 | 42e2766a510fb31dd63c3bc86fa1240fce89f893ed514c96bd4b1def278d0420 |