🚀 AAPrepflow

AAPrepflow is an integrated preprocessing tool that combines an interactive UI with a modular Python library to simplify data cleaning and exploration. It supports both visual experimentation and direct use in code, so preprocessing workflows stay efficient, consistent, and scalable.

1. Installation & Import

# Install from the command line
pip install AAPrepflow

# Then import in Python
import pandas as pd
import numpy as np
from aaprepflow import AAPrepflowCrossSection, AAPrepflowTimeSeries, AAPrepflowPanel, AAPrepflowBase

2. API Function

A. Direct Method (Quick)

Use this method for quick exploration. This method combines fit and transform in a single call.

# 1. Initialize Flow based on data type
flow_cs = AAPrepflowCrossSection(data_cs)
flow_ts = AAPrepflowTimeSeries(data_ts, col_time='date')
flow_panel = AAPrepflowPanel(data_panel, col_time='date', group_by='city')

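# 2. Handle missing values
# (replace clean_<method_name> with a Quick method from the table in section 4)
data_cleaned = flow_cs.mv.clean_<method_name>()

# 3. Handle outliers
# (replace apply_<method_name> with a Quick method from the tables in section 4,
#  which follow the apply_<detection>_<handling> pattern)
data_cleaned = flow_ts.outlier.apply_<method_name>()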
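
The snippets above assume that three DataFrames (data_cs, data_ts, and data_panel) already exist. As a minimal sketch, they could be built with pandas as shown below. The column names (income, age, value, date, city) and the city labels are illustrative assumptions chosen to match the col_time='date' and group_by='city' arguments; they are not a schema required by the package.

```python
import numpy as np
import pandas as pd

# Toy inputs for the examples above. All column names are illustrative assumptions.
rng = np.random.default_rng(0)

# Cross-section: one row per unit, no time column.
data_cs = pd.DataFrame({
    "income": [52.0, np.nan, 61.5, 48.2, 300.0],  # one gap, one extreme value
    "age": [34, 45, np.nan, 29, 51],              # one gap
})

# Time series: one row per date.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
values = rng.normal(100, 5, size=10)
values[3] = np.nan  # a missing observation
data_ts = pd.DataFrame({"date": dates, "value": values})

# Panel: the same dates repeated for each city.
data_panel = pd.concat(
    [data_ts.assign(city=city) for city in ["Jakarta", "Bandung"]],
    ignore_index=True,
)

print(data_cs.isna().sum().sum(), data_ts["value"].isna().sum(), data_panel.shape)
```

Any DataFrame with a time column and, for panel data, a grouping column should play the same role.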

B. Fit/Transform Method (Production Ready)

Use this pattern for production pipelines: fit on the training data only, then transform the training and test data separately with the same learned parameters.

# 1. Initialize Flow with TRAINING data
flow = AAPrepflowTimeSeries(data_train_ts, col_time='date')

# 2. FIT on TRAINING data
# Learn parameters from the training data (e.g. median, IQR bounds)
flow.mv.fit_<method_name>()
flow.outlier.fit_<method_name>()

# 3. TRANSFORM on Train and Test data
# Apply the learned parameters

# --- Transform Train Data ---
# (Using internal fitted data)
cleaned_train = flow.mv.transform()
cleaned_train = flow.outlier.transform_<handler_name>(cleaned_train)

# --- Transform Test Data ---
# (Using external test data)
cleaned_test = flow.mv.transform(test_data)
cleaned_test = flow.outlier.transform_<handler_name>(cleaned_test)
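
To make the fit/transform split concrete, here is a minimal pandas-only sketch of the idea: the median and IQR bounds are learned from the training series and then reused unchanged on the test series. It illustrates the general pattern and is not AAPrepflow's internal implementation. fit_params and transform are hypothetical helper names, and the 1.5 multiplier is the conventional IQR rule, assumed here.

```python
import numpy as np
import pandas as pd


def fit_params(train: pd.Series, k: float = 1.5) -> dict:
    """Learn imputation and capping parameters from training data only."""
    q1, q3 = train.quantile(0.25), train.quantile(0.75)
    iqr = q3 - q1
    return {"median": train.median(), "lower": q1 - k * iqr, "upper": q3 + k * iqr}


def transform(series: pd.Series, params: dict) -> pd.Series:
    """Fill gaps with the learned median, then cap to the learned bounds."""
    filled = series.fillna(params["median"])
    return filled.clip(lower=params["lower"], upper=params["upper"])


train = pd.Series([10.0, 12.0, 11.0, np.nan, 13.0, 12.0])
test = pd.Series([11.0, np.nan, 50.0])

params = fit_params(train)               # learned once, from train only
cleaned_train = transform(train, params)
cleaned_test = transform(test, params)   # test never influences the parameters

print(params)
print(cleaned_test.tolist())
```

Because the test series is transformed with parameters learned elsewhere, its extreme value (50.0) is capped at the training upper bound rather than shifting that bound.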

3. API Interactive Lab (Gradio)

You can launch an interactive lab to visually test strategies.

A. Lab with DataFrame

This method will directly load your data into the Gradio application.

# Initialize with data, group_by, and col_time
lab = AAPrepflowBase(data=df_panel, group_by='City', col_time='Date')

# Launch Interactive Lab
lab.flow_lab()

B. Lab without DataFrame

This method will launch an empty Gradio application, and you can upload a CSV file from the browser.

# Initialize AAPrepflowBase
lab = AAPrepflowBase()

# Launch Interactive Lab
lab.flow_lab()

4. API Reference <method_name>

Use the method names from the following tables to replace the <...> placeholders in the code examples above.

A. Missing Value (.mv)

| Data Type | Method (Quick) | Method (Fit) | Strategy Description |
|---|---|---|---|
| Cross-Section | clean_listwise_deletion | fit_listwise_deletion | Remove rows with NA. |
| Cross-Section | clean_mean_median_mode_imputer | fit_mean_median_mode_imputer | Fill with mean, median, or mode. |
| Cross-Section | clean_hot_deck_imputer | fit_hot_deck_imputer | Fill with a random sample. |
| Cross-Section | clean_regression_imputer | fit_regression_imputer | Fill with regression prediction. |
| Cross-Section | clean_knn_imputer | fit_knn_imputer | Fill with K-Nearest Neighbors. |
| Time-Series / Panel | clean_locf | fit_locf | Last Observation Carried Forward. |
| Time-Series / Panel | clean_nocb | fit_nocb | Next Observation Carried Backward. |
| Time-Series / Panel | clean_interpolation | fit_interpolation | Interpolation (e.g. linear). |
| Time-Series / Panel | clean_seasonal_imputer | fit_seasonal_imputer | Fill with seasonal average. |
| Time-Series / Panel | clean_arima_imputer | fit_arima_imputer | Fill with ARIMA prediction. |
| Time-Series / Panel | clean_kalman_imputer | fit_kalman_imputer | Fill with Kalman Filter. |
| Time-Series / Panel | clean_moving_average_imputer | fit_moving_average_imputer | Fill with moving average. |
| Time-Series / Panel | clean_cagr_imputer | fit_cagr_imputer | Fill with geometric interpolation. |
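
For readers unfamiliar with the time-series strategies listed above, the sketch below shows what LOCF, NOCB, linear interpolation, and geometric interpolation do to a small series, using plain pandas and NumPy. These are generic equivalents of the named techniques, not AAPrepflow's own code. Treating "geometric interpolation" as linear interpolation in log space is an assumption based on the table description.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [100.0, np.nan, np.nan, 800.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Last Observation Carried Forward: repeat the previous known value.
locf = s.ffill()

# Next Observation Carried Backward: use the next known value.
# The trailing gap stays NaN because nothing follows it.
nocb = s.bfill()

# Linear interpolation: straight line between known points.
# limit_area="inside" leaves the trailing gap untouched instead of extrapolating.
linear = s.interpolate(method="linear", limit_area="inside")

# Geometric interpolation: interpolate linearly in log space, so the series
# grows by a constant ratio between known points (requires positive values).
geometric = np.exp(np.log(s).interpolate(method="linear", limit_area="inside"))

print(pd.DataFrame({"raw": s, "locf": locf, "nocb": nocb,
                    "linear": linear, "geometric": geometric}))
```

Between 100 and 800, the linear fill adds a constant amount per step, while the geometric fill multiplies by a constant factor (here 2) per step, which is the behaviour a compound growth rate implies.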

B. Outlier (.outlier)

Step 1: Detection (Quick apply_... or Fit fit_...)

| Data Type | Method (Quick) | Method (Fit) | Detection Description |
|---|---|---|---|
| Cross-Section | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Cross-Section | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Time-Series / Panel | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_rolling_iqr_... | fit_rolling_iqr | Rolling IQR (adaptive). |
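
The three detection rules can be sketched with pandas as follows. This is a generic illustration of the techniques, not the package's implementation. The 1.5 IQR multiplier, the z-score cutoff, and the window size of 5 are assumptions chosen for the example.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 10.0, 60.0, 11.0, 10.0])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

# Z-score rule: flag points far from the mean, measured in standard deviations.
# A cutoff of 3 is common, but in a sample of n points no z-score can exceed
# (n - 1) / sqrt(n), which is about 2.85 for n = 10, so 2.5 is used here.
z = (s - s.mean()) / s.std()
z_mask = z.abs() > 2.5

# Rolling IQR: compute the bounds over a trailing window so they adapt over time.
# The first window - 1 points have no bounds and are never flagged.
window = 5
roll_q1 = s.rolling(window, min_periods=window).quantile(0.25)
roll_q3 = s.rolling(window, min_periods=window).quantile(0.75)
roll_iqr = roll_q3 - roll_q1
rolling_mask = (s < roll_q1 - 1.5 * roll_iqr) | (s > roll_q3 + 1.5 * roll_iqr)

print(list(s.index[iqr_mask]), list(s.index[z_mask]), list(s.index[rolling_mask]))
```

The global rules use one pair of bounds for the whole series, while the rolling variant recomputes them per window, which matters when the level of the series drifts.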

Step 2: Handling (Quick ..._capping or Transform transform_...)

| Method (Quick) | Method (Transform) | Handling Description |
|---|---|---|
| apply_..._capping | transform_capping | Replace outlier with the boundary value. |
| apply_..._imputation | transform_imputation | Replace outlier with 'mean'/'median'. |
| apply_..._locf | transform_locf | Replace outlier with the last valid value. |
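
As a rough illustration of the three handling strategies, the sketch below detects outliers with the IQR rule and then applies capping, imputation, and LOCF in plain pandas. It is not the package's code. The 1.5 multiplier and the choice to take the median over non-outlier values only are assumptions made for the example.

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 10.0, 60.0, 11.0, 10.0])

# Detect with the IQR rule.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (s < lower) | (s > upper)

# Capping: pull outliers back to the nearest boundary value.
capped = s.clip(lower=lower, upper=upper)

# Imputation: replace outliers with the median of the remaining values.
imputed = s.mask(is_outlier, s[~is_outlier].median())

# LOCF: blank out outliers, then carry the last valid value forward.
locf = s.mask(is_outlier).ffill()

print(pd.DataFrame({"raw": s, "capped": capped, "imputed": imputed, "locf": locf}))
```

Capping keeps some trace of the extreme value, while imputation and LOCF discard it entirely; which is preferable depends on whether the outlier carries real signal.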
