AA Library Preparation Flow

An integrated preprocessing tool that combines an interactive UI with a modular Python library to simplify data cleaning and exploration. You can experiment with strategies visually and then apply them directly in code, keeping preprocessing workflows efficient, consistent, and scalable.

1. Installation & Import

pip install aa_prepflow
import pandas as pd
import numpy as np
from aa_prepflow import AAPrepflowCrossSection, AAPrepflowTimeSeries, AAPrepflowPanel, AAPrepflowBase
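The examples in section 2 assume `data_cs`, `data_ts`, and `data_panel` already exist as pandas DataFrames. A minimal sketch of toy inputs; the value columns (`age`, `income`, `sales`) and city labels are hypothetical, chosen only to match the `col_time='date'` and `group_by='city'` arguments used there:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions,
# not names required by aa_prepflow.

# Cross-section: one row per unit, no time dimension.
data_cs = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "income": [3.2, np.nan, 4.1, 5.0, 120.0],  # 120.0 is a deliberate outlier
})

# Time series: a single series with a 'date' column.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
data_ts = pd.DataFrame({
    "date": dates,
    "sales": [10.0, 11.0, np.nan, 13.0, 95.0, 15.0],
})

# Panel: the same dates repeated for each 'city'.
data_panel = pd.DataFrame({
    "date": list(dates) * 2,
    "city": np.repeat(["Jakarta", "Bandung"], len(dates)),
    "sales": [10.0, np.nan, 12.0, 13.0, 14.0, 15.0,
              np.nan, 21.0, 22.0, np.nan, 24.0, 25.0],
})
```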

2. API Usage

A. Direct Method (Quick)

Use this method for quick exploration. It combines fit and transform in a single call on the data passed when the flow is initialized.

# 1. Initialize Flow based on data type
flow_cs = AAPrepflowCrossSection(data_cs)
flow_ts = AAPrepflowTimeSeries(data_ts, col_time='date')
flow_panel = AAPrepflowPanel(data_panel, col_time='date', group_by='city')

# 2. Handle missing values (replace <method_name> with a name from table 4A)
cs_cleaned = flow_cs.mv.clean_<method_name>()

# 3. Handle outliers (replace <method_name> with a name from table 4B)
ts_cleaned = flow_ts.outlier.apply_<method_name>()
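The `group_by` argument matters for panel data because fills should not cross unit boundaries. A minimal pandas sketch, assuming LOCF as the strategy and a hypothetical two-city panel; it illustrates the idea and is not aa_prepflow's internal code:

```python
import numpy as np
import pandas as pd

# Hypothetical panel: two cities observed on the same three dates.
panel = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "city": ["A"] * 3 + ["B"] * 3,
    "sales": [10.0, 11.0, 12.0, np.nan, 21.0, 22.0],
})
panel = panel.sort_values(["city", "date"]).reset_index(drop=True)

# Naive LOCF ignores the city boundary: B's first NaN inherits A's last value.
naive = panel["sales"].ffill()

# Group-wise LOCF only carries values forward within the same city,
# so B's first value has no earlier observation and stays NaN.
grouped = panel.groupby("city")["sales"].ffill()
```

Here `naive` fills city B's first row with 12.0 from city A, while `grouped` leaves it missing, which is the behaviour you usually want for panel data.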

B. Fit/Transform Method (Production Ready)

Use this pattern in production pipelines: fit once on the training data, then transform the training and test data separately with the learned parameters, so no information from the test set leaks into preprocessing.

# 1. Initialize Flow with TRAINING data
flow = AAPrepflowTimeSeries(data_train_ts, col_time='date')

# 2. FIT on TRAINING data
# Learn parameters from the training data (e.g. medians, IQR bounds)
flow.mv.fit_<method_name>()
flow.outlier.fit_<method_name>()

# 3. TRANSFORM on Train and Test data
# Apply the learned parameters

# --- Transform training data ---
# (no argument: uses the training data the flow was fitted on)
cleaned_train = flow.mv.transform()
# replace <handler_name> with capping, imputation, or locf (see table 4B, Step 2)
cleaned_train = flow.outlier.transform_<handler_name>(cleaned_train)

# --- Transform test data ---
# (pass the external test DataFrame explicitly)
cleaned_test = flow.mv.transform(data_test_ts)
cleaned_test = flow.outlier.transform_<handler_name>(cleaned_test)
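The fit/transform split matters because statistics such as a median must come from the training data only. A minimal sketch of the idea in plain pandas; `MedianImputerSketch` is a hypothetical class, not part of aa_prepflow, and its `fit`/`transform` names only mirror the pattern above:

```python
import numpy as np
import pandas as pd


class MedianImputerSketch:
    """Hypothetical fit/transform imputer: learns per-column medians on train only."""

    def fit(self, df: pd.DataFrame) -> "MedianImputerSketch":
        # Learn the parameters (one median per numeric column) from training data.
        self.medians_ = df.select_dtypes("number").median()
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Reuse the training medians; test data never influences the fill values.
        return df.fillna(self.medians_)


train = pd.DataFrame({"sales": [10.0, np.nan, 30.0]})
test = pd.DataFrame({"sales": [np.nan, 1000.0]})

imputer = MedianImputerSketch().fit(train)
clean_train = imputer.transform(train)
clean_test = imputer.transform(test)
```

The test set's missing value is filled with 20.0, the training median; computing the median on the test set itself would have produced 1000.0.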

3. API Interactive Lab (Gradio)

You can launch an interactive lab to visually test strategies.

A. Lab with DataFrame

This method will directly load your data into the Gradio application.

# Initialize with data, group_by, and col_time
lab = AAPrepflowBase(data=df_panel, group_by='City', col_time='Date')

# Launch Interactive Lab
lab.flow_lab()

B. Lab without DataFrame

This method will launch an empty Gradio application, and you can upload a CSV file from the browser.

# Initialize AAPrepflowBase
lab = AAPrepflowBase()

# Launch Interactive Lab
lab.flow_lab()

4. API Reference: <method_name>

Use the method names from the following tables to replace <...> in the code examples above.

A. Missing Value (.mv)

| Data Type | Method (Quick) | Method (Fit) | Strategy Description |
|---|---|---|---|
| Cross-Section | clean_listwise_deletion | fit_listwise_deletion | Remove rows with NA. |
| Cross-Section | clean_mean_median_mode_imputer | fit_mean_median_mode_imputer | Fill with mean, median, or mode. |
| Cross-Section | clean_hot_deck_imputer | fit_hot_deck_imputer | Fill with a random sample. |
| Cross-Section | clean_regression_imputer | fit_regression_imputer | Fill with regression prediction. |
| Cross-Section | clean_knn_imputer | fit_knn_imputer | Fill with K-Nearest Neighbors. |
| Time-Series / Panel | clean_locf | fit_locf | Last Observation Carried Forward. |
| Time-Series / Panel | clean_nocb | fit_nocb | Next Observation Carried Backward. |
| Time-Series / Panel | clean_interpolation | fit_interpolation | Interpolation (e.g. linear). |
| Time-Series / Panel | clean_seasonal_imputer | fit_seasonal_imputer | Fill with seasonal average. |
| Time-Series / Panel | clean_arima_imputer | fit_arima_imputer | Fill with ARIMA prediction. |
| Time-Series / Panel | clean_kalman_imputer | fit_kalman_imputer | Fill with Kalman Filter. |
| Time-Series / Panel | clean_moving_average_imputer | fit_moving_average_imputer | Fill with a moving average. |
| Time-Series / Panel | clean_cagr_imputer | fit_cagr_imputer | Fill with geometric interpolation. |
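Several time-series rows in the table correspond to standard pandas operations. A minimal sketch, assuming a single positive numeric series; `cagr_fill` is a hypothetical helper showing what "geometric interpolation" means (linear interpolation of log-values, i.e. a constant growth rate across the gap), not the library's implementation:

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, np.nan, np.nan, 133.1])

locf = s.ffill()                         # [100, 100, 100, 133.1]
nocb = s.bfill()                         # [100, 133.1, 133.1, 133.1]
linear = s.interpolate(method="linear")  # equal additive steps across the gap


def cagr_fill(series: pd.Series) -> pd.Series:
    """Hypothetical geometric (constant-growth-rate) fill; values must be > 0."""
    # Interpolating log-values linearly gives a constant multiplicative growth rate.
    return np.exp(np.log(series).interpolate(method="linear"))


geometric = cagr_fill(s)  # about [100, 110, 121, 133.1]
```

Linear interpolation adds the same amount at each step, while the geometric fill multiplies by the same factor (here 1.1), which suits quantities that grow at a roughly constant rate.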

B. Outlier (.outlier)

Step 1: Detection (Quick apply_... or Fit fit_...)

| Data Type | Method (Quick) | Method (Fit) | Detection Description |
|---|---|---|---|
| Cross-Section | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Cross-Section | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_iqr_... | fit_iqr | Interquartile Range (IQR). |
| Time-Series / Panel | apply_zscore_... | fit_zscore | Z-Score (Standard Deviation). |
| Time-Series / Panel | apply_rolling_iqr_... | fit_rolling_iqr | Rolling IQR (adaptive). |
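The three detection rules reduce to computing lower and upper bounds. A minimal pandas sketch, assuming Tukey's k = 1.5 for IQR and a threshold of 3 for z-scores; the function names are hypothetical and only illustrate the techniques in the table:

```python
import pandas as pd


def fit_iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    # Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the conventional default.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fit_zscore_bounds(s: pd.Series, z: float = 3.0) -> tuple[float, float]:
    # Mean +/- z standard deviations (pandas uses the sample std, ddof=1).
    mu, sigma = s.mean(), s.std()
    return mu - z * sigma, mu + z * sigma


def rolling_iqr_bounds(s: pd.Series, window: int = 5, k: float = 1.5):
    # Adaptive variant: quartiles from a trailing window, so the bounds
    # follow trends and level shifts instead of using one global range.
    roll = s.rolling(window, min_periods=1)
    q1, q3 = roll.quantile(0.25), roll.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# "Fit" on training data only; the bounds would then be reused on test data.
train = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 95.0])
lo_iqr, hi_iqr = fit_iqr_bounds(train)
iqr_flags = (train < lo_iqr) | (train > hi_iqr)

lo_z, hi_z = fit_zscore_bounds(train)
z_flags = (train < lo_z) | (train > hi_z)

roll_lo, roll_hi = rolling_iqr_bounds(train)
```

On this six-point sample the IQR fences are [9.0, 15.0] and flag only 95.0, while the z-score rule flags nothing: with n points and the sample standard deviation, no |z| can exceed (n - 1)/√n, about 2.04 for n = 6, so a threshold of 3 is unreachable.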

Step 2: Handling (Quick apply_<detector>_<handler>, e.g. apply_iqr_capping, or Transform transform_<handler>)

| Method (Quick) | Method (Transform) | Handling Description |
|---|---|---|
| apply_..._capping | transform_capping | Replace outlier with the boundary value. |
| apply_..._imputation | transform_imputation | Replace outlier with 'mean'/'median'. |
| apply_..._locf | transform_locf | Replace outlier with the last valid value. |
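Given detection bounds, the three handling strategies differ only in what replaces a flagged value. A minimal pandas sketch with hypothetical bounds `lower=9.0` and `upper=15.0` (as an IQR fit might produce); it illustrates the strategies, not aa_prepflow's internals:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 95.0, 13.0, -40.0, 12.0])
lower, upper = 9.0, 15.0  # hypothetical bounds learned on training data
is_outlier = (s < lower) | (s > upper)

# Capping: clip flagged values to the nearest boundary.
capped = s.clip(lower=lower, upper=upper)

# Imputation: replace flagged values with the median of the non-flagged values.
# (In a fit/transform pipeline this median would be learned on training data.)
imputed = s.mask(is_outlier, s[~is_outlier].median())

# LOCF: blank out flagged values, then carry the last valid value forward.
# A flagged first value has nothing to carry forward and would stay NaN.
locf = s.mask(is_outlier).ffill()
```

Capping keeps the direction of the extreme value (95.0 becomes 15.0, -40.0 becomes 9.0), whereas imputation and LOCF discard it entirely.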

Download files

Source Distribution

aa_prepflow-0.1.1.tar.gz (54.2 kB)

Built Distribution

aa_prepflow-0.1.1-py3-none-any.whl (58.6 kB)

File details

Details for the file aa_prepflow-0.1.1.tar.gz.

File metadata

  • Download URL: aa_prepflow-0.1.1.tar.gz
  • Upload date:
  • Size: 54.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aa_prepflow-0.1.1.tar.gz
Algorithm Hash digest
SHA256 1b814623a4512fc60172a2475673d08e5149875234ef09316fd94b28995683b7
MD5 90e5fa6e5695a9debc881338b18d4a52
BLAKE2b-256 06455b9d91dd7f3b083508256e57fd19a0d0669b44a9e4a64730edbc00b2bd86

File details

Details for the file aa_prepflow-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: aa_prepflow-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 58.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aa_prepflow-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f7291a711b92b9321160f1cce801276f621a221bd91ca4f64bfc70553fd5bf80
MD5 b5ecc1dddd5c5191539d21cecd3def66
BLAKE2b-256 6bb1a81547570a67cabec68523a3f427dfb3c846deb355b08a5e5db204bda5a2
