
An end-to-end Python library for automated data preprocessing and model selection, designed to streamline ML workflows


AutoFlowML

Automated data preprocessing and model selection for production and analytics.

Overview

AutoFlowML automates the tedious parts of machine learning:

Data Cleaning (missing values, outliers, duplicates)

Feature Engineering (encoding, scaling, selection)

Model Selection (AutoML with optimized hyperparameters)

Built for data scientists who want to focus on insights, not boilerplate code.


Installation

pip install autoflowml

Getting Started

from autoflowml import CleanIt, NullFixer, run_tiny_automl
import pandas as pd

# Load your dataset
df = pd.read_csv("data.csv")

# Step 1: Clean the data
df_clean = CleanIt(df).full_clean()

# Step 2: Fix missing values
df_imputed = NullFixer(df_clean).nullfix_knn(n_neighbors=5)

# Step 3: Train the best model
run_tiny_automl(df_imputed, target_column="target", problem_type="regression")
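
To make Step 2 less of a black box, here is a minimal sketch of KNN imputation using scikit-learn's KNNImputer. It only illustrates the technique; whether nullfix_knn uses KNNImputer internally is an assumption, and the toy columns are made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy numeric frame with gaps; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, 27.0, np.nan, 40.0, 42.0],
    "income": [30.0, 32.0, 31.0, np.nan, 82.0],
})

# Each missing value is replaced by the mean of that column over the
# n_neighbors rows that are closest on the columns both rows have observed.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)
```

KNN imputation works on numeric columns, so categorical columns generally need to be encoded or set aside first.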

More Features

AutoFlowML is built for production. Here's how to remove outliers, encode features, and save models.

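# Continues from Getting Started: df_imputed is the imputed DataFrame built there.
from autoflowml import AutoOutlier, CategoricalMaster, run_tiny_automl
from joblib import dump, load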

# Step 1: Outlier removal
df_no_outliers = AutoOutlier(method="isolation_forest").fit_transform(df_imputed)

# Step 2: Categorical encoding
encoded_df = CategoricalMaster(df_no_outliers, target_column="target").encode_auto()

# Step 3: Train the model
model = run_tiny_automl(encoded_df, target_column="target", problem_type="classification")

# Step 4: Save the model
dump(model, "best_model.pkl")

# Step 5: Load and predict
model = load("best_model.pkl")
predictions = model.predict(encoded_df.drop("target", axis=1))
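
For readers curious what the "isolation_forest" method in Step 1 does conceptually, the sketch below removes outliers with scikit-learn's IsolationForest. It is a stand-in for illustration, not AutoOutlier's actual implementation; the synthetic data and the contamination value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 well-behaved points plus one obvious anomaly.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(0, 1, 200), "y": rng.normal(0, 1, 200)})
df.loc[len(df)] = [12.0, -11.0]  # appended at index 200

# contamination is the expected share of outliers (an assumed value here).
forest = IsolationForest(contamination=0.01, random_state=0)
labels = forest.fit_predict(df)  # 1 = inlier, -1 = outlier

# Keep only the rows the forest labels as inliers.
df_no_outliers = df[labels == 1]
```

Isolation Forest flags the points that random splits isolate most quickly, so contamination directly controls how many rows are dropped.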

Third-Party Integrations

AutoFlowML plays well with popular ML libraries:

  • evalml for automated model selection and hyperparameter tuning
  • category_encoders for powerful encoding techniques
  • Fully compatible with scikit-learn pipelines
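
As a rough illustration of the scikit-learn pipeline point, one generic way to put a DataFrame-in, DataFrame-out cleaning step into a Pipeline is FunctionTransformer. The clean_frame function below is a hypothetical stand-in, not an AutoFlowML function, and the data is made up.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def clean_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: fill numeric gaps with column medians.

    This is stateless, so medians are recomputed on whatever data is passed in.
    """
    return frame.fillna(frame.median(numeric_only=True))


pipeline = Pipeline([
    ("clean", FunctionTransformer(clean_frame)),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

X = pd.DataFrame({
    "a": [0.1, 0.2, None, 0.9, 1.0, 1.1],
    "b": [1.0, 1.2, 0.9, 3.0, None, 3.2],
})
y = [0, 0, 0, 1, 1, 1]

pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Steps that drop rows (duplicate or outlier removal) change X without changing y, so they are better run before the pipeline than inside it.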

Feature Highlights

Analytics + Cleaning

  • Rename messy columns
  • Remove duplicates
  • Fix null values with KNN, MICE, or time-aware methods
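
The first two items above can be sketched in plain pandas. This is an illustration of the idea, not AutoFlowML's code; tidy_column and its naming rule are assumptions made for the example.

```python
import re

import pandas as pd


def tidy_column(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumerics to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


df = pd.DataFrame({
    " Customer ID ": [1, 1, 2],
    "Total-Spend ($)": [9.5, 9.5, 20.0],
})

# Rename every column, then drop fully identical rows.
df = df.rename(columns=tidy_column).drop_duplicates().reset_index(drop=True)
```

After this runs, the columns are customer_id and total_spend, and the repeated first row appears once.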

Outlier Detection

  • Supports Z-score, IQR, Isolation Forest
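
For reference, the Z-score and IQR rules can be written in a few lines of pandas. This is a sketch of the standard definitions rather than the library's internals; the thresholds (2.0 standard deviations, 1.5 times the IQR) and the sample values are assumptions.

```python
import pandas as pd

values = pd.Series([10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 50.0])

# Z-score rule: flag points far from the mean, measured in standard deviations.
z_scores = (values - values.mean()) / values.std(ddof=0)
z_outliers = z_scores.abs() > 2.0

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Both masks flag only the last value (50.0) in this sample.
cleaned = values[~iqr_outliers]
```

The Z-score rule is sensitive to the outliers themselves because they inflate the mean and standard deviation, which is why the IQR rule is often preferred on small or skewed samples.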

Encoding Support

  • One-Hot, Binary, and Target encoding built-in
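
To show what two of these encodings produce, here is a pandas-only sketch of one-hot and target (mean) encoding. It does not use AutoFlowML or category_encoders, and the city and target columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Paris"],
    "target": [1, 0, 1, 0, 1, 0],
})

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Target (mean) encoding: replace each category with the mean target
# observed for that category.
city_means = df.groupby("city")["target"].mean()
df["city_target_enc"] = df["city"].map(city_means)
```

In real use, target-encoding statistics should be computed on the training split only (often with smoothing), otherwise the target leaks into the features.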

AutoML

  • Regression and Classification
  • Feature selection and model tuning with evalml

Ready for Deployment

  • Serialize models using joblib
  • Predict on individual rows or full DataFrames
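
The round trip below sketches both deployment points with joblib and a scikit-learn estimator standing in for the model returned by run_tiny_automl (assuming that model exposes a scikit-learn style predict). The data and file location are illustrative.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [0.0, 1.0, 0.0, 1.0]})
y = 2.0 * X["x1"] + 3.0 * X["x2"]
model = LinearRegression().fit(X, y)

# Save and reload the fitted model (a temporary directory keeps the demo tidy).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_model.pkl")
    dump(model, path)
    restored = load(path)

# Full DataFrame prediction.
batch_predictions = restored.predict(X)

# Single-row prediction: X.iloc[[0]] (double brackets) keeps a one-row
# DataFrame, whereas X.iloc[0] would return a 1-D Series that predict rejects.
single_prediction = restored.predict(X.iloc[[0]])
```

Only load joblib or pickle files from sources you trust, since unpickling can execute arbitrary code.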

Feature Learning (Advanced)

Planned: use a deep-learning model to learn features, then feed them to a traditional model, which can improve accuracy on some datasets. The intended switch is feature_learning=True in run_tiny_automl(); it is not available yet (coming soon).


Categorical Ensembling (Coming Soon)

Planned: train_categorical_ensemble() will train one model per category value, for use cases such as one model per region, store, or customer.
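
Until that lands, the idea can be sketched by hand with a pandas groupby and one scikit-learn model per group. This is not the future train_categorical_ensemble() API; predict_row, the column names, and the data are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "ad_spend": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "sales": [2.0, 4.0, 6.0, 10.0, 9.0, 8.0],
})

# Fit an independent model on each region's rows.
models = {
    region: LinearRegression().fit(group[["ad_spend"]], group["sales"])
    for region, group in df.groupby("region")
}


def predict_row(region: str, ad_spend: float) -> float:
    """Route a single observation to the model trained for its region."""
    features = pd.DataFrame({"ad_spend": [ad_spend]})
    return float(models[region].predict(features)[0])
```

A category that was not seen during training raises a KeyError here; a common design choice is to fall back to a single global model in that case.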


Documentation

Full documentation coming soon. Until then, refer to examples in this README or explore the API directly.


What This Project Automates

  • Column cleanup, outlier detection, and data imputation
  • Encoding categorical variables
  • Automated model selection and evaluation
  • Feature importance and selection
  • Serialization for production

Running Tests

If contributing or debugging, run:

pytest tests/

Why AutoFlowML?

  • Saves hours of preprocessing
  • Production-ready pipelines
  • Transparent, extensible design
  • Fast single-row predictions
