An end-to-end Python library for automated data preprocessing and model selection, designed to streamline ML workflows
AutoFlowML
Automated data preprocessing and model selection for production and analytics.
Overview
AutoFlowML automates the tedious parts of machine learning:
- Data Cleaning (missing values, outliers, duplicates)
- Feature Engineering (encoding, scaling, selection)
- Model Selection (AutoML with optimized hyperparameters)
Built for data scientists who want to focus on insights, not boilerplate code.
Installation
pip install autoflowml
Getting Started
from autoflowml import CleanIt, NullFixer, run_tiny_automl
import pandas as pd
# Load your dataset
df = pd.read_csv("data.csv")
# Step 1: Clean the data
df_clean = CleanIt(df).full_clean()
# Step 2: Fix missing values
df_imputed = NullFixer(df_clean).nullfix_knn(n_neighbors=5)
# Step 3: Train the best model
run_tiny_automl(df_imputed, target_column="target", problem_type="regression")
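Step 2 above fills missing values with a KNN-based method. How `nullfix_knn` works internally is not documented here, so the sketch below uses scikit-learn's `KNNImputer` as a stand-in to illustrate the general technique. Treating the two as similar is an assumption, and the toy DataFrame is made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with one missing value in column "b" (illustrative data only).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, np.nan, 40.0]})

# Each missing entry is replaced by the mean of that column over the
# n_neighbors closest rows, with distance measured on the observed columns.
imputer = KNNImputer(n_neighbors=2)
df_imputed = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(df_imputed)
```

Here the row with `a = 3` borrows from its two nearest rows (`a = 2` and `a = 4`), so the gap in `b` is filled with their average.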
Show Off More Features!
AutoFlowML is built for production. Here's how to remove outliers, encode features, and save models.
from autoflowml import AutoOutlier, CategoricalMaster
from joblib import dump, load
# Step 1: Outlier removal
df_no_outliers = AutoOutlier(method="isolation_forest").fit_transform(df_imputed)
# Step 2: Categorical encoding
encoded_df = CategoricalMaster(df_no_outliers, target_column="target").encode_auto()
# Step 3: Train the model
model = run_tiny_automl(encoded_df, target_column="target", problem_type="classification")
# Step 4: Save the model
dump(model, "best_model.pkl")
# Step 5: Load and predict
model = load("best_model.pkl")
predictions = model.predict(encoded_df.drop("target", axis=1))
3rd Party Integrations
AutoFlowML plays well with popular ML libraries:
- evalml for automated model selection and hyperparameter tuning
- category_encoders for powerful encoding techniques
- Fully compatible with scikit-learn pipelines
Feature Highlights
Analytics + Cleaning
- Rename messy columns
- Remove duplicates
- Fix null values with KNN, MICE, or time-aware methods
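As a rough illustration of the first two cleaning steps listed above, the following sketch normalizes column names and drops exact duplicate rows with plain pandas. The `tidy_columns` helper is hypothetical and is not part of AutoFlowML's API; it only shows the kind of work such a step performs.

```python
import re

import pandas as pd


def tidy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: snake_case column names and drop duplicate rows."""

    def tidy(name: str) -> str:
        # Collapse every run of non-alphanumeric characters into one underscore.
        name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
        return name.strip("_").lower()

    return df.rename(columns=tidy).drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame(
    {" First Name ": ["Ann", "Bob", "Ann"], "Total($)": [10, 20, 10]}
)
clean = tidy_columns(raw)
print(clean)
```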
Outlier Detection
- Supports Z-score, IQR, Isolation Forest
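To make the first two methods concrete, here is a minimal pandas sketch of the IQR and Z-score rules. The function names, the default thresholds (`k=1.5`, `threshold=3.0`), and the sample data are assumptions chosen for illustration; they are not taken from AutoFlowML.

```python
import pandas as pd


def iqr_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - k * iqr, q3 + k * iqr)


def zscore_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |z| <= threshold, using the sample standard deviation."""
    z = (values - values.mean()) / values.std()
    return z.abs() <= threshold


prices = pd.Series([10, 12, 11, 13, 12, 11, 200])
kept = prices[iqr_mask(prices)]
print(kept.tolist())
```

On this seven-value sample the IQR rule drops the 200, while the Z-score rule with a threshold of 3 keeps it, because a single extreme value also inflates the standard deviation. That difference is one reason to compare methods on small datasets.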
Encoding Support
- One-Hot, Binary, and Target encoding built-in
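For readers unfamiliar with these encodings, the sketch below shows one-hot and target (mean) encoding with plain pandas. It does not use `CategoricalMaster`, whose internals are not documented here, and the column names and data are invented for the example. Binary encoding is omitted for brevity.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Paris", "Lyon", "Paris", "Nice"],
        "target": [1, 0, 1, 0],
    }
)

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Target (mean) encoding: replace each category with the mean target seen
# for it. Computed on the full frame here for brevity; in practice, fit it
# on training data only to avoid leaking the target.
city_means = df.groupby("city")["target"].mean()
df["city_target_enc"] = df["city"].map(city_means)

print(one_hot)
print(df)
```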
AutoML
- Regression and Classification
- Feature selection and model tuning with evalml
Ready for Deployment
- Serialize models using joblib
- Predict on individual rows or full DataFrames
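The deployment points above can be sketched with joblib and a scikit-learn estimator. The `LogisticRegression` model and the toy data are stand-ins chosen for the example, on the assumption that the object returned by `run_tiny_automl` exposes a scikit-learn-style `predict` method as the earlier snippet suggests.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Stand-in model and data (illustrative only).
X = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Persist the fitted model, then load it back.
path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
dump(model, path)
restored = load(path)

# Full-DataFrame prediction.
batch_preds = restored.predict(X)

# Single-row prediction: keep the row two-dimensional (a one-row DataFrame),
# since scikit-learn estimators expect shape (n_samples, n_features).
row_pred = restored.predict(X.iloc[[0]])

print(batch_preds, row_pred)
```

Selecting with `X.iloc[[0]]` rather than `X.iloc[0]` keeps the column names and the 2-D shape, which avoids a common shape error when predicting on one row.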
Feature Learning (Advanced)
Deep learning can extract features that are then combined with traditional models, which may improve performance. This is planned as a feature_learning=True option in run_tiny_automl() and is not yet available.
Categorical Ensembling (Coming Soon)
train_categorical_ensemble() will train one model per category, which suits use cases such as one model per region, store, or customer.
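Until that feature ships, the idea can be approximated by hand. The sketch below fits one scikit-learn `LinearRegression` per category value; `fit_per_category` is a hypothetical helper, not the planned AutoFlowML function, and the sales data is invented for the example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_per_category(df, category_column, target_column):
    """Hypothetical helper: fit one LinearRegression per category value."""
    models = {}
    for value, group in df.groupby(category_column):
        X = group.drop(columns=[category_column, target_column])
        models[value] = LinearRegression().fit(X, group[target_column])
    return models


sales = pd.DataFrame(
    {
        "store": ["A", "A", "A", "B", "B", "B"],
        "footfall": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        "revenue": [2.0, 4.0, 6.0, 10.0, 20.0, 30.0],
    }
)
models = fit_per_category(sales, "store", "revenue")

# Route a new observation to the model trained for its category.
new_row = pd.DataFrame({"footfall": [4.0]})
pred_a = models["A"].predict(new_row)[0]
pred_b = models["B"].predict(new_row)[0]
print(pred_a, pred_b)
```

Each store gets its own slope, so the same footfall yields different revenue predictions depending on which per-store model handles the row.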
Documentation
Full documentation coming soon. Until then, refer to examples in this README or explore the API directly.
What This Project Automates
- Column cleanup, outlier detection, and data imputation
- Encoding categorical variables
- Automated model selection and evaluation
- Feature importance and selection
- Serialization for production
Running Tests
If contributing or debugging, run:
pytest tests/
Why AutoFlowML?
- Saves hours of preprocessing
- Production-ready pipelines
- Transparent, extensible design
- Lightning-fast single-row predictions
File details
Details for the file autoflowml-0.1.0.tar.gz.
File metadata
- Download URL: autoflowml-0.1.0.tar.gz
- Upload date:
- Size: 9.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e45aae25da01d9c9a58bbcc4b0b70717f648f267f847f90fd3c45336dca0611 |
| MD5 | d9c650c97cd628be1e10cbabb3f309a6 |
| BLAKE2b-256 | 5d2ec0f07b3ce7022c5f5bbd0453a10c787f234ce1fbf1d47f729247942ba3aa |
File details
Details for the file autoflowml-0.1.0-py3-none-any.whl.
File metadata
- Download URL: autoflowml-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc96d56e7ea1103f1676dcee4b57c0170b26a7af229a37873efe15c9ff71ce70 |
| MD5 | 93a9fca837720136b7b758a9db9cb6b3 |
| BLAKE2b-256 | ab9bbe3e47032231bfaf19981869676a4667cb1878d27a3c837c97cebd51f1dc |