
Model Selection Tool


🧠 Universal ML Model Explorer Pro


A one-line ML pipeline that automatically preprocesses, trains, evaluates, compares, and visualizes multiple machine learning models with a single command.

🚀 Features

  • Auto detection: Classification or Regression
  • Auto preprocessing: Scaling, Encoding, Imputation, PCA
  • Parallel model training on all cores
  • SHAP interpretability plots
  • Beautiful visual reports (Confusion Matrix, ROC, Residuals, etc.)
  • CLI + Notebook compatible

📦 Installation

pip install -r requirements.txt

🧪 CLI Usage

python main.py path/to/dataset.csv target_column_name

Optional flags:

  • --output_dir: Folder to save results (default: results)
  • --pca_components: Number of PCA components to apply to numeric features
  • --no_shap: Disable SHAP plot (faster)

🧬 Python Usage

from lazybrains import run_pipeline_in_notebook

run_pipeline_in_notebook(
    dataset_path="data.csv",
    target_column="target",
    pca_components=5,
    no_shap=False
)

📂 Output

  • best_model.pkl: Trained model
  • Plots: Confusion Matrix, ROC, Residuals, SHAP
  • model_report.txt: Full model comparison

๐Ÿ› ๏ธ Supported Models

  • Linear, Tree-based, Ensemble (RF, GB, AdaBoost, XGBoost), KNN, SVM, Stacking
  • Automatic selection of the best model based on Accuracy / R²
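The selection rule is simple in spirit: train every candidate, score each on held-out data, keep the winner. A minimal sketch of that idea using generic scikit-learn estimators (illustrative only, not the library's internal code or its exact model set):

```python
# Train several candidate models and keep the one with the best
# held-out accuracy -- the core idea behind automatic model selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Fit each model on the training split, score it on the test split
scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

For regression the same loop would compare R² instead of accuracy, since `score()` already returns R² for sklearn regressors.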

Run this in your terminal to install all dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib rich

๐Ÿ” AutoFeatSelect

A Lightweight Python Library for Automatic Feature Selection
Smart. Fast. Interpretable.


🚀 What is AutoFeatSelect?

AutoFeatSelect is a fully automated feature selection tool that cleans your dataset by removing irrelevant, redundant, or low-value features, all with just one line of code. Whether you're building a classification or a regression model, it helps you improve model performance and training speed without the hassle of manual preprocessing.


✨ Why AutoFeatSelect is Cool

  • ✅ Zero manual inspection: it decides what to drop based on well-defined statistical criteria.
  • 🔄 Handles both numeric & categorical features
  • 📉 Drops features using:
    • Missing value ratio
    • Low variance
    • Correlation (pairwise & clustered)
    • VIF (multicollinearity)
    • Mutual Information
    • Tree-based feature importance
  • 📄 Detailed drop report (feature + reason)
  • 🪶 Lightweight: only uses pandas, numpy, scikit-learn, statsmodels, scipy

📦 Installation

pip install -U pandas numpy scikit-learn statsmodels scipy

๐Ÿ› ๏ธ How to Use

from lazybrains import AutoFeatSelect

selector = AutoFeatSelect(
    target_col='target',     # Optional if you want supervised feature selection
    verbose=True             # Optional for progress logs
)

# Fit + transform in one line
df_cleaned = selector.fit_transform(df, drop=True)

# Or separately
selector.fit(df)
df_cleaned = selector.transform(df)

# See what got dropped and why
report = selector.get_report()
print(report)

🧠 When to Use

  • Before training ML models, especially with many features
  • When data has potential noise, ID columns, or redundancy
  • To reduce overfitting and improve model interpretability
  • During automated pipelines or pre-model sanity checks

๐Ÿ“ Example Output

[AutoFeatSelect] Running: Drop high missing values...
[AutoFeatSelect]   Dropped: ['unimportant_column']
[AutoFeatSelect] Running: Drop single value columns...
[AutoFeatSelect]   Dropped: ['constant_feature']
...
[AutoFeatSelect] Finished selection. Kept 22 out of 48 features.

📊 Feature Drop Criteria

  • Missing Ratio: drops features that are mostly null
  • Unique Ratio (ID-like): removes ID-like or row-wise unique columns
  • Variance Threshold: removes constant or near-constant columns
  • Pearson Correlation: drops one of each highly correlated pair
  • Hierarchical Clustering: smarter group-wise redundancy pruning
  • VIF (Variance Inflation): drops multicollinear features
  • Mutual Information: measures each feature's information contribution to the target
  • Tree Importance: uses ExtraTrees to measure predictive signal
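A few of these criteria are easy to sketch directly in pandas. The function and thresholds below are illustrative assumptions, not AutoFeatSelect's implementation or defaults:

```python
# Sketch of three drop criteria: high missing ratio, single-value
# columns, and pairwise correlation, with a feature -> reason report.
import numpy as np
import pandas as pd

def simple_feature_drop(df, missing_thresh=0.5, corr_thresh=0.95):
    report = {}
    # 1) Drop columns that are mostly null
    for col in df.columns:
        if df[col].isna().mean() > missing_thresh:
            report[col] = "high missing ratio"
    # 2) Drop constant (single-value) columns
    for col in df.columns:
        if col not in report and df[col].nunique(dropna=True) <= 1:
            report[col] = "single value"
    # 3) Among the survivors, drop one of each highly correlated pair
    num = df.drop(columns=list(report)).select_dtypes("number")
    corr = num.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    for col in upper.columns:
        if (upper[col] > corr_thresh).any():
            report[col] = "high correlation"
    return df.drop(columns=list(report)), report

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],         # perfectly correlated with "a"
    "c": [1, 1, 1, 1, 1],          # constant
    "d": [None, None, None, 1, 2], # 60% missing
})
cleaned, report = simple_feature_drop(df)
print(report)
```

Note that the order matters: missing and constant columns are removed first so the correlation check only runs on the surviving numeric features.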

๐Ÿ” AutoEDAPro

AutoEDAPro is a powerful, plug-and-play Python library for automated Exploratory Data Analysis (EDA).
It takes a pandas DataFrame and gives you a full, beautiful report, with stats, visuals, and deep insights, either inline (Jupyter) or as an HTML file.


🚀 Features

  • 📦 One-line EDA: pass a DataFrame, get a full analysis
  • 🔍 Detection of missing values, constant features, and outliers
  • 📊 Univariate & bivariate visualizations (histograms, boxplots, KDE, correlation heatmaps)
  • 🎯 Optional target column analysis for classification & regression
  • 📝 HTML report export with optional logging
  • ✅ Jupyter inline display or standalone HTML output
  • ✨ Built using pandas, seaborn, matplotlib, plotly, numpy

📦 Installation

First, make sure you have Python 3.7+

Install required dependencies:

pip install pandas numpy matplotlib seaborn plotly scikit-learn




🧪 Example Usage

from lazybrains import AutoEDA
import seaborn as sns

# Load sample dataset
df = sns.load_dataset('titanic')

# Run EDA inline (Jupyter)
eda = AutoEDA(target_col='survived')
eda.run(df)

# Run EDA and save report as HTML with logging
eda_html = AutoEDA(target_col='survived', save_report=True, enable_logging=True)
eda_html.run(df)

You can also test the library via CLI by running the script directly:

python autoeda.py

It will:

  • Try to load Titanic dataset via seaborn
  • Fall back to a dummy dataset if that fails
  • Run both inline and saved HTML reports

🧠 Parameters

  • target_col (str, default None): target column for supervised EDA
  • save_report (bool, default False): if True, saves the output as HTML
  • output_filename (str, default None): custom filename for the saved HTML
  • enable_logging (bool, default False): if True, creates a log of EDA steps

๐Ÿ“ Output

  • Inline Display: Shows report directly in Jupyter notebooks
  • HTML Report: If save_report=True, saves full interactive report with visualizations

🛠 Structure

Main file: autoeda.py
Main class: AutoEDA

Each report contains:

  1. 📄 DataFrame shape, column types
  2. ❓ Missing values overview
  3. 🔁 Duplicate/constant columns
  4. 📊 Univariate plots for all features
  5. ⚠️ Outlier detection using IQR
  6. 🔗 Bivariate correlation heatmap + pairplots
  7. 🎯 Feature vs. target analysis
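The IQR rule from step 5 flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. A generic sketch of that rule (not AutoEDA's exact code):

```python
# IQR-based outlier detection: anything below Q1 - 1.5*IQR or above
# Q3 + 1.5*IQR is flagged as an outlier.
import pandas as pd

def iqr_outliers(s: pd.Series) -> pd.Series:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (s < lower) | (s > upper)  # boolean mask of outliers

s = pd.Series([10, 12, 11, 13, 12, 300])  # 300 is an obvious outlier
mask = iqr_outliers(s)
print(s[mask].tolist())
```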

โš ๏ธ Notes

  • When run from a script (not Jupyter), the report is saved as HTML for full display.
  • Uses Plotly CDN โ€” make sure you're online for full interactivity.
  • Logging is optional but useful for debugging long processes.

💡 Ideas for the Future

  • Auto feature selection preview
  • Optional modeling report (LazyPredict-style)
  • Model explainability (SHAP, LIME)
  • CLI and web interface

AutoClean 🧼

An advanced, scikit-learn style tabular data preprocessing pipeline.

AutoClean simplifies and automates the process of preparing tabular data for machine learning. From imputing missing values to handling outliers, encoding categoricals, and scaling features, all steps are neatly handled in a single pipeline.


🔧 Features

  • scikit-learn compatible: fit, transform, fit_transform
  • Customizable config-based preprocessing
  • Missing value imputation (mean, median, mode, constant, predictive)
  • Outlier detection and capping (IQR, Z-score)
  • Encoding (OneHot, Ordinal)
  • Feature Scaling (Standard, MinMax, Robust)
  • Detailed transformation summary (with optional Rich UI)

📦 Installation

pip install -r requirements.txt

Make sure scikit-learn, pandas, and numpy are installed. For rich logs:

pip install rich

🚀 Quick Start

from lazybrains import AutoClean
import pandas as pd

# Sample data
df = pd.DataFrame({
    'age': [25, 30, None, 45, 50],
    'salary': [1000, 2000, 300000, 4000, None],
    'gender': ['M', 'F', None, 'F', 'M'],
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore', 'Delhi']
})

# Configuration (optional)
config = {
    'impute': {'age': 'mean', 'gender': 'mode'},
    'outliers': {'salary': {'method': 'iqr', 'capping': True}},
    'encode': {'gender': 'ordinal', 'city': 'ohe'},
    'scale': {'salary': 'StandardScaler'}
}

# Use AutoClean
cleaner = AutoClean(config=config, verbose=True)
cleaned_df = cleaner.fit_transform(df)
print(cleaned_df.head())

โš™๏ธ Configuration Options

config = {
    'impute': {
        'age': 'mean',            # or median, mode, constant, predictive
        'gender': 'mode'
    },
    'outliers': {
        'salary': {'method': 'iqr', 'capping': True}
    },
    'encode': {
        'gender': 'ordinal',      # or 'ohe'
        'city': 'ohe'
    },
    'scale': {
        'salary': 'StandardScaler', # or MinMaxScaler, RobustScaler
    }
}
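To make the config semantics concrete, here is a simplified, hypothetical interpreter (`apply_config` is an illustrative name) for just the `impute` and `encode` keys, written in plain pandas rather than AutoClean's actual implementation:

```python
import pandas as pd

def apply_config(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    out = df.copy()
    # Imputation: fill NaNs per column using the named strategy
    for col, strategy in config.get("impute", {}).items():
        if strategy == "mean":
            out[col] = out[col].fillna(out[col].mean())
        elif strategy == "mode":
            out[col] = out[col].fillna(out[col].mode()[0])
    # Encoding: 'ordinal' -> integer codes, 'ohe' -> one-hot columns
    for col, method in config.get("encode", {}).items():
        if method == "ordinal":
            out[col] = out[col].astype("category").cat.codes
        elif method == "ohe":
            out = pd.get_dummies(out, columns=[col])
    return out

df = pd.DataFrame({
    "age": [25, 30, None],
    "gender": ["M", "F", None],
    "city": ["Delhi", "Mumbai", "Delhi"],
})
cleaned = apply_config(df, {
    "impute": {"age": "mean", "gender": "mode"},
    "encode": {"gender": "ordinal", "city": "ohe"},
})
print(cleaned.columns.tolist())
```

The `outliers` and `scale` keys follow the same column-to-strategy pattern; a full pipeline would dispatch on them the same way.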

✅ Output

  • Transformed DataFrame ready for ML.
  • Rich summary of all preprocessing steps.
  • Compatible with any sklearn pipeline.

🧠 Internals

  • Uses IterativeImputer + RandomForestRegressor for predictive imputation.
  • Rich logging with progress bars using the rich package.
  • Modular & extensible design for future enhancements.
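The predictive-imputation internals described above can be reproduced with scikit-learn directly. A sketch of the general technique, with illustrative settings rather than AutoClean's exact configuration:

```python
# Predictive imputation: each column with missing values is modeled
# from the other columns via a RandomForestRegressor inside
# IterativeImputer (the experimental import must come first).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({
    "age": [25, 30, np.nan, 45, 50],
    "salary": [1000, 2000, 3000, 4000, np.nan],
})
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    random_state=0,
)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed.isna().sum().sum())  # 0: no missing values remain
```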

๐Ÿง‘โ€๐Ÿ’ป Author

Made with โค๏ธ by a passionate Data Scientist.


📄 License

MIT License


Download files

Download the file for your platform.

Source Distribution

lazybrains-4.0.1.tar.gz (27.7 kB)

Uploaded Source

Built Distribution


lazybrains-4.0.1-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file lazybrains-4.0.1.tar.gz.

File metadata

  • Download URL: lazybrains-4.0.1.tar.gz
  • Upload date:
  • Size: 27.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for lazybrains-4.0.1.tar.gz:

  • SHA256: 8489613fba5eef8d89330889f1575b7634f83d9e46a60d789dc360214db14016
  • MD5: 0df326f364eb6fbcec5f9a004438798e
  • BLAKE2b-256: d2fe30d3c87e19bad574775b246e9e676e2b8fdfa91cf09115979a40d1c31c94


File details

Details for the file lazybrains-4.0.1-py3-none-any.whl.

File metadata

  • Download URL: lazybrains-4.0.1-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for lazybrains-4.0.1-py3-none-any.whl:

  • SHA256: 855fb0365dcc055e1562da4c115b68264176efa85f4c905a87ff69550b292c3e
  • MD5: 70315b0233c867e76f962442dd1d56af
  • BLAKE2b-256: 344cb88ec13b3a154682624ceab7955c3104ecbbcc5e6583bbfbfc92d00e8b12

