Model Selection Tool

🧠 Universal ML Model Explorer Pro


One-line ML pipeline that preprocesses, trains, compares, and visualizes the best model, automatically.

Automatically train, evaluate, compare, and visualize multiple machine learning models, all with one command.

🚀 Features

  • Auto detection: Classification or Regression
  • Auto preprocessing: Scaling, Encoding, Imputation, PCA
  • Parallel model training on all cores
  • SHAP interpretability plots
  • Beautiful visual reports (Confusion Matrix, ROC, Residuals, etc.)
  • CLI + Notebook compatible

📦 Installation

pip install -r requirements.txt

🧪 CLI Usage

python main.py path/to/dataset.csv target_column_name

Optional flags:

  • --output_dir: Folder to save results (default: results)
  • --pca_components: Apply PCA on numeric features
  • --no_shap: Disable SHAP plot (faster)
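
For example, to save results to a custom folder, keep 5 principal components, and skip the SHAP plot (assuming --pca_components takes the number of components, mirroring the Python API below):

python main.py path/to/dataset.csv target_column_name --output_dir my_results --pca_components 5 --no_shap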

🧬 Python Usage

from lazybrains import run_pipeline_in_notebook

run_pipeline_in_notebook(
    dataset_path="data.csv",
    target_column="target",
    pca_components=5,
    no_shap=False
)

📂 Output

  • best_model.pkl: Trained model
  • Plots: Confusion Matrix, ROC, Residuals, SHAP
  • model_report.txt: Full model comparison
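
The saved best_model.pkl can be reloaded later for scoring; here is a minimal sketch, assuming the model was persisted with joblib (which is in the dependency list) and that the new data matches the training features:

import joblib
import pandas as pd

# Load the persisted best model from the default results folder
model = joblib.load("results/best_model.pkl")

# Hypothetical new data with the same feature columns used during training;
# it may need the same preprocessing the pipeline applied at training time.
new_data = pd.read_csv("new_samples.csv")
print(model.predict(new_data)[:10])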

๐Ÿ› ๏ธ Supported Models

  • Linear, Tree-based, Ensemble (RF, GB, AdaBoost, XGBoost), KNN, SVM, Stacking
  • Auto selection of best based on Accuracy / Rยฒ
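
The selection step itself boils down to scoring each candidate with the task-appropriate metric and keeping the winner. A minimal sketch of that idea with scikit-learn (an illustration only, not the library's internal code):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Rank candidates by mean cross-validated accuracy and keep the best one
scores = {name: cross_val_score(est, X, y, cv=5).mean() for name, est in candidates.items()}
best_name = max(scores, key=scores.get)
print(f"Best model: {best_name} (accuracy={scores[best_name]:.3f})")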

Run this in your terminal to install all dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib rich

๐Ÿ” AutoFeatSelect

A Lightweight Python Library for Automatic Feature Selection
Smart. Fast. Interpretable.


🚀 What is AutoFeatSelect?

AutoFeatSelect is a fully automated feature selection tool that cleans your dataset by removing irrelevant, redundant, or low-value features, all with just one line of code. Whether you're building a classification or regression model, it helps you improve model performance and training speed without the hassle of manual preprocessing.


✨ Why AutoFeatSelect is Cool

  • ✅ Zero manual inspection: it decides what to drop based on solid math.
  • 🔄 Handles both numeric & categorical features
  • 📉 Drops features using:
    • Missing value ratio
    • Low variance
    • Correlation (pairwise & clustered)
    • VIF (multicollinearity)
    • Mutual Information
    • Tree-based feature importance
  • 📄 Detailed drop report (feature + reason)
  • 🪶 Lightweight: only uses pandas, numpy, scikit-learn, statsmodels, scipy

📦 Installation

pip install -U pandas numpy scikit-learn statsmodels scipy

Clone this repo or copy AutoFeatSelect into your project.


๐Ÿ› ๏ธ How to Use

from lazybrains import AutoFeatSelect

selector = AutoFeatSelect(
    target_col='target',     # Optional: set this to enable supervised feature selection
    verbose=True             # Optional: print progress logs
)

# Fit + transform in one line
df_cleaned = selector.fit_transform(df, drop=True)

# Or separately
selector.fit(df)
df_cleaned = selector.transform(df)

# See what got dropped and why
report = selector.get_report()
print(report)

🧠 When to Use

  • Before training ML models, especially with many features
  • When data has potential noise, ID columns, or redundancy
  • To reduce overfitting and improve model interpretability
  • During automated pipelines or pre-model sanity checks

๐Ÿ“ Example Output

[AutoFeatSelect] Running: Drop high missing values...
[AutoFeatSelect]   Dropped: ['unimportant_column']
[AutoFeatSelect] Running: Drop single value columns...
[AutoFeatSelect]   Dropped: ['constant_feature']
...
[AutoFeatSelect] Finished selection. Kept 22 out of 48 features.

📊 Feature Drop Criteria

  • Missing Ratio: drops features that are mostly null
  • Unique Ratio (ID-like): removes ID-like or row-wise unique columns
  • Variance Threshold: removes constant or near-constant columns
  • Pearson Correlation: drops one of each highly correlated pair
  • Hierarchical Clustering: smarter group-wise redundancy pruning
  • VIF (Variance Inflation Factor): drops multicollinear features
  • Mutual Information: measures each feature's information contribution to the target
  • Tree Importance: uses ExtraTrees to measure signal strength
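
To make the first few criteria concrete, here is a toy pandas version of the missing-ratio, constant-column, and pairwise-correlation checks. It only illustrates the math behind the table; it is not AutoFeatSelect's implementation, and the thresholds are invented:

import numpy as np
import pandas as pd

def toy_feature_drops(df, missing_thresh=0.5, corr_thresh=0.95):
    """Toy illustration of three drop criteria: missing ratio, single value, high correlation."""
    reasons = {}

    # Missing ratio: flag columns that are mostly null
    missing = df.isna().mean()
    reasons.update({col: "missing_ratio" for col in missing[missing > missing_thresh].index})

    # Single-value columns carry no information
    nunique = df.nunique(dropna=True)
    reasons.update({col: "constant" for col in nunique[nunique <= 1].index})

    # Pairwise Pearson correlation: flag one column of each highly correlated pair
    corr = df.select_dtypes("number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    reasons.update({col: "high_correlation" for col in upper.columns if (upper[col] > corr_thresh).any()})

    return reasons

# Example: report which columns the toy rules would drop and why
# print(toy_feature_drops(df))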

๐Ÿค Author

Built by Gemini Version: 1.0.0


โค๏ธ Contribute / Fork

Feel free to fork and extend this library โ€” make it smarter, add plotting, or wrap it into a full AutoML pipeline!


🔓 License

MIT. Use freely, just don't claim it's yours 😄



๐Ÿ” AutoEDAPro

AutoEDAPro is a powerful, plug-and-play Python library for automated Exploratory Data Analysis (EDA).
It takes a pandas DataFrame and gives you a full, beautiful report โ€” with stats, visuals, and deep insights โ€” either inline (Jupyter) or as an HTML file.


🚀 Features

  • 📦 One-line EDA: pass a DataFrame, get a full analysis
  • 🔍 Missing values, constant features, and outlier detection
  • 📊 Univariate & bivariate visualizations (histograms, boxplots, KDE, correlation heatmaps)
  • 🎯 Optional target column analysis for classification & regression
  • 📝 HTML report export with optional logging
  • ✅ Jupyter inline display or standalone HTML output
  • ✨ Built using pandas, seaborn, matplotlib, plotly, numpy

📦 Installation

First, make sure you have Python 3.7+

Install required dependencies:

pip install pandas numpy matplotlib seaborn plotly scikit-learn




🧪 Example Usage

from lazybrains import AutoEDA
import seaborn as sns

# Load sample dataset
df = sns.load_dataset('titanic')

# Run EDA inline (Jupyter)
eda = AutoEDA(target_col='survived')
eda.run(df)

# Run EDA and save report as HTML with logging
eda_html = AutoEDA(target_col='survived', save_report=True, enable_logging=True)
eda_html.run(df)

You can also test the library via CLI by running the script directly:

python autoeda.py

It will:

  • Try to load Titanic dataset via seaborn
  • Fall back to a dummy dataset if that fails
  • Run both inline and saved HTML reports

🧠 Parameters

  • target_col (str, default None): target column for supervised EDA
  • save_report (bool, default False): if True, saves the report as HTML
  • output_filename (str, default None): custom filename for the saved HTML
  • enable_logging (bool, default False): if True, writes a log of the EDA steps
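
For example, to write the report to a specific file with step-by-step logging, using only the parameters listed above:

from lazybrains import AutoEDA

eda = AutoEDA(
    target_col="survived",
    save_report=True,
    output_filename="titanic_report.html",  # example filename; pick any name you like
    enable_logging=True,
)
eda.run(df)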

๐Ÿ“ Output

  • Inline Display: Shows report directly in Jupyter notebooks
  • HTML Report: If save_report=True, saves full interactive report with visualizations

🛠 Structure

Main file: autoeda.py
Main class: AutoEDA

Each report contains:

  1. 📄 DataFrame shape and column types
  2. ❓ Missing values overview
  3. 🔍 Duplicate/constant columns
  4. 📊 Univariate plots for all features
  5. ⚠️ Outlier detection using the IQR rule (sketched after this list)
  6. 🔗 Bivariate correlation heatmap + pairplots
  7. 🎯 Feature vs. target analysis
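
The IQR rule in step 5 flags values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. A minimal sketch of that rule in plain pandas (an illustration, not AutoEDAPro's internal code):

import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the IQR fences."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Example: count outliers in the Titanic 'fare' column
# print(iqr_outliers(df["fare"]).sum())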

โš ๏ธ Notes

  • For full display in script (not Jupyter), report is saved as HTML.
  • Uses Plotly CDN โ€” make sure you're online for full interactivity.
  • Logging is optional but useful for debugging long processes.

📬 License

Free to use and modify. Credits appreciated!


💡 Ideas for the Future

  • Auto feature selection preview
  • Optional modeling report (LazyPredict-style)
  • Model explainability (SHAP, LIME)
  • CLI and web interface

AutoClean 🧼

An advanced, scikit-learn style tabular data preprocessing pipeline.

AutoClean simplifies and automates the process of preparing tabular data for machine learning. From imputing missing values to handling outliers, encoding categoricals, and scaling features, all steps are neatly handled in a single pipeline.


🔧 Features

  • scikit-learn compatible: fit, transform, fit_transform
  • Customizable config-based preprocessing
  • Missing value imputation (mean, median, mode, constant, predictive)
  • Outlier detection and capping (IQR, Z-score)
  • Encoding (OneHot, Ordinal)
  • Feature Scaling (Standard, MinMax, Robust)
  • Detailed transformation summary (with optional Rich UI)

📦 Installation

pip install -r requirements.txt

Make sure scikit-learn, pandas, and numpy are installed. For rich logs:

pip install rich

🚀 Quick Start

from lazybrains import AutoClean
import pandas as pd

# Sample data
df = pd.DataFrame({
    'age': [25, 30, None, 45, 50],
    'salary': [1000, 2000, 300000, 4000, None],
    'gender': ['M', 'F', None, 'F', 'M'],
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore', 'Delhi']
})

# Configuration (optional)
config = {
    'impute': {'age': 'mean', 'gender': 'mode'},
    'outliers': {'salary': {'method': 'iqr', 'capping': True}},
    'encode': {'gender': 'ordinal', 'city': 'ohe'},
    'scale': {'salary': 'StandardScaler'}
}

# Use AutoClean
cleaner = AutoClean(config=config, verbose=True)
cleaned_df = cleaner.fit_transform(df)
print(cleaned_df.head())

โš™๏ธ Configuration Options

config = {
    'impute': {
        'age': 'mean',            # or median, mode, constant, predictive
        'gender': 'mode'
    },
    'outliers': {
        'salary': {'method': 'iqr', 'capping': True}
    },
    'encode': {
        'gender': 'ordinal',      # or 'ohe'
        'city': 'ohe'
    },
    'scale': {
        'salary': 'StandardScaler', # or MinMaxScaler, RobustScaler
    }
}

✅ Output

  • Transformed DataFrame ready for ML.
  • Rich summary of all preprocessing steps.
  • Compatible with any sklearn pipeline.
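
Because AutoClean exposes fit/transform, it can also sit in front of an estimator inside a standard sklearn Pipeline. A minimal sketch, assuming the cleaned output can feed a classifier directly (the config dict is the one from the Quick Start above):

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

from lazybrains import AutoClean

# Preprocess with AutoClean, then fit a model on the cleaned features
pipe = Pipeline([
    ("clean", AutoClean(config=config, verbose=False)),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hypothetical supervised setup: X holds raw feature columns, y the labels
# pipe.fit(X, y)
# predictions = pipe.predict(X_new)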

🧠 Internals

  • Uses IterativeImputer + RandomForestRegressor for predictive imputation.
  • Rich logging with progress bars using the rich package.
  • Modular & extensible design for future enhancements.
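
The predictive-imputation step mentioned above follows a standard scikit-learn pattern; here is a minimal sketch of that pattern on its own (not AutoClean's exact code):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

df_num = pd.DataFrame({
    "age": [25, 30, np.nan, 45, 50],
    "salary": [1000, 2000, 300000, 4000, np.nan],
})

# Each column with missing values is modeled from the other columns by a random forest
imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50, random_state=0), random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df_num), columns=df_num.columns)
print(imputed)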

๐Ÿง‘โ€๐Ÿ’ป Author

Made with โค๏ธ by a passionate Data Scientist.


📄 License

MIT License
