Model Selection Tool
Universal ML Model Explorer Pro
A one-line ML pipeline that automatically preprocesses, trains, evaluates, compares, and visualizes multiple machine learning models, then picks the best one, all with a single command.
Features
- Auto detection: Classification or Regression
- Auto preprocessing: Scaling, Encoding, Imputation, PCA
- Parallel model training on all cores
- SHAP interpretability plots
- Beautiful visual reports (Confusion Matrix, ROC, Residuals, etc.)
- CLI + Notebook compatible
Installation
pip install -r requirements.txt
CLI Usage
python main.py path/to/dataset.csv target_column_name
Optional flags:
- --output_dir: Folder to save results (default: results)
- --pca_components: Apply PCA on numeric features
- --no_shap: Disable SHAP plots (faster)
Python Usage
from lazybrains import run_pipeline_in_notebook
run_pipeline_in_notebook(
    dataset_path="data.csv",
    target_column="target",
    pca_components=5,
    no_shap=False
)
Output
- best_model.pkl: Trained model
- Plots: Confusion Matrix, ROC, Residuals, SHAP
- model_report.txt: Full model comparison
Supported Models
- Linear, Tree-based, Ensemble (RF, GB, AdaBoost, XGBoost), KNN, SVM, Stacking
- Automatic selection of the best model based on Accuracy / R²
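The selection step can be sketched with plain scikit-learn. This is an illustrative stand-in, not the library's actual code; the dataset, candidate models, and variable names here are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit each candidate and score it on the held-out split
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(random_state=42),
}
scores = {name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
          for name, model in candidates.items()}

# Keep the model with the highest accuracy (R² would play this role for regression)
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
```

For regression the same loop would swap `accuracy_score` for `r2_score` and compare on R².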
Run this in your terminal to install all dependencies:
pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib rich
AutoFeatSelect
A Lightweight Python Library for Automatic Feature Selection
Smart. Fast. Interpretable.
What is AutoFeatSelect?
AutoFeatSelect is a fully automated feature selection tool that cleans your dataset by removing irrelevant, redundant, or low-value features, all with just one line of code. Whether you're building a classification or regression model, it helps you improve model performance and training speed without the hassle of manual preprocessing.
Why AutoFeatSelect is Cool
- Zero manual inspection: it decides what to drop based on well-established statistical criteria.
- Handles both numeric & categorical features
- Drops features using:
- Missing value ratio
- Low variance
- Correlation (pairwise & clustered)
- VIF (multicollinearity)
- Mutual Information
- Tree-based feature importance
- Detailed drop report (feature + reason)
- Lightweight: only uses pandas, numpy, scikit-learn, statsmodels, scipy
Installation
pip install -U pandas numpy scikit-learn statsmodels scipy
How to Use
from lazybrains import AutoFeatSelect
selector = AutoFeatSelect(
    target_col='target',  # optional: enables supervised feature selection
    verbose=True          # optional: prints progress logs
)
# Fit + transform in one line
df_cleaned = selector.fit_transform(df, drop=True)
# Or separately
selector.fit(df)
df_cleaned = selector.transform(df)
# See what got dropped and why
report = selector.get_report()
print(report)
When to Use
- Before training ML models, especially with many features
- When data has potential noise, ID columns, or redundancy
- To reduce overfitting and improve model interpretability
- During automated pipelines or pre-model sanity checks
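To make the unsupervised criteria concrete, here is a toy sketch of three of them (missing-value ratio, constant columns, pairwise correlation) in plain pandas. The function name, thresholds, and return shape are illustrative assumptions, not AutoFeatSelect's API:

```python
import numpy as np
import pandas as pd

def simple_feat_select(df, missing_thresh=0.5, corr_thresh=0.95):
    """Toy feature dropper: missing ratio, constant columns, high correlation."""
    report = {}
    # 1. Drop columns that are mostly null
    miss = df.isna().mean()
    for col in miss[miss > missing_thresh].index:
        report[col] = f"missing ratio {miss[col]:.2f}"
    # 2. Drop constant (single-value) columns
    for col in df.columns:
        if col not in report and df[col].nunique(dropna=True) <= 1:
            report[col] = "constant"
    # 3. Drop one of each highly correlated numeric pair
    num = df.drop(columns=list(report)).select_dtypes("number")
    corr = num.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    for col in upper.columns:
        if col not in report and (upper[col] > corr_thresh).any():
            report[col] = "high correlation"
    return df.drop(columns=list(report)), report

# Demo frame: one mostly-null column, one constant, one redundant
demo = pd.DataFrame({
    "mostly_nan": [None, None, None, 1.0],
    "const": [1, 1, 1, 1],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
})
clean, report = simple_feat_select(demo)
print(sorted(report))  # columns that were dropped, with a reason each
```

The real library layers more criteria (VIF, mutual information, tree importance) on top of checks like these.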
Example Output
[AutoFeatSelect] Running: Drop high missing values...
[AutoFeatSelect] Dropped: ['unimportant_column']
[AutoFeatSelect] Running: Drop single value columns...
[AutoFeatSelect] Dropped: ['constant_feature']
...
[AutoFeatSelect] Finished selection. Kept 22 out of 48 features.
Feature Drop Criteria
| Technique | Purpose |
|---|---|
| Missing Ratio | Drops features with mostly nulls |
| Unique Ratio (ID-like) | Removes fake IDs or row-wise unique cols |
| Variance Threshold | Removes constant or near-constant cols |
| Pearson Correlation | Drops highly correlated pairs |
| Hierarchical Clustering | Smarter groupwise redundancy pruning |
| VIF (Variance Inflation) | Drops multicollinear features |
| Mutual Information | Measures info contribution to target |
| Tree Importance | Uses ExtraTrees to measure signal power |
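The two supervised criteria in the table, Mutual Information and tree-based importance, can be sketched with scikit-learn directly. The synthetic dataset and the "beat the mean score" keep-rule are illustrative assumptions, not the library's actual thresholds:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import mutual_info_classif

# Synthetic dataset: 3 informative features out of 6
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

# Mutual information: how much each feature tells us about the target
mi = mutual_info_classif(X, y, random_state=0)

# Tree-based importance from an ExtraTrees ensemble (sums to 1.0)
importances = ExtraTreesClassifier(
    n_estimators=100, random_state=0).fit(X, y).feature_importances_

# Illustrative rule: keep a feature if it beats the mean on either criterion
keep = np.where((mi > mi.mean()) | (importances > importances.mean()))[0]
```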
AutoEDAPro
AutoEDAPro is a powerful, plug-and-play Python library for automated Exploratory Data Analysis (EDA).
It takes a pandas DataFrame and gives you a full, beautiful report with stats, visuals, and deep insights, either inline (Jupyter) or as an HTML file.
Features
- One-line EDA: pass a DataFrame, get a full analysis
- Missing values, constant features, and outlier detection
- Univariate & bivariate visualizations (histograms, boxplots, KDE, correlation heatmaps)
- Optional target column analysis for classification & regression
- HTML report export with optional logging
- Jupyter inline display or standalone HTML output
- Built using pandas, seaborn, matplotlib, plotly, numpy
Installation
First, make sure you have Python 3.7+.
Install required dependencies:
pip install pandas numpy matplotlib seaborn plotly scikit-learn
Example Usage
from lazybrains import AutoEDA
import seaborn as sns
# Load sample dataset
df = sns.load_dataset('titanic')
# Run EDA inline (Jupyter)
eda = AutoEDA(target_col='survived')
eda.run(df)
# Run EDA and save report as HTML with logging
eda_html = AutoEDA(target_col='survived', save_report=True, enable_logging=True)
eda_html.run(df)
You can also test the library via CLI by running the script directly:
python autoeda.py
It will:
- Try to load Titanic dataset via seaborn
- Fall back to a dummy dataset if that fails
- Run both inline and saved HTML reports
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| target_col | str | None | Target column for supervised EDA |
| save_report | bool | False | If True, saves output as HTML |
| output_filename | str | None | Custom filename for saved HTML |
| enable_logging | bool | False | If True, creates a log of EDA steps |
Output
- Inline Display: Shows report directly in Jupyter notebooks
- HTML Report: If save_report=True, saves a full interactive report with visualizations
Structure
Main file: autoeda.py
Main class: AutoEDA
Each report contains:
- DataFrame shape, column types
- Missing values overview
- Duplicate/constant columns
- Univariate plots for all features
- Outlier detection using IQR
- Bivariate correlation heatmap + pairplots
- Feature vs. target analysis
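The IQR-based outlier step, for instance, typically works as in this generic sketch (the function name is hypothetical, and this is not AutoEDAPro's exact implementation):

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Only the extreme value 100 falls outside the whiskers here
flags = iqr_outliers(pd.Series([1, 2, 3, 4, 100]))
print(flags.sum())
```

`k=1.5` is the conventional Tukey fence; a larger `k` flags only the most extreme points.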
Notes
- For full display in script (not Jupyter), report is saved as HTML.
- Uses the Plotly CDN; make sure you're online for full interactivity.
- Logging is optional but useful for debugging long processes.
Ideas for the Future
- Auto feature selection preview
- Optional modeling report (LazyPredict-style)
- Model explainability (SHAP, LIME)
- CLI and web interface
AutoClean
An advanced, scikit-learn style tabular data preprocessing pipeline.
AutoClean simplifies and automates preparing tabular data for machine learning. From imputing missing values to handling outliers, encoding categoricals, and scaling features, every step is handled in a single pipeline.
Features
- scikit-learn compatible: fit, transform, fit_transform
- Customizable config-based preprocessing
- Missing value imputation (mean, median, mode, constant, predictive)
- Outlier detection and capping (IQR, Z-score)
- Encoding (OneHot, Ordinal)
- Feature Scaling (Standard, MinMax, Robust)
- Detailed transformation summary (with optional Rich UI)
Installation
pip install -r requirements.txt
Make sure scikit-learn, pandas, numpy are installed. For rich logs:
pip install rich
Quick Start
from lazybrains import AutoClean
import pandas as pd
# Sample data
df = pd.DataFrame({
    'age': [25, 30, None, 45, 50],
    'salary': [1000, 2000, 300000, 4000, None],
    'gender': ['M', 'F', None, 'F', 'M'],
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore', 'Delhi']
})
# Configuration (optional)
config = {
    'impute': {'age': 'mean', 'gender': 'mode'},
    'outliers': {'salary': {'method': 'iqr', 'capping': True}},
    'encode': {'gender': 'ordinal', 'city': 'ohe'},
    'scale': {'salary': 'StandardScaler'}
}
# Use AutoClean
cleaner = AutoClean(config=config, verbose=True)
cleaned_df = cleaner.fit_transform(df)
print(cleaned_df.head())
Configuration Options
config = {
    'impute': {
        'age': 'mean',        # or median, mode, constant, predictive
        'gender': 'mode'
    },
    'outliers': {
        'salary': {'method': 'iqr', 'capping': True}
    },
    'encode': {
        'gender': 'ordinal',  # or 'ohe'
        'city': 'ohe'
    },
    'scale': {
        'salary': 'StandardScaler'  # or MinMaxScaler, RobustScaler
    }
}
Output
- Transformed DataFrame ready for ML.
- Rich summary of all preprocessing steps.
- Compatible with any sklearn pipeline.
Internals
- Uses IterativeImputer + RandomForestRegressor for predictive imputation.
- Rich logging with progress bars via the rich package.
- Modular & extensible design for future enhancements.
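Predictive imputation along those lines can be reproduced with scikit-learn directly. A minimal sketch on toy data; AutoClean's actual configuration may differ:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required before importing IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

# Second column is roughly 2x the first; the NaN gets predicted from that pattern
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    random_state=0,
)
X_filled = imputer.fit_transform(X)
```

Unlike mean imputation, the missing cell is estimated from the other features via the regressor.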
Author
Made with ❤️ by a passionate Data Scientist.
License
MIT License
File details
Details for the file lazybrains-4.0.1.tar.gz.
File metadata
- Download URL: lazybrains-4.0.1.tar.gz
- Upload date:
- Size: 27.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8489613fba5eef8d89330889f1575b7634f83d9e46a60d789dc360214db14016 |
| MD5 | 0df326f364eb6fbcec5f9a004438798e |
| BLAKE2b-256 | d2fe30d3c87e19bad574775b246e9e676e2b8fdfa91cf09115979a40d1c31c94 |
File details
Details for the file lazybrains-4.0.1-py3-none-any.whl.
File metadata
- Download URL: lazybrains-4.0.1-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 855fb0365dcc055e1562da4c115b68264176efa85f4c905a87ff69550b292c3e |
| MD5 | 70315b0233c867e76f962442dd1d56af |
| BLAKE2b-256 | 344cb88ec13b3a154682624ceab7955c3104ecbbcc5e6583bbfbfc92d00e8b12 |