
Model Selection Tool


🧠 Universal ML Model Explorer Pro


A one-line ML pipeline that automatically preprocesses, trains, evaluates, compares, and visualizes multiple machine learning models, all with a single command.

🚀 Features

  • Auto detection: Classification or Regression
  • Auto preprocessing: Scaling, Encoding, Imputation, PCA
  • Parallel model training on all cores
  • SHAP interpretability plots
  • Beautiful visual reports (Confusion Matrix, ROC, Residuals, etc.)
  • CLI + Notebook compatible

📦 Installation

pip install -r requirements.txt

🧪 CLI Usage

python main.py path/to/dataset.csv target_column_name

Optional flags:

  • --output_dir: Directory to save results (default: results)
  • --pca_components: Number of PCA components to apply to the numeric features
  • --no_shap: Skip SHAP plots (faster)

🧬 Python Usage

from lazybrains import run_pipeline_in_notebook

run_pipeline_in_notebook(
    dataset_path="data.csv",
    target_column="target",
    pca_components=5,
    no_shap=False
)

📂 Output

  • best_model.pkl: Trained model
  • Plots: Confusion Matrix, ROC, Residuals, SHAP
  • model_report.txt: Full model comparison
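The saved `best_model.pkl` can be reloaded later with `joblib` (one of the listed dependencies). A minimal sketch, assuming the pickle holds a fitted scikit-learn estimator; the training step here is a stand-in for the pipeline's own output:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the pipeline: train and persist a model.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
joblib.dump(model, "best_model.pkl")

# Reload it in a later session for inference.
loaded = joblib.load("best_model.pkl")
preds = loaded.predict(X[:5])
print(preds.shape)  # (5,)
```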

🛠️ Supported Models

  • Linear, Tree-based, Ensemble (RF, GB, AdaBoost, XGBoost), KNN, SVM, Stacking
  • Automatic selection of the best model by Accuracy (classification) or R² (regression)
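The selection step boils down to fitting every candidate and keeping the top scorer. A hypothetical sketch of that loop (the library's internal scoring and candidate list may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Fit each candidate and rank by held-out accuracy.
scores = {name: est.fit(X_tr, y_tr).score(X_te, y_te)
          for name, est in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```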

Run this in your terminal to install all dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib rich

🔍 AutoFeatSelect

A Lightweight Python Library for Automatic Feature Selection
Smart. Fast. Interpretable.


🚀 What is AutoFeatSelect?

AutoFeatSelect is a fully automated feature selection tool that cleans your dataset by removing irrelevant, redundant, or low-value features—all with just one line of code. Whether you’re building a classification model or regression model, this tool will help you improve model performance and training speed without the hassle of manual preprocessing.


✨ Why AutoFeatSelect is Cool

  • Zero manual inspection — it decides what to drop using well-defined statistical criteria.
  • 🔄 Handles both numeric & categorical features
  • 📉 Drops features using:
    • Missing value ratio
    • Low variance
    • Correlation (pairwise & clustered)
    • VIF (multicollinearity)
    • Mutual Information
    • Tree-based feature importance
  • 📄 Detailed drop report (feature + reason)
  • 🪶 Lightweight: Only uses pandas, numpy, scikit-learn, statsmodels, scipy
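The first two criteria above (missing-value ratio and low variance) can be sketched with plain pandas; the thresholds here are illustrative, not necessarily the library's defaults:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mostly_missing": [1.0] + [np.nan] * 9,  # 90% null -> drop
    "constant": [7] * 10,                    # zero variance -> drop
    "useful": range(10),                     # kept
})

# Criterion 1: drop columns whose null ratio exceeds a threshold.
missing_ratio = df.isna().mean()
drop_missing = missing_ratio[missing_ratio > 0.5].index.tolist()

# Criterion 2: drop remaining columns with zero variance.
variances = df.drop(columns=drop_missing).var(numeric_only=True)
drop_constant = variances[variances == 0].index.tolist()

# Keep a per-feature reason, like the library's drop report.
report = {c: "missing ratio > 0.5" for c in drop_missing}
report.update({c: "zero variance" for c in drop_constant})
cleaned = df.drop(columns=list(report))
print(report)
print(list(cleaned))  # ['useful']
```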

📦 Installation

pip install -U pandas numpy scikit-learn statsmodels scipy

Clone this repo or copy AutoFeatSelect into your project.


🛠️ How to Use

from autofeatselect import AutoFeatSelect

selector = AutoFeatSelect(
    target_col='target',     # Optional if you want supervised feature selection
    verbose=True             # Optional for progress logs
)

# Fit + transform in one line
df_cleaned = selector.fit_transform(df, drop=True)

# Or separately
selector.fit(df)
df_cleaned = selector.transform(df)

# See what got dropped and why
report = selector.get_report()
print(report)
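The pairwise-correlation criterion can be sketched as scanning the upper triangle of the absolute correlation matrix and dropping one column of each highly correlated pair; the 0.95 cutoff is illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "a_copy": a * 2 + 0.01,          # perfectly collinear with "a"
    "b": rng.normal(size=200),       # independent
})

# Upper triangle only, so each pair is checked once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
print(to_drop)  # ['a_copy']
```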

🧠 When to Use

  • Before training ML models, especially with many features
  • When data has potential noise, ID columns, or redundancy
  • To reduce overfitting and improve model interpretability
  • During automated pipelines or pre-model sanity checks

📝 Example Output

[AutoFeatSelect] Running: Drop high missing values...
[AutoFeatSelect]   Dropped: ['unimportant_column']
[AutoFeatSelect] Running: Drop single value columns...
[AutoFeatSelect]   Dropped: ['constant_feature']
...
[AutoFeatSelect] Finished selection. Kept 22 out of 48 features.

📊 Feature Drop Criteria

| Technique | Purpose |
| --- | --- |
| Missing Ratio | Drops features with mostly nulls |
| Unique Ratio (ID-like) | Removes ID-like or row-wise unique columns |
| Variance Threshold | Removes constant or near-constant columns |
| Pearson Correlation | Drops highly correlated pairs |
| Hierarchical Clustering | Smarter groupwise redundancy pruning |
| VIF (Variance Inflation) | Drops multicollinear features |
| Mutual Information | Measures each feature's information about the target |
| Tree Importance | Uses ExtraTrees importance to measure predictive signal |
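The mutual-information criterion can be sketched with scikit-learn's `mutual_info_classif`: features whose estimated MI with the target falls below a cutoff are candidates for dropping. The 0.05 cutoff below is illustrative:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = np.column_stack([
    y + rng.normal(scale=0.1, size=500),  # strongly informative
    rng.normal(size=500),                 # pure noise
])

# Estimate MI of each feature with the target, keep the informative ones.
mi = mutual_info_classif(X, y, random_state=0)
keep = [i for i, score in enumerate(mi) if score > 0.05]
print(keep)  # column 0 (informative) should survive
```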

🤝 Author

Built by Gemini. Version: 1.0.0


❤️ Contribute / Fork

Feel free to fork and extend this library — make it smarter, add plotting, or wrap it into a full AutoML pipeline!


🔓 License

MIT — Use freely, just don't claim it's yours 😄





Download files

Download the file for your platform.

Source Distribution

lazybrains-2.0.0.tar.gz (16.5 kB)

Uploaded Source

Built Distribution


lazybrains-2.0.0-py3-none-any.whl (14.4 kB)

Uploaded Python 3

File details

Details for the file lazybrains-2.0.0.tar.gz.

File metadata

  • Download URL: lazybrains-2.0.0.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for lazybrains-2.0.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | da763929b5deff430204252260ff8d3d5913e11362d79df9f8ea3d0f04d9215c |
| MD5 | 9cfb515ef5e41398692c52eb9a0524bd |
| BLAKE2b-256 | 28638087d92c942309fc3c581eba5779544d90e821ca67386667bd7217d484c8 |


File details

Details for the file lazybrains-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: lazybrains-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 14.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes

Hashes for lazybrains-2.0.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d1690eb1bd8fbc334ef5fc9e2f3f18f8001efe6176f2d6b3aef86bbbbaa40d3d |
| MD5 | 4db0631162fd98cbf9252743c1c24808 |
| BLAKE2b-256 | 2e9d6825d768d96ec2b9eb8982700571c309c893ab9cdc045bedb51eededbb4b |

