An explainable, modular machine learning toolbox — a glass-box alternative to AutoML.

🧰 Machine Learning Toolbox

This repository provides a collection of modular, production-ready tools for building, evaluating, and interpreting machine learning models. Each tool is built with a focus on clean design, extensibility, and practical utility.


🧠 Philosophy: Explainable, Transparent, Glass-Box ML

Unlike traditional AutoML tools that act as "black boxes", this toolbox is designed to be a glass box: every decision, transformation, and output is visible to you and under your control.

This toolbox is ideal for:

  • Data scientists who value interpretable models
  • Regulated industries requiring auditability
  • Educators and learners exploring machine learning foundations
  • Teams that prioritize trust and understanding over automation

We emphasize explainability through SHAP, feature importances, coefficients, and detailed diagnostics at every stage.


📦 Contents

| Module | Description |
|---|---|
| DataExplorer.py | Exploratory Data Analysis (EDA) and VIF calculation |
| ML_pipeline.py | Full preprocessing + modeling pipeline with cross-validation & diagnostics |
| ModelInterpreter.py | SHAP, feature importance, and coefficient visualizations |
| RecommendationEngine.py | Rank-based and segment-based recommendation strategies |
| Clustering.py | Customer clustering with KMeans and cluster visualization |

🔍 Module Overviews

DataExplorer.py

A lightweight class for quick exploratory analysis:

  • Displays dataset shape, dtypes, and missing values
  • Plots target distribution (auto-detects regression vs classification)
  • Correlation heatmap and Variance Inflation Factor (VIF)
  • Returns a median-imputed, numeric-only DataFrame for diagnostics
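The median-imputation and VIF steps listed above can be sketched with plain NumPy and pandas. This is not DataExplorer's actual code: `numeric_median_imputed` and `variance_inflation_factors` are hypothetical names, and VIF is computed straight from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other columns plus an explicit intercept.

```python
import numpy as np
import pandas as pd


def numeric_median_imputed(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only numeric columns and fill each column's gaps with its median."""
    numeric = df.select_dtypes(include="number")
    return numeric.fillna(numeric.median())


def variance_inflation_factors(X: pd.DataFrame) -> pd.Series:
    """Return VIF_j = 1 / (1 - R_j^2) for every column of a numeric, NaN-free frame.

    R_j^2 is the R-squared of an OLS regression of column j on all other
    columns plus an intercept. Constant columns are not handled in this sketch.
    """
    values = X.to_numpy(dtype=float)
    n_rows = values.shape[0]
    vifs = {}
    for j, name in enumerate(X.columns):
        y = values[:, j]
        # Design matrix: intercept column followed by every other feature.
        design = np.column_stack([np.ones(n_rows), np.delete(values, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r_squared = 1.0 - (residuals @ residuals) / ((y - y.mean()) ** 2).sum()
        # Exact collinearity drives R^2 to 1 and the VIF to infinity.
        vifs[name] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return pd.Series(vifs, name="VIF")
```

Typical use would be `variance_inflation_factors(numeric_median_imputed(df))`; values above roughly 5 to 10 are a common rule-of-thumb signal of problematic multicollinearity.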

ML_pipeline.py

A complete scikit-learn-based pipeline manager:

  • Auto-detects numerical and categorical columns
  • Builds preprocessing pipeline (scaling, imputation, encoding)
  • Supports both regression and classification
  • Cross-validation with metrics, ROC curves, F1-based threshold tuning, and confusion matrices
  • Built-in visualizations for:
    • Predicted vs. Actual
    • Residual plots
    • Error distribution
    • ROC curve and F1-threshold optimization
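The column auto-detection, preprocessing, and F1-threshold steps above can be sketched directly on scikit-learn. This is not MLPipeline's implementation: `build_preprocessor` and `best_f1_threshold` are hypothetical names, and the sketch assumes that numeric dtypes mark numerical features (everything else is treated as categorical), median and most-frequent imputation, and a fixed grid of 19 candidate thresholds.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    """Route numeric columns to impute+scale and all other columns to impute+one-hot."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during fit become all-zero rows instead of raising.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ])


def best_f1_threshold(y_true, y_proba, thresholds=None):
    """Scan decision thresholds on positive-class probabilities; return (threshold, F1)."""
    y_proba = np.asarray(y_proba)
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)
    scores = [
        f1_score(y_true, (y_proba >= t).astype(int), zero_division=0)
        for t in thresholds
    ]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])
```

The preprocessor can sit in front of any estimator, e.g. `Pipeline([("prep", build_preprocessor(X_train)), ("model", LogisticRegression())])`, and `best_f1_threshold` expects `predict_proba(X)[:, 1]` from a binary classifier. Tuning the threshold on the test set itself leaks information, so a validation split or cross-validated probabilities are the safer input.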

ModelInterpreter.py

Interpret model behavior post-training:

  • Works with pipelines and standalone models
  • Tree-based models: feature importances
  • Linear models: coefficients (with an optional plot)
  • Universal SHAP summary plot (auto-handles pipelines)

RecommendationEngine.py

Simple framework for personalized customer targeting:

  • Identify top-N high-value customers by prediction scores
  • Recommend segments based on quantiles (e.g., LTV)
    • High-value → Retention
    • Low-value → Acquisition
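Both strategies reduce to a few pandas calls. In this sketch (hypothetical names, not RecommendationEngine's API) customers are ranked with `DataFrame.nlargest`, and segments come from a single quantile cut: customers at or above the chosen quantile of the value column (0.75 here, an assumed default) are tagged Retention and the rest Acquisition.

```python
import numpy as np
import pandas as pd


def top_n_customers(df: pd.DataFrame, score_col: str, n: int = 10) -> pd.DataFrame:
    """Rows with the n highest prediction scores, best first."""
    return df.nlargest(n, score_col)


def recommend_segment(df: pd.DataFrame, value_col: str, quantile: float = 0.75) -> pd.DataFrame:
    """Tag customers at or above the given quantile of value_col for retention."""
    cutoff = df[value_col].quantile(quantile)
    out = df.copy()
    # High-value customers -> Retention, everyone else -> Acquisition.
    out["segment"] = np.where(out[value_col] >= cutoff, "Retention", "Acquisition")
    return out
```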

Clustering.py

KMeans-based customer segmentation:

  • Automatically scales numeric features
  • Assigns cluster labels
  • Visualizes clusters with seaborn scatter plots
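The scale-then-cluster steps can be sketched with scikit-learn, leaving the seaborn plot aside. `cluster_customers` is a hypothetical name; the sketch assumes the numeric columns contain no missing values (KMeans rejects NaN) and fixes `n_init` and `random_state` so labels are reproducible.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_customers(df: pd.DataFrame, n_clusters: int = 3, random_state: int = 0):
    """Standardize numeric columns, fit KMeans, and return (labeled copy, fitted model)."""
    numeric = df.select_dtypes(include="number")
    # Standardize so no feature dominates the Euclidean distance by scale alone.
    scaled = StandardScaler().fit_transform(numeric)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labeled = df.copy()
    labeled["cluster"] = kmeans.fit_predict(scaled)
    return labeled, kmeans
```

A plot like the one described could then be drawn with `sns.scatterplot(data=labeled, x="spend", y="visits", hue="cluster")`, where `spend` and `visits` are placeholder column names.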

🚀 Getting Started

Each module can be used independently. Example usage:

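from ML_pipeline import MLPipeline

# df is assumed to be a pandas DataFrame that includes a 'target' column
pipeline = MLPipeline()
X_train, X_test, y_train, y_test = pipeline.split_data(df, 'target')
pipeline.fit(X_train, y_train)
# ROC curves apply to classification targets only
pipeline.plot_roc_curve(X_test, y_test)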

Or for model interpretation:

from ModelInterpreter import ModelInterpreter

# model: a fitted estimator or pipeline; X_train: the features it was trained on
interpreter = ModelInterpreter(model, X_train, task='classification')
interpreter.shap_summary()

📎 Requirements

  • scikit-learn
  • pandas, numpy
  • matplotlib, seaborn
  • shap
  • statsmodels (for VIF)

📌 Notes

  • SHAP computation is fastest for tree-based models; linear models are also supported
  • Pipelines handle preprocessing internally, so there is no need to preprocess data manually
  • Modules follow sklearn conventions for compatibility and ease of use

Download files

Source Distribution

glazzbocks-0.1.0.tar.gz (3.5 kB)

Uploaded Source

Built Distribution

glazzbocks-0.1.0-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file glazzbocks-0.1.0.tar.gz.

File metadata

  • Download URL: glazzbocks-0.1.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for glazzbocks-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 803055fd1a12b86986055de8a8a403594f4964fc47faf64138c5ec2b93c612cc |
| MD5 | d283b34104b828971190dd547a165895 |
| BLAKE2b-256 | c72b42e4466e9fb083bffcf025a45ab277e81d6a7c155287f961283738272e77 |
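A downloaded file can be checked against the SHA256 digests listed here with Python's standard `hashlib`. The helper names below are hypothetical; the file is read in fixed-size chunks so large archives need not fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunk_size-byte pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("glazzbocks-0.1.0.tar.gz", "803055fd1a12b86986055de8a8a403594f4964fc47faf64138c5ec2b93c612cc")` should return `True` when the local copy matches the published digest; the same applies to the wheel with its own digest.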

File details

Details for the file glazzbocks-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: glazzbocks-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for glazzbocks-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a01d9592d28638067929f1b407cbe34a41fb3cc901b3059b86edd1d9e582d0c |
| MD5 | 7ab26b639431dbe4064ba785d76d2576 |
| BLAKE2b-256 | 3a6504310ed9d60d349d0b4e21310c40eb67616d54bbf1ea79914d37dd07e9d5 |
