A glass-box machine learning toolbox for interpretable pipelines
Machine Learning Toolbox
This repository provides a collection of modular, production-ready tools for building, evaluating, and interpreting machine learning models. Each tool is built with a focus on clean design, extensibility, and practical utility.
Philosophy: Explainable, Transparent, Glass-Box ML
Unlike traditional AutoML tools that act as "black boxes", this toolbox is designed to be a glass box: every decision, transformation, and output is transparent and controllable.
This toolbox is ideal for:
- Data scientists who value interpretable models
- Regulated industries requiring auditability
- Educators and learners exploring machine learning foundations
- Teams that prioritize trust and understanding over automation
We emphasize explainability through SHAP, feature importances, coefficients, and detailed diagnostics at every stage.
Contents
| Module | Description |
|---|---|
| DataExplorer.py | Exploratory Data Analysis (EDA) and VIF calculation |
| ML_pipeline.py | Full preprocessing + modeling pipeline with cross-validation & diagnostics |
| ModelInterpreter.py | SHAP, feature importance, and coefficient visualizations |
| RecommendationEngine.py | Rank-based and segment-based recommendation strategies |
| Clustering.py | Customer clustering with KMeans and cluster visualization |
Module Overviews
DataExplorer.py
A lightweight class for quick exploratory analysis:
- Displays dataset shape, dtypes, and missing values
- Plots target distribution (auto-detects regression vs classification)
- Correlation heatmap and Variance Inflation Factor (VIF)
- Returns median-imputed numeric-only DataFrame for diagnostics
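The median-imputation and VIF steps listed above can be sketched roughly as follows. This is a minimal illustration, not the DataExplorer implementation: the helper names `median_impute_numeric` and `vif_table` are hypothetical, and the VIF is computed with NumPy as the diagonal of the inverse correlation matrix rather than with statsmodels (which the toolbox lists for VIF and which offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`).

```python
import numpy as np
import pandas as pd


def median_impute_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep numeric columns and fill gaps with column medians."""
    numeric = df.select_dtypes(include="number")
    return numeric.fillna(numeric.median())


def vif_table(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF per column from the inverse correlation matrix.

    For a regression of each column on the others (with intercept),
    VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the
    inverse of the correlation matrix.
    """
    corr = df.corr().to_numpy()
    vif = np.diag(np.linalg.inv(corr))
    return pd.Series(vif, index=df.columns, name="VIF")


# Toy data: "a_plus_noise" is nearly collinear with "a"; "label" is non-numeric.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": b,
    "a_plus_noise": a + rng.normal(scale=0.1, size=200),
    "label": ["x", "y"] * 100,
})
demo.loc[0, "b"] = np.nan  # one missing value to impute

clean = median_impute_numeric(demo)
vifs = vif_table(clean)
print(vifs.sort_values(ascending=False))
```

A commonly used rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity; in this toy data the two collinear columns land far above that range while `b` stays close to 1.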
ML_pipeline.py
A complete scikit-learn-based pipeline manager:
- Auto-detects numerical and categorical columns
- Builds preprocessing pipeline (scaling, imputation, encoding)
- Supports both regression and classification
- Cross-validation with metrics, ROC, F1-thresholds, and confusion matrix
- Built-in visualizations for:
  - Predicted vs. Actual
  - Residual plots
  - Error distribution
  - ROC curve and F1-threshold optimization
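To make the steps above concrete, here is a minimal sketch using plain scikit-learn. It does not show the internals of `MLPipeline`: the `build_pipeline` helper, the toy data, the choice of logistic regression, and the threshold grid are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(X: pd.DataFrame) -> Pipeline:
    """Hypothetical helper: pick preprocessing from column dtypes, then classify."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy binary-classification data with one numeric and one categorical feature.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(40, 10, n)
plan = rng.choice(["basic", "pro"], size=n)
y = (age + 5 * (plan == "pro") + rng.normal(0, 5, n) > 42).astype(int)
# Explicit object dtype keeps the example portable across pandas versions.
X = pd.DataFrame({"age": age, "plan": pd.Series(plan, dtype=object)})
X.loc[X.index[::25], "age"] = np.nan  # a few missing values for the imputer

pipe = build_pipeline(X)

# Cross-validated ROC AUC: preprocessing is refit inside every fold.
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))

# F1-threshold search on out-of-fold probabilities.
proba = cross_val_predict(pipe, X, y, cv=5, method="predict_proba")[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
f1_by_threshold = [f1_score(y, (proba >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(f1_by_threshold))]
print("Best F1 threshold:", round(float(best_threshold), 2))
```

Keeping the imputer, scaler, and encoder inside the pipeline means each cross-validation fold fits them on its own training split only, which avoids leaking statistics from the held-out data.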
ModelInterpreter.py
Interpret model behavior post-training:
- Works with pipelines and standalone models
- Tree-based models: Feature importances
- Linear models: Coefficients (with optional plot)
- Universal SHAP summary plot (auto-handles pipelines)
RecommendationEngine.py
Simple framework for personalized customer targeting:
- Identify top-N high-value customers by prediction scores
- Recommend segments based on quantiles (e.g., LTV)
  - High-value → Retention
  - Low-value → Acquisition
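The two strategies above can be illustrated with a short pandas sketch. The function names, the 25%/75% quantile cut-offs, and the "No action" label for the middle tier are assumptions for this example; the source only specifies the Retention and Acquisition mappings.

```python
import pandas as pd


def top_n_customers(scores: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: the n customers with the highest prediction scores."""
    return scores.nlargest(n)


def recommend_by_quantile(
    ltv: pd.Series, high_q: float = 0.75, low_q: float = 0.25
) -> pd.Series:
    """Hypothetical helper: map LTV quantile tiers to a strategy label."""
    high_cut = ltv.quantile(high_q)
    low_cut = ltv.quantile(low_q)
    strategy = pd.Series("No action", index=ltv.index)  # assumed middle-tier label
    strategy[ltv >= high_cut] = "Retention"    # high-value customers
    strategy[ltv <= low_cut] = "Acquisition"   # low-value customers
    return strategy


# Toy predicted lifetime values keyed by customer ID.
ltv = pd.Series(
    [100, 900, 300, 50, 700, 400, 250, 800],
    index=["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"],
)

top3 = top_n_customers(ltv, 3)
segments = recommend_by_quantile(ltv)
print(top3)
print(segments)
```

With these toy values, the top three customers are `c2`, `c8`, and `c5`; `c2` and `c8` fall in the Retention tier, and `c1` and `c4` in the Acquisition tier.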
Clustering.py
KMeans-based customer segmentation:
- Automatically scales numeric features
- Assigns cluster labels
- Visualizes clusters with seaborn scatter plots
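A minimal sketch of the scale-then-cluster flow described above, using scikit-learn directly. The `cluster_customers` helper, the column names, and the toy data are hypothetical and do not reflect the actual `Clustering.py` interface.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_customers(
    df: pd.DataFrame, n_clusters: int = 3, random_state: int = 0
) -> pd.DataFrame:
    """Hypothetical helper: scale numeric columns, fit KMeans, attach labels."""
    numeric = df.select_dtypes(include="number")
    # Scaling keeps large-valued features from dominating the distance metric.
    scaled = StandardScaler().fit_transform(numeric)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    return df.assign(cluster=labels)


# Toy data: two well-separated customer groups on very different scales.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "annual_spend": np.concatenate([rng.normal(100, 10, 50), rng.normal(1000, 50, 50)]),
    "visits": np.concatenate([rng.normal(2, 0.5, 50), rng.normal(20, 2, 50)]),
})

clustered = cluster_customers(customers, n_clusters=2)
print(clustered.groupby("cluster").mean())
```

The labelled frame can then be passed to seaborn for plotting, for example `sns.scatterplot(data=clustered, x="annual_spend", y="visits", hue="cluster")`.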
Getting Started
Each module can be used independently. Example usage:
```python
from ML_pipeline import MLPipeline

# df is assumed to be a pandas DataFrame with a 'target' column
pipeline = MLPipeline()
X_train, X_test, y_train, y_test = pipeline.split_data(df, 'target')
pipeline.fit(X_train, y_train)
pipeline.plot_roc_curve(X_test, y_test)
```
Or for model interpretation:
```python
from ModelInterpreter import ModelInterpreter

# model is assumed to be an already fitted estimator or pipeline
interpreter = ModelInterpreter(model, X_train, task='classification')
interpreter.shap_summary()
```
Requirements
- scikit-learn
- pandas, numpy
- matplotlib, seaborn
- shap
- statsmodels (for VIF)
Notes
- SHAP is optimized for tree-based models; linear models are also supported
- Pipelines handle preprocessing internally, so there is no need to do it manually
- Modules follow sklearn conventions for compatibility and ease of use
File details
Details for the file glazzbocks-0.1.3.tar.gz.
File metadata
- Download URL: glazzbocks-0.1.3.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8586f6e90aa32ea400fbb0c502fd6005f33698be28253a45a75e979c780e4011 |
| MD5 | 8336e1706181879dfdbfff6750acc3a2 |
| BLAKE2b-256 | 234bb785ce574948ca036fd6a5c9cc115a31f2fed309c6888cd76c4fe9fec35c |
File details
Details for the file glazzbocks-0.1.3-py3-none-any.whl.
File metadata
- Download URL: glazzbocks-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37f12830f1b6e9e9c40a1739e6a6687da3463bf7fd4afe6f053f1252d378379a |
| MD5 | 56c8c92460a8de2916d847f52b034fac |
| BLAKE2b-256 | ae031d64017a8c1a6fd0228b87c41bdf5aa0f3f1358958563e8ebac22f177e95 |