volkanoban
A powerful stacking classifier framework that integrates advanced machine learning techniques, overfitting prevention, and explainability features such as LIME, SHAP, and model interpretation dashboards.
The volkanoban library is a robust stacking framework that integrates models such as Random Forest, XGBoost, LightGBM, CatBoost, and other advanced techniques like Extra Trees, Bagging, and HistGradientBoosting. It also includes deep learning methods like MLPClassifier (a multi-layer neural network). These models are combined using stacking and voting to improve accuracy and performance across various datasets.
volkanoban provides a variety of powerful functions for model training, evaluation, and explainability.
Key Features of the volkanoban Library
- Stacking Classifier: combines multiple models using a stacking approach, where a meta-learner is trained to make the final prediction based on the outputs of the base learners.
- Voting Classifier: uses soft voting to combine the predictions of multiple models, improving overall classification accuracy by leveraging the strengths of different algorithms.
- Advanced Model Support:
  Tree-based models: Random Forest, XGBoost, LightGBM, CatBoost, Extra Trees, and HistGradientBoosting.
  Gradient-based models: MLPClassifier (multi-layer perceptron).
  Bagging models: BaggingClassifier for enhanced stability and accuracy.
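As a generic illustration of how stacking and soft voting combine base learners (shown with plain scikit-learn, not volkanoban's internal implementation):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("nb", GaussianNB()),
]

# Stacking: a meta-learner (here logistic regression) is trained on the
# out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))
stack_acc = stack.fit(X_train, y_train).score(X_test, y_test)

# Soft voting: predicted class probabilities are averaged across models.
voter = VotingClassifier(estimators=base_learners, voting="soft")
voting_acc = voter.fit(X_train, y_train).score(X_test, y_test)
```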
Automatic Feature Scaling
Automatically scales features when using gradient-based models (such as MLPClassifier) but skips scaling for tree-based models (such as XGBoost and LightGBM), which do not require it. The decision is made dynamically based on the models in use.
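A minimal sketch of this dynamic decision, using a hypothetical wrap_with_scaling helper (this is not volkanoban's actual API):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical rule: only scale-sensitive (gradient-based) estimators get a scaler.
SCALE_SENSITIVE = (MLPClassifier,)

def wrap_with_scaling(model):
    """Prepend a StandardScaler for scale-sensitive models; leave tree models as-is."""
    if isinstance(model, SCALE_SENSITIVE):
        return make_pipeline(StandardScaler(), model)
    return model

mlp = wrap_with_scaling(MLPClassifier())          # becomes a scaling pipeline
rf = wrap_with_scaling(RandomForestClassifier())  # returned unchanged
```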
Overfitting Prevention
Limiting Tree Depth: In models like Random Forest, XGBoost, LightGBM, and CatBoost, the max_depth parameter is used to control overfitting by limiting the complexity of each tree.
Regularization (L1 and L2): Implemented in XGBoost, LightGBM, and MLPClassifier using reg_alpha (L1) and reg_lambda (L2) to penalize large weights and prevent overfitting.
Subsampling and Feature Sampling: Used in XGBoost, LightGBM, and CatBoost to improve generalization by sampling both data points and features.
Learning Rate: A lower learning rate helps models like XGBoost, LightGBM, and CatBoost generalize better by controlling how much the model adjusts with each learning step.
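The same controls can be illustrated with scikit-learn's GradientBoostingClassifier, which exposes equivalent parameters (the exact values volkanoban uses are not documented here; these are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingClassifier(
    max_depth=3,         # limit tree complexity
    subsample=0.8,       # row subsampling for better generalization
    max_features=0.8,    # feature subsampling per split
    learning_rate=0.05,  # smaller adjustment per boosting round
    n_estimators=100,
    random_state=42,
)
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)
```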
Data Imputation
Handles missing values using SimpleImputer and KNNImputer.
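Both imputers come from scikit-learn; a small sketch of how each fills gaps:

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# SimpleImputer fills with a per-column statistic (here, the column mean).
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# KNNImputer fills using values from the most similar rows.
X_knn = KNNImputer(n_neighbors=1).fit_transform(X)
```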
Explainability
Provides model explainability through:
LIME: Local Interpretable Model-agnostic Explanations.
Feature Importance Visualization: Visualizes feature importance from different models.
Cross-Validation
Evaluates model performance through k-fold cross-validation, ensuring robustness and generalizability.
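A generic k-fold sketch with scikit-learn's cross_val_score (volkanoban wraps this behind its cross_validate method):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# 5-fold cross-validation: each fold serves once as the held-out set.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
mean_score = scores.mean()
```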
Hyperparameter Tuning
Searches for optimal model parameters using GridSearchCV.
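A minimal GridSearchCV sketch with an illustrative parameter grid (the grid volkanoban searches internally is not shown here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Exhaustively evaluate every parameter combination with 3-fold CV.
param_grid = {"max_depth": [2, 4], "n_estimators": [25, 50]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```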
Advanced Metrics
Calculates metrics like Matthews Correlation Coefficient (MCC), ROC AUC, Accuracy, Precision, Recall, and F1 Score for comprehensive performance evaluation.
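All of these metrics are available in scikit-learn; a quick worked example on a toy binary problem:

```python
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_proba = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1]  # predicted probability of class 1

mcc = matthews_corrcoef(y_true, y_pred)   # balanced even on skewed classes
auc = roc_auc_score(y_true, y_proba)      # ranking quality of probabilities
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```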
Installation
You can install the package using pip:
pip install volkanoban
Usage Example 1: Breast Cancer Dataset
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from volkanoban import volkanobanClassifier
# Load the breast cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
# Initialize the volkanobanClassifier
classifier = volkanobanClassifier()
# Train the classifier
X_train, X_test, y_train, y_test = classifier.train(X, y)
# Predict on new data
y_pred = classifier.predict(X_test)
# Evaluate model performance
num_classes = len(np.unique(y_test))
classifier.evaluate_performance(y_test, y_pred, num_classes)
# Plot ROC curve
y_pred_proba = classifier.stacking_model.predict_proba(X_test)
classifier.plot_roc_curve(y_test, y_pred_proba)
# Perform LIME analysis
classifier.lime_analysis(X_train, X_test, index=0, feature_names=data.feature_names, class_names=data.target_names)
# Perform cross-validation
classifier.cross_validate(X, y, cv=5)
# Perform hyperparameter tuning
classifier.hyperparameter_tuning(X, y)
# Plot feature importance
classifier.plot_feature_importance(X.columns)
# Run the ExplainerDashboard
classifier.run_explainer_dashboard(X_train, X_test, y_test, X.columns)
Usage Example 2: Forest Cover Type Dataset
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_covtype
from volkanoban import volkanobanClassifier
# Load the forest cover type dataset
data = fetch_covtype()
X = pd.DataFrame(data.data) # Feature matrix
y = pd.Series(data.target) # Target variable
# Initialize the volkanobanClassifier
classifier = volkanobanClassifier()
# Train the classifier
X_train, X_test, y_train, y_test = classifier.train(X, y)
# Make predictions on the test set
y_pred = classifier.predict(X_test)
# 1. Evaluate model performance
num_classes = len(np.unique(y_test))  # number of classes in the test set
classifier.evaluate_performance(y_test, y_pred, num_classes)
# 2. Plot the ROC curve with AUC
y_pred_proba = classifier.stacking_model.predict_proba(X_test)
classifier.plot_roc_curve(y_test, y_pred_proba)
# 3. Calculate extra metrics (Matthews Correlation Coefficient and ROC AUC)
classifier.extra_metrics(y_test, y_pred)
# 4. Perform LIME analysis on a single test instance
feature_names = X.columns
class_names = [str(i) for i in np.unique(y)]
classifier.lime_analysis(X_train, X_test, index=0, feature_names=feature_names, class_names=class_names)
# 5. Evaluate performance with cross-validation
classifier.cross_validate(X, y, cv=5)
# 6. Perform hyperparameter tuning with GridSearchCV
classifier.hyperparameter_tuning(X, y)
# 7. Plot feature importance across models
classifier.plot_feature_importance(X.columns)
# 8. Launch the interactive ExplainerDashboard
classifier.run_explainer_dashboard(X_train, X_test, y_test, X.columns)
Usage Example 3: Wine Dataset (Multi-class Classification)
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from volkanoban import volkanobanClassifier
# Load the wine dataset
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names) # Feature matrix
y = pd.Series(data.target) # Target variable
# Initialize the volkanobanClassifier
classifier = volkanobanClassifier()
# Train the classifier
X_train, X_test, y_train, y_test = classifier.train(X, y)
# Make predictions on the test set
y_pred = classifier.predict(X_test)
# 1. Evaluate model performance
num_classes = len(np.unique(y_test))  # number of classes in the test set
classifier.evaluate_performance(y_test, y_pred, num_classes)
# 2. Plot the ROC curve with AUC
y_pred_proba = classifier.stacking_model.predict_proba(X_test)
classifier.plot_roc_curve(y_test, y_pred_proba)
# 3. Calculate extra metrics (Matthews Correlation Coefficient and ROC AUC)
classifier.extra_metrics(y_test, y_pred)
# 4. Perform LIME analysis on a single test instance
feature_names = X.columns
class_names = [str(i) for i in np.unique(y)]
classifier.lime_analysis(X_train, X_test, index=0, feature_names=feature_names, class_names=class_names)
# 5. Evaluate performance with cross-validation
classifier.cross_validate(X, y, cv=5)
# 6. Perform hyperparameter tuning with GridSearchCV
classifier.hyperparameter_tuning(X, y)
# 7. Plot feature importance across models
classifier.plot_feature_importance(X.columns)
# 8. Launch the interactive ExplainerDashboard
classifier.run_explainer_dashboard(X_train, X_test, y_test, feature_names)
Predict Function Description
The predict function in volkanoban generates predictions on unseen data, for single or batch inputs. It ensures that the input provided for prediction matches the feature set used during model training, maintaining consistency between training and inference.
Example Usage:
from volkanoban import volkanobanClassifier
import pandas as pd
# Initialize the classifier (it must first be trained with classifier.train(X, y))
classifier = volkanobanClassifier()
# Example input data: replace with actual feature values
input_data = {"mean radius": 14.2, "mean texture": 15.6, "mean perimeter": 89.0, "mean area": 530.0}
# Convert input data to a DataFrame matching the model's expected feature format
df_input = pd.DataFrame([input_data])
# Make predictions
y_pred = classifier.predict(df_input)
# Output the predicted class
print("Predicted class:", y_pred)
Function Descriptions
evaluate_performance
This function evaluates the model's performance using metrics like accuracy, precision, recall, F1 score, and confusion matrix. It prints a well-formatted table for easy interpretation.
Arguments:
y_true: Ground truth labels.
y_pred: Labels predicted by the model.
num_classes: Number of unique classes in the dataset.
Example Usage:
classifier.evaluate_performance(y_test, y_pred, num_classes)
lime_analysis
This function generates a LIME explanation for a specific test instance, showing how individual features influence the model's prediction.
Arguments:
X_train: The scaled training dataset.
X_test: The scaled testing dataset.
index: Index of the test instance to analyze.
feature_names: List of feature names from the dataset.
class_names: List of class names corresponding to the target variable.
Example Usage:
classifier.lime_analysis(X_train, X_test, 0, feature_names, class_names)
plot_feature_importance
This function visualizes feature importance across base models in the stacking classifier.
Arguments:
feature_names: List of feature names from the dataset.
Example Usage:
classifier.plot_feature_importance(feature_names)
run_explainer_dashboard
This function launches an interactive dashboard using explainerdashboard, allowing exploration of the model's predictions, feature importance, and more.
Arguments:
X_train: The scaled training dataset.
X_test: The scaled testing dataset.
y_test: Ground truth labels for the testing dataset.
feature_names: List of feature names from the dataset.
dashboard_title: Optional title for the dashboard.
Overfitting Prevention Strategies Applied:
- Limiting Tree Depth: in models like Random Forest, XGBoost, and LightGBM, the max_depth parameter controls overfitting by limiting the complexity of each tree.
- Regularization (L1 and L2): implemented in XGBoost, LightGBM, and MLPClassifier using reg_alpha (L1) and reg_lambda (L2) to penalize large weights and prevent overfitting.
- Early Stopping: applied in XGBoost, LightGBM, and CatBoost to stop training when the model's performance on the validation set no longer improves.
- Gradient-based Model Scaling: input features are automatically scaled for models like MLPClassifier, ensuring that gradient-based algorithms perform optimally; tree-based models like XGBoost and LightGBM, which do not require scaling, are excluded from this process.
- Cross-Validation: ensures model generalization by evaluating performance across different splits of the data, reducing the likelihood of overfitting to a specific training set.
This implementation effectively reduces the risk of overfitting while maintaining the flexibility and performance of various machine learning models in the volkanoban framework.