A Python package for simultaneous regression and binary classification for educational analytics.
dualPredictor
by D
dualPredictor is a Python package that provides simultaneous regression and binary classification results for tabular datasets.
- Simultaneous Predictions: A single model that performs regression and binary classification tasks simultaneously.
- Regressor Selection (choose one): Choose Lasso, Ridge, or LinearRegression (OLS) as the base regression model.
- Dynamic Cutoff Tuning Metrics (choose one): Automatically tunes the cutoff value to maximize the Youden index, the F1 score, or the F2 score, depending on the metric the user selects.
1. Youden Index (J)
$$J = Recall + Specificity - 1$$
J is a measure of the overall performance of a binary classifier. It is calculated as the sum of recall and specificity minus 1, so it ranges from -1 to 1, where 0 corresponds to random guessing and 1 to perfect classification. A high J indicates that the classifier performs well on both positive and negative cases.
- Recall measures a classifier's ability to correctly identify positive cases. A high recall means that the classifier avoids missed detections (false negatives).
- Specificity measures a classifier's ability to correctly identify negative cases. A high specificity means that the classifier avoids false alarms (false positives).
2. F Scores (options 'f1_score' and 'f2_score' in the package)
$$F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$
The F1 score is another measure of binary classifier performance. It is calculated as the harmonic mean of precision and recall, so it is high only when both are high. Unlike the Youden index, it does not use true negatives: a high F1 score indicates good performance on the positive class, not on negative cases.
$$F_\beta = (1 + \beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}$$
The F-beta score is a generalization of the F1 score that weights precision and recall differently. A beta value less than 1 places more weight on precision, while a beta value greater than 1 places more weight on recall. Setting beta = 1 recovers the F1 score, and the package's F2 option uses beta = 2, which favors recall.
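The metrics above can be checked numerically. The sketch below computes J and F-beta directly from the formulas and compares them with scikit-learn's f1_score and fbeta_score; the helper names youden_index and f_beta are illustrative and not part of dualPredictor. It also shows a practical difference between the metrics: adding correctly rejected negatives (true negatives) raises J but leaves F1 unchanged.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, fbeta_score, precision_score, recall_score


def youden_index(y_true, y_pred):
    """J = recall + specificity - 1, treating label 1 as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    recall = tp / (tp + fn)        # share of actual positives that were detected
    specificity = tn / (tn + fp)   # share of actual negatives that were rejected
    return recall + specificity - 1


def f_beta(precision, recall, beta=1.0):
    """F-beta from precision and recall; beta=1 gives F1, beta=2 gives F2."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# 4 positives and 6 negatives: TP = 2, FN = 2, FP = 1, TN = 5
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 1])

precision = precision_score(y_true, y_pred)  # 2 / 3
recall = recall_score(y_true, y_pred)        # 2 / 4 = 0.5

print(youden_index(y_true, y_pred))                                       # 0.5 + 5/6 - 1, about 0.333
print(f_beta(precision, recall), f1_score(y_true, y_pred))                 # both 4/7, about 0.571
print(f_beta(precision, recall, 2), fbeta_score(y_true, y_pred, beta=2))   # both 10/19, about 0.526

# Append 90 true negatives: specificity rises to 95/96, so J rises to about 0.490,
# while F1 stays at 4/7 because TN appears in neither precision nor recall.
y_true_big = np.concatenate([y_true, np.zeros(90, dtype=int)])
y_pred_big = np.concatenate([y_pred, np.zeros(90, dtype=int)])
print(youden_index(y_true_big, y_pred_big), f1_score(y_true_big, y_pred_big))
```

With beta = 2 the score drops below F1 here because recall (0.5) is lower than precision (about 0.667), which is exactly the recall-favoring behavior described above.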
Installation
Install dualPredictor directly from PyPI using pip:
```bash
pip install dualPredictor
```
Or install it directly from the GitHub repository:
```bash
pip install git+https://github.com/098765d/dualPredictor.git
```
Dependencies
dualPredictor requires:
- numpy
- scikit-learn
- matplotlib
- seaborn
DualModel
The DualModel class is a custom regressor that combines a base regression model (Lasso, Ridge, or OLS) with a dual classification approach: it tunes an optimal cut-off value and uses it to classify samples into two classes based on their predicted regression values.
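The cut-off tuning described above can be sketched as an exhaustive search over candidate cut-offs. This is a minimal sketch, not the package's implementation: the function names tune_cut_off and cut_off_score are hypothetical, using every distinct predicted value as a candidate is an assumption, and labels follow the rule stated for default_cut_off below (values at or above the cut-off are class 1).

```python
import numpy as np
from sklearn.metrics import f1_score, fbeta_score, recall_score


def cut_off_score(y_true, y_pred, metric):
    """Score binary predictions with one of the documented metric names."""
    if metric == "youden_index":
        recall = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
        specificity = recall_score(y_true, y_pred, pos_label=0, zero_division=0)  # recall of class 0
        return recall + specificity - 1
    if metric == "f1_score":
        return f1_score(y_true, y_pred, zero_division=0)
    if metric == "f2_score":
        return fbeta_score(y_true, y_pred, beta=2, zero_division=0)
    raise ValueError(f"unsupported metric: {metric!r}")


def tune_cut_off(y_label_true, y_reg_pred, metric="youden_index"):
    """Return the cut-off on the regression output that maximizes `metric`.

    Every distinct predicted value is tried as a candidate; a sample is
    classified as 1 when its predicted value is >= the candidate.
    """
    y_reg_pred = np.asarray(y_reg_pred)
    candidates = np.unique(y_reg_pred)
    scores = [cut_off_score(y_label_true, (y_reg_pred >= c).astype(int), metric)
              for c in candidates]
    return candidates[int(np.argmax(scores))]


# Usage with hypothetical data: grades on a 0-4 scale and a noisy regression output
rng = np.random.default_rng(0)
y = rng.uniform(0, 4, size=200)
y_reg_pred = y + rng.normal(0, 0.3, size=200)
y_label_true = (y >= 2.0).astype(int)  # labels from a default cut-off of 2.0
for metric in ("youden_index", "f1_score", "f2_score"):
    print(metric, tune_cut_off(y_label_true, y_reg_pred, metric))
```

Trying every distinct predicted value costs one metric evaluation per candidate, which is manageable for class-sized datasets; a coarser grid (for example, np.linspace over the prediction range) would trade some precision for speed.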
Parameters
- model_type (str, default='lasso'): The base regression model to use. Supported options are 'lasso', 'ridge', and 'ols' (Ordinary Least Squares).
- metric (str, default='youden_index'): The metric used to tune the optimal cut-off value. Supported options are 'f1_score', 'f2_score', and 'youden_index'.
- default_cut_off (float, default=0.5): The default cut-off value used to create binary labels. Samples with target values below the cut-off are labeled 0, and samples with values at or above the cut-off are labeled 1.
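A minimal sketch of how the model_type and default_cut_off parameters could translate into code. The helper names make_base_model and make_binary_labels are hypothetical. Mapping 'lasso' and 'ridge' to scikit-learn's cross-validated LassoCV and RidgeCV is an assumption, suggested by the fitted alpha_ attribute listed under Attributes (these estimators store their tuned penalty as alpha_). The labeling rule follows the default_cut_off description.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV

# Hypothetical mapping from the documented model_type strings to scikit-learn classes.
# LassoCV/RidgeCV are assumed because DualModel exposes a fitted alpha_ attribute.
BASE_MODELS = {"lasso": LassoCV, "ridge": RidgeCV, "ols": LinearRegression}


def make_base_model(model_type="lasso"):
    """Instantiate the base regressor for a model_type string."""
    if model_type not in BASE_MODELS:
        raise ValueError(f"model_type must be one of {sorted(BASE_MODELS)}, got {model_type!r}")
    return BASE_MODELS[model_type]()


def make_binary_labels(y, cut_off=0.5):
    """Below the cut-off -> 0; at or above the cut-off -> 1."""
    return (np.asarray(y) >= cut_off).astype(int)


print(make_binary_labels([0.2, 0.5, 0.9]))    # [0 1 1]
print(type(make_base_model("ols")).__name__)  # LinearRegression
```

Note that a value exactly equal to the cut-off (0.5 here) falls into class 1, matching the "at or above" rule.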
Methods
- fit(X, y): Fit the DualModel to the training data.
  - Parameters:
    - X (array-like of shape (n_samples, n_features)): The input training data.
    - y (array-like of shape (n_samples,)): The target values.
  - Returns:
    - self: The fitted DualModel instance.
- predict(X): Predict the regression values and binary classifications for the input data.
  - Parameters:
    - X (array-like of shape (n_samples, n_features)): The input data for prediction.
  - Returns (as a tuple):
    - grade_predictions (array-like of shape (n_samples,)): The predicted regression values.
    - class_predictions (array-like of shape (n_samples,)): The predicted binary classifications based on the optimal cut-off.
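To show how fit and predict work together, here is a hypothetical, simplified MiniDualModel. It is not the package's code: it supports only the Youden index, finds the cut-off with scikit-learn's roc_curve (a swapped-in technique that works because recall is the true positive rate and specificity is 1 minus the false positive rate, so J = TPR - FPR), and assumes LassoCV/RidgeCV for the 'lasso' and 'ridge' options.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import roc_curve


class MiniDualModel:
    """Hypothetical, simplified stand-in for DualModel (Youden index only)."""

    def __init__(self, model_type="lasso", default_cut_off=0.5):
        self.model_type = model_type
        self.default_cut_off = default_cut_off

    def fit(self, X, y):
        base = {"lasso": LassoCV, "ridge": RidgeCV, "ols": LinearRegression}[self.model_type]
        self.model_ = base().fit(X, y)
        # True labels from the default cut-off: at or above -> 1, below -> 0.
        self.y_label_true_ = (np.asarray(y) >= self.default_cut_off).astype(int)
        # Each ROC threshold t classifies predictions >= t as 1; J = TPR - FPR there.
        fpr, tpr, thresholds = roc_curve(self.y_label_true_, self.model_.predict(X))
        self.optimal_cut_off = thresholds[int(np.argmax(tpr - fpr))]
        return self

    def predict(self, X):
        grade_predictions = self.model_.predict(X)
        class_predictions = (grade_predictions >= self.optimal_cut_off).astype(int)
        return grade_predictions, class_predictions


# Usage with hypothetical data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + 2.0 + rng.normal(0, 0.2, size=200)
model = MiniDualModel(model_type="ols", default_cut_off=2.0).fit(X, y)
grade_predictions, class_predictions = model.predict(X)
print(model.optimal_cut_off, class_predictions[:10])
```

roc_curve evaluates all distinct scores in one sorted pass, which is cheaper than re-scoring each candidate separately; supporting F1 or F2 in this sketch would need a different search, for example one built on scikit-learn's precision_recall_curve.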
Attributes
- alpha_: The alpha value of the model. Only available when the base model is Lasso or Ridge regression (OLS has no alpha).
- coef_: The coefficients of the model.
- intercept_: The intercept of the model.
- feature_names_in_: The names of the features used to train the model.
- optimal_cut_off: The optimal cut-off value determined by the specified metric.
- y_label_true_: The true binary labels, generated from the target values using the default cut-off value.
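The alpha_, coef_ and intercept_ attributes follow scikit-learn naming. A short sketch with scikit-learn's RidgeCV (an assumption about what a 'ridge' DualModel wraps) shows what they mean for a linear base model: each regression prediction is the dot product of a sample's features with coef_ plus intercept_, and alpha_ is the regularization strength chosen by cross-validation. scikit-learn sets feature_names_in_ only when X has string feature names (for example, a pandas DataFrame), so it is absent in this NumPy-only example.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))                                      # hypothetical features
y = X @ np.array([0.5, 1.5, 0.0]) + 2.0 + rng.normal(0, 0.1, 100)  # hypothetical target

model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print(model.alpha_)  # the value from `alphas` picked by (leave-one-out) cross-validation

# A linear model's prediction is X @ coef_ + intercept_.
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True
```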
Example
```python
# Import the DualModel class
from dual_model import DualModel

# X: feature matrix of shape (n_samples, n_features), e.g. a pandas DataFrame of student features
# y: continuous target values of shape (n_samples,), e.g. students' grades
# Both must be loaded before running this example.

# Initializing and fitting the DualModel
# 'ols' for Ordinary Least Squares; a default cut-off value is provided
# The metric parameter specifies the method used to tune the optimal cut-off
dual_clf = DualModel(model_type='ols', metric='youden_index', default_cut_off=1)
dual_clf.fit(X, y)

# Accessing the true binary labels generated from y using the default cut-off
y_label_true = dual_clf.y_label_true_

# Retrieving the optimal cut-off value tuned with the Youden index
optimal_cut_off = dual_clf.optimal_cut_off

# Predicting grades (y_pred) and binary classes (at-risk or not) based on the optimal cut-off (y_label_pred)
y_pred, y_label_pred = dual_clf.predict(X)
```
Examples of Model Performance Plots
```python
# Visualizations
# plot_scatter, plot_cm and plot_feature_coefficients are the package's plotting helpers and must be
# imported first; y, y_pred, y_label_true, y_label_pred and dual_clf come from the example above
# Plotting the actual vs. predicted values to assess regression performance
scatter_plot_fig = plot_scatter(y_pred, y)
# Plotting the confusion matrix to evaluate binary classification performance
cm_plot = plot_cm(y_label_true, y_label_pred)
# Plotting the non-zero coefficients of the regression model to interpret feature importance
feature_plot = plot_feature_coefficients(coef=dual_clf.coef_, feature_names=dual_clf.feature_names_in_)
```
File details
Details for the file dualPredictor-0.0.7.tar.gz.
File metadata
- Download URL: dualPredictor-0.0.7.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 746bff25df43e17ceff2d1251678c74e9301e92927b379a9a5eead650a2f377f |
| MD5 | 388d15e0e8519661abe5a889f39e9fb9 |
| BLAKE2b-256 | e75451274cbeebc7d04b762bc473f017365fc936576735d4f90470be00204845 |
File details
Details for the file dualPredictor-0.0.7-py3-none-any.whl.
File metadata
- Download URL: dualPredictor-0.0.7-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad1d21b0b8daddd282152dc20ca36d83478cb8f5a7f2c59c2098cf664b0c377e |
| MD5 | e5deba8f01ca02ba0cb6ce8fc29c003d |
| BLAKE2b-256 | 466b19a6d3a6ba13a7bdfd3b691bb7a0f32f85f2d907fa0a03d29e5c17a82d28 |