
A Python package for simultaneous regression and binary classification for educational analytics.


dualPredictor

by D

dualPredictor is a Python package that can provide simultaneous regression and binary classification results for tabular datasets.

  • Simultaneous Predictions: A single model that performs regression and binary classification at the same time.
  • Regressor Selection (choose one): Choose Lasso, Ridge, or LinearRegression (OLS) as the base regression model.
  • Dynamic Cutoff Tuning Metric (choose one): The cutoff value is tuned automatically to maximize the Youden index, the F1 score, or the F2 score, whichever metric the user selects.

1. Youden Index (J)

$$J = Recall + Specificity - 1$$

J measures the overall performance of a binary classifier. It is the sum of recall and specificity, minus 1. A high J indicates that the classifier performs well on both positive and negative cases.

  • Recall measures a classifier's ability to identify positive cases correctly. A high recall means the classifier avoids missed detections (false negatives).
  • Specificity measures a classifier's ability to identify negative cases correctly. A high specificity means the classifier avoids false alarms (false positives).
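As a small worked example of the formula above, the sketch below computes recall, specificity, and J from two lists of 0/1 labels. The helper name `youden_index` and the sample labels are made up for illustration; this is not the package's own code.

```python
def youden_index(y_true, y_pred):
    """Return (recall, specificity, J) for labels coded as 0 (negative) and 1 (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed detections
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, specificity, recall + specificity - 1


# Made-up labels: 4 positives and 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

recall, specificity, j = youden_index(y_true, y_pred)
print(f"recall={recall:.3f} specificity={specificity:.3f} J={j:.3f}")
```

Here one of the four positives is missed (recall 3/4) and one of the six negatives is a false alarm (specificity 5/6), so J = 3/4 + 5/6 - 1.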

2. F scores (Option F1, F2 in Package)

$$F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$

The F1 score is another measure of the overall performance of a binary classifier. It is the harmonic mean of precision and recall. A high F1 score indicates that the classifier achieves both high precision and high recall on the positive class; unlike the Youden index, it does not take true negatives into account.

$$F_\beta = (1 + \beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}$$

The F-score with factor beta generalizes the F1 score by allowing different weights for precision and recall. A beta value less than 1 weights precision more heavily, while a beta value greater than 1 weights recall more heavily. F2 is the case beta = 2.
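To make the effect of beta concrete, the sketch below evaluates the F-beta formula for one made-up pair of precision and recall values. The function name `f_beta` is illustrative only and is not part of the package.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score computed directly from precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when precision and recall are both zero
    return (1 + b2) * precision * recall / denominator


# Made-up classifier with higher recall than precision
precision, recall = 0.5, 0.8

f_half = f_beta(precision, recall, beta=0.5)  # leans toward precision
f1 = f_beta(precision, recall, beta=1.0)      # harmonic mean of the two
f2 = f_beta(precision, recall, beta=2.0)      # leans toward recall

print(f"F0.5={f_half:.3f} F1={f1:.3f} F2={f2:.3f}")
```

Because recall exceeds precision in this example, the score rises as beta grows: F0.5 < F1 < F2.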

Installation

Install dualPredictor directly from PyPI using pip:

pip install dualPredictor

Or install directly from the GitHub repo:

pip install git+https://github.com/098765d/dualPredictor.git

Dependencies

dualPredictor requires:

  • numpy
  • scikit-learn
  • matplotlib
  • seaborn

DualModel

The DualModel class is a custom regressor that combines a base regression model (lasso, ridge, or OLS) with a dual classification approach. It allows for tuning an optimal cut-off value to classify samples into two classes based on the predicted regression values.

Parameters

  • model_type (str, default='lasso'): The base regression model to use. Supported options are 'lasso', 'ridge', and 'ols' (Ordinary Least Squares).

  • metric (str, default='youden_index'): The metric used to tune the optimal cut-off value. Supported options are 'f1_score', 'f2_score', and 'youden_index'.

  • default_cut_off (float, default=0.5): The default cut-off value used to create binary labels. Samples with regression values below the cut-off are labeled as 0, and samples above or equal to the cut-off are labeled as 1.
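The interplay between default_cut_off and the tuned cut-off can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names `youden_index` and `tune_cut_off` are hypothetical, the candidate cut-offs are simply the distinct predicted values, and the data are made up. True labels come from thresholding the actual targets at the default cut-off; the tuned cut-off is then the threshold on the predicted values that maximizes the chosen metric (here the Youden index).

```python
def youden_index(y_true, y_pred):
    """Youden's J = recall + specificity - 1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall + specificity - 1


def tune_cut_off(y, y_hat, default_cut_off):
    """Return (best_cut_off, best_score) over the distinct predicted values."""
    # True labels: 1 if the actual value is >= the default cut-off, else 0
    y_label_true = [int(v >= default_cut_off) for v in y]
    best_cut, best_score = default_cut_off, float("-inf")
    for candidate in sorted(set(y_hat)):
        y_label_pred = [int(v >= candidate) for v in y_hat]
        score = youden_index(y_label_true, y_label_pred)
        if score > best_score:
            best_cut, best_score = candidate, score
    return best_cut, best_score


# Made-up targets and regression predictions that are pulled toward the mean
y = [0.2, 0.4, 0.9, 1.1, 1.3, 1.6]
y_hat = [0.3, 0.5, 0.8, 0.9, 1.2, 1.5]

best_cut, best_j = tune_cut_off(y, y_hat, default_cut_off=1.0)
print(best_cut, best_j)
```

In this toy data, applying the default cut-off of 1.0 to the predictions would miss the sample predicted at 0.9 whose actual value is 1.1; lowering the cut-off to 0.9 recovers it without adding false alarms, which is the kind of adjustment cut-off tuning is meant to find.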

Methods

  • fit(X, y): Fit the DualModel to the training data.

    • Parameters:

      • X (array-like of shape (n_samples, n_features)): The input training data.
      • y (array-like of shape (n_samples,)): The target values.
    • Returns:

      • self: Fitted DualModel instance.
  • predict(X): Predict the input data's regression values and binary classifications.

    • Parameters:

      • X (array-like of shape (n_samples, n_features)): The input data for prediction.
    • Returns:

      • grade_predictions (array-like of shape (n_samples,)): The predicted regression values.
      • class_predictions (array-like of shape (n_samples,)): The predicted binary classifications based on the optimal cut-off.

Attributes

  • alpha_: The alpha (regularization strength) value of the model. Only available when the base model is Lasso or Ridge; OLS has no alpha.
  • coef_: The coefficients of the model.
  • intercept_: The intercept of the model.
  • feature_names_in_: The names of the features used to train the model.
  • optimal_cut_off: The optimal cut-off value determined by the specified metric.
  • y_label_true_: The true binary labels, generated from the training targets using the default cut-off value.

Example

# Import the DualModel class
from dual_model import DualModel

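# Initializing and fitting the DualModel
# X is the feature matrix and y the continuous target (e.g., grades); both must be defined beforehand
# 'ols' selects Ordinary Least Squares; default_cut_off is the threshold used to create the true binary labels
# The metric parameter specifies the metric used to tune the optimal cut-off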
dual_clf = DualModel(model_type='ols', metric='youden_index', default_cut_off=1)
dual_clf.fit(X, y)

# Accessing the true binary labels generated based on the default cut-off
y_label_true = dual_clf.y_label_true_

# Retrieving the optimal cut-off value tuned based on the Youden Index
optimal_cut_off = dual_clf.optimal_cut_off

# Predicting grades (y_pred) and binary classification (at-risk or not) based on the optimal cut-off (y_label_pred)
y_pred, y_label_pred = dual_clf.predict(X)

Example Model Performance Plots

# Visualizations
# Plotting the actual vs. predicted values to assess regression performance
scatter_plot_fig = plot_scatter(y_pred, y)

# Plotting the confusion matrix to evaluate binary classification performance
cm_plot = plot_cm(y_label_true, y_label_pred)

# Plotting the non-zero coefficients of the regression model to interpret feature importance
feature_plot = plot_feature_coefficients(coef=dual_clf.coef_, feature_names=dual_clf.feature_names_in_)

