A Python package for simultaneous regression and binary classification for educational analytics.

dualPredictor

dualPredictor is a Python package for educational analytics that returns regression and binary classification results from a single model. It fits a standard scikit-learn regression model (Lasso, Ridge, or Ordinary Least Squares, OLS) and adds a cutoff-based binary classification on top of it. Users can therefore predict student grades and identify at-risk students in one step, combining what would otherwise be separate regression and classification analyses.
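The idea of deriving a binary label from a regression output can be illustrated with a minimal, dependency-free sketch. This is not dualPredictor's implementation: the helper names `fit_simple_ols` and `dual_predict`, the toy data, and the convention that a prediction below the cutoff is labeled 1 (at risk) are all assumptions made for illustration.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for one feature: y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def dual_predict(x, intercept, slope, cut_off):
    """Return (predicted values, binary labels).

    Assumption: label 1 means the prediction falls below the cut-off.
    """
    y_pred = [intercept + slope * xi for xi in x]
    y_label = [1 if yp < cut_off else 0 for yp in y_pred]
    return y_pred, y_label


# Hypothetical data: weekly study hours and final grade on a 0-4 scale.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
grades = [1.5, 2.0, 2.5, 3.0, 3.5]

intercept, slope = fit_simple_ols(hours, grades)
y_pred, y_label = dual_predict(hours, intercept, slope, cut_off=2.5)
print(y_pred)   # continuous predictions (regression result)
print(y_label)  # binary labels derived from the same predictions
```

Both outputs come from one fitted model, so the classification never disagrees with the regression about which side of the cutoff a student falls on.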

Features

  • Simultaneous Predictions: Perform regression and binary classification in a single step.
  • Flexible Model Selection: Choose LassoCV, RidgeCV, or LinearRegression as the base regression model.
  • Dynamic Cutoff Tuning: Automatically tunes the cutoff value to maximize the Youden index, F1 score, or F2 score, which suits educational settings where identifying at-risk students is crucial.
  • Ease of Use: Designed to follow scikit-learn's familiar API, making it accessible for both beginners and experts in machine learning.
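One way to picture the cutoff tuning described above is a search over candidate cutoffs that keeps the one with the best score. The sketch below is an assumption about how such tuning could work, not the package's actual code: the function names, the use of the predicted values as candidate cutoffs, the "below the cutoff means positive" convention, and the metric names `'f1_score'` and `'f2_score'` are hypothetical (only `'youden_index'` appears in the usage example).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def youden_index(tp, fp, fn, tn):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def f_beta(tp, fp, fn, beta):
    """F-beta score; beta=1 gives F1, beta=2 weights recall more heavily."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def tune_cut_off(y_label_true, y_pred, metric="youden_index"):
    """Try each predicted value as a cut-off and keep the best-scoring one."""
    best_cut_off, best_score = None, float("-inf")
    for cut_off in sorted(set(y_pred)):
        # Assumption: a prediction below the cut-off is labeled 1 (at risk).
        y_label_pred = [1 if yp < cut_off else 0 for yp in y_pred]
        tp, fp, fn, tn = confusion_counts(y_label_true, y_label_pred)
        if metric == "youden_index":
            score = youden_index(tp, fp, fn, tn)
        elif metric == "f1_score":  # hypothetical metric name
            score = f_beta(tp, fp, fn, beta=1.0)
        elif metric == "f2_score":  # hypothetical metric name
            score = f_beta(tp, fp, fn, beta=2.0)
        else:
            raise ValueError(f"unknown metric: {metric}")
        if score > best_score:
            best_cut_off, best_score = cut_off, score
    return best_cut_off, best_score


# Hypothetical grades; true labels come from a default cut-off of 2.5.
y_true = [1.0, 2.0, 2.2, 3.0, 3.5, 2.8]
y_label_true = [1 if y < 2.5 else 0 for y in y_true]
y_pred = [1.4, 2.6, 2.1, 2.9, 3.3, 2.7]

best_cut_off, best_score = tune_cut_off(y_label_true, y_pred)
print(best_cut_off, best_score)
```

In this toy data the model overestimates one at-risk student (true grade 2.0, predicted 2.6), so applying the default cutoff of 2.5 to the predictions would miss that student. The tuned cutoff moves to 2.7, which separates the two groups on the predicted scale.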

Installation

Install dualPredictor directly from PyPI using pip:

pip install dualPredictor

Example Usage

from sklearn.datasets import fetch_california_housing
from dualPredictor.dual_model import DualModel
from dualPredictor.model_plot import plot_scatter, plot_feature_coefficients, plot_cm

# Fetching a dataset from scikit-learn for demonstration purposes
housing = fetch_california_housing(as_frame=True)
y = housing.target  # Target variable: median house value, in units of $100,000
X = housing.data  # Feature matrix

# Initializing and fitting the DualModel
# model_type='ols' selects Ordinary Least Squares; default_cut_off is the threshold
# used to generate the true binary labels
# The metric parameter specifies the criterion used to tune the optimal cut-off
dual_clf = DualModel(model_type='ols', default_cut_off=2.5)
dual_clf.fit(X, y, metric='youden_index')

# Accessing the true binary labels generated based on the default cut-off
y_label_true = dual_clf.y_label_true_
# Retrieving the optimal cut-off value tuned based on the Youden Index
optimal_cut_off = dual_clf.optimal_cut_off

# Predicting the continuous target and the binary label based on the optimal cut-off
# (in an educational setting: the grade and whether the student is at risk)
y_pred, y_label_pred = dual_clf.predict(X)

# Visualizations
# Plotting the actual vs. predicted values to assess regression performance
scatter_plot_fig = plot_scatter(y_pred, y)
# Plotting the confusion matrix to evaluate binary classification performance
cm_plot = plot_cm(y_label_true, y_label_pred)
# Plotting the non-zero coefficients of the regression model to interpret feature importance
feature_plot = plot_feature_coefficients(coef=dual_clf.coef_, feature_names=dual_clf.feature_names_in_)
