A Python package for simultaneous regression and binary classification for educational analytics.


An Open-Source Tool for Simultaneous Grade Prediction and At-risk Student Identification


This Python package, based on the research paper "Early Detecting and Supporting At-Risk University Students through Data Analytics and Intervention", integrates regression analysis with binary classification to predict student academic outcomes. Designed for ease of use, this package allows educators to train models, make predictions, and visualize results with just one line of code using their own datasets. This accessibility ensures that sophisticated algorithms are readily available to users with varying levels of IT expertise.

Package Links:

Python Package Index (PyPI): dualPredictor on PyPI
GitHub Repository: dualPredictor on GitHub

0. Package Installation

This package requires:

Python (>= 3.9)
NumPy
scikit-learn
Matplotlib
Seaborn

Install dependencies:

pip install numpy scikit-learn matplotlib seaborn

Install the package via PyPI or GitHub (Recommended):

pip install dualPredictor

pip install git+https://github.com/098765d/dualPredictor.git

1. Introduction

The package enables educators to predict student academic outcomes and identify at-risk students with ease. The following steps outline the process:

Step 1: Grade Prediction Using the Trained Regressor (Fig 1, Step 1)

Fit the linear model f(x) on the training data. The fitted model then generates the grade prediction y_pred as a weighted sum of the M input features x_j, with weights w_j and intercept b:
```
y_{\text{pred}} = f(x) = \sum_{j=1}^{M} w_j x_j + b
```
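
As a minimal sketch of the formula above (not the package's internal code), the weighted sum can be computed with NumPy. The weights, intercept, and feature values below are made-up numbers used only for illustration.

```python
import numpy as np

# Hypothetical fitted parameters (illustrative values, not from a real model)
w = np.array([0.2, -0.1, 0.3, 0.4])  # one weight w_j per feature
b = 2.5                              # intercept

# Two hypothetical students, M = 4 features each
X = np.array([
    [1.0, 2.0, 0.5, 1.0],
    [0.0, 3.0, 0.0, 0.5],
])

# y_pred = sum_j w_j * x_j + b, evaluated for every row at once
y_pred = X @ w + b
print(y_pred)
```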
Step 2: Determining the Optimal Cut-off (Fig 1, Step 2)

The goal is to find the cut-off c that maximizes a chosen binary classification metric. First, the user specifies the metric used for the model (e.g., the Youden index). Denoting the metric function as g(y_true_label, y_pred_label), the optimal cut-off is:
```
\text{optimal\_cut\_off} = \arg\max_c g(y_{\text{true\_label}}, y_{\text{pred\_label}}(c))
```
This formula searches for the cut-off value that produces the highest value of the metric function g, where:
- c: The tuned cut-off that determines y_pred_label
- y_true_label: True label of the data point, based on the default cut-off (e.g., 1 for at-risk, 0 for normal)
- y_pred_label: Predicted label of the data point, based on the tuned cut-off value
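
The search described above can be sketched as a simple grid search. This is an illustration under stated assumptions, not the package's implementation: the helper names `youden_index` and `find_optimal_cut_off`, the evenly spaced candidate grid, and the toy grades are all hypothetical.

```python
import numpy as np

def youden_index(y_true_label, y_pred_label):
    """Youden's J = sensitivity + specificity - 1 (at-risk = positive class 1)."""
    y_true_label = np.asarray(y_true_label)
    y_pred_label = np.asarray(y_pred_label)
    tp = np.sum((y_true_label == 1) & (y_pred_label == 1))
    fn = np.sum((y_true_label == 1) & (y_pred_label == 0))
    tn = np.sum((y_true_label == 0) & (y_pred_label == 0))
    fp = np.sum((y_true_label == 0) & (y_pred_label == 1))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0

def find_optimal_cut_off(y_true, y_pred, default_cut_off, metric=youden_index):
    """Return the candidate c that maximizes metric(y_true_label, y_pred_label(c))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # True labels come from the default cut-off applied to the actual grades
    y_true_label = (y_true < default_cut_off).astype(int)
    # Candidate cut-offs: an evenly spaced grid over the predicted range (an assumption)
    candidates = np.linspace(y_pred.min(), y_pred.max(), 101)
    scores = [metric(y_true_label, (y_pred < c).astype(int)) for c in candidates]
    return float(candidates[int(np.argmax(scores))])

# Made-up actual grades and predicted grades, for illustration only
y_true = [3.2, 2.1, 2.4, 3.6, 2.9, 2.3]
y_pred = [3.0, 2.6, 2.7, 3.4, 3.1, 2.5]
c = find_optimal_cut_off(y_true, y_pred, default_cut_off=2.5)
print("optimal cut-off:", c)
```

In this toy data the predictions are compressed toward the mean, so applying the default cut-off of 2.5 to the predictions would flag no one, while the tuned cut-off separates the three at-risk students from the rest.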

Step 3: Binary Label Prediction (Fig 1, Step 3)
- y_pred_label = 1 (at-risk): if y_pred < optimal_cut_off
- y_pred_label = 0 (normal): if y_pred >= optimal_cut_off
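
A short sketch of this labeling rule with NumPy, using illustrative numbers only. Note the strict inequality: a prediction exactly equal to the cut-off is labeled normal.

```python
import numpy as np

optimal_cut_off = 2.56                       # illustrative value
y_pred = np.array([3.12, 2.38, 2.56, 2.55])  # illustrative predicted grades

# 1 = at-risk when the predicted grade is strictly below the cut-off, else 0 = normal
y_pred_label = np.where(y_pred < optimal_cut_off, 1, 0)
print(y_pred_label)  # [0 1 0 1]
```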

Fig 1: How does dualPredictor provide dual prediction output?

2. The Model Object (Parameters, Methods, and Attributes)

The dualPredictor package aims to simplify complex models for users of all coding levels. It adheres to the syntax of the scikit-learn library and simplifies model training by allowing you to fit the model with just one line of code. The core part of the package is the model object called DualModel, which can be imported from the dualPredictor library.

Table 1: Model Parameters, Methods, and Attributes

| Category | Name | Description |
| --- | --- | --- |
| Parameters | `model_type` | Type of regression model to use, e.g. `'lasso'` (Lasso regression) |
| | `metric` | Metric used to optimize the cut-off value, e.g. `'youden_index'` (Youden's Index) |
| | `default_cut_off` | Initial cut-off value used for binary classification, e.g. 2.50 |
| Methods | `fit(X, y)` | X: the input training data (pandas DataFrame). y: the target values (the grades to be predicted). Returns the fitted DualModel instance |
| | `predict(X)` | X: the input data for prediction (pandas DataFrame). Returns the predicted grades and the predicted binary labels |
| Attributes | `alpha_` | The penalization strength of the Lasso model |
| | `coef_` | The coefficients of the model |
| | `intercept_` | The intercept value of the model |
| | `feature_names_in_` | Names of the features seen during model training |
| | `optimal_cut_off` | The optimal cut-off value that maximizes the metric |
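
To make the table more concrete, here is a rough sketch of how an object with this interface could be assembled from scikit-learn parts. It is a hypothetical illustration, not the package's source code: the class name `SketchDualModel`, the use of `LassoCV`, the grid of candidate cut-offs, and the synthetic feature names are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.metrics import recall_score


class SketchDualModel:
    """Hypothetical illustration of the documented interface, not the package's code."""

    def __init__(self, default_cut_off=2.5):
        self.default_cut_off = default_cut_off

    def fit(self, X, y):
        # Regression step: cross-validated Lasso (an assumed estimator choice)
        self.regressor_ = LassoCV(cv=3).fit(X, y)
        self.alpha_ = self.regressor_.alpha_
        self.coef_ = self.regressor_.coef_
        self.intercept_ = self.regressor_.intercept_
        self.feature_names_in_ = list(X.columns)

        # Cut-off step: true labels come from the default cut-off on actual grades
        y_true_label = (np.asarray(y) < self.default_cut_off).astype(int)
        y_pred = self.regressor_.predict(X)
        candidates = np.linspace(y_pred.min(), y_pred.max(), 101)

        def youden(c):
            labels = (y_pred < c).astype(int)
            # Sensitivity = recall of class 1, specificity = recall of class 0
            sens = recall_score(y_true_label, labels, pos_label=1, zero_division=0)
            spec = recall_score(y_true_label, labels, pos_label=0, zero_division=0)
            return sens + spec - 1.0

        scores = [youden(c) for c in candidates]
        self.optimal_cut_off = float(candidates[int(np.argmax(scores))])
        return self

    def predict(self, X):
        # Dual output: predicted grades and predicted at-risk labels
        y_pred = self.regressor_.predict(X)
        y_pred_label = (y_pred < self.optimal_cut_off).astype(int)
        return y_pred, y_pred_label


# Synthetic data with hypothetical feature names
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 3)), columns=["quiz", "attendance", "homework"])
y = 2.8 + 0.5 * X["quiz"] + 0.3 * X["attendance"] + rng.normal(scale=0.1, size=60)

model = SketchDualModel(default_cut_off=2.5).fit(X, y)
grade_pred, label_pred = model.predict(X)
print(model.feature_names_in_, round(model.optimal_cut_off, 2))
```

The sketch keeps the scikit-learn convention of a trailing underscore for most fitted attributes and returns `self` from `fit`, which is what allows the one-line `fit` call described above.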

Demonstration of Model Object Usage

from dualPredictor import DualModel

# Initialize the model and specify the parameters
model = DualModel(model_type='lasso', metric='youden_index', default_cut_off=2.5)

# Using model methods for training and predicting
# Simplify model training by calling fit method with one line of code
model.fit(X_train, y_train)
grade_predictions, class_predictions = model.predict(X_train)

# Accessing model attributes (printed values are synthetic, for demo only)
print("Alpha (regularization strength):", model.alpha_)
# Alpha (regularization strength): 0.12

print("Model coefficients:", model.coef_)
# Model coefficients: [0.2, -0.1, 0.3, 0.4]

print("Model intercept:", model.intercept_)
# Model intercept: 2.5

print("Feature names:", model.feature_names_in_)
# Feature names: ['feature1', 'feature2', 'feature3', 'feature4']

print("Optimal cut-off value:", model.optimal_cut_off)
# Optimal cut-off value: 2.56

3. Quick Start

Note: Results are synthetic and for demonstration purposes only

Step 0. Prepare your Dataset: Split your data into X_train and X_test (feature tables as pandas DataFrames) and y_train and y_test (the target grades).
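
One possible way to produce these four objects is scikit-learn's `train_test_split`. The sketch below uses a synthetic table in place of real student records; the column names, the GPA-like target, and the 80/20 split are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real student dataset (column names are hypothetical)
rng = np.random.default_rng(42)
n_students = 100
df = pd.DataFrame({
    "quiz_avg": rng.uniform(40, 100, n_students),
    "attendance_rate": rng.uniform(0.5, 1.0, n_students),
    "assignment_avg": rng.uniform(40, 100, n_students),
})
# Hypothetical GPA-like target, clipped to the 0-4 range
df["final_gpa"] = np.clip(
    0.02 * df["quiz_avg"]
    + 1.0 * df["attendance_rate"]
    + 0.01 * df["assignment_avg"]
    + rng.normal(0, 0.2, n_students),
    0, 4,
)

X = df.drop(columns="final_gpa")  # features as a pandas DataFrame
y = df["final_gpa"]               # target grades

# Hold out 20% of the students for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```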

Step 1. Import the Package: Import the dualPredictor package into your Python environment.

from dualPredictor import DualModel, model_plot

Step 2. Model Initialization: Create a DualModel instance

model = DualModel(model_type='lasso', metric='youden_index', default_cut_off=2.5)

Step 3. Model Training: Fit the model using X_train & y_train

model.fit(X_train, y_train)

Step 4. Model Predictions: Generate predictions on X_test

# Example for demo only: the model returns a dual output
y_test_pred, y_test_label_pred = model.predict(X_test)

# 1st output: predicted scores (regression result)
y_test_pred
# array([3.11893389, 3.06013236, 3.05418893, 3.09776197, 3.14898782,
#        2.37679417, 2.99367804, 2.77202421, 2.9603209 , 3.01052573])

# 2nd output: predicted at-risk status (binary label, 1 = at-risk)
y_test_label_pred
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

Step 5. Visualization: Visualize the model's performance with just one line of code

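# Scatter plot for regression analysis (predicted vs. actual grades)
model_plot.plot_scatter(y_test_pred, y_test)

# True labels from the default cut-off, assuming the same "below cut-off = at-risk" rule
y_test_label_true = (y_test < 2.5).astype(int)

# Confusion matrix for binary classification
model_plot.plot_cm(y_test_label_true, y_test_label_pred)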

# Model's global explanation: Feature importance plot
model_plot.plot_feature_coefficients(coef=model.coef_, feature_names=model.feature_names_in_)

Fig 2: Visualization Module Sample Outputs
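
The quantities behind the scatter plot and the confusion matrix can also be inspected numerically. The following is a minimal sketch that assumes scikit-learn's `r2_score` and `confusion_matrix` and uses made-up arrays and an illustrative tuned cut-off of 2.65 in place of real model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, r2_score

# Made-up actual and predicted grades, for illustration only
y_true = np.array([3.2, 2.1, 2.4, 3.6, 2.9, 2.3])
y_pred = np.array([3.0, 2.6, 2.7, 3.4, 3.1, 2.5])

# Regression quality: R^2 between actual and predicted grades
r2 = r2_score(y_true, y_pred)

# Classification quality: true labels from the default cut-off (2.5),
# predicted labels from an illustrative tuned cut-off (2.65)
y_label_true = (y_true < 2.5).astype(int)
y_label_pred = (y_pred < 2.65).astype(int)

# Rows are true classes, columns are predicted classes, ordered [0, 1]
cm = confusion_matrix(y_label_true, y_label_pred)
tn, fp, fn, tp = cm.ravel()
print("R^2:", round(r2, 3))
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```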

Additional Demonstration

Applied on Kaggle Dataset: Object Oriented Programming Class Student Grades data from Mugla Sitki Kocman University ('19 OOP Class Student Grades).

