simple linear regression quality

Project description

Simple Linear Regression:

This package analyzes the quality of a simple linear regression. First, the assumptions of simple linear regression are checked. Outliers are handled by removing samples whose standardized and studentized residuals exceed 3 in absolute value. Normality is checked with the Shapiro-Wilk test, homoscedasticity with the Breusch-Pagan test, independence with the Durbin-Watson test, and linearity with the F test. The quality of the regression is then described by four measures. The dynamic range is calculated as the difference between the highest and lowest values of the response variable y. The sensitivity is obtained from ordinary least squares (OLS). The resolution is determined with ANOVA, which tests whether there is a significant difference between the two consecutive values of the response variable that are closest together. The accuracy is estimated by cross-validation with k=10, using RMSE as the metric.

Figure 1. Flowchart of the proposed methodology.

Simple linear regression assumptions
Simple linear regression quality
Database structure
Installation
Code example

Simple linear regression assumptions

Outliers: data points that deviate significantly from the rest of the data.
Normality: the errors (residuals) follow a normal distribution.
Homoscedasticity: the variance of the residuals is the same across the different groups in the database.
Independence: there is no temporal correlation between the residuals.
Linearity: the response variable changes at a constant rate with respect to the predictor.
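The outlier rule can be illustrated with a short, dependency-free sketch. This is not the package's own implementation: the function names are hypothetical, "standardized" is taken to mean internally studentized and "studentized" to mean externally studentized, and a sample is flagged when either value exceeds the threshold in absolute value (an assumption about how the two criteria are combined).

```python
import math

def studentized_residuals(x, y):
    """Return (standardized, studentized) residuals of a least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    standardized, studentized = [], []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        r = e / (s * math.sqrt(1 - leverage))          # internally studentized
        t = r * math.sqrt((n - 3) / (n - 2 - r ** 2))  # externally studentized
        standardized.append(r)
        studentized.append(t)
    return standardized, studentized

def flag_outliers(x, y, threshold=3.0):
    """Indices whose standardized or studentized residual exceeds the threshold."""
    standardized, studentized = studentized_residuals(x, y)
    return [
        i
        for i, (r, t) in enumerate(zip(standardized, studentized))
        if abs(r) > threshold or abs(t) > threshold
    ]

# Made-up data: a line with small alternating noise and one shifted sample.
x = list(range(1, 21))
y = [2 * xi + 1 + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[10] += 5.0  # inject an anomaly at index 10

outliers = flag_outliers(x, y)
print(outliers)
```

In practice the regression would be refitted after the flagged samples are removed, since a single large outlier inflates the residual standard error for every other point.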

Simple linear regression quality

Dynamic range: the range of values of the response variable within which the relationship is linear.
Sensitivity: the change in the response variable per unit change in the predictor.
Resolution: the ability of the measurement system to detect and faithfully indicate small changes in the measurement result.
Accuracy: the degree of agreement between a measurement result and the true value of the measurand.
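Three of these measures can be sketched in plain Python. The sketch below is illustrative only and is not the package's code: the function names are hypothetical, the sensitivity is taken to be the OLS slope, and the folds are formed by interleaving sample indices without shuffling (an assumption; the package may split the data differently).

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dynamic_range(y):
    """Difference between the highest and lowest response values."""
    return max(y) - min(y)

def sensitivity(x, y):
    """Slope of the fitted line: change in y per unit change in x."""
    return fit_line(x, y)[0]

def cross_validated_rmse(x, y, k=10):
    """Mean RMSE over k folds; requires at least k samples."""
    n = len(x)
    fold_rmse = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line(
            [x[i] for i in train_idx], [y[i] for i in train_idx]
        )
        squared_errors = [(y[i] - (intercept + slope * x[i])) ** 2 for i in test_idx]
        fold_rmse.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(fold_rmse) / k

# Made-up data: y = 0.5 x + 2 with small alternating noise.
x = list(range(30))
y = [0.5 * xi + 2 + (0.2 if xi % 2 == 0 else -0.2) for xi in x]

print("dynamic range:", dynamic_range(y))
print("sensitivity:", sensitivity(x, y))
print("10-fold RMSE:", cross_validated_rmse(x, y, k=10))
```

A lower cross-validated RMSE means the fitted line predicts held-out samples more closely, which is how RMSE serves as the accuracy metric here.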

Database structure

The "regression_quality" program works with two databases. The first database contains all repetitions for the variable X (see Figure 2(a)), and the second database contains all repetitions for the variable Y (see Figure 2(b)). Figure 2 illustrates an example of how to organize the data to use the program effectively.

Figure 2. Example: (a) database for X, and (b) database for Y.

Installation

Install the package from PyPI with pip:

pip install sl-regression-quality

Code example

For instance, the following code can be executed in Google Colab. Simply copy and paste it into a new Colab notebook.

#--------------------------------------------------------------------------------
# 1) Load libraries:
from sl_regression_quality.main_routine import regression_quality
from sl_regression_quality.load_data import load_csv

#--------------------------------------------------------------------------------
# 2) Load data.
# 2 a) Load the example data included in the project:
dataset_x = load_csv('data_x_example.csv')
dataset_y = load_csv('data_y_example.csv')

# 2 b) To use your own data instead, uncomment the following lines and point them
#      at your .csv files (organized as described in the Database structure section):
#dataset_x = load_csv('your_data_x.csv')
#dataset_y = load_csv('your_data_y.csv')

alpha = 0.05 # significance level
dL = 1.055 # lower critical value for the Durbin-Watson test
dU = 1.211 # upper critical value for the Durbin-Watson test

#--------------------------------------------------------------------------------
# 3) Run analysis
regression_quality(dataset_x, dataset_y, alpha, dL, dU)
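The dL and dU arguments are the lower and upper critical bounds of the Durbin-Watson test. They depend on the sample size, the number of predictors, and the significance level, and are normally looked up in a Durbin-Watson table. The following sketch shows how such bounds are conventionally used; it is an illustration with hypothetical function names, not the package's internal logic.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their original (time) order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def interpret_durbin_watson(d, dL, dU):
    """Conventional two-sided decision rule based on the bounds dL and dU."""
    if d < dL:
        return "positive autocorrelation"
    if d > 4 - dL:
        return "negative autocorrelation"
    if dU < d < 4 - dU:
        return "no evidence of autocorrelation"
    return "inconclusive"

dL, dU = 1.055, 1.211  # bounds used in the example above

# Made-up residual sequences to exercise each branch of the rule.
trending = [-3, -2, -1, 0, 1, 2, 3]            # neighbours are similar
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]  # neighbours flip sign
mixed = [1, 1, -1, -1, 1, 1, -1, -1]

for name, res in [("trending", trending), ("alternating", alternating), ("mixed", mixed)]:
    d = durbin_watson(res)
    print(name, round(d, 3), interpret_durbin_watson(d, dL, dU))
```

The statistic lies between 0 and 4, and values near 2 indicate uncorrelated residuals, which is why the rule is symmetric around 2.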

