A package of functions for testing the assumptions behind simple and multiple linear regression on a specified dataset

Project description

lrasm

Package for testing linear and multiple linear regression assumptions

This package provides functions for quickly and easily testing the linear regression assumptions before performing simple or multiple linear regression on a specified dataset.

Three assumptions should hold for a linear regression model to be reliable on a particular dataset:

  • No Multicollinearity: predictors within a model should not be strongly linearly correlated with one another; otherwise the coefficient estimates become unstable

  • Constant Variance of Residuals (homoscedasticity): the errors are assumed to be independent and identically distributed, so the spread of the residuals should stay constant across the fitted values rather than grow or shrink with them

  • Normality of Residuals: the response is assumed to be normally distributed around its conditional mean, so the error terms of the fitted model should also be normally distributed

The package contains three functions: one for checking multicollinearity, one for checking constant variance, and one for checking normality of the residuals.

Function 1: Multicollinearity.

  • Takes in a pandas dataframe and a VIF (variance inflation factor) threshold and checks whether any of the calculated VIF values exceed the given threshold. If so, the function advises the user that this assumption is violated; otherwise it reports that the assumption holds.

  • Returns the calculated VIF values and a statement telling the user whether or not the assumption is violated.
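
The VIF check described above can be sketched with NumPy alone. This is an illustrative sketch, not lrasm's implementation: the helper names `vif_values` and `exceeds_threshold` are hypothetical, and the code assumes every column is numeric and non-constant. Each column is regressed on the remaining columns, and its VIF is 1 / (1 - R^2).

```python
import numpy as np


def vif_values(X):
    """Return the variance inflation factor of each column of X.

    Hypothetical helper, not part of lrasm. Assumes numeric,
    non-constant columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the other columns plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # A perfect fit means the column is an exact linear combination.
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


def exceeds_threshold(vifs, vif_thresh=10):
    """True if any VIF is above the threshold (assumption violated)."""
    return any(v > vif_thresh for v in vifs)


# Synthetic data: a and b are unrelated, c is almost a copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)

vifs_independent = vif_values(np.column_stack([a, b]))
vifs_collinear = vif_values(np.column_stack([a, b, c]))
```

With unrelated predictors every VIF stays close to 1. Adding the near-duplicate column pushes the VIFs of `a` and `c` far above a threshold of 10, while the VIF of `b` stays low.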

Function 2: Constant Variance.

  • Takes in a pandas dataframe containing predictors, a pandas series containing the response, and a variability threshold, and checks whether the variability of the residuals is constant by comparing it to the given threshold. If the threshold is exceeded, the function advises the user that this assumption is violated; otherwise it reports that the assumption holds.

  • Returns a plot of the fitted values vs. residuals, the calculated variability value, and a statement telling the user whether or not the assumption is violated.
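
One way to turn "variability of the residuals" into a single number is the correlation between the fitted values and the absolute residuals. The sketch below takes that approach as an assumption; the description above does not say which statistic lrasm computes. The helper name `residual_spread_correlation` is hypothetical, and the plot is left out.

```python
import numpy as np


def residual_spread_correlation(X, y):
    """Fit ordinary least squares and correlate |residuals| with fitted values.

    Hypothetical helper, not part of lrasm. A value near 0 suggests
    constant variance; a large absolute value suggests the spread
    changes with the fitted values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    residuals = y - fitted
    corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]
    return float(corr), fitted, residuals


# Synthetic data: the same noise, once with constant spread
# and once scaled by x so the spread fans out.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 2000)
noise = rng.normal(size=2000)

corr_constant, _, _ = residual_spread_correlation(x, 2 + 3 * x + noise)
corr_fanned, _, _ = residual_spread_correlation(x, 2 + 3 * x + x * noise)
```

Comparing the absolute value of the returned correlation with a user-chosen threshold then gives the violated / not violated decision. The returned `fitted` and `residuals` arrays are what a fitted-vs-residuals plot would draw.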

Function 3: Normality.

  • Takes in a pandas dataframe containing predictors, a pandas series containing the response, and a p-value threshold, and performs a Shapiro-Wilk test for normality on the residuals. If the p-value of the test does not exceed the threshold, the function advises the user that this assumption is violated; otherwise it reports that the assumption holds.

  • Returns the calculated p-value and a statement telling the user whether or not the assumption is violated.
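
The normality check can be sketched by fitting ordinary least squares and passing the residuals to `scipy.stats.shapiro`. The helper name `residual_normality_p_value` and the default threshold of 0.05 are assumptions made for illustration; lrasm's own function may fit the model and word its result differently.

```python
import numpy as np
from scipy import stats


def residual_normality_p_value(X, y):
    """Return the Shapiro-Wilk p-value of the OLS residuals.

    Hypothetical helper, not part of lrasm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    _, p_value = stats.shapiro(residuals)
    return float(p_value)


def normality_violated(p_value, p_thresh=0.05):
    """The assumption is flagged as violated when p does not exceed the threshold."""
    return p_value <= p_thresh


# Synthetic data: a linear signal with strongly skewed (exponential) noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 300)
skewed_noise = rng.exponential(size=300) - 1.0
p_skewed = residual_normality_p_value(x, 1 + 2 * x + skewed_noise)
```

Strongly skewed residuals produce a p-value far below 0.05, so the check flags the assumption as violated. A large p-value only means the test found no evidence against normality.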

Installation

$ pip install git+https://github.com/UBC-MDS/lrasm

Usage

Usage examples and further documentation are available on Read the Docs: https://lrasm.readthedocs.io/en/latest/

lrasm can be used to check linear regression assumptions as follows:

from lrasm.homoscedasticity_tst import homoscedasticity_test
from lrasm.multicollinearity_tst import multicollinearity_test
from lrasm.normality_tst import normality_test
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt

# Import/Process Test data:

data = datasets.load_iris()
iris_df = pd.DataFrame(data=data.data, columns=data.feature_names)
# Use every column except the response as a predictor
X = iris_df.drop("petal width (cm)", axis = 1)
y = iris_df["petal width (cm)"]

# Test for Normality:

p_value, res = normality_test(X, y)

# Test for Homoscedasticity:

corr_df, plot = homoscedasticity_test(X, y)

# Test for Multicollinearity:

vif_df = multicollinearity_test(X, VIF_thresh = 10)

Ecosystem

As of January 2022, we have not found any other package that explicitly evaluates the assumptions made by linear regression. The LR_assumption_test package seeks to fill this gap by building on existing Python packages. It combines functions offered by scikit-learn, statsmodels, scipy.stats, matplotlib and others for the purpose of evaluating linear regression models. The aim is to make the functionality of those packages easier to reach and the results clearer to read.

Contributing

Authors: Yair Guterman, Hatef Rahmani, Song Bo Andy Yang
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

lrasm was created by Yair Guterman, Hatef Rahmani, and Song Bo Andy Yang. It is licensed under the terms of the MIT license.

Credits

lrasm was created with cookiecutter and the py-pkgs-cookiecutter template.


Download files

Source Distribution

lrasm-0.1.3.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

lrasm-0.1.3-py3-none-any.whl (7.0 kB)

Uploaded Python 3
