
A Procedure for Multicollinearity Testing using Bootstrap


Functions to detect and quantify multicollinearity via a nonparametric pairs bootstrap.

MTest reports achieved significance levels (ASL; bootstrap proportions) for two widely used rules:

  • Klein's rule: flag multicollinearity if $R^2_j > R^2_g$
  • VIF rule: flag multicollinearity if $\mathrm{VIF}_j$ exceeds a cutoff (commonly 10), where $\mathrm{VIF}_j = \dfrac{1}{1 - R^2_j}$
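The two rules are connected through $\mathrm{VIF}_j = 1/(1 - R^2_j)$: a VIF cutoff $V$ is equivalent to the auxiliary-$R^2$ cutoff $1 - 1/V$. The helpers below are a minimal sketch of that arithmetic; their names are hypothetical and not part of mtest_py.

```python
import math


def vif_from_r2(r2_aux: float) -> float:
    """VIF_j = 1 / (1 - R²_j); returns inf when R²_j >= 1 (perfect collinearity)."""
    if r2_aux >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r2_aux)


def r2_cutoff_for_vif(vif_cutoff: float) -> float:
    """R² threshold equivalent to the condition VIF_j > vif_cutoff."""
    return 1.0 - 1.0 / vif_cutoff


def klein_flag(r2_aux: float, r2_global: float) -> bool:
    """Klein's rule: flag predictor j when its auxiliary R² exceeds the global R²."""
    return r2_aux > r2_global


print(vif_from_r2(0.90))        # approximately 10 (floating-point rounding)
print(r2_cutoff_for_vif(10.0))  # approximately 0.9
print(klein_flag(0.95, 0.85))   # True
```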

Reference: Morales-Oñate & Morales-Oñate (2023). MTest: a Bootstrap Test for Multicollinearity. Revista Politécnica, 51(2), 53–62.
DOI: https://doi.org/10.33333/rp.vol51n2.05


What MTest does

Given the response y and predictor matrix X of a linear model, MTest:

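  1. Resamples rows of the data with replacement, keeping each response value together with its predictor values (pairs bootstrap), `n_boot` times.
  2. At each bootstrap replicate, recomputes the global $R^2_g$ and the auxiliary $R^2_j$ (regressing each predictor on the remaining ones), using the same predictor columns as the original fit. Transformed terms (logs, interactions, dummy-coded factors, polynomial terms) are therefore handled in the same way, as long as they are included as columns of X.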
  3. Returns bootstrap distributions and ASLs (bootstrap proportions) for:
    • VIF rule, with threshold $c$ on $R^2_j$ (passed as `r2_threshold`):

$$ \mathrm{ASL}_{\mathrm{VIF}}(j) = \mathbb{P}\big(R^2_j > c\big) $$

      Example: `r2_threshold = 0.90` corresponds to a VIF cutoff of $1 / (1 - 0.90) = 10$.
    • Klein's rule:

$$ \mathrm{ASL}_{\mathrm{Klein}}(j) = \mathbb{P}\big(R^2_g < R^2_j\big). $$

These ASLs are simple bootstrap proportions of the corresponding events (no additional parametric assumptions).
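The procedure above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the mtest_py source: the helper names (`r2_ols`, `aux_r2`, `bootstrap_mtest`) are hypothetical, every regression includes an intercept, and each bootstrap sample has size n (there is no `nsam` option here).

```python
import numpy as np


def r2_ols(y: np.ndarray, X: np.ndarray) -> float:
    """R² of an OLS fit of y on X plus an intercept (0.0 if y is constant)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    tss = np.sum((y - y.mean()) ** 2)
    if tss == 0.0:  # guard against division by zero
        return 0.0
    return float(1.0 - resid @ resid / tss)


def aux_r2(X: np.ndarray) -> np.ndarray:
    """Auxiliary R²_j: column j regressed on the remaining columns."""
    return np.array([r2_ols(X[:, j], np.delete(X, j, axis=1)) for j in range(X.shape[1])])


def bootstrap_mtest(X, y, n_boot=1000, r2_threshold=0.9, seed=None):
    """Pairs bootstrap of R²_g and R²_j, plus the ASLs of both rules."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    b_global = np.empty(n_boot)
    b_aux = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample whole rows: y and X stay paired
        b_global[b] = r2_ols(y[idx], X[idx])
        b_aux[b] = aux_r2(X[idx])
    asl_vif = (b_aux > r2_threshold).mean(axis=0)         # proportion with R²_j > c
    asl_klein = (b_aux > b_global[:, None]).mean(axis=0)  # proportion with R²_g < R²_j
    return b_global, b_aux, asl_vif, asl_klein


# Synthetic data: x2 is nearly a copy of x1, x3 is unrelated to both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x3 + rng.normal(size=200)

b_global, b_aux, asl_vif, asl_klein = bootstrap_mtest(X, y, n_boot=200, r2_threshold=0.9, seed=1)
print("ASL VIF:", asl_vif)
print("ASL Klein:", asl_klein)
```

With this construction, the ASLs for x1 and x2 should sit near 1 under both rules, and those for x3 near 0.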


Model context

Linear regression model:

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} + u_i, \quad i=1,\ldots,n. $$

Auxiliary regressions (one per predictor):

$$ X_{ji} = \gamma_0 + \sum_{k \ne j} \gamma_k X_{ki} + e_{ji}, \quad j=1,\ldots,p. $$

Let $R^2_g$ be the global $R^2$ and $R^2_j$ the $R^2$ of the $j$-th auxiliary regression.
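The auxiliary regressions can be written directly with NumPy least squares. The sketch below (function name hypothetical, not the package's code) also checks a standard identity that holds when each auxiliary regression includes the intercept $\gamma_0$: $\mathrm{VIF}_j$ equals the $j$-th diagonal element of the inverse of the predictors' correlation matrix.

```python
import numpy as np


def auxiliary_r2(X: np.ndarray) -> np.ndarray:
    """R²_j of X_j regressed (with intercept gamma_0) on all other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        gamma, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ gamma
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 - resid @ resid / tss
    return out


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=100)  # built-in collinearity

r2_aux = auxiliary_r2(X)
vif = 1.0 / (1.0 - r2_aux)
# Cross-check: VIF_j equals the j-th diagonal entry of the inverse correlation matrix
vif_check = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
print(np.allclose(vif, vif_check))  # True
```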

Installation

pip install mtest_py

Quickstart

Example 1: Multicollinearity Test (MTest)

import pandas as pd
from mtest import mtest, mtest_summary

# Load the mtcars dataset (a CSV copy of R's built-in mtcars data)
url = "https://raw.githubusercontent.com/selva86/datasets/master/mtcars.csv"
mtcars = pd.read_csv(url)

X = mtcars[["disp", "hp", "wt", "qsec"]]   # predictors
y = mtcars["mpg"].to_numpy()               # response

# Run MTest
res = mtest(X, y, n_boot=500, r2_threshold=0.9, seed=123, add_intercept=True)

# Print results
print("R² global:", res["R2_global"])
print("VIF:", res["VIF_named"])
print("p-values VIF rule:", res["p_vif"])
print("p-values Klein rule:", res["p_klein"])

# Tabular summary
df_sum = mtest_summary(res, sort_by="VIF")
print(df_sum)

Example 2: Pairwise Kolmogorov–Smirnov Test

from mtest import pairwise_ks_test, ks_summary

# Reuses the mtcars DataFrame loaded in Example 1
X = mtcars[["disp", "hp", "wt", "qsec"]]

ks_res = pairwise_ks_test(X, alternative="greater")
summary = ks_summary(ks_res, digits=6)

print(summary["summary_text"])
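This README does not spell out how `pairwise_ks_test` computes its statistics, so the following is only a hedged sketch of what a pairwise one-sided two-sample Kolmogorov–Smirnov comparison can look like; it is not the package's implementation. The names `ks_greater` and `pairwise_ks_greater` are hypothetical. The statistic follows the convention of SciPy's `ks_2samp(..., alternative="greater")`, namely $D^+ = \max_x (F_a(x) - F_b(x))$, and the p-value uses the large-sample approximation $P(D^+ \ge d) \approx \exp(-2\,\tfrac{nm}{n+m}\,d^2)$ rather than an exact distribution.

```python
import itertools

import numpy as np


def ks_greater(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """One-sided statistic D+ = max_x (F_a(x) - F_b(x)) and its asymptotic p-value."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    # Empirical CDFs evaluated at every pooled observation (where the step functions jump)
    f_a = np.searchsorted(a, grid, side="right") / len(a)
    f_b = np.searchsorted(b, grid, side="right") / len(b)
    d_plus = max(0.0, float(np.max(f_a - f_b)))
    n_eff = len(a) * len(b) / (len(a) + len(b))
    p_value = min(1.0, float(np.exp(-2.0 * n_eff * d_plus ** 2)))
    return d_plus, p_value


def pairwise_ks_greater(data: dict[str, np.ndarray]) -> dict[tuple[str, str], tuple[float, float]]:
    """Apply ks_greater to every ordered pair of named samples."""
    return {(i, j): ks_greater(data[i], data[j]) for i, j in itertools.permutations(data, 2)}


rng = np.random.default_rng(7)
samples = {"low": rng.normal(0.0, 1.0, 300), "high": rng.normal(1.0, 1.0, 300)}
results = pairwise_ks_greater(samples)
for pair, (d, p) in results.items():
    print(pair, round(d, 3), p)
```

Because the ECDF of the "low" sample lies above that of the "high" sample, the pair ("low", "high") should give a large $D^+$ and a tiny p-value, while the reverse pair should not. For exact p-values, `scipy.stats.ks_2samp` is the usual choice.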

API

mtest(X, y, n_boot=1000, nsam=None, r2_threshold=0.9, seed=None, return_distributions=True)
  • X: array-like (n, p) predictors, without a column of ones. An intercept is not added automatically (the Quickstart passes add_intercept=True to request one).
  • y: array-like (n,) response.
  • n_boot: number of bootstrap replicates.
  • nsam: bootstrap sample size (default: n).
  • r2_threshold: threshold on the auxiliary R² used by the VIF rule (0.9 corresponds to a VIF cutoff of 10).
  • seed: RNG seed.
  • return_distributions: if True, also returns the bootstrap arrays.

Return: dict with keys

  • R2_global, R2_aux (original sample),
  • VIF (original sample),
  • B_R2_global (n_boot,),
  • B_R2_aux (n_boot, p), columns aligned with predictors,
  • p_vif (dict), p_klein (dict).

Notes

  • For the VIF rule, the p-value is Pr(R²_j > r2_threshold). To target a VIF cutoff V, pass r2_threshold = 1 - 1/V (e.g. 0.9 for V = 10).
  • Klein's rule p-value is Pr(R²_global < R²_j) across bootstrap replicates.
  • Numerical stability: regressions are solved by least squares, and divisions by zero are guarded against.

Citation

Morales-Oñate, V., & Morales-Oñate, B. (2023).
MTest: a Bootstrap Test for Multicollinearity. Revista Politécnica, 51(2), 53–62.
https://doi.org/10.33333/rp.vol51n2.05


License

MIT. See the LICENSE file in the repository.




Download files


Source Distribution

mtest_py-0.1.4.tar.gz (13.5 kB)

Uploaded Source

Built Distribution


mtest_py-0.1.4-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file mtest_py-0.1.4.tar.gz.

File metadata

  • Download URL: mtest_py-0.1.4.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mtest_py-0.1.4.tar.gz
Algorithm Hash digest
SHA256 177ae4a5882c51f13bf971f15249d45f377cebe4d74eaa4f8126965bb1cf8c9f
MD5 3416c74204360cc8848f2792ec73ca31
BLAKE2b-256 810b3e7143c0916a093986541494cf8ded9a185abb867d81a23f0579d22f8054


File details

Details for the file mtest_py-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: mtest_py-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mtest_py-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 858d62ed77a0cb738053aaf586d50234bc6f2d4f2db7fb2e56617ba748108130
MD5 8a7bc631ebe6d7d67941c39fbb0dee8d
BLAKE2b-256 3744459a7f0ac4e115fe77043acca1b246fd77e478b4148bcf44af494f3916ee

