
Fast and robust approach to ridge regression with simultaneous estimation of model parameters and hyperparameter tuning within a Bayesian framework via expectation-maximization (EM).


fastridge

Fast and Accurate Ridge Regression via Expectation Maximization


by Shu Yu Tew, Mario Boley, Daniel F. Schmidt


The statistical performance of the ridge regression estimate for linear regression parameters fitted to a training dataset $\boldsymbol{X}\in\mathbb{R}^{n \times p}$, $\boldsymbol{y} \in \mathbb{R}^n$, i.e.,

$$ \hat{\boldsymbol{\beta}}_\alpha = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \lbrace\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2 + \alpha\Vert\boldsymbol{\beta}\Vert^2\rbrace $$

strongly depends on the choice of the regularisation parameter $\alpha \in \mathbb{R}_+$. A commonly used approach to choosing this parameter is leave-one-out cross-validation (LOOCV), which selects $\alpha$ from a set of candidate values by minimising the leave-one-out prediction error.
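For reference, the minimiser above has the closed form $\hat{\boldsymbol{\beta}}_\alpha = (\boldsymbol{X}^\top\boldsymbol{X} + \alpha\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$. Because the fitted values are linear in $\boldsymbol{y}$, the leave-one-out residuals can be obtained without refitting as $e_i / (1 - H_{ii})$, where $\boldsymbol{e}$ are the ordinary residuals and $\boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X} + \alpha\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top$. The following is a minimal NumPy sketch of this standard shortcut over a user-supplied grid, assuming a model without intercept; the helper names (ridge_coef, loocv_error, select_alpha_loocv) are hypothetical and this is not necessarily how fastridge's RidgeLOOCV is implemented.

```python
import numpy as np


def ridge_coef(X, y, alpha):
    """Closed-form ridge estimate (X^T X + alpha I)^{-1} X^T y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def loocv_error(X, y, alpha):
    """Mean squared leave-one-out error via the hat-matrix shortcut.

    The i-th leave-one-out residual equals e_i / (1 - H_ii), where
    e = y - H y and H = X (X^T X + alpha I)^{-1} X^T, so no refits are needed.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
    e = y - H @ y
    return np.mean((e / (1.0 - np.diag(H))) ** 2)


def select_alpha_loocv(X, y, alphas):
    """Return the candidate alpha with the smallest LOOCV error."""
    errors = [loocv_error(X, y, a) for a in alphas]
    return alphas[int(np.argmin(errors))]


# usage on a small synthetic problem
rng = np.random.default_rng(1)
X = rng.standard_normal((15, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.5 * rng.standard_normal(15)
grid = np.logspace(-3, 3, 13)
best_alpha = select_alpha_loocv(X, y, grid)
coef = ridge_coef(X, y, best_alpha)
```

Each candidate in this sketch still costs a separate $p \times p$ solve; computing one singular value decomposition of $\boldsymbol{X}$ and reusing it for every candidate is a common way to make grid-based LOOCV cheap.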

This package provides an alternative iterative algorithm based on the Bayesian formulation of ridge regression:

$$
\begin{aligned}
\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2, \tau^2 &\sim \mathrm{N}(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2 \boldsymbol{I}_n)\\
\boldsymbol{\beta} \mid \sigma^2, \tau^2 &\sim \mathrm{N}(0, \tau^{2}\sigma^{2}\boldsymbol{I}_p)\\
\sigma^2 &\sim \sigma^{-2}\,\mathrm{d}\sigma^2\\
\tau^2 &\sim \pi(\tau^2)\,\mathrm{d}\tau^2
\end{aligned}
$$
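To see how this hierarchy relates to the ridge estimate, one can complete the square in $\boldsymbol{\beta}$. The short derivation below is a sketch that assumes the prior covariance $\tau^2\sigma^2\boldsymbol{I}_p$ for $\boldsymbol{\beta}$:

$$
\begin{aligned}
-\log p(\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2, \tau^2) &= \frac{1}{2\sigma^2}\left(\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2 + \tau^{-2}\Vert\boldsymbol{\beta}\Vert^2\right) + \mathrm{const}\\
\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2, \tau^2 &\sim \mathrm{N}\left(\hat{\boldsymbol{\beta}}_{\tau^{-2}},\ \sigma^2(\boldsymbol{X}^\top\boldsymbol{X} + \tau^{-2}\boldsymbol{I}_p)^{-1}\right), \quad \hat{\boldsymbol{\beta}}_{\alpha} = (\boldsymbol{X}^\top\boldsymbol{X} + \alpha\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top\boldsymbol{y}
\end{aligned}
$$

Under this assumption, for fixed $\tau^2$ the posterior mean of $\boldsymbol{\beta}$ is exactly the ridge estimate with $\alpha = 1/\tau^2$, so estimating $\tau^2$ amounts to tuning $\alpha$.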

In particular, the package implements an expectation maximisation (EM) approach that approximates the marginal posterior mode $\arg\max_{\sigma^2, \tau^2} p(\sigma^2, \tau^2 \mid \boldsymbol{X}, \boldsymbol{y})$ by iterating the equation

$$ \sigma^2_{t+1}, \tau^2_{t+1} = \arg\min_{\sigma^2, \tau^2} \mathbb{E}_{\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2_t, \tau^2_t} \left[-\log\thinspace p(\boldsymbol{\beta}, \sigma^2, \tau^2 \mid \boldsymbol{X}, \boldsymbol{y})\right] $$

until a convergence criterion is met.
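The iteration above can be made concrete with a deliberately naive sketch. It is not the package's implementation: it assumes the prior covariance $\tau^2\sigma^2\boldsymbol{I}_p$ for $\boldsymbol{\beta}$ and, as a hypothetical choice, a flat prior $\pi(\tau^2) \propto 1$ (the package may use a different prior on $\tau^2$). Under these assumptions the E-step only needs the Gaussian posterior of $\boldsymbol{\beta}$, with mean $\boldsymbol{m} = (\boldsymbol{X}^\top\boldsymbol{X} + \tau_t^{-2}\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$ and covariance $\sigma_t^2(\boldsymbol{X}^\top\boldsymbol{X} + \tau_t^{-2}\boldsymbol{I}_p)^{-1}$, and the M-step has the closed form $\sigma^2_{t+1} = \mathbb{E}\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2/(n+2)$, $\tau^2_{t+1} = \mathbb{E}\Vert\boldsymbol{\beta}\Vert^2/(p\,\sigma^2_{t+1})$. The function name ridge_em_sketch is hypothetical.

```python
import numpy as np


def ridge_em_sketch(X, y, tau2=1.0, tol=1e-8, max_iter=1000):
    """Hypothetical EM sketch for the Bayesian ridge hierarchy.

    Assumptions (not taken from fastridge): beta | sigma2, tau2 ~ N(0, tau2*sigma2*I_p),
    sigma2 ~ sigma^{-2} d sigma2, and a flat prior on tau2.
    Returns (coef, alpha, sigma2) with alpha = 1 / tau2.
    """
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = float(np.var(y))  # crude starting value
    for _ in range(max_iter):
        # E-step: beta | X, y, sigma2, tau2 ~ N(m, sigma2 * A^{-1}), A = X^T X + I / tau2
        A_inv = np.linalg.inv(XtX + np.eye(p) / tau2)
        m = A_inv @ Xty
        resid = y - X @ m
        exp_rss = resid @ resid + sigma2 * np.trace(XtX @ A_inv)  # E||y - X beta||^2
        exp_sq_norm = m @ m + sigma2 * np.trace(A_inv)             # E||beta||^2
        # M-step: closed-form joint minimiser of the expected negative log posterior
        new_sigma2 = exp_rss / (n + 2)
        new_tau2 = exp_sq_norm / (p * new_sigma2)
        done = (abs(new_sigma2 - sigma2) <= tol * sigma2
                and abs(new_tau2 - tau2) <= tol * tau2)
        sigma2, tau2 = new_sigma2, new_tau2
        if done:
            break
    alpha = 1.0 / tau2
    coef = np.linalg.solve(XtX + alpha * np.eye(p), Xty)  # posterior mean = ridge estimate
    return coef, alpha, sigma2


# usage on the same kind of synthetic data as in the Usage section
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 0.0])
X = rng.standard_normal((20, 6))
y = X @ beta + 0.1 * rng.standard_normal(20)
coef, alpha, sigma2 = ridge_em_sketch(X, y)
```

Because $\boldsymbol{X}^\top\boldsymbol{X} + \tau^{-2}\boldsymbol{I}_p$ is re-inverted in every iteration, each step of this sketch costs $O(p^3)$; caching a single eigendecomposition of $\boldsymbol{X}^\top\boldsymbol{X}$ would make every subsequent iteration much cheaper.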

Usage

import numpy as np
from fastridge import RidgeEM, RidgeLOOCV

# synthetic regression data
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 0.0])
x_train = rng.standard_normal((20, 6))
y_train = x_train @ beta + 0.1 * rng.standard_normal(20)
x_test = rng.standard_normal((1000, 6))
y_test = x_test @ beta + 0.1 * rng.standard_normal(1000)

# fit using the EM approach
em = RidgeEM().fit(x_train, y_train)
y_train_em = em.predict(x_train)
y_test_em = em.predict(x_test)
print(f'EM   train MSE: {np.mean((y_train - y_train_em)**2):.4f}')
print(f'EM   test MSE:  {np.mean((y_test - y_test_em)**2):.4f}')
print(f'EM   coef_:         {em.coef_}')
print(f'EM   alpha_:        {em.alpha_:.4f}')
print(f'EM   sigma_square_: {em.sigma_square_:.4f}')

# fit using fast LOOCV
cv = RidgeLOOCV().fit(x_train, y_train)
y_train_cv = cv.predict(x_train)
y_test_cv = cv.predict(x_test)
print(f'CV   train MSE: {np.mean((y_train - y_train_cv)**2):.4f}')
print(f'CV   test MSE:  {np.mean((y_test - y_test_cv)**2):.4f}')
print(f'CV   coef_:    {cv.coef_}')
print(f'CV   alpha_:   {cv.alpha_:.4f}')

Package Installation

To install the package from PyPI, use

pip install fastridge

or, to install directly from the GitHub repository, use

pip install git+https://github.com/marioboley/fastridge.git

(Use pip or pip3, depending on the local Python setup.)

Project Setup

To alter the package or to run and modify the analysis code, run

pip3 install -r requirements.txt
pip3 install -e .

at the root of the repository after cloning.

The second step (local editable installation) is required so that import fastridge works for the analysis notebooks in subdirectories.

It is recommended to install the package and its dependencies into a dedicated virtual environment by running the following at the project root before the steps above:

python3 -m venv .venv
source .venv/bin/activate   # or: conda create/activate for Anaconda

To test the project setup, run the test suite:

pytest

Citation

Should you find this repository helpful, please consider citing the associated paper:

@article{tew2023bayes,
  title={Bayes beats cross validation: Efficient and accurate ridge regression via expectation maximization},
  author={Tew, Shu Yu and Boley, Mario and Schmidt, Daniel},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={19749--19768},
  year={2023}
}



Download files

Download the file for your platform.

Source Distribution

fastridge-1.2.0.tar.gz (14.5 kB)

Uploaded Source

Built Distribution


fastridge-1.2.0-py3-none-any.whl (8.5 kB)

Uploaded Python 3

File details

Details for the file fastridge-1.2.0.tar.gz.

File metadata

  • Download URL: fastridge-1.2.0.tar.gz
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastridge-1.2.0.tar.gz
Algorithm Hash digest
SHA256 2ddd125b9f7b02ff237b3323bdcc73bad45dc9daca74036be5657f0d9dc5c97c
MD5 1af2afcfad7a70d559f52ab991b26295
BLAKE2b-256 6187457bc59c3d1abb9a733b5cf4adc0dd41c557a63cacb3de7cd840f00774b9


Provenance

The following attestation bundles were made for fastridge-1.2.0.tar.gz:

Publisher: release.yml on marioboley/fastridge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastridge-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: fastridge-1.2.0-py3-none-any.whl
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastridge-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cb84b61f94e7dd7d71b68d377162514240c0bc23a8032a75f17dfa0ec03e87ef
MD5 8069847f61908a0e88e1f9369b3b0c24
BLAKE2b-256 24220850f4c3101beb8e6e2b16f2c58b1bdd4916bfc61cdb6b882eafe41ffd66


Provenance

The following attestation bundles were made for fastridge-1.2.0-py3-none-any.whl:

Publisher: release.yml on marioboley/fastridge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
