Fast and robust approach to ridge regression with simultaneous estimation of model parameters and hyperparameter tuning within a Bayesian framework via expectation-maximization (EM).
fastridge
Fast and Accurate Ridge Regression via Expectation Maximization
by Shu Yu Tew, Mario Boley, Daniel F. Schmidt
The statistical performance of the ridge regression estimate of the linear regression parameters fitted to a training dataset $\boldsymbol{X}\in\mathbb{R}^{n \times p}$, $\boldsymbol{y} \in \mathbb{R}^n$, i.e.,
$$ \hat{\boldsymbol{\beta}}_\alpha = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \lbrace\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2 + \alpha\Vert\boldsymbol{\beta}\Vert^2\rbrace, $$
depends strongly on the choice of the regularisation parameter $\alpha \in \mathbb{R}_+$. A common approach to choosing this parameter is leave-one-out cross-validation (LOOCV).
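For comparison, here is a minimal NumPy sketch of the LOOCV approach. It assumes no intercept and a user-supplied grid of positive candidate values. It relies on two standard facts: ridge has the closed form $\hat{\boldsymbol{\beta}}_\alpha = (\boldsymbol{X}^\top\boldsymbol{X} + \alpha\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$, and the leave-one-out residual of point $i$ equals $e_i/(1-H_{ii})$. Here $e_i$ is the ordinary residual and $\boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X} + \alpha\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top$, so no refitting is needed. The function name `ridge_loocv_sketch` is hypothetical, and this is not the package's `RidgeLOOCV`.

```python
import numpy as np


def ridge_loocv_sketch(X, y, alphas):
    """Choose alpha from a grid by exact LOOCV for ridge without intercept.

    Hypothetical helper for illustration. Returns (beta, best_alpha, best_loo_mse).
    """
    # Thin SVD, computed once: X = U diag(d) Vt
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    c = U.T @ y
    best_alpha, best_err = None, np.inf
    for alpha in alphas:
        w = d**2 / (d**2 + alpha)       # eigenvalues of the hat matrix H
        fitted = U @ (w * c)            # H y, the in-sample ridge fit
        h = (U**2) @ w                  # diagonal entries H_ii
        loo = (y - fitted) / (1.0 - h)  # leave-one-out residuals, no refitting
        err = np.mean(loo**2)
        if err < best_err:
            best_alpha, best_err = alpha, err
    # Refit on all data with the selected alpha
    beta = Vt.T @ (d * c / (d**2 + best_alpha))
    return beta, best_alpha, best_err
```

After the single SVD, each candidate $\alpha$ costs $O(n \min(n, p))$. However, the result is only as good as the grid, which is one motivation for estimating $\alpha$ directly.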
This package provides an alternative iterative algorithm based on the Bayesian formulation of ridge regression:
$$
\begin{aligned}
\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{\beta}, \sigma^2, \tau^2 &\sim \mathrm{N}(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2 \boldsymbol{I}_n)\\
\boldsymbol{\beta} \mid \sigma^2, \tau^2 &\sim \mathrm{N}(0, \tau^2\sigma^2\boldsymbol{I}_p)\\
\sigma^2 &\sim \sigma^{-2}\,\mathrm{d}\sigma^2\\
\tau^2 &\sim \pi(\tau^2)\,\mathrm{d}\tau^2
\end{aligned}
$$
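The following derivation shows how $\tau^2$ relates to the ridge parameter. Assume the prior on $\boldsymbol{\beta}$ has covariance $\tau^2\sigma^2\boldsymbol{I}_p$. Completing the square in $\boldsymbol{\beta}$ then shows that the conditional posterior is Gaussian:

$$
\begin{aligned}
p(\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2, \tau^2)
&\propto \exp\left(-\frac{\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2 + \tau^{-2}\Vert\boldsymbol{\beta}\Vert^2}{2\sigma^2}\right)\\
\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2, \tau^2
&\sim \mathrm{N}\left((\boldsymbol{X}^\top\boldsymbol{X} + \tau^{-2}\boldsymbol{I}_p)^{-1}\boldsymbol{X}^\top\boldsymbol{y},\ \sigma^2(\boldsymbol{X}^\top\boldsymbol{X} + \tau^{-2}\boldsymbol{I}_p)^{-1}\right)
\end{aligned}
$$

Its mean, which is also its mode, is therefore $\hat{\boldsymbol{\beta}}_\alpha$ with $\alpha = 1/\tau^2$.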
In particular, the package implements an expectation maximisation (EM) approach that approximates the marginal posterior mode $\arg\max_{\sigma^2, \tau^2} p(\sigma^2, \tau^2 \mid \boldsymbol{X}, \boldsymbol{y})$ by iterating the equation
$$ \sigma^2_{t+1}, \tau^2_{t+1} = \arg\min_{\sigma^2, \tau^2} \mathbb{E}_{\boldsymbol{\beta} \mid \boldsymbol{X}, \boldsymbol{y}, \sigma^2_t, \tau^2_t} \left[-\log\thinspace p(\boldsymbol{\beta}, \sigma^2, \tau^2 \mid \boldsymbol{X}, \boldsymbol{y})\right] $$
until a convergence criterion is met.
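The iteration can be sketched in NumPy as follows. This sketch is not the package's implementation and rests on several assumptions:

- There is no intercept, so the data are assumed centred.
- The prior covariance of $\boldsymbol{\beta}$ is $\tau^2\sigma^2\boldsymbol{I}_p$.
- Purely for illustration, the prior on $\tau^2$ is flat, $\pi(\tau^2) \propto 1$.

Under these assumptions the M-step has the closed form $\sigma^2 = \mathbb{E}\Vert\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\Vert^2/(n+2)$ and $\tau^2 = \mathbb{E}\Vert\boldsymbol{\beta}\Vert^2/(p\,\sigma^2)$. The package may use a different $\pi(\tau^2)$, which would change the $\tau^2$ update. The function name `ridge_em_sketch` and its stopping rule (relative change below `tol`) are hypothetical.

```python
import numpy as np


def ridge_em_sketch(X, y, tol=1e-8, max_iter=1000):
    """Illustrative EM for Bayesian ridge regression without intercept.

    Assumes beta ~ N(0, tau2 * sigma2 * I_p), p(sigma2) proportional to
    1/sigma2 and, for this sketch only, a flat prior on tau2.
    Returns (beta, alpha, sigma2) with alpha = 1 / tau2.
    """
    n, p = X.shape
    # Thin SVD, computed once: X = U diag(d) Vt with r = min(n, p)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    r = d.size
    c = U.T @ y                    # coordinates of y along U's columns
    rss_perp = y @ y - c @ c       # part of ||y||^2 outside the span of U
    sigma2, tau2 = np.var(y), 1.0  # crude starting values
    for _ in range(max_iter):
        alpha = 1.0 / tau2
        shrink = d**2 + alpha
        # E-step: beta | X, y, sigma2, tau2 ~ N(m, S) with
        # m = (X'X + alpha I)^-1 X'y and S = sigma2 (X'X + alpha I)^-1
        m_v = d * c / shrink       # m expressed in the basis of Vt's rows
        # E||y - X beta||^2 = ||y - X m||^2 + tr(X'X S)
        ess = rss_perp + np.sum((alpha * c / shrink) ** 2) + sigma2 * np.sum(d**2 / shrink)
        # E||beta||^2 = ||m||^2 + tr(S); p - r directions have X'X eigenvalue 0
        esn = np.sum(m_v**2) + sigma2 * (np.sum(1.0 / shrink) + (p - r) / alpha)
        # M-step: closed-form minimiser of the expected negative log posterior
        new_sigma2 = ess / (n + 2)
        new_tau2 = esn / (p * new_sigma2)
        converged = (abs(new_sigma2 - sigma2) <= tol * sigma2
                     and abs(new_tau2 - tau2) <= tol * tau2)
        sigma2, tau2 = new_sigma2, new_tau2
        if converged:
            break
    alpha = 1.0 / tau2
    beta = Vt.T @ (d * c / (d**2 + alpha))
    return beta, alpha, sigma2
```

Because the SVD is computed once, each iteration costs only $O(\min(n, p))$. The flat prior on $\tau^2$ is chosen for simplicity rather than robustness. For example, `ridge_em_sketch(x_train, y_train)` can be run on the synthetic data in the Usage section below, whose generating model has no intercept.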
Usage
```python
import numpy as np
from fastridge import RidgeEM, RidgeLOOCV

# synthetic regression data
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 0.0])
x_train = rng.standard_normal((20, 6))
y_train = x_train @ beta + 0.1 * rng.standard_normal(20)
x_test = rng.standard_normal((1000, 6))
y_test = x_test @ beta + 0.1 * rng.standard_normal(1000)

# fit using EM approach
em = RidgeEM().fit(x_train, y_train)
y_train_em = em.predict(x_train)
y_test_em = em.predict(x_test)
print(f'EM train MSE: {np.mean((y_train - y_train_em)**2):.4f}')
print(f'EM test MSE: {np.mean((y_test - y_test_em)**2):.4f}')
print(f'EM coef_: {em.coef_}')
print(f'EM alpha_: {em.alpha_:.4f}')
print(f'EM sigma_square: {em.sigma_square_:.4f}')

# fit using fast LOOCV
cv = RidgeLOOCV().fit(x_train, y_train)
y_train_cv = cv.predict(x_train)
y_test_cv = cv.predict(x_test)
print(f'CV train MSE: {np.mean((y_train - y_train_cv)**2):.4f}')
print(f'CV test MSE: {np.mean((y_test - y_test_cv)**2):.4f}')
print(f'CV coef_: {cv.coef_}')
print(f'CV alpha_: {cv.alpha_:.4f}')
```
Package Installation
To install the package from PyPI, use

```bash
pip install fastridge
```

or, to install directly from this repository, use

```bash
pip install git+https://github.com/marioboley/fastridge.git
```

(Use `pip` or `pip3`, depending on your local Python setup.)
Project Setup
To modify the package, or to run and modify the analysis code, run the following at the root of the repository after cloning:

```bash
pip3 install -r requirements.txt
pip3 install -e .
```

The second step (a local editable installation) is required so that `import fastridge` works in the analysis notebooks in subdirectories.
It is recommended to install the package and its dependencies into a dedicated virtual environment by running the following at the project root before the steps above:

```bash
python3 -m venv .venv
source .venv/bin/activate  # or: conda create/activate for Anaconda
```

To test the project setup, run the test suite:

```bash
pytest
```
Citation
Should you find this repository helpful, please consider citing the associated paper:
```bibtex
@article{tew2023bayes,
  title={Bayes beats cross validation: Efficient and accurate ridge regression via expectation maximization},
  author={Tew, Shu Yu and Boley, Mario and Schmidt, Daniel},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={19749--19768},
  year={2023}
}
```