
A package for solving k-sparse ridge regression with certifiable optimality!


OKRidge


This repository contains the source code for our NeurIPS 2023 paper:

OKRidge: Scalable Optimal k-Sparse Ridge Regression


Introduction

We consider the following optimization problem:

$$\min_{\mathbf{\beta}} \sum_{i=1}^n (y_i - \mathbf{x}_i^T \mathbf{\beta})^2 + \lambda_2 \lVert \mathbf{\beta} \rVert_2^2 \quad \text{s.t.} \quad \lVert \mathbf{\beta} \rVert_0 \leq k$$

Optimal k-sparse ridge regression is a fundamental problem with many applications in statistics, machine learning, and data mining. However, the problem is NP-hard, and existing algorithms are either slow (using commercial MIP solvers) or suboptimal (using convex or nonconvex regularizers to approximate $\ell_0$). We propose a novel algorithm, OKRidge, that solves the problem to provable optimality in a scalable manner.
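To make the combinatorial nature of the problem concrete, here is a tiny brute-force baseline (a NumPy-only sketch; the function names are illustrative and not part of OKRidge's API). It enumerates every size-k support and solves an ordinary ridge regression on each, which is exact but exponential in the number of features, which is precisely why a branch-and-bound approach is needed:

```python
import itertools
import numpy as np

def ridge_on_support(X, y, support, lambda2):
    """Solve ridge regression restricted to the columns in `support`.

    Returns (objective value, coefficient vector on the support).
    """
    Xs = X[:, support]
    p = Xs.shape[1]
    beta = np.linalg.solve(Xs.T @ Xs + lambda2 * np.eye(p), Xs.T @ y)
    resid = y - Xs @ beta
    return resid @ resid + lambda2 * (beta @ beta), beta

def brute_force_k_sparse_ridge(X, y, k, lambda2):
    """Enumerate all size-k supports -- exact, but exponential in X.shape[1]."""
    best_loss, best_support, best_beta = np.inf, None, None
    for support in itertools.combinations(range(X.shape[1]), k):
        loss, beta = ridge_on_support(X, y, list(support), lambda2)
        if loss < best_loss:
            best_loss, best_support, best_beta = loss, support, beta
    return best_loss, best_support, best_beta
```

For p features there are C(p, k) candidate supports, so this is only feasible for toy sizes; OKRidge's branch-and-bound prunes this search space with lower bounds instead.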

OKRidge is based on a branch-and-bound framework. The insight behind its computational efficiency is a novel lower-bound calculation that starts from a saddle point formulation and then either (i) solves a linear system or (ii) uses an ADMM-based approach, where the proximal operators can be evaluated efficiently by solving another linear system and an isotonic regression problem. We also propose a beam-search-based method to warm-start the solver.
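One ingredient mentioned above, isotonic regression, can be solved exactly in linear time with the classic Pool Adjacent Violators Algorithm (PAVA). The sketch below is a generic, minimal PAVA implementation for intuition only; it is not OKRidge's internal solver:

```python
import numpy as np

def isotonic_regression(v):
    """Pool Adjacent Violators Algorithm (PAVA).

    Projects v onto the nondecreasing cone, i.e. minimizes ||x - v||^2
    subject to x[0] <= x[1] <= ... <= x[-1].
    """
    # Maintain a stack of blocks, each stored as (mean, count).
    # Whenever a new value breaks monotonicity, merge blocks backwards.
    means, counts = [], []
    for val in v:
        means.append(float(val))
        counts.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            c = c1 + c2
            means.append((m1 * c1 + m2 * c2) / c)
            counts.append(c)
    # Expand each block's mean back to its original positions.
    return np.repeat(means, counts)
```

For example, `isotonic_regression([3, 1, 2])` pools all three entries into their mean, yielding `[2, 2, 2]`, while an already-sorted input is returned unchanged.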

Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.

Installation

$ pip install okridge

Usage

Please see the example.ipynb Jupyter notebook on GitHub for a detailed tutorial on using OKRidge in a Python environment.

# X (an n-by-p feature matrix) and y (a length-n response vector) are
# assumed to be NumPy arrays prepared beforehand.
from okridge.tree import BNBTree

k = 10            # cardinality constraint
lambda2 = 0.1     # l2 regularization parameter
gap_tol = 1e-4    # optimality gap tolerance
verbose = True    # print out the progress
time_limit = 180  # time limit in seconds

BnB_optimizer = BNBTree(X=X, y=y, lambda2=lambda2)

upper_bound, betas, optimality_gap, max_lower_bound, running_time = BnB_optimizer.solve(
    k=k, gap_tol=gap_tol, verbose=verbose, time_limit=time_limit
)
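The usage example assumes X and y already exist. For experimentation, a synthetic dataset with a known sparse ground truth can be generated like this (an illustrative NumPy sketch; the sizes and noise level are arbitrary choices, not recommendations):

```python
import numpy as np

# Synthetic sparse-regression data: only the first k_true features are active.
rng = np.random.default_rng(0)
n, p, k_true = 100, 20, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k_true] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
```

Passing this X and y to the solver with k = k_true lets you check whether the recovered support matches the ground truth.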

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

okridge was created by Jiachang Liu. It is licensed under the terms of the BSD 3-Clause license.

Credits

okridge was created with cookiecutter and the py-pkgs-cookiecutter template.

Citing Our Work

If you find our work useful in your research, please consider citing the following paper:

@inproceedings{liu2023okridge,
  title={OKRidge: Scalable Optimal k-Sparse Ridge Regression},
  author={Liu, Jiachang and Rosen, Sam and Zhong, Chudi and Rudin, Cynthia},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
