
Fast group lasso regularised linear models in a sklearn-style API.

Project description

The group lasso [1] regulariser is a well-known method to achieve structured sparsity in machine learning and statistics. The idea is to create non-overlapping groups of covariates and recover regression weights in which only a sparse set of these covariate groups have non-zero components.
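To make the penalty concrete, here is a minimal pure-Python sketch of a group lasso penalty. It assumes the common convention of weighting each group's Euclidean norm by the square root of the group size. The function name `group_lasso_penalty` and that weighting are illustrative assumptions, not necessarily what this library uses internally.

```python
import math


def group_lasso_penalty(w, groups, reg):
    """Return reg * sum_g sqrt(p_g) * ||w_g||_2.

    w      -- list of regression weights
    groups -- one group label per weight (non-overlapping groups)
    reg    -- regularisation strength
    Weighting each group by sqrt(p_g), its size, is an assumed convention.
    """
    total = 0.0
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        group_norm = math.sqrt(sum(w[i] ** 2 for i in idx))
        total += math.sqrt(len(idx)) * group_norm
    return reg * total


# Two groups of two weights each; only the first group is active.
print(group_lasso_penalty([3.0, 4.0, 0.0, 0.0], [0, 0, 1, 1], reg=1.0))
```

Because each group norm is non-differentiable only where the whole group is zero, minimising this penalty tends to switch entire groups off rather than individual weights.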

There are several reasons why this might be a good idea. Say, for example, that we have a set of sensors, each of which generates five measurements. We don’t want to maintain an unnecessary number of sensors. If we try normal LASSO regression, then we will get sparse coefficients. However, these sparse coefficients might not correspond to a sparse set of sensors, since each sensor generates five measurements. If we instead use group LASSO with the measurements grouped by the sensor that produced them, then we will get a sparse set of sensors.
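The sensor example can be sketched in a few lines: each column gets the index of the sensor that produced it as its group label, and a sensor is needed only if some weight in its group is non-zero. The helper names below are hypothetical and only illustrate the bookkeeping; they are not part of the library.

```python
def sensor_groups(n_sensors, n_measurements=5):
    """Group label for every column: columns 0-4 come from sensor 0, 5-9 from sensor 1, ..."""
    return [col // n_measurements for col in range(n_sensors * n_measurements)]


def active_sensors(weights, groups, tol=0.0):
    """Sorted list of sensors with at least one weight whose magnitude exceeds tol."""
    return sorted({g for w, g in zip(weights, groups) if abs(w) > tol})


groups = sensor_groups(n_sensors=3)

# Lasso-style sparsity: only 3 of 15 weights are non-zero, but they are spread
# over all three sensors, so every sensor must still be kept.
lasso_like = [0.0] * 15
lasso_like[0] = lasso_like[7] = lasso_like[14] = 1.0

# Group-lasso-style sparsity: 5 non-zero weights, all belonging to sensor 1.
group_like = [0.0] * 15
group_like[5:10] = [0.5] * 5

print(active_sensors(lasso_like, groups))  # [0, 1, 2]
print(active_sensors(group_like, groups))  # [1]
```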

An extension of the group lasso regulariser is the sparse group lasso regulariser [2], which imposes both group-wise sparsity and coefficient-wise sparsity. This is done by combining the group lasso penalty with the traditional lasso penalty. In this library, I have implemented an efficient sparse group lasso solver that is fully compliant with the scikit-learn API.
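One way to see how the two penalties combine is through their proximal operator, the shrinkage step used by proximal-gradient solvers. For the sum of an l1 penalty and a group penalty, the proximal operator can be computed by first soft-thresholding every coefficient and then shrinking each group as a block. The sketch below assumes each group's norm is weighted by the square root of the group size and uses hypothetical names; it is an illustration, not code taken from this library.

```python
import math


def soft_threshold(v, thresh):
    """Coefficient-wise shrinkage: the proximal operator of thresh * ||v||_1."""
    return [math.copysign(max(abs(x) - thresh, 0.0), x) for x in v]


def sparse_group_prox(v, groups, l1_thresh, group_thresh):
    """Proximal operator of l1_thresh * ||w||_1 + group_thresh * sum_g sqrt(p_g) ||w_g||_2.

    Step 1 soft-thresholds every coefficient (lasso part); step 2 scales each
    group towards zero and drops it entirely if its norm is too small (group part).
    """
    u = soft_threshold(v, l1_thresh)
    out = [0.0] * len(u)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        norm = math.sqrt(sum(u[i] ** 2 for i in idx))
        tau = group_thresh * math.sqrt(len(idx))
        if norm > tau:  # otherwise the whole group stays at zero
            scale = 1.0 - tau / norm
            for i in idx:
                out[i] = scale * u[i]
    return out


w = sparse_group_prox([3.0, 0.2, 0.3, -0.2], groups=[0, 0, 1, 1], l1_thresh=0.5, group_thresh=0.1)
print(w)
```

In this example the second group is removed entirely (group-wise sparsity), and inside the surviving first group the small second coefficient is also set to zero (coefficient-wise sparsity).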

About this project

This project is developed by Yngve Mardal Moe and released under an MIT license. I am still working out a few things, so changes might come rapidly.

Installation guide

Group-lasso requires Python 3.5+, numpy and scikit-learn. To install group-lasso via pip, simply run the command:

pip install group-lasso

Alternatively, you can clone this repository manually and run the setup.py script:

git clone https://github.com/yngvem/group-lasso.git
cd group-lasso
python setup.py install

Documentation

You can read the full documentation on readthedocs.

Examples

There are several examples that show usage of the library here.

Further work

  1. Fully test with sparse arrays and make examples

  2. Make it easier to work with categorical data

  3. Poisson regression

Implementation details

The problem is solved using the FISTA optimiser [3] with a gradient-based adaptive restarting scheme [4]. No line search is currently implemented, but I hope to look at that later.
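A compact sketch of the idea for least squares with a plain group lasso penalty: a gradient step on the smooth loss, a block soft-thresholding step for the penalty, Nesterov-style extrapolation, and a gradient-based restart test in the spirit of [4], which drops the momentum whenever the latest step has a positive inner product with the generalised gradient (that is, when momentum is pushing the iterate uphill). This is a pure-Python illustration under stated assumptions (loss (1/2n)||Xw - y||^2, groups weighted by the square root of their size, step size 1/L with L bounded by the squared Frobenius norm of X divided by n), not the library's implementation, and all function names are hypothetical.

```python
import math


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def group_shrink(v, groups, thresh):
    """Block soft-thresholding: prox of thresh * sum_g sqrt(p_g) * ||v_g||_2."""
    out = [0.0] * len(v)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        tau = thresh * math.sqrt(len(idx))
        if norm > tau:
            for i in idx:
                out[i] = (1.0 - tau / norm) * v[i]
    return out


def fista_group_lasso(X, y, groups, reg, n_iter=500):
    """Minimise (1/2n)||Xw - y||^2 + reg * sum_g sqrt(p_g) ||w_g||_2 with restarted FISTA."""
    n = len(X)
    Xt = [list(col) for col in zip(*X)]
    # ||X||_F^2 / n upper-bounds the Lipschitz constant sigma_max(X)^2 / n of the gradient.
    step = n / sum(v * v for row in X for v in row)
    w = [0.0] * len(Xt)
    z = list(w)  # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        residual = [p - yi for p, yi in zip(matvec(X, z), y)]
        grad = [gi / n for gi in matvec(Xt, residual)]
        w_new = group_shrink([zi - step * gi for zi, gi in zip(z, grad)], groups, step * reg)
        # Gradient-based restart: if the step w_new - w has a positive inner product
        # with the generalised gradient (z - w_new) / step, drop the momentum.
        if sum((zi - a) * (a - b) for zi, a, b in zip(z, w_new, w)) > 0:
            t = 1.0
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [a + (t - 1.0) / t_new * (a - b) for a, b in zip(w_new, w)]
        w, t = w_new, t_new
    return w


X = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
y = [4.0, 2.0, 0.1, 0.1]
w = fista_group_lasso(X, y, groups=[0, 0, 1, 1], reg=0.1)
print(w)  # the second group is exactly zero
```

Setting `t = 1.0` before computing the next momentum coefficient makes `(t - 1) / t_new` zero, so a restart simply takes the next step from the current iterate without extrapolation.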

Although fast, the FISTA optimiser does not reach loss values as low as those of the significantly slower second-order interior point methods. This might, at first glance, seem like a problem. However, it does recover the sparsity pattern of the data, which can be used to train a new model on the selected subset of the features.
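A minimal sketch of that second stage, assuming the first-stage weights are already available: keep the columns whose weights are non-zero and refit them with ordinary least squares, which removes the shrinkage the penalty applied to the surviving coefficients. The function name and the plain normal-equation solver are illustrative choices for a small, well-conditioned example; for real data a numerically robust least-squares routine would be preferable.

```python
def refit_on_support(X, y, w, tol=1e-10):
    """Ordinary least squares on the columns of X whose weight in w is non-zero.

    Returns a full-length weight vector with zeros outside the selected support.
    Solves the normal equations (Xs^T Xs) c = Xs^T y by Gaussian elimination with
    partial pivoting, which is adequate only for small, well-conditioned problems.
    """
    support = [j for j, wj in enumerate(w) if abs(wj) > tol]
    Xs = [[row[j] for j in support] for row in X]
    k = len(support)
    A = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(Xs, y)) for i in range(k)]
    for col in range(k):  # forward elimination
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    refit = [0.0] * len(w)
    for j, c in zip(support, coef):
        refit[j] = c
    return refit


X = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 2.0]]
y = [4.0, 2.0, 0.1, 0.1]
sparse_w = [1.87, 0.94, 0.0, 0.0]  # hypothetical shrunken weights from a group lasso fit
print(refit_on_support(X, y, sparse_w))  # [2.0, 1.0, 0.0, 0.0]
```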

Also, even though the FISTA optimiser is not meant for stochastic optimisation, in my experience it has not suffered a large drop in performance as long as the mini-batch was large enough. I have therefore implemented mini-batch optimisation using FISTA, and have thus been able to fit models to data with ~500 columns and 10 000 000 rows on my moderately priced laptop.
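The core of a mini-batch variant is replacing the full-data gradient in each FISTA iteration with one computed on a random subset of rows. The sketch below shows only that gradient estimate for the least-squares loss (1/2n)||Xw - y||^2. The function name, the sampling without replacement, and the loss scaling are assumptions for illustration, not a description of how this library draws its mini-batches.

```python
import random


def minibatch_gradient(X, y, w, batch_size, rng):
    """Estimate the gradient of (1/2n)||Xw - y||^2 from a random mini-batch of rows.

    Rows are sampled uniformly without replacement, so the estimate is unbiased;
    its variance shrinks as batch_size grows, and batch_size == len(X) recovers
    the exact full-data gradient.
    """
    idx = rng.sample(range(len(X)), batch_size)
    grad = [0.0] * len(w)
    for i in idx:
        residual = sum(a * b for a, b in zip(X[i], w)) - y[i]
        for j, xij in enumerate(X[i]):
            grad[j] += residual * xij
    return [g / batch_size for g in grad]


rng = random.Random(0)
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y = [1.0, 0.0, 1.0, 0.0]
w = [0.1, -0.1]
print(minibatch_gradient(X, y, w, batch_size=2, rng=rng))  # noisy estimate
print(minibatch_gradient(X, y, w, batch_size=4, rng=rng))  # exact gradient
```

The larger the mini-batch, the closer each step is to the full-data FISTA step, which is consistent with the observation above that performance holds up when the mini-batch is large enough.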

Finally, we note that since FISTA uses Nesterov acceleration, it is not a descent algorithm. We therefore cannot expect the loss to decrease monotonically.
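The non-monotone behaviour is easy to reproduce. The sketch below runs the standard FISTA momentum schedule, without restarts, on an ill-conditioned two-dimensional quadratic and records the loss after every iteration. The test function and its curvatures are arbitrary choices for illustration.

```python
import math


def fista_losses(curvatures=(1.0, 0.01), n_iter=300):
    """Run FISTA without restarts on f(x) = 0.5 * sum_i c_i * x_i**2 and return the loss history.

    The gradient is Lipschitz with L = max(c_i), so the step size is 1 / L.
    """
    step = 1.0 / max(curvatures)

    def loss(x):
        return 0.5 * sum(c * xi * xi for c, xi in zip(curvatures, x))

    x = [1.0] * len(curvatures)
    z = list(x)  # extrapolated point
    t = 1.0
    history = [loss(x)]
    for _ in range(n_iter):
        x_new = [zi - step * c * zi for zi, c in zip(z, curvatures)]  # gradient step
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [a + (t - 1.0) / t_new * (a - b) for a, b in zip(x_new, x)]  # momentum
        x, t = x_new, t_new
        history.append(loss(x))
    return history


losses = fista_losses()
increases = sum(1 for before, after in zip(losses, losses[1:]) if after > before)
print(f"loss went up in {increases} of {len(losses) - 1} iterations")
print(f"initial loss {losses[0]:.3g}, final loss {losses[-1]:.3g}")
```

The momentum makes the iterate overshoot along the flat direction, so the loss ripples on its way down. An increase between two iterations is therefore not by itself a sign of divergence.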

References


Download files

Source Distribution

group-lasso-1.5.0.tar.gz (4.3 MB)

Uploaded Source

Built Distribution

group_lasso-1.5.0-py3-none-any.whl (33.1 kB)

Uploaded Python 3

File details

Details for the file group-lasso-1.5.0.tar.gz.

File metadata

  • Download URL: group-lasso-1.5.0.tar.gz
  • Upload date:
  • Size: 4.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for group-lasso-1.5.0.tar.gz
Algorithm Hash digest
SHA256 3a86115fdfa387021c805a8e3bf09c1f1cc1e32b880778ab017488199ef57310
MD5 b276a20d246a9904833d59d052c34dec
BLAKE2b-256 21b4784d01db4eb7f3eafb9f1a9ac6f141d7050aba615cf6e0186ba9ebdaa299

File details

Details for the file group_lasso-1.5.0-py3-none-any.whl.

File metadata

  • Download URL: group_lasso-1.5.0-py3-none-any.whl
  • Upload date:
  • Size: 33.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for group_lasso-1.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a20ad4807834a4438a8829a36e0f355c7633e347aa73502dae8a22fc6e75e977
MD5 d3dc35910675795d95510b906c1dab65
BLAKE2b-256 6312ca38bf6ce7e97ce1b07652efdcec5e69caa0cce8f738afd66268c186fb3b
