group-lasso·PyPI

Fast group lasso regularised linear models in a sklearn-style API.

Project description

The group lasso [1] regulariser is a well-known method for achieving structured sparsity in machine learning and statistics. The idea is to create non-overlapping groups of covariates and recover regression weights in which only a sparse set of these covariate groups have non-zero components.
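
For reference, the group lasso estimate is commonly written as the solution to the problem below. This is a sketch of the textbook formulation, not necessarily the exact objective this library minimises: the scaling of the data-fit term and the group weights $\sqrt{d_g}$ are assumptions. Here $X_g$ and $w_g$ are the columns and weights belonging to group $g$, $d_g$ is the number of covariates in that group, $b$ is the intercept, and $\lambda \geq 0$ is the regularisation strength.

```latex
\min_{w,\, b} \;\; \frac{1}{2} \Big\| y - b - \sum_{g=1}^{G} X_g w_g \Big\|_2^2
  \; + \; \lambda \sum_{g=1}^{G} \sqrt{d_g} \, \| w_g \|_2
```

Because the penalty uses the unsquared Euclidean norm of each $w_g$, a whole group is either set to zero or kept, which is what produces group-wise sparsity.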

There are several reasons why this might be a good idea. Say, for example, that we have a set of sensors and each sensor generates five measurements. We don't want to maintain an unnecessary number of sensors. If we use ordinary LASSO regression, we get sparse coefficients. However, these sparse coefficients might not correspond to a sparse set of sensors, since each sensor generates five measurements. If we instead use group LASSO with the measurements grouped by the sensor that produced them, we get a sparse set of sensors.
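
As a rough illustration of the sensor example, the group labels could be built as follows. This assumes that the five measurements from each sensor are stored in adjacent columns; the names `num_sensors`, `measurements_per_sensor` and `active_sensors` are made up for this sketch and are not part of the library.

```python
import numpy as np

# Hypothetical setup: 4 sensors, 5 measurements per sensor, stored column by column
num_sensors = 4
measurements_per_sensor = 5

# One integer label per column; columns from the same sensor share a label
groups = np.repeat(np.arange(num_sensors), measurements_per_sensor)


def active_sensors(coef, groups):
    """Return the labels of the sensors with at least one non-zero coefficient."""
    coef = np.asarray(coef).ravel()
    return np.unique(groups[coef != 0])
```

Given a fitted coefficient vector, `active_sensors(coef, groups)` lists the sensors that still need to be maintained.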

An extension of the group lasso regulariser is the sparse group lasso regulariser [2], which imposes both group-wise sparsity and coefficient-wise sparsity. This is done by combining the group lasso penalty with the traditional lasso penalty. In this library, I have implemented an efficient sparse group lasso solver that is fully compliant with the scikit-learn API.
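
To make the combined penalty concrete, here is a minimal sketch that evaluates it for a given coefficient vector. The function name is hypothetical, and the square-root-of-group-size weighting is a common convention that is assumed here; the library may scale the terms differently.

```python
import numpy as np


def sparse_group_lasso_penalty(w, groups, group_reg, l1_reg):
    """Group lasso penalty plus lasso penalty for a coefficient vector w.

    Assumption: each group norm is weighted by sqrt(group size).
    """
    w = np.asarray(w, dtype=float).ravel()
    group_term = 0.0
    for g in np.unique(groups):
        w_g = w[groups == g]
        group_term += np.sqrt(w_g.size) * np.linalg.norm(w_g)
    return group_reg * group_term + l1_reg * np.abs(w).sum()


w = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([0, 0, 1, 1])
penalty = sparse_group_lasso_penalty(w, groups, group_reg=1.0, l1_reg=0.5)
```

Setting `l1_reg=0` gives the plain group lasso penalty, and setting `group_reg=0` gives the ordinary lasso penalty.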

About this project

This project is developed by Yngve Mardal Moe and released under an MIT licence.

Installation guide

Currently, the code only works with Python 3.6+, but I aim to support Python 3.5 in the future. To install group-lasso via pip, simply run the command:

pip install group-lasso

Alternatively, you can manually pull this repository and run the setup.py file:

git clone https://github.com/yngvem/group-lasso.git
cd group-lasso
python setup.py install

Documentation

You can read the full documentation on readthedocs.

Examples

Group lasso regression

The group lasso regulariser is implemented following the scikit-learn API, making it easy to use for those familiar with the Python ML ecosystem.

import numpy as np
from group_lasso import GroupLasso

# Dataset parameters
num_data_points = 10_000
num_features = 500
num_groups = 25
assert num_features % num_groups == 0

# Generate data matrix
X = np.random.standard_normal((num_data_points, num_features))

# Generate coefficients and intercept
w = np.random.standard_normal((num_features, 1))
intercept = 2

# Generate groups and randomly set coefficients to zero
groups = np.repeat(np.arange(num_groups), num_features // num_groups)
for group in range(num_groups):
    w[groups == group] *= np.random.random() < 0.8

# Generate target vector:
y = X@w + intercept
noise = np.random.standard_normal(y.shape)
noise /= np.linalg.norm(noise)
noise *= 0.3*np.linalg.norm(y)
y += noise

# Generate group lasso object and fit the model
gl = GroupLasso(groups=groups, reg=.05)
gl.fit(X, y)
estimated_w = gl.coef_
estimated_intercept = gl.intercept_[0]

# Evaluate the model
coef_correlation = np.corrcoef(w.ravel(), estimated_w.ravel())[0, 1]
print(f"True intercept: {intercept:.2f}. Estimated intercept: {estimated_intercept:.2f}")
print(f"Correlation between true and estimated coefficients: {coef_correlation:.2f}")

True intercept: 2.00. Estimated intercept: 1.53
Correlation between true and estimated coefficients: 0.98

Group lasso as a transformer

Group lasso regression can also be used as a transformer:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from group_lasso import GroupLasso

# Dataset parameters
num_data_points = 10_000
num_features = 500
num_groups = 25
assert num_features % num_groups == 0

# Generate data matrix
X = np.random.standard_normal((num_data_points, num_features))

# Generate coefficients and intercept
w = np.random.standard_normal((num_features, 1))
intercept = 2

# Generate groups and randomly set coefficients to zero
groups = np.repeat(np.arange(num_groups), num_features // num_groups)
for group in range(num_groups):
    w[groups == group] *= np.random.random() < 0.8

# Generate target vector:
y = X@w + intercept
noise = np.random.standard_normal(y.shape)
noise /= np.linalg.norm(noise)
noise *= 0.3*np.linalg.norm(y)
y += noise

# Generate group lasso object and fit the model
# We use an artificially high regularisation coefficient since
#  we want to use group lasso as a variable selection algorithm.
gl = GroupLasso(groups=groups, group_reg=0.1, l1_reg=0.05)
gl.fit(X, y)
new_X = gl.transform(X)


# Evaluate the model
predicted_y = gl.predict(X)
R_squared = 1 - np.sum((y - predicted_y)**2)/np.sum(y**2)

print("The columns with zero-valued coefficients have now been removed from the dataset.")
print("The new shape is:", new_X.shape)
print(f"The R^2 statistic for the group lasso model is: {R_squared:.2f}")
print("This is very low since the regularisation is so high.")

# Use group lasso in a scikit-learn pipeline
pipe = Pipeline(
    memory=None,
    steps=[
        ('variable_selection', GroupLasso(groups=groups, reg=.1)),
        ('regressor', Ridge(alpha=0.1))
    ]
)
pipe.fit(X, y)
predicted_y = pipe.predict(X)
R_squared = 1 - np.sum((y - predicted_y)**2)/np.sum(y**2)

print(f"The R^2 statistic for the pipeline is: {R_squared:.2f}")

The columns with zero-valued coefficients have now been removed from the dataset.
The new shape is: (10000, 280)
The R^2 statistic for the group lasso model is: 0.17
This is very low since the regularisation is so high.
The R^2 statistic for the pipeline is: 0.72

Further work

The todos are, in decreasing order of importance:

Python 3.5 compatibility
Classification problems
- I have an experimental implementation of one-class logistic regression, but it is not yet fully validated.
Sparse group lasso
- The proximal operator can be computed using the closed-form solution in [3].
Overlapping groups sparse group lasso
- The proximal operator can be computed using the dual-form in [3].

Unfortunately, the most interesting parts are the least important ones, so expect the list to be worked on from both ends simultaneously.
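
For non-overlapping groups, the proximal operator of the sparse group lasso penalty has a well-known closed form: soft-threshold each coefficient, then shrink each group as a block. The sketch below illustrates that composition with NumPy. It is not taken from this library, the function names are hypothetical, and it assumes unweighted group norms.

```python
import numpy as np


def soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_1 (element-wise shrinkage)."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)


def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2 for a single group."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def sparse_group_lasso_prox(w, groups, group_threshold, l1_threshold):
    """Soft-threshold every coefficient, then shrink each group as a block."""
    w = soft_threshold(np.asarray(w, dtype=float), l1_threshold)
    out = np.zeros_like(w)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = group_soft_threshold(w[mask], group_threshold)
    return out
```

With `l1_threshold=0` this reduces to the group lasso proximal operator, which zeroes out any group whose norm is at most `group_threshold`.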

Implementation details

The problem is solved using the FISTA optimiser [4] with a gradient-based adaptive restarting scheme [5]. No line search is currently implemented, but I hope to look at that later.
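
The following is a minimal sketch of FISTA with a gradient-based restart for a group lasso problem. It is not the library's implementation. It assumes a least-squares loss scaled by 1/(2n), no intercept, unweighted group norms and a fixed step size of 1/L, and all function names are hypothetical.

```python
import numpy as np


def group_prox(w, groups, threshold):
    """Block soft-thresholding: proximal operator of threshold * sum_g ||w_g||_2."""
    out = np.zeros_like(w)
    for g in np.unique(groups):
        mask = groups == g
        norm = np.linalg.norm(w[mask])
        if norm > threshold:
            out[mask] = (1.0 - threshold / norm) * w[mask]
    return out


def fista_group_lasso(X, y, groups, reg, n_iter=500):
    """Minimise 1/(2n) ||Xw - y||^2 + reg * sum_g ||w_g||_2 with restarted FISTA."""
    n = X.shape[0]
    # Lipschitz constant of the gradient of the smooth part
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    w = np.zeros(X.shape[1])
    z = w.copy()  # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n
        w_new = group_prox(z - grad / lipschitz, groups, reg / lipschitz)
        # Gradient-based restart: drop the momentum when it points uphill
        if np.dot(z - w_new, w_new - w) > 0:
            t = 1.0
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w


# Small synthetic problem in which only group 1 carries signal
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 4)
X = rng.standard_normal((200, 20))
true_w = np.zeros(20)
true_w[groups == 1] = [1.0, -2.0, 1.5, 0.5]
y = X @ true_w + 0.01 * rng.standard_normal(200)
w_hat = fista_group_lasso(X, y, groups, reg=0.1)
```

Resetting `t` to 1 makes the next extrapolation coefficient zero, so the method takes a plain proximal gradient step before the momentum builds up again.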

Although fast, the FISTA optimiser does not achieve loss values as low as the significantly slower second-order interior point methods. This might, at first glance, seem like a problem. However, it does recover the sparsity patterns of the data, which can be used to train a new model on the selected subset of the features.
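
One way to use the recovered sparsity pattern is to refit an unregularised model on the selected columns only. The sketch below does this with ordinary least squares from NumPy. The helper `refit_on_support` is hypothetical and not part of this library, and it assumes a one-dimensional target vector.

```python
import numpy as np


def refit_on_support(X, y, coef, tol=0.0):
    """Ordinary least squares on the columns whose coefficient is non-zero."""
    support = np.abs(np.asarray(coef).ravel()) > tol
    # Append a column of ones so the refit also estimates an intercept
    design = np.column_stack([X[:, support], np.ones(X.shape[0])])
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    refit_coef = np.zeros(support.size)
    refit_coef[support] = solution[:-1]
    return refit_coef, solution[-1]
```

Here `coef` would be the coefficient vector of a fitted group lasso model. The refit removes the shrinkage that the regulariser applies to the selected coefficients.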

Also, even though the FISTA optimiser is not meant for stochastic optimisation, in my experience it has not suffered a large drop in performance when the mini-batch was large enough. I have therefore implemented mini-batch optimisation using FISTA, and have thus been able to fit models to data with ~500 columns and 10 000 000 rows on my moderately priced laptop.
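
The text above does not say how the mini-batches are drawn, so the following is only a sketch of the general idea: estimate the least-squares gradient from a random subset of rows and use it in place of the full gradient in each iteration. The function name and the sampling scheme (uniform, without replacement) are assumptions.

```python
import numpy as np


def minibatch_gradient(X, y, w, batch_size, rng):
    """Gradient of 1/(2m) ||X_b w - y_b||^2 on a random batch of m rows."""
    rows = rng.choice(X.shape[0], size=batch_size, replace=False)
    X_b, y_b = X[rows], y[rows]
    return X_b.T @ (X_b @ w - y_b) / batch_size
```

Each call touches only `batch_size` rows, so the cost per iteration no longer grows with the total number of rows.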

Finally, we note that since FISTA uses Nesterov acceleration, it is not a descent algorithm. We can therefore not expect the loss to decrease monotonically.

References



group-lasso 0.1.4
