A regression solver for high dimensional penalized linear, quantile and logistic regression models

asgl

Introduction

asgl fits penalized regression models for high-dimensional variable selection. It supports linear (lm), quantile (qr), and logistic (logit) regression, with penalizations ranging from plain lasso to Adaptive Sparse Group Lasso (ASGL), all through a single scikit-learn-compatible Regressor class.

The package is especially useful when:

  • Variables have a known group structure (gene pathways, dummy-variable families, …)
  • You need simultaneous group- and individual-level sparsity
  • You want adaptive weights, which can help the estimator attain oracle properties
  • Your design matrix X is a scipy.sparse matrix
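
A group structure such as a dummy-variable family can be encoded as an integer array with one group id per column, the same shape as the group_index used in the examples further down. The sketch below is illustrative only: the helper name build_group_index and the "family_level" column naming convention are assumptions, not part of asgl.

```python
import numpy as np

def build_group_index(column_names, sep="_"):
    """Assign one integer group id per column, based on the prefix before `sep`."""
    group_ids = {}
    index = []
    for name in column_names:
        family = name.split(sep, 1)[0]              # "color_red" -> "color"
        if family not in group_ids:
            group_ids[family] = len(group_ids) + 1  # ids start at 1
        index.append(group_ids[family])
    return np.array(index)

# Hypothetical column names: two dummy families plus one standalone variable
columns = ["color_red", "color_blue", "size_s", "size_m", "size_l", "age"]
group_index = build_group_index(columns)
print(group_index)  # [1 1 2 2 2 3]
```

The resulting array could then be passed as fit(X, y, group_index=group_index), provided the columns of X follow the same order.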

Features

| Feature | Details |
|---|---|
| Models | Linear (lm), quantile (qr), logistic binary classification (logit) |
| Penalizations | lasso, ridge, gl, sgl, alasso, aridge, agl, asgl, or None |
| Sparse input | Both dense and scipy.sparse matrices accepted |
| Multi-output Y | lm and qr accept a 2D y matrix for simultaneous multi-response fitting |
| Solver fallback | solver accepts a list; falls back through installed CVXPY solvers automatically |
| Adaptive weights | 8 built-in weight techniques: pca_pct, pca_1, pls_pct, pls_1, lasso, ridge, unpenalized, sparse_pca |
| sklearn API | Full fit / predict / score / GridSearchCV / cross_val_predict support |
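
To give a feel for what adaptive weights do, here is a minimal sketch of the usual adaptive-lasso recipe: each weight is inversely proportional to the magnitude of an initial coefficient estimate, raised to a power. This is the textbook idea, not asgl's actual implementation. The function name and the eps floor are assumptions, and the power argument only loosely corresponds to individual_power_weight.

```python
import numpy as np

def adaptive_weights(initial_coef, power=1.0, eps=1e-4):
    """Weights inversely proportional to |initial_coef| ** power."""
    magnitude = np.abs(np.asarray(initial_coef, dtype=float))
    # Floor tiny magnitudes so the weight stays finite (eps is a hypothetical choice)
    return 1.0 / np.maximum(magnitude, eps) ** power

# Initial estimates, e.g. from an unpenalized or PCA-based fit
initial = np.array([2.0, 0.5, 0.0])
w = adaptive_weights(initial, power=1.0)
print(w)
```

Coefficients with a large initial estimate get a small weight and are penalized lightly. Those near zero get a large weight and are pushed harder towards zero.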

Installation

pip install asgl

Requirements: Python >= 3.10, cvxpy >= 1.5.0, numpy >= 1.20.0, scikit-learn >= 1.6, scipy >= 1.1

To run the test suite after installation:

pytest

Quickstart

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor

X, y = make_regression(n_samples=500, n_features=50, n_informative=20,
                       noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X_train, y_train)

print(model.coef_)
print(mean_squared_error(y_test, model.predict(X_test)))

API Reference

Regressor

from asgl import Regressor

Regressor(
    model='lm',                  # 'lm' | 'qr' | 'logit'
    penalization='lasso',        # see Penalizations table below, or None
    quantile=0.5,                # quantile level (qr only)
    fit_intercept=True,
    lambda1=0.1,                 # penalization strength
    alpha=0.5,                   # lasso/group-lasso tradeoff for sgl/asgl
    solver='default',            # str or list[str] — CVXPY solver(s)
    canon_backend='CPP',         # 'CPP' | 'SCIPY' | 'COO'
    verbose=False,
    weight_technique='pca_pct',  # adaptive weight method (adaptive penalties only)
    individual_power_weight=1,
    group_power_weight=1,
    variability_pct=0.9,
    lambda1_weights=0.1,
    spca_alpha=1e-5,
    spca_ridge_alpha=1e-2,
    individual_weights=None,     # override weight estimation with custom array
    group_weights=None,
    tol=1e-3,
    weight_tol=1e-4,
)
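
For model='qr', the quantile argument sets the level of the check (pinball) loss that quantile regression minimizes. The sketch below computes that loss with NumPy as a plain illustration. The function name is an assumption, and asgl's internal formulation (for instance sum versus mean, or its LP reformulation) may differ.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile=0.5):
    """Mean check loss at the given quantile level."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (residual > 0) are weighted by quantile,
    # over-predictions (residual < 0) by (1 - quantile)
    return np.mean(np.maximum(quantile * residual, (quantile - 1.0) * residual))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([2.0, 2.0, 1.0])
print(pinball_loss(y_true, y_pred, quantile=0.5))  # half the mean absolute error
print(pinball_loss(y_true, y_pred, quantile=0.9))  # under-prediction costs more
```

At quantile=0.5 the loss is half the mean absolute error, so the fit targets the conditional median.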

Penalizations

| penalization | Type | Group structure required |
|---|---|---|
| None | Unpenalized | No |
| 'lasso' | Individual | No |
| 'ridge' | Individual | No |
| 'gl' | Group | Yes |
| 'sgl' | Individual + group | Yes |
| 'alasso' | Adaptive individual | No |
| 'aridge' | Adaptive individual | No |
| 'agl' | Adaptive group | Yes |
| 'asgl' | Adaptive individual + group | Yes |
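
As a rough illustration of how lambda1 and alpha interact for 'sgl', the sketch below evaluates the standard sparse group lasso penalty with NumPy. It assumes that alpha=1 gives a pure lasso penalty and alpha=0 a pure group lasso penalty, and that each group norm is scaled by the square root of the group size. Both conventions are common in the literature but are assumptions here, not confirmed details of asgl.

```python
import numpy as np

def sparse_group_lasso_penalty(coef, group_index, lambda1=0.1, alpha=0.5):
    """lambda1 * (alpha * lasso term + (1 - alpha) * group lasso term)."""
    coef = np.asarray(coef, dtype=float)
    group_index = np.asarray(group_index)
    lasso_term = np.sum(np.abs(coef))
    group_term = 0.0
    for g in np.unique(group_index):
        block = coef[group_index == g]
        # Assumed scaling: sqrt(group size) times the Euclidean norm of the block
        group_term += np.sqrt(block.size) * np.linalg.norm(block)
    return lambda1 * (alpha * lasso_term + (1.0 - alpha) * group_term)

coef = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([1, 1, 2, 2])
print(sparse_group_lasso_penalty(coef, groups, lambda1=1.0, alpha=1.0))  # 7.0
print(sparse_group_lasso_penalty(coef, groups, lambda1=1.0, alpha=0.0))
```

The adaptive variants multiply each individual term and each group term by its own weight before summing.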

Key methods

| Method | Description |
|---|---|
| fit(X, y, group_index=None) | Fit the model |
| predict(X) | Predict (regression output or class labels for logit) |
| predict_proba(X) | Class probabilities (logit only) |
| decision_function(X) | Raw linear scores |
| score(X, y) | R² (regression) or accuracy (classifier) |

Fitted attributes

| Attribute | Description |
|---|---|
| coef_ | (n_features,) or (n_features, n_outputs) coefficient array |
| intercept_ | Intercept scalar |
| n_features_in_ | Number of features seen during fit |
| solver_stats_ | Dict with solver name, iterations, timing |
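
Once a model is fitted, coef_ can be combined with the group index to see which variables and groups were selected. The sketch below works on a plain coefficient array, so it runs without asgl. The helper name and its tol threshold are assumptions, and this tol is unrelated to the Regressor parameter of the same name. A threshold is used because numerical solvers may return very small values rather than exact zeros.

```python
import numpy as np

def selection_summary(coef, group_index, tol=1e-3):
    """Count coefficients above `tol` in magnitude and list the groups they belong to."""
    coef = np.asarray(coef, dtype=float)
    group_index = np.asarray(group_index)
    active = np.abs(coef) > tol
    active_groups = np.unique(group_index[active])
    return int(active.sum()), active_groups

# Stand-in for model.coef_ after a group-penalized fit (made-up values)
coef = np.array([0.0, 1.2, 0.0, 0.0, -0.7, 1e-6])
groups = np.array([1, 1, 2, 2, 3, 3])
n_active, active_groups = selection_summary(coef, groups)
print(n_active, active_groups)  # 2 [1 3]
```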

Examples

1 — Quantile regression with Adaptive Sparse Group Lasso + cross-validation

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from asgl import Regressor

X, y = make_regression(n_samples=1000, n_features=50, n_informative=25,
                       noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

group_index = np.repeat(np.arange(1, 11), 5)   # 10 groups of 5 features each

model = Regressor(model='qr', penalization='asgl', quantile=0.5)

param_grid = {
    'lambda1': [1e-3, 1e-2, 1e-1, 1.0],
    'alpha':   [0.0, 0.5, 1.0],
}
cv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error',
                        n_iter=12, cv=5)
cv.fit(X_train, y_train, group_index=group_index)  # group_index is forwarded to Regressor.fit
print(cv.best_params_)
print(cv.score(X_test, y_test))

2 — Sparse input (scipy.sparse)

import numpy as np
import scipy.sparse as sp
from asgl import Regressor

# Sparse design matrix (or use your real sparse matrix)
X = sp.random(500, 200, density=0.05, format='csr', random_state=0)

# Generate the response from X itself so the features are informative
rng = np.random.default_rng(0)
beta = np.zeros(200)
beta[:30] = rng.normal(size=30)   # 30 informative features
y = X @ beta + 0.1 * rng.normal(size=500)

model = Regressor(model='lm', penalization='lasso', lambda1=0.05)
model.fit(X, y)
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")

3 — Multi-output regression

import numpy as np
from sklearn.datasets import make_regression
from asgl import Regressor

X, y_1d = make_regression(n_samples=300, n_features=30, n_informative=10,
                           noise=3, random_state=7)
y = np.column_stack([y_1d, y_1d * 0.5 + np.random.randn(300) * 2])  # 2 outputs

group_index = np.repeat(np.arange(1, 6), 6)   # 5 groups

model = Regressor(model='lm', penalization='gl', lambda1=0.1)
model.fit(X, y, group_index=group_index)
print(model.coef_.shape)   # (n_features, 2)

4 — Solver fallback

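from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from asgl import Regressor

X, y = make_regression(n_samples=500, n_features=50, n_informative=20,
                       noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Try CLARABEL first, then SCS, then let cvxpy choose
model = Regressor(model='lm', penalization='lasso',
                  solver=['CLARABEL', 'SCS', 'default'])
model.fit(X_train, y_train)
print(model.solver_stats_['solver_name'])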
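
The general fallback pattern can be sketched in a few lines of plain Python: try each solver name in order and return the first one that succeeds. This is not asgl's implementation. Both solve_with_fallback and fake_solve are hypothetical stand-ins used only to show the control flow.

```python
def solve_with_fallback(solvers, solve):
    """Call solve(name) for each name in order; return (name, result) of the first success."""
    errors = {}
    for name in solvers:
        try:
            return name, solve(name)
        except Exception as exc:  # a real implementation would catch solver errors only
            errors[name] = exc
    raise RuntimeError(f"All solvers failed: {errors}")

def fake_solve(name):
    # Hypothetical stand-in: pretend only SCS is installed
    if name != "SCS":
        raise ValueError(f"{name} not available")
    return 42.0

used, value = solve_with_fallback(["CLARABEL", "SCS", "default"], fake_solve)
print(used, value)  # SCS 42.0
```

Keeping the per-solver errors makes it easier to see why every option failed if none succeeds.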

5 — Logistic regression with custom decision threshold

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score
from asgl import Regressor

X, y = make_classification(n_samples=1000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = Regressor(model='logit', penalization='ridge')
proba_cv = cross_val_predict(model, X_train, y_train, method='predict_proba', cv=5)

# Find threshold that maximises CV accuracy
thresholds = np.linspace(0.01, 0.99, 99)
best_thr = thresholds[np.argmax(
    [accuracy_score(y_train, (proba_cv[:, 1] >= t).astype(int)) for t in thresholds]
)]

model.fit(X_train, y_train)
test_preds = (model.predict_proba(X_test)[:, 1] >= best_thr).astype(int)
print(f"Test accuracy: {accuracy_score(y_test, test_preds):.3f}")

Citation

If you use asgl in a scientific publication, please cite:

@article{mendez2022adaptive,
  title   = {Adaptive sparse group lasso in quantile regression},
  author  = {M{\'e}ndez-Civieta, {\'A}lvaro and Aguilera-Morillo, M Carmen and Lillo, Rosa E},
  journal = {Advances in Data Analysis and Classification},
  year    = {2021},
  doi     = {10.1007/s11634-020-00413-8}
}

Full paper | Package paper | Towards Data Science walkthrough


Contributions

Contributions are welcome! Please open an issue to discuss ideas or submit a pull request.

See CONTRIBUTORS.md for a full list of contributors.

Acknowledgments

v2.2.0 incorporates a major contribution from zeyuz35: sparse matrix support, multi-output Y regression, solver fallbacks, performance improvements (vectorized group weights, PLS optimization), and an expanded test suite. See CONTRIBUTORS.md for details.


What's new?

2.2.0

  • Sparse matrix (scipy.sparse) input support throughout
  • Multivariate Y (multi-output) for lm and qr models
  • solver accepts a list of names with automatic fallback
  • New parameters: verbose, canon_backend
  • Performance: vectorized group weights, PLS without refitting
  • Internal refactor: skmodels.py → 5 focused modules
  • Test suite: 24 → 96 test functions
  • Requires Python >= 3.10

2.1.4

  • scikit-learn estimator tag compliance
  • Quantile loss optimized via residual-splitting LP

2.1.3

  • Logistic model rewritten: predict_proba, decision_function added
  • logit_proba and logit_raw model types removed

2.1.0

  • Ridge and adaptive ridge penalizations added ('ridge', 'aridge')

2.0.0

  • Regressor class introduced with full scikit-learn compatibility

License

GPL-3.0: open source; modified versions that you distribute must be released under the same license. See LICENSE for the full text.
