A regression solver for high dimensional penalized linear, quantile and logistic regression models
asgl 
Introduction
asgl fits penalized regression models for high-dimensional variable selection.
It supports linear (lm), quantile (qr), and logistic (logit) regression,
with a rich menu of penalizations — from plain Lasso to Adaptive Sparse Group
Lasso (ASGL) — all through a single scikit-learn compatible Regressor class.
The package is especially useful when:
- Variables have a known group structure (gene pathways, dummy-variable families, …)
- You need simultaneous group- and individual-level sparsity
- You want adaptive weights to improve oracle properties
- Your design matrix X is a scipy.sparse matrix
Based on:
- Adaptive Sparse Group Lasso in Quantile Regression
- asgl: A Python Package for Penalized Linear and Quantile Regression
Features
| Feature | Details |
|---|---|
| Models | Linear (lm), quantile (qr), logistic binary classification (logit) |
| Penalizations | lasso, ridge, gl, sgl, alasso, aridge, agl, asgl, or None |
| Sparse input | Both dense and scipy.sparse matrices accepted. |
| Multi-output Y | lm and qr accept a 2D y matrix for simultaneous multi-response fitting |
| Solver fallback | solver accepts a list; falls back through installed CVXPY solvers automatically |
| Adaptive weights | 8 built-in weight techniques: pca_pct, pca_1, pls_pct, pls_1, lasso, ridge, unpenalized, sparse_pca |
| sklearn API | Full fit / predict / score / GridSearchCV / cross_val_predict support |
Installation
pip install asgl
Requirements: Python >= 3.10, cvxpy >= 1.5.0, numpy >= 1.20.0, scikit-learn >= 1.6, scipy >= 1.1
To run the test suite after installation:
pytest
Quickstart
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor
X, y = make_regression(n_samples=500, n_features=50, n_informative=20,
                       noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = Regressor(model='lm', penalization='lasso', lambda1=0.1)
model.fit(X_train, y_train)
print(model.coef_)
print(mean_squared_error(y_test, model.predict(X_test)))
API Reference
Regressor
from asgl import Regressor
Regressor(
    model='lm',                  # 'lm' | 'qr' | 'logit'
    penalization='lasso',        # see Penalizations table below, or None
    quantile=0.5,                # quantile level (qr only)
    fit_intercept=True,
    lambda1=0.1,                 # penalization strength
    alpha=0.5,                   # lasso/group-lasso tradeoff for sgl/asgl
    solver='default',            # str or list[str]: CVXPY solver name(s)
    canon_backend='CPP',         # 'CPP' | 'SCIPY' | 'COO'
    verbose=False,
    weight_technique='pca_pct',  # adaptive weight method (adaptive penalties only)
    individual_power_weight=1,
    group_power_weight=1,
    variability_pct=0.9,
    lambda1_weights=0.1,
    spca_alpha=1e-5,
    spca_ridge_alpha=1e-2,
    individual_weights=None,     # override weight estimation with custom array
    group_weights=None,
    tol=1e-3,
    weight_tol=1e-4,
)
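The adaptive penalties rescale each coefficient's penalty by a weight derived from an initial estimate. As a rough illustration of how a pca_pct-style weight could be computed, the sketch below uses plain NumPy. It is not the package's internal code: it assumes the weights take the form 1 / (|b|^power + weight_tol), where b is an initial estimate obtained by regressing y on the principal components that explain variability_pct of the variance. The helper name pca_pct_weights is hypothetical.

```python
import numpy as np

def pca_pct_weights(X, y, variability_pct=0.9, power=1.0, weight_tol=1e-4):
    """Hypothetical helper: adaptive weights from a PCA-based initial estimate."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Rows of Vt are the principal directions; squared singular values give the variance split.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components reaching the requested share of variance.
    k = min(int(np.searchsorted(explained, variability_pct)) + 1, len(s))
    V = Vt[:k].T                                   # (n_features, k)
    scores = Xc @ V                                # (n_samples, k)
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    beta_init = V @ gamma                          # initial estimate in feature space
    # Large initial coefficients get small weights, i.e. a lighter penalty.
    return 1.0 / (np.abs(beta_init) ** power + weight_tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ beta + 0.1 * rng.normal(size=200)

# variability_pct=1.0 keeps every component, so the initial estimate is plain least squares.
w = pca_pct_weights(X, y, variability_pct=1.0)
print(np.round(w, 2))
```

The two informative features end up with much smaller weights than the noise features, which is what lets an adaptive penalty shrink irrelevant coefficients harder.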
Penalizations
| penalization | Type | Group structure required |
|---|---|---|
| None | Unpenalized | No |
| 'lasso' | Individual | No |
| 'ridge' | Individual | No |
| 'gl' | Group | Yes |
| 'sgl' | Individual + Group | Yes |
| 'alasso' | Adaptive individual | No |
| 'aridge' | Adaptive individual | No |
| 'agl' | Adaptive group | Yes |
| 'asgl' | Adaptive individual + Group | Yes |
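To make the difference between the individual and group penalties concrete, the sketch below evaluates a sparse group lasso penalty for a given coefficient vector. It assumes the common formulation lambda1 * (alpha * ||b||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2), with p_g the size of group g; the exact scaling used inside asgl may differ, and sgl_penalty is a hypothetical helper.

```python
import numpy as np

def sgl_penalty(beta, group_index, lambda1=0.1, alpha=0.5):
    """Hypothetical helper: sparse group lasso penalty under the assumed formulation."""
    beta = np.asarray(beta, dtype=float)
    group_index = np.asarray(group_index)
    lasso_part = np.abs(beta).sum()
    group_part = 0.0
    for g in np.unique(group_index):
        beta_g = beta[group_index == g]
        # Each group norm is scaled by the square root of the group size.
        group_part += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return lambda1 * (alpha * lasso_part + (1 - alpha) * group_part)

beta = np.array([3.0, 4.0, 0.0, 0.0])
group_index = np.array([1, 1, 2, 2])

print(sgl_penalty(beta, group_index, lambda1=1.0, alpha=1.0))  # lasso only: 7.0
print(sgl_penalty(beta, group_index, lambda1=1.0, alpha=0.0))  # group lasso only: sqrt(2) * 5
```

Under this formulation alpha=1 reduces to the lasso and alpha=0 to the group lasso, with intermediate values mixing the two.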
Key methods
| Method | Description |
|---|---|
| fit(X, y, group_index=None) | Fit the model |
| predict(X) | Predict (regression output or class labels for logit) |
| predict_proba(X) | Class probabilities (logit only) |
| decision_function(X) | Raw linear scores |
| score(X, y) | R² (regression) or accuracy (classifier) |
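The group_index argument of fit holds one integer label per column of X. As an illustration, the sketch below derives such a vector from the column names of one-hot encoded data, assuming that columns of the same family share the prefix before the first underscore. The column names and the helper build_group_index are made up for the example.

```python
import numpy as np

def build_group_index(columns):
    """Hypothetical helper: map each column to the 1-based id of its prefix family."""
    labels = {}
    index = []
    for name in columns:
        family = name.split("_", 1)[0]
        # Ids are assigned in order of first appearance, starting at 1.
        index.append(labels.setdefault(family, len(labels) + 1))
    return np.array(index)

columns = ["age", "color_red", "color_blue", "color_green", "city_paris", "city_rome"]
group_index = build_group_index(columns)
print(group_index)  # [1 2 2 2 3 3]
```

The resulting array can then be passed as fit(X, y, group_index=group_index) for the group-based penalizations.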
Fitted attributes
| Attribute | Description |
|---|---|
| coef_ | (n_features,) or (n_features, n_outputs) coefficient array |
| intercept_ | Intercept scalar |
| n_features_in_ | Number of features seen during fit |
| solver_stats_ | Dict with solver name, iterations, timing |
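After fitting with a group penalty, it is often useful to know which groups survived. The sketch below works on a plain array standing in for a single-output coef_, so it runs without fitting a model; selected_groups is a hypothetical helper, not part of the package.

```python
import numpy as np

def selected_groups(coef, group_index):
    """Hypothetical helper: ids of groups with at least one non-zero coefficient."""
    coef = np.asarray(coef)
    group_index = np.asarray(group_index)
    return [int(g) for g in np.unique(group_index)
            if np.any(coef[group_index == g] != 0)]

coef = np.array([0.0, 0.0, 0.0, 1.2, 0.0, -0.4, 0.0, 0.0, 0.0])  # stand-in for model.coef_
group_index = np.repeat(np.arange(1, 4), 3)                        # 3 groups of 3 features

print(selected_groups(coef, group_index))  # [2]
```

Here group 2 is kept as a whole while one of its coefficients is still zero, which is the combined group- and individual-level sparsity that sgl and asgl aim for.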
Examples
1 — Quantile regression with Adaptive Sparse Group Lasso + cross-validation
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from asgl import Regressor
X, y = make_regression(n_samples=1000, n_features=50, n_informative=25,
                       noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
group_index = np.repeat(np.arange(1, 11), 5) # 10 groups of 5 features each
model = Regressor(model='qr', penalization='asgl', quantile=0.5)
param_grid = {
    'lambda1': [1e-3, 1e-2, 1e-1, 1.0],
    'alpha': [0.0, 0.5, 1.0],
}
cv = RandomizedSearchCV(model, param_grid, scoring='neg_median_absolute_error',
                        n_iter=12, cv=5)
cv.fit(X_train, y_train, **{'group_index': group_index})
print(cv.best_params_)
print(cv.score(X_test, y_test))
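With model='qr', the fit targets the quantile (pinball) loss rather than squared error. A minimal NumPy sketch of that loss, using the hypothetical helper pinball_loss, shows how the quantile level weights under- and over-prediction asymmetrically:

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile=0.5):
    """Mean quantile (pinball) loss for a given quantile level."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Positive residuals (under-prediction) cost `quantile` per unit,
    # negative residuals (over-prediction) cost `1 - quantile` per unit.
    return np.mean(np.where(residual >= 0, quantile * residual, (quantile - 1) * residual))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 2.0, 2.0, 2.0])

print(pinball_loss(y_true, y_pred, quantile=0.5))  # 0.5
print(pinball_loss(y_true, y_pred, quantile=0.9))  # larger: under-prediction is penalized more
```

At quantile=0.5 the loss is half the mean absolute error, which is why an absolute-error scorer is a natural choice for tuning a median regression.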
2 — Sparse input (scipy.sparse)
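import numpy as np
import scipy.sparse as sp
from asgl import Regressor
# Sparse design matrix (or use your real sparse matrix)
X = sp.random(500, 200, density=0.05, format='csr', random_state=0)
# Build y from X so that the response actually depends on the sparse features
rng = np.random.default_rng(0)
beta = np.zeros(200)
beta[:30] = rng.normal(size=30)  # 30 informative features
y = X @ beta + 0.1 * rng.normal(size=500)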
model = Regressor(model='lm', penalization='lasso', lambda1=0.05)
model.fit(X, y)
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")
3 — Multi-output regression
import numpy as np
from sklearn.datasets import make_regression
from asgl import Regressor
X, y_1d = make_regression(n_samples=300, n_features=30, n_informative=10,
                          noise=3, random_state=7)
y = np.column_stack([y_1d, y_1d * 0.5 + np.random.randn(300) * 2]) # 2 outputs
group_index = np.repeat(np.arange(1, 6), 6) # 5 groups
model = Regressor(model='lm', penalization='gl', lambda1=0.1)
model.fit(X, y, group_index=group_index)
print(model.coef_.shape) # (n_features, 2)
4 — Solver fallback
from asgl import Regressor
# Assumes X_train and y_train as defined in the Quickstart
# Try CLARABEL first, then SCS, then let cvxpy choose
model = Regressor(model='lm', penalization='lasso',
                  solver=['CLARABEL', 'SCS', 'default'])
model.fit(X_train, y_train)
print(model.solver_stats_['solver_name'])
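The fallback behaviour can be pictured as a loop over the candidate names that stops at the first solver that succeeds. The sketch below illustrates that idea in plain Python and is not the package's implementation; solve_with_fallback and the solve callback are hypothetical.

```python
def solve_with_fallback(solvers, solve):
    """Try each solver name in order; return (name, result) for the first success."""
    errors = {}
    for name in solvers:
        try:
            return name, solve(name)
        except Exception as exc:  # a real implementation would catch solver-specific errors
            errors[name] = exc
    raise RuntimeError(f"All solvers failed: {errors}")

def fake_solve(name):
    # Stand-in for a CVXPY call: pretend only SCS is installed.
    if name != "SCS":
        raise ValueError(f"{name} is not installed")
    return 42.0

used, value = solve_with_fallback(["CLARABEL", "SCS", "default"], fake_solve)
print(used)  # SCS
```

Recording which solver finally ran, as solver_stats_ does, matters for reproducibility because different solvers can return slightly different coefficients.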
5 — Logistic regression with custom decision threshold
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import accuracy_score
from asgl import Regressor
X, y = make_classification(n_samples=1000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = Regressor(model='logit', penalization='ridge')
proba_cv = cross_val_predict(model, X_train, y_train, method='predict_proba', cv=5)
# Find threshold that maximises CV accuracy
thresholds = np.linspace(0.01, 0.99, 99)
best_thr = thresholds[np.argmax(
    [accuracy_score(y_train, (proba_cv[:, 1] >= t).astype(int)) for t in thresholds]
)]
model.fit(X_train, y_train)
test_preds = (model.predict_proba(X_test)[:, 1] >= best_thr).astype(int)
print(f"Test accuracy: {accuracy_score(y_test, test_preds):.3f}")
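The threshold search above operates on probabilities, but the same rule can be expressed on the raw scores. Assuming the standard logistic link (the usual definition, not checked against the package source), the positive-class probability is the sigmoid of the linear score, so a probability threshold maps to a score threshold through the logit function. The arrays below are stand-ins for decision_function output.

```python
import numpy as np

def sigmoid(score):
    return 1.0 / (1.0 + np.exp(-score))

def logit(p):
    return np.log(p / (1.0 - p))

scores = np.array([-2.0, -0.1, 0.0, 0.3, 1.5])  # stand-in for decision_function(X)
proba = sigmoid(scores)                          # assumed relation to predict_proba(X)[:, 1]

threshold = 0.6
by_proba = (proba >= threshold).astype(int)
by_score = (scores >= logit(threshold)).astype(int)

print(by_proba)  # [0 0 0 0 1]
print(np.array_equal(by_proba, by_score))
```

With threshold=0.5 the score cut-off is 0, which corresponds to the default class-label rule of a logistic model.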
Citation
If you use asgl in a scientific publication, please cite:
@article{mendez2022adaptive,
title = {Adaptive sparse group lasso in quantile regression},
author = {M{\'e}ndez-Civieta, {\'A}lvaro and Aguilera-Morillo, M Carmen and Lillo, Rosa E},
journal = {Advances in Data Analysis and Classification},
year = {2021},
doi = {10.1007/s11634-020-00413-8}
}
Full paper | Package paper | Towards Data Science walkthrough
Contributions
Contributions are welcome! Please open an issue to discuss ideas or submit a pull request.
See CONTRIBUTORS.md for a full list of contributors.
Acknowledgments
v2.2.0 incorporates a major contribution from zeyuz35: sparse matrix support, multi-output Y regression, solver fallbacks, performance improvements (vectorized group weights, PLS optimization), and an expanded test suite. See CONTRIBUTORS.md for details.
What's new?
2.2.0
- Sparse matrix (scipy.sparse) input support throughout
- Multivariate Y (multi-output) for lm and qr models
- solver accepts a list of names with automatic fallback
- New parameters: verbose, canon_backend
- Performance: vectorized group weights, PLS without refitting
- Internal refactor: skmodels.py → 5 focused modules
- Test suite: 24 → 96 test functions
- Requires Python >= 3.10
2.1.4
- scikit-learn estimator tag compliance
- Quantile loss optimized via residual-splitting LP
2.1.3
- Logistic model rewritten: predict_proba, decision_function added
- logit_proba and logit_raw model types removed
2.1.0
- Ridge and adaptive ridge penalizations added ('ridge', 'aridge')
2.0.0
- Regressor class introduced with full scikit-learn compatibility
License
GPL-3.0 — open source, modifications must be redistributed under the same license. See LICENSE for full text.