COMBSS: continuous optimisation for best subset selection in generalised linear models (linear, logistic, multinomial)
Project description
COMBSS: Best Subset Selection for Generalised Linear Models
Python implementation of COMBSS (Continuous Optimisation for Best Subset Selection) for generalised linear models. COMBSS reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube [0,1]^p, making it scalable to high-dimensional settings with p >> n.
Supported model types:
- Linear regression (continuous response)
- Binary logistic regression (two-class classification)
- Multinomial logistic regression (multi-class, C > 2)
References
Moka, Liquet, Zhu & Muller (2024). COMBSS: best subset selection via continuous optimization. Statistics and Computing.
Mathur, Liquet, Muller & Moka (2026). Parsimonious Subset Selection for Generalized Linear Models with Biomedical Applications. arXiv preprint.
Installation
pip install combss
Quick Start
Linear regression
import combss
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=50, n_informative=5,
noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4)
# Frank-Wolfe method (default) — with validation data
model = combss.linear.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=10)
print("Best subset:", model.subset)
print("Best MSE:", model.mse)
# All subsets for k = 1, ..., q
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")
Without validation data, subset_list is still available
but subset, mse, and coef_ will be None:
model = combss.linear.model()
model.fit(X_train, y_train, q=10)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")
To use the original Adam + dynamic-lambda method from v1.x:
model = combss.linear.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=10, method='original')
print("Best subset:", model.subset)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)
Binary logistic regression
import combss
# X_train, X_val : (n, p) feature matrices
# y_train, y_val : binary labels {0, 1}
model = combss.logistic.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=15)
print("Best subset:", model.subset)
print("Best accuracy:", model.accuracy)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")
Multinomial logistic regression
import combss
import numpy as np
import pandas as pd
# Example with the Khan SRBCT dataset (4 tumour classes, 2308 genes)
train = pd.read_csv('data/Khan_train.csv')
test = pd.read_csv('data/Khan_test.csv')
y_train = train.iloc[:, 0].values.astype(int)
X_train = train.iloc[:, 1:].values.astype(float)
y_test = test.iloc[:, 0].values.astype(int)
X_test = test.iloc[:, 1:].values.astype(float)
C = len(np.unique(y_train))
model = combss.multinomial.model()
model.fit(X_train, y_train, X_val=X_test, y_val=y_test, q=20, C=C)
print("Best subset:", model.subset)
print("Best accuracy:", model.accuracy)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")
Lambda selection via LOOCV
Select the ridge penalty using leave-one-out cross-validation:
import combss
# Classification
best_lam, best_lam_per_k, cv_df = combss.cv.select_lambda(
X_train, y_train,
q=15, C=4,
model_type='multinomial',
)
# Linear regression
best_lam, best_lam_per_k, cv_df = combss.cv.select_lambda(
X_train, y_train,
q=15,
model_type='linear',
)
API Reference
combss.linear.model
Best subset selection for linear regression.
model = combss.linear.model()
model.fit(X_train, y_train, q=10) # FW (default), no validation
model.fit(X_train, y_train, X_val, y_val, q=10) # FW with validation
model.fit(X_train, y_train, X_val, y_val, q=10, # Original method
method='original')
Frank-Wolfe method parameters (method='fw', default)
Sparsity is controlled by k (model size): COMBSS returns selected features for
each k = 1, ..., q. The lam_ridge parameter is an optional ridge regularisation
on the coefficients in the inner solver, unrelated to the sparsity penalty lambda
used in the original method.
| Parameter | Default | Description |
|---|---|---|
q |
min(n,p) | Maximum subset size |
Niter |
25 | Number of homotopy iterations |
lam_ridge |
0 | Ridge penalty on coefficients in the inner solver |
alpha |
0.01 | Frank-Wolfe step size |
scale |
True | Column-normalise X |
mandatory_features |
None | 1-indexed features always included |
inner_tol |
1e-4 | Inner solver tolerance |
patience |
20 | Early stopping: stop after this many k with no improvement. Requires validation data. Set to None to disable |
min_k |
20 | Minimum k values before early stopping can trigger. Automatically disabled if min_k + patience > q |
verbose |
True | Print progress |
Original method parameters (method='original')
Sparsity is controlled by lambda: a grid of lambda values is searched, and each
lambda yields a different subset. The best subset is selected by validation MSE.
| Parameter | Default | Description |
|---|---|---|
q |
min(n,p) | Maximum subset size |
nlam |
50 | Number of lambda values in dynamic grid |
scaling |
True | Enable feature scaling |
tau |
0.5 | Threshold parameter |
delta_frac |
1 | n/delta in objective function |
eta |
0.001 | Truncation parameter |
gd_patience |
10 | Patience for Adam termination |
gd_maxiter |
1000 | Maximum Adam iterations |
gd_tol |
1e-5 | Adam convergence tolerance |
cg_maxiter |
n | Conjugate gradient max iterations |
cg_tol |
1e-5 | Conjugate gradient tolerance |
Output attributes — Frank-Wolfe method
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Regression coefficients (length p, zeros for unselected). Requires validation data, otherwise None |
mse |
Validation MSE for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed). May be shorter if early stopping triggered |
Output attributes — Original method
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed) |
coef_ |
Regression coefficients (length p, zeros for unselected) |
mse |
Validation MSE for best subset |
lambda_ |
Optimal lambda value |
subset_list |
List of subsets across the lambda grid (0-indexed) |
lambda_list |
Lambda grid values |
combss.logistic.model
Best subset selection for binary logistic regression.
model = combss.logistic.model()
model.fit(X_train, y_train, q=15) # Without validation
model.fit(X_train, y_train, X_val, y_val, q=15) # With validation
Labels y must be binary {0, 1}. Parameters are the same as the Frank-Wolfe method
in combss.linear.model.
Output attributes
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Logistic coefficients (length p, zeros for unselected). Requires validation data, otherwise None |
accuracy |
Validation accuracy for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed) |
combss.multinomial.model
Best subset selection for multinomial logistic regression.
model = combss.multinomial.model()
model.fit(X_train, y_train, q=20, C=4) # Without validation
model.fit(X_train, y_train, X_val, y_val, q=20, C=4) # With validation
Labels y must be in {1, ..., C}. Parameters are the same as the Frank-Wolfe method
in combss.linear.model, plus:
| Additional Parameter | Default | Description |
|---|---|---|
C |
inferred | Number of classes |
Output attributes
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Multinomial coefficients (shape (C, p), zeros for unselected). Requires validation data, otherwise None |
accuracy |
Validation accuracy for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed) |
combss.cv.select_lambda
LOOCV-based ridge penalty selection.
best_lam, best_lam_per_k, results_df = combss.cv.select_lambda(
X, y, q, C=None, lambda_grid=None, model_type='multinomial', ...
)
| Parameter | Default | Description |
|---|---|---|
X |
-- | Feature matrix (n, p), no intercept |
y |
-- | Response / labels |
q |
-- | Maximum subset size |
C |
None | Number of classes (required for classification) |
lambda_grid |
[0] + logspace(-3,1,10) | Candidate ridge penalty values |
Niter |
50 | Homotopy iterations |
model_type |
'multinomial' | 'logit', 'multinomial', or 'linear' |
lambda_refit |
0 | Ridge penalty for LOOCV refit |
Returns: (best_lambda, best_lambda_per_k, results_df)
- Classification:
results_dfhas columnslambda, k, loocv_acc, selected - Linear:
results_dfhas columnslambda, k, loocv_mse, selected
combss.metrics
Performance metrics for variable selection evaluation.
result = combss.metrics.performance_metrics(X, beta_true, beta_pred)
# result keys: pe, mcc, acc, sens, spec, f1, prec
Algorithm Overview
The Frank-Wolfe homotopy algorithm operates as follows:
- Initialise t = (k/p, ..., k/p) -- centroid of the k-sparse simplex T_k
- For i = 1, ..., N (homotopy loop):
- Set delta_i on a geometric schedule from delta_min to delta_max
- Solve the ridge-penalised GLM inner problem at current t
- Compute the Danskin gradient (no Hessian required)
- Find the Frank-Wolfe vertex (k smallest gradient components)
- Update: t <- (1 - alpha) t + alpha s
- Select the k features with the largest final t values
The penalty schedule is auto-calibrated from the spectral norm of X, so no tuning is needed in practice.
Intercept Handling
The intercept is handled internally and is not subject to selection. For the Frank-Wolfe method, an intercept column is prepended automatically. For the original method, the intercept is treated the same as other features.
Dependencies
- Python 3.8+
- NumPy (>= 1.21.0)
- SciPy (>= 1.7.0)
- scikit-learn (>= 1.0.0)
- pandas (>= 1.3.0)
Developers
- Sarat Moka
- Anant Mathur
- Hua Yang Hu
Contributing
Contributions are welcome! Please open an issue or submit a pull request at github.com/saratmoka/combss.
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file combss-2.0.0.tar.gz.
File metadata
- Download URL: combss-2.0.0.tar.gz
- Upload date:
- Size: 42.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9f492fb1c3f1b81df1792a3de8ce72bb8e2081474c0ef2025a99fb4c87613432
|
|
| MD5 |
d9e0ea2ca7f84984c221576613810874
|
|
| BLAKE2b-256 |
6dd501e9388f73f553004fe8a7cb77d21c76138970890bd7b9c4bfeb8fc842cc
|
File details
Details for the file combss-2.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: combss-2.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 35.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
103c22b632cb5294fd969d86dc2bcd79385e978a364b81fe546131c4728b0aa3
|
|
| MD5 |
89c3c75f40db2d2eb63b8506277b05b5
|
|
| BLAKE2b-256 |
42757e6763fa8323e95fe6237ee92c97af074a8b7c1a936d01477e523003f7b0
|