A Python package implementing COMBSS, a continuous optimization method for best subset selection
Continuous Optimization Method for Best Subset Selection
Python implementation of a novel continuous optimization method for best subset selection in linear regression.
📄 Reference:
Moka, Liquet, Zhu & Muller (2024)
COMBSS: best subset selection via continuous optimization
Statistics and Computing
🔗 GitHub Repository: saratmoka/combss
Key Features
- 🎯 Continuous relaxation of discrete subset selection
- ⚡ Scalable optimization for high-dimensional data
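To make the continuous relaxation concrete, here is a minimal NumPy sketch of the objective as we read it in Moka et al. (2024). It is illustrative only, not the package's code. Each binary inclusion indicator is replaced by a weight t_j in [0, 1]. With T_t = diag(t), the inner coefficients solve L_t β = T_t Xᵀy / n, where L_t = (T_t XᵀX T_t + δ(I − T_t²)) / n, and the objective is f_λ(t) = ‖y − X T_t β̃_t‖² / n + λ Σ_j t_j. We assume δ corresponds to `delta_frac * n`, based on the "δ/n" parameter description below. The function name `relaxed_objective` is hypothetical.

```python
import numpy as np

def relaxed_objective(t, X, y, lam, delta):
    """Hypothetical sketch: evaluate f_lambda(t) for a point t in [0, 1]^p.

    Returns the objective value and the inner coefficient vector beta_t.
    """
    n, p = X.shape
    T = np.diag(np.asarray(t, dtype=float))
    # L_t = (T X'X T + delta (I - T^2)) / n; the delta term keeps L_t invertible
    L = (T @ X.T @ X @ T + delta * (np.eye(p) - T @ T)) / n
    beta = np.linalg.solve(L, T @ X.T @ y / n)
    # Residual of the "soft-masked" fit X T_t beta_t
    resid = y - X @ T @ beta
    return resid @ resid / n + lam * np.sum(t), beta
```

At a binary t, the δ term forces β̃_j = 0 for every excluded feature, and the remaining coefficients are the ordinary least-squares fit on the included features. The objective then reduces to the training residual sum of squares of that subset divided by n, plus λ times the subset size. This is what lets a continuous optimizer search over t while still ending up at genuine subsets.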
Intercept Handling
If an intercept term is included, it is subject to the same selection process as the other features, so it may be left out of the selected subset.
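If you would rather keep the intercept in the model unconditionally, one standard workaround (our suggestion, not a documented package option) is to center the columns of X and y before fitting and recover the intercept afterwards from the means. The sketch below uses hypothetical helpers `center_data` and `recover_intercept` and checks the least-squares identity behind this on a fixed set of features. It does not cover how the package's own `scaling` step interacts with centered inputs.

```python
import numpy as np

def center_data(X, y):
    """Hypothetical helper: return centered copies of X and y plus their means."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    return X - x_mean, y - y_mean, x_mean, y_mean

def recover_intercept(beta, x_mean, y_mean):
    """Hypothetical helper: intercept on the original scale, y_mean - x_mean . beta."""
    return y_mean - x_mean @ beta
```

If you use this with combss, center the validation data with the training means, and make sure the mean vector and the coefficient vector refer to the same (selected) features when recovering the intercept.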
Installation
pip install combss
Quick Start
A simple example:
import combss
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
# Generate sample data
X, y = make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=42)
# Split into training and validation sets (60-40 split)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)
# Initialize and fit model with validation data
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,  # Validation features
    y_val=y_val,  # Validation targets
    q=10,         # Maximum subset size
    nlam=50       # Number of λ values
)
# Results
print("Best subset indices:", model.subset)
print("Best coefficients:", model.coef_)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)
print("Computation time (s):", model.run_time)
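The role of `X_val` and `y_val` can be pictured with the following sketch. It assumes (we have not checked the package source) that each λ on the grid yields a candidate subset, that coefficients are refitted by ordinary least squares on the training data, and that the reported `mse` and `lambda_` belong to the candidate with the lowest validation MSE. `select_by_validation` is a hypothetical helper, not part of combss.

```python
import numpy as np

def select_by_validation(subsets, X_train, y_train, X_val, y_val):
    """Hypothetical helper: refit OLS on each candidate subset and keep the
    one with the lowest validation mean squared error."""
    best_subset, best_coef, best_mse = None, None, np.inf
    for subset in subsets:
        idx = np.asarray(subset, dtype=int)
        if idx.size == 0:
            # The empty model predicts zero for every validation sample
            coef = np.zeros(0)
            pred = np.zeros(len(y_val))
        else:
            coef = np.linalg.lstsq(X_train[:, idx], y_train, rcond=None)[0]
            pred = X_val[:, idx] @ coef
        mse = np.mean((y_val - pred) ** 2)
        if mse < best_mse:
            best_subset, best_coef, best_mse = idx, coef, mse
    return best_subset, best_coef, best_mse
```

Selecting on held-out data rather than on the training fit guards against always preferring the largest subset, since the training error can only decrease as features are added.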
An example with known true coefficients:
import combss
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
# Configuration
n_samples = 5000
n_features = 50
n_informative = 5 # the number of non-zero coefficients
noise_level = 0.1
# Generate data with exactly 5 informative features
X, y, true_coef = make_regression(
n_samples=n_samples,
n_features=n_features,
n_informative=n_informative,
noise=noise_level,
coef=True, # Return the actual coefficients used
random_state=42
)
# Exactly 5 coefficients are non-zero; since make_regression shuffles
# features by default (shuffle=True), their positions are random
print("Number of truly informative features:", np.count_nonzero(true_coef))
# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)
# Initialize and fit model
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    q=10,
    nlam=50
)
# Results analysis
print("\nTrue non-zero coefficient indices:", np.where(true_coef != 0)[0])
print("Estimated subset:", model.subset)
print("\nValidation MSE:", model.mse)
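To quantify how well the estimated subset matches the true support, a small comparison helper is enough. The sketch below is hypothetical (not part of combss) and only assumes that both arguments are iterables of 0-based integer indices, which matches the documented meaning of `model.subset`.

```python
def support_recovery(true_idx, est_idx):
    """Hypothetical helper: compare a true support with an estimated one."""
    true_set = set(map(int, true_idx))
    est_set = set(map(int, est_idx))
    return {
        "true_positives": len(true_set & est_set),
        "false_positives": len(est_set - true_set),
        "false_negatives": len(true_set - est_set),
        "exact_recovery": true_set == est_set,
    }
```

In the example above it would be called as `support_recovery(np.where(true_coef != 0)[0], model.subset)`.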
Documentation
Core Parameters
| Parameter | Description | Default |
|---|---|---|
| `q` | Maximum subset size | `min(n, p)` |
| `nlam` | Number of λ values in the grid | 50 |
| `scaling` | Enable feature scaling | `True` |
| `tau` | Threshold τ on the entries of t used to form the selected subset | 0.5 |
| `delta_frac` | Value of δ/n in the objective function | 1 |
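The `tau` and `delta_frac` descriptions can be read as in the following sketch. It rests on two assumptions: that feature j is selected when t_j is strictly greater than τ (our reading of Moka et al. 2024), and that δ is recovered as `delta_frac * n`, following the "δ/n" description. Both helper names are hypothetical.

```python
import numpy as np

def threshold_subset(t, tau=0.5):
    """Hypothetical helper: 0-based indices j with t_j > tau."""
    return np.flatnonzero(np.asarray(t, dtype=float) > tau)

def delta_from_frac(delta_frac, n):
    """Hypothetical helper: delta_frac is delta / n, so delta = delta_frac * n."""
    return delta_frac * n
```

With the default `delta_frac = 1`, δ equals the number of training samples.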
Other Parameters
model.fit(
    ...,
    t_init=t_init,    # Initial point for the vector t
    eta=0.001,        # Truncation parameter
    patience=10,      # Number of early-stopping rounds
    gd_maxiter=1000,  # Maximum number of iterations of the gradient-based optimization
    gd_tol=1e-5,      # Tolerance of the gradient-based optimization
    cg_maxiter=1000,  # Maximum number of iterations of the conjugate gradient method
    cg_tol=1e-6       # Tolerance of the conjugate gradient method
)
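In the formulation of Moka et al. (2024), as we understand it, each evaluation of the objective requires solving a p × p symmetric positive definite linear system for the inner coefficients, which is presumably where `cg_maxiter` and `cg_tol` come in. The textbook conjugate gradient solver below shows what those two settings control. It is a sketch, not the package's implementation, and we assume `cg_tol` is a tolerance on the residual norm relative to ‖b‖.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-6, maxiter=1000):
    """Textbook CG for A x = b with A symmetric positive definite.

    Stops when ||b - A x|| <= tol * ||b|| or after maxiter iterations.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x          # initial residual
    d = r.copy()           # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(maxiter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)      # exact step length along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d  # next A-conjugate direction
        rs_old = rs_new
    return x
```

CG only needs matrix-vector products, which is why it scales better than a direct solve when p is large.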
Output Attributes
| Attribute | Description |
|---|---|
| `subset` | Selected feature indices (0-based) |
| `coef_` | Regression coefficients |
| `mse` | Validation mean squared error |
| `lambda_` | Optimal λ value |
| `run_time` | Execution time (seconds) |
| `subset_list` | List of subsets over the λ grid |
| `lambda_list` | Grid of λ values |
Dependencies
- Python 3.7+
- NumPy (≥1.21.0)
- SciPy (≥1.7.0)
- scikit-learn (not listed as a dependency, but needed to run the examples above)
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Developers
- Sarat Moka (@saratmoka)
- Hua Yang Hu