A package implementation of COMBSS, a continuous optimisation method toward best subset selection

Continuous Optimization Method for Best Subset Selection

Python implementation of a novel continuous optimization method for best subset selection in linear regression.

📄 Reference:
Moka, Liquet, Zhu & Muller (2024)
COMBSS: best subset selection via continuous optimization
Statistics and Computing

🔗 GitHub Repository: saratmoka/combss

Key Features

  • 🎯 Continuous relaxation of discrete subset selection
  • ⚡ Scalable optimization for high-dimensional data

Intercept Handling

The intercept term (if included) is subject to the same selection process as other features.

Installation

pip install combss

Quick Start

A simple example:

import combss
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=42)

# Split into training and validation sets (60-40 split)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)

# Initialize and fit model with validation data
model = combss.linear.model()
model.fit(
    X_train=X_train, 
    y_train=y_train,
    X_val=X_val,      # Validation features
    y_val=y_val,      # Validation targets
    q=10,             # Maximum subset size
    nlam=50           # Number of λ values
)

# Results
print("Best subset indices:", model.subset)
print("Best coefficients:", model.coef_)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)
print("Computation time (s):", model.run_time)

An example with known true coefficients:

import combss
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Configuration
n_samples = 5000
n_features = 50
n_informative = 5  # the number of non-zero coefficients
noise_level = 0.1

# Generate data with exactly 5 informative features
X, y, true_coef = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_informative, 
    noise=noise_level,
    coef=True,  # Return the actual coefficients used
    random_state=42
)

# Exactly 5 coefficients are non-zero. make_regression shuffles the features
# by default, so they are not necessarily the first 5 columns.
print("Number of truly informative features:", np.count_nonzero(true_coef))

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)

# Initialize and fit model
model = combss.linear.model()
model.fit(
    X_train=X_train, 
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    q=10,
    nlam=50
)

# Results analysis
print("\nTrue non-zero coefficients:", np.where(true_coef != 0)[0])
print("Estimated subset:", model.subset)
print("\nValidation MSE:", model.mse)
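
To quantify how closely an estimated subset matches the true support, precision and recall over the feature indices are a common choice. The sketch below is independent of COMBSS: the helper `support_recovery` is a hypothetical name, and the arrays are hand-made toy values rather than package output. It assumes only that the subset is a sequence of 0-based indices, as described for `model.subset`.

```python
import numpy as np

def support_recovery(true_coef, subset):
    """Compare a selected subset (0-based indices) with the true support."""
    true_support = set(np.flatnonzero(true_coef).tolist())
    selected = set(int(j) for j in subset)
    true_positives = len(true_support & selected)
    # Guard against empty sets to avoid division by zero
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(true_support) if true_support else 0.0
    return precision, recall

# Toy illustration with hand-made values (not COMBSS output)
true_coef = np.array([0.0, 2.5, 0.0, -1.0, 0.0, 3.0])
subset = [1, 3, 4]
precision, recall = support_recovery(true_coef, subset)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In the example above, the same helper could be called as `support_recovery(true_coef, model.subset)`.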

Documentation

Core Parameters

| Parameter | Description | Default |
|---|---|---|
| q | Maximum subset size | min(n, p) |
| nlam | Number of λ values | 50 |
| scaling | Enable feature scaling | True |
| tau | Threshold parameter | 0.5 |
| delta_frac | δ/n in objective function | 1 |
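
As a rough illustration of what a threshold parameter such as `tau` does, a continuous vector t in [0, 1]^p can be converted into a discrete subset by keeping the features whose entry exceeds the threshold. The sketch below assumes the rule `t_j > tau`. The function name `threshold_to_subset` is hypothetical, and this is not the package's internal code.

```python
import numpy as np

def threshold_to_subset(t, tau=0.5):
    """Return 0-based indices j with t[j] > tau (assumed thresholding rule)."""
    t = np.asarray(t, dtype=float)
    return np.flatnonzero(t > tau)

# Hand-made continuous solution, for illustration only
t = np.array([0.02, 0.97, 0.51, 0.49, 0.88])
subset = threshold_to_subset(t, tau=0.5)
print(subset)  # [1 2 4]
```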

Other Parameters

model.fit(
    ...,
    t_init=t_init,     # Initial point for vector t
    eta=0.001,         # Truncation parameter
    patience=10,       # Early stopping rounds
    gd_maxiter=1000,   # Maximum number of iterations for the gradient based optimization
    gd_tol=1e-5,       # Tolerance for the gradient based optimization
    cg_maxiter=1000,   # Maximum number of iterations allowed in the conjugate gradient method
    cg_tol=1e-6        # Conjugate gradient tolerance
)

Output Attributes

| Attribute | Description |
|---|---|
| subset | Selected feature indices (0-based) |
| coef_ | Regression coefficients |
| mse | Mean squared error |
| lambda_ | Optimal λ value |
| run_time | Execution time (seconds) |
| subset_list | List of subsets over the λ grid |
| lambda_list | Grid of λ values |
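
One way to inspect a collection of candidate subsets is to refit ordinary least squares on each and compare validation errors. The sketch below uses only NumPy and synthetic data. The helper `validation_mse` is a hypothetical name, the candidate lists are hand-made stand-ins for `subset_list` and `lambda_list`, and it is an assumption that the two lists align index by index. It does not claim to reproduce how the package selects its reported subset.

```python
import numpy as np

def validation_mse(X_train, y_train, X_val, y_val, subset):
    """Fit least squares on the chosen columns and score on validation data."""
    subset = np.asarray(subset, dtype=int)
    if subset.size == 0:
        # Empty model: predict zero everywhere (assumes no intercept)
        return float(np.mean(y_val ** 2))
    beta, *_ = np.linalg.lstsq(X_train[:, subset], y_train, rcond=None)
    residuals = y_val - X_val[:, subset] @ beta
    return float(np.mean(residuals ** 2))

# Synthetic data: only columns 1 and 4 carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
X_train, X_val = X[:120], X[120:]
y_train, y_val = y[:120], y[120:]

# Hypothetical stand-ins for model.subset_list and model.lambda_list
subset_list = [[1], [1, 4], [0, 2]]
lambda_list = [1.0, 0.1, 0.01]

mses = [validation_mse(X_train, y_train, X_val, y_val, s) for s in subset_list]
best = int(np.argmin(mses))
print("Best subset:", subset_list[best], "at lambda =", lambda_list[best])
```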

Dependencies

  • Python 3.7+
  • NumPy (≥1.21.0)
  • SciPy (≥1.7.0)

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
