Cardinality- and budget-constrained feature selection for logistic regression using mixed-integer conic optimization

l0l2learn

Feature selection for logistic regression using mixed-integer conic optimization. Unlike Lasso-based approaches, l0l2learn directly optimizes feature subsets under explicit cardinality or budget constraints.

Overview

l0l2learn is a Python package that provides sklearn-style estimators for cardinality- and budget-constrained feature selection in logistic regression. The package currently includes:

  • L0L2Classifier: L0-constrained L2-regularized logistic regression
  • ResampledL0L2Classifier: resampling-based feature selection with frequency-based aggregation to improve selection stability
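
For orientation, the problem behind L0L2Classifier can be written roughly as follows. This is a sketch under assumptions, not the package's documented formulation: labels are taken as y_i in {-1, +1}, the binary indicators z_j are introduced here for illustration, and the exact scaling of the loss and of the penalty is not stated in this README. The symbols b, c, and lambd correspond to the hyperparameters of the same name.

```latex
\begin{aligned}
\min_{\beta_0,\, \beta,\, z} \quad
  & \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(\beta_0 + x_i^\top \beta\right)\right)\right)
    + \lambda \lVert \beta \rVert_2^2 \\
\text{s.t.} \quad
  & \beta_j = 0 \ \text{ whenever } z_j = 0, \qquad j = 1, \dots, p, \\
  & \sum_{j=1}^{p} c_j z_j \le b, \\
  & z \in \{0, 1\}^p .
\end{aligned}
```

With all c_j equal to one, the budget constraint reduces to a cardinality constraint: at most b coefficients may be nonzero.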

Installation

To install the package, use the following command:

pip install l0l2learn

l0l2learn solves its optimization problems with the MOSEK conic solver, which requires a license. See the MOSEK website to request a license and set it up.
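
Before fitting a model, it can help to confirm that a license file is where MOSEK usually looks for it. The helper below is a minimal sketch and not part of l0l2learn: find_mosek_license is a hypothetical name, and it only checks the MOSEKLM_LICENSE_FILE environment variable and the default location mosek/mosek.lic in the home directory. If the variable points to a license server (port@host) rather than a file, the helper returns None even though MOSEK may work.

```python
import os
from pathlib import Path


def find_mosek_license(env=None, home=None):
    """Return the first existing MOSEK license file, or None if none is found.

    Hypothetical helper for illustration; it does not validate the license.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    candidates = []
    # An explicit path in the environment variable is checked first.
    if env.get("MOSEKLM_LICENSE_FILE"):
        candidates.append(Path(env["MOSEKLM_LICENSE_FILE"]))
    # Default location: <home>/mosek/mosek.lic
    candidates.append(home / "mosek" / "mosek.lic")

    for path in candidates:
        if path.is_file():
            return path
    return None


print("MOSEK license file:", find_mosek_license())
```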

Quick Start

Feature Selection Without Resampling

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from l0l2learn import L0L2Classifier


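X, y = load_breast_cancer(return_X_y=True, as_frame=True)
# Fix random_state so the split (and the reported ROC AUC) is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# With the default unit feature costs, b=3 selects at most 3 features.
clf = L0L2Classifier(
    b=3,
    lambd=1.0
)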

clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)

print("ROC AUC:      ", roc_auc_score(y_test, y_proba[:, 1]))
print("Coefficients: ", clf.coef_)
print("Intercept:    ", clf.intercept_)
print("Support:      ", clf.support_)

Feature Selection With Resampling

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from l0l2learn import ResampledL0L2Classifier


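X, y = load_breast_cancer(return_X_y=True, as_frame=True)
# Fix random_state so the split (and the reported ROC AUC) is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Fit 3 resampled models; lambd is tuned over param_grid (a single value here).
clf = ResampledL0L2Classifier(
    b=3,
    param_grid={"lambd": [1.0]},
    n_resamples=3
)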

clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)

print("ROC AUC:      ", roc_auc_score(y_test, y_proba[:, 1]))
print("Coefficients: ", clf.coef_)
print("Intercept:    ", clf.intercept_)
print("Support:      ", clf.support_)
# VIFs: variable inclusion frequencies; MSFs: model selection frequencies.
print("VIFs:         ", clf.variable_inclusion_frequencies_[clf.support_])
print("MSFs:         ", clf.model_selection_frequencies_)

Hyperparameters

Feature Costs

Feature-specific costs can be supplied through c:

clf = L0L2Classifier(c=[1, 2, 5])

The optimization then accounts for some features being more expensive to select than others.

Feature Selection Budget

The feature selection budget is controlled through b:

clf = L0L2Classifier(b=5)

When all feature costs are equal to one (the default), b directly controls the maximum number of selected features.
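
To make the interplay of c and b concrete, the following sketch enumerates which feature subsets fit within a budget. It is a plain-Python illustration that assumes the constraint has the form "sum of the costs of the selected features <= b"; the helper names within_budget and feasible_supports are hypothetical and not part of l0l2learn, which solves the problem with a conic solver rather than by enumeration.

```python
from itertools import combinations


def within_budget(support, costs, budget):
    """True if the total cost of the selected feature indices fits the budget."""
    return sum(costs[j] for j in support) <= budget


def feasible_supports(costs, budget):
    """List every subset of feature indices whose total cost is within budget."""
    n = len(costs)
    return [
        support
        for k in range(n + 1)
        for support in combinations(range(n), k)
        if within_budget(support, costs, budget)
    ]


# Costs as in the example above: the third feature alone uses the whole budget.
print(feasible_supports([1, 2, 5], budget=5))
# [(), (0,), (1,), (2,), (0, 1)]

# With unit costs, the budget is simply a cap on the number of features.
print(len(feasible_supports([1, 1, 1], budget=2)))
# 7
```

Enumeration grows exponentially with the number of features and is only meant to show what the constraint allows.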

L2 Regularization

The L2 regularization strength is given by lambd:

clf = L0L2Classifier(lambd=0.1)

Larger values shrink the coefficients more strongly, which can reduce overfitting and increase robustness.
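
The shrinkage effect can be seen on a toy problem. The sketch below fits a one-feature logistic regression without intercept by Newton's method, assuming a penalty of the form lam * beta**2 added to the summed log-loss; the exact scaling used by l0l2learn is not stated here, and fit_ridge_logistic_1d and the data are made up for illustration.

```python
import math


def fit_ridge_logistic_1d(xs, ys, lam, n_iter=50):
    """Minimize sum of log-losses + lam * beta**2 for a single coefficient."""
    beta = 0.0
    for _ in range(n_iter):
        # Predicted probabilities under the current coefficient.
        ps = [1.0 / (1.0 + math.exp(-beta * x)) for x in xs]
        # First and second derivative of the penalized loss.
        grad = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) + 2.0 * lam * beta
        hess = sum(p * (1.0 - p) * x * x for p, x in zip(ps, xs)) + 2.0 * lam
        beta -= grad / hess  # Newton step
    return beta


# Toy data that is not perfectly separable, so the fit stays finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

weak = fit_ridge_logistic_1d(xs, ys, lam=0.1)
strong = fit_ridge_logistic_1d(xs, ys, lam=10.0)
print("lam=0.1 :", weak)
print("lam=10.0:", strong)  # closer to zero than the lam=0.1 coefficient
```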

Number of Resamples

n_resamples determines how many resampled models are fitted:

clf = ResampledL0L2Classifier(b=5, n_resamples=99)

Larger values can improve frequency estimates but increase runtime.
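
As an illustration of what these frequencies measure, the sketch below computes variable inclusion frequencies and model selection frequencies from a list of selected feature sets, one per resample. It assumes both are simple proportions over the fitted resamples; the function name selection_frequencies and the example supports are hypothetical, and the package's internal computation may differ.

```python
from collections import Counter


def selection_frequencies(supports):
    """Return (variable inclusion frequencies, model selection frequencies).

    supports: one iterable of selected feature indices per resample.
    """
    n = len(supports)
    # How often each feature appears in a selected subset.
    variable_counts = Counter(j for support in supports for j in set(support))
    # How often each exact subset (model) is selected.
    model_counts = Counter(frozenset(support) for support in supports)
    vifs = {j: count / n for j, count in variable_counts.items()}
    msfs = {model: count / n for model, count in model_counts.items()}
    return vifs, msfs


# Four resamples, each selecting two features (made-up indices).
supports = [(0, 3), (0, 3), (0, 5), (1, 3)]
vifs, msfs = selection_frequencies(supports)
print(vifs)  # features 0 and 3 appear in 3 of 4 resamples
print(msfs)  # the model {0, 3} is selected in 2 of 4 resamples
```

With only a handful of resamples, each frequency can take just a few distinct values, which is one reason a larger n_resamples gives finer estimates.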

Other Hyperparameters

L0L2Classifier

  • fit_intercept: Whether an intercept term is included in the logistic regression model.

  • time_limit: Maximum runtime in seconds for the optimization problem.

  • mosek_log: Enables printing of MOSEK solver output.

ResampledL0L2Classifier

  • resampling: Controls whether and how rows, columns, both, or neither are resampled.

  • n_row_subsamples: Number or fraction of observations used during row subsampling.

  • n_column_subsamples: Number or fraction of features used during column subsampling.

  • aggregation: Whether model selection or variable inclusion frequencies are used for aggregation.

  • vif_threshold: Minimum variable inclusion frequency required when using aggregation="VIF".

  • estimator: Alternative base estimator used instead of the default L0L2Classifier.

  • param_grid: Hyperparameter grid used for cross-validation when tuning lambd.

  • cv: Cross-validation strategy used for hyperparameter tuning.

  • scoring: Scoring metric used to select the best hyperparameter configuration.

  • numerical_features: Specifies which DataFrame columns should be treated as numerical features.

  • categorical_features: Specifies which DataFrame columns should be treated as categorical features.

  • fit_intercept: Whether an intercept term is included in the logistic regression model.

  • mosek_time_limit: Maximum runtime in seconds for each individual optimization problem.

  • total_time_limit: Maximum runtime in seconds for the complete resampling procedure.

  • max_consecutive_failures: Stops resampling if too many consecutive model fits fail.

  • mosek_log: Enables printing of MOSEK solver output.

  • n_jobs: Number of parallel workers used during resampling.

  • random_state: Controls the randomness of resampling and cross-validation procedures.
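
The two aggregation modes described for ResampledL0L2Classifier can be pictured as follows. This is a sketch under assumptions rather than the package's implementation: it takes precomputed frequencies as plain dictionaries, keeps every feature whose inclusion frequency reaches the threshold in the VIF case, and picks the most frequently selected subset in the model selection case. The function names and the example values are hypothetical, and details such as tie-breaking or re-checking the budget are not covered.

```python
def aggregate_by_vif(vifs, vif_threshold):
    """Keep features whose variable inclusion frequency reaches the threshold."""
    return sorted(j for j, freq in vifs.items() if freq >= vif_threshold)


def aggregate_by_msf(msfs):
    """Keep the feature subset that was selected most often."""
    best_model = max(msfs, key=msfs.get)
    return sorted(best_model)


# Made-up frequencies from four resamples.
vifs = {0: 0.75, 1: 0.25, 3: 0.75, 5: 0.25}
msfs = {
    frozenset({0, 3}): 0.5,
    frozenset({0, 5}): 0.25,
    frozenset({1, 3}): 0.25,
}

print(aggregate_by_vif(vifs, vif_threshold=0.5))  # [0, 3]
print(aggregate_by_msf(msfs))                     # [0, 3]
```

The two rules agree here, but they need not: a feature can be included often across many different subsets without any single subset being selected frequently.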

License

This project is licensed under the MIT License. See the LICENSE file for details.

Authors

  • Ricardo Knauer (HTW Berlin)
