
Penalized Linear Models for Classification and Regression


penlm - Penalized Linear Models

penlm is a Python package that implements a few penalty-based linear models that aren't (currently) available in scikit-learn. All the models are implemented with the familiar fit/predict/predict_proba/score interface.

The supported estimators are:

  • Smoothly Adaptively Centered Ridge (SACR)
  • Ridge penalty on first/second derivatives (Functional Linear Model)
  • Adaptive Lasso
  • Relaxed Lasso
  • Broken Adaptive Ridge (BAR)
  • Non-negative Garrote (NNG)
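To give an idea of what the derivative-penalized (functional) linear model does, here is a minimal NumPy sketch for the regression case with squared loss. It assumes the columns of X are samples of a curve on an ordered, equally spaced grid. Penalizing the squared finite differences of the coefficient vector gives, on centered data, the closed form coef = (X'X + lambda D'D)^(-1) X'y, where D is the difference matrix. This is not penlm's implementation (penlm formulates this model in Pyomo), and the function names are hypothetical.

```python
import numpy as np


def difference_matrix(n_features: int, order: int = 2) -> np.ndarray:
    """Finite-difference operator D of shape (n_features - order, n_features)."""
    # For order=2, each row is [..., 1, -2, 1, ...]
    return np.diff(np.eye(n_features), n=order, axis=0)


def fit_derivative_ridge(X: np.ndarray, y: np.ndarray, lam: float, order: int = 2):
    """Hypothetical sketch: least squares + lam * ||D coef||^2, unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centering removes the intercept from the linear system
    yc = y - y_mean
    D = difference_matrix(X.shape[1], order)
    # Normal equations of the penalized problem: (Xc'Xc + lam D'D) coef = Xc'yc
    A = Xc.T @ Xc + lam * (D.T @ D)
    coef = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Usage on synthetic data: 30 grid points and a smooth true coefficient function
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 30)
true_coef = np.sin(2 * np.pi * grid)
X = rng.normal(size=(200, 30))
y = X @ true_coef + 0.1 * rng.normal(size=200)
coef, intercept = fit_derivative_ridge(X, y, lam=10.0)
```

Setting order=1 penalizes first differences instead. As lambda grows, the coefficients are pushed toward the null space of D (constant vectors for order 1, linear ones for order 2), which is what makes the estimate smooth across the grid.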

All the estimators have fit_intercept and scale arguments (scaling is done with sklearn.preprocessing.StandardScaler), and work for the following tasks:

  • Linear Regression
  • Binary Classification (Logistic Regression)
  • Multiclass Classification (One-vs-Rest)
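The interaction between scale = True and fit_intercept = True can be illustrated with a small NumPy sketch: columns are standardized with the population standard deviation (as StandardScaler does, leaving zero-variance columns unscaled), a model is fitted on the standardized data, and its coefficients are mapped back to the original units. Plain least squares stands in for the penalized fit; the helper names are hypothetical, and whether penlm reports coefficients on the original or on the standardized scale is not stated on this page.

```python
import numpy as np


def standardize(X: np.ndarray):
    """Center and scale columns like StandardScaler (population std, ddof=0)."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    scale = np.where(scale == 0.0, 1.0, scale)  # constant columns are left unscaled
    return (X - mean) / scale, mean, scale


def fit_least_squares(X: np.ndarray, y: np.ndarray, fit_intercept: bool = True):
    """Ordinary least squares, standing in for a penalized fit."""
    if fit_intercept:
        A = np.column_stack([np.ones(len(X)), X])
        solution = np.linalg.lstsq(A, y, rcond=None)[0]
        return solution[1:], solution[0]
    return np.linalg.lstsq(X, y, rcond=None)[0], 0.0


def unscale_coefficients(coef_scaled, intercept_scaled, mean, scale):
    """Express a model fitted on standardized X in the units of the original X."""
    coef = coef_scaled / scale
    intercept = intercept_scaled - mean @ coef
    return coef, intercept


rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1], size=(40, 3))
y = X @ np.array([1.0, -0.5, 3.0]) + 2.0 + 0.1 * rng.normal(size=40)
Xs, mean, scale = standardize(X)
coef_s, intercept_s = fit_least_squares(Xs, y)
coef, intercept = unscale_coefficients(coef_s, intercept_s, mean, scale)
```

For ordinary least squares the back-transformed coefficients match a fit on the raw data exactly. For penalized estimators they do not, because the penalty is applied to the standardized coefficients, which is why the scale argument changes the fitted model.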

A custom cross-validation object is provided to perform grid-search hyperparameter tuning with any scikit-learn splitter; it uses multiprocessing for parallelization (default n_jobs = -1). Multiclass fitting is not parallelized in this version; parallelizing it would help when a high number of cores is available, or when refitting the best estimator in the grid search object.
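The grid search amounts to scoring every combination in the parameter dictionary and keeping the best one. The standard-library sketch below shows that logic; expand_grid, resolve_n_jobs, select_best and toy_score are hypothetical names, not penlm's API, and reading n_jobs = -1 as "use all cores" follows the scikit-learn/joblib convention, which is an assumption here.

```python
import itertools
import os
from typing import Any, Callable, Dict, Iterable, List, Tuple

Params = Dict[str, Any]


def expand_grid(parameters: Dict[str, Iterable[Any]]) -> List[Params]:
    """Cartesian product of a parameter dictionary: one dict per candidate."""
    names = list(parameters)
    value_lists = [list(parameters[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def resolve_n_jobs(n_jobs: int) -> int:
    """Worker count under the joblib convention (assumed): -1 = all cores, -2 = all but one."""
    if n_jobs == 0:
        raise ValueError("n_jobs must be nonzero")
    cpus = os.cpu_count() or 1
    if n_jobs < 0:
        return max(1, cpus + 1 + n_jobs)
    return n_jobs


def select_best(evaluate: Callable[[Params], float],
                candidates: List[Params],
                map_fn: Callable = map) -> Tuple[Params, float]:
    """Score every candidate (higher is better) and return the best one with its score."""
    scores = list(map_fn(evaluate, candidates))
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_score(params: Params) -> float:
    """Hypothetical stand-in for 'mean CV score of the estimator with these params'."""
    return -(params["phi"] - 0.5) ** 2 - (params["lambda"] - 1.0) ** 2


grid = expand_grid({"phi": [0.25, 0.5, 0.75], "lambda": [0.5, 1.0, 2.0]})
best_params, best_score = select_best(toy_score, grid)
```

To parallelize, one option is to open a multiprocessing.Pool(resolve_n_jobs(-1)) and pass its pool.map as map_fn; the evaluation function then has to be defined at module level so that it can be pickled.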

Installation

The package can be installed from the terminal with the command pip install penlm. Some of the estimators in penlm directly wrap scikit-learn classes, while SACR, the FLMs, and NNG are formulated using Pyomo, which in turn needs an external solver to interface with. Depending on the estimator, the optimization problems are quadratic with equality and/or inequality constraints, and all the code was tested with the Ipopt solver. To set it up, download the Ipopt executable binary and add the folder that contains it to your PATH (tested on Ubuntu).
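A quick way to confirm that the Ipopt binary is visible after adding its folder to PATH is to look it up from Python with shutil.which, which searches PATH the same way the shell does. The sketch assumes the executable is named ipopt. The hypothetical add_to_path helper only affects the current Python process, so a permanent change on Ubuntu still belongs in a shell startup file such as ~/.bashrc (for example export PATH="/path/to/ipopt-folder:$PATH", with the placeholder replaced by the real folder).

```python
import os
import shutil
from typing import Optional


def find_solver(name: str = "ipopt") -> Optional[str]:
    """Return the full path of the solver executable if it is on PATH, else None."""
    return shutil.which(name)


def add_to_path(folder: str) -> None:
    """Prepend a folder to PATH for the current Python process only."""
    os.environ["PATH"] = folder + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    path = find_solver()
    if path is None:
        print("ipopt not found on PATH")
    else:
        print(f"ipopt found at {path}")
```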

Usage

The following lines show how to fit an estimator with its own parameters and grid search object, by using a StratifiedKFold splitter:

import numpy as np
import penlm.grid_search as gs
from penlm.smoothly_adaptively_centered_ridge import SACRClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score

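# Wine data: 178 samples, 13 features, 3 classes
X, Y = load_wine(return_X_y = True)

# The samples are ordered by class, so shuffle the indices before splitting;
# a plain split at index 100 would leave class 2 out of the training set
rng = np.random.default_rng(46)
permutation = rng.permutation(len(X))
train_index = permutation[:100]
test_index = permutation[100:]

# Hyperparameter grids: 10 values of lambda from 2^-5 to 2^5,
# and 9 values of phi in (0, 1] (phi = 0 is dropped)
lambda_list = np.logspace(-5, 5, 10, base = 2)
phi_list = np.linspace(0, 1, 10)[1:]

# SACR classifier, with the Pyomo models solved by Ipopt
estimator = SACRClassifier(solver = 'ipopt',
                           scale = True,
                           fit_intercept = True)
parameters = {'phi':phi_list,
              'lambda':lambda_list}
# 2-fold stratified cross-validation inside the training set
cv = StratifiedKFold(n_splits = 2,
                     random_state = 46,
                     shuffle = True)
# Evaluate every (phi, lambda) pair with balanced accuracy, in parallel
grid_search = gs.GridSearchCV(estimator,
                              parameters,
                              cv,
                              scoring = balanced_accuracy_score,
                              verbose = False,
                              n_jobs = -1)
grid_search.fit(X[train_index],
                Y[train_index])
# Score the refitted best estimator on the held-out samples
score = grid_search.score(X[test_index],
                          Y[test_index])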

The test folder in the GitHub repo contains two sample scripts that show how to use all the estimators in both classification and regression tasks. Note that the Adaptive Lasso and the NNG require an initial coefficient vector, passed as a np.ndarray with the same shape as the coef_ attribute of scikit-learn estimators (the test scripts fit a LogisticRegression/Ridge estimator and pass its coef_). Regarding scoring, all the estimators and the grid search class use accuracy (classification) or R^2 (regression) as the default score when scoring = None, but you can pass any Callable scoring function found in sklearn.metrics. Scores follow a higher-is-better convention, so an error metric such as sklearn.metrics.mean_squared_error must be wrapped in a custom function that flips its sign.
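The role of the initial coefficient vector can be seen in the classic reweighting view of the Adaptive Lasso: each feature gets the penalty weight 1/|b_j|^gamma from the initial estimate b, which is equivalent to fitting an ordinary Lasso on columns rescaled by |b_j|^gamma and rescaling the result back. The NumPy sketch below implements that with a plain coordinate-descent Lasso on centered data (no intercept) and adds a sign-flipped mean squared error scorer of the kind described above. It is an illustration, not penlm's code: all function names are hypothetical, and np.ravel assumes a single coefficient row, i.e. shape (n_features,) as from Ridge or (1, n_features) as from a binary LogisticRegression.

```python
import numpy as np


def soft_threshold(value: float, threshold: float) -> float:
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, alpha: float,
             n_iter: int = 500, tol: float = 1e-8) -> np.ndarray:
    """Coordinate descent for (1/(2n))||y - X theta||^2 + alpha * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()  # y - X @ theta with theta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at 0
            old = theta[j]
            rho = X[:, j] @ residual / n + col_sq[j] * old
            theta[j] = soft_threshold(rho, alpha) / col_sq[j]
            if theta[j] != old:
                residual -= X[:, j] * (theta[j] - old)
                max_change = max(max_change, abs(theta[j] - old))
        if max_change < tol:
            break
    return theta


def adaptive_lasso(X: np.ndarray, y: np.ndarray, alpha: float,
                   init_coef: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Adaptive Lasso via column rescaling; X and y are assumed centered."""
    scale = np.abs(np.ravel(init_coef)) ** gamma  # 1 / penalty weight
    theta = lasso_cd(X * scale, y, alpha)
    return theta * scale


def neg_mean_squared_error(y_true, y_pred) -> float:
    """Higher-is-better scorer: the negated mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -float(np.mean((y_true - y_pred) ** 2))
```

Features whose initial coefficient is exactly zero get an infinite weight, so they stay at zero. Defining the scorer with def rather than as a lambda keeps it picklable, which matters if it has to be sent to worker processes.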

Citing

We encourage users to cite the original papers of the implemented estimators. In particular, the code published in this package was used in the case studies of this paper.



Download files


Source Distribution

penlm-1.1.0.tar.gz (13.4 kB)

