penlm

Penalized Linear Models for Classification and Regression

penlm - Penalized Linear Models

penlm is a Python package that implements several penalty-based linear models that are not (currently) available in scikit-learn. All the models expose the familiar fit/predict/predict_proba/score interface.

The supported estimators are:

Smoothly Adaptively Centered Ridge (SACR)
Ridge penalty on first/second derivatives (Functional Linear Model)
Adaptive Lasso
Relaxed Lasso
Broken Adaptive Ridge (BAR)
Non-negative Garrote (NNG)

All the estimators accept fit_intercept and scale arguments (scaling is done with sklearn.preprocessing.StandardScaler), and work for the following tasks:

Linear Regression
Binary Classification (Logistic Regression)
Multiclass Classification (One-vs-Rest)
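
Since the scale argument is described as relying on sklearn.preprocessing.StandardScaler, the following minimal sketch shows what that transformation does to a feature matrix with default settings. The toy matrix is made up for illustration, and how penlm applies the scaler internally is not described here, so this only demonstrates the scikit-learn side. Standardizing matters for penalized models because the penalty acts on the coefficient sizes, which depend on the units of each feature.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values): the two columns
# live on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# StandardScaler centers each column and divides it by its
# (population) standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every column has mean 0 and standard deviation 1,
# so a penalty on the coefficients treats all features alike.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```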

A custom cross-validation object is provided to perform grid search hyperparameter tuning with any splitter from scikit-learn; it uses multiprocessing for parallelization (default n_jobs = -1). Multiclass fitting is not parallelized in this version. Parallelizing it would be beneficial when a high number of cores is available, or when refitting the best estimator in the grid search object.

Installation

The package can be installed from a terminal with the command pip install penlm. Some of the estimators in penlm directly wrap scikit-learn classes, while SACR, the FLMs, and NNG are formulated using Pyomo, which in turn needs a solver to interface with. Depending on the estimator, the optimization problems are quadratic with equality and/or inequality constraints, and all the code was tested using the solver Ipopt. To use it, download the Ipopt executable binary and add the folder that contains it to your PATH (tested on Ubuntu).
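
Before fitting a Pyomo-based estimator, it can help to confirm that the solver binary is actually reachable. The sketch below uses only the Python standard library; the helper name find_solver is hypothetical, and the executable name "ipopt" is assumed to match the binary you downloaded.

```python
import shutil

def find_solver(executable="ipopt"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)

# Assumption: the solver binary is named "ipopt" on your system.
solver_path = find_solver("ipopt")
if solver_path is None:
    print("ipopt was not found; add the folder containing the binary to PATH")
else:
    print(f"ipopt found at {solver_path}")
```

If the check reports that the binary is missing, the folder that contains it has not been added to PATH for the current shell session.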

Usage

The following lines show how to set up an estimator with its own parameters, wrap it in a grid search object, and fit it using a StratifiedKFold splitter:

import numpy as np
import penlm.grid_search as gs
from penlm.smoothly_adaptively_centered_ridge import SACRClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score

X, Y = load_wine(return_X_y = True)

# load_wine is sorted by class, so the first 100 rows contain no class-2 samples;
# use a shuffled, stratified split of the row indices instead
train_index, test_index = train_test_split(np.arange(len(X)),
                                           train_size = 100,
                                           stratify = Y,
                                           random_state = 46)

# hyperparameter grids: 10 lambda values from 2^-5 to 2^5, 9 phi values in (0, 1]
lambda_list = np.logspace(-5, 5, 10, base = 2)
phi_list = np.linspace(0, 1, 10)[1:]

# estimator with its fixed (non-tuned) arguments
estimator = SACRClassifier(solver = 'ipopt',
                           scale = True,
                           fit_intercept = True)
parameters = {'phi':phi_list,
              'lambda':lambda_list}
cv = StratifiedKFold(n_splits = 2,
                     random_state = 46,
                     shuffle = True)
# grid search over all (phi, lambda) pairs, scored with balanced accuracy
grid_search = gs.GridSearchCV(estimator,
                              parameters,
                              cv,
                              scoring = balanced_accuracy_score,
                              verbose = False,
                              n_jobs = -1)
grid_search.fit(X[train_index],
                Y[train_index])
# evaluate on the held-out rows
score = grid_search.score(X[test_index],
                          Y[test_index])

The test folder in the GitHub repository contains two sample scripts that show how to use all the estimators (in both classification and regression tasks). In particular, for the Adaptive Lasso and the NNG you need to provide an initial coefficient vector as a np.ndarray, with the same shape as the one found in scikit-learn estimators (the test scripts fit a LogisticRegression/Ridge estimator and use its estimator.coef_ attribute).

Regarding scoring, all the estimators and the grid search class use accuracy/R^2 as default scores (when the argument scoring = None), but you can provide any Callable scoring function found in sklearn.metrics. Beware that higher is always treated as better: when scoring with an error metric such as sklearn.metrics.mean_squared_error, you need to wrap it in a custom function that changes its sign.
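
The two points above can be sketched with scikit-learn alone. This is a minimal illustration, not the package's own test script: the names neg_mean_squared_error and init_coef are hypothetical, the data is synthetic, and the penlm argument that receives the initial coefficients is not shown because it is not named in this description.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def neg_mean_squared_error(y_true, y_pred):
    """Sign-flipped MSE, so that a higher score means a better fit."""
    return -mean_squared_error(y_true, y_pred)

# Synthetic regression problem (illustrative only).
X, y = make_regression(n_samples=50, n_features=5, noise=1.0, random_state=0)

# Initial coefficient vector taken from a fitted Ridge estimator;
# for single-target regression, coef_ has shape (n_features,).
ridge = Ridge(alpha=1.0).fit(X, y)
init_coef = ridge.coef_

print(init_coef.shape)
print(neg_mean_squared_error(y, ridge.predict(X)))
```

A wrapper like neg_mean_squared_error could then be supplied as the scoring argument in the same way balanced_accuracy_score is passed in the usage example, so that the grid search still selects the highest score.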

Citing

We encourage users to cite the original papers of the implemented estimators. In particular, the code published in this package has been used in the case studies of this paper.

Release history

1.1.0 (this version): Jul 28, 2024
1.0.12: Dec 19, 2021

Download files

Source Distribution

penlm-1.1.0.tar.gz (13.4 kB)

Uploaded Jul 28, 2024 Source

File details

Details for the file penlm-1.1.0.tar.gz.

File metadata

Download URL: penlm-1.1.0.tar.gz
Upload date: Jul 28, 2024
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for penlm-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8dc9d93b0af4429e39a0897eb7e9fdb0c686cb61f05ef29c0fe50bf561ef53dc`
MD5	`30ec7b65608f8a6491f545eb840816d6`
BLAKE2b-256	`da875e86df3b989678b17400cbf3a13294c6d5a384ee3285576acf9a6321e84c`
