
Machine learning framework built on second-order optimization

Project description

peak-engines


peak-engines is a machine learning framework that focuses on applying advanced optimization algorithms to build better models.

Installation

pip install peak-engines

Fit Logistic Regression Hyperparameters

Leave-one-out cross-validation (LOOCV) for logistic regression can be efficiently approximated. At a high level, here is how it works: for a given value of the hyperparameter C, we

  1. Find the parameters b that optimize the logistic regression objective for the given C.
  2. For each data index i, compute the Hessian H_{-i} and gradient g_{-i} of the log-likelihood with the ith data entry removed. (We can reuse the Hessian computed in step 1 to do this with a minimal amount of extra work.)
  3. Apply the Matrix Inversion Lemma to efficiently compute the inverse H_{-i}^{-1}.
  4. Use H_{-i}^{-1} and g_{-i} to take a single step of Newton's method, which approximates b_{-i}, the logistic regression coefficients with the ith entry removed.
  5. Finally, use the b_{-i}'s to approximate the out-of-sample predictions and estimate the leave-one-out cross-validation.

See the paper A scalable estimate of the out-of-sample prediction error via approximate leave-one-out by Kamiar Rahnama Rad and Arian Maleki for more details.
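
Here is a minimal NumPy sketch of steps 1 through 5, assuming labels y in {0, 1}, an L2 penalty equivalent to ||b||^2/(2C), no intercept, and sklearn's LogisticRegression for the full-data fit. It is an illustration of the technique, not the peak-engines internals:

import numpy as np
from sklearn.linear_model import LogisticRegression

def aloocv(X, y, C):
    """Approximate leave-one-out log-likelihood for L2-regularized logistic regression."""
    n, p = X.shape
    # Step 1: fit the full-data model at the given C.
    clf = LogisticRegression(C=C, fit_intercept=False).fit(X, y)
    b = clf.coef_.ravel()
    prob = 1.0 / (1.0 + np.exp(-X @ b))
    w = prob * (1.0 - prob)
    # Step 2: the full-data Hessian X^T W X + I/C is computed once and reused for every i.
    H = X.T @ (w[:, None] * X) + np.eye(p) / C
    Hinv = np.linalg.inv(H)
    total = 0.0
    for i in range(n):
        x_i = X[i]
        # Step 3: Sherman-Morrison rank-one downdate gives H_{-i}^{-1} x_i cheaply.
        Hx = Hinv @ x_i
        Hinv_i_x = Hx / (1.0 - w[i] * (x_i @ Hx))
        # Step 4: one Newton step; at the full-data optimum the leave-one-out
        # gradient is -(p_i - y_i) x_i, so the step has a simple form.
        b_i = b + (prob[i] - y[i]) * Hinv_i_x
        # Step 5: score the held-out point with the adjusted coefficients.
        p_i = 1.0 / (1.0 + np.exp(-x_i @ b_i))
        total += y[i] * np.log(p_i) + (1 - y[i]) * np.log(1 - p_i)
    return total / n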

We can, furthermore, differentiate the Approximate Leave-One-Out (ALO) metric with respect to C and quickly climb to the best-performing value. Here's how to do it with peak-engines:

Load an example dataset

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

Find the best performing C

import peak_engines
model = peak_engines.LogisticRegressionModel()
model.fit(X, y)
print('C =', model.C_[0])

prints

C = 0.66474879
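
peak-engines finds this value by differentiating the ALO objective. As a rough, derivative-free stand-in, one could maximize the aloocv sketch above with a scalar search over log10(C). This is hypothetical glue code, not the library's internals:

from scipy.optimize import minimize_scalar
# Minimize the negated ALO log-likelihood over log10(C) to keep C positive.
result = minimize_scalar(lambda t: -aloocv(X, y, 10.0 ** t),
                         bounds=(-4, 4), method='bounded')
print('C =', 10.0 ** result.x)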

If we compute the LOOCV by brute force and compare it to the ALOOCV, we can see how accurate the approximation is:

[Figure: brute-force LOOCV compared with the ALOOCV approximation.]
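
The brute-force comparison refits the model once per data point, which is exactly the cost the approximation avoids. A minimal sketch, assuming the same no-intercept sklearn setup as above:

import numpy as np
from sklearn.linear_model import LogisticRegression

def brute_force_loocv(X, y, C):
    """Exact LOOCV log-likelihood; refits the model n times (slow but simple)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        clf = LogisticRegression(C=C, fit_intercept=False).fit(X[mask], y[mask])
        # Predicted probability of class 1 for the held-out point.
        p_i = clf.predict_proba(X[i:i + 1])[0, 1]
        total += y[i] * np.log(p_i) + (1 - y[i]) * np.log(1 - p_i)
    return total / n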

Fit Ridge Regression Hyperparameters

By expressing cross-validation as an optimization objective and computing derivatives, peak-engines is able to efficiently find regularization parameters that lead to the best score on leave-one-out or generalized cross-validation. It furthermore scales to handle multiple regularizers. Here's an example of how it works:
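
The key identity behind the ridge case is that the LOOCV residuals are available in closed form from a single fit via the hat matrix (the PRESS statistic), so the objective can be evaluated and differentiated without refitting. A minimal sketch of that identity, assuming a single regularizer alpha and no intercept (again, not peak-engines' internals):

import numpy as np

def ridge_loocv_mse(X, y, alpha):
    """Leave-one-out MSE of ridge regression from a single fit (PRESS identity)."""
    n, p = X.shape
    # Hat matrix H = X (X^T X + alpha I)^{-1} X^T maps y to the fitted values.
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
    residuals = y - H @ y
    # The leave-one-out residual for point i is e_i / (1 - H_ii).
    loo_residuals = residuals / (1.0 - np.diag(H))
    return np.mean(loo_residuals ** 2)

Because this expression is a smooth function of alpha, peak-engines can follow its exact derivatives to the optimum instead of scanning a grid.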

import numpy as np
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
from peak_engines import RidgeRegressionModel
model = RidgeRegressionModel(normalize=True)
# fit automatically finds the alpha that minimizes the leave-one-out
# cross-validation. There's no need to provide a search space because
# peak_engines optimizes the LOOCV directly: it computes derivatives of the
# LOOCV with respect to the hyperparameters and is able to quickly zero in
# on the best alpha.
model.fit(X, y)
print('alpha =', model.alpha_)

prints

alpha = 0.009274259071634289

Fit Warped Linear Regression

Let X and y denote the feature matrix and target vector of a regression dataset. Under the assumption of normally distributed errors, Ordinary Least Squares (OLS) finds the linear model that maximizes the likelihood of the dataset.
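
In symbols, writing x_i for the ith row of X and assuming Gaussian noise with variance σ², OLS solves

\hat{b}, \hat{\sigma} = \operatorname*{argmax}_{b,\,\sigma} \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - x_i^\top b)^2}{2\sigma^2}\right)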

What happens when errors aren't normally distributed? The model will be misspecified, and there's no reason to think its likelihood predictions will be accurate. This is where Warped Linear Regression can help. It adds an extra step to OLS: it transforms the target vector using a malleable, monotonic function f parameterized by ψ, and adjusts the parameters to maximize the likelihood of the transformed dataset.
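
In symbols, the warped model maximizes the likelihood of the transformed targets, with the Jacobian factor f_ψ'(y_i) accounting for the change of variables:

\hat{b}, \hat{\sigma}, \hat{\psi} = \operatorname*{argmax}_{b,\,\sigma,\,\psi} \prod_{i=1}^n \frac{f_\psi'(y_i)}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(f_\psi(y_i) - x_i^\top b)^2}{2\sigma^2}\right)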

By introducing the additional transformation step, Warped Linear Regression is more general-purpose than OLS while still retaining its strong structure and interpretability. Here's how to use it:

Load an example dataset

from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)

Fit a warped linear regression model

import peak_engines
model = peak_engines.WarpedLinearRegressionModel()
model.fit(X, y)

Visualize the warping function

import numpy as np
import matplotlib.pyplot as plt
y_range = np.arange(np.min(y), np.max(y), 0.01)
z = model.warper_.compute_latent(y_range)
plt.plot(y_range, z)
plt.xlabel('Median Housing Value in $1000s')
plt.ylabel('Latent Variable')
plt.scatter(y, model.warper_.compute_latent(y))
plt.show()

[Figure: the fitted warping function mapping median housing value to the latent variable.]

Tutorials

Articles

Examples

Documentation

See doc/Reference.pdf

