
Project Description

This distribution provides a Python module for computing Friedman and Popescu's H statistics, which quantify the strength of interactions among variables in scikit-learn gradient-boosting models (http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).

See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", Ann. Appl. Stat. 2:916-954 (http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046), section 8.1.

Installation

pip install sklearn-gbmi

Usage

Suppose a scikit-learn gradient-boosting model gbm has been fitted to a NumPy array or pandas data frame array_or_frame, and let indices_or_columns be a list of indices of the array's columns or names of the data frame's columns. Then the H statistic of the variables specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h(gbm, array_or_frame, indices_or_columns)

Alternatively, the two-variable H statistic of each pair of the variables specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h_all_pairs(gbm, array_or_frame, indices_or_columns)

(Compared to iteratively calling h, calling h_all_pairs avoids redundant computations.)

indices_or_columns is optional, with default value 'all'. If it is 'all', then all columns of array_or_frame are used.

NaN is returned if a computation is spoiled by weak main effects and rounding errors.

H varies from 0 to 1. The larger H is, the stronger the evidence for an interaction among the variables.
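
To make the calls above concrete, here is a minimal end-to-end sketch. The synthetic data, column names, and model settings are illustrative assumptions rather than part of the package; only h and h_all_pairs come from sklearn_gbmi.

import numpy as np
import pandas as pd

from sklearn.ensemble import GradientBoostingRegressor

from sklearn_gbmi import *

# Synthetic training data with a built-in x1*x2 interaction.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(500, 3)), columns=['x1', 'x2', 'x3'])
y = df['x1']*df['x2'] + df['x3'] + rng.normal(scale=0.1, size=500)

gbm = GradientBoostingRegressor(n_estimators=100, max_depth=2)
gbm.fit(df, y)

# H statistic of the variables x1 and x2.
print(h(gbm, df, ['x1', 'x2']))

# Two-variable H statistics of all pairs of columns (indices_or_columns
# defaults to 'all'); NaN values mark spoiled computations.
print(h_all_pairs(gbm, df))

Here x1 and x2 should show a comparatively large pairwise H, since the target contains an explicit x1*x2 term.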

Example

See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a complete example of how to use the module.

Notes

1. Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Strengths of main effects are available as gbm.feature_importances_ once gbm has been fitted.

2. Per Friedman and Popescu, collinearity among variables can lead to interactions in gbm that are not present in the target function. To forestall such spurious interactions, check for strong correlations among variables before fitting gbm. (Both this check and the main-effects check in note 1 are sketched below.)
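
Both checks can be written in a few lines, continuing from the sketch under Usage above (the 0.1 importance cutoff is an arbitrary illustrative threshold, not a recommendation from Friedman and Popescu):

import pandas as pd

from sklearn_gbmi import *

# Note 2: before fitting gbm, inspect pairwise correlations among the
# candidate variables; strong correlations invite spurious interactions.
print(df.corr())

# Note 1: after fitting, restrict attention to variables with strong main
# effects, as measured by gbm.feature_importances_.
importances = pd.Series(gbm.feature_importances_, index=df.columns)
strong = importances[importances > 0.1].index.tolist()  # arbitrary cutoff

print(h_all_pairs(gbm, df, strong))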
