
Compute Friedman and Popescu's H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models.

Project description

This package provides a Python module for computing Friedman and Popescu’s H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models (http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).

See Jerome H. Friedman and Bogdan E. Popescu, 2008, “Predictive learning via rule ensembles”, Ann. Appl. Stat. 2:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, Section 8.1.

Installation

pip install sklearn-gbmi

On some systems, if you wish to use this package with Python 3, then you must install with pip3 rather than pip.

In case of difficulties with installing or using this package, consult “Advanced installation” below.

Usage

Given a scikit-learn gradient-boosting model gbm that has been fitted to a NumPy array or pandas data frame array_or_frame, and a list indices_or_columns of column indices of the array or columns of the data frame, the H statistic of the variables specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h(gbm, array_or_frame, indices_or_columns)

Alternatively, the two-variable H statistic of each pair of the variables specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h_all_pairs(gbm, array_or_frame, indices_or_columns)

(Compared to iteratively calling h, calling h_all_pairs avoids redundant computations.)

indices_or_columns is optional, with default value 'all'. If it is 'all', then all columns of array_or_frame are used.

NaN is returned if a computation is spoiled by weak main effects and rounding errors.

H varies from 0 to 1. The larger H is, the stronger the evidence for an interaction among the variables.
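To make the 0-to-1 scale concrete, here is a minimal sketch of the two-variable H statistic as defined by Friedman and Popescu: the squared difference between the joint partial dependence of two variables and the sum of their individual partial dependences, summed over the data and divided by the sum of squares of the joint partial dependence, with a square root taken at the end. It is written in plain Python with brute-force partial dependence and is not this package's implementation. The names centered_partial_dependence, h_two_way, and toy_model are hypothetical, and the NaN rule shown (zero denominator) is an assumption that may differ from the package's own checks.

```python
from itertools import product
from math import sqrt


def centered_partial_dependence(f, rows, cols):
    """Brute-force partial dependence of f on the columns in `cols`,
    evaluated at every row and centered to have mean zero."""
    n = len(rows)
    values = []
    for row in rows:
        total = 0.0
        for other in rows:
            x = list(other)
            for c in cols:
                x[c] = row[c]  # fix the chosen columns, average over the rest
            total += f(x)
        values.append(total / n)
    mean = sum(values) / n
    return [v - mean for v in values]


def h_two_way(f, rows, j, k):
    """Two-variable H statistic of columns j and k for the function f."""
    f_jk = centered_partial_dependence(f, rows, [j, k])
    f_j = centered_partial_dependence(f, rows, [j])
    f_k = centered_partial_dependence(f, rows, [k])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(f_jk, f_j, f_k))
    denominator = sum(a ** 2 for a in f_jk)
    if denominator == 0.0:
        return float("nan")  # assumption: no joint effect to compare against
    return sqrt(numerator / denominator)


# Toy data: every combination of -1 and 1 in three columns.
rows = [list(p) for p in product([-1.0, 1.0], repeat=3)]


def toy_model(x):
    # Columns 0 and 1 interact; column 2 has only a main effect.
    return x[0] * x[1] + 3.0 * x[2]


h_01 = h_two_way(toy_model, rows, 0, 1)  # columns that only interact
h_02 = h_two_way(toy_model, rows, 0, 2)  # columns that do not interact
```

In this toy setting h_01 sits at the top of the scale and h_02 at the bottom, because the joint effect of columns 0 and 1 is entirely interaction while the joint effect of columns 0 and 2 is purely additive. A fitted model's predict function could play the role of f, at a cost that grows with the square of the number of rows.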

Example

See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a complete example of how to use this package.

Notes

1. Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Strengths of main effects are available as gbm.feature_importances_ once gbm has been fitted.

2. Per Friedman and Popescu, collinearity among variables can lead to interactions in gbm that are not present in the target function. To forestall such spurious interactions, check for strong correlations among variables before fitting gbm.
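The correlation check suggested in note 2 can be sketched as follows. This is a plain-Python illustration, not part of this package; the helper names pearson and strongly_correlated_pairs, the 0.9 threshold, and the toy columns are all assumptions chosen for the example. With NumPy or pandas already in use, a built-in correlation matrix would serve the same purpose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def strongly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for each pair with |r| >= threshold.

    `columns` maps a column name to its list of values. The threshold
    is an arbitrary illustrative cutoff, not a recommended value.
    """
    names = list(columns)
    flagged = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = pearson(columns[names[a]], columns[names[b]])
            if abs(r) >= threshold:
                flagged.append((names[a], names[b], r))
    return flagged


# Toy data: x1 is nearly twice x0, while x2 is unrelated to both.
columns = {
    "x0": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x1": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x2": [3.0, -1.0, 4.0, -1.0, 5.0],
}
flagged = strongly_correlated_pairs(columns)
```

Here only the pair (x0, x1) is flagged. Any pair flagged this way is a candidate for dropping or combining a variable before fitting gbm, so that an interaction reported later is less likely to be an artifact of collinearity.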

Advanced installation

Installing this package requires NumPy, so if installation fails with a complaint that NumPy is missing, add it to the install command:

pip install numpy sklearn-gbmi

For performance, this package is partly implemented using Cython (C extensions for Python). It includes a C file that was generated by Cython, which is compiled for your system when you install the package. Normally, this C file is fine, but occasionally, it may not compile, or the result may not run. In the first case, installing the package fails, while in the second case, using the package fails, typically with a cryptic error message; for example:

ValueError: sklearn.tree._criterion.Criterion size changed, may indicate binary incompatibility.

In such a case, you may still be able to install and use the package by regenerating the C file, as follows.

First, if this package is installed (i.e., installation succeeds, but usage fails), uninstall it:

pip uninstall sklearn-gbmi

Then, install Cython:

pip install cython

Next, set the environment variable USE_CYTHONIZE to 1. For bash and similar shells:

export USE_CYTHONIZE=1

For csh and similar shells:

setenv USE_CYTHONIZE 1

Finally, reinstall this package:

pip install sklearn-gbmi --no-cache-dir

The C file should be regenerated and compiled for your system, hopefully making this package usable on your system.


Files for sklearn-gbmi, version 1.0.3

sklearn-gbmi-1.0.3.zip (142.2 kB), source distribution
