vimpy: nonparametric variable importance assessment in python
Author: Brian Williamson
Introduction
In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining an outcome; this contribution is called variable importance. It is useful to define variable importance as a function of the unknown, underlying data-generating mechanism rather than of the specific predictive algorithm used to fit the data. Given fitted values from predictive algorithms, this package computes nonparametric estimates of deviance- and variance-based variable importance, along with asymptotically valid confidence intervals for the true importance.
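As a hedged illustration of the variance-based measure, one common way to write it is shown here. This is a sketch assuming the standard variance-based definition, not a formula quoted from vimpy: μ(x) = E[Y | X = x] is the full conditional mean, μ_s(x) = E[Y | X_{-s} = x_{-s}] is the conditional mean once the features indexed by s are removed, and f_i and r_i stand for the full and reduced fitted values (full_fit and small_fit in the example below).

```latex
\psi_s = \frac{E\{[\mu(X) - \mu_s(X)]^2\}}{\operatorname{Var}(Y)}
       = R^2_{\text{full}} - R^2_{\text{reduced}},
\qquad
\hat{\psi}_s = \frac{\frac{1}{n}\sum_{i=1}^{n} (f_i - r_i)^2}
                    {\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The second equality holds because Y − μ(X) is uncorrelated with every function of X, so the cross term drops out when E[(Y − μ_s(X))²] − E[(Y − μ(X))²] is expanded; here R² = 1 − E[(Y − g(X))²]/Var(Y) for the corresponding regression function g.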
Installation
You may install a stable release of vimpy using pip by running pip install vimpy from a Terminal window. Alternatively, you may install it within a virtualenv environment.
You may install the current development release of vimpy by downloading this repository directly.
Issues
If you encounter any bugs or have any specific feature requests, please file an issue.
Example
This example shows how to use vimpy in a simple setting with simulated data and a single regression function. For more examples and a detailed explanation, please see the R vignette (to come).
```python
## load required libraries
import numpy as np
import vimpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

## -------------------------------------------------------------
## problem setup
## -------------------------------------------------------------
## define a function for the conditional mean of Y given X
def cond_mean(x=None):
    f1 = np.where(np.logical_and(-2 <= x[:, 0], x[:, 0] < 2), np.floor(x[:, 0]), 0)
    f2 = np.where(x[:, 1] <= 0, 1, 0)
    f3 = np.where(x[:, 2] > 0, 1, 0)
    f6 = np.absolute(x[:, 5] / 4) ** 3
    f7 = np.absolute(x[:, 6] / 4) ** 5
    f11 = (7. / 3) * np.cos(x[:, 10] / 2)
    ret = f1 + f2 + f3 + f6 + f7 + f11
    return ret

## create data
np.random.seed(4747)
n = 100
p = 15
s = 1  # importance desired for X_1 (column index 1; Python indexes from 0)
x = np.zeros((n, p))
for i in range(0, x.shape[1]):
    x[:, i] = np.random.normal(0, 2, n)
y = cond_mean(x) + np.random.normal(0, 1, n)

## -------------------------------------------------------------
## preliminary step: get regression estimators
## -------------------------------------------------------------
## use grid search to get optimal number of trees and learning rate
## (7 tree counts x 10 learning rates = 70 candidates, each fit 5 times)
ntrees = np.arange(100, 3500, 500)
lr = np.arange(.01, .5, .05)
param_grid = [{'n_estimators': ntrees, 'learning_rate': lr}]

## set up cv objects; 'squared_error' is the scikit-learn >= 1.0 name
## for least-squares loss (older releases call it 'ls')
cv_full = GridSearchCV(GradientBoostingRegressor(loss='squared_error', max_depth=1), param_grid=param_grid, cv=5)
cv_small = GridSearchCV(GradientBoostingRegressor(loss='squared_error', max_depth=1), param_grid=param_grid, cv=5)

## fit the full regression of y on all features
cv_full.fit(x, y)
full_fit = cv_full.best_estimator_.predict(x)

## fit the reduced regression: regress the full fitted values
## on the features left after deleting the columns in s
x_small = np.delete(x, s, 1)
cv_small.fit(x_small, full_fit)
small_fit = cv_small.best_estimator_.predict(x_small)

## -------------------------------------------------------------
## get variable importance estimates
## -------------------------------------------------------------
## set up the vimp object
vimp = vimpy.vimp_regression(y, x, full_fit, small_fit, s)
## get the naive (plug-in) estimator
vimp.plugin()
## get the corrected (one-step) estimator
vimp.update()
vimp.onestep_based_estimator()
## get a standard error
vimp.onestep_based_se()
## get a confidence interval
vimp.get_ci()
```
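The vimp object's methods hide the arithmetic. As a hedged sketch of what a plug-in estimate, a one-step correction, a standard error, and a Wald confidence interval can look like for the variance-based measure, the self-contained numpy code below assumes ψ_s = E[(μ(X) − μ_s(X))²]/Var(Y) and derives the one-step correction from that parameter's influence function via the delta method. It is not vimpy's source code and its numbers need not match vimpy's output; the function name variance_vim, its return keys, and the oracle fitted values in the usage lines are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def variance_vim(y, full_fit, small_fit, level=0.95):
    """Hypothetical sketch of variance-based importance estimation.

    Assumes psi = E[(mu(X) - mu_s(X))^2] / Var(Y); an illustrative
    reimplementation, not vimpy's own code.
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(full_fit, dtype=float)
    r = np.asarray(small_fit, dtype=float)
    n = y.shape[0]

    # Plug-in ("naive") estimate: mean squared gap between fits over var(y)
    diff = f - r
    theta = np.mean(diff ** 2)
    centered_sq = (y - y.mean()) ** 2
    var_y = np.mean(centered_sq)
    plugin = theta / var_y

    # Estimated influence function of theta / var_y (delta method)
    phi = (2 * (y - f) * diff + diff ** 2 - theta
           - plugin * (centered_sq - var_y)) / var_y

    # One-step correction: add the empirical mean of the influence function
    onestep = plugin + phi.mean()

    # Standard error from the empirical variance of the influence function
    se = np.sqrt(np.mean((phi - phi.mean()) ** 2) / n)

    # Wald-type confidence interval at the requested level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (onestep - z * se, onestep + z * se)
    return {"plugin": plugin, "onestep": onestep, "se": se, "ci": ci}


# Usage with simulated data and oracle fits (illustration only):
# Var(Y) = 1 + 0.25 + 1 = 2.25, so the true importance of column 0 is 1/2.25
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
y = x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
full_fit = x[:, 0] + 0.5 * x[:, 1]  # true full conditional mean
small_fit = 0.5 * x[:, 1]           # true mean with column 0 removed
res = variance_vim(y, full_fit, small_fit)
print(res["onestep"], res["ci"])
```

Centering the estimated influence function before computing the standard error is one reasonable choice; the uncentered second moment gives the same answer asymptotically when the fitted regressions are consistent.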