vimpy: nonparametric variable importance assessment in python

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Project description

Author: Brian Williamson

Introduction

In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining an outcome; this is often called variable importance. It is useful to consider variable importance as a function of the unknown, underlying data-generating mechanism rather than the specific predictive algorithm used to fit the data. This package provides functions that, given fitted values from predictive algorithms, compute nonparametric estimates of deviance- and variance-based variable importance, along with asymptotically valid confidence intervals for the true importance.
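To make the variance-based measure concrete, here is a minimal sketch. It is not vimpy's own implementation. It assumes that importance is estimated by a plug-in ratio: the mean squared difference between the full and reduced fitted values, divided by the empirical variance of the outcome. The function name `plugin_importance` is hypothetical.

```python
import numpy as np

def plugin_importance(y, full_fit, small_fit):
    """Hypothetical helper: plug-in estimate of variance-based importance.

    Assumed form: mean((full - reduced)^2) / mean((y - mean(y))^2).
    """
    y = np.asarray(y, dtype=float)
    full_fit = np.asarray(full_fit, dtype=float)
    small_fit = np.asarray(small_fit, dtype=float)
    numerator = np.mean((full_fit - small_fit) ** 2)
    denominator = np.mean((y - np.mean(y)) ** 2)
    return numerator / denominator

# toy outcome that the full regression fits perfectly
y = np.array([1.0, 2.0, 3.0, 4.0])
full_fit = y.copy()

# reduced fit that only recovers the overall mean: the removed features carry all the signal
mean_only_fit = np.full_like(y, y.mean())
all_importance = plugin_importance(y, full_fit, mean_only_fit)

# reduced fit identical to the full fit: the removed features carry no signal
no_importance = plugin_importance(y, full_fit, full_fit)
```

Under this assumed form, values near 0 suggest the removed features add little beyond the remaining ones, while values near 1 suggest they account for most of the explainable variation.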

Installation

You may install a stable release of vimpy using pip by running pip install vimpy from a terminal window. Alternatively, you may install within a virtualenv environment.
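After installing, it can be useful to confirm which release is present in the active environment. The sketch below assumes Python 3.8 or later, where `importlib.metadata` is in the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata

def installed_version(package_name):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

vimpy_version = installed_version("vimpy")
if vimpy_version is None:
    print("vimpy is not installed in this environment")
else:
    print("vimpy " + vimpy_version)
```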

You may install the current dev release of vimpy by downloading this repository directly.

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Example

This example shows how to use vimpy in a simple setting with simulated data and a single regression function. For more examples and detailed explanation, please see the R vignette (to come).

## load required libraries
import numpy as np
import vimpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

## -------------------------------------------------------------
## problem setup
## -------------------------------------------------------------
## define a function for the conditional mean of Y given X
def cond_mean(x = None):
    f1 = np.where(np.logical_and(-2 <= x[:, 0], x[:, 0] < 2), np.floor(x[:, 0]), 0) 
    f2 = np.where(x[:, 1] <= 0, 1, 0)
    f3 = np.where(x[:, 2] > 0, 1, 0)

    f6 = np.absolute(x[:, 5]/4) ** 3
    f7 = np.absolute(x[:, 6]/4) ** 5

    f11 = (7./3)*np.cos(x[:, 10]/2)

    ret = f1 + f2 + f3 + f6 + f7 + f11

    return ret

## create data
np.random.seed(4747)
n = 100
p = 15
s = 1 # 0-based column index of the feature whose importance is desired
x = np.zeros((n, p))
for i in range(0, x.shape[1]) :
    x[:,i] = np.random.normal(0, 2, n)

y = cond_mean(x) + np.random.normal(0, 1, n)

## -------------------------------------------------------------
## preliminary step: get regression estimators
## -------------------------------------------------------------
## use grid search to get optimal number of trees and learning rate
ntrees = np.arange(100, 3500, 500)
lr = np.arange(.01, .5, .05)

param_grid = [{'n_estimators':ntrees, 'learning_rate':lr}]

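## set up cv objects
## least squares is the default loss (named 'ls' in older scikit-learn, 'squared_error' from 1.0 on),
## so it is left implicit here to keep the example working across versions
cv_full = GridSearchCV(GradientBoostingRegressor(max_depth = 1), param_grid = param_grid, cv = 5)
cv_small = GridSearchCV(GradientBoostingRegressor(max_depth = 1), param_grid = param_grid, cv = 5)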

## fit the full regression
cv_full.fit(x, y)
full_fit = cv_full.best_estimator_.predict(x)

## fit the reduced regression: regress the full fitted values on the remaining features
x_small = np.delete(x, s, 1) # delete the columns in s
cv_small.fit(x_small, full_fit)
small_fit = cv_small.best_estimator_.predict(x_small)

## -------------------------------------------------------------
## get variable importance estimates
## -------------------------------------------------------------
## set up the vimp object
vimp = vimpy.vimp_regression(y, x, full_fit, small_fit, s)
## get the naive estimator
vimp.plugin()
## get the corrected estimator
vimp.update()
vimp.onestep_based_estimator()
## get a standard error
vimp.onestep_based_se()
## get a confidence interval
vimp.get_ci()
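The sequence of calls above (naive estimate, correction, standard error, confidence interval) can be sketched by hand. The block below is an illustration under stated assumptions, not vimpy's source code. It assumes the correction is the sample mean of the estimated influence function of the ratio parameter, and that the interval is a Wald-type interval. The function `onestep_importance` and its arguments are hypothetical names, and the simulated data and oracle fits are for illustration only.

```python
from statistics import NormalDist

import numpy as np

def onestep_importance(y, full_fit, small_fit, level=0.95):
    """Hypothetical sketch: naive and one-step estimates, standard error, Wald interval."""
    y, f, m = (np.asarray(a, dtype=float) for a in (y, full_fit, small_fit))
    n = y.shape[0]

    # naive (plug-in) estimate: ratio of two sample means
    num = np.mean((f - m) ** 2)
    den = np.mean((y - y.mean()) ** 2)
    naive = num / den

    # assumed influence function of the ratio, built from those of numerator and denominator
    if_num = 2 * (y - f) * (f - m) + (f - m) ** 2 - num
    if_den = (y - y.mean()) ** 2 - den
    influence = if_num / den - num / den ** 2 * if_den

    # one-step correction adds the average influence value to the naive estimate
    onestep = naive + influence.mean()
    se = np.sqrt(np.mean(influence ** 2) / n)

    # two-sided normal quantile for the requested level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return naive, onestep, se, (onestep - z * se, onestep + z * se)

# illustrative data: two independent features, oracle fitted values
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(500, 2))
y_demo = x_demo[:, 0] + x_demo[:, 1] + rng.normal(size=500)
full_demo = x_demo[:, 0] + x_demo[:, 1]   # oracle fit using both features
small_demo = x_demo[:, 1]                 # oracle fit with the first feature removed

naive, onestep, se, ci = onestep_importance(y_demo, full_demo, small_demo)
print(naive, onestep, se, ci)
```

In practice the fitted values would come from flexible regressions such as the gradient boosting fits in the example, not from oracle functions.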

Release history

2.1

Jun 18, 2020

2.0.2.2

Jun 3, 2020

2.0.2.1

Jun 2, 2020

2.0.2

Jun 1, 2020

2.0.1

Apr 9, 2020

1.0.0

Oct 28, 2018

0.0.10

Jun 20, 2018

This version

0.0.9

Jun 20, 2018

0.0.5

Jun 19, 2018

0.0.4

Jun 15, 2018

0.0.3

Jun 15, 2018

Download files

Source Distribution

vimpy-0.0.9.tar.gz (4.3 kB)

Uploaded Jun 20, 2018 Source

Built Distribution

vimpy-0.0.9-py2-none-any.whl (5.3 kB)

Uploaded Jun 20, 2018 Python 2

File details

Details for the file vimpy-0.0.9.tar.gz.

File metadata

Download URL: vimpy-0.0.9.tar.gz
Upload date: Jun 20, 2018
Size: 4.3 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for vimpy-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`6cfd437c7584e01c2cfdd2424aea9aa687b32b2d261cde2be4d6753003ec1e37`
MD5	`a92e7d7def571e17c6afafe2a0331f18`
BLAKE2b-256	`d3529acc45faad3cd65aaf3a27ecc6c4d58eea67917f004b2237b090f3dcb451`

File details

Details for the file vimpy-0.0.9-py2-none-any.whl.

File metadata

Download URL: vimpy-0.0.9-py2-none-any.whl
Upload date: Jun 20, 2018
Size: 5.3 kB
Tags: Python 2
Uploaded using Trusted Publishing? No

File hashes

Hashes for vimpy-0.0.9-py2-none-any.whl
Algorithm	Hash digest
SHA256	`028004c63eacbd78145f04b4e0825fd66bd269944461db65002933c408250442`
MD5	`0078e489c0f93087b45939862795f24a`
BLAKE2b-256	`ba2562bd194731a06bbeff8762823e93c20c120920ea2075d87f348a46a3b2c3`
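
The listed digests can be used to check a downloaded file before installing it. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the commented call at the end assumes the source distribution was saved in the current directory under its listed file name.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hypothetical helper: hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """Hypothetical helper: compare a file's SHA256 digest with a published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# SHA256 digest listed above for vimpy-0.0.9.tar.gz
EXPECTED_SDIST_SHA256 = "6cfd437c7584e01c2cfdd2424aea9aa687b32b2d261cde2be4d6753003ec1e37"

# self-check on a temporary file with known content (the bytes b"abc")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = matches_expected(
    tmp.name,
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
os.remove(tmp.name)

# assumed usage once the file has been downloaded:
# matches_expected("vimpy-0.0.9.tar.gz", EXPECTED_SDIST_SHA256)
```

A result of `False` would suggest the file differs from the one that was uploaded and should not be installed.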
