Heuristic for quick feature selection for tabular regression/classification using Shapley values
Overview
shap-select implements a heuristic for fast feature selection for tabular regression and classification models.
The basic idea is to run a linear or logistic regression of the target on the Shapley values of the original features, computed on the validation set. Features with negative coefficients are discarded, and the remaining ones are ranked and filtered by their statistical significance. For motivation and details, see the example notebook.
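To make the idea concrete, here is a minimal sketch of the regression step for the linear (regression) case. It is not the package's implementation: the function name `rank_by_significance` is hypothetical, random numbers stand in for real Shapley values, and the p-values use a normal approximation rather than the exact t-distribution.

```python
import math
import numpy as np

def rank_by_significance(shap_vals, y, threshold=0.05):
    """OLS of y on per-feature Shapley values.

    Returns rows of (feature index, t-value, p-value, coefficient, flag),
    sorted by t-value, where flag is 1 (keep), 0 (not significant)
    or -1 (negative coefficient).
    """
    n, k = shap_vals.shape
    X = np.column_stack([np.ones(n), shap_vals])  # intercept + one column per feature
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = float(resid @ resid) / (n - k - 1)   # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    rows = []
    for j in range(1, k + 1):                     # skip the intercept
        t = float(coef[j] / std_err[j])
        # Assumption: normal approximation to the two-sided p-value (large n).
        p = math.erfc(abs(t) / math.sqrt(2))
        flag = -1 if coef[j] < 0 else int(p < threshold)
        rows.append((j - 1, t, p, float(coef[j]), flag))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Synthetic stand-in for Shapley values on a validation set:
# column 0 helps, column 1 is unrelated to y, column 2 enters with the wrong sign.
rng = np.random.default_rng(0)
n = 500
shap_vals = rng.normal(size=(n, 3))
y = shap_vals[:, 0] - shap_vals[:, 2] + rng.normal(scale=0.5, size=n)

table = rank_by_significance(shap_vals, y)
for row in table:
    print(row)
```

In this toy setup the first column ends up flagged `1` and the third `-1`. The flag of the unrelated column depends on the random draw, which is why a significance threshold is needed at all.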
Earlier packages that use Shapley values for feature selection exist; the advantages of this one are:
- Regression on the validation set to combat overfitting
- Only a single fit of the original model needed
- A single intuitive hyperparameter for feature selection: statistical significance
- Bonferroni correction for multiclass classification
- Handling of collinearity among the (Shapley value) features by repeated (linear/logistic) regression
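As an illustration of the Bonferroni point, the sketch below applies the standard correction (multiply each p-value by the number of tests, capped at 1) to hypothetical per-class p-values for a single feature. How shap-select actually combines the per-class results is an assumption here; the helper `bonferroni` and the numbers are made up for the example.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values of one feature in the three per-class regressions
# of a 3-class problem.
per_class_p = [0.04, 0.30, 0.012]

adjusted = bonferroni(per_class_p)
# Assumption: the feature is judged by its best corrected p-value across classes.
feature_p = min(adjusted)

threshold = 0.05
print(adjusted, feature_p, feature_p < threshold)
```

Without the correction, the first class alone (p = 0.04) would already clear the 0.05 threshold; after the correction only the third class does, so testing several classes does not by itself make a feature easier to select.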
Usage
```python
from shap_select import shap_select

# Here model is any model supported by the shap library, fitted on a different (train) dataset
# Task can be regression, binary, or multiclass
selected_features_df = shap_select(model, X_val, y_val, task="multiclass", threshold=0.05)
```
The returned dataframe looks like this:

| | feature name | t-value | stat.significance | coefficient | selected |
|---|---|---|---|---|---|
| 0 | x5 | 20.211299 | 0.000000 | 1.052030 | 1 |
| 1 | x4 | 18.315144 | 0.000000 | 0.952416 | 1 |
| 2 | x3 | 6.835690 | 0.000000 | 1.098154 | 1 |
| 3 | x2 | 6.457140 | 0.000000 | 1.044842 | 1 |
| 4 | x1 | 5.530556 | 0.000000 | 0.917242 | 1 |
| 5 | x6 | 2.390868 | 0.016827 | 1.497983 | 1 |
| 6 | x7 | 0.901098 | 0.367558 | 2.865508 | 0 |
| 7 | x8 | 0.563214 | 0.573302 | 1.933632 | 0 |
| 8 | x9 | -1.607814 | 0.107908 | -4.537098 | -1 |
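One way to read the `selected` column, inferred from the table and the description above rather than from the package's source, is: `-1` for a negative coefficient, `1` when the significance value is below the threshold, and `0` otherwise. The sketch below copies the rows into plain Python tuples, checks that this reading reproduces the column, and extracts the names of the features to keep. The helper `flag_for` is hypothetical.

```python
# Rows copied from the example output:
# (feature name, t-value, stat.significance, coefficient, selected)
rows = [
    ("x5", 20.211299, 0.000000, 1.052030, 1),
    ("x4", 18.315144, 0.000000, 0.952416, 1),
    ("x3", 6.835690, 0.000000, 1.098154, 1),
    ("x2", 6.457140, 0.000000, 1.044842, 1),
    ("x1", 5.530556, 0.000000, 0.917242, 1),
    ("x6", 2.390868, 0.016827, 1.497983, 1),
    ("x7", 0.901098, 0.367558, 2.865508, 0),
    ("x8", 0.563214, 0.573302, 1.933632, 0),
    ("x9", -1.607814, 0.107908, -4.537098, -1),
]

def flag_for(p_value, coefficient, threshold=0.05):
    """Assumed reading of the flag: -1 negative coefficient, 1 significant, 0 otherwise."""
    if coefficient < 0:
        return -1
    return 1 if p_value < threshold else 0

# Recompute the flag from the other columns and compare with the table.
derived = [flag_for(p, coef) for _name, _t, p, coef, _flag in rows]

# Names of the features to keep for the next model fit.
selected = [name for name, _t, _p, _coef, flag in rows if flag == 1]
print(selected)
```

With the dataframe itself, the same filter can be written on the `selected` column, for example `selected_features_df[selected_features_df["selected"] == 1]`.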