
This package is an extension of the KernelExplainer of the shap package that explains the output of any machine learning model while taking dependencies between features into account.

Project description

Shapley values for correlated features

This package contains an extension of the shap package based on the paper 'Explaining individual predictions when features are dependent: More accurate approximations to Shapley values', which describes methods to more accurately approximate Shapley values when features in the dataset are correlated.

Installation

To install the package with pip, simply run

pip install corr_shap

Alternatively, you can download the corr_shap repository and create a conda environment with

conda env create -f environment.yml

Background

SHAP

SHAP (SHapley Additive exPlanations) is a method to explain the output of a machine learning model. It uses Shapley values from game theory to compute the contribution of each input feature to the model's output and can therefore help users understand the factors influencing a model's decision-making process. Since the computational effort to calculate exact Shapley values grows exponentially with the number of features, approximation methods such as Kernel SHAP are needed. See the paper 'A Unified Approach to Interpreting Model Predictions' by Scott M. Lundberg and Su-In Lee for more details on Kernel SHAP, or their SHAP GitHub repository for the implementation.
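For orientation, here is a minimal sketch of how the standard KernelExplainer from the shap package is typically used; this is the independence-based baseline that the CorrExplainer extends, and the background sample size of 100 is an arbitrary illustrative choice.

import shap
from sklearn import linear_model

# Fit a simple model on the 'adult' dataset shipped with shap.
x, y = shap.datasets.adult()
model = linear_model.LinearRegression()
model.fit(x, y)

# Standard Kernel SHAP: a background sample summarizes the data and
# feature independence is implicitly assumed.
background = shap.sample(x, 100)
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(x.iloc[:1])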

Correlated Explainer

One disadvantage of Kernel SHAP is that it assumes all features are independent. If there is high correlation between the features, the results of Kernel SHAP can be inaccurate. Therefore, Kjersti Aas, Martin Jullum and Anders Løland propose an extension of Kernel SHAP in their paper 'Explaining individual predictions when features are dependent: More accurate approximations to Shapley values'. Instead of assuming feature independence, they use either a Gaussian distribution, a Gaussian copula distribution, an empirical conditional distribution, or a combination of the empirical distribution with one of the other two. This can produce more accurate results in the case of dependent features.

Their proposed method is implemented in the 'CorrExplainer' class. Depending on the chosen sampling strategy, the CorrExplainer uses one of the distributions mentioned above, or, if the 'default' sampling strategy is chosen, returns the same result as the Kernel Explainer while having a faster runtime. In our comparisons (with the datasets 'adult', 'linear independent 60' and 'diabetes'), the CorrExplainer was between 6 and 19 times faster than the Kernel Explainer. However, in its current implementation it is only suitable for explaining tabular data.
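The distribution used for sampling is selected via the 'sampling' argument of the CorrExplainer constructor. The sketch below assumes a fitted model and training data as in the example in the next section; the strategy strings other than 'default' are assumptions based on the distributions listed above and should be checked against the package documentation.

from corr_shap.CorrExplainer import CorrExplainer

# 'default' reproduces the Kernel SHAP result; the other strings are assumed
# names for the dependence-aware sampling strategies described in the paper.
explainer_default = CorrExplainer(model.predict, x_training_data, sampling="default")
explainer_gauss = CorrExplainer(model.predict, x_training_data, sampling="gauss")          # Gaussian distribution
explainer_copula = CorrExplainer(model.predict, x_training_data, sampling="copula")        # Gaussian copula
explainer_empirical = CorrExplainer(model.predict, x_training_data, sampling="empirical")  # empirical conditional distribution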

Examples

Explaining a single instance

Below is a code example that shows how to use the CorrExplainer to explain a single instance of the 'adult' dataset and display the result in a bar plot.

from sklearn import linear_model
from sklearn.model_selection import train_test_split
import shap
from corr_shap.CorrExplainer import CorrExplainer

# load data
x, y = shap.datasets.adult()

# train model
x_training_data, x_test_data, y_training_data, y_test_data \
    = train_test_split(x, y, test_size=0.2, random_state=0)
model = linear_model.LinearRegression()
model.fit(x_training_data, y_training_data)

# create explanation object with CorrExplainer
explainer = CorrExplainer(model.predict, x_training_data, sampling="default")
explanation = explainer(x_test_data[:1])

shap.plots.bar(explanation)

[Bar plot of the SHAP values for the explained instance]

Explaining full 'adult' dataset

To get a sense of which features are most important in the whole dataset, and not just for a single instance, the SHAP values for each feature and each sample can be visualized in the same plot. See the example code here.
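As a minimal sketch, reusing the model and explainer from the single-instance example above (the choice of a beeswarm summary plot is illustrative and not necessarily what the linked example uses):

# Explain a larger slice of the test data and summarize all samples in one plot.
explanation = explainer(x_test_data[:100])
shap.plots.beeswarm(explanation)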

[Summary plot of SHAP values across the whole 'adult' dataset]

Credit default data

Another example, using a credit default dataset from the rivapy package with high correlation between the features 'income' and 'savings' and a model that ignores the 'savings' feature, can be found here.

Bar plot explaining a single instance: [plot]

Summary plot explaining multiple samples: [plot]
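As a quick illustration of the premise of that example, the sketch below builds a synthetic stand-in for the credit default data in which 'savings' is strongly correlated with 'income' and checks the correlation; the actual data, feature names aside, comes from the rivapy example linked above, not from this snippet.

import numpy as np
import pandas as pd

# Synthetic stand-in for the credit default features (illustration only);
# the real data is generated via rivapy in the linked example.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, size=1_000)
savings = 0.5 * income + rng.normal(0, 5_000, size=1_000)
credit_data = pd.DataFrame({"income": income, "savings": savings})

# High correlation between 'income' and 'savings' is the setting in which
# the CorrExplainer differs from the independence-based Kernel Explainer.
print(credit_data["income"].corr(credit_data["savings"]))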

Further examples can be found in the examples folder.

References

  • Scott M. Lundberg and Su-In Lee: 'A Unified Approach to Interpreting Model Predictions'
  • Kjersti Aas, Martin Jullum and Anders Løland: 'Explaining individual predictions when features are dependent: More accurate approximations to Shapley values'



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

corr_shap-0.0.1.tar.gz (17.2 kB)

Uploaded Source

Built Distribution

corr_shap-0.0.1-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file corr_shap-0.0.1.tar.gz.

File metadata

  • Download URL: corr_shap-0.0.1.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for corr_shap-0.0.1.tar.gz
Algorithm Hash digest
SHA256 283f00fb26805b802871d464cd1d7c14cc770c2d08bf08d6159ca232dd043034
MD5 273fce281eede801083656014d5c8455
BLAKE2b-256 b18074afb697b944b27e81fb9f22a391101ee2ad50ebea462a9bb9ada85b93da

See more details on using hashes here.

File details

Details for the file corr_shap-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: corr_shap-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for corr_shap-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 41cb95a815e9ca9239cfcbb6d19062edb8419f4803ef27755424169b56e58fb0
MD5 5b2f2e2d946ef25aa6a19768369e98a7
BLAKE2b-256 5510a63044e170156163f123cb74e5a7ac8856e1e90b3d229ac5f3d273166760

See more details on using hashes here.
