Skip to main content

A Python port of R package mshap to interpret combined model outputs.

Project description

mshap

codecov

This is a Python port of srmatth/mshap

The goal of mshap is to allow SHAP values for two-part models to be easily computed. A two-part model is one where the output from one model is multiplied by the output from another model. These are often used in the Actuarial industry, but have other use cases as well.

Installation

Install mSHAP from pypi with the following code:

pip install mshap

Or the development version from github with:

pip install git+https://github.com/Diadochokinetic/mshap

Basic Use

We will demonstrate a simple use case on simulated data. Suppose that we wish to be able to predict to total amount of money a consumer will spend on a subscription to a software product. We might simulate 4 explanatory variables that looks like the following:

import numpy as np

age = np.random.uniform(18, 60, size=1000)
income = np.random.uniform(50000, 150000, size=1000)
married = np.random.randint(0, 2, size=1000)
sex = np.random.randint(0, 2, size=1000)

Now because this is a contrived example, we will knowingly set the response variables as follows (suppose here that cost_per_month is usage based, so as to be continuous):

cost_per_month = (0.0006 * income - 0.2 * sex + 0.5 * married - 0.001 * age) + 10
num_months = 15 * (0.001 * income * 0.001 * sex * 0.5 * married - 0.05 * age) ** 2

Thus, we have our data. We will combine the covariates and target variables into a single data frame for ease of use in python.

import pandas as pd

data = pd.DataFrame(
    {
        "age": age,
        "income": income,
        "married": married,
        "sex": sex,
        "cost_per_month": cost_per_month,
        "num_months": num_months,
    }
)

The end goal of this exercise is to predict the total revenue from the given customer, which mathematically will be cost_per_month * num_months. Instead of multiplying these two vectors together initially, we will instead create two models: one to predict cost_per_month and the other to predict num_months. We can then multiply the output of the two models together to get our predictions.

We now create our two models and predict on the training sets:

from sklearn.ensemble import RandomForestRegressor

X = data[["age", "income", "married", "sex"]]
y1 = data["cost_per_month"]
y2 = data["num_months"]

cpm_mod = RandomForestRegressor(n_estimators=100, max_depth=10, max_features=2)
cpm_mod.fit(X, y1)
# > RandomForestRegressor(max_depth=10, max_features=2)
nm_mod = RandomForestRegressor(n_estimators=100, max_depth=10, max_features=2)
nm_mod.fit(X, y2)
# > RandomForestRegressor(max_depth=10, max_features=2)
cpm_preds = cpm_mod.predict(X)
nm_preds = nm_mod.predict(X)

tot_rev = cpm_preds * nm_preds

We will now proceed to use TreeSHAP and subsequently mSHAP to explain the ultimate model predictions.

import shap

cpm_ex = shap.Explainer(cpm_mod)
cpm_shap = cpm_ex.shap_values(X)
cpm_expected_value = cpm_ex.expected_value

nm_ex = shap.Explainer(nm_mod)
nm_shap = nm_ex.shap_values(X)
nm_expected_value = nm_ex.expected_value
from mshap import Mshap

final_shap = Mshap(
    cpm_shap, nm_shap, cpm_expected_value, nm_expected_value
).shap_values()
final_shap
{'shap_vals':                0            1          2          3
 0   -2876.216193   325.130506  13.474704 -26.475439
 1    1950.301864   200.312921 -11.558773 -64.926704
 2   -2092.259421  -734.279715   7.840975  15.369813
 3    2735.235840 -1642.421894 -11.395891 -63.590990
 4    1971.574419  -878.331239 -20.712473  36.722350
 ..           ...          ...        ...        ...
 995 -1261.220638  1439.860900   2.017464  48.838624
 996  1291.397944  -553.954467 -27.043572 -50.365440
 997  1320.930428  -492.378408 -20.519565 -50.760569
 998  1156.518243  -415.144837  20.484928  59.726275
 999 -3375.016633   732.381880 -33.174228 -86.247622
 
 [1000 rows x 4 columns],
 'expected_value': 4284.231240147299}

You can put the result into a shap Explanation object to use shap plot capabilities:

final_shap_explanation = shap.Explanation(
    values=final_shap["shap_vals"].values,
    base_values=final_shap["expected_value"],
    data=X,
    feature_names=X.columns,
)
shap.summary_plot(final_shap_explanation, X)

png

Citations

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mshap-0.2.2.tar.gz (59.3 kB view details)

Uploaded Source

Built Distribution

mshap-0.2.2-py3-none-any.whl (8.5 kB view details)

Uploaded Python 3

File details

Details for the file mshap-0.2.2.tar.gz.

File metadata

  • Download URL: mshap-0.2.2.tar.gz
  • Upload date:
  • Size: 59.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for mshap-0.2.2.tar.gz
Algorithm Hash digest
SHA256 0099a4648f1252b6151207c35f2d42825cd51f23f5e17abd42180efec42a78d2
MD5 76bec81ecb9b7c6fa5010f2722991b69
BLAKE2b-256 93d3939618d69629c1824437622b257f285712dc1a7a97221fbf2e8f902ab559

See more details on using hashes here.

Provenance

File details

Details for the file mshap-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: mshap-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for mshap-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 86c5df69da04d9206384d414df3143824ead8e46f7af6cb65e17649ae628fbde
MD5 a22d64c26ad0a46fc38ce235777227af
BLAKE2b-256 cd73d1268feb852930016b2c3899252893693a373f0dfc134e33cd55937445a8

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page