Global sensitivity analysis for insurance pricing models — variance decomposition via Shapley effects

insurance-sensitivity

Global sensitivity analysis for insurance pricing models.

The problem

You have a fitted pricing model — a GLM, gradient boosted tree, or anything else with a predict method. You want to know: which rating factors drive the most variance in your premiums?

The naive answer is SHAP. But SHAP decomposes individual predictions, not portfolio-level variance. For a regulatory submission or a fair value assessment, you need a statement like "vehicle group explains 34% of the variance in fitted premiums across our portfolio". That is a different question, and it needs a different tool.

The standard tool for this is Sobol indices — but Sobol first-order indices are only valid under independent inputs. UK motor rating factors are not independent. Driver age correlates with NCD level. Postcode correlates with vehicle type. Sobol S1 indices will over-count the contribution of factors that are correlated with high-importance factors.
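To see the over-counting concretely, here is a standalone numpy sketch (independent of this library) for Y = X1 + X2 with correlation rho between the inputs. Analytically each first-order index is (1 + rho)/2, so the two indices sum to 1 + rho > 1 whenever rho > 0 — the shared variance is counted twice.

```python
import numpy as np

# Toy model: Y = X1 + X2 with corr(X1, X2) = rho.
# E[Y | X1] = (1 + rho) * X1, so S1 = (1 + rho)^2 / (2 * (1 + rho)) = (1 + rho) / 2.
rng = np.random.default_rng(0)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
y = x[:, 0] + x[:, 1]

var_y = y.var()
# The conditional mean is linear here, so Var(E[Y | X1]) can be estimated
# by regressing Y on X1 and taking the explained variance.
beta = np.cov(y, x[:, 0])[0, 1] / x[:, 0].var()
s1 = (beta**2 * x[:, 0].var()) / var_y
print(f"S1 per input ≈ {s1:.3f}, sum ≈ {2 * s1:.3f}")  # ≈ 0.8 each, sum ≈ 1.6
```

A decomposition whose pieces sum to 160% of the variance is hard to defend in a regulatory narrative, which is the motivation for the Shapley alternative below.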

Shapley effects (Owen 2014, Song et al. 2016) solve this. They use the same Shapley formula from cooperative game theory, but applied to variance decomposition rather than individual predictions. The effects always sum to V[Y] and are never negative, regardless of correlations.
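As a standalone illustration of the formula (not this library's estimator, which works from data rather than closed forms), here is an exact Shapley-effect computation for a linear Gaussian model, where the coalition value Var(E[Y | X_J]) has a closed form:

```python
import math
from itertools import permutations

import numpy as np

def shapley_effects_linear_gaussian(beta, sigma):
    """Exact Shapley effects for Y = beta @ X with X ~ N(0, sigma).

    The value of a coalition J is its explained variance
    c(J) = Cov(Y, X_J) @ Sigma_JJ^{-1} @ Cov(X_J, Y),
    and phi_j averages input j's marginal contribution over all
    orderings of the inputs (the Shapley formula).
    """
    beta, sigma = np.asarray(beta, float), np.asarray(sigma, float)
    d = len(beta)
    cov_xy = sigma @ beta  # Cov(X, Y) for a linear model
    def value(J):
        if not J:
            return 0.0
        J = list(J)
        c = cov_xy[J]
        return float(c @ np.linalg.solve(sigma[np.ix_(J, J)], c))
    phi = np.zeros(d)
    for perm in permutations(range(d)):
        prev, seen = 0.0, []
        for j in perm:
            seen.append(j)
            v = value(seen)
            phi[j] += v - prev
            prev = v
    return phi / math.factorial(d)

# Y = X1 + X2 with corr(X1, X2) = 0.6: Var(Y) = 3.2
phi = shapley_effects_linear_gaussian([1.0, 1.0], [[1.0, 0.6], [0.6, 1.0]])
print(phi, phi.sum())  # ≈ [1.6 1.6], total 3.2 = Var(Y)
```

Note how the correlated pair splits the variance evenly and the effects sum exactly to Var(Y), unlike the first-order Sobol indices for the same model.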

This library implements Shapley effects with insurance-specific extensions:

  • Exposure-weighted variance (mid-term policies, partial-year risks)
  • Categorical rating factors via empirical sampling (no encoding)
  • CLH subsampling for large portfolios (Rabitti & Tzougas 2025, EAJ)
  • Fitted-model interface — pass your model, not parameter distributions
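The exposure weighting in the first bullet amounts to replacing the plain portfolio variance with an exposure-weighted one, so a policy on risk for three months carries a quarter of the weight of an annual policy. A minimal sketch of the idea (hypothetical helper, not the library's internals):

```python
import numpy as np

def exposure_weighted_variance(y, exposure):
    """Weighted variance of fitted values, with weights = earned exposure."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(exposure, dtype=float)
    w = w / w.sum()                      # normalise to weights summing to 1
    mu = np.sum(w * y)                   # exposure-weighted mean premium
    return float(np.sum(w * (y - mu) ** 2))

# two fitted premiums, the second on risk for only a quarter of the year:
v = exposure_weighted_variance([100.0, 200.0], [1.0, 0.25])
print(v)  # → 1600.0
```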

Installation

pip install insurance-sensitivity
pip install insurance-sensitivity[plots]   # matplotlib for charts
pip install insurance-sensitivity[polars]  # polars DataFrame input

Quick start

import pandas as pd
from insurance_sensitivity import SensitivityAnalysis

# fitted_glm: any model with a .predict(X) method
# training_df: the data the model was fitted on, with an 'exposure' column

sa = SensitivityAnalysis(
    model=fitted_glm,
    X=training_df,
    exposure_col='exposure',  # year fractions for each policy
    log_scale=True,           # decompose Var[log(fitted)] — right choice
                              # for a multiplicative GLM
    random_state=42,
)

# Shapley effects: correct under correlated inputs
result = sa.shapley(
    n_perms=256,       # more permutations → lower Monte Carlo error
)
print(result)
# ShapleyResult(total_variance=0.1847)
#   vehicle_group: 34.2%
#   ncd_band: 22.1%
#   driver_age: 18.4%
#   area: 11.3%
#   ...

result.plot_bar()  # horizontal bar chart with 95% CIs
result.plot_pie()  # pie chart of % contributions

# Sobol indices: faster, but warns if inputs are correlated
sobol = sa.sobol(n_samples=1024)
sobol.plot_bar()  # S1 and ST side by side

Large portfolios: CLH subsampling

For portfolios with >10k rows, the k-NN step in the Song estimator gets slow. Rabitti & Tzougas (2025) showed that selecting ~2000 representative observations via Conditional Latin Hypercube sampling gives results very close to the full-sample estimate, at a fraction of the cost.

result = sa.shapley(
    n_perms=256,
    n_subsample=2500,  # subsample size (default: use full dataset)
)

Group attributions

If you want attribution at the level of rating factor groups (e.g. all vehicle-related factors as one group, all driver-related factors as another):

groups = {
    'vehicle': ['vehicle_group', 'vehicle_age', 'cc_band'],
    'driver':  ['driver_age', 'ncd_band', 'licence_years'],
    'area':    ['postcode_area', 'garage_type'],
}
result = sa.shapley(n_perms=256, groups=groups)
# effects DataFrame now has rows: vehicle, driver, area

Interaction effects

interactions = sa.interaction_effects()
# Returns a DataFrame comparing phi_j vs S1_j * V[Y].
# High phi_j - S1_j means factor j acts mostly through interactions,
# not in isolation.
print(interactions[['factor', 'phi', 'S1_abs', 'interaction_pct']])
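A pure-interaction toy makes the phi vs S1 gap concrete. For Y = X1 * X2 with independent standard normal inputs, both conditional means E[Y | X_j] are zero, so both first-order Sobol indices are ~0 — yet Var(Y) = 1, and by symmetry each Shapley effect is Var(Y)/2. This standalone numpy sketch (not using the library) estimates S1 by binning:

```python
import numpy as np

# Y = X1 * X2, independent inputs: all the variance is interaction.
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal((2, 400_000))
y = x1 * x2

# crude S1 estimate: variance of the binned conditional means E[Y | X1]
bins = np.quantile(x1, np.linspace(0, 1, 51))
idx = np.clip(np.digitize(x1, bins) - 1, 0, 49)
cond_means = (np.bincount(idx, weights=y, minlength=50)
              / np.bincount(idx, minlength=50))
s1 = cond_means.var() / y.var()
print(f"S1 ≈ {s1:.3f}, Var(Y) ≈ {y.var():.3f}")  # S1 near 0, unit variance
```

A factor like this would show interaction_pct near 100% in the DataFrame above: it matters a great deal, but only jointly with another factor.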

When to use Shapley effects vs Sobol

Use Shapley effects (.shapley()) when:

  • Your rating factors are correlated (almost always true)
  • You need the effects to sum to total variance (required for regulatory use)
  • You want a defensible decomposition for fair value / FCA reporting

Use Sobol indices (.sobol()) when:

  • You know your inputs are approximately independent
  • You want a faster, rougher estimate for exploration
  • You need second-order interaction indices S2(i,j)

The library warns you if you run Sobol on correlated inputs.
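The warning threshold is internal to the library, but it is easy to gauge the correlation structure yourself before choosing an estimator. A quick standalone check over the numeric rating factors (hypothetical helper name; the ~0.3 cutoff is a common rule of thumb, not the library's threshold):

```python
import numpy as np
import pandas as pd

def max_abs_correlation(df: pd.DataFrame) -> float:
    """Largest off-diagonal |Spearman correlation| among numeric columns."""
    corr = df.select_dtypes("number").corr(method="spearman").to_numpy()
    np.fill_diagonal(corr, 0.0)          # ignore the trivial self-correlations
    return float(np.abs(corr).max())

# if this exceeds ~0.3, prefer .shapley() over .sobol()
```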

Supported model types

The wrapper handles these automatically:

  • sklearn: any estimator with .predict() or .predict_proba()
  • statsmodels: GLM results with .predict(exog=X) signature
  • glum: GeneralizedLinearRegressor with .predict(X)
  • LightGBM: Booster and sklearn API
  • XGBoost: Booster and sklearn API
  • CatBoost: CatBoostRegressor, CatBoostClassifier

For anything else, pass predict_fn='my_method_name'.
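The exact dispatch is internal to the library, but the idea is plain attribute lookup: predict_fn names the method to call on your model. A sketch with a hypothetical model class whose scoring method is not called .predict():

```python
import numpy as np

class SeverityCurve:
    """Hypothetical model whose scoring method has a non-standard name."""
    def score(self, X):
        # stand-in for a real forward pass
        return np.asarray(X, dtype=float).sum(axis=1)

model = SeverityCurve()
predict_fn = "score"

# conceptually, the wrapper resolves the named method and calls it:
predict = getattr(model, predict_fn)     # -> model.score
y_hat = predict([[1.0, 2.0], [3.0, 4.0]])
print(y_hat)  # [3. 7.]
```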

References

Owen, A.B. (2014). Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2(1), 245–251.

Song, E., Nelson, B.L. & Staum, J.C. (2016). Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 1060–1083.

Biessy, G. (2024). Construction of Rating Systems Using Global Sensitivity Analysis: A Numerical Investigation. ASTIN Bulletin, 54(1), 25–45. DOI: 10.1017/asb.2023.34

Saltelli, A. et al. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2), 259–270.

Rabitti, G. & Tzougas, G. (2025). Accelerating the computation of Shapley effects for datasets with many observations. European Actuarial Journal, 15, 885–898. DOI: 10.1007/s13385-025-00412-z
