Global sensitivity analysis for insurance pricing models — variance decomposition via Shapley effects

insurance-sensitivity

Global sensitivity analysis for insurance pricing models.

The problem

You have a fitted pricing model — a GLM, gradient boosted tree, or anything else with a predict method. You want to know: which rating factors drive the most variance in your premiums?

The naive answer is SHAP. But SHAP decomposes individual predictions, not portfolio-level variance. For a regulatory submission or a fair value assessment, you need a statement like "vehicle group explains 34% of the variance in fitted premiums across our portfolio". That is a different question, and it needs a different tool.

The standard tool for this is Sobol indices — but Sobol first-order indices are only valid under independent inputs. UK motor rating factors are not independent. Driver age correlates with NCD level. Postcode correlates with vehicle type. Sobol S1 indices will over-count the contribution of factors that are correlated with high-importance factors.
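To see the over-counting concretely, here is a standalone numpy sketch (independent of this library) for Y = X1 + X2 with correlation rho between the inputs. Analytically each first-order index is (1 + rho)/2, so the two indices sum to 1 + rho > 1 whenever rho > 0 — the shared variance is counted twice.

```python
import numpy as np

# Toy model: Y = X1 + X2 with corr(X1, X2) = rho.
# E[Y | X1] = (1 + rho) * X1, so S1 = (1 + rho)^2 / (2 * (1 + rho)) = (1 + rho) / 2.
rng = np.random.default_rng(0)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
y = x[:, 0] + x[:, 1]

var_y = y.var()
# The conditional mean is linear here, so Var(E[Y | X1]) can be estimated
# by regressing Y on X1 and taking the explained variance.
beta = np.cov(y, x[:, 0])[0, 1] / x[:, 0].var()
s1 = (beta**2 * x[:, 0].var()) / var_y
print(f"S1 per input ≈ {s1:.3f}, sum ≈ {2 * s1:.3f}")  # ≈ 0.8 each, sum ≈ 1.6
```

A decomposition whose pieces sum to 160% of the variance is hard to defend in a regulatory narrative, which is the motivation for the Shapley alternative below.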

Shapley effects (Owen 2014, Song et al. 2016) solve this. They use the same Shapley formula from cooperative game theory, but applied to variance decomposition rather than individual predictions. The effects always sum to V[Y] and are never negative, regardless of correlations.
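As a standalone illustration of the formula (not this library's estimator, which works from data rather than closed forms), here is an exact Shapley-effect computation for a linear Gaussian model, where the coalition value Var(E[Y | X_J]) has a closed form:

```python
import math
from itertools import permutations

import numpy as np

def shapley_effects_linear_gaussian(beta, sigma):
    """Exact Shapley effects for Y = beta @ X with X ~ N(0, sigma).

    The value of a coalition J is its explained variance
    c(J) = Cov(Y, X_J) @ Sigma_JJ^{-1} @ Cov(X_J, Y),
    and phi_j averages input j's marginal contribution over all
    orderings of the inputs (the Shapley formula).
    """
    beta, sigma = np.asarray(beta, float), np.asarray(sigma, float)
    d = len(beta)
    cov_xy = sigma @ beta  # Cov(X, Y) for a linear model
    def value(J):
        if not J:
            return 0.0
        J = list(J)
        c = cov_xy[J]
        return float(c @ np.linalg.solve(sigma[np.ix_(J, J)], c))
    phi = np.zeros(d)
    for perm in permutations(range(d)):
        prev, seen = 0.0, []
        for j in perm:
            seen.append(j)
            v = value(seen)
            phi[j] += v - prev
            prev = v
    return phi / math.factorial(d)

# Y = X1 + X2 with corr(X1, X2) = 0.6: Var(Y) = 3.2
phi = shapley_effects_linear_gaussian([1.0, 1.0], [[1.0, 0.6], [0.6, 1.0]])
print(phi, phi.sum())  # ≈ [1.6 1.6], total 3.2 = Var(Y)
```

Note how the correlated pair splits the variance evenly and the effects sum exactly to Var(Y), unlike the first-order Sobol indices for the same model.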

This library implements Shapley effects with insurance-specific extensions:

  • Exposure-weighted variance (mid-term policies, partial-year risks)
  • Categorical rating factors via empirical sampling (no encoding)
  • CLH subsampling for large portfolios (Rabitti & Tzougas 2025, EAJ)
  • Fitted-model interface — pass your model, not parameter distributions
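The exposure weighting in the first bullet amounts to replacing the plain portfolio variance with an exposure-weighted one, so a policy on risk for three months carries a quarter of the weight of an annual policy. A minimal sketch of the idea (hypothetical helper, not the library's internals):

```python
import numpy as np

def exposure_weighted_variance(y, exposure):
    """Weighted variance of fitted values, with weights = earned exposure."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(exposure, dtype=float)
    w = w / w.sum()                      # normalise to weights summing to 1
    mu = np.sum(w * y)                   # exposure-weighted mean premium
    return float(np.sum(w * (y - mu) ** 2))

# two fitted premiums, the second on risk for only a quarter of the year:
v = exposure_weighted_variance([100.0, 200.0], [1.0, 0.25])
print(v)  # → 1600.0
```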

Installation

pip install insurance-sensitivity
pip install insurance-sensitivity[plots]   # matplotlib for charts
pip install insurance-sensitivity[polars]  # polars DataFrame input

Quick start

import pandas as pd
from insurance_sensitivity import SensitivityAnalysis

# fitted_glm: any model with a .predict(X) method
# training_df: the data the model was fitted on, with an 'exposure' column

sa = SensitivityAnalysis(
    model=fitted_glm,
    X=training_df,
    exposure_col='exposure',  # year fractions for each policy
    log_scale=True,           # decompose Var[log(fitted)] — right choice
                              # for a multiplicative GLM
    random_state=42,
)

# Shapley effects: correct under correlated inputs
result = sa.shapley(
    n_perms=256,       # more permutations → lower Monte Carlo error
)
print(result)
# ShapleyResult(total_variance=0.1847)
#   vehicle_group: 34.2%
#   ncd_band: 22.1%
#   driver_age: 18.4%
#   area: 11.3%
#   ...

result.plot_bar()  # horizontal bar chart with 95% CIs
result.plot_pie()  # pie chart of % contributions

# Sobol indices: faster, but warns if inputs are correlated
sobol = sa.sobol(n_samples=1024)
sobol.plot_bar()  # S1 and ST side by side

Large portfolios: CLH subsampling

For portfolios with >10k rows, the k-NN step in the Song estimator gets slow. Rabitti & Tzougas (2025) showed that selecting ~2000 representative observations via Conditional Latin Hypercube sampling gives results very close to the full-sample estimate, at a fraction of the cost.

result = sa.shapley(
    n_perms=256,
    n_subsample=2500,  # subsample size (default: use full dataset)
)

Group attributions

If you want attribution at the level of rating factor groups (e.g. all vehicle-related factors as one group, all driver-related factors as another):

groups = {
    'vehicle': ['vehicle_group', 'vehicle_age', 'cc_band'],
    'driver':  ['driver_age', 'ncd_band', 'licence_years'],
    'area':    ['postcode_area', 'garage_type'],
}
result = sa.shapley(n_perms=256, groups=groups)
# effects DataFrame now has rows: vehicle, driver, area

Interaction effects

interactions = sa.interaction_effects()
# Returns a DataFrame comparing phi_j vs S1_j * V[Y].
# High phi_j - S1_j means factor j acts mostly through interactions,
# not in isolation.
print(interactions[['factor', 'phi', 'S1_abs', 'interaction_pct']])
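A pure-interaction toy makes the phi vs S1 gap concrete. For Y = X1 * X2 with independent standard normal inputs, both conditional means E[Y | X_j] are zero, so both first-order Sobol indices are ~0 — yet Var(Y) = 1, and by symmetry each Shapley effect is Var(Y)/2. This standalone numpy sketch (not using the library) estimates S1 by binning:

```python
import numpy as np

# Y = X1 * X2, independent inputs: all the variance is interaction.
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal((2, 400_000))
y = x1 * x2

# crude S1 estimate: variance of the binned conditional means E[Y | X1]
bins = np.quantile(x1, np.linspace(0, 1, 51))
idx = np.clip(np.digitize(x1, bins) - 1, 0, 49)
cond_means = (np.bincount(idx, weights=y, minlength=50)
              / np.bincount(idx, minlength=50))
s1 = cond_means.var() / y.var()
print(f"S1 ≈ {s1:.3f}, Var(Y) ≈ {y.var():.3f}")  # S1 near 0, unit variance
```

A factor like this would show interaction_pct near 100% in the DataFrame above: it matters a great deal, but only jointly with another factor.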

When to use Shapley effects vs Sobol

Use Shapley effects (.shapley()) when:

  • Your rating factors are correlated (almost always true)
  • You need the effects to sum to total variance (required for regulatory use)
  • You want a defensible decomposition for fair value / FCA reporting

Use Sobol indices (.sobol()) when:

  • You know your inputs are approximately independent
  • You want a faster, rougher estimate for exploration
  • You need second-order interaction indices S2(i,j)

The library warns you if you run Sobol on correlated inputs.
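The warning threshold is internal to the library, but it is easy to gauge the correlation structure yourself before choosing an estimator. A quick standalone check over the numeric rating factors (hypothetical helper name; the ~0.3 cutoff is a common rule of thumb, not the library's threshold):

```python
import numpy as np
import pandas as pd

def max_abs_correlation(df: pd.DataFrame) -> float:
    """Largest off-diagonal |Spearman correlation| among numeric columns."""
    corr = df.select_dtypes("number").corr(method="spearman").to_numpy()
    np.fill_diagonal(corr, 0.0)          # ignore the trivial self-correlations
    return float(np.abs(corr).max())

# if this exceeds ~0.3, prefer .shapley() over .sobol()
```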

Supported model types

The wrapper handles these automatically:

  • sklearn: any estimator with .predict() or .predict_proba()
  • statsmodels: GLM results with .predict(exog=X) signature
  • glum: GeneralizedLinearRegressor with .predict(X)
  • LightGBM: Booster and sklearn API
  • XGBoost: Booster and sklearn API
  • CatBoost: CatBoostRegressor, CatBoostClassifier

For anything else, pass predict_fn='my_method_name'.
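The exact dispatch is internal to the library, but the idea is plain attribute lookup: predict_fn names the method to call on your model. A sketch with a hypothetical model class whose scoring method is not called .predict():

```python
import numpy as np

class SeverityCurve:
    """Hypothetical model whose scoring method has a non-standard name."""
    def score(self, X):
        # stand-in for a real forward pass
        return np.asarray(X, dtype=float).sum(axis=1)

model = SeverityCurve()
predict_fn = "score"

# conceptually, the wrapper resolves the named method and calls it:
predict = getattr(model, predict_fn)     # -> model.score
y_hat = predict([[1.0, 2.0], [3.0, 4.0]])
print(y_hat)  # [3. 7.]
```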

References

Owen, A.B. (2014). Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2(1), 245–251.

Song, E., Nelson, B.L. & Staum, J.C. (2016). Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 1060–1083.

Biessy, G. (2024). Construction of Rating Systems Using Global Sensitivity Analysis: A Numerical Investigation. ASTIN Bulletin, 54(1), 25–45. DOI: 10.1017/asb.2023.34

Saltelli, A. et al. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2), 259–270.

Rabitti, G. & Tzougas, G. (2025). Accelerating the computation of Shapley effects for datasets with many observations. European Actuarial Journal, 15, 885–898. DOI: 10.1007/s13385-025-00412-z
