Causal mediation analysis for insurance pricing fairness — decomposes postcode effects into legitimate and potentially discriminatory components

insurance-mediation

Causal mediation analysis for insurance pricing fairness.

The problem

Your motor insurer prices by postcode. The FCA asks whether your postcode ratings are acting as a proxy for ethnicity or religion. You say no — postcodes reflect genuine risk differences like crime, flood exposure, and socioeconomic deprivation. But how much of your postcode differential is actually explained by those legitimate factors, versus being a direct postcode effect that cannot be justified?

That's a mediation question. You want to decompose:

Postcode → (IMD deprivation) → Claim rate
    ↘_________________________↗
           direct path

The indirect effect (through IMD) is defensible: postcodes with worse deprivation have genuinely different risk profiles and IMD is a legitimate rating factor. The direct effect (postcode on claims, holding IMD fixed) is the part that requires scrutiny — if it's substantial, postcode is doing something that deprivation alone doesn't explain.

This library implements that decomposition with proper causal inference methodology for GLM-based insurance models (Poisson, Gamma, Tweedie) and produces an FCA-ready audit report.

Three estimands, because the methodology choice matters

CDE (Controlled Direct Effect) — default for FCA compliance.

"If we intervened to set everyone's IMD deprivation to the national median, what would the remaining postcode price differential be?"

CDE requires the weakest causal assumptions (no cross-world counterfactuals needed). If CDE ≈ 0, the postcode effect is fully explained by deprivation. If CDE is large, postcode is doing something beyond deprivation.
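To make the CDE definition concrete, here is a minimal sketch that computes it by hand on synthetic data. It does not use this library: the data, the coefficients, and the helper names `fit_poisson` and `cde_log_scale` are all illustrative assumptions, and the Poisson GLM is fitted with a few Newton-Raphson steps in NumPy purely to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic portfolio (all names and coefficients are illustrative).
a = rng.integers(0, 2, n)                                       # 1 = treatment postcode group
m = np.clip(np.round(5 + 2 * a + rng.normal(0, 2, n)), 1, 10)   # IMD decile, shifted by postcode
age = rng.normal(0, 1, n)                                       # standardised driver age
y = rng.poisson(np.exp(-2.0 + 0.08 * a + 0.05 * m + 0.10 * age))


def fit_poisson(X, y, n_iter=25):
    """Poisson GLM with log link, fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def cde_log_scale(beta, m_level, age):
    """log E[Y(a=1, m)] - log E[Y(a=0, m)], averaged over the covariates."""
    ones = np.ones_like(age)
    x1 = np.column_stack([ones, ones, m_level * ones, age])
    x0 = np.column_stack([ones, 0 * ones, m_level * ones, age])
    return np.log(np.exp(x1 @ beta).mean()) - np.log(np.exp(x0 @ beta).mean())


X = np.column_stack([np.ones(n), a, m, age])
beta = fit_poisson(X, y)

# Set everyone's IMD to decile 5, then compare the two postcode groups.
cde = cde_log_scale(beta, m_level=5, age=age)
print(f"CDE at decile 5 (log scale): {cde:+.4f}")
```

Because this toy outcome model has no postcode-by-IMD interaction, the CDE equals the postcode coefficient whatever mediator level is chosen. With an interaction term it would vary with `m_level`, which is why the mediator level has to be stated alongside the estimate.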

NDE/NIE (Natural Direct/Indirect Effects) — academic decomposition.

Decomposes the total postcode effect into:

  • NIE: the part that operates through IMD
  • NDE: the part that does not

NDE + NIE = Total Effect. Requires stronger assumptions (sequential ignorability plus no treatment-induced mediator-outcome confounders). More informative but more assumptions to defend.

Total Effect — the overall A → Y causal effect, estimated from the fitted outcome model.
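For reference, the three estimands can be written in counterfactual notation. This is a sketch using one common convention (pure direct effect, total indirect effect), which is an assumption here rather than something stated above. Y(a, m) is the outcome with treatment set to a and mediator set to m, M(a) is the mediator under treatment a, and a* is the control value.

```latex
\begin{aligned}
\mathrm{CDE}(m) &= E[Y(a, m)] - E[Y(a^{*}, m)] \\
\mathrm{NDE}    &= E[Y(a, M(a^{*}))] - E[Y(a^{*}, M(a^{*}))] \\
\mathrm{NIE}    &= E[Y(a, M(a))] - E[Y(a, M(a^{*}))] \\
\mathrm{TE}     &= E[Y(a, M(a))] - E[Y(a^{*}, M(a^{*}))] = \mathrm{NDE} + \mathrm{NIE}
\end{aligned}
```

The term E[Y(a, M(a*))] is the cross-world quantity: the outcome under one treatment value with the mediator taken from the other. CDE never needs it, which is why its assumptions are weaker. On the log-mean scale each expectation is replaced by its logarithm, and the sum NDE + NIE = TE still holds exactly.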

The library is opinionated: CDE is the right default for regulatory work because you can actually defend the assumptions in front of the FCA. NDE/NIE are provided for completeness and academic reporting.

Quick start

from insurance_mediation import MediationAnalysis

ma = MediationAnalysis(
    outcome_model="poisson",   # or "gamma", "tweedie", "gaussian"
    mediator_model="linear",   # or "logistic" for binary mediator (flood zone)
    exposure_col="exposure",   # for Poisson/Tweedie: log(exposure) offset
    n_mc_samples=1000,         # Monte Carlo samples for NDE/NIE
    n_bootstrap=500,           # bootstrap replicates for CIs
)

results = ma.fit(
    data=df,
    treatment="postcode_group",
    mediator="imd_decile",
    outcome="claim_count",
    covariates=["vehicle_age", "driver_age", "cover_type"],
    treatment_value="E1",
    control_value="SW1",
)

# Controlled Direct Effect: what if IMD were equalised to decile 5?
print(results.cde(mediator_level=5))
# EffectEstimate(CDE: +0.0821 [+0.0312, +0.1330] *)

# Natural effects decomposition
print(results.nde())   # EffectEstimate(NDE: +0.0834 [...])
print(results.nie())   # EffectEstimate(NIE: +0.0624 [...])
print(results.total_effect())  # EffectEstimate(TE: +0.1458 [...])

# How much is explained by IMD?
te = results.total_effect().effect
nie = results.nie().effect
print(f"{nie/te:.0%} of the price differential operates through IMD")

# Sensitivity: how strong must unmeasured confounding be to explain away the NIE?
results.sensitivity(rho_range=(-0.5, 0.5))

# FCA-ready HTML report
results.report(
    protected_attribute="ethnicity",
    title="Postcode-IMD Mediation Analysis — Motor 2024",
    output="mediation_report.html"
)
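The quick start assumes a policy-level DataFrame `df` already exists. If you just want to try the API, a synthetic frame with the expected columns can be built as below. This is a minimal sketch: the column names match the quick start, but every distribution and coefficient is made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Two postcode groups, matching treatment_value / control_value above.
postcode_group = rng.choice(["E1", "SW1"], size=n)
is_e1 = (postcode_group == "E1").astype(float)

# IMD decile depends on postcode group (the mediator path).
imd_decile = np.clip(np.round(5 - 1.5 * is_e1 + rng.normal(0, 2, n)), 1, 10).astype(int)

exposure = rng.uniform(0.25, 1.0, n)   # policy years
rate = np.exp(-2.0 + 0.08 * is_e1 - 0.04 * imd_decile)

df = pd.DataFrame({
    "postcode_group": postcode_group,
    "imd_decile": imd_decile,
    "exposure": exposure,
    "vehicle_age": rng.integers(0, 15, n),
    "driver_age": rng.integers(18, 80, n),
    "cover_type": rng.choice(["comprehensive", "third_party"], size=n),
    "claim_count": rng.poisson(rate * exposure),
})
print(df.head())
```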

GLM families

The library handles the non-linearity of insurance GLMs explicitly. For Poisson, Gamma, and Tweedie models the indirect effect is not computed as a product of regression coefficients (the Baron & Kenny approach), because that shortcut is only reliable for linear models. Instead, NDE/NIE are estimated via Monte Carlo integration over the mediator distribution (VanderWeele 2015, extended to GLMs).
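The idea behind the Monte Carlo step can be sketched in a few lines of NumPy. This is not the library's implementation. The coefficients are hypothetical, the mediator is binary (think flood zone), and `naive_nie` is just one way of writing a linear-style product-of-coefficients shortcut, included for comparison.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical fitted coefficients (illustrative values only).
b0, b_a, b_m = -2.0, 0.08, 0.40   # outcome:  log E[Y] = b0 + b_a*a + b_m*m
g0, g_a = -1.5, 1.0               # mediator: logit P(M=1) = g0 + g_a*a


def p_mediator(a):
    """P(M = 1 | A = a) under the logistic mediator model."""
    return 1.0 / (1.0 + np.exp(-(g0 + g_a * a)))


# Common random numbers: the same uniforms drive every mediator draw,
# which removes Monte Carlo noise from the contrasts.
u = rng.uniform(size=200_000)


def mean_outcome(a, a_star):
    """Monte Carlo estimate of E[Y(a, M(a_star))]."""
    m = (u < p_mediator(a_star)).astype(float)
    return np.exp(b0 + b_a * a + b_m * m).mean()


y00 = mean_outcome(0, 0)
y10 = mean_outcome(1, 0)   # cross-world term: treated outcome, control mediator
y11 = mean_outcome(1, 1)

nde = np.log(y10) - np.log(y00)
nie = np.log(y11) - np.log(y10)
te = np.log(y11) - np.log(y00)

# Linear-style shortcut: mediator coefficient times the shift in the mediator mean.
naive_nie = b_m * (p_mediator(1) - p_mediator(0))

print(f"NDE={nde:+.4f}  NIE={nie:+.4f}  TE={te:+.4f}  naive NIE={naive_nie:+.4f}")
```

On the log scale NDE + NIE equals TE by construction. The simulated NIE and the shortcut disagree because the log link makes the mean outcome a non-linear function of the mediator distribution.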

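Outcome    | Family                  | Link     | Use case
-----------|-------------------------|----------|----------------------------------
"poisson"  | Poisson                 | log      | Claim frequency
"gamma"    | Gamma                   | log      | Claim severity
"tweedie"  | Tweedie (var_power=1.5) | log      | Burning cost
"gaussian" | Gaussian                | identity | Benchmarking, continuous outcomes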

Effects for log-link models are on the log-mean scale (log rate ratios). Use results.nde().ratio to get the multiplicative relativity (e.g., 1.09 = 9% higher claim rate).
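The conversion between the two scales is just exponentiation. As a quick check by hand, using the point estimates printed in the quick start (plain Python, no library calls):

```python
import math

# Log-scale point estimates copied from the quick start output.
nde_log, nie_log, te_log = 0.0834, 0.0624, 0.1458

nde_ratio = math.exp(nde_log)   # multiplicative relativity for the direct path
nie_ratio = math.exp(nie_log)   # multiplicative relativity for the IMD path
te_ratio = math.exp(te_log)

# Additive on the log scale means multiplicative on the ratio scale.
share_via_imd = nie_log / te_log

print(f"NDE ratio: {nde_ratio:.3f}")
print(f"TE ratio:  {te_ratio:.3f} (NDE ratio x NIE ratio = {nde_ratio * nie_ratio:.3f})")
print(f"Share through IMD: {share_via_imd:.0%}")
```

Note that the "share explained" is computed on the log scale, where NDE and NIE add up to TE. The same share computed from ratios would give a different number.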

Pre-fitted models

You can pass a pre-fitted statsmodels GLM instead of a family string. This is useful when you have a production pricing model with interactions, splines, and offsets that you want to use directly:

import statsmodels.api as sm

# Your production GLM (already fitted)
fitted_glm = sm.GLM(y, X, family=sm.families.Poisson(), offset=offset).fit()

ma = MediationAnalysis(
    outcome_model=fitted_glm,
    mediator_model="linear",
    exposure_col="exposure",
)

Note: bootstrap confidence intervals require the model to be refittable from a formula string. Pre-fitted models without a stored formula will produce point estimates only.
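The refit requirement follows from how a bootstrap works: every replicate resamples the policies and re-estimates the effect from scratch. A minimal percentile-bootstrap sketch in NumPy is shown below. The percentile method, the toy statistic, and the helper names are assumptions chosen for illustration, and the library's own resampling scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Toy data: two postcode groups with different claim rates.
a = rng.integers(0, 2, n)
y = rng.poisson(np.exp(-2.0 + 0.3 * a))


def log_rate_ratio(a, y):
    """Stand-in for 'refit the models and recompute the effect'."""
    return np.log(y[a == 1].mean()) - np.log(y[a == 0].mean())


def percentile_bootstrap(stat, a, y, n_boot=500, alpha=0.05):
    """Percentile CI: resample rows with replacement, recompute, take quantiles."""
    size = len(y)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, size, size)
        reps[b] = stat(a[idx], y[idx])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi


point = log_rate_ratio(a, y)
lo, hi = percentile_bootstrap(log_rate_ratio, a, y)
print(f"{point:+.4f} [{lo:+.4f}, {hi:+.4f}]")
```

A pre-fitted model object with no formula cannot be re-estimated on the resampled rows, so only the `point` line of this procedure is available.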

Sensitivity analysis

Sequential ignorability — the key assumption for causal mediation — cannot be verified from data. The sensitivity analysis shows how robust the NIE is to violations of this assumption.

Two tools are provided:

Imai et al. (2010) rho sensitivity: varies the residual correlation between outcome and mediator models (rho). Under sequential ignorability, rho = 0. The analysis reports the rho at which the NIE would cross zero.

E-value (VanderWeele & Ding 2017): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away the NIE. Larger values mean a more robust finding; as a rough rule of thumb, E > 2 is treated here as reasonably robust.

sens = results.sensitivity(rho_range=(-0.5, 0.5))
print(sens)
# SensitivityResult(E-value=2.847, E-value(CI)=1.923, rho_at_zero=0.312)
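For a point estimate expressed as a risk ratio, the E-value has a closed form given by VanderWeele & Ding (2017): E = RR + sqrt(RR × (RR − 1)), with RR inverted first if it is below 1. The helper below is a standalone sketch of that formula (the function name `e_value` is illustrative and is not part of this library's API).

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk ratio, per VanderWeele & Ding (2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = max(rr, 1.0 / rr)   # protective effects are handled by inverting
    return rr + math.sqrt(rr * (rr - 1.0))


print(round(e_value(2.0), 3))   # 2 + sqrt(2)
print(e_value(1.0))             # a null effect needs no confounding to explain it
```

The same formula applied to the confidence limit closest to 1 gives the E-value for the interval.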

FCA report

The HTML report is designed to be attached to a pricing actuarial function report or submitted as evidence for FCA data ethics review. It includes:

  • Causal DAG (SVG, self-contained)
  • Effect decomposition table (CDE, NDE, NIE, TE) with 95% CIs on log and ratio scale
  • Fairness interpretation (percentage of differential explained by mediator)
  • Sensitivity analysis table
  • Identification assumptions (plain English, per estimand)
  • Template Section 19 proportionality statement

No binary dependencies (wkhtmltopdf, headless Chrome). Pure Python / Jinja2.

Causal identification assumptions

Be honest about what this requires:

For CDE (weakest):

  • C1: No unmeasured treatment-outcome confounders given covariates
  • C2: No unmeasured mediator-outcome confounders given (treatment, covariates)
  • C3: No unmeasured treatment-mediator confounders given covariates

C4 (no treatment-induced mediator-outcome confounders) is not required for CDE. This is why CDE is the preferred estimand for regulatory use.

For NDE/NIE (stronger, adds):

  • C4: No treatment-induced mediator-outcome confounders. In practice: no other postcode-level variable (beyond IMD) that also confounds the IMD → claim rate relationship. This is harder to defend when postcodes affect crime, flood, housing density, and all of these correlate with IMD.

If you can't defend C4, report the CDE only and state explicitly that the natural effects are not being claimed.

Installation

pip install insurance-mediation

Dependencies: numpy, scipy, pandas, statsmodels, jinja2. No heavy ML deps.

Optional visualisation: pip install insurance-mediation[viz] adds matplotlib and networkx for causal DAG plots.

Scope

v0.1.0: single mediator. Multiple simultaneous mediators (e.g., IMD and flood zone and crime rate) are deferred to v0.2.0 where we'll use interventional indirect effects (VanderWeele & Vansteelandt 2009) which avoid the cross-world counterfactual issues that arise with multiple mediators.

Methodology references

  • VanderWeele, T.J. (2015). Explanation in Causal Inference. Oxford University Press.
  • Imai, K., Keele, L., Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods 15(4): 309–334.
  • VanderWeele, T.J. & Ding, P. (2017). Sensitivity analysis in observational research: introducing the E-value. Annals of Internal Medicine 167(4): 268–274.
  • Robins, J.M. & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology 3(2): 143–155.
  • Jackson, J.W. (2021). Meaningful causal decompositions in health equity research. Epidemiology 32(2): 230–239.
