
Composite (spliced) severity regression with covariate-dependent thresholds for insurance pricing

insurance-composite

Composite (spliced) severity regression with covariate-dependent thresholds.

The problem

Standard severity GLMs fit one distribution across the whole claim range. That misfits most lines of business, where small claims and large claims follow different distributions.

Motor bodily injury claims: 90% are soft tissue injuries under £20k. The other 10% are catastrophic injuries that cost £200k–£5m and follow a completely different distribution. Fitting a single Gamma misrepresents both ends.

The right approach is to split at a threshold: a light body distribution below, a heavy tail distribution above. This is called a spliced or composite model. It directly prices the large-loss loading you were previously fudging with a manual factor.
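As a rough illustration of the splicing idea, the sketch below builds a two-piece density by hand: a lognormal renormalised to live below the threshold, and a Lomax (Pareto II) density for the excess above it. The function names and every parameter value are made up for illustration, and this is not the library's implementation; in particular, no continuity condition is imposed at the splice point here.

```python
import math

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def spliced_pdf(x, threshold, pi, mu, sigma, alpha, theta):
    """Spliced density: weight pi below the threshold, 1 - pi above it."""
    if x <= 0.0:
        return 0.0
    if x <= threshold:
        # Lognormal renormalised so it integrates to 1 on (0, threshold].
        return pi * lognormal_pdf(x, mu, sigma) / lognormal_cdf(threshold, mu, sigma)
    # Lomax density of the excess over the threshold.
    y = x - threshold
    return (1.0 - pi) * (alpha / theta) * (1.0 + y / theta) ** (-(alpha + 1.0))

def spliced_cdf(x, threshold, pi, mu, sigma, alpha, theta):
    """Distribution function matching spliced_pdf."""
    if x <= 0.0:
        return 0.0
    if x <= threshold:
        return pi * lognormal_cdf(x, mu, sigma) / lognormal_cdf(threshold, mu, sigma)
    y = x - threshold
    return pi + (1.0 - pi) * (1.0 - (1.0 + y / theta) ** (-alpha))

# Illustrative values only (not fitted to any data).
params = dict(threshold=20_000.0, pi=0.9, mu=8.0, sigma=1.0, alpha=2.0, theta=150_000.0)
for x in (5_000.0, 20_000.0, 100_000.0, 1_000_000.0):
    print(f"x={x:>12,.0f}  pdf={spliced_pdf(x, **params):.3e}  cdf={spliced_cdf(x, **params):.4f}")
```

The weight pi is the probability mass assigned to the body, so the distribution function equals pi exactly at the threshold.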

The unsolved problem in Python (until now) is regression. Claim severity depends on covariates — vehicle type, driver age, occupation class. When your model has individual-specific covariates, the natural splice point should vary by policyholder too. A young driver with a high-powered vehicle doesn't belong at the same threshold as a standard commuter.

This library implements composite severity regression where covariates drive the tail scale parameter, which automatically makes the threshold covariate-dependent via mode-matching.

What it does

Three composite models for V1:

| Model | Body | Tail | Threshold |
| --- | --- | --- | --- |
| LognormalBurrComposite | Lognormal | Burr XII | Mode-matching (data-driven, covariate-dependent) |
| LognormalGPDComposite | Lognormal | GPD | Fixed or profile likelihood |
| GammaGPDComposite | Gamma | GPD | Fixed or profile likelihood |

Plus CompositeSeverityRegressor, a scikit-learn-compatible wrapper that takes a feature matrix and makes per-policyholder severity predictions.

Why mode-matching with Burr XII, not GPD

GPD is the canonical EVT tail distribution. But mode-matching requires the tail distribution to have a positive, finite mode. For xi >= 0 the GPD density is strictly decreasing, so its mode sits at zero excess, the lower boundary of the tail. The range xi >= 0 covers essentially all insurance heavy-loss scenarios (xi is typically 0.1–0.5 for UK lines).
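A quick numerical check of this point, as a sketch: the GPD density of the excess over the threshold is evaluated on a grid for a few non-negative values of xi and tested for monotonic decrease. `gpd_pdf` and `is_strictly_decreasing` are hypothetical helpers written for this example, and the scale `sigma` is an arbitrary value.

```python
import math

def gpd_pdf(y, xi, sigma):
    """GPD density of the excess y >= 0 over the threshold."""
    if xi == 0.0:
        # The xi -> 0 limit is the exponential distribution.
        return math.exp(-y / sigma) / sigma
    return (1.0 + xi * y / sigma) ** (-1.0 / xi - 1.0) / sigma

def is_strictly_decreasing(xi, sigma, grid):
    dens = [gpd_pdf(y, xi, sigma) for y in grid]
    return all(a > b for a, b in zip(dens, dens[1:]))

sigma = 50_000.0                                  # arbitrary illustrative scale
grid = [i * 10_000.0 for i in range(51)]          # excesses from 0 to 500k
for xi in (0.0, 0.1, 0.3, 0.5):
    print(f"xi={xi}: strictly decreasing on grid = {is_strictly_decreasing(xi, sigma, grid)}")
```

Because the density is highest at zero excess, there is no interior mode to match the threshold to.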

With density f(x) = (alpha*delta/beta) * (x/beta)^(alpha-1) * [1 + (x/beta)^alpha]^(-(delta+1)), Burr XII has mode = beta * [(alpha-1)/(delta*alpha+1)]^{1/alpha} for alpha > 1. This is tractable and positive, making it the natural choice for mode-matching composite models. The mode is proportional to the scale beta, so the covariate-dependent threshold falls out automatically as each policyholder's beta varies with their covariates.
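The sketch below checks the Burr XII mode numerically and shows how a covariate-driven scale moves the threshold. It assumes the parameterisation in which alpha is the inner shape (the power on x/beta), delta is the outer shape and beta is the scale, so the mode is beta * [(alpha-1)/(alpha*delta+1)]^(1/alpha). The log link for beta, the coefficient values and all function names are assumptions made for illustration, not the library's API.

```python
import math

def burr12_pdf(x, alpha, delta, beta):
    """Burr XII density: inner shape alpha, outer shape delta, scale beta."""
    u = (x / beta) ** alpha
    return (alpha * delta / x) * u * (1.0 + u) ** (-(delta + 1.0))

def burr12_mode(alpha, delta, beta):
    """Closed-form mode; positive only when alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("Burr XII has no positive mode when alpha <= 1")
    return beta * ((alpha - 1.0) / (alpha * delta + 1.0)) ** (1.0 / alpha)

def grid_argmax(alpha, delta, beta, points=2_001):
    """Brute-force check: locate the density maximum on a grid around the mode."""
    centre = burr12_mode(alpha, delta, beta)
    xs = [centre * (0.5 + i / (points - 1)) for i in range(points)]
    return max(xs, key=lambda x: burr12_pdf(x, alpha, delta, beta))

def threshold_for(features, coefs, intercept, alpha, delta):
    """Hypothetical log link: beta_i = exp(intercept + sum_j coef_j * x_ij)."""
    beta_i = math.exp(intercept + sum(c * x for c, x in zip(coefs, features)))
    return burr12_mode(alpha, delta, beta_i)

alpha, delta = 3.0, 0.8                      # illustrative shapes
print(burr12_mode(alpha, delta, 100_000.0), grid_argmax(alpha, delta, 100_000.0))

# Two made-up policyholders described by (young_driver, high_power) indicators.
coefs, intercept = [0.4, 0.3], 11.0
standard = threshold_for([0, 0], coefs, intercept, alpha, delta)
risky = threshold_for([1, 1], coefs, intercept, alpha, delta)
print(f"standard: {standard:,.0f}  risky: {risky:,.0f}")
```

Under the log link the threshold ratio between two policyholders depends only on the difference in their linear predictors, here exp(0.4 + 0.3).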

If you try to combine GPD with mode-matching, the library raises ValueError with a clear explanation rather than silently failing.

Installation

pip install insurance-composite

With plotting support:

pip install insurance-composite[plotting]

Quick start

import numpy as np
from insurance_composite import LognormalBurrComposite, CompositeSeverityRegressor

# claim_amounts is a placeholder: a 1-D array of positive claim sizes loaded elsewhere
# Fit without covariates (mode-matching finds threshold automatically)
model = LognormalBurrComposite(threshold_method="mode_matching")
model.fit(claim_amounts)

print(f"Threshold: £{model.threshold_:,.0f}")
print(f"Body weight: {model.pi_:.2%} of claims below threshold")
print(model.summary(claim_amounts))

# Value at Risk and TVaR
print(f"99th percentile: £{model.var(0.99):,.0f}")
print(f"TVaR(99%): £{model.tvar(0.99):,.0f}")

# ILF at standard motor BI limits
for lim in [250_000, 500_000, 1_000_000]:
    print(f"ILF({lim:,}): {model.ilf(lim, basic_limit=250_000):.4f}")

model.plot_fit(claim_amounts)

# With covariates (covariate-dependent threshold)
reg = CompositeSeverityRegressor(
    composite=LognormalBurrComposite(threshold_method="mode_matching"),
    feature_cols=["vehicle_age", "driver_age", "region"],
)
reg.fit(X_train, y_train)

# Each policyholder gets their own threshold
thresholds = reg.predict_thresholds(X_test)
print(f"Threshold range: £{thresholds.min():,.0f} – £{thresholds.max():,.0f}")

# ILF schedule per policyholder
ilf = reg.compute_ilf(
    X_test,
    limits=[50_000, 100_000, 250_000, 500_000, 1_000_000],
    basic_limit=250_000,
)

# GPD tail with fixed threshold (when you have a natural attachment point)
from insurance_composite import LognormalGPDComposite

model = LognormalGPDComposite(threshold=100_000.0, threshold_method="fixed")
model.fit(claims_above_deductible)

# Profile likelihood threshold (data-driven, no mode-matching constraint)
from insurance_composite import GammaGPDComposite

model = GammaGPDComposite(threshold_method="profile_likelihood")
model.fit(claim_amounts)
print(f"Selected threshold: £{model.threshold_:,.0f}")
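For readers unfamiliar with increased limit factors, the quantity printed by the ILF calls above is conventionally the ratio of limited expected values, ILF(limit) = E[min(X, limit)] / E[min(X, basic_limit)]. The sketch below computes it by hand for a Lomax (Pareto II) severity, which has a closed-form limited expected value. The Lomax stand-in, its parameter values and the helper names are assumptions for illustration; the library's own `ilf` works from the fitted composite.

```python
def lomax_lev(u, alpha, theta):
    """E[min(X, u)] for a Lomax(alpha, theta) loss; requires alpha != 1."""
    return theta / (alpha - 1.0) * (1.0 - (theta / (u + theta)) ** (alpha - 1.0))

def increased_limit_factor(limit, basic_limit, lev):
    """Ratio of limited expected values at the limit and at the basic limit."""
    return lev(limit) / lev(basic_limit)

def lev(u):
    # Illustrative severity only: tail index 2, scale 100k.
    return lomax_lev(u, alpha=2.0, theta=100_000.0)

basic_limit = 250_000
for lim in [50_000, 100_000, 250_000, 500_000, 1_000_000]:
    print(f"ILF({lim:,}): {increased_limit_factor(lim, basic_limit, lev):.4f}")
```

Limits below the basic limit give factors under 1, and the factor increases with the limit at a decreasing rate because each extra pound of cover is hit less often.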

Diagnostics

from insurance_composite.diagnostics import (
    quantile_residuals,
    density_overlay_plot,
    qq_plot,
    mean_excess_plot,
)

# Randomized quantile residuals (Dunn & Smyth 1996)
# Should be N(0,1) under correct specification
resid = quantile_residuals(model, claim_amounts)

# Visual checks
density_overlay_plot(model, claim_amounts)   # fitted density on histogram
qq_plot(model, claim_amounts)               # model vs empirical quantiles
mean_excess_plot(claim_amounts)             # threshold guidance
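To make the residual check concrete: for a continuous severity model, the Dunn and Smyth quantile residual reduces to r = Phi^{-1}(F(y)), where F is the fitted distribution function, and randomisation is only needed where the distribution has point masses. The sketch below applies this to simulated exponential claims as a stand-in model. `quantile_residuals_continuous` is a hypothetical helper, not the library function imported above.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def quantile_residuals_continuous(cdf, observations):
    """Map each observation through the model CDF, then the standard normal quantile."""
    std_normal = NormalDist()
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for inv_cdf
    residuals = []
    for y in observations:
        u = min(max(cdf(y), eps), 1.0 - eps)
        residuals.append(std_normal.inv_cdf(u))
    return residuals

# Simulated claims from an exponential distribution (illustrative stand-in).
rng = random.Random(0)
true_mean = 5_000.0
sample = [rng.expovariate(1.0 / true_mean) for _ in range(5_000)]

def exponential_cdf(y, scale=true_mean):
    return 1.0 - math.exp(-y / scale)

resid = quantile_residuals_continuous(exponential_cdf, sample)
print(f"correct model:      mean={mean(resid):.3f}, sd={stdev(resid):.3f}")

# A misspecified model (scale twice too large) shifts the residuals away from 0.
bad = quantile_residuals_continuous(lambda y: exponential_cdf(y, 2.0 * true_mean), sample)
print(f"misspecified model: mean={mean(bad):.3f}, sd={stdev(bad):.3f}")
```

Under the correct model the residuals should look standard normal; a clear shift in the mean or a spread far from 1 points to misspecification.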

UK insurance applications

Motor BI: Body captures typical whiplash/minor injury claims (£500–£20k). Tail captures serious injury (£50k–£5m). GPD tail index xi ≈ 0.2–0.4 for UK motor BI.

Employers' Liability: Occupation class enters the tail scale. Different industries have wildly different tail behaviour (asbestos claims vs. office injuries). The composite threshold shifts with occupation.

PI/D&O: Deductible is the natural threshold — set threshold_method='fixed' at the deductible level. GPD above the deductible is standard actuarial practice.

Reinsurance pricing: XL layer (d, d+L] pricing uses model.limited_expected_value(d+L) - model.limited_expected_value(d) directly from the fitted composite.
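The layer formula works because E[min(X, u)] is the integral of the survival function from 0 to u, so the difference of two limited expected values is the integral of the survival function across the layer. The sketch below verifies this numerically with a trapezoid rule and a Lomax survival function as a stand-in severity. All names and parameter values are illustrative assumptions, not calls into the library.

```python
def integrate(f, a, b, steps=20_000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def limited_expected_value(sf, u):
    """E[min(X, u)] as the integral of the survival function on [0, u]."""
    return integrate(sf, 0.0, u)

def layer_expected_loss(sf, d, width):
    """Expected loss to the layer (d, d + width] per claim."""
    return limited_expected_value(sf, d + width) - limited_expected_value(sf, d)

def lomax_sf(x, alpha=2.0, theta=100_000.0):
    # Illustrative heavy-tailed survival function.
    return (theta / (x + theta)) ** alpha

d, width = 500_000.0, 500_000.0          # a 500k xs 500k layer
via_lev = layer_expected_loss(lomax_sf, d, width)
direct = integrate(lomax_sf, d, d + width)
print(f"LEV difference: {via_lev:,.2f}  direct integral: {direct:,.2f}")
```

Multiplying the per-claim layer loss by expected claim frequency gives the expected loss cost of the layer, before any loadings.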

Comparison to R packages

| Capability | ReIns | evmix | insurance-composite |
| --- | --- | --- | --- |
| Composite body + tail | ME-Pareto | Various-GPD | LN/Gamma + Burr/GPD |
| Regression covariates | No | No | Yes |
| Covariate-dependent threshold | No | No | Yes (mode-matching) |
| Mode-matching threshold | No | No | Yes (Burr XII) |
| Profile likelihood threshold | No | Yes | Yes |
| scikit-learn API | No | No | Yes |
| Python | No | No | Yes |

The R packages handle univariate fitting well. This library's differentiator is regression with covariate-dependent thresholds — none of the R packages do that.

Methodology

Based on:

  • Liu, Li, Shi (2024) — GBII composite regression with varying threshold. Insurance: Mathematics and Economics.
  • Fung et al. (2022) — Lognormal-T composite regression. arXiv:2208.01262.
  • Reynkens et al. (2017) — ME-Pareto splicing with censoring. IME 77:65-77.

The mode-matching approach sets the threshold equal to the tail distribution's mode. This guarantees C1 continuity at the splice point: both the density and its derivative are continuous. In practice it also stabilises estimation because the threshold is a function of the tail parameters (the Burr XII scale and shapes) rather than a separately estimated parameter.

V2 roadmap

  • Mixed Erlang body (EM algorithm, dense nonparametric body)
  • Soft splicing (Fung-Jeong-Tzougas 2024)
  • Censored data support (Reynkens 2017)
  • Full GBII distribution family

License

MIT
