Composite (spliced) severity regression with covariate-dependent thresholds for insurance pricing
insurance-composite
Composite (spliced) severity regression with covariate-dependent thresholds.
The problem
Standard severity GLMs fit one distribution across the whole claim range. That is a poor fit for most lines of business, where attritional and large claims behave differently.
Motor bodily injury claims: 90% are soft tissue injuries under £20k. The other 10% are catastrophic injuries that cost £200k–£5m and follow a completely different distribution. Fitting a single Gamma misrepresents both ends.
The right approach is to split at a threshold: a light body distribution below, a heavy tail distribution above. This is called a spliced or composite model. It directly prices the large-loss loading you were previously fudging with a manual factor.
The unsolved problem in Python (until now) is regression. Claim severity depends on covariates — vehicle type, driver age, occupation class. When your model has individual-specific covariates, the natural splice point should vary by policyholder too. A young driver with a high-powered vehicle doesn't belong at the same threshold as a standard commuter.
This library implements composite severity regression where covariates drive the tail scale parameter, which automatically makes the threshold covariate-dependent via mode-matching.
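To make "body below, tail above" concrete, here is a minimal standard-library sketch of a spliced density with a fixed threshold. It is not this library's implementation. The function names, the Burr XII parametrization (tail shape `alpha`, inner shape `delta`, scale `beta`) and all parameter values are assumptions chosen for illustration.

```python
import math

def lognorm_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognorm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def burr_pdf(x, alpha, delta, beta):
    # Assumed parametrization: F(x) = 1 - (1 + (x / beta) ** delta) ** (-alpha)
    t = (x / beta) ** delta
    return alpha * delta * t / (x * (1.0 + t) ** (alpha + 1.0))

def burr_sf(x, alpha, delta, beta):
    return (1.0 + (x / beta) ** delta) ** (-alpha)

def composite_pdf(x, u, pi_body, body, tail):
    """Body truncated to (0, u] with weight pi_body, tail truncated to (u, inf)."""
    if x <= u:
        return pi_body * lognorm_pdf(x, *body) / lognorm_cdf(u, *body)
    return (1.0 - pi_body) * burr_pdf(x, *tail) / burr_sf(u, *tail)

def composite_cdf(x, u, pi_body, body, tail):
    if x <= u:
        return pi_body * lognorm_cdf(x, *body) / lognorm_cdf(u, *body)
    return 1.0 - (1.0 - pi_body) * burr_sf(x, *tail) / burr_sf(u, *tail)

# Made-up values: 90% of claims below a 20k threshold
u, pi_body = 20_000.0, 0.90
body = (8.0, 1.0)            # lognormal mu, sigma
tail = (1.5, 2.0, 30_000.0)  # Burr alpha, delta, beta

print(composite_cdf(u, u, pi_body, body, tail))         # mass below the threshold
print(composite_pdf(u, u, pi_body, body, tail),         # density just at ...
      composite_pdf(u + 1e-6, u, pi_body, body, tail))  # ... and just above u
```

With an arbitrary fixed threshold like this one, the two printed densities generally differ, so the spliced density jumps at `u`. Removing that jump is what the threshold rules described in this README are for.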
What it does
Three composite models for V1:
| Model | Body | Tail | Threshold |
|---|---|---|---|
| LognormalBurrComposite | Lognormal | Burr XII | Mode-matching (data-driven, covariate-dependent) |
| LognormalGPDComposite | Lognormal | GPD | Fixed or profile likelihood |
| GammaGPDComposite | Gamma | GPD | Fixed or profile likelihood |
Plus CompositeSeverityRegressor, a scikit-learn-compatible wrapper that takes a feature matrix and makes per-policyholder severity predictions.
Why mode-matching with Burr XII, not GPD
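GPD is the canonical EVT tail distribution, but mode-matching requires the tail distribution to have a positive, finite mode. For xi >= 0 the GPD density is strictly decreasing, so its mode sits at the lower endpoint (zero excess over the threshold). That range of xi covers essentially all heavy-tailed insurance lines (xi is typically 0.1–0.5 for UK lines), so GPD cannot be used with mode-matching.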
Writing the Burr XII density as f(x) ∝ (x/beta)^{delta-1} * [1 + (x/beta)^delta]^{-(alpha+1)}, the mode is beta * [(delta-1)/(delta*alpha+1)]^{1/delta}, which is positive and finite for delta > 1. This is tractable, making Burr XII the natural choice for mode-matching composite models. The covariate-dependent threshold then falls out automatically as each policyholder's Burr scale beta varies with their covariates.
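The closed-form mode can be checked numerically. The sketch below assumes the Burr XII CDF F(x) = 1 - (1 + (x/beta)^delta)^(-alpha), for which the mode is beta * ((delta-1)/(alpha*delta+1))^(1/delta) when delta > 1. The helper names, the log-link on beta and the coefficient values are hypothetical, not taken from the library.

```python
import math

def burr_pdf(x, alpha, delta, beta):
    # Density of F(x) = 1 - (1 + (x / beta) ** delta) ** (-alpha)
    t = (x / beta) ** delta
    return alpha * delta * t / (x * (1.0 + t) ** (alpha + 1.0))

def burr_mode(alpha, delta, beta):
    if delta <= 1.0:
        raise ValueError("no positive mode unless delta > 1")
    return beta * ((delta - 1.0) / (alpha * delta + 1.0)) ** (1.0 / delta)

def grid_argmax(alpha, delta, beta, hi, n=100_000):
    # Brute-force check: the grid point in (0, hi] with the highest density
    xs = (hi * (i + 1) / n for i in range(n))
    return max(xs, key=lambda x: burr_pdf(x, alpha, delta, beta))

alpha, delta = 1.5, 2.0
print(burr_mode(alpha, delta, 30_000.0), grid_argmax(alpha, delta, 30_000.0, hi=100_000.0))

# Hypothetical log-link for the tail scale: beta_i = exp(b0 + b1 * young_driver_i)
b0, b1 = math.log(30_000.0), 0.4
for young_driver in (0, 1):
    beta_i = math.exp(b0 + b1 * young_driver)
    print(young_driver, round(burr_mode(alpha, delta, beta_i)))
```

Because the mode is proportional to beta, any covariate that scales beta moves the threshold by the same factor, with alpha and delta held fixed.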
If you try to combine GPD with mode-matching, the library raises ValueError with a clear explanation rather than silently failing.
Installation
pip install insurance-composite
With plotting support:
pip install insurance-composite[plotting]
Quick start
import numpy as np
from insurance_composite import LognormalBurrComposite, CompositeSeverityRegressor
# claim_amounts: 1-D array of positive claim sizes, assumed to be loaded already
# Fit without covariates (mode-matching finds threshold automatically)
model = LognormalBurrComposite(threshold_method="mode_matching")
model.fit(claim_amounts)
print(f"Threshold: £{model.threshold_:,.0f}")
print(f"Body weight: {model.pi_:.2%} of claims below threshold")
print(model.summary(claim_amounts))
# Value at Risk and TVaR
print(f"99th percentile: £{model.var(0.99):,.0f}")
print(f"TVaR(99%): £{model.tvar(0.99):,.0f}")
# ILF at standard motor BI limits
for lim in [250_000, 500_000, 1_000_000]:
    print(f"ILF({lim:,}): {model.ilf(lim, basic_limit=250_000):.4f}")
model.plot_fit(claim_amounts)
# With covariates (covariate-dependent threshold)
reg = CompositeSeverityRegressor(
    composite=LognormalBurrComposite(threshold_method="mode_matching"),
    feature_cols=["vehicle_age", "driver_age", "region"],
)
reg.fit(X_train, y_train)
# Each policyholder gets their own threshold
thresholds = reg.predict_thresholds(X_test)
print(f"Threshold range: £{thresholds.min():,.0f} – £{thresholds.max():,.0f}")
# ILF schedule per policyholder
ilf = reg.compute_ilf(
    X_test,
    limits=[50_000, 100_000, 250_000, 500_000, 1_000_000],
    basic_limit=250_000,
)
# GPD tail with fixed threshold (when you have a natural attachment point)
from insurance_composite import LognormalGPDComposite
model = LognormalGPDComposite(threshold=100_000.0, threshold_method="fixed")
model.fit(claims_above_deductible)
# Profile likelihood threshold (data-driven, no mode-matching constraint)
from insurance_composite import GammaGPDComposite
model = GammaGPDComposite(threshold_method="profile_likelihood")
model.fit(claim_amounts)
print(f"Selected threshold: £{model.threshold_:,.0f}")
Diagnostics
from insurance_composite.diagnostics import (
    quantile_residuals,
    density_overlay_plot,
    qq_plot,
    mean_excess_plot,
)
# Randomized quantile residuals (Dunn & Smyth 1996)
# Should be N(0,1) under correct specification
resid = quantile_residuals(model, claim_amounts)
# Visual checks
density_overlay_plot(model, claim_amounts) # fitted density on histogram
qq_plot(model, claim_amounts) # model vs empirical quantiles
mean_excess_plot(claim_amounts) # threshold guidance
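For intuition on the quantile residual check, here is a standalone standard-library sketch. For a continuous severity distribution with fitted CDF F, the residual is r = Phi^{-1}(F(y)), and no randomization is needed (randomization only matters where F has jumps). The function name `quantile_residuals_sketch` and the exponential toy data are assumptions for illustration and are not how the library's `quantile_residuals` is implemented.

```python
import math
import random
import statistics

def quantile_residuals_sketch(cdf, claims, eps=1e-12):
    """Map each claim through the fitted CDF, then through the normal quantile."""
    std_normal = statistics.NormalDist()
    residuals = []
    for y in claims:
        p = min(max(cdf(y), eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        residuals.append(std_normal.inv_cdf(p))
    return residuals

random.seed(0)
mean_claim = 5_000.0
claims = [random.expovariate(1.0 / mean_claim) for _ in range(5_000)]

# Correctly specified model: residuals should look like N(0, 1)
good = quantile_residuals_sketch(lambda y: 1.0 - math.exp(-y / mean_claim), claims)
# Misspecified model (mean doubled): residuals shift away from zero
bad = quantile_residuals_sketch(lambda y: 1.0 - math.exp(-y / (2.0 * mean_claim)), claims)

print(statistics.fmean(good), statistics.pstdev(good))
print(statistics.fmean(bad), statistics.pstdev(bad))
```

A residual mean far from 0 or a standard deviation far from 1 points to misspecification. A QQ plot of the residuals against N(0, 1) shows where in the claim range the misfit sits.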
UK insurance applications
Motor BI: Body captures typical whiplash/minor injury claims (£500–£20k). Tail captures serious injury (£50k–£5m). GPD tail index xi ≈ 0.2–0.4 for UK motor BI.
Employers' Liability: Occupation class enters the tail scale. Different industries have wildly different tail behaviour (asbestos claims vs. office injuries). The composite threshold shifts with occupation.
PI/D&O: Deductible is the natural threshold — set threshold_method='fixed' at the deductible level. GPD above the deductible is standard actuarial practice.
Reinsurance pricing: XL layer (d, d+L] pricing uses model.limited_expected_value(d+L) - model.limited_expected_value(d) directly from the fitted composite.
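The layer identity used here is E[min(max(X - d, 0), L)] = LEV(d + L) - LEV(d), where LEV(x) = E[min(X, x)] is the integral of the survival function from 0 to x. The sketch below checks it on a Lomax (Pareto II) severity, chosen only because its LEV has a closed form. The distribution, function names and parameter values are illustrative assumptions, not the library's fitted composite.

```python
def lomax_sf(t, alpha, theta):
    # Survival function S(t) = (1 + t / theta) ** (-alpha)
    return (1.0 + t / theta) ** (-alpha)

def lomax_lev(x, alpha, theta):
    # Closed-form E[min(X, x)], valid for alpha != 1
    return theta / (alpha - 1.0) * (1.0 - (1.0 + x / theta) ** (-(alpha - 1.0)))

def layer_expected_loss(lev, d, width):
    """Expected loss to the layer (d, d + width] per claim."""
    return lev(d + width) - lev(d)

def layer_by_integration(sf, d, width, n=100_000):
    # Midpoint rule for the integral of S(t) over (d, d + width]
    h = width / n
    return h * sum(sf(d + (i + 0.5) * h) for i in range(n))

alpha, theta = 2.5, 60_000.0      # made-up severity parameters
d, width = 250_000.0, 750_000.0   # layer 750k xs 250k

closed_form = layer_expected_loss(lambda x: lomax_lev(x, alpha, theta), d, width)
numeric = layer_by_integration(lambda t: lomax_sf(t, alpha, theta), d, width)
print(closed_form, numeric)
```

Multiplying the per-claim layer cost by expected claim frequency gives the layer's expected loss cost before loadings.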
Comparison to R packages
| Capability | ReIns | evmix | insurance-composite |
|---|---|---|---|
| Composite body + tail | ME-Pareto | Various-GPD | LN/Gamma + Burr/GPD |
| Regression covariates | No | No | Yes |
| Covariate-dependent threshold | No | No | Yes (mode-matching) |
| Mode-matching threshold | No | No | Yes (Burr XII) |
| Profile likelihood threshold | No | Yes | Yes |
| scikit-learn API | No | No | Yes |
| Python | No | No | Yes |
The R packages handle univariate fitting well. This library's differentiator is regression with covariate-dependent thresholds — none of the R packages do that.
Methodology
Based on:
- Liu, Li, Shi (2024) — GBII composite regression with varying threshold. Insurance: Mathematics and Economics.
- Fung et al. (2022) — Lognormal-T composite regression. arXiv:2208.01262.
- Reynkens et al. (2017) — ME-Pareto splicing with censoring. IME 77:65-77.
The mode-matching approach sets the threshold equal to the tail distribution's mode. This guarantees C1 continuity at the splice point: both the density and its derivative are continuous. In practice it also stabilises estimation, because the threshold is a function of the tail parameters rather than a separate free parameter.
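One common way to write these conditions down is sketched below. It is a generic formulation from the composite-model literature and is an assumption here: the library's exact constraint set may differ. The symbols are f_1, F_1 for the body density and CDF, f_2, F_2 for the tail, u for the threshold and pi for the body weight.

```latex
\[
f(y) =
\begin{cases}
  \pi \, \dfrac{f_1(y)}{F_1(u)}, & 0 < y \le u, \\[2ex]
  (1-\pi) \, \dfrac{f_2(y)}{1 - F_2(u)}, & y > u.
\end{cases}
\]

% Continuity at u fixes the body weight:
\[
\pi \, \frac{f_1(u)}{F_1(u)} = (1-\pi) \, \frac{f_2(u)}{1 - F_2(u)}
\quad\Longrightarrow\quad
\pi = \frac{b}{a + b},
\qquad
a = \frac{f_1(u)}{F_1(u)},
\quad
b = \frac{f_2(u)}{1 - F_2(u)}.
\]

% Differentiability at u:
\[
\pi \, \frac{f_1'(u)}{F_1(u)} = (1-\pi) \, \frac{f_2'(u)}{1 - F_2(u)}.
\]

% If u is the mode of both components, f_1'(u) = f_2'(u) = 0 and the
% condition holds trivially. For a lognormal body with mode exp(mu - sigma^2):
\[
\mu = \ln u + \sigma^2 .
\]
```

Under this formulation neither pi nor u is a free parameter: u follows from the tail parameters, pi from continuity, and the body location from the mode constraint.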
V2 roadmap
- Mixed Erlang body (EM algorithm, dense nonparametric body)
- Soft splicing (Fung-Jeong-Tzougas 2024)
- Censored data support (Reynkens 2017)
- Full GBII distribution family
License
MIT