Skip to main content

Whittaker-Henderson smoothing for insurance pricing and actuarial rating tables: 1D and 2D graduation with automatic lambda selection via cross-validation.

Project description

insurance-whittaker

PyPI Downloads Python Tests License Open In Colab nbviewer

Whittaker-Henderson smoothing for UK insurance rating tables — automatic lambda selection via REML, Bayesian credible intervals, and Poisson-correct claim frequency fitting. The actuarial standard for graduation, now in Python.

Blog post: Whittaker-Henderson Smoothing for Insurance Pricing


The problem

Raw loss ratios by age band are noisy. Age 47 looks cheaper than age 46 for no reason other than random variation in thin cells, and a 5-point moving average applied at the boundaries undershoots the young-driver peak you actually want to charge for. Smoothing by hand introduces bias; smoothing without a principled method for choosing how much to smooth leaves you defending an arbitrary decision.

Whittaker-Henderson is the actuarial standard for this — a penalised least-squares smoother with a mathematically principled way to select the smoothing parameter. Every UK pricing team does this. Until now, most did it in Excel or SAS.

Regulatory context: IFRS 17 and Solvency II internal model reviews increasingly ask pricing teams to document the smoothing methodology and its uncertainty. A point estimate from a moving average does not answer that question. Bayesian credible intervals do.


Installation

pip install insurance-whittaker
# or
uv add insurance-whittaker

For plots:

pip install "insurance-whittaker[plot]"

Quick start

import numpy as np
from insurance_whittaker import WhittakerHenderson1D

ages = np.arange(17, 80)
exposures = 500 * np.exp(-0.5 * ((ages - 40) / 18) ** 2) + 50
true_lr = 0.35 * np.exp(-0.05 * (ages - 17)) + 0.06
loss_ratios = true_lr + np.random.default_rng(42).normal(0, np.sqrt(true_lr / exposures))

wh = WhittakerHenderson1D(order=2, lambda_method="reml")
result = wh.fit(ages, loss_ratios, weights=exposures)

print(result.lambda_)   # e.g. 847.3 — selected automatically via REML
print(result.edf)       # effective degrees of freedom — e.g. 5.2

df = result.to_polars()  # columns: x, y, weight, fitted, ci_lower, ci_upper, std_fitted
result.plot()            # requires insurance-whittaker[plot]

See examples/ for 2-D smoothing (age × vehicle group), Poisson count fitting, and NCD scale graduation.


Why this library?

No production-quality Python implementation of Whittaker-Henderson existed. The R package (WH from CRAN) is the reference, but the Python actuarial ecosystem had nothing equivalent. This library ports the methodology, validated against the published R results.

The key design choice is automatic lambda selection via REML (restricted marginal likelihood). REML has a unique, well-defined optimum, avoids the overfitting tendency of GCV, and follows Biessy (2026, ASTIN Bulletin) — the current actuarial reference on the topic. You do not tune a smoothing parameter; the data tells you how smooth the curve should be.


Compared to alternatives

Manual / Excel scipy splines R WH package insurance-whittaker
Automatic lambda selection No Partial (CV) Yes (REML) Yes (REML)
Bayesian credible intervals No No No Yes
2-D cross-tables Tedious No Yes Yes
Poisson count extension No No No Yes
Polars output No No No Yes
Python-native No Yes No Yes

What it does

1-D smoothing (WhittakerHenderson1D) — age curves, NCD scales, vehicle group factors, bonus-malus scales. Pass raw observations and exposures; get back a smooth curve with credible intervals.

2-D smoothing (WhittakerHenderson2D) — cross-tables (age × vehicle group, age × claim-free years). Same framework, same API, one penalty per dimension.

Poisson extension (WhittakerHendersonPoisson) — smooth claim frequencies directly from count data and exposures, not from derived loss ratios that carry additional noise from thin cells.

Automatic lambda selection — REML (recommended), GCV, AIC, or BIC. REML has a unique optimum and no local minima.

Bayesian credible intervals — posterior uncertainty bands on all smoothed values. They widen correctly in thin-data regions: wide at age 17 and age 80, narrow in the high-exposure core. This is essential for a pricing team that needs to know where the curve is reliable.


Mathematical basis

The smoother minimises:

sum_i w_i (y_i - theta_i)^2 + lambda * ||D^q theta||^2

where w_i are exposures, D^q is the q-th order difference operator, and lambda controls the smoothness penalty. The solution is:

theta_hat = (W + lambda D'D)^{-1} W y

Solved via Cholesky factorisation — under 0.1 seconds for a 64-band rating curve. Lambda is selected by maximising the restricted marginal likelihood (REML).

Reference: Biessy (2026), Whittaker-Henderson Smoothing Revisited, ASTIN Bulletin. arXiv:2306.06932.


Validated performance

On a synthetic driver age curve (63 bands, ages 17–79) with known true shape and realistic exposure distribution, benchmarked against a 5-point moving average and raw rates:

Method RMSE vs true Thin-tail bands Boundary behaviour
Raw rates Highest Worst n/a
5-point moving average ~55% of raw Moderate Pulls toward centre
Whittaker-Henderson (REML) Lowest Best Automatic

REML selects lambda within 10% of the oracle (ground-truth-optimal) value. 95% credible intervals achieve at least 90% coverage on held-out bands, including the thin-tail ages.

In the well-observed middle of the curve (ages 30–55), a 5-point moving average and W-H produce nearly identical results. The gap is driven by the tails. If your rating table covers only well-observed ages, a moving average is fine. If young and old drivers are in scope — they always are in UK motor — W-H earns its keep.

Full validation notebook: notebooks/databricks_validation.py.


2-D example

import numpy as np
from insurance_whittaker import WhittakerHenderson2D

# 10 age bands x 5 vehicle groups
true_lr = (0.08 + 0.15 * np.exp(-0.04 * np.linspace(20, 65, 10)))[:, None] \
        + 0.03 * np.arange(1, 6)[None, :]
exposures = np.outer([80, 200, 350, 450, 500, 480, 420, 300, 180, 90],
                     [300, 250, 200, 150, 100])
y = np.clip(true_lr + np.random.default_rng(0).normal(0, np.sqrt(true_lr / exposures)), 0.01, None)

wh = WhittakerHenderson2D(order_x=2, order_z=2)
result = wh.fit(y, weights=exposures)

result.fitted    # smoothed table, same shape as y
result.lambda_x  # smoothing parameter in age direction
result.lambda_z  # smoothing parameter in vehicle direction
df = result.to_polars()  # long format: x, z, fitted, ci_lower, ci_upper, std_fitted

Poisson example

import numpy as np
from insurance_whittaker import WhittakerHendersonPoisson

ages = np.arange(17, 80)
true_rate = 0.28 * np.exp(-0.04 * (ages - 17)) + 0.04
policy_years = (800 * np.exp(-0.5 * ((ages - 38) / 16) ** 2) + 80).astype(float)
claim_counts = np.random.default_rng(42).poisson(true_rate * policy_years).astype(float)

wh = WhittakerHendersonPoisson(order=2)
result = wh.fit(ages, counts=claim_counts, exposure=policy_years)

result.fitted_rate    # smoothed claim rate per policy year
result.fitted_count   # smoothed expected claims
result.ci_lower_rate  # 95% CI on rate scale (always positive)

Lambda selection

Method Description
'reml' Restricted marginal likelihood — recommended. Unique optimum, no local minima.
'gcv' Generalised cross-validation — faster, can overfit on small datasets.
'aic' AIC — penalises effective degrees of freedom.
'bic' BIC — stronger penalty, often over-smooths relative to AIC.

REML is the default and is strongly preferred for actuarial applications. See Biessy (2026) for the simulation evidence.


Limitations

  • W-H is a smoother, not a shape constraint. It does not enforce monotonicity. If you want a monotone NCD curve, apply a post-fit isotonic regression pass or set a minimum lambda.
  • 2-D smoothing penalises each dimension independently. Cross-effects where optimal smoothness in age depends on vehicle group are not captured. Fit the interaction explicitly in your GLM.
  • REML can fail on degenerate data. With fewer than 8–10 observations per dimension, the REML objective can be flat. The optimiser will warn, but inspect the curve visually on short tables.
  • Credible intervals fix lambda at its REML estimate. They do not account for uncertainty in lambda itself, so they are slightly too narrow — most visibly at boundaries.

References

  1. Whittaker, E.T. (1923). "On a New Method of Graduation." Proceedings of the Edinburgh Mathematical Society, 41, 63–75.
  2. Henderson, R. (1924). "A New Method of Graduation." Transactions of the Actuarial Society of America, 25, 29–40.
  3. Eilers, P.H.C. & Marx, B.D. (1996). "Flexible Smoothing with B-splines and Penalties." Statistical Science, 11(2), 89–121. doi:10.1214/ss/1038425655
  4. Lee, W.C. & Fung, W.K. (2001). "Graduation by the Whittaker-Henderson method with application to the actuarial table of illness." Statistics in Medicine, 20(19), 2945–2962.
  5. Kimeldorf, G.S. & Wahba, G. (1970). "A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines." The Annals of Mathematical Statistics, 41(2), 495–502. doi:10.1214/aoms/1177697089
  6. Biessy, G. (2026). "Whittaker-Henderson Smoothing Revisited." ASTIN Bulletin. arXiv:2306.06932

Part of the Burning Cost toolkit

Takes raw exposure and loss data from claims triangles or rating factor summaries. Feeds smoothed curves into insurance-gam (as input features) and insurance-credibility (as prior means for Bühlmann-Straub).

Library Description
insurance-gam Interpretable GAMs — the natural next step once your rating tables are smoothed
insurance-credibility Bühlmann-Straub credibility — blends smoothed table estimates with portfolio experience
insurance-monitoring Model drift detection — monitors whether smoothed curves remain calibrated in production
insurance-governance Model validation and MRM governance — produces the sign-off pack for rating tables

Part of the Burning Cost open-source insurance analytics toolkit. → See all libraries


Community

Found it useful? A GitHub star helps others find it.

Licence

BSD-3-Clause

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_whittaker-0.1.5.tar.gz (190.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

insurance_whittaker-0.1.5-py3-none-any.whl (27.0 kB view details)

Uploaded Python 3

File details

Details for the file insurance_whittaker-0.1.5.tar.gz.

File metadata

  • Download URL: insurance_whittaker-0.1.5.tar.gz
  • Upload date:
  • Size: 190.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_whittaker-0.1.5.tar.gz
Algorithm Hash digest
SHA256 080300a33e9702cca223e51911651de9137660e718eb4780000cf574f6600ed0
MD5 c6d6ec615331e4b0bf495f52e02df75e
BLAKE2b-256 6cc8d29596186089be54d6b70fc5f222ef68a3a5f7d8ef2738031548edfc8d56

See more details on using hashes here.

File details

Details for the file insurance_whittaker-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: insurance_whittaker-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 27.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_whittaker-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 4fd3db90f24e81df0e8acf979070fc9f092bbc0d72d3160d4c4606bef9bc64e9
MD5 cc737f471abe62ae64f1c6a534277cac
BLAKE2b-256 8560b298b9deea257664a36302e2e10f832af175f815febec94e269ac4c84aed

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page