Conformal prediction intervals for insurance claims regression — Hong order-statistic shortcut, Tweedie nonconformity scores, and Solvency II SCR reporting
insurance-conformal-claims
GLM prediction intervals can be wrong. Not "a bit off", but structurally wrong. A log-normal GLM fitted to Pareto-distributed personal injury claims achieves 57.8% coverage when the nominal level is 99.5% (Hong 2025, Table 2). A Solvency II capital model built on such misspecified GLM intervals does not deliver the coverage it claims.
Conformal prediction fixes this without requiring you to know the true distribution. This library implements the three most useful conformal methods for insurance claims regression, plus a direct interface to the Solvency II 99.5% capital requirement.
What this is
Three components built on the papers listed under References, implemented in Python for the first time:
Hong's order-statistic shortcut (arXiv:2503.03659, 2601.21153): Full conformal prediction that reduces to sorting a single array. No model required. Finite-sample valid coverage for any exchangeable distribution. O(n log n) — not a grid search.
Tweedie nonconformity scores (Manna et al. 2025, ASMBI asmb.70045): Pearson, deviance, and Anscombe residuals using the Tweedie variance function V(mu) = mu^p. Implemented as split conformal with a two-stage locally weighted approach that auto-fits a CatBoost spread model.
SCR reporting: Thin wrapper that extracts the 99.5% upper bound and formats it as a Solvency II / Solvency UK report.
None of these has a Python implementation elsewhere. The R code at alokesh17/conformal_LightGBM_tweedie covers the Manna scores but consists of research scripts, not a package.
Installation
pip install insurance-conformal-claims
With optional dependencies:
pip install insurance-conformal-claims[all] # sklearn, catboost, matplotlib
pip install insurance-conformal-claims[catboost] # for TwoStageLWConformal
pip install insurance-conformal-claims[plot] # for calibration_plot
Quick start
Model-free intervals (no GLM required)
import numpy as np
from insurance_conformal_claims import HongConformal, SCRReport
# Training data — any insurance claims dataset
# X: covariate matrix, y: claim amounts (non-negative)
hc = HongConformal()
hc.fit(X_train, y_train)
# 99.5% prediction intervals (Solvency II alpha)
intervals = hc.predict_interval(X_portfolio, alpha=0.005)
print(f"Upper bounds: {intervals[:, 1]}")
# Direct SCR extraction
report = SCRReport(hc)
scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
print(f"SCR upper bound: £{scr:,.0f}")
With a regression model (narrower intervals)
The h-transformation framework (Hong 2026) uses any sklearn estimator to reduce interval width while preserving the finite-sample guarantee:
from sklearn.ensemble import GradientBoostingRegressor
from insurance_conformal_claims import HongTransformConformal
htc = HongTransformConformal(h_model=GradientBoostingRegressor())
htc.fit(X_train, y_train, X_cal, y_cal)
intervals = htc.predict_interval(X_test, alpha=0.05)
In the paper's experiments on personal injury claims, a well-fitted linear model reduces mean interval width by about 25% relative to the model-free baseline.
Tweedie-specific scores
When you have a fitted Tweedie GLM or CatBoost model:
from insurance_conformal_claims import TweedePearsonScore, TwoStageLWConformal
# Direct score usage
score = TweedePearsonScore(p=1.5) # p from your fitted GLM
residuals = score.score(y_cal, mu_cal)
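# Finite-sample split-conformal level ceil((n + 1)(1 - alpha)) / n, not the raw 0.95
n_cal = len(residuals)
q = np.quantile(residuals, min(1.0, np.ceil((n_cal + 1) * 0.95) / n_cal), method="higher")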
upper_bounds = score.inverse(q, mu_new, upper=True)
# Or: full two-stage conformal (fits spread model automatically)
from catboost import CatBoostRegressor
lw = TwoStageLWConformal(
    mean_model=CatBoostRegressor(loss_function='Tweedie:variance_power=1.5'),
    p=1.5,
)
lw.fit(X_train, y_train)
lw.calibrate(X_cal, y_cal)
intervals = lw.predict_interval(X_new, alpha=0.005)
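To make the score-then-invert step concrete, here is a minimal NumPy sketch of a split-conformal upper bound built on the absolute Pearson residual with V(mu) = mu^p. The function names (pearson_score, pearson_upper, conformal_quantile) are hypothetical and are not the library's API; the sketch assumes the score is |y - mu| / mu^(p/2) and uses a constant predicted mean purely for illustration.

```python
import numpy as np

def pearson_score(y, mu, p):
    """Absolute Pearson residual |y - mu| / sqrt(V(mu)) with V(mu) = mu**p."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return np.abs(y - mu) / mu ** (p / 2)

def pearson_upper(q, mu, p):
    """Invert the score: the largest y whose score does not exceed q."""
    mu = np.asarray(mu, float)
    return mu + q * mu ** (p / 2)

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score; infinite if n is too small."""
    scores = np.sort(np.asarray(scores, float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else scores[k - 1]

# Toy calibration set (illustrative numbers only)
y_cal = np.array([0.0, 2.0, 5.0, 1.0, 9.0, 3.0, 4.0, 6.0, 2.5])
mu_cal = np.full(9, 4.0)

q = conformal_quantile(pearson_score(y_cal, mu_cal, p=1.0), alpha=0.25)
upper = pearson_upper(q, mu=np.array([4.0, 9.0]), p=1.0)
print(q, upper)
```

Because the score divides by mu^(p/2), the same calibrated q yields wider bounds for risks with larger predicted means, which is the point of using a variance-function-aware score.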
Coverage diagnostics
from insurance_conformal_claims import conditional_coverage_gap, calibration_plot
# Per-segment coverage (diagnose conditional vs marginal gap)
result = conditional_coverage_gap(
    hc, X_test, y_test, alpha=0.05,
    groups=X_test[:, 0].astype(int)  # e.g. vehicle class
)
print(result["group_results"])
# Calibration plot (nominal vs empirical coverage)
import matplotlib.pyplot as plt
ax = calibration_plot(hc, X_test, y_test)
plt.show()
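The per-segment diagnostic boils down to comparing empirical coverage inside each group with the overall figure. The sketch below shows that calculation in plain NumPy under stated assumptions: coverage_by_group is a hypothetical helper, not the library's conditional_coverage_gap, and the numbers are made up.

```python
import numpy as np

def coverage_by_group(y, lower, upper, groups):
    """Empirical coverage of [lower, upper] within each group label."""
    y, lower, upper, groups = map(np.asarray, (y, lower, upper, groups))
    covered = (y >= lower) & (y <= upper)
    return {g.item(): float(covered[groups == g].mean()) for g in np.unique(groups)}

# Toy data: group 0 is fully covered, group 1 is not
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
lower = np.zeros(6)
upper = np.array([5.0, 5.0, 5.0, 15.0, 15.0, 15.0])
groups = np.array([0, 0, 0, 1, 1, 1])

by_group = coverage_by_group(y, lower, upper, groups)
marginal = float(np.mean((y >= lower) & (y <= upper)))
print(by_group, marginal)
```

A marginal figure that looks acceptable can hide a segment with much lower coverage, which is what the per-group breakdown is meant to expose.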
SCR report
report = SCRReport(hc)
df = report.coverage_table(X_test, y_test, alphas=[0.005, 0.01, 0.05, 0.10])
print(df)
# HTML for regulatory submission
html = report.to_html(X_test, y_test)
with open("scr_report.html", "w") as f:
    f.write(html)
# JSON for downstream systems
import json
payload = json.loads(report.to_json(X_test, y_test))
The order-statistic shortcut
Standard full conformal prediction is impractical: for each candidate response value y, you augment the calibration set and recompute a nonconformity score — effectively an infinite grid search.
Hong (2025) shows that for a specific nonconformity measure that is linear in y, the prediction region collapses to a single order statistic. The adjusted score for training observation i given new point x is:
W_i = Y_i + (1/n) * sum_j (x_j - X_{ij})
The 100(1-alpha)% prediction region is (0, W_{(k)}) where k = min(n, floor((n+1)(1-alpha) + 1)). Sort {W_1, ..., W_n}, pick index k. That is the entire algorithm.
Coverage is guaranteed in finite samples (Theorem 1): P(Y in C) >= 1-alpha for all n and all exchangeable distributions. No model. No distributional assumption. The GLM misspecification problem disappears.
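The sort-and-pick step can be sketched in a few lines of NumPy. This follows the W_i and k formulas exactly as written in this section; hong_upper_bound is a hypothetical name, and the code is an illustration rather than the library's HongConformal implementation.

```python
import math
import numpy as np

def hong_upper_bound(X, y, x_new, alpha):
    """Upper end of the region (0, W_(k)) for a single new covariate vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(y)
    # W_i = Y_i + (1/n) * sum_j (x_j - X_ij), as stated in the text
    w = y + (x_new - X).sum(axis=1) / n
    # k = min(n, floor((n + 1)(1 - alpha) + 1)), a 1-based rank
    k = min(n, math.floor((n + 1) * (1 - alpha) + 1))
    return np.sort(w)[k - 1]

# Toy data: zero covariates, so W_i reduces to Y_i
X = np.zeros((10, 2))
y = np.arange(1.0, 11.0)
upper = hong_upper_bound(X, y, x_new=[0.0, 0.0], alpha=0.2)
print(upper)
```

The only non-trivial cost is the sort, which is where the O(n log n) figure comes from.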
Coverage under misspecification
From Table 2 of Hong (2025), on personal injury claims data where the true distribution is Pareto but the model assumes log-normal:
| Method | Actual coverage | Interval width |
|---|---|---|
| Conformal (this library) | 99.6% | 1.13x oracle |
| GLM (misspecified) | 57.8% | 0.07x oracle |
| Random forest | 98.8% | 0.31x oracle |
The GLM intervals are narrow and wrong. The conformal intervals are conservative (a feature, not a bug — they are guaranteed valid).
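The effect can be illustrated with a toy simulation. This is not a reproduction of Table 2: it assumes covariate-free Pareto claims with an arbitrary shape of 1.5, fits a log-normal by moments on the log scale, and compares its 99.5% quantile with the order-statistic bound. All parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, n_train, n_test, alpha = 1.5, 2000, 100_000, 0.005

# Classical Pareto with minimum 1: rng.pareto draws Lomax samples, so add 1
y_train = rng.pareto(shape, n_train) + 1.0
y_test = rng.pareto(shape, n_test) + 1.0

# Misspecified parametric bound: fit a log-normal, take its 99.5% quantile
log_y = np.log(y_train)
z_995 = 2.5758293035489004  # standard normal 99.5% quantile
lognormal_upper = np.exp(log_y.mean() + z_995 * log_y.std(ddof=1))

# Conformal bound without covariates: the k-th smallest training claim
k = min(n_train, int(np.floor((n_train + 1) * (1 - alpha) + 1)))
conformal_upper = np.sort(y_train)[k - 1]

lognormal_cov = float(np.mean(y_test <= lognormal_upper))
conformal_cov = float(np.mean(y_test <= conformal_upper))
print(f"log-normal: {lognormal_cov:.4f}, conformal: {conformal_cov:.4f}")
```

The log-normal bound falls short of the nominal level because the Pareto tail is heavier than the fitted model allows, while the order-statistic bound makes no tail assumption.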
Solvency II application
Setting alpha=0.005 gives a 99.5% prediction interval, the same confidence level as the Solvency Capital Requirement calibration (99.5% value-at-risk over one year) under Solvency II Article 101, maintained in UK regulation via PRA PS9/24.
scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
# "If the insurer wants to comply with Solvency II, they can set their
# risk capital level to [scr] for this line of business."
# — Hong (2025), Section 5
Coverage is marginal, not conditional. The guarantee is P(Y <= SCR) >= 99.5% averaged over the portfolio distribution, not for each individual risk. For per-policy capital, see insurance-multivariate-conformal.
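At alpha=0.005 the rank formula has a practical consequence for sample size. The sketch below evaluates k = min(n, floor((n+1)(1-alpha) + 1)), as quoted in the order-statistic section, in exact rational arithmetic; hong_rank is a hypothetical helper. Under that formula the bound is simply the largest adjusted score until n reaches 400.

```python
from fractions import Fraction
import math

def hong_rank(n, alpha):
    """Rank k = min(n, floor((n + 1)(1 - alpha) + 1)) in exact arithmetic."""
    a = Fraction(alpha)  # pass alpha as a string such as "0.005" to avoid float rounding
    return min(n, math.floor((n + 1) * (1 - a) + 1))

for n in (100, 399, 400, 1000, 10000):
    k = hong_rank(n, "0.005")
    note = "sample maximum" if k == n else f"{n - k} scores above the bound"
    print(n, k, note)
```

With only a few hundred observations the 99.5% bound is driven entirely by the single largest score, so it is worth checking k against n before reporting a capital figure.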
Limitations
Marginal coverage only. All three methods guarantee P(Y in I) >= 1-alpha averaged over the covariate distribution. Conditional on X=x, coverage may be lower in thin regions of covariate space. Use conditional_coverage_gap() to diagnose this for your portfolio.
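Exchangeability assumption. Hong's guarantee requires exchangeable observations, a condition that iid sampling satisfies. UK motor claims spanning more than 2-3 years are non-stationary (claims inflation, accident-year effects, mix shift), which breaks exchangeability. Restrict calibration data to recent years.
Tweedie p is an input. The nonconformity scores require the Tweedie power p as a parameter. Estimate it from your data, e.g. by profile likelihood over a grid of p values or with statsmodels' GLM.estimate_tweedie_power. In CatBoost, variance_power is a hyperparameter you set, not a fitted estimate, so use the value you trained with.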
Two-stage spread model is in-sample. TwoStageLWConformal.fit() fits the spread model on training Pearson residuals, which are in-sample and optimistic. For unbiased spread estimation, use a third data split or cross-fitting.
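Cross-fitting can be sketched as follows: each observation's mean prediction comes from a model that never saw it, and the Pearson residuals built from those predictions become the targets for the spread model. This is an illustration under stated assumptions, not the library's code: out_of_fold_predictions and log_linear_fit_predict are hypothetical helpers, the log-linear least-squares fit stands in for the real mean model, and the data are simulated.

```python
import numpy as np

def out_of_fold_predictions(fit_predict, X, y, n_folds=5, seed=0):
    """Predict each row with a model fitted on the other folds only."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    oof = np.empty(n, dtype=float)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        oof[held_out] = fit_predict(X[train], y[train], X[held_out])
    return oof

def log_linear_fit_predict(X_tr, y_tr, X_te):
    """Stand-in mean model: least squares on log(y), so predictions stay positive."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    beta, *_ = np.linalg.lstsq(A_tr, np.log(y_tr), rcond=None)
    return np.exp(A_te @ beta)

# Simulated positive claims (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.exp(0.5 * X[:, 0] + rng.normal(scale=0.3, size=200))

mu_oof = out_of_fold_predictions(log_linear_fit_predict, X, y)
p = 1.5
# Out-of-fold absolute Pearson residuals: targets for a spread model
spread_target = np.abs(y - mu_oof) / mu_oof ** (p / 2)
```

The cost is n_folds fits of the mean model instead of one; in exchange the spread model is trained on residuals that are not flattered by in-sample fit.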
References
- Hong (2025) "Conformal Prediction of Future Insurance Claims in the Regression Problem" arXiv:2503.03659
- Hong (2026) "A New Strategy for Finite-Sample Valid Prediction of Future Insurance Claims in the Regression Setting" arXiv:2601.21153
- Manna et al. (2025) "Conformal Prediction Inference in Regularized Insurance Models" ASMBI asmb.70045
Licence
MIT