
Conformal prediction intervals for insurance claims regression — Hong order-statistic shortcut, Tweedie nonconformity scores, and Solvency II SCR reporting


insurance-conformal-claims

Prediction intervals from a misspecified GLM are not "a bit off"; they are structurally wrong. A log-normal GLM fitted to Pareto-distributed personal injury claims achieves 57.8% coverage when the nominal level is 99.5% (Table 2 of Hong, 2025). A Solvency II capital model built on such intervals would badly understate the 99.5% tail.

Conformal prediction fixes this without requiring you to know the true distribution. This library implements the three most useful conformal methods for insurance claims regression, plus a direct interface to the Solvency II 99.5% capital requirement.

What this is

Three components, drawn from the papers listed under References:

Hong's order-statistic shortcut (arXiv:2503.03659, 2601.21153): Full conformal prediction that reduces to sorting a single array. No model required, although one can optionally be used to narrow the intervals. Finite-sample valid coverage for any exchangeable distribution. O(n log n) per prediction point, not a grid search.

Tweedie nonconformity scores (Manna et al. 2025, ASMBI asmb.70045): Pearson, deviance, and Anscombe residuals using the Tweedie variance function V(mu) = mu^p. Implemented as split conformal with a two-stage locally weighted approach that auto-fits a CatBoost spread model.

SCR reporting: Thin wrapper that extracts the 99.5% upper bound and formats it as a Solvency II / Solvency UK report.

None of these has a Python implementation elsewhere. The R code at alokesh17/conformal_LightGBM_tweedie covers the Manna scores, but as research scripts rather than a package.

Installation

pip install insurance-conformal-claims

With optional dependencies:

pip install "insurance-conformal-claims[all]"       # sklearn, catboost, matplotlib
pip install "insurance-conformal-claims[catboost]"  # for TwoStageLWConformal
pip install "insurance-conformal-claims[plot]"      # for calibration_plot

Quick start

Model-free intervals (no GLM required)

import numpy as np
from insurance_conformal_claims import HongConformal, SCRReport

# Illustrative synthetic data: replace with any insurance claims dataset.
# X: covariate matrix, y: claim amounts (non-negative)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1_000, 3))
y_train = rng.pareto(2.5, size=1_000) * 1_000.0  # heavy-tailed claim amounts
X_portfolio = rng.normal(size=(50, 3))

hc = HongConformal()
hc.fit(X_train, y_train)

# 99.5% prediction intervals (Solvency II alpha)
intervals = hc.predict_interval(X_portfolio, alpha=0.005)
print(f"Upper bounds: {intervals[:, 1]}")

# Direct SCR extraction
report = SCRReport(hc)
scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
print(f"SCR upper bound: £{scr:,.0f}")

With a regression model (narrower intervals)

The h-transformation framework (Hong 2026) uses any sklearn estimator to reduce interval width while preserving the finite-sample guarantee:

from sklearn.ensemble import GradientBoostingRegressor
from insurance_conformal_claims import HongTransformConformal

htc = HongTransformConformal(h_model=GradientBoostingRegressor())
htc.fit(X_train, y_train, X_cal, y_cal)

intervals = htc.predict_interval(X_test, alpha=0.05)

In the paper's experiments on personal injury claims, a well-fitted linear model reduced mean interval width by about 25% relative to the model-free baseline.

Tweedie-specific scores

When you have a fitted Tweedie GLM or CatBoost model:

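from insurance_conformal_claims import TweedePearsonScore, TwoStageLWConformal
import numpy as np

# Direct score usage
# y_cal, mu_cal: calibration claims and your model's predicted means for them
# mu_new: predicted means for the policies you want bounds for
score = TweedePearsonScore(p=1.5)  # p from your fitted GLM
residuals = score.score(y_cal, mu_cal)
# Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest residual gives
# finite-sample coverage (a plain 0.95 empirical quantile can undercover slightly)
alpha = 0.05
n = len(residuals)
k = int(np.ceil((n + 1) * (1 - alpha)))
if k > n:
    raise ValueError("calibration set too small for this alpha")
q = np.sort(residuals)[k - 1]
upper_bounds = score.inverse(q, mu_new, upper=True)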

# Or: full two-stage conformal (fits spread model automatically)
from catboost import CatBoostRegressor
lw = TwoStageLWConformal(
    mean_model=CatBoostRegressor(loss_function='Tweedie:variance_power=1.5'),
    p=1.5,
)
lw.fit(X_train, y_train)
lw.calibrate(X_cal, y_cal)
intervals = lw.predict_interval(X_new, alpha=0.005)
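For reference, the three residuals named in the overview (Pearson, deviance, Anscombe) have standard closed forms when V(mu) = mu^p. The sketch below writes them out in numpy, assuming 1 < p < 2 (compound Poisson-gamma), non-negative claims and strictly positive predicted means, and adds a one-sided split-conformal upper bound that inverts the Pearson score. The function names are hypothetical; this illustrates the formulas and is not the library's implementation of TweedePearsonScore or TwoStageLWConformal.

```python
import numpy as np


def tweedie_pearson(y, mu, p):
    """Pearson residual (y - mu) / sqrt(V(mu)) with V(mu) = mu**p."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    return (y - mu) / mu ** (p / 2)


def tweedie_deviance_residual(y, mu, p):
    """Signed square root of the Tweedie unit deviance (assumes 1 < p < 2)."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    dev = 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
    # Clip tiny negative values caused by floating-point rounding
    return np.sign(y - mu) * np.sqrt(np.maximum(dev, 0.0))


def tweedie_anscombe(y, mu, p):
    """Anscombe residual using A(t) = t**(1 - p/3) / (1 - p/3)."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    c = 1 - p / 3
    # (A(y) - A(mu)) / (A'(mu) * sqrt(V(mu))) simplifies to this expression
    return (y ** c - mu ** c) / (c * mu ** (p / 6))


def pearson_conformal_upper(y_cal, mu_cal, mu_new, p, alpha):
    """One-sided split-conformal upper bounds from Pearson scores."""
    scores = np.sort(tweedie_pearson(y_cal, mu_cal, p))
    mu_new = np.asarray(mu_new, dtype=float)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # conformal rank
    if k > n:
        # Too few calibration points: the exact bound is unbounded
        return np.full(mu_new.shape, np.inf)
    q = scores[k - 1]
    # Invert the score: y = mu + q * sqrt(V(mu))
    return mu_new + q * mu_new ** (p / 2)
```

The Anscombe residual also inverts in closed form, y = (mu^c + q * c * mu^(p/6))^(1/c), whereas inverting the deviance residual needs a numerical root-finder; the Pearson score is simply the most direct to turn into a bound.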

Coverage diagnostics

from insurance_conformal_claims import conditional_coverage_gap, calibration_plot

# Per-segment coverage (diagnose conditional vs marginal gap)
result = conditional_coverage_gap(
    hc, X_test, y_test, alpha=0.05,
    groups=X_test[:, 0].astype(int)  # e.g. vehicle class
)
print(result["group_results"])

# Calibration plot (nominal vs empirical coverage)
import matplotlib.pyplot as plt
ax = calibration_plot(hc, X_test, y_test)
plt.show()

SCR report

report = SCRReport(hc)
df = report.coverage_table(X_test, y_test, alphas=[0.005, 0.01, 0.05, 0.10])
print(df)

# HTML for regulatory submission
html = report.to_html(X_test, y_test)
with open("scr_report.html", "w") as f:
    f.write(html)

# JSON for downstream systems
import json
payload = json.loads(report.to_json(X_test, y_test))

The order-statistic shortcut

Standard full conformal prediction is impractical: for each candidate response value y, you add the point (x, y) to the data, refit, and recompute every nonconformity score. Over a continuous response that is effectively an infinite grid search.

Hong (2025) shows that for a specific nonconformity measure that is linear in y, the prediction region collapses to a single order statistic. The adjusted score for training observation i given new point x is:

W_i = Y_i + (1/n) * sum_j (x_j - X_{ij})

The 100(1-alpha)% prediction region is (0, W_{(k)}), where W_{(k)} is the k-th smallest of {W_1, ..., W_n} and k = min(n, floor((n+1)(1-alpha) + 1)). Sort the W_i and take the k-th value: that is the entire algorithm.
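The steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the library's code: the helper name hong_upper_bound is hypothetical, and both the adjusted score W_i and the index k are transcribed directly from the formulas in this section, with j running over the covariate columns.

```python
import numpy as np


def hong_upper_bound(X, y, x_new, alpha):
    """Upper end of the region (0, W_(k)) for one new covariate vector.

    Hypothetical helper; W_i and k are transcribed from the formulas above.
    X: (n, d) training covariates, y: (n,) claim amounts, x_new: (d,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(y)
    # W_i = Y_i + (1/n) * sum_j (x_j - X_ij), summing over covariate columns j
    W = y + (x_new - X).sum(axis=1) / n
    k = min(n, int(np.floor((n + 1) * (1 - alpha) + 1)))
    # W_(k) is the k-th smallest adjusted score (1-indexed)
    return np.sort(W)[k - 1]
```

For a single new point, np.partition(W, k - 1)[k - 1] returns the same value in linear time; the full sort simply mirrors the description above.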

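Coverage is guaranteed in finite samples (Theorem 1): P(Y in C) >= 1-alpha for any exchangeable distribution. No model. No distributional assumption. The GLM misspecification problem disappears. The one practical caveat is sample size: when floor((n+1)(1-alpha) + 1) exceeds n, the cap makes the sample maximum W_{(n)} the bound, and its coverage for continuous claims is n/(n+1). That falls short of 1-alpha whenever n < (1-alpha)/alpha, i.e. with fewer than 199 observations at alpha = 0.005.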
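The finite-sample claim, and what happens when the cap on k binds, can be checked by simulation. The sketch below assumes the simplest covariate-free case (W_i = Y_i) and i.i.d. Lomax-distributed claims from numpy's pareto generator; both are illustrative assumptions, not data from the paper.

```python
import numpy as np


def simulated_coverage(n, alpha, reps=10_000, seed=0):
    """Monte Carlo coverage of (0, W_(k)) in the covariate-free case W_i = Y_i.

    Illustrative assumption: i.i.d. Lomax (Pareto II) claims, which are
    exchangeable, so the finite-sample argument applies.
    """
    rng = np.random.default_rng(seed)
    k = min(n, int(np.floor((n + 1) * (1 - alpha) + 1)))
    # Each row holds n observed claims followed by one future claim
    claims = rng.pareto(2.5, size=(reps, n + 1)) * 1_000.0
    observed, future = claims[:, :n], claims[:, n]
    upper = np.sort(observed, axis=1)[:, k - 1]
    return float(np.mean(future <= upper))


for n in (50, 400):
    print(n, simulated_coverage(n, alpha=0.005))
```

At n = 50 the rule caps k at n, so the bound is the sample maximum and the estimate should come out near 50/51 ≈ 0.980, short of the 99.5% target. At n = 400, k = 399 and the estimate should be close to 399/401 ≈ 0.995.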

Coverage under misspecification

From Table 2 of Hong (2025), on personal injury claims data where the true distribution is Pareto but the model assumes log-normal:

| Method | Actual coverage (nominal 99.5%) | Interval width (× oracle) |
|---|---|---|
| Conformal (Hong 2025) | 99.6% | 1.13 |
| GLM (misspecified) | 57.8% | 0.07 |
| Random forest | 98.8% | 0.31 |

The GLM intervals are narrow and wrong, missing the 99.5% target by more than 40 percentage points. The conformal intervals are conservative (a feature, not a bug: they are guaranteed valid).

Solvency II application

Setting alpha=0.005 gives a 99.5% prediction interval, matching the 99.5% confidence level at which the Solvency Capital Requirement is calibrated under Solvency II Article 101 (a one-year Value-at-Risk of basic own funds), maintained in UK regulation via PRA PS9/24.

scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
# "If the insurer wants to comply with Solvency II, they can set their
#  risk capital level to [scr] for this line of business."
# — Hong (2025), Section 5

Coverage is marginal, not conditional. The guarantee is P(Y <= SCR) >= 99.5% averaged over the portfolio distribution, not for each individual risk. For per-policy capital, see insurance-multivariate-conformal.

Limitations

Marginal coverage only. All three methods guarantee P(Y in I) >= 1-alpha averaged over the covariate distribution. Conditional on X=x, coverage may be lower in thin regions of covariate space. Use conditional_coverage_gap() to diagnose this for your portfolio.
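A minimal numpy sketch of the kind of per-segment check that separates marginal from conditional coverage. The helper coverage_by_group is hypothetical and is not part of insurance_conformal_claims; it only assumes you already have interval bounds and a segment label for each test claim.

```python
import numpy as np


def coverage_by_group(y, lower, upper, groups):
    """Empirical coverage overall and within each segment.

    Hypothetical helper for illustration; not part of insurance_conformal_claims.
    """
    y, lower, upper, groups = map(np.asarray, (y, lower, upper, groups))
    covered = (y >= lower) & (y <= upper)
    per_group = {
        g.item(): float(covered[groups == g].mean()) for g in np.unique(groups)
    }
    return float(covered.mean()), per_group
```

A segment whose coverage sits well below 1 - alpha points to the conditional gap described above. With small segments, allow for sampling noise: a segment of 50 risks has a binomial standard error of about 3 percentage points at 95% coverage.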

Exchangeability assumption. Hong's guarantee requires exchangeable observations (iid data qualify). UK motor claims spanning more than 2-3 years are non-stationary (claims inflation, accident-year effects, mix shift), so restrict calibration data to recent years.

Tweedie p is an input. The nonconformity scores take the Tweedie power p as a parameter. Estimate it from your data, for example by profile likelihood over p with statsmodels.genmod.families.Tweedie, or reuse the variance_power of the CatBoost Tweedie loss you fitted.

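Two-stage spread model is in-sample. TwoStageLWConformal.fit() fits the spread model on training Pearson residuals, which are in-sample and therefore optimistically small. Marginal coverage still holds, because calibrate() uses separate data, but the locally weighted intervals may be less efficient. For unbiased spread estimation, use a third data split or cross-fitting.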
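A minimal sketch of the cross-fitting option, assuming you supply a hypothetical fit_predict(X_tr, y_tr, X_te) callable that fits your mean model and returns strictly positive predicted means. Each observation's Pearson residual then comes from a model that never saw it, and those out-of-fold residuals can be used to fit the spread model. This is not how TwoStageLWConformal works internally.

```python
import numpy as np


def out_of_fold_pearson(X, y, fit_predict, p, n_folds=5, seed=0):
    """Out-of-fold Tweedie Pearson residuals (y - mu) / mu**(p/2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(y))
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mu = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        # Mean model fitted without the held-out fold
        mu[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return (y - mu) / mu ** (p / 2), mu


def mean_only(X_tr, y_tr, X_te):
    # Stand-in mean model for the example: the training-fold average
    return np.full(len(X_te), y_tr.mean())


X_demo = np.zeros((10, 1))
y_demo = np.arange(1.0, 11.0)
resid, mu = out_of_fold_pearson(X_demo, y_demo, mean_only, p=1.5, n_folds=10)
```

In practice fit_predict would wrap your Tweedie GLM or CatBoost model, which is then refitted once per fold.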

References

  • Hong (2025) "Conformal Prediction of Future Insurance Claims in the Regression Problem" arXiv:2503.03659
  • Hong (2026) "A New Strategy for Finite-Sample Valid Prediction of Future Insurance Claims in the Regression Setting" arXiv:2601.21153
  • Manna et al. (2025) "Conformal Prediction Inference in Regularized Insurance Models" ASMBI asmb.70045

Licence

MIT


Download files


Source Distribution

insurance_conformal_claims-0.1.0.tar.gz (31.2 kB)

Uploaded Source

Built Distribution


insurance_conformal_claims-0.1.0-py3-none-any.whl (22.2 kB)

Uploaded Python 3

File details

Details for the file insurance_conformal_claims-0.1.0.tar.gz.

File metadata

  • Download URL: insurance_conformal_claims-0.1.0.tar.gz
  • Size: 31.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.10.8 (Ubuntu 24.04)

File hashes

Hashes for insurance_conformal_claims-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6d7816df6410f0f8f3d270d1be2d01015cab6b300e1b5502342b624fd0b3d644
MD5 1d21e22122a075db214f3c106d645a2e
BLAKE2b-256 88ea2f1b6beb0acfe9d41dde776e9a853f4b5e63fc6662aece0f16fb0817fdf7


File details

Details for the file insurance_conformal_claims-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: insurance_conformal_claims-0.1.0-py3-none-any.whl
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.10.8 (Ubuntu 24.04)

File hashes

Hashes for insurance_conformal_claims-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3816ff09bceb3db4b8d2bfa7a05e9f323999088c072788ffcc9b962de0d507a2
MD5 bcdab00d5d9fd5f64f410df661e212d5
BLAKE2b-256 770c0b39784fab7a217a72cb1efac0d58392700625a9774611025f5de32f4eb1

