Conformal prediction intervals for insurance claims regression — Hong order-statistic shortcut, Tweedie nonconformity scores, and Solvency II SCR reporting

insurance-conformal-claims

GLM prediction intervals are wrong. Not "a bit off" — structurally wrong. A log-normal GLM fitted to Pareto-distributed personal injury claims achieves 57.8% coverage when the nominal level is 99.5%. This is from Table 2 of Hong (2025), and it means that a Solvency II capital model built on GLM intervals is invalid.

Conformal prediction fixes this without requiring you to know the true distribution. This library implements the three most useful conformal methods for insurance claims regression, plus a direct interface to the Solvency II 99.5% capital requirement.

What this is

Three components built on the three papers listed under References, implemented in Python for the first time:

Hong's order-statistic shortcut (arXiv:2503.03659, 2601.21153): Full conformal prediction that reduces to sorting a single array. No model required. Finite-sample valid coverage for any exchangeable distribution. O(n log n) — not a grid search.

Tweedie nonconformity scores (Manna et al. 2025, ASMBI asmb.70045): Pearson, deviance, and Anscombe residuals using the Tweedie variance function V(mu) = mu^p. Implemented as split conformal with a two-stage locally weighted approach that auto-fits a CatBoost spread model.

SCR reporting: Thin wrapper that extracts the 99.5% upper bound and formats it as a Solvency II / Solvency UK report.

None of these have a Python implementation elsewhere. The R code at alokesh17/conformal_LightGBM_tweedie covers the Manna scores but is research scripts, not a package.

Installation

pip install insurance-conformal-claims

With optional dependencies:

# Quote the extras so shells such as zsh do not expand the brackets
pip install "insurance-conformal-claims[all]"       # sklearn, catboost, matplotlib
pip install "insurance-conformal-claims[catboost]"  # for TwoStageLWConformal
pip install "insurance-conformal-claims[plot]"      # for calibration_plot

Quick start

Model-free intervals (no GLM required)

import numpy as np
from insurance_conformal_claims import HongConformal, SCRReport

# Training data — any insurance claims dataset
# X: covariate matrix, y: claim amounts (non-negative)
hc = HongConformal()
hc.fit(X_train, y_train)

# 99.5% prediction intervals (Solvency II alpha)
intervals = hc.predict_interval(X_portfolio, alpha=0.005)
print(f"Upper bounds: {intervals[:, 1]}")

# Direct SCR extraction
report = SCRReport(hc)
scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
print(f"SCR upper bound: £{scr:,.0f}")

With a regression model (narrower intervals)

The h-transformation framework (Hong 2026) uses any sklearn estimator to reduce interval width while preserving the finite-sample guarantee:

from sklearn.ensemble import GradientBoostingRegressor
from insurance_conformal_claims import HongTransformConformal

htc = HongTransformConformal(h_model=GradientBoostingRegressor())
htc.fit(X_train, y_train, X_cal, y_cal)

intervals = htc.predict_interval(X_test, alpha=0.05)

In the paper's experiments on personal injury claims, a well-fitted linear model reduces mean interval width by roughly 25% relative to the model-free baseline.

Tweedie-specific scores

When you have a fitted Tweedie GLM or CatBoost model:

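import numpy as np
from insurance_conformal_claims import TweedePearsonScore, TwoStageLWConformal

# Direct score usage
score = TweedePearsonScore(p=1.5)  # p from your fitted GLM
residuals = score.score(y_cal, mu_cal)
# Finite-sample conformal level ceil((n + 1)(1 - alpha)) / n for alpha = 0.05
n = len(residuals)
q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * 0.95) / n), method="higher")
upper_bounds = score.inverse(q, mu_new, upper=True)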

# Or: full two-stage conformal (fits spread model automatically)
from catboost import CatBoostRegressor
lw = TwoStageLWConformal(
    mean_model=CatBoostRegressor(loss_function='Tweedie:variance_power=1.5'),
    p=1.5,
)
lw.fit(X_train, y_train)
lw.calibrate(X_cal, y_cal)
intervals = lw.predict_interval(X_new, alpha=0.005)
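
To make the Pearson score concrete, here is a minimal hand-written sketch of the idea, assuming the score is the absolute Pearson residual |y - mu| / sqrt(V(mu)) with V(mu) = mu^p. The function names `pearson_score`, `pearson_upper` and `conformal_quantile` and the data are hypothetical illustrations, not the library's API, and the library's own implementation may differ in detail.

```python
import math

import numpy as np


def pearson_score(y, mu, p):
    """Absolute Pearson residual |y - mu| / sqrt(V(mu)) with V(mu) = mu**p."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.abs(y - mu) / mu ** (p / 2.0)


def pearson_upper(q, mu, p):
    """Invert the score: the largest y whose score does not exceed q."""
    mu = np.asarray(mu, dtype=float)
    return mu + q * mu ** (p / 2.0)


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(scores[k - 1])


# Hypothetical calibration data: predicted means and observed claims.
mu_cal = np.array([100.0, 250.0, 400.0, 800.0, 1200.0])
y_cal = np.array([130.0, 180.0, 650.0, 790.0, 2000.0])
p = 1.5

q = conformal_quantile(pearson_score(y_cal, mu_cal, p), alpha=0.2)
upper = pearson_upper(q, np.array([300.0, 900.0]), p)
```

Dividing by mu^(p/2) puts residuals from small and large risks on a common scale, so a single calibrated quantile q translates into wider bounds for policies with larger predicted means.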

Coverage diagnostics

from insurance_conformal_claims import conditional_coverage_gap, calibration_plot

# Per-segment coverage (diagnose conditional vs marginal gap)
result = conditional_coverage_gap(
    hc, X_test, y_test, alpha=0.05,
    groups=X_test[:, 0].astype(int)  # e.g. vehicle class
)
print(result["group_results"])

# Calibration plot (nominal vs empirical coverage)
import matplotlib.pyplot as plt
ax = calibration_plot(hc, X_test, y_test)
plt.show()
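
For readers who want to see what a per-segment coverage check involves, the following sketch computes empirical coverage overall and within each group from interval bounds. It is an illustration under assumed inputs; `coverage_by_group` and the toy arrays are hypothetical and do not describe how `conditional_coverage_gap` is implemented.

```python
import numpy as np


def coverage_by_group(y, lower, upper, groups):
    """Empirical coverage overall and within each group label."""
    y, lower, upper, groups = map(np.asarray, (y, lower, upper, groups))
    covered = (y >= lower) & (y <= upper)
    per_group = {
        label.item(): float(covered[groups == label].mean())
        for label in np.unique(groups)
    }
    return float(covered.mean()), per_group


# Hypothetical test-set outcomes, interval bounds and segment labels.
y = np.array([50.0, 120.0, 900.0, 40.0, 3000.0, 75.0])
lower = np.zeros(6)
upper = np.array([100.0, 100.0, 1000.0, 100.0, 1000.0, 100.0])
groups = np.array([0, 0, 1, 0, 1, 0])

overall, per_group = coverage_by_group(y, lower, upper, groups)
# Gap between marginal coverage and the worst-covered segment.
gap = overall - min(per_group.values())
```

A large gap for a segment with few observations is as likely to reflect sampling noise as a real shortfall, so segment sizes are worth reporting alongside the coverage figures.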

SCR report

report = SCRReport(hc)
df = report.coverage_table(X_test, y_test, alphas=[0.005, 0.01, 0.05, 0.10])
print(df)

# HTML for regulatory submission
html = report.to_html(X_test, y_test)
with open("scr_report.html", "w", encoding="utf-8") as f:
    f.write(html)

# JSON for downstream systems
import json
payload = json.loads(report.to_json(X_test, y_test))

The order-statistic shortcut

Standard full conformal prediction is impractical: for each candidate response value y, you augment the calibration set and recompute a nonconformity score — effectively an infinite grid search.

Hong (2025) shows that for a specific nonconformity measure that is linear in y, the prediction region collapses to a single order statistic. The adjusted score for training observation i given new point x is:

W_i = Y_i + (1/n) * sum_j (x_j - X_{ij})

The 100(1-alpha)% prediction region is (0, W_{(k)}) where k = min(n, floor((n+1)(1-alpha) + 1)). Sort {W_1, ..., W_n}, pick index k. That is the entire algorithm.
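
The selection step can be sketched in a few lines. This is a minimal illustration that assumes the adjusted scores W_1, ..., W_n have already been computed; the function name `order_statistic_upper_bound` and the toy scores are hypothetical and are not part of the library's API.

```python
import math

import numpy as np


def order_statistic_upper_bound(w, alpha):
    """Return W_(k) with k = min(n, floor((n + 1) * (1 - alpha) + 1)).

    `w` holds the adjusted scores W_1..W_n (assumed precomputed).
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    if n == 0:
        raise ValueError("need at least one score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    k = min(n, math.floor((n + 1) * (1 - alpha) + 1))  # 1-based rank
    return float(np.sort(w)[k - 1])  # convert to a 0-based index


# Hypothetical scores: the integers 1..100 in shuffled order.
rng = np.random.default_rng(0)
scores = rng.permutation(np.arange(1, 101))
upper = order_statistic_upper_bound(scores, alpha=0.10)
```

`np.partition` would avoid the full sort, but sorting keeps the sketch close to the description above.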

Coverage is guaranteed finite-sample (Theorem 1): P(Y in C) >= 1-alpha for all n and all exchangeable distributions. No model. No distributional assumption. The GLM misspecification problem disappears.

Coverage under misspecification

From Table 2 of Hong (2025), on personal injury claims data where the true distribution is Pareto but the model assumes log-normal:

Method	Actual coverage	Interval width
Conformal (this library)	99.6%	1.13x oracle
GLM (misspecified)	57.8%	0.07x oracle
Random forest	98.8%	0.31x oracle

The GLM intervals are narrow and wrong. The conformal intervals are conservative (a feature, not a bug — they are guaranteed valid).
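
The mechanism can be reproduced qualitatively with a small covariate-free simulation. This is a sketch under assumed Pareto parameters, not a reproduction of Table 2, and its numbers will differ from the paper's. The log of a Pareto variable is exponential, so a normal fit on the log scale has too thin a right tail and its 99.5% quantile should fall short, while the order-statistic bound needs no distributional form.

```python
import math
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(42)
alpha = 0.005
shape, scale = 2.0, 1000.0  # assumed Pareto parameters, illustration only


def draw_claims(size):
    # rng.pareto samples a Lomax variable; shifting by 1 and scaling
    # gives a classical Pareto with minimum `scale`.
    return scale * (1.0 + rng.pareto(shape, size=size))


y_train = draw_claims(5_000)
y_test = draw_claims(50_000)

# Misspecified parametric bound: fit a log-normal, take its 99.5% quantile.
log_y = np.log(y_train)
z = NormalDist().inv_cdf(1 - alpha)
lognormal_upper = math.exp(log_y.mean() + z * log_y.std(ddof=1))

# Distribution-free bound: the k-th order statistic of the training claims.
n = y_train.size
k = min(n, math.floor((n + 1) * (1 - alpha) + 1))
conformal_upper = np.sort(y_train)[k - 1]

lognormal_coverage = float(np.mean(y_test <= lognormal_upper))
conformal_coverage = float(np.mean(y_test <= conformal_upper))
print(f"log-normal: {lognormal_coverage:.3f}, conformal: {conformal_coverage:.3f}")
```

The shortfall in this toy setting is milder than the one reported in the paper, which involves covariates and real claims data; the point is only the direction of the error.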

Solvency II application

Setting alpha=0.005 gives a 99.5% prediction interval, which matches the 99.5% confidence level of the Solvency Capital Requirement calibration under Solvency II Article 101, maintained in UK regulation via PRA PS9/24.

scr = report.solvency_capital_requirement(X_portfolio, alpha=0.005)
# "If the insurer wants to comply with Solvency II, they can set their
#  risk capital level to [scr] for this line of business."
# — Hong (2025), Section 5

Coverage is marginal, not conditional. The guarantee is P(Y <= SCR) >= 99.5% averaged over the portfolio distribution, not for each individual risk. For per-policy capital, see insurance-multivariate-conformal.

Limitations

Marginal coverage only. All three methods guarantee P(Y in I) >= 1-alpha averaged over the covariate distribution. Conditional on X=x, coverage may be lower in thin regions of covariate space. Use conditional_coverage_gap() to diagnose this for your portfolio.

Exchangeability assumption. Hong's guarantee requires exchangeable observations (iid sampling is sufficient). UK motor claims spanning more than 2-3 years are non-stationary (claims inflation, accident-year effects, mix shift), which breaks exchangeability. Restrict calibration data to recent years.
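
One simple way to restrict the calibration window is a boolean mask on accident year. The sketch below assumes an `accident_year` array aligned with the claims; the helper `recent_years_mask`, the two-year window and the toy data are hypothetical choices for illustration.

```python
import numpy as np


def recent_years_mask(accident_year, window):
    """True for rows whose accident year is within the latest `window` years."""
    accident_year = np.asarray(accident_year)
    return accident_year >= accident_year.max() - window + 1


# Hypothetical claims with their accident years.
accident_year = np.array([2019, 2020, 2021, 2022, 2023, 2023, 2024])
y = np.array([800.0, 950.0, 1100.0, 1300.0, 1500.0, 1450.0, 1700.0])

mask = recent_years_mask(accident_year, window=2)
y_recent = y[mask]  # calibration claims from 2023 and 2024 only
```

A shorter window trades a better match to current conditions against fewer calibration points, which matters at alpha=0.005 where the bound sits on one of the largest order statistics.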

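Tweedie p is an input. The nonconformity scores take the Tweedie power p as a parameter. Estimate it from your data, for example by profile likelihood over a grid of p values or with statsmodels (GLM.estimate_tweedie_power on a model using statsmodels.genmod.families.Tweedie), or reuse the variance_power you set when training a CatBoost model.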

Two-stage spread model is in-sample. TwoStageLWConformal.fit() fits the spread model on training Pearson residuals, which are in-sample and optimistic. For unbiased spread estimation, use a third data split or cross-fitting.
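
Cross-fitting can be sketched as follows: each row's mean prediction comes from a model fitted without that row's fold, and the spread model is then trained on the resulting out-of-fold Pearson residuals. Everything here is an assumption for illustration: `out_of_fold_predictions` is a hypothetical helper, and the log-linear least-squares model is only a stand-in for the real mean model (it would fail on zero claims, which Tweedie data usually contains).

```python
import numpy as np


def out_of_fold_predictions(X, y, fit_predict, n_splits=5, seed=0):
    """Predict each row with a model that was fitted without that row's fold."""
    n = len(y)
    fold = np.random.default_rng(seed).permutation(n) % n_splits
    preds = np.empty(n, dtype=float)
    for j in range(n_splits):
        held_out = fold == j
        preds[held_out] = fit_predict(X[~held_out], y[~held_out], X[held_out])
    return preds


def log_linear_fit_predict(X_fit, y_fit, X_new):
    """Stand-in mean model: least squares on log(y), for illustration only."""
    design = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(design, np.log(y_fit), rcond=None)
    return np.exp(np.column_stack([np.ones(len(X_new)), X_new]) @ coef)


# Hypothetical strictly positive claims with one rating factor.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.exp(1.0 + X[:, 0]) * rng.gamma(shape=2.0, scale=0.5, size=500)

p = 1.5
mu_oof = out_of_fold_predictions(X, y, log_linear_fit_predict)
# Out-of-fold absolute Pearson residuals: the training target for a spread model.
spread_target = np.abs(y - mu_oof) / mu_oof ** (p / 2.0)
```

Cross-fitting uses every row for both stages, whereas a dedicated third split is simpler but leaves less data for the mean model.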

References

Hong (2025) "Conformal Prediction of Future Insurance Claims in the Regression Problem" arXiv:2503.03659
Hong (2026) "A New Strategy for Finite-Sample Valid Prediction of Future Insurance Claims in the Regression Setting" arXiv:2601.21153
Manna et al. (2025) "Conformal Prediction Inference in Regularized Insurance Models" ASMBI asmb.70045

Licence

MIT

Version 0.1.0, released Mar 13, 2026
