
Distribution-free prediction intervals for insurance pricing models: conformal coverage guarantees, Tweedie non-conformity scores, SCR bounds, and anytime-valid sequential monitoring


insurance-conformal


Distribution-free prediction intervals for insurance pricing models — 13% narrower than parametric Tweedie, with a finite-sample coverage guarantee.

Blog post: Conformal Prediction Intervals for Insurance Pricing Models


The problem

Your pricing model gives point estimates. Your parametric prediction intervals assume variance scales as mu^p across the whole book — an assumption that breaks exactly where the stakes are highest: large, unusual risks.

On a heterogeneous UK motor portfolio, parametric Tweedie intervals over-cover low-risk policies (unnecessary width) and under-cover the top risk decile — which is what drives reinsurance attachment, reserving, and SCR calculations.

Conformal prediction fixes this. The guarantee is P(y in interval) >= 1 - alpha for any data distribution, as long as calibration and test data are exchangeable. No parametric family required.

The non-obvious implementation detail: most conformal libraries use raw absolute residuals |y - yhat|. For insurance data that is wrong — a £1 error on a £100 risk is not the same as a £1 error on a £10,000 risk. The correct score for Tweedie models is |y - yhat| / yhat^(p/2), which normalises by the Tweedie standard deviation and produces exchangeable scores across risk levels. That is what this library implements.
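A minimal numpy sketch of the score (our illustration, not the library's internal code) makes the point concrete: the same £500 absolute error counts very differently at the two ends of the book.

```python
import numpy as np

def pearson_weighted_score(y, y_hat, p=1.5):
    """Absolute residual normalised by the Tweedie standard deviation,
    which scales as y_hat^(p/2)."""
    return np.abs(y - y_hat) / y_hat ** (p / 2.0)

# The same £500 absolute error on a small and a large risk:
small = pearson_weighted_score(np.array([600.0]), np.array([100.0]))
large = pearson_weighted_score(np.array([10_500.0]), np.array([10_000.0]))
# small ≈ 15.8 standard deviations; large = 0.5 standard deviations.
# A raw |y - yhat| score would treat the two errors as identical.
```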


Quick start

from insurance_conformal import InsuranceConformalPredictor

# Wrap any fitted sklearn-compatible model
cp = InsuranceConformalPredictor(
    model=fitted_gbm,
    nonconformity="pearson_weighted",  # correct default for Tweedie
    tweedie_power=1.5,
)

# Calibrate on held-out data (must not overlap training)
cp.calibrate(X_cal, y_cal)

# 90% prediction intervals — polars DataFrame: lower, point, upper
intervals = cp.predict_interval(X_test, alpha=0.10)

# Always check per-decile coverage (marginal != conditional)
print(cp.coverage_by_decile(X_test, y_test, alpha=0.10))

For locally-adaptive intervals (narrower on low-variance risks, wider on high-variance risks):

from insurance_conformal import LocallyWeightedConformal

lw = LocallyWeightedConformal(model=fitted_gbm, tweedie_power=1.5)
lw.fit(X_train, y_train)
lw.calibrate(X_cal, y_cal)
intervals = lw.predict_interval(X_test, alpha=0.10)

Why a pricing actuary should care

Accuracy where it matters. Parametric Tweedie intervals produce 93% aggregate coverage at a 90% target — fine in aggregate, but that surplus width sits on low-risk policies. The top-risk decile that drives reinsurance and reserving gets marginal coverage at best, and on books with more pronounced tail heteroscedasticity it will miss the target.

Regulatory defensibility. The distribution-free guarantee does not rely on model fit. You can write "P(claim in interval) >= 90%, finite-sample valid, no parametric assumptions" in a PRA SS1/23 validation pack. You cannot write that for a parametric bootstrap interval.

SCR calculations. SCRReport produces per-risk 99.5% upper bounds with a coverage validation table — exactly the format needed for internal model stress-testing documentation.

Premium sufficiency control. PremiumSufficiencyController finds the smallest loading factor such that expected underpricing shortfall is bounded at alpha. A direct regulatory argument, not a statistical artefact.


Performance on a realistic motor book

CatBoost Tweedie(p=1.5), 50,000 synthetic UK motor policies, heteroscedastic Gamma DGP, temporal 60/20/20 split.

| | Parametric Tweedie | Conformal (pearson_weighted) | Locally-weighted conformal |
| --- | --- | --- | --- |
| Distribution assumption | Tweedie: Var ~ mu^p | None | None |
| Aggregate coverage @ 90% target | 93.1% (over-covers) | 90.2% | 90.3% |
| Top-decile coverage @ 90% target | 90.4% | 87.9% | 90.6% |
| Mean interval width | £4,393 | £3,806 (−13.4%) | £3,881 (−11.7%) |
| Width adapts per risk segment | No | Partial | Yes |
| Finite-sample valid guarantee | No | Yes | Yes |

The locally-weighted variant meets the 90% target in the top decile by construction — the parametric baseline only coincidentally passes it on this dataset. Run the validation: import notebooks/databricks_validation.py into Databricks.


Installation

pip install insurance-conformal

# With CatBoost support:
pip install "insurance-conformal[catboost]"

# With LightGBM support:
pip install "insurance-conformal[lightgbm]"

# With everything (CatBoost, LightGBM, plotting):
pip install "insurance-conformal[all]"

Or with uv:

uv add insurance-conformal

Dependencies: polars and pandas are both required. Polars is the primary output format — all prediction and diagnostic methods return pl.DataFrame. Pandas is required for binning utilities and for accepting pandas DataFrame inputs. Both install automatically.


Worked examples

1. Motor frequency-severity model with per-decile coverage audit

from sklearn.linear_model import PoissonRegressor, GammaRegressor
from insurance_conformal.claims import FrequencySeverityConformal
from insurance_conformal import subgroup_coverage

fs = FrequencySeverityConformal(
    freq_model=PoissonRegressor(),
    sev_model=GammaRegressor(),
)
fs.fit(X_train, d_train, y_train)   # d_train = observed claim counts
fs.calibrate(X_cal, d_cal, y_cal)
intervals = fs.predict_interval(X_test, alpha=0.10)

# Coverage by vehicle group
sg = subgroup_coverage(
    predictor=fs,
    X_test=X_test,
    y_test=y_test,
    alpha=0.10,
    groups=vehicle_group_band,
    group_name="vehicle_group_band",
)
print(sg)

The calibration subtlety here: using the observed claim count in the severity model at calibration time creates a distributional mismatch that breaks the coverage guarantee. FrequencySeverityConformal feeds the predicted frequency (not the observed count) into the severity model at both calibration and test time. See Graziadei et al. (2023) for the proof.
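That rule is easy to sketch with plain sklearn models. Everything below (the synthetic data, fitting severity only on rows with claims) is illustrative; the key step, scoring against predicted frequency times predicted severity, is the one the library enforces.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor, GammaRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(3_000, 3))
d = rng.poisson(np.exp(0.5 + X[:, 0]))              # observed claim counts
sev = rng.gamma(2.0, 500.0 * (1.0 + X[:, 1]))       # per-claim severity
y = d * sev                                         # total claims per policy

freq_model = PoissonRegressor().fit(X, d)
has_claims = d > 0
sev_model = GammaRegressor().fit(X[has_claims], y[has_claims] / d[has_claims])

# The crucial step: at calibration (and test) time, predict totals from the
# PREDICTED frequency, never from the observed count d
y_hat = freq_model.predict(X) * sev_model.predict(X)

# Tweedie-weighted non-conformity scores (p = 1.5, so exponent p/2 = 0.75)
scores = np.abs(y - y_hat) / np.maximum(y_hat, 1e-9) ** 0.75
```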

2. Premium sufficiency control — bound expected underpricing

Useful when a pricing review requires a documented guarantee that expected shortfall from underpriced policies stays below a threshold.

from insurance_conformal.risk import PremiumSufficiencyController

psc = PremiumSufficiencyController(alpha=0.05, B=5.0)
psc.calibrate(y_cal, premium_cal)   # calibrate on held-out year
result = psc.predict(premium_new)   # apply to next year's book

# result["lambda_hat"]: the loading factor such that E[shortfall] <= 5%
# result["upper_bound"]: risk-controlled loaded premium per policy
print(f"Required loading: {result['lambda_hat']:.3f}")
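Under the hood this is conformal risk control (Angelopoulos et al., 2024). Below is a standalone sketch of the lambda search, assuming a shortfall loss measured as a premium fraction and capped at B; the library's exact loss definition may differ.

```python
import numpy as np

def smallest_sufficient_loading(y_cal, premium_cal, alpha=0.05, B=5.0):
    """Smallest loading lambda whose (n + 1)-corrected calibration risk
    meets the target: (n * R_hat(lambda) + B) / (n + 1) <= alpha."""
    n = len(y_cal)
    for lam in np.linspace(1.0, 3.0, 401):
        # Underpricing shortfall per policy as a premium fraction, capped at B
        shortfall = np.minimum(
            np.maximum(y_cal - lam * premium_cal, 0.0) / premium_cal, B
        )
        # Shortfall is non-increasing in lam, so the first lam satisfying
        # the corrected constraint is the smallest sufficient loading
        if (n * shortfall.mean() + B) / (n + 1) <= alpha:
            return lam
    return np.inf

rng = np.random.default_rng(2)
premium_cal = np.full(2_000, 1_000.0)
y_cal = rng.gamma(shape=2.0, scale=400.0, size=2_000)   # mean claim £800
lam_hat = smallest_sufficient_loading(y_cal, premium_cal)
```

The B/(n+1) correction term is what converts the empirical risk estimate into a guarantee on the expected shortfall of the next exchangeable policy.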

3. SCR bounds for internal model documentation

from insurance_conformal import InsuranceConformalPredictor, SCRReport

cp = InsuranceConformalPredictor(model=fitted_model)
cp.calibrate(X_cal, y_cal)

scr = SCRReport(predictor=cp)
scr_bounds = scr.solvency_capital_requirement(X_test, alpha=0.005)
val_table  = scr.coverage_validation_table(X_test, y_test)
print(scr.to_markdown())

Disclaimer: SCRReport is an internal stress-testing tool. Solvency II SCR calculations for regulatory purposes require sign-off under an approved internal model or the standard formula. Do not use this output in regulatory returns without appropriate actuarial review, governance sign-off, and alignment with your firm's approved methodology.

4. Recovering from mid-year claims inflation (Ogden rate change, CAT event)

Standard conformal with a static calibration set breaks when the book shifts mid-year. RetroAdj recovers in a fraction of the steps that standard adaptive conformal (ACI) needs, by retroactively correcting all leave-one-out residuals in the sliding window simultaneously.

from insurance_conformal import RetroAdj

# Residual-only mode: wrap an existing GLM or GBM
resid_train = y_train - glm.predict(X_train)
resid_test  = y_test  - glm.predict(X_test)

model = RetroAdj(window_size=250, gamma=0.005)
model.fit(resid_train)
lower_r, upper_r = model.predict_interval(resid_test, alpha=0.10)

lower_claims = lower_r + glm.predict(X_test)
upper_claims = upper_r + glm.predict(X_test)

| Metric | RetroAdj | Standard ACI |
| --- | --- | --- |
| Steps to recover 90% coverage after +30% inflation shock | ~15–30 | ~80–150 |
| Post-shift coverage (full window) | ~88–91% | ~80–87% |

Features

  • InsuranceConformalPredictor — split conformal prediction wrapping any sklearn-compatible model. Non-conformity scores: pearson_weighted, pearson, deviance, anscombe, raw.
  • LocallyWeightedConformal — two-stage conformal with a secondary spread model. Meets per-decile coverage targets that standard conformal misses.
  • ConformalisedQuantileRegression — split CQR (Romano et al., 2019). Wraps pre-fitted quantile models. Works with CatBoost Quantile:alpha=, LightGBM objective=quantile.
  • FrequencySeverityConformal — correct conformity scoring for two-stage frequency-severity models (Graziadei et al., 2023).
  • SCRReport — per-risk 99.5% upper bounds with coverage validation table. For PRA SS1/23 model documentation.
  • solvency_capital_range() — functional API for SCR bounds inside pipelines.
  • insurance_conformal.risk — Conformal Risk Control (Angelopoulos et al., ICLR 2024). PremiumSufficiencyController, IntervalWidthController, SelectiveRiskController.
  • RetroAdj — online conformal with retrospective adjustment (Jun & Ohn, 2025). Recovers from abrupt distribution shifts far faster than standard ACI.
  • CoverageDiagnostics — coverage-by-decile plots, interval width distributions, subgroup coverage by arbitrary segment.
  • insurance_conformal.multivariate — joint multi-output conformal for simultaneous frequency/severity intervals.

Non-conformity scores

| Score | Formula | When to use |
| --- | --- | --- |
| pearson_weighted | \|y - yhat\| / yhat^(p/2) | Default. Tweedie/Poisson pricing models. |
| pearson | \|y - yhat\| / sqrt(yhat) | Pure Poisson frequency models (p=1). |
| deviance | Deviance residual | When you want exact statistical optimality; slower. |
| anscombe | Anscombe transform | Variance-stabilising alternative to deviance. |
| raw | \|y - yhat\| | Baseline only. Not appropriate for insurance data. |

Width hierarchy (narrowest first; the coverage guarantee is the same for all): pearson_weighted <= deviance <= anscombe < pearson < raw.
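A quick synthetic check of why raw is baseline-only: on data whose variance really does scale as mu^p, a constant-width raw interval under-covers the top risk decile while the weighted score holds the target. This is a standalone numpy illustration, not library code.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 1.5

def simulate(n):
    """Book where variance genuinely scales as mu^p."""
    mu = rng.uniform(100.0, 10_000.0, size=n)
    y = mu + rng.normal(0.0, mu ** (p / 2.0))
    return mu, y

def conformal_q(scores, alpha=0.10):
    """k-th smallest score, k = ceil((n + 1)(1 - alpha))."""
    k = int(np.ceil((len(scores) + 1) * (1.0 - alpha)))
    return np.sort(scores)[min(k, len(scores)) - 1]

mu_cal, y_cal = simulate(5_000)
mu_test, y_test = simulate(5_000)

q_raw = conformal_q(np.abs(y_cal - mu_cal))
q_wtd = conformal_q(np.abs(y_cal - mu_cal) / mu_cal ** (p / 2.0))

top = mu_test >= np.quantile(mu_test, 0.9)          # top risk decile
abs_err = np.abs(y_test - mu_test)
cov_raw = np.mean(abs_err[top] <= q_raw)
cov_wtd = np.mean(abs_err[top] <= q_wtd * mu_test[top] ** (p / 2.0))
# cov_raw falls well short of the 90% target; cov_wtd stays on it
```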


Temporal calibration

Calibrate on recent data to capture current loss trends:

from insurance_conformal.utils import temporal_split

X_train, X_cal, y_train, y_cal, _, _ = temporal_split(
    X, y,
    calibration_frac=0.20,
    date_col="accident_year",
)

model.fit(X_train, y_train)
cp.calibrate(X_cal, y_cal)

Target n_cal >= 2,000 for stable production use. The guarantee holds for any n_cal >= 1, but below roughly 500 calibration points the interval widths become materially wider and more variable.
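The small-n_cal instability is easy to see by resampling the calibration quantile (a standalone sketch with stand-in scores, not library code):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 0.10

def calibrated_q(n_cal):
    """Finite-sample conformal quantile from n_cal stand-in scores."""
    scores = np.sort(np.abs(rng.standard_normal(n_cal)))
    k = int(np.ceil((n_cal + 1) * (1.0 - alpha)))
    return scores[min(k, n_cal) - 1]

# How much the interval half-width wobbles across repeated calibration draws
spread_small = np.std([calibrated_q(100) for _ in range(200)])
spread_large = np.std([calibrated_q(2_000) for _ in range(200)])
# spread_small is several times spread_large: small calibration sets give
# noisier (and on average wider) intervals, though the guarantee still holds
```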


Coverage guarantee

Split conformal provides:

P(y_test in [lower, upper]) >= 1 - alpha

Distribution-free — holds regardless of the true data distribution or model misspecification. The one assumption is exchangeability: the joint distribution of calibration and test observations must be invariant to reordering (i.i.d. sampling from a common distribution is the usual special case). Temporal covariate shift violates this — use temporal calibration splits and monitor coverage via RetroAdj if abrupt shifts are expected.


Design choices

Split conformal, not cross-conformal. Cross-conformal is more statistically efficient but requires refitting the model on each calibration fold. For GBMs that take hours to train, this is not practical. Split conformal trains once, calibrates once.

No MAPIE dependency. MAPIE is excellent but does not expose the insurance-specific scores implemented here. The split conformal algorithm is simple enough to own: 20 lines of code for conformal_quantile() plus the score functions.
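The core really is that small. A self-contained version in the spirit of conformal_quantile() (our sketch, not the package's source):

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-valid (1 - alpha) quantile of calibration scores.

    Returns the k-th smallest score with k = ceil((n + 1)(1 - alpha));
    the "+ 1" is what turns the empirical quantile into a guarantee
    that holds at any calibration size."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf   # too few scores for this alpha: infinite interval
    return scores[k - 1]

# With nine scores and alpha = 0.10, k = 9: the largest score is returned
q = conformal_quantile(np.arange(1.0, 10.0), alpha=0.10)   # → 9.0
```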

Polars-native output. All prediction and diagnostic methods return pl.DataFrame. Pandas inputs are accepted.

Lower bound clipped at zero. Insurance losses are non-negative. Intervals with negative lower bounds are nonsensical. We clip at zero unconditionally.

Auto-detection of Tweedie power. For CatBoost, read from the loss function string. For sklearn TweedieRegressor, from model.power. Pass tweedie_power= explicitly to override.


Limitations

  • Coverage is marginal, not conditional. The guarantee holds on average. High-risk subgroups can be systematically under-covered even when aggregate coverage meets the target. Always run coverage_by_decile() after calibration.
  • Exchangeability is violated by portfolio drift. Mid-year claims inflation, Ogden rate changes, or significant portfolio mix shifts break the exchangeability assumption. Use temporal calibration splits and monitor via RetroAdj.
  • IBNR on recent accident years produces intervals that are too narrow. Calibrating on development-year 0 or 1 data means non-conformity scores are computed on understated claim totals. Use only accident years with at least 3 years of development, or apply IBNR chain-ladder factors to y_cal before calibration.
  • RetroAdj full method requires kernel ridge regression as the base model. Use residual-only mode for existing GLMs or GBMs.
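For the IBNR point, the gross-up is a one-line adjustment once age-to-ultimate factors are available. The factors below are purely illustrative; use your own triangle's.

```python
import numpy as np

# Hypothetical age-to-ultimate development factors (illustrative only)
dev_to_ultimate = {0: 2.10, 1: 1.35, 2: 1.10, 3: 1.00}

def gross_up_to_ultimate(y_cal, dev_year):
    """Scale paid-to-date calibration claims to ultimate before scoring."""
    factors = np.array([dev_to_ultimate[d] for d in dev_year])
    return y_cal * factors

y_cal = np.array([1_000.0, 500.0, 0.0, 2_000.0])
dev_year = [0, 1, 3, 2]
y_ult = gross_up_to_ultimate(y_cal, dev_year)   # → [2100., 675., 0., 2200.]
```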

Part of the Burning Cost stack

Takes any fitted model — Tweedie GBM, GAM, GLM, or the output of insurance-gam or insurance-frequency-severity. Feeds distribution-free prediction intervals into insurance-optimise (uncertainty-aware pricing) and insurance-governance (PRA SS1/23 validation packs). → See the full stack


References

  • Hong, L. (2025). "Conformal prediction of future insurance claims in the regression problem." arXiv:2503.03659.
  • Hong, L. (2026). "A new strategy for finite-sample valid prediction of future insurance claims in the regression setting." arXiv:2601.21153.
  • Graziadei, H., Janett, C., Embrechts, P. & Bucher, A. (2023). "Conformal Prediction for Insurance Data." arXiv:2307.13124.
  • Manna, S. et al. (2025). "Conformal Prediction Inference in Regularized Insurance Models." Wiley ASMB; arXiv:2507.06921.
  • Angelopoulos, A. N., Bates, S. et al. (2024). "Conformal Risk Control." ICLR 2024. arXiv:2208.02814.
  • Jun, J. & Ohn, I. (2025). "Online Conformal Inference with Retrospective Adjustment." arXiv:2511.04275.
  • Romano, Y., Patterson, E. & Candes, E. (2019). "Conformalized Quantile Regression." NeurIPS 2019. arXiv:1905.03222.

Related libraries

| Library | Description |
| --- | --- |
| insurance-monitoring | Model drift detection — track coverage stability over time |
| insurance-conformal-ts | Conformal prediction for non-exchangeable claims time series |
| insurance-causal | Double Machine Learning for causal pricing inference |
| insurance-gam | GAM pricing models that feed directly into this library |

Other Burning Cost libraries

Model building

| Library | Description |
| --- | --- |
| shap-relativities | Extract rating relativities from GBMs using SHAP |
| insurance-cv | Walk-forward cross-validation respecting IBNR structure |

Uncertainty quantification

| Library | Description |
| --- | --- |
| bayesian-pricing | Hierarchical Bayesian models for thin-data segments |
| insurance-distributional | Full conditional distribution per risk: mean, variance, CoV |

Deployment and optimisation

| Library | Description |
| --- | --- |
| insurance-optimise | Constrained rate change optimisation with FCA PS21/5 compliance |

Governance

| Library | Description |
| --- | --- |
| insurance-fairness | Proxy discrimination auditing for UK insurance models |
| insurance-monitoring | Model monitoring: PSI, A/E ratios, Gini drift test |

All libraries


Training Course

Want structured learning? Insurance Pricing in Python is a 12-module course covering the full pricing workflow. Module 11 covers conformal prediction — split conformal, CQR, and coverage guarantees for pricing models. £97 one-time.


Licence

MIT. See LICENSE.

Contributing

Issues and pull requests welcome at github.com/burning-cost/insurance-conformal.


Need help implementing this? See our consulting services.
