Distribution-free prediction intervals for insurance GBM and GLM pricing models

insurance-conformal

💬 Questions or feedback? Start a Discussion. Found it useful? A ⭐ helps others find it.

Distribution-free prediction intervals for insurance GBM and GLM pricing models — for pricing actuaries who need uncertainty quantification that holds regardless of model specification, without the coverage failures that parametric intervals produce on heterogeneous motor books.

Why bother

Benchmarked against naive parametric intervals (Poisson GLM residual sigma) on synthetic UK motor data — 50,000 policies, temporal 60/20/20 train/calibration/test split. Same CatBoost Poisson point forecast for both methods.

Metric	Naive parametric	Conformal (split)	Conformal (LW)
Coverage (90% target)	Often misses in high-risk tail	Meets by construction	Meets by construction
Worst-decile coverage	Can be 70-80%	Near target	Near target
Mean interval width	Reference	Comparable	~10-20% narrower
Calibration overhead	~0s	~1s	+2-5 min (secondary GBM)
Adaptive width	No	Partial (Pearson)	Yes

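The 10-20 percentage point undercoverage in the top decile is the problem this library addresses. Conformal intervals meet the stated marginal target by construction, provided the calibration and test sets are exchangeable. A temporal split keeps the calibration set close to the test period, but it does not by itself guarantee exchangeability, so check per-decile coverage as well.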

Run on Databricks

The problem

Your Tweedie GBM gives point estimates. A pricing actuary needs to know the uncertainty around those estimates - not as a parametric confidence interval that depends on distributional assumptions, but as a guarantee: this interval will contain the actual loss at least 90% of the time, for any data distribution.

Conformal prediction provides that guarantee. The catch is that the choice of non-conformity score determines interval width. Most conformal implementations use the raw absolute residual |y - yhat|. For insurance data, that is wrong: it treats a 1-unit error on a £100 risk identically to a 1-unit error on a £10,000 risk, producing intervals that are too wide on low-risk policies and too narrow on large risks.

The solution

For Tweedie/Poisson models, Var(Y) ~ mu^p. The correct non-conformity score is the locally-weighted Pearson residual:

score(y, yhat) = |y - yhat| / yhat^(p/2)

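This accounts for the inherent heteroscedasticity of insurance claims. The result: up to ~30% narrower intervals than the raw score under the same marginal coverage guarantee, with the actual gain depending on the quality of the point forecast (see the benchmark section). Based on Manna et al. (2025), arXiv:2507.06921.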
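A minimal sketch of how the score above turns into an interval, assuming the usual split conformal inversion (all y whose score is at most a calibrated quantile q). The function names and the value q = 2.0 are illustrative assumptions, not the library's API or a calibrated number.

```python
def pearson_weighted_score(y, yhat, p=1.5):
    """Score from the text: |y - yhat| / yhat^(p/2)."""
    return abs(y - yhat) / yhat ** (p / 2)


def interval_from_quantile(yhat, q, p=1.5):
    """All y with score <= q, with the lower bound clipped at zero."""
    half_width = q * yhat ** (p / 2)
    return max(0.0, yhat - half_width), yhat + half_width


q = 2.0  # hypothetical score quantile, for illustration only
low_risk = interval_from_quantile(100.0, q)
high_risk = interval_from_quantile(10_000.0, q)

# The half-width scales with yhat^(p/2): a 100x larger prediction gets a
# roughly 31.6x wider interval, not the same absolute width.
width_ratio = (high_risk[1] - high_risk[0]) / (low_risk[1] - low_risk[0])
print(low_risk, high_risk, round(width_ratio, 1))
```

With the raw score the two intervals would have identical absolute width, which is the behaviour the section argues against.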

Blog post

Conformal Prediction Intervals for Insurance Pricing Models

Installation

uv add insurance-conformal

# With CatBoost support:
uv add "insurance-conformal[catboost]"

# With plotting:
uv add "insurance-conformal[all]"

Quick start

import numpy as np
from insurance_conformal import InsuranceConformalPredictor

# Synthetic data: 50k training, 10k calibration, 10k test
rng = np.random.default_rng(42)
n_train, n_cal, n_test = 50_000, 10_000, 10_000
n_features = 6
X_train = rng.standard_normal((n_train, n_features))
X_cal   = rng.standard_normal((n_cal,   n_features))
X_test  = rng.standard_normal((n_test,  n_features))
y_train = rng.gamma(shape=1.5, scale=500, size=n_train)
y_cal   = rng.gamma(shape=1.5, scale=500, size=n_cal)
y_test  = rng.gamma(shape=1.5, scale=500, size=n_test)

# Fit your model however you normally would
import catboost
model = catboost.CatBoostRegressor(
    loss_function="Tweedie:variance_power=1.5",
    iterations=300,
    learning_rate=0.05,
    depth=6,
    verbose=0,
)
model.fit(X_train, y_train)

# Wrap it
cp = InsuranceConformalPredictor(
    model=model,
    nonconformity="pearson_weighted",  # default, recommended for insurance
    distribution="tweedie",
    tweedie_power=1.5,
)

# Calibrate on held-out data (must not overlap with training set)
cp.calibrate(X_cal, y_cal)

# Generate 90% prediction intervals
intervals = cp.predict_interval(X_test, alpha=0.10)
# Polars DataFrame with columns: lower, point, upper

print(intervals.head())
# shape: (5, 3)
# ┌───────┬────────────┬─────────────┐
# │ lower ┆ point      ┆ upper       │
# │ ---   ┆ ---        ┆ ---         │
# │ f64   ┆ f64        ┆ f64         │
# ╞═══════╪════════════╪═════════════╡
# │ 0.0   ┆ 787.800176 ┆ 1629.240867 │
# │ 0.0   ┆ 652.927728 ┆ 1383.831645 │
# │ 0.0   ┆ 741.107597 ┆ 1544.860221 │
# │ 0.0   ┆ 763.402341 ┆ 1585.222083 │
# │ 0.0   ┆ 734.043618 ┆ 1532.043552 │
# └───────┴────────────┴─────────────┘
# Note: lower=0.0 is expected — insurance losses are non-negative and the predictor clips at zero.

Worked Example

conformal_prediction_intervals.py compares Tweedie conformal prediction intervals against a parametric bootstrap baseline on a synthetic motor book, then drills into per-segment coverage analysis across risk deciles and vehicle groups. It shows exactly where the bootstrap fails to meet its stated 90% coverage target — and confirms that the conformal approach holds by construction.

A Databricks-importable version is also available: Databricks notebook.

Coverage diagnostics

The marginal coverage guarantee means P(y in interval) >= 1 - alpha averaged over all observations. In insurance, you also need to check that coverage is uniform across risk deciles - a model can achieve 90% overall while only covering 65% of high-risk policies.

# THE key diagnostic
diag = cp.coverage_by_decile(X_test, y_test, alpha=0.10)
print(diag)
#    decile  mean_predicted  n_obs  coverage  target_coverage
# 0       1          0.0234    400     0.923             0.90
# 1       2          0.0512    400     0.910             0.90
# ...
# 9      10          2.3410    400     0.905             0.90

# Full summary: marginal coverage + decile breakdown
cp.summary(X_test, y_test, alpha=0.10)

# Matplotlib plots - use CoverageDiagnostics for coverage_plot and interval_width_distribution
from insurance_conformal import CoverageDiagnostics
intervals_for_diag = cp.predict_interval(X_test, alpha=0.10)
diag_tool = CoverageDiagnostics(
    y_true=y_test,
    y_lower=intervals_for_diag["lower"].to_numpy(),
    y_upper=intervals_for_diag["upper"].to_numpy(),
    y_pred=intervals_for_diag["point"].to_numpy(),
    alpha=0.10,
)
fig = diag_tool.coverage_plot()
fig.savefig("coverage_by_decile.png", dpi=150)

# Interval width distribution
fig = diag_tool.interval_width_distribution()
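
To make the decile diagnostic concrete, here is a small standard-library sketch of the computation: sort by point prediction, cut into equal-count bins, and measure coverage in each. The function name and the toy data are assumptions for illustration; this is not the library's coverage_by_decile implementation.

```python
def coverage_by_decile_sketch(y_true, lower, upper, y_pred, n_bins=10):
    """Empirical coverage within equal-count bins of the point prediction."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    n = len(order)
    rows = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        hits = sum(lower[i] <= y_true[i] <= upper[i] for i in idx)
        rows.append({"decile": b + 1, "n_obs": len(idx), "coverage": hits / len(idx)})
    return rows


# Toy book: 20 risks, intervals of +/-50% around the prediction.
y_pred = [float(i) for i in range(1, 21)]
lower = [0.5 * p for p in y_pred]
upper = [1.5 * p for p in y_pred]
# The two largest risks come in at twice the prediction and fall outside.
y_true = [p if p <= 18 else 2.0 * p for p in y_pred]

rows = coverage_by_decile_sketch(y_true, lower, upper, y_pred)
marginal = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)) / len(y_true)
print(marginal, rows[0], rows[-1])
```

In this toy book the marginal coverage is 90% while the top decile is never covered, which is the failure mode the diagnostic is meant to expose.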

Non-conformity scores

Score	Formula	When to use
`pearson_weighted`	`|y - yhat| / yhat^(p/2)`	Default. Tweedie/Poisson pricing models.
`pearson`	`|y - yhat| / sqrt(yhat)`	Pure Poisson frequency models (p=1).
`deviance`	Deviance residual	When you want exact statistical optimality; slower.
`anscombe`	Anscombe transform	Variance-stabilising alternative to deviance.
`raw`	`|y - yhat|`	Baseline only. Not appropriate for insurance data.

The typical ordering of interval width for Tweedie models with p > 1 (narrowest first; every score carries the same marginal coverage guarantee): pearson_weighted <= deviance <= anscombe < pearson < raw
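
The table names a deviance score without a formula. The sketch below uses the standard Tweedie unit deviance for 1 < p < 2 and takes its square root as the score. Whether the library uses exactly this form is an assumption, and the function names are illustrative.

```python
import math


def tweedie_unit_deviance(y, mu, p=1.5):
    """Standard Tweedie unit deviance for 1 < p < 2, with y >= 0 and mu > 0."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def deviance_score(y, mu, p=1.5):
    """Absolute deviance residual; max() guards against tiny negative rounding."""
    return math.sqrt(max(tweedie_unit_deviance(y, mu, p), 0.0))


def pearson_weighted_score(y, mu, p=1.5):
    return abs(y - mu) / mu ** (p / 2)


# Close to the prediction the two scores nearly agree; they diverge in the tails.
near = (deviance_score(510.0, 500.0), pearson_weighted_score(510.0, 500.0))
zero_claim = (deviance_score(0.0, 500.0), pearson_weighted_score(0.0, 500.0))
print(near, zero_claim)
```

Unlike the Pearson-type scores, the deviance score has no closed-form inverse in y, so turning a calibrated quantile back into interval endpoints needs a numerical root search. That is one plausible reason for the "slower" note in the table.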

Temporal calibration

In insurance, you should calibrate on recent data to capture current loss trends, not a random subsample of all years:

from insurance_conformal.utils import temporal_split

# Split by date - calibration gets the most recent 20%
X_train, X_cal, y_train, y_cal, _, _ = temporal_split(
    X, y,
    calibration_frac=0.20,
    date_col="accident_year",  # column in X DataFrame
)

model.fit(X_train, y_train)
cp.calibrate(X_cal, y_cal)

Use insurance-cv if you need full walk-forward cross-validation respecting IBNR development structure.
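
For readers who want to see the idea without the library, here is a hedged sketch of a chronological 60/20/20 split of the kind used in the benchmark: oldest rows train the model, the next block calibrates, the most recent block is held out for testing. The function name, the row layout, and the fractions are assumptions for illustration; the signature and semantics of temporal_split itself may differ.

```python
def chronological_split(rows, date_key, train_frac=0.6, cal_frac=0.2):
    """Sort by date, then cut into train / calibration / test by position."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    n_train = round(len(ordered) * train_frac)
    n_cal = round(len(ordered) * cal_frac)
    train = ordered[:n_train]
    cal = ordered[n_train:n_train + n_cal]
    test = ordered[n_train + n_cal:]
    return train, cal, test


# Hypothetical policy rows, deliberately not in date order.
rows = [{"policy_id": i, "accident_year": 2019 + i % 5} for i in range(10)]
train, cal, test = chronological_split(rows, "accident_year")

print([r["accident_year"] for r in train])
print([r["accident_year"] for r in cal])
print([r["accident_year"] for r in test])
```

A positional cut can split a single accident year across two sets when the year sizes do not line up with the fractions. Cutting on year boundaries instead avoids that at the cost of uneven set sizes.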

Coverage guarantee

Split conformal prediction provides the following guarantee for exchangeable data:

P(y_test in [lower, upper]) >= 1 - alpha

This is distribution-free: it holds regardless of the true data distribution or model misspecification. The core assumption is exchangeability: calibration and test observations must be drawn from the same distribution and be interchangeable in order. Temporal covariate shift, where the risk profile of test data differs from calibration data, violates this assumption and can degrade coverage in practice. Use temporal calibration splits (calibrate on the most recent accident year before the test period) to minimise the distribution gap. The temporal_split utility is provided for this purpose.

"Exchangeable" roughly means that the calibration and test observations could be shuffled together without changing their joint distribution, so nothing about a test point marks it out from a calibration point. For insurance, this means you should not calibrate on year 5 and test on year 1. Use temporal splits.

Design choices

Split conformal, not cross-conformal. Cross-conformal is more statistically efficient but requires refitting the model on each calibration fold. For GBMs that take hours to train, this is not practical. Split conformal trains once, calibrates once.

No MAPIE dependency. MAPIE is excellent but it does not expose the insurance-specific scores implemented here. The split conformal algorithm is simple enough to own: 20 lines of code for conformal_quantile() plus the score functions.

Lower bound clipped at 0. Insurance losses are non-negative. Prediction intervals with negative lower bounds are nonsensical. We clip at 0 unconditionally.

Auto-detection of Tweedie power. For CatBoost, the power parameter is read from the loss function string. For sklearn TweedieRegressor, from model.power. If detection fails, we warn and default to p=1.5. Pass tweedie_power= explicitly if you know the correct value.
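
The split conformal quantile mentioned under "No MAPIE dependency" can be sketched in a few lines. This uses the standard finite-sample rule (the ceil((n + 1)(1 - alpha))-th smallest calibration score) and is an illustration under that assumption, not the library's conformal_quantile() source. The constant point forecast and the raw score are chosen only to keep the example short.

```python
import math
import random


def conformal_quantile_sketch(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


rng = random.Random(42)
y_cal = [rng.gammavariate(1.5, 500.0) for _ in range(2000)]
y_test = [rng.gammavariate(1.5, 500.0) for _ in range(2000)]

yhat = 750.0  # deliberately crude constant forecast (the gamma mean)
q = conformal_quantile_sketch([abs(y - yhat) for y in y_cal], alpha=0.10)

# Calibration and test draws are exchangeable here, so coverage should sit
# near 90% even though the "model" is a constant.
coverage = sum(abs(y - yhat) <= q for y in y_test) / len(y_test)
print(f"q = {q:.1f}, empirical coverage = {coverage:.3f}")
```

The guarantee comes from the rank of the test score among the calibration scores, which is why it survives an arbitrarily poor point forecast. A poor forecast costs interval width, not coverage.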

Conformal Risk Control

Standard conformal prediction controls coverage probability: P(Y in C(X)) >= 1 - alpha. That guarantees a fraction of intervals contain the true outcome — but says nothing about how badly wrong the misses are. For insurance pricing, the question that matters is different: how much are we underpriced, in expectation?

The insurance_conformal.risk subpackage implements Conformal Risk Control (CRC, Angelopoulos et al., ICLR 2024), which controls expected loss directly:

E[L(C_lambda(X), Y)] <= alpha

for any bounded monotone loss L. No parametric assumptions. Finite-sample valid.

Lead use case: premium sufficiency control

Given a GBM that outputs predicted pure premium p(X), find the smallest loading factor lambda* such that the expected shortfall from underpriced policies is bounded:

from insurance_conformal.risk import PremiumSufficiencyController

psc = PremiumSufficiencyController(alpha=0.05, B=5.0)
psc.calibrate(y_cal, premium_cal)   # calibrate on held-out year
result = psc.predict(premium_new)   # apply to next year's book
# result["upper_bound"]: risk-controlled loading factor per policy
# result["lambda_hat"]: the single lambda* that achieves E[shortfall] <= 5%
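
As a rough illustration of what the calibration step has to do, the sketch below applies the conformal risk control rule from Angelopoulos et al. to a capped relative shortfall loss: pick the smallest lambda on a grid for which n/(n+1) times the empirical risk plus B/(n+1) is at most alpha. The function names, the grid, and the reading of B as the upper bound on the loss are assumptions; this is not the PremiumSufficiencyController implementation.

```python
def capped_shortfall(claim, premium, lam, cap):
    """Relative shortfall max(claim - lam * premium, 0) / premium, capped at cap."""
    return min(max(claim - lam * premium, 0.0) / premium, cap)


def crc_loading_sketch(claims, premiums, alpha, cap, lam_grid):
    """Smallest grid lambda with n/(n+1) * empirical_risk + cap/(n+1) <= alpha."""
    n = len(claims)
    for lam in sorted(lam_grid):
        emp_risk = sum(
            capped_shortfall(c, p, lam, cap) for c, p in zip(claims, premiums)
        ) / n
        if n / (n + 1) * emp_risk + cap / (n + 1) <= alpha:
            return lam
    return None  # no lambda on the grid controls the risk at this alpha


# Degenerate toy book so the arithmetic can be followed by hand:
# 99 policies, each with claim equal to premium.
claims = [100.0] * 99
premiums = [100.0] * 99
grid = [i / 100 for i in range(0, 301)]

lam_hat = crc_loading_sketch(claims, premiums, alpha=0.10, cap=5.0, lam_grid=grid)
print(lam_hat)
```

Because the loss is non-increasing in lambda, the first grid value that satisfies the inequality is the smallest one. The cap/(n+1) term is the finite-sample penalty: with only 99 calibration policies and a cap of 5 it already uses half of the 10% risk budget, so small calibration sets force noticeably larger loadings.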

Three controllers

Controller	Use case
`PremiumSufficiencyController`	Bound expected underpricing shortfall: E[max(claim - lambda * premium, 0) / premium] <= alpha
`IntervalWidthController`	Find the most efficient conformal quantile level that still bounds expected interval width
`SelectiveRiskController`	Accept/reject risks to bound expected loss on the accepted book

Import path

from insurance_conformal.risk import (
    PremiumSufficiencyController,
    IntervalWidthController,
    SelectiveRiskController,
    conformal_risk_calibration,
    shortfall_loss,
    premium_sufficiency_report,
)

References

Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2024). Conformal Risk Control. ICLR 2024. arXiv:2208.02814.
Selective CRC: arXiv:2512.12844 (2025).

SCRReport

SCRReport wraps a calibrated conformal predictor and produces per-risk 99.5% upper bounds suitable for internal stress-testing and model validation.

Disclaimer: SCRReport is an internal stress-testing tool. Solvency II SCR calculations for regulatory purposes require sign-off under an approved internal model or the standard formula. Do not use this output in regulatory returns without appropriate actuarial review, governance sign-off, and alignment with your firm's approved methodology.

from insurance_conformal.scr import SCRReport

scr = SCRReport(predictor=cp)
scr_bounds = scr.solvency_capital_requirement(X_test, alpha=0.005)
val_table = scr.coverage_validation_table(X_test, y_test)
print(scr.to_markdown())
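
One practical consequence of asking for a 99.5% bound: under the standard split conformal quantile rule, the bound is finite only if ceil((n + 1)(1 - alpha)) <= n, which rearranges to n >= (1 - alpha) / alpha. The sketch below does that arithmetic exactly. It assumes the standard rule; how SCRReport handles small calibration sets is not stated here, and the function name is illustrative.

```python
import math
from fractions import Fraction


def min_calibration_size(alpha):
    """Smallest n with ceil((n + 1) * (1 - alpha)) <= n, i.e. n >= (1 - alpha) / alpha."""
    a = Fraction(alpha)  # exact arithmetic avoids floating-point edge cases
    return math.ceil((1 - a) / a)


n_scr = min_calibration_size("0.005")  # 99.5% bound
n_90 = min_calibration_size("0.10")    # 90% interval
print(n_scr, n_90)
```

At exactly the minimum size the bound is simply the largest calibration score, so in practice a 99.5% bound needs a calibration set far larger than the minimum to be stable.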

Limitations

Exchangeability assumption. Split conformal requires calibration and test data to be exchangeable. Temporal covariate shift — changes in portfolio mix, inflation, or risk profile between calibration and test periods — weakens this assumption. Use temporal calibration splits and monitor coverage drift over time.

IBNR on recent accident years. For severity and pure premium models, calibrating on the most recent accident year means calibrating on incomplete claims. IBNR (incurred but not reported) development causes non-conformity scores to be computed on understated y_cal values, producing intervals that are too narrow for open development periods. We recommend calibrating only on fully developed accident years (typically 3+ years prior), or applying a development factor to y_cal before calibration.

Marginal vs. conditional coverage. The conformal guarantee is marginal: it holds on average across all observations. High-risk subgroups can still be systematically under-covered if the non-conformity score does not fully account for heteroscedasticity. Always check coverage_by_decile() after calibration.

Score choice matters. The raw score produces valid but very wide intervals on insurance data. Use pearson_weighted for Tweedie/Poisson models. If you switch scores, recalibrate.
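
The development-factor adjustment mentioned under the IBNR limitation could look like the sketch below. The factors, the accident years, and the function name are hypothetical placeholders; real factors would come from your own reserving triangles.

```python
def develop_to_ultimate(y_cal, accident_years, factors):
    """Scale each calibration loss by its accident year's factor to ultimate."""
    return [y * factors[year] for y, year in zip(y_cal, accident_years)]


# Hypothetical cumulative development factors (illustrative values only).
factors = {2021: 1.00, 2022: 1.05, 2023: 1.25}

y_cal = [1200.0, 0.0, 800.0]
accident_years = [2021, 2022, 2023]

y_cal_developed = develop_to_ultimate(y_cal, accident_years, factors)
print(y_cal_developed)
```

A multiplicative factor cannot turn a policy with no reported claim into one with a claim, so zero losses stay at zero. This adjustment therefore only partly corrects for unreported claims, which is a reason to prefer fully developed years when enough of them are available.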

References

Manna, S. et al. (2025). "Distribution-free prediction sets for Tweedie regression." arXiv:2507.06921.
Angelopoulos, A. N., & Bates, S. (2023). "Conformal prediction: A gentle introduction." Foundations and Trends in Machine Learning, 16(4), 494-591.
Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic learning in a random world. Springer.

Related Libraries

Library	What it does
insurance-cv	Temporal cross-validation — provides the calibration splits conformal prediction requires to maintain coverage guarantees
insurance-distributional	Parametric severity distributions — alternative when closed-form tail quantities are needed rather than distribution-free intervals
insurance-quantile	Quantile GBM for tail risk — feeds directly into conformalized quantile regression for distribution-free coverage

Benchmark: Conformal vs naive parametric intervals

50,000 synthetic UK motor policies with a Gamma severity DGP (right-skewed, heteroscedastic). The shape parameter varies by risk level, producing more dispersion in the high-risk segment. Temporal 60/20/20 split: 30,000 train, 10,000 calibration, 10,000 test. The same Ridge regression baseline (log-link) is used for both methods. Target coverage: 90%.

Run on Databricks serverless compute (2026-03-16, seed=42).

Naive parametric baseline — global sigma estimated from log-scale calibration residuals, intervals constructed as yhat × exp(±1.645σ):

Decile	Avg predicted (£)	Coverage
1	925	0.936
2	1,132	0.903
3	1,274	0.929
4	1,401	0.922
5	1,528	0.919
6	1,658	0.914
7	1,803	0.911
8	1,982	0.919
9	2,223	0.904
10	2,731	0.917
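
The naive baseline in the table above can be sketched as follows. It assumes sigma is the sample standard deviation of log(y / yhat) on the calibration set, which is one reading of "global sigma estimated from log-scale calibration residuals"; the benchmark script may compute it differently. The function name and toy numbers are illustrative.

```python
import math
import statistics


def naive_lognormal_intervals(y_cal, yhat_cal, yhat_new, z=1.645):
    """Intervals yhat * exp(+/- z * sigma) with one global log-scale sigma."""
    log_resid = [math.log(y / yh) for y, yh in zip(y_cal, yhat_cal)]
    sigma = statistics.stdev(log_resid)  # requires strictly positive losses
    factor = math.exp(z * sigma)
    return sigma, [(yh / factor, yh * factor) for yh in yhat_new]


# Toy calibration set: every loss is the prediction times exp(+/-0.1).
yhat_cal = [1000.0, 1000.0, 2000.0, 2000.0]
y_cal = [yh * math.exp(s) for yh, s in zip(yhat_cal, [0.1, -0.1, 0.1, -0.1])]

sigma, intervals = naive_lognormal_intervals(y_cal, yhat_cal, [925.0, 2731.0])
print(round(sigma, 4), intervals)
```

Every interval has the same upper-to-lower ratio, exp(2 * z * sigma), so the method adapts to the level of the prediction but not to any change in relative dispersion across risks.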

Conformal (pearson_weighted score, tweedie_power=1.5):

Decile	Coverage
1–6	0.922–0.993
7	0.878
8	0.869
9	0.820
10	0.714

Summary:

Metric	Naive parametric	Conformal (pearson_weighted)
Aggregate coverage (target: 90%)	0.917	0.901
Highest-risk decile (decile 10) coverage	0.917	0.714
Coverage gap at highest-risk decile	+1.7pp (above target)	−18.6pp (below target)
Mean interval width	£6,445	£4,675
Width vs raw conformal	n/a	−2.2%
Distribution-free guarantee	No	Yes (marginal only)

Total benchmark time: 2.1s on Databricks serverless.

Key findings

In this scenario the naive parametric intervals achieve near-uniform coverage across all deciles (91.7% in the top decile vs 90% target), because the log-normal approximation happens to fit the DGP reasonably well in aggregate. This is the benchmark's null result: when model and DGP are reasonably well-matched, parametric intervals perform adequately.
The conformal pearson_weighted score undercovers the highest-risk decile at 71.4%, which is 18.6pp below the 90% target. The marginal coverage guarantee holds (90.1% in aggregate), but decile-level coverage can still fail badly. The pearson_weighted score divides the absolute residual by yhat^0.75 (p/2 with tweedie_power=1.5) and then applies a single calibrated quantile to every policy. In this DGP the high-risk segment is more dispersed than a mu^1.5 variance function implies, so that shared quantile is too small for the top deciles and more generous than needed for the low-risk ones (deciles 1-6 cover at 92.2-99.3%). The coverage guarantee is marginal, not conditional.
The interval width reduction is only 2.2% vs raw conformal — much smaller than the 15–30% cited in the literature. Width reduction depends heavily on the quality of the point forecast: a Ridge regression on log(y) with moderate predictive power will not produce strongly differential non-conformity scores, so the weighting gives limited benefit.

Practical implication: use pearson_weighted with a well-calibrated GBM point forecast, not a linear model. The coverage guarantee is marginal by construction — if you need conditional coverage guarantees by risk segment, that requires a conditional conformal approach (not currently in this library). Run benchmarks/benchmark.py on your own data before relying on any particular score choice.
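
The marginal-versus-conditional point can be reproduced with a small standard-library simulation. It assumes a gamma loss with constant coefficient of variation (so the variance grows like mu^2), a perfect point forecast, and a score that assumes p = 1.5. All of these are illustrative choices, not the benchmark's DGP.

```python
import math
import random

rng = random.Random(0)
P = 1.5    # variance power assumed by the score
CV = 0.8   # constant coefficient of variation: Var(Y) = (CV * mu)^2


def draw(n):
    """Simulate (mu, y) pairs with gamma losses whose mean is mu."""
    shape = 1.0 / CV ** 2
    pairs = []
    for _ in range(n):
        mu = math.exp(rng.uniform(math.log(500.0), math.log(5000.0)))
        pairs.append((mu, rng.gammavariate(shape, mu / shape)))
    return pairs


cal, test = draw(5000), draw(5000)

# Split conformal with the pearson_weighted score, using mu as the forecast.
scores = sorted(abs(y - mu) / mu ** (P / 2) for mu, y in cal)
q = scores[math.ceil((len(scores) + 1) * 0.90) - 1]


def coverage(pairs):
    return sum(abs(y - mu) <= q * mu ** (P / 2) for mu, y in pairs) / len(pairs)


test.sort(key=lambda pair: pair[0])
marginal = coverage(test)
bottom_decile = coverage(test[:500])
top_decile = coverage(test[-500:])
print(f"marginal={marginal:.3f} bottom={bottom_decile:.3f} top={top_decile:.3f}")
```

Even with a perfect forecast, marginal coverage lands near the 90% target while the top decile falls short and the bottom decile is over-covered, because the score's variance power does not match how dispersion actually grows with risk.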

Other Burning Cost libraries

Model building

Library	Description
shap-relativities	Extract rating relativities from GBMs using SHAP
insurance-interactions	Automated GLM interaction detection via CANN and NID scores
insurance-cv	Walk-forward cross-validation respecting IBNR structure

Uncertainty quantification

Library	Description
bayesian-pricing	Hierarchical Bayesian models for thin-data segments
insurance-credibility	Bühlmann-Straub credibility weighting
insurance-distributional	Full conditional distribution per risk: mean, variance, CoV

Deployment and optimisation

Library	Description
insurance-optimise	Constrained rate change optimisation with FCA PS21/5 compliance
insurance-demand	Conversion, retention, and price elasticity modelling

Governance

Library	Description
insurance-fairness	Proxy discrimination auditing for UK insurance models
insurance-causal	Double Machine Learning for causal pricing inference
insurance-monitoring	Model monitoring: PSI, A/E ratios, Gini drift test

Spatial

Library	Description
insurance-spatial	BYM2 spatial territory ratemaking for UK personal lines

All libraries

Licence

MIT. See LICENSE.

Contributing

Issues and pull requests welcome at github.com/burning-cost/insurance-conformal.

insurance-conformal 0.4.3
