Distribution-free prediction intervals for insurance GBM and GLM pricing models

These details have not been verified by PyPI

Project links

Project description

insurance-conformal

Python License: MIT PyPI

Distribution-free prediction intervals for insurance GBM and GLM pricing models.

The problem

Your Tweedie GBM gives point estimates. A pricing actuary needs to know the uncertainty around those estimates - not as a parametric confidence interval that depends on distributional assumptions, but as a guarantee: this interval will contain the actual loss at least 90% of the time, for any data distribution.

Conformal prediction provides that guarantee. The catch is that the choice of non-conformity score determines interval width. Most conformal implementations use the raw absolute residual |y - yhat|. For insurance data, that is wrong: it treats a 1-unit error on a £100 risk identically to a 1-unit error on a £10,000 risk, producing intervals that are too wide on low-risk policies and too narrow on large risks.

The solution

For Tweedie/Poisson models, Var(Y) ~ mu^p. The correct non-conformity score is the locally-weighted Pearson residual:

score(y, yhat) = |y - yhat| / yhat^(p/2)

This accounts for the inherent heteroscedasticity of insurance claims. The result: ~30% narrower intervals with identical coverage guarantees. Based on Manna et al. (2025) and arXiv 2507.06921.

Installation

uv add insurance-conformal

# With CatBoost support:
uv add "insurance-conformal[catboost]"

# With plotting:
uv add "insurance-conformal[all]"

Quick start

from insurance_conformal import InsuranceConformalPredictor

# Fit your model however you normally would
import catboost
model = catboost.CatBoostRegressor(
    loss_function="Tweedie:variance_power=1.5",
    iterations=300,
    learning_rate=0.05,
    depth=6,
    verbose=0,
)
model.fit(X_train_pd, y_train)

# Wrap it
cp = InsuranceConformalPredictor(
    model=model,
    nonconformity="pearson_weighted",  # default, recommended for insurance
    distribution="tweedie",
    tweedie_power=1.5,
)

# Calibrate on held-out data (must not overlap with training set)
cp.calibrate(X_cal, y_cal)

# Generate 90% prediction intervals
intervals = cp.predict_interval(X_test, alpha=0.10)
# DataFrame with columns: lower, point, upper

print(intervals.head())
#       lower   point    upper
# 0    0.0121  0.0845   0.3291
# 1    0.0034  0.0231   0.0901
# 2    0.1820  1.2742   4.9621

Coverage diagnostics

The marginal coverage guarantee means P(y in interval) >= 1 - alpha averaged over all observations. In insurance, you also need to check that coverage is uniform across risk deciles - a model can achieve 90% overall while only covering 65% of high-risk policies.

# THE key diagnostic
diag = cp.coverage_by_decile(X_test, y_test, alpha=0.10)
print(diag)
#    decile  mean_predicted  n_obs  coverage  target_coverage
# 0       1          0.0234    400     0.923             0.90
# 1       2          0.0512    400     0.910             0.90
# ...
# 9      10          2.3410    400     0.905             0.90

# Full summary: marginal coverage + decile breakdown
cp.summary(X_test, y_test, alpha=0.10)

# Matplotlib plot with confidence bands
fig = cp.coverage_plot(X_test, y_test, alpha=0.10)
fig.savefig("coverage_by_decile.png", dpi=150)

# Interval width distribution
fig = cp.interval_width_distribution(X_test, alpha=0.10)

Non-conformity scores

Score	Formula	When to use
`pearson_weighted`	`\|y - yhat\| / yhat^(p/2)`	Default. Tweedie/Poisson pricing models.
`pearson`	`\|y - yhat\| / sqrt(yhat)`	Pure Poisson frequency models (p=1).
`deviance`	Deviance residual	When you want exact statistical optimality; slower.
`anscombe`	Anscombe transform	Variance-stabilising alternative to deviance.
`raw`	`\|y - yhat\|`	Baseline only. Not appropriate for insurance data.

The score hierarchy for interval width (narrowest first, coverage identical): pearson_weighted >= deviance >= anscombe > pearson > raw

Temporal calibration

In insurance, you should calibrate on recent data to capture current loss trends, not a random subsample of all years:

from insurance_conformal.utils import temporal_split

# Split by date - calibration gets the most recent 20%
X_train, X_cal, y_train, y_cal, _, _ = temporal_split(
    X, y,
    calibration_frac=0.20,
    date_col="accident_year",  # column in X DataFrame
)

model.fit(X_train, y_train)
cp.calibrate(X_cal, y_cal)

Coverage guarantee

Split conformal prediction provides the following guarantee for exchangeable data:

P(y_test in [lower, upper]) >= 1 - alpha

This is distribution-free - it holds regardless of the true data distribution, model misspecification, or covariate shift (as long as calibration and test data are exchangeable). The only assumption is that the calibration set is held out from model training.

"Exchangeable" roughly means "drawn from the same distribution in the same order". For insurance, this means you should not calibrate on year 5 and test on year 1. Use temporal splits.

Design choices

Split conformal, not cross-conformal. Cross-conformal is more statistically efficient but requires refitting the model on each calibration fold. For GBMs that take hours to train, this is not practical. Split conformal trains once, calibrates once.

No MAPIE dependency. MAPIE is excellent but it does not expose the insurance-specific scores implemented here. The split conformal algorithm is simple enough to own: 20 lines of code for conformal_quantile() plus the score functions.

Lower bound clipped at 0. Insurance losses are non-negative. Prediction intervals with negative lower bounds are nonsensical. We clip at 0 unconditionally.

Auto-detection of Tweedie power. For CatBoost, the power parameter is read from the loss function string. For sklearn TweedieRegressor, from model.power. If detection fails, we warn and default to p=1.5. Pass tweedie_power= explicitly if you know the correct value.

References

Manna, S. et al. (2025). "Distribution-free prediction sets for Tweedie regression." arXiv:2507.06921.
Angelopoulos, A. N., & Bates, S. (2023). "Conformal prediction: A gentle introduction." Foundations and Trends in Machine Learning, 16(4), 494-591.
Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic learning in a random world. Springer.

Other Burning Cost libraries

Model building

Library	Description
shap-relativities	Extract rating relativities from GBMs using SHAP
insurance-interactions	Automated GLM interaction detection via CANN and NID scores
insurance-cv	Walk-forward cross-validation respecting IBNR structure

Uncertainty quantification

Library	Description
bayesian-pricing	Hierarchical Bayesian models for thin-data segments
credibility	Bühlmann-Straub credibility weighting

Deployment and optimisation

Library	Description
rate-optimiser	Constrained rate change optimisation with FCA PS21/5 compliance
insurance-demand	Conversion, retention, and price elasticity modelling

Governance

Library	Description
insurance-fairness	Proxy discrimination auditing for UK insurance models
insurance-causal	Double Machine Learning for causal pricing inference
insurance-monitoring	Model monitoring: PSI, A/E ratios, Gini drift test

Spatial

Library	Description
insurance-spatial	BYM2 spatial territory ratemaking for UK personal lines

All libraries →

License

MIT. See LICENSE.

Contributing

Issues and pull requests welcome at github.com/burning-cost/insurance-conformal.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.3.2

Apr 4, 2026

1.3.1

Apr 4, 2026

1.2.0

Apr 1, 2026

0.9.0

Apr 1, 2026

0.8.0

Mar 31, 2026

0.7.1

Mar 31, 2026

0.6.4

Mar 27, 2026

0.6.3

Mar 25, 2026

0.4.3

Mar 19, 2026

0.4.2

Mar 17, 2026

0.4.1

Mar 15, 2026

0.2.1

Mar 11, 2026

This version

0.2.0

Mar 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_conformal-0.2.0.tar.gz (187.9 kB view details)

Uploaded Mar 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

insurance_conformal-0.2.0-py3-none-any.whl (21.0 kB view details)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file insurance_conformal-0.2.0.tar.gz.

File metadata

Download URL: insurance_conformal-0.2.0.tar.gz
Upload date: Mar 9, 2026
Size: 187.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_conformal-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d324c827f332a00b1d6ed01903922b35810c3d3de9c9eb87bfe549b4edf9cd93`
MD5	`4219e606218538713f47e546f957c950`
BLAKE2b-256	`0517d3d1e6185a6650a8d1f7480c782c4e78f1ba228f4686d5cafc20ddfc4ed3`

See more details on using hashes here.

File details

Details for the file insurance_conformal-0.2.0-py3-none-any.whl.

File metadata

Download URL: insurance_conformal-0.2.0-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 21.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_conformal-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a39ecd2c9f5feaaf353e2b5c1db4084301b859cc752fc6db0021efe9d7c0fcfc`
MD5	`5839cf8502b91b4216c839d94a0c4d01`
BLAKE2b-256	`40e64c98a54d56ebd8818f74c8a75022662976838d7253c9c36e39d3fefa60c8`

See more details on using hashes here.

insurance-conformal 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

insurance-conformal

The problem

The solution

Installation

Quick start

Coverage diagnostics

Non-conformity scores

Temporal calibration

Coverage guarantee

Design choices

References

Other Burning Cost libraries

License

Contributing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes