Joint multi-output conformal prediction intervals for insurance pricing models
insurance-multivariate-conformal
The problem
UK pricing teams run separate GLMs for claim frequency (Poisson, lambda ~ 0.05–0.30) and claim severity (Gamma, mu ~ £500–£8,000). The standard workflow produces point estimates. Actuaries asking "how uncertain is this pricing?" have no rigorous answer.
The naive approach, running split conformal prediction separately on each output, gives only marginal coverage. If frequency has 95% coverage and severity has 95% coverage, joint coverage (both simultaneously correct) could be as low as 90%, because the union bound only guarantees 1 - 0.05 - 0.05. For Solvency II SCR at 99.5%, the same argument leaves two 99.5% marginal intervals with a joint guarantee of only 99.0%, which is not acceptable.
This library solves that. It produces hyperrectangular prediction sets [L_freq, U_freq] × [L_sev, U_sev] with a finite-sample joint coverage guarantee:
P(freq ∈ [L_freq, U_freq] AND sev ∈ [L_sev, U_sev]) ≥ 1 - alpha
No distributional assumptions beyond exchangeability of calibration and test data. No asymptotics. Works with any base model (GLM, GBM, random forest) that exposes a scikit-learn-style .predict() method.
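The arithmetic behind the 90% figure is the union bound: the joint miss probability is at most the sum of the marginal miss probabilities. The minimal sketch below compares that worst case with the case of independent misses, then checks the independent case by simulating two independent standard normal errors, each covered by its own ±1.96 interval. The helper names are illustrative only and are not part of the library.

```python
import numpy as np

def union_bound_coverage(alphas):
    """Worst-case joint coverage given per-output miss rates (union bound)."""
    return max(0.0, 1.0 - sum(alphas))

def independent_coverage(alphas):
    """Joint coverage if the per-output miss events are independent."""
    return float(np.prod([1.0 - a for a in alphas]))

# Two independent errors, each covered by its own +/-1.96 interval (about 95% marginal coverage)
rng = np.random.default_rng(0)
errors = rng.standard_normal((200_000, 2))
covered = np.abs(errors) <= 1.96

marginal = covered.mean(axis=0)     # per-output coverage, each close to 0.95
joint = covered.all(axis=1).mean()  # both covered at once, close to 0.95 * 0.95

print("worst case:", union_bound_coverage([0.05, 0.05]))
print("independent:", independent_coverage([0.05, 0.05]))
print("simulated marginal:", marginal, "simulated joint:", joint)
```

Independence is the comfortable case. When the miss events fall on different policies, joint coverage can drop all the way to the union bound.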
The scale problem and why it matters
Frequency residuals are on the order of 0.1–2 claims. Severity residuals are on the order of £500–£3,000. If you aggregate residuals naively (e.g. take the max), severity always dominates. The resulting joint interval degenerates to a severity interval with a token frequency constraint.
The solution, from Fan & Sesia (arXiv:2512.15383), is coordinate-wise standardization. For each output dimension j, compute the mean mu_j and standard deviation sigma_j of the calibration residuals E_j. The standardized score (E_j - mu_j) / sigma_j is then dimensionless and directly comparable across frequency and severity.
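The sketch below illustrates the idea. It uses absolute residuals as the per-dimension score and takes the maximum standardized residual across outputs as the joint score. It is deliberately simplified: mu_j and sigma_j are estimated from the calibration set alone, whereas Fan & Sesia standardize transductively to keep the exact finite-sample guarantee. Treat it as an illustration, not as the library's implementation. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_joint_half_widths(pred_cal, y_cal, alpha):
    """Per-dimension half-widths of a joint split-conformal box.

    pred_cal, y_cal: arrays of shape (n_cal, d), one column per output.
    Simplification: mu/sigma come from the calibration set only (not transductive).
    """
    E = np.abs(y_cal - pred_cal)             # absolute residuals E_j, shape (n, d)
    mu = E.mean(axis=0)                      # mu_j per dimension
    sigma = E.std(axis=0, ddof=1)            # sigma_j per dimension
    scores = ((E - mu) / sigma).max(axis=1)  # worst standardized residual per policy
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # conformal rank
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q = np.sort(scores)[k - 1]
    # max_j z_j <= q  <=>  E_j <= mu_j + q * sigma_j for every j
    return mu + q * sigma

# Hypothetical motor data on the scales quoted above
rng = np.random.default_rng(1)

def simulate(n):
    lam = rng.uniform(0.05, 0.30, n)    # predicted claim frequency
    mu_sev = rng.uniform(500, 8000, n)  # predicted severity in GBP
    y = np.column_stack([rng.poisson(lam), rng.gamma(2.0, mu_sev / 2.0)])
    return np.column_stack([lam, mu_sev]), y

pred_cal, y_cal = simulate(5_000)
pred_test, y_test = simulate(20_000)

# Raw residuals: severity wins the max for essentially every policy
raw = np.abs(y_cal - pred_cal)
print("share where severity dominates raw max:", (raw.argmax(axis=1) == 1).mean())

half = fit_joint_half_widths(pred_cal, y_cal, alpha=0.05)
lower, upper = pred_test - half, pred_test + half  # frequency lower bound may be < 0; clip if needed
joint_cov = np.all((y_test >= lower) & (y_test <= upper), axis=1).mean()
print("half-widths:", half, "joint test coverage:", joint_cov)
```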
Installation
pip install insurance-multivariate-conformal
Dependencies: numpy, scikit-learn, polars. No PyTorch, no JAX.
Quick start: motor frequency + severity
```python
from insurance_multivariate_conformal import JointConformalPredictor

# You have fitted these separately on a training set:
#   freq_glm: any model with .predict(X) returning shape (n,)
#   sev_gbm:  same
predictor = JointConformalPredictor(
    models={'frequency': freq_glm, 'severity': sev_gbm},
    alpha=0.05,    # 95% joint coverage
    method='lwc',  # Local worst-case: the default, tightest valid method
)

# Calibrate on held-out data (NOT the training set)
predictor.calibrate(
    X_cal=X_calibration,
    Y_cal={'frequency': y_freq_cal, 'severity': y_sev_cal},
    zero_claim_mask=zero_mask_cal,  # True where claims == 0 (severity unobserved)
)

# Predict on new policies
joint_set = predictor.predict(X_new)

# Intervals per policy
print(joint_set.lower['frequency'])  # Lower frequency bounds
print(joint_set.upper['severity'])   # Upper severity bounds

# Verify coverage on a held-out test set
cov = joint_set.joint_coverage_check(Y_test)
print(f"Joint coverage: {cov:.1%}")  # Should be close to or above 95%; finite test sets fluctuate
```
Solvency II SCR
For Solvency II Article 101 (99.5% VaR), use SolvencyCapitalEstimator:
```python
from insurance_multivariate_conformal import SolvencyCapitalEstimator

scr = SolvencyCapitalEstimator(
    models={'frequency': freq_glm, 'severity': sev_gbm},
    alpha=0.005,   # 99.5% coverage
    method='gwc',  # GWC is more conservative: appropriate for regulatory use
)
scr.calibrate(X_cal, Y_cal)
result = scr.estimate(X_portfolio)

print(f"Aggregate SCR: £{result.aggregate_scr:,.0f}")
print(f"Coverage guarantee: {result.coverage_guarantee:.1%}")
print(f"Calibration set size: {result.n_cal}")
```
The coverage guarantee holds in finite samples: P(loss ≤ SCR_upper) ≥ 99.5%, with no distributional assumption beyond exchangeability. At n_cal=1000, coverage is at least 99.5% and exceeds it by at most 1/(n_cal + 1) ≈ 0.1 percentage points.
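The finite-sample rank rule has a related consequence. It is stated here for standard split conformal prediction and has not been checked against this library. The conformal quantile is the ceil((n_cal + 1)(1 - alpha))-th smallest calibration score, which exists only when that rank is at most n_cal, i.e. n_cal ≥ 1/alpha - 1. At alpha = 0.005, that means 199 calibration policies are needed before any finite interval is possible. A minimal sketch follows; min_calibration_size is a hypothetical helper.

```python
import math
from fractions import Fraction

def min_calibration_size(alpha):
    """Smallest n_cal with ceil((n_cal + 1) * (1 - alpha)) <= n_cal (finite conformal quantile)."""
    a = Fraction(str(alpha))  # exact arithmetic avoids float rounding at the boundary
    n = 1
    while math.ceil((n + 1) * (1 - a)) > n:
        n += 1
    return n

print(min_calibration_size(0.005))  # Solvency II level
print(min_calibration_size(0.05))
```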
Methods
Four methods, in increasing order of statistical efficiency:
| Method | Joint coverage | Width | When to use |
|---|---|---|---|
| bonferroni | Valid | Widest | Baseline; guaranteed under any correlation |
| sidak | Valid (independence only) | Slightly narrower | Only if outputs are independent |
| gwc | Valid | Moderate | Regulatory use (conservative, simpler) |
| lwc | Valid | Tightest | Production pricing (default) |
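The two baseline methods use textbook per-output miss rates: Bonferroni uses alpha / d, and Šidák uses 1 - (1 - alpha)^(1/d). The sketch below only computes these per-dimension levels. It assumes the library feeds each level into a per-output split-conformal quantile, which the table implies but does not spell out.

```python
def bonferroni_alpha(alpha, d):
    """Per-output miss rate so that the union bound gives joint coverage >= 1 - alpha."""
    return alpha / d

def sidak_alpha(alpha, d):
    """Per-output miss rate giving joint coverage exactly 1 - alpha if outputs are independent."""
    return 1 - (1 - alpha) ** (1 / d)

for alpha, d in [(0.05, 2), (0.05, 3), (0.005, 2)]:
    b, s = bonferroni_alpha(alpha, d), sidak_alpha(alpha, d)
    print(f"alpha={alpha}, d={d}: Bonferroni {b:.5f}, Sidak {s:.5f}")
```

Šidák's per-output miss rate is always slightly larger, which gives slightly narrower intervals. The gain is small and relies on independence, which is why Bonferroni is the safe baseline.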
LWC (Local Worst-Case, Fan & Sesia Algorithm 2) is the recommended default. It partitions calibration observations by which dimension is the binding constraint, then computes a group-specific quantile. This is 20–35% tighter than Bonferroni on typical insurance data while maintaining identical joint coverage guarantees.
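The partitioning step can be sketched as follows. The sketch assumes that a calibration policy's binding dimension is the output with the largest standardized residual, as in the standardization section. It covers only the grouping. The group-specific quantile rule of Fan & Sesia's Algorithm 2, which is what preserves the joint guarantee, is not reproduced here. Names and data are hypothetical.

```python
import numpy as np

def binding_dimension(E, mu, sigma):
    """Index of the output whose standardized residual is largest, per calibration row."""
    z = (E - mu) / sigma
    return z.argmax(axis=1)

rng = np.random.default_rng(2)
# Hypothetical absolute residuals on very different scales: frequency, severity
E = np.column_stack([rng.exponential(0.2, 1_000), rng.exponential(1_500.0, 1_000)])
mu, sigma = E.mean(axis=0), E.std(axis=0, ddof=1)

groups = binding_dimension(E, mu, sigma)
counts = np.bincount(groups, minlength=E.shape[1])
raw_counts = np.bincount(E.argmax(axis=1), minlength=E.shape[1])  # without standardization
print("bound by frequency / severity (standardized):", counts)
print("bound by frequency / severity (raw):", raw_counts)
```

After standardization neither output dominates by scale alone, so both groups are populated. On raw residuals almost every row lands in the severity group.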
Zero-claim masking
For policies where observed claims = 0, severity is unobserved. Pass a boolean mask to calibrate():
```python
zero_mask = (y_freq_cal == 0)  # True where no claims were made
predictor.calibrate(X_cal, Y_cal, zero_claim_mask=zero_mask)
```
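This sets severity residuals to 0 for zero-claim observations, i.e. it treats them as perfectly predicted for severity. That choice is not conservative: adding zero residuals to the calibration set tends to narrow, not widen, the severity interval. The guarantee then covers joint coverage over all policies, with the severity condition trivially satisfied for policies with no claim. It does not by itself give 1 - alpha coverage conditional on a claim being made.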
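A small simulation shows the direction of the effect. It uses a plain marginal split-conformal quantile of severity residuals rather than the library's joint calibration, and the data and helper names are hypothetical. Replacing unobserved severity residuals with zeros can only lower the quantile relative to calibrating on claimants alone.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("too few scores for this alpha")
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(3)
n = 10_000
claims = rng.poisson(0.1, n)
zero_mask = claims == 0                 # severity unobserved for these policies
sev_resid = rng.gamma(2.0, 1_000.0, n)  # hypothetical |severity residual| in GBP

q_claimants = conformal_quantile(sev_resid[~zero_mask], 0.05)
q_masked = conformal_quantile(np.where(zero_mask, 0.0, sev_resid), 0.05)
print(f"claimants only: {q_claimants:,.0f}  zero-filled: {q_masked:,.0f}")
```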
Diagnostics
```python
from insurance_multivariate_conformal import coverage_report, compare_methods

# Validate coverage on a test set
report = coverage_report(predictor, X_test, Y_test)
print(report['joint_coverage'])      # Should be close to or above 1 - alpha
print(report['marginal_coverages'])  # Per-dimension coverage
print(report['mean_widths'])         # Interval width efficiency

# Compare all methods head-to-head
models = {'frequency': freq_glm, 'severity': sev_gbm}
results = compare_methods(
    models=models,
    X_cal=X_cal, Y_cal=Y_cal,
    X_test=X_test, Y_test=Y_test,
    alpha=0.05,
)
for method, rep in results.items():
    print(f"{method}: coverage={rep['joint_coverage']:.1%}, "
          f"width_freq={rep['mean_widths']['frequency']:.4f}")
```
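The three quantities in the report can be computed directly from the interval bounds. The sketch assumes bounds and outcomes are dicts of NumPy arrays keyed by output name, mirroring joint_set.lower and joint_set.upper in the quick start. empirical_report is a hypothetical helper, not the library's coverage_report.

```python
import numpy as np

def empirical_report(lower, upper, Y):
    """Joint coverage, per-output coverage and mean width from dicts keyed by output name."""
    inside = {k: (Y[k] >= lower[k]) & (Y[k] <= upper[k]) for k in Y}
    joint = np.logical_and.reduce([inside[k] for k in Y]).mean()
    return {
        'joint_coverage': float(joint),
        'marginal_coverages': {k: float(v.mean()) for k, v in inside.items()},
        'mean_widths': {k: float(np.mean(upper[k] - lower[k])) for k in Y},
    }

# Tiny example with four policies
lower = {'frequency': np.array([0.0, 0.0, 0.0, 0.0]),
         'severity': np.array([100.0, 100.0, 100.0, 100.0])}
upper = {'frequency': np.array([1.0, 1.0, 2.0, 2.0]),
         'severity': np.array([900.0, 900.0, 900.0, 900.0])}
Y = {'frequency': np.array([0, 2, 1, 1]),
     'severity': np.array([500.0, 500.0, 1200.0, 300.0])}

rep = empirical_report(lower, upper, Y)
print(rep)
```

Here joint coverage is 0.5 while each marginal coverage is 0.75, because the two misses fall on different policies. This is the gap that joint calibration is designed to control.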
Multi-peril home insurance (d=3)
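```python
from insurance_multivariate_conformal import JointConformalPredictor

predictor_3d = JointConformalPredictor(
    models={
        'flood': flood_rf,
        'fire': fire_glm,
        'subsidence': sub_gbm,
    },
    alpha=0.05,
    method='lwc',
)
# Y_cal_3d: dict of calibration outcomes keyed by peril name, like Y_cal in the quick start
predictor_3d.calibrate(X_cal, Y_cal_3d)
joint_3d = predictor_3d.predict(X_policies)

# Volume = width_flood * width_fire * width_sub
print(joint_3d.volume().mean())
```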
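For a hyperrectangle, the volume is the product of the per-dimension widths. The sketch assumes the same dict-of-arrays layout as joint_set.lower and joint_set.upper in the quick start. box_volume is a hypothetical helper, and joint_3d.volume() may differ in detail. Because the product mixes the units of each peril, volumes are only comparable across methods for the same outputs and units.

```python
import numpy as np

def box_volume(lower, upper):
    """Per-policy volume of a hyperrectangle given dicts of bound arrays keyed by output."""
    widths = np.stack([upper[k] - lower[k] for k in lower])  # shape (d, n_policies)
    return widths.prod(axis=0)

lower = {'flood': np.array([0.0, 10.0]), 'fire': np.array([0.0, 5.0]), 'subsidence': np.array([0.0, 0.0])}
upper = {'flood': np.array([2.0, 20.0]), 'fire': np.array([3.0, 6.0]), 'subsidence': np.array([4.0, 1.0])}
print(box_volume(lower, upper))
```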
Coverage guarantee
Under exchangeability of calibration and test data (the upper bound additionally assumes no ties among the nonconformity scores):
1 - alpha ≤ P(Y_new ∈ Ĉ(X_new)) ≤ 1 - alpha + 1/(n_cal + 1)
At n_cal=199: coverage ∈ [0.950, 0.955] for alpha=0.05. At n_cal=999: coverage ∈ [0.950, 0.951].
For SCR at 99.5%: n_cal=999 gives coverage ∈ [0.995, 0.996].
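The bounds quoted above follow directly from the formula. The short sketch below uses exact fractions so that float rounding does not disturb the printed values; coverage_bounds is a hypothetical helper.

```python
from fractions import Fraction

def coverage_bounds(alpha, n_cal):
    """[1 - alpha, 1 - alpha + 1/(n_cal + 1)] for split conformal with continuous scores."""
    a = Fraction(str(alpha))
    return float(1 - a), float(1 - a + Fraction(1, n_cal + 1))

for alpha, n_cal in [(0.05, 199), (0.05, 999), (0.005, 999)]:
    lo, hi = coverage_bounds(alpha, n_cal)
    print(f"alpha={alpha}, n_cal={n_cal}: coverage in [{lo:.3f}, {hi:.3f}]")
```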
References
- Fan & Sesia (2025). Interpretable Multivariate Conformal Prediction with Fast Transductive Standardization. arXiv:2512.15383. — Primary algorithm (GWC, LWC).
- Hong (2025). Conformal prediction of future insurance claims in the regression problem. arXiv:2503.03659. — Solvency II SCR framing.
- Braun et al. (2025). Multivariate Standardized Residuals for Conformal Prediction. arXiv:2507.20941. — Ellipsoidal alternative (not implemented here; requires PyTorch).
License
Apache-2.0. Copyright 2026 Burning Cost.
File details
Details for the file insurance_multivariate_conformal-0.1.0.tar.gz.
File metadata
- Download URL: insurance_multivariate_conformal-0.1.0.tar.gz
- Size: 42.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 (Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f7a9ec0101341d1d0321e52956742f44ad3f51671c1b8b2a53ac31f84286257 |
| MD5 | 27b8b0d66bc0a3bc71311e47463c9110 |
| BLAKE2b-256 | 60c2f78dbcb55db65994d64d74bc8682ac324a21cb4f18e098cbd9a0f9b9445c |
Details for the file insurance_multivariate_conformal-0.1.0-py3-none-any.whl.
File metadata
- Download URL: insurance_multivariate_conformal-0.1.0-py3-none-any.whl
- Size: 31.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 (Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 287106a7e242bed18f8af397548629c1824275f23369c77cb0c5ff414d0497dd |
| MD5 | b9896c3be95501ae4ccf6b509bb4133a |
| BLAKE2b-256 | 4b4fb6d11c9a64366bdc1f1ef962e6e0476525fbbe5f52b12db4a2e6770d4108 |