insurance-spatial-conformal
Spatially weighted conformal prediction intervals for geographically calibrated insurance pricing.
The problem
You've built a home insurance pricing model. You evaluate it nationally — 90% coverage on the test set, right on target. Your actuary signs off. Your model goes live.
Then someone runs a postcode-level diagnostic and finds that coverage in Hackney is 73% and coverage in rural Devon is 96%. Your nationally correct model is systematically under-covering urban risks and over-covering rural ones.
This is the exchangeability problem in conformal prediction. Standard split conformal assumes that calibration scores and test scores are drawn from the same distribution — interchangeable. That's fine nationally, but it breaks geographically. A semi-detached in Hackney and a farmhouse in Devon have materially different loss distributions, and treating all calibration scores as equally relevant to both is wrong.
The fix is geographic kernel weighting: when computing the prediction interval for a test property, weight the calibration scores by proximity. Properties in Hackney get high weight from the Hackney calibration data, low weight from the Devon data. The quantile you use for the interval reflects local behaviour, not national average behaviour.
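The mechanics fit in a few lines. The sketch below is a standalone illustration, not the library's internals (the function names and the exact quantile convention are assumptions): Gaussian kernel weights on haversine distance, then a weighted quantile of the calibration scores augmented with a point mass at +∞.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between one point and arrays of points."""
    R = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

def weighted_quantile_cutoff(scores, lat_cal, lon_cal, lat0, lon0,
                             bandwidth_km=20.0, alpha=0.10):
    """Locally weighted conformal cutoff at one test location.

    Gaussian kernel weights on the calibration scores, plus a point mass
    at +inf carrying the test point's own weight (Tibshirani et al., 2019).
    """
    d = haversine_km(lat0, lon0, lat_cal, lon_cal)
    w = np.exp(-0.5 * (d / bandwidth_km) ** 2)
    w_aug = np.append(w, 1.0)        # test point's own weight: kernel at distance 0
    s_aug = np.append(scores, np.inf)
    order = np.argsort(s_aug)
    cum = np.cumsum(w_aug[order]) / w_aug.sum()
    return s_aug[order][np.searchsorted(cum, 1 - alpha)]
```

With all weights equal, this reduces to the usual split-conformal quantile over n+1 points; unequal weights shift the cutoff toward the behaviour of nearby calibration scores.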
This library implements that fix for UK insurance pricing.
What it does
- Spatially weighted conformal prediction using Gaussian, Epanechnikov, or uniform (nearest-neighbour) spatial kernels
- Tweedie Pearson non-conformity scores — variance-stabilised scores for GLM and GBM models with Tweedie/compound Poisson objectives
- Cross-validated bandwidth selection using spatial blocking CV with MACG objective — the bandwidth that minimises geographic coverage gaps
- MACG diagnostic (Mean Absolute Coverage Gap) across a spatial grid, plus per-region breakdown for FCA Consumer Duty reporting
- UK postcode geocoding via pgeocode with outward-code fallback
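The MACG diagnostic listed above has a simple definition: average the absolute gap between each grid cell's empirical coverage and the target. A minimal sketch, assuming cell assignments are already computed (grid construction is the library's concern, not shown here):

```python
import numpy as np

def macg(covered, cell_id, target=0.90):
    """Mean Absolute Coverage Gap: mean |empirical coverage - target| over grid cells."""
    covered = np.asarray(covered, dtype=float)   # 1 if y fell inside its interval
    cell_id = np.asarray(cell_id)
    gaps = [abs(covered[cell_id == c].mean() - target) for c in np.unique(cell_id)]
    return float(np.mean(gaps))
```

A MACG of 0 means every cell is exactly on target; marginal coverage alone can be perfect while MACG is large.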
Installation
pip install insurance-spatial-conformal
Optional geographic visualisation dependencies:
pip install insurance-spatial-conformal[geo]
Quickstart
from insurance_spatial_conformal import SpatialConformalPredictor
# Your fitted pricing model (LightGBM, XGBoost, sklearn, CatBoost — anything with predict())
# Already split your data into train / calibration / test
scp = SpatialConformalPredictor(
    model=fitted_lgbm,
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    bandwidth_km=20.0,  # 20 km Gaussian kernel; or None to auto-select
)
# Calibrate on holdout set with coordinates
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)
# Predict intervals for new business
result = scp.predict_interval(X_test, lat=lat_test, lon=lon_test, alpha=0.10)
print(result.lower[:5]) # lower bounds
print(result.upper[:5]) # upper bounds
print(result.point[:5]) # point predictions from model
Using postcodes instead of coordinates:
from insurance_spatial_conformal import PostcodeGeocoder
gc = PostcodeGeocoder()
lat_cal, lon_cal = gc.geocode(postcode_list_cal)
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)
Auto-selecting bandwidth via cross-validation:
scp = SpatialConformalPredictor(model=fitted_model, bandwidth_km=None)
result = scp.calibrate(
    X_cal, y_cal, lat=lat_cal, lon=lon_cal,
    cv_candidates_km=[2, 5, 10, 20, 30, 50],
    cv_folds=5,
)
print(f"CV-selected bandwidth: {result.bandwidth_km} km")
Coverage diagnostics
from insurance_spatial_conformal import SpatialCoverageReport
report = SpatialCoverageReport(scp)
result = report.evaluate(X_val, y_val, lat=lat_val, lon=lon_val, alpha=0.10)
print(report.summary())
# === Spatial Coverage Report ===
# Validation set: 5,000 observations
# Target coverage (1-alpha): 90.0%
# Marginal coverage: 0.901
# Coverage gap: -0.0010
# MACG (312 grid cells): 0.0187
# Bandwidth: 20.0 km
# Kernel: gaussian
# Coverage map — green = on target, red = under/over covered
fig = report.coverage_map(resolution=20)
fig.savefig("coverage_by_postcode.png", dpi=150)
# FCA Consumer Duty table — coverage by segment
import polars as pl

table = report.fca_consumer_duty_table(region_labels=county_labels)
print(table.filter(pl.col("flag") == "REVIEW"))
Non-conformity score choice
The score determines the shape of the prediction interval. For insurance pricing:
| Score | Use when | Interval shape |
|---|---|---|
| `pearson_tweedie` | Tweedie GLM/GBM (default) | Width scales as yhat^(p/2) |
| `pearson` | Poisson frequency model | Width scales as sqrt(yhat) |
| `scaled_absolute` | Two-model approach with a spread model | Width scales with predicted difficulty |
| `absolute` | Baseline only | Fixed width regardless of risk level |
# Tweedie power 1.5 = compound Poisson-Gamma (typical burning cost)
scp = SpatialConformalPredictor(
    model=model, nonconformity='pearson_tweedie', tweedie_power=1.5
)
# Two-model approach: a spread model predicts |y - yhat|
yhat_cal = model.predict(X_cal)
spread_model = LGBMRegressor().fit(X_cal, np.abs(y_cal - yhat_cal))
scp = SpatialConformalPredictor(
    model=model, nonconformity='scaled_absolute', spread_model=spread_model
)
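For reference, the `pearson_tweedie` score from the table above has a one-line definition; this is a sketch consistent with "width scales as yhat^(p/2)", and the library's exact normalisation may differ:

```python
import numpy as np

def pearson_tweedie_score(y, yhat, power=1.5):
    """Variance-stabilised absolute residual: Tweedie variance scales as mu**power,
    so dividing by yhat**(power/2) puts scores from low- and high-risk
    properties on a comparable scale."""
    return np.abs(y - yhat) / yhat ** (power / 2)
```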
API reference
SpatialConformalPredictor
SpatialConformalPredictor(
    model,                            # fitted sklearn-compatible model
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    spatial_kernel='gaussian',        # 'gaussian' | 'epanechnikov' | 'uniform'
    bandwidth_km=None,                # None = CV-select; float = fixed
    spread_model=None,                # required for 'scaled_absolute'
    n_eff_min=30,                     # warn if effective N < this threshold
)
.calibrate(X_cal, y_cal, lat=..., lon=..., postcodes=..., exposure=...)
→ CalibrationResult
.predict_interval(X_test, lat=..., lon=..., postcodes=..., alpha=0.10)
→ IntervalResult (.lower, .upper, .point, .n_effective, .bandwidth_km)
.spatial_coverage_report(X_val, y_val, lat=..., lon=...)
→ SpatialCoverageReport
BandwidthSelector
BandwidthSelector(
    candidates_km=[2, 5, 10, 15, 20, 30, 50],
    cv=5,
    n_eff_min=30,
    metric='macg',
    grid_resolution=10,
)
.select(scores, lat, lon, alpha=0.10) → BandwidthCVResult
SpatialCoverageReport
SpatialCoverageReport(predictor)
.evaluate(X_val, y_val, lat=..., lon=..., alpha=0.10, grid_resolution=20)
→ CoverageResult (.marginal_coverage, .macg, .n_grid_cells)
.coverage_map(resolution=20) → matplotlib Figure
.fca_consumer_duty_table(region_labels=...) → polars DataFrame
.macg_by_region(region_labels) → polars DataFrame
.summary() → str
Design decisions
Haversine distance, not Euclidean. At 55°N (central Scotland), a degree of longitude is ~64 km but a degree of latitude is ~111 km. Euclidean distance on decimal degrees would produce elliptical kernels skewed north-south by ~42%. All distance calculations use haversine.
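Those figures are easy to verify with the standard approximation of 111.32 km per degree at the equator (a back-of-envelope check, not library code):

```python
import math

# A degree of latitude is ~111 km everywhere; a degree of longitude
# shrinks with the cosine of latitude. At 55N:
km_per_deg_lat = 111.32
km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(55))  # ~63.85 km
```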
Bandwidth parameterisation as km, not eta. The Hjort et al. paper uses eta = bandwidth^2 internally. We expose the parameter in kilometres because that's what a pricing actuary can reason about — "20 km bandwidth" is meaningful, "eta = 400 km²" is not.
Tibshirani et al. (2019) augmentation. The finite-sample coverage guarantee requires augmenting the weighted calibration distribution with a point mass at +∞ carrying the test point's own normalised weight (this reduces to weight 1/(n+1) when all weights are equal). This is what makes the marginal guarantee hold at 1−α rather than only approximately.
Spatial blocking CV, not random folds. Random CV folds allow geographically proximate calibration and validation points into the same split, which leaks spatial information and makes the CV loss overly optimistic. K-means on coordinates gives spatially contiguous folds.
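Fold construction along these lines can be sketched with scikit-learn's KMeans (an illustration of the idea; the library's fold builder may differ in details):

```python
import numpy as np
from sklearn.cluster import KMeans

def spatial_folds(lat, lon, n_folds=5, seed=0):
    """Assign each point to a spatially contiguous CV fold via k-means on coordinates.

    Clustering on raw degrees is slightly anisotropic, but for fold
    assignment at UK scale the distortion is immaterial.
    """
    coords = np.column_stack([lat, lon])
    km = KMeans(n_clusters=n_folds, n_init=10, random_state=seed)
    return km.fit_predict(coords)  # fold label per observation
```

Holding out one spatial block at a time forces the CV loss to measure performance at locations genuinely distant from the calibration data.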
Kish effective N warning. In rural areas with sparse data, a narrow bandwidth might have effective N < 30 at some test points. The predictor warns rather than erroring — the interval is still produced, but flagged. In practice, the CV bandwidth selector includes a floor on effective N.
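The Kish effective sample size is the standard formula (Kish, 1965), shown here as a sketch:

```python
import numpy as np

def kish_n_eff(w):
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Equals n for uniform weights; approaches 1 as one weight dominates,
    which is what happens under a narrow kernel in a sparse rural area.
    """
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()
```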
Polars output for DataFrames. Diagnostics and the FCA table return Polars DataFrames rather than pandas. Polars is faster for the typical operations (group-by, filter, sort) and has cleaner null semantics. Call .to_pandas() if your downstream tools need pandas.
References
Hjort, N. L., Jullum, M., & Løland, A. (2025). Uncertainty quantification in automated valuation models with spatially weighted conformal prediction. International Journal of Data Science and Analytics. doi:10.1007/s41060-025-00862-4. arXiv:2312.06531.
Tibshirani, R. J., Barber, R. F., Candès, E. J., & Ramdas, A. (2019). Conformal prediction under covariate shift. NeurIPS 2019.
Manna, S. et al. (2025). Distribution-free prediction sets for Tweedie regression. arXiv:2507.06921.
Kish, L. (1965). Survey Sampling. Wiley.
Roberts, D. R. et al. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913-929.
Licence
MIT