
insurance-spatial-conformal

Spatially weighted conformal prediction intervals for geographically calibrated insurance pricing.

The problem

You've built a home insurance pricing model. You evaluate it nationally — 90% coverage on the test set, right on target. Your actuary signs off. Your model goes live.

Then someone runs a postcode-level diagnostic and finds that coverage in Hackney is 73% and coverage in rural Devon is 96%. Your nationally correct model is systematically under-covering urban risks and over-covering rural ones.

This is the exchangeability problem in conformal prediction. Standard split conformal assumes that calibration scores and test scores are drawn from the same distribution — interchangeable. That's fine nationally, but it breaks geographically. A semi-detached in Hackney and a farmhouse in Devon have materially different loss distributions, and treating all calibration scores as equally relevant to both is wrong.

The fix is geographic kernel weighting: when computing the prediction interval for a test property, weight the calibration scores by proximity. Properties in Hackney get high weight from the Hackney calibration data, low weight from the Devon data. The quantile you use for the interval reflects local behaviour, not national average behaviour.
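The mechanics can be sketched in a few lines (illustrative names, not this library's internals): compute a kernel weight for each calibration point from its distance to the test property, then take a weighted quantile of the calibration scores.

```python
import numpy as np

def gaussian_weights(dist_km, bandwidth_km):
    """Gaussian kernel weight: nearby calibration points count most."""
    return np.exp(-0.5 * (np.asarray(dist_km) / bandwidth_km) ** 2)

def weighted_quantile(scores, weights, q):
    """q-th quantile of scores under the given weights."""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(np.searchsorted(cdf, q), len(s) - 1)
    return s[idx]

# Distant calibration points barely influence the local quantile:
scores = np.array([1.0, 1.1, 5.0, 5.2])        # non-conformity scores
dist_km = np.array([2.0, 3.0, 200.0, 250.0])   # distance to the test property
w = gaussian_weights(dist_km, bandwidth_km=20.0)
print(weighted_quantile(scores, w, 0.9))        # → 1.1, not a 5.x score
```

With uniform weights this reduces to the ordinary split-conformal quantile; the geography only enters through the weights.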

This library implements that fix for UK insurance pricing.

What it does

  • Spatially weighted conformal prediction using Gaussian, Epanechnikov, or uniform (nearest-neighbour) spatial kernels
  • Tweedie Pearson non-conformity scores — variance-stabilised scores for GLM and GBM models with Tweedie/compound Poisson objectives
  • Cross-validated bandwidth selection using spatial blocking CV with MACG objective — the bandwidth that minimises geographic coverage gaps
  • MACG diagnostic (Mean Absolute Coverage Gap) across a spatial grid, plus per-region breakdown for FCA Consumer Duty reporting
  • UK postcode geocoding via pgeocode with outward-code fallback
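The three kernel shapes differ mainly in their support. A minimal sketch (not the library's exact code) of how each turns distance into weight:

```python
import numpy as np

def kernel_weight(dist_km, bandwidth_km, kind="gaussian"):
    """Convert distances (km) to calibration weights for one test point."""
    u = np.asarray(dist_km, dtype=float) / bandwidth_km
    if kind == "gaussian":
        return np.exp(-0.5 * u ** 2)           # smooth, never exactly zero
    if kind == "epanechnikov":
        return np.maximum(0.0, 1.0 - u ** 2)   # compact: zero beyond one bandwidth
    if kind == "uniform":
        return (u <= 1.0).astype(float)        # hard cutoff, nearest-neighbour style
    raise ValueError(f"unknown kernel: {kind}")

print(kernel_weight([0.0, 10.0, 30.0], 20.0, "epanechnikov"))  # 30 km point gets weight 0
```

The compact-support kernels discard far-away data entirely, which makes effective sample size the binding constraint in sparse rural areas; the Gaussian kernel downweights smoothly instead.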

Installation

pip install insurance-spatial-conformal

Optional geographic visualisation dependencies:

pip install insurance-spatial-conformal[geo]

Quickstart

from insurance_spatial_conformal import SpatialConformalPredictor

# Your fitted pricing model (LightGBM, XGBoost, sklearn, CatBoost — anything with predict())
# Already split your data into train / calibration / test

scp = SpatialConformalPredictor(
    model=fitted_lgbm,
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    bandwidth_km=20.0,       # 20 km Gaussian kernel; or None to auto-select
)

# Calibrate on holdout set with coordinates
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)

# Predict intervals for new business
result = scp.predict_interval(X_test, lat=lat_test, lon=lon_test, alpha=0.10)

print(result.lower[:5])   # lower bounds
print(result.upper[:5])   # upper bounds
print(result.point[:5])   # point predictions from model

Using postcodes instead of coordinates:

from insurance_spatial_conformal import PostcodeGeocoder

gc = PostcodeGeocoder()
lat_cal, lon_cal = gc.geocode(postcode_list_cal)
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)

Auto-selecting bandwidth via cross-validation:

scp = SpatialConformalPredictor(model=fitted_model, bandwidth_km=None)
result = scp.calibrate(
    X_cal, y_cal, lat=lat_cal, lon=lon_cal,
    cv_candidates_km=[2, 5, 10, 20, 30, 50],
    cv_folds=5,
)
print(f"CV-selected bandwidth: {result.bandwidth_km} km")

Coverage diagnostics

from insurance_spatial_conformal import SpatialCoverageReport

report = SpatialCoverageReport(scp)
result = report.evaluate(X_val, y_val, lat=lat_val, lon=lon_val, alpha=0.10)

print(report.summary())
# === Spatial Coverage Report ===
#   Validation set: 5,000 observations
#   Target coverage (1-alpha): 90.0%
#   Marginal coverage: 0.901
#   Coverage gap: -0.0010
#   MACG (312 grid cells): 0.0187
#   Bandwidth: 20.0 km
#   Kernel: gaussian

# Coverage map — green = on target, red = under/over covered
fig = report.coverage_map(resolution=20)
fig.savefig("coverage_by_postcode.png", dpi=150)

# FCA Consumer Duty table — coverage by segment
import polars as pl

table = report.fca_consumer_duty_table(region_labels=county_labels)
print(table.filter(pl.col("flag") == "REVIEW"))

Non-conformity score choice

The score determines the shape of the prediction interval. For insurance pricing:

Score             Use when                               Interval shape
----------------  -------------------------------------  ------------------------------------
pearson_tweedie   Tweedie GLM/GBM (default)              Width scales as yhat^(p/2)
pearson           Poisson frequency model                Width scales as sqrt(yhat)
scaled_absolute   Two-model approach with spread model   Width scales with difficulty
absolute          Baseline only                          Fixed width regardless of risk level

# Tweedie power 1.5 = compound Poisson-Gamma (typical burning cost)
scp = SpatialConformalPredictor(
    model=model, nonconformity='pearson_tweedie', tweedie_power=1.5
)

# Two-model approach: spread model predicts |y - yhat|
spread_model = LGBMRegressor().fit(X_cal, np.abs(y_cal - yhat_cal))
scp = SpatialConformalPredictor(
    model=model, nonconformity='scaled_absolute', spread_model=spread_model
)

API reference

SpatialConformalPredictor

SpatialConformalPredictor(
    model,                        # fitted sklearn-compatible model
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    spatial_kernel='gaussian',    # 'gaussian' | 'epanechnikov' | 'uniform'
    bandwidth_km=None,            # None = CV-select; float = fixed
    spread_model=None,            # required for 'scaled_absolute'
    n_eff_min=30,                 # warn if effective N < this threshold
)

.calibrate(X_cal, y_cal, lat=..., lon=..., postcodes=..., exposure=...)
    → CalibrationResult

.predict_interval(X_test, lat=..., lon=..., postcodes=..., alpha=0.10)
    → IntervalResult  (.lower, .upper, .point, .n_effective, .bandwidth_km)

.spatial_coverage_report(X_val, y_val, lat=..., lon=...)
    → SpatialCoverageReport

BandwidthSelector

BandwidthSelector(
    candidates_km=[2, 5, 10, 15, 20, 30, 50],
    cv=5,
    n_eff_min=30,
    metric='macg',
    grid_resolution=10,
)

.select(scores, lat, lon, alpha=0.10) → BandwidthCVResult

SpatialCoverageReport

SpatialCoverageReport(predictor)

.evaluate(X_val, y_val, lat=..., lon=..., alpha=0.10, grid_resolution=20)
    → CoverageResult  (.marginal_coverage, .macg, .n_grid_cells)

.coverage_map(resolution=20) → matplotlib Figure
.fca_consumer_duty_table(region_labels=...) → polars DataFrame
.macg_by_region(region_labels) → polars DataFrame
.summary() → str

Design decisions

Haversine distance, not Euclidean. At 55°N (central Scotland), a degree of longitude is ~64 km but a degree of latitude is ~111 km. Euclidean distance on decimal degrees would produce elliptical kernels skewed north-south by ~42%. All distance calculations use haversine.
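A standard haversine implementation (a sketch of the formula, which this library presumably uses in vectorised form) makes the asymmetry easy to check:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points in decimal degrees."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# At 55°N, one degree of longitude spans far less ground than one of latitude:
print(haversine_km(55.0, -3.0, 55.0, -2.0))  # ~63.8 km (east-west)
print(haversine_km(55.0, -3.0, 56.0, -3.0))  # ~111.2 km (north-south)
```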

Bandwidth parameterisation as km, not eta. The Hjort et al. paper uses eta = bandwidth^2 internally. We expose the parameter in kilometres because that's what a pricing actuary can reason about — "20 km bandwidth" is meaningful, "eta = 400,000 m²" is not.

Tibshirani (2019) augmentation. The finite-sample coverage guarantee requires augmenting the calibration distribution with a point at +∞ with weight proportional to 1/(n+1). This ensures the marginal guarantee holds exactly at 1−α, not just approximately.
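Concretely, the augmentation amounts to appending an infinite score before taking the weighted quantile. A sketch (the `w_test` handling here is an assumption about the weighted case, following Tibshirani et al. (2019)):

```python
import numpy as np

def augmented_weighted_quantile(scores, weights, alpha, w_test=1.0):
    """Weighted (1 - alpha) quantile with a +inf point of weight w_test
    appended. The infinite score absorbs the mass the finite calibration
    set cannot account for, making marginal coverage >= 1 - alpha."""
    s = np.append(scores, np.inf)
    w = np.append(weights, w_test)
    order = np.argsort(s)
    cdf = np.cumsum(w[order]) / np.sum(w)
    return s[order][np.searchsorted(cdf, 1.0 - alpha)]

# Unweighted, n = 9, alpha = 0.1: equals the ceil((n+1)(1-alpha)) = 9th
# order statistic of the 9 calibration scores:
print(augmented_weighted_quantile(np.arange(1.0, 10.0), np.ones(9), 0.10))  # → 9.0
```

With a single calibration point the quantile is +inf, i.e. the interval is vacuous rather than falsely precise.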

Spatial blocking CV, not random folds. Random CV folds allow geographically proximate calibration and validation points into the same split, which leaks spatial information and makes the CV loss overly optimistic. K-means on coordinates gives spatially contiguous folds.
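A minimal version of the fold construction, using scikit-learn's KMeans on raw coordinates (the function name is illustrative, not this library's API):

```python
import numpy as np
from sklearn.cluster import KMeans

def spatial_folds(lat, lon, k=5, seed=0):
    """Assign each point to a spatially contiguous CV fold by clustering
    coordinates: a whole geographic block is held out together, so the
    bandwidth cannot be tuned by 'peeking' at a neighbour a few km away."""
    coords = np.column_stack([lat, lon])
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(coords)

rng = np.random.default_rng(0)
lat = rng.uniform(50.0, 58.0, 200)   # synthetic GB-ish latitudes
lon = rng.uniform(-5.0, 1.0, 200)
folds = spatial_folds(lat, lon, k=5)
print(np.bincount(folds))            # fold sizes; each fold is one block
```

Note the folds are deliberately unequal in size: they follow the spatial distribution of the data, not a random shuffle.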

Kish effective N warning. In rural areas with sparse data, a narrow bandwidth might have effective N < 30 at some test points. The predictor warns rather than erroring — the interval is still produced, but flagged. In practice, the CV bandwidth selector includes a floor on effective N.
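The Kish (1965) effective sample size is simply (sum of weights)² divided by the sum of squared weights:

```python
import numpy as np

def kish_n_eff(weights):
    """Kish effective sample size: equals n for uniform weights and
    collapses toward 1 when a few points dominate."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

print(kish_n_eff(np.ones(100)))          # → 100.0: uniform weights
print(kish_n_eff([1.0] + [0.01] * 99))   # ~3.9: one nearby point dominates
```

So 100 calibration points under a narrow kernel in a sparse area can carry the information of fewer than 4, which is exactly the situation the `n_eff_min=30` warning is there to catch.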

Polars output for DataFrames. Diagnostics and the FCA table return Polars DataFrames rather than pandas. Polars is faster for the typical operations (group-by, filter, sort) and has cleaner null semantics. Call .to_pandas() if your downstream tools need pandas.

References

Hjort, N. L., Jullum, M., & Løland, A. (2025). Uncertainty quantification in automated valuation models with spatially weighted conformal prediction. International Journal of Data Science and Analytics (Springer). doi:10.1007/s41060-025-00862-4. arXiv:2312.06531.

Tibshirani, R. J., Barber, R. F., Candès, E. J., & Ramdas, A. (2019). Conformal prediction under covariate shift. NeurIPS 2019.

Manna, S. et al. (2025). Distribution-free prediction sets for Tweedie regression. arXiv:2507.06921.

Kish, L. (1965). Survey Sampling. Wiley.

Roberts, D. R. et al. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913-929.

Licence

MIT
