
insurance-spatial-conformal

Spatially weighted conformal prediction intervals for geographically calibrated insurance pricing.

The problem

You've built a home insurance pricing model. You evaluate it nationally — 90% coverage on the test set, right on target. Your actuary signs off. Your model goes live.

Then someone runs a postcode-level diagnostic and finds that coverage in Hackney is 73% and coverage in rural Devon is 96%. Your nationally correct model is systematically under-covering urban risks and over-covering rural ones.

This is the exchangeability problem in conformal prediction. Standard split conformal assumes that calibration scores and test scores are drawn from the same distribution — interchangeable. That's fine nationally, but it breaks geographically. A semi-detached in Hackney and a farmhouse in Devon have materially different loss distributions, and treating all calibration scores as equally relevant to both is wrong.

The fix is geographic kernel weighting: when computing the prediction interval for a test property, weight the calibration scores by proximity. Properties in Hackney get high weight from the Hackney calibration data, low weight from the Devon data. The quantile you use for the interval reflects local behaviour, not national average behaviour.
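The mechanics can be sketched in a few lines (illustrative names, not this library's internals): compute a kernel weight for each calibration point from its distance to the test property, then take a weighted quantile of the calibration scores.

```python
import numpy as np

def gaussian_weights(dist_km, bandwidth_km):
    """Gaussian kernel weight: nearby calibration points count most."""
    return np.exp(-0.5 * (np.asarray(dist_km) / bandwidth_km) ** 2)

def weighted_quantile(scores, weights, q):
    """q-th quantile of scores under the given weights."""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(np.searchsorted(cdf, q), len(s) - 1)
    return s[idx]

# Distant calibration points barely influence the local quantile:
scores = np.array([1.0, 1.1, 5.0, 5.2])        # non-conformity scores
dist_km = np.array([2.0, 3.0, 200.0, 250.0])   # distance to the test property
w = gaussian_weights(dist_km, bandwidth_km=20.0)
print(weighted_quantile(scores, w, 0.9))        # → 1.1, not a 5.x score
```

With uniform weights this reduces to the ordinary split-conformal quantile; the geography only enters through the weights.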

This library implements that fix for UK insurance pricing.

What it does

  • Spatially weighted conformal prediction using Gaussian, Epanechnikov, or uniform (nearest-neighbour) spatial kernels
  • Tweedie Pearson non-conformity scores — variance-stabilised scores for GLM and GBM models with Tweedie/compound Poisson objectives
  • Cross-validated bandwidth selection using spatial blocking CV with MACG objective — the bandwidth that minimises geographic coverage gaps
  • MACG diagnostic (Mean Absolute Coverage Gap) across a spatial grid, plus per-region breakdown for FCA Consumer Duty reporting
  • UK postcode geocoding via pgeocode with outward-code fallback
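The three kernel shapes differ mainly in their support. A minimal sketch (not the library's exact code) of how each turns distance into weight:

```python
import numpy as np

def kernel_weight(dist_km, bandwidth_km, kind="gaussian"):
    """Convert distances (km) to calibration weights for one test point."""
    u = np.asarray(dist_km, dtype=float) / bandwidth_km
    if kind == "gaussian":
        return np.exp(-0.5 * u ** 2)           # smooth, never exactly zero
    if kind == "epanechnikov":
        return np.maximum(0.0, 1.0 - u ** 2)   # compact: zero beyond one bandwidth
    if kind == "uniform":
        return (u <= 1.0).astype(float)        # hard cutoff, nearest-neighbour style
    raise ValueError(f"unknown kernel: {kind}")

print(kernel_weight([0.0, 10.0, 30.0], 20.0, "epanechnikov"))  # 30 km point gets weight 0
```

The compact-support kernels discard far-away data entirely, which makes effective sample size the binding constraint in sparse rural areas; the Gaussian kernel downweights smoothly instead.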

Installation

pip install insurance-spatial-conformal

Optional geographic visualisation dependencies:

pip install insurance-spatial-conformal[geo]

Quickstart

from insurance_spatial_conformal import SpatialConformalPredictor

# Your fitted pricing model (LightGBM, XGBoost, sklearn, CatBoost — anything with predict())
# Already split your data into train / calibration / test

scp = SpatialConformalPredictor(
    model=fitted_lgbm,
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    bandwidth_km=20.0,       # 20 km Gaussian kernel; or None to auto-select
)

# Calibrate on holdout set with coordinates
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)

# Predict intervals for new business
result = scp.predict_interval(X_test, lat=lat_test, lon=lon_test, alpha=0.10)

print(result.lower[:5])   # lower bounds
print(result.upper[:5])   # upper bounds
print(result.point[:5])   # point predictions from model

Using postcodes instead of coordinates:

from insurance_spatial_conformal import PostcodeGeocoder

gc = PostcodeGeocoder()
lat_cal, lon_cal = gc.geocode(postcode_list_cal)
scp.calibrate(X_cal, y_cal, lat=lat_cal, lon=lon_cal)

Auto-selecting bandwidth via cross-validation:

scp = SpatialConformalPredictor(model=fitted_model, bandwidth_km=None)
result = scp.calibrate(
    X_cal, y_cal, lat=lat_cal, lon=lon_cal,
    cv_candidates_km=[2, 5, 10, 20, 30, 50],
    cv_folds=5,
)
print(f"CV-selected bandwidth: {result.bandwidth_km} km")

Coverage diagnostics

from insurance_spatial_conformal import SpatialCoverageReport

report = SpatialCoverageReport(scp)
result = report.evaluate(X_val, y_val, lat=lat_val, lon=lon_val, alpha=0.10)

print(report.summary())
# === Spatial Coverage Report ===
#   Validation set: 5,000 observations
#   Target coverage (1-alpha): 90.0%
#   Marginal coverage: 0.901
#   Coverage gap: -0.0010
#   MACG (312 grid cells): 0.0187
#   Bandwidth: 20.0 km
#   Kernel: gaussian

# Coverage map — green = on target, red = under/over covered
fig = report.coverage_map(resolution=20)
fig.savefig("coverage_by_postcode.png", dpi=150)

# FCA Consumer Duty table — coverage by segment
import polars as pl

table = report.fca_consumer_duty_table(region_labels=county_labels)
print(table.filter(pl.col("flag") == "REVIEW"))

Non-conformity score choice

The score determines the shape of the prediction interval. For insurance pricing:

Score             Use when                               Interval shape
----------------  -------------------------------------  ------------------------------------
pearson_tweedie   Tweedie GLM/GBM (default)              Width scales as yhat^(p/2)
pearson           Poisson frequency model                Width scales as sqrt(yhat)
scaled_absolute   Two-model approach with spread model   Width scales with difficulty
absolute          Baseline only                          Fixed width regardless of risk level

# Tweedie power 1.5 = compound Poisson-Gamma (typical burning cost)
scp = SpatialConformalPredictor(
    model=model, nonconformity='pearson_tweedie', tweedie_power=1.5
)

# Two-model approach: spread model predicts |y - yhat|
spread_model = LGBMRegressor().fit(X_cal, np.abs(y_cal - yhat_cal))
scp = SpatialConformalPredictor(
    model=model, nonconformity='scaled_absolute', spread_model=spread_model
)

API reference

SpatialConformalPredictor

SpatialConformalPredictor(
    model,                        # fitted sklearn-compatible model
    nonconformity='pearson_tweedie',
    tweedie_power=1.5,
    spatial_kernel='gaussian',    # 'gaussian' | 'epanechnikov' | 'uniform'
    bandwidth_km=None,            # None = CV-select; float = fixed
    spread_model=None,            # required for 'scaled_absolute'
    n_eff_min=30,                 # warn if effective N < this threshold
)

.calibrate(X_cal, y_cal, lat=..., lon=..., postcodes=..., exposure=...)
    → CalibrationResult

.predict_interval(X_test, lat=..., lon=..., postcodes=..., alpha=0.10)
    → IntervalResult  (.lower, .upper, .point, .n_effective, .bandwidth_km)

.spatial_coverage_report(X_val, y_val, lat=..., lon=...)
    → SpatialCoverageReport

BandwidthSelector

BandwidthSelector(
    candidates_km=[2, 5, 10, 15, 20, 30, 50],
    cv=5,
    n_eff_min=30,
    metric='macg',
    grid_resolution=10,
)

.select(scores, lat, lon, alpha=0.10) → BandwidthCVResult

SpatialCoverageReport

SpatialCoverageReport(predictor)

.evaluate(X_val, y_val, lat=..., lon=..., alpha=0.10, grid_resolution=20)
    → CoverageResult  (.marginal_coverage, .macg, .n_grid_cells)

.coverage_map(resolution=20) → matplotlib Figure
.fca_consumer_duty_table(region_labels=...) → polars DataFrame
.macg_by_region(region_labels) → polars DataFrame
.summary() → str

Design decisions

Haversine distance, not Euclidean. At 55°N (central Scotland), a degree of longitude is ~64 km but a degree of latitude is ~111 km. Euclidean distance on decimal degrees would produce elliptical kernels skewed north-south by ~42%. All distance calculations use haversine.
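A standard haversine implementation (a sketch of the formula, which this library presumably uses in vectorised form) makes the asymmetry easy to check:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points in decimal degrees."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# At 55°N, one degree of longitude spans far less ground than one of latitude:
print(haversine_km(55.0, -3.0, 55.0, -2.0))  # ~63.8 km (east-west)
print(haversine_km(55.0, -3.0, 56.0, -3.0))  # ~111.2 km (north-south)
```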

Bandwidth parameterisation as km, not eta. The Hjort et al. paper uses eta = bandwidth^2 internally. We expose the parameter in kilometres because that's what a pricing actuary can reason about — "20 km bandwidth" is meaningful, "eta = 400,000 m²" is not.

Tibshirani (2019) augmentation. The finite-sample coverage guarantee requires augmenting the calibration distribution with a point at +∞ with weight proportional to 1/(n+1). This ensures the marginal guarantee holds exactly at 1−α, not just approximately.
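Concretely, the augmentation amounts to appending an infinite score before taking the weighted quantile. A sketch (the `w_test` handling here is an assumption about the weighted case, following Tibshirani et al. (2019)):

```python
import numpy as np

def augmented_weighted_quantile(scores, weights, alpha, w_test=1.0):
    """Weighted (1 - alpha) quantile with a +inf point of weight w_test
    appended. The infinite score absorbs the mass the finite calibration
    set cannot account for, making marginal coverage >= 1 - alpha."""
    s = np.append(scores, np.inf)
    w = np.append(weights, w_test)
    order = np.argsort(s)
    cdf = np.cumsum(w[order]) / np.sum(w)
    return s[order][np.searchsorted(cdf, 1.0 - alpha)]

# Unweighted, n = 9, alpha = 0.1: equals the ceil((n+1)(1-alpha)) = 9th
# order statistic of the 9 calibration scores:
print(augmented_weighted_quantile(np.arange(1.0, 10.0), np.ones(9), 0.10))  # → 9.0
```

With a single calibration point the quantile is +inf, i.e. the interval is vacuous rather than falsely precise.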

Spatial blocking CV, not random folds. Random CV folds allow geographically proximate calibration and validation points into the same split, which leaks spatial information and makes the CV loss overly optimistic. K-means on coordinates gives spatially contiguous folds.
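A minimal version of the fold construction, using scikit-learn's KMeans on raw coordinates (the function name is illustrative, not this library's API):

```python
import numpy as np
from sklearn.cluster import KMeans

def spatial_folds(lat, lon, k=5, seed=0):
    """Assign each point to a spatially contiguous CV fold by clustering
    coordinates: a whole geographic block is held out together, so the
    bandwidth cannot be tuned by 'peeking' at a neighbour a few km away."""
    coords = np.column_stack([lat, lon])
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(coords)

rng = np.random.default_rng(0)
lat = rng.uniform(50.0, 58.0, 200)   # synthetic GB-ish latitudes
lon = rng.uniform(-5.0, 1.0, 200)
folds = spatial_folds(lat, lon, k=5)
print(np.bincount(folds))            # fold sizes; each fold is one block
```

Note the folds are deliberately unequal in size: they follow the spatial distribution of the data, not a random shuffle.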

Kish effective N warning. In rural areas with sparse data, a narrow bandwidth might have effective N < 30 at some test points. The predictor warns rather than erroring — the interval is still produced, but flagged. In practice, the CV bandwidth selector includes a floor on effective N.
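The Kish (1965) effective sample size is simply (sum of weights)² divided by the sum of squared weights:

```python
import numpy as np

def kish_n_eff(weights):
    """Kish effective sample size: equals n for uniform weights and
    collapses toward 1 when a few points dominate."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

print(kish_n_eff(np.ones(100)))          # → 100.0: uniform weights
print(kish_n_eff([1.0] + [0.01] * 99))   # ~3.9: one nearby point dominates
```

So 100 calibration points under a narrow kernel in a sparse area can carry the information of fewer than 4, which is exactly the situation the `n_eff_min=30` warning is there to catch.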

Polars output for DataFrames. Diagnostics and the FCA table return Polars DataFrames rather than pandas. Polars is faster for the typical operations (group-by, filter, sort) and has cleaner null semantics. Call .to_pandas() if your downstream tools need pandas.

References

Hjort, N. L., Jullum, M., & Løland, A. (2025). Uncertainty quantification in automated valuation models with spatially weighted conformal prediction. International Journal of Data Science and Analytics (Springer). doi:10.1007/s41060-025-00862-4. arXiv:2312.06531.

Tibshirani, R. J., Barber, R. F., Candès, E. J., & Ramdas, A. (2019). Conformal prediction under covariate shift. NeurIPS 2019.

Manna, S. et al. (2025). Distribution-free prediction sets for Tweedie regression. arXiv:2507.06921.

Kish, L. (1965). Survey Sampling. Wiley.

Roberts, D. R. et al. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913-929.

Licence

MIT
