Spatially Clustered Mixture of Experts for joint frequency-severity insurance pricing

insurance-scmoe

The problem

Standard insurance pricing models treat frequency and severity as separate GLMs. This misses the joint structure: high-frequency policyholders are often also high-severity. On top of that, a typical two-GLM approach applies a territorial factor as a post-hoc correction — either a manual zone scheme or a spatial GLM — but this is independent of the risk segmentation.

SC-MoE (Spatially Clustered Mixture of Experts, NAAJ 2025) solves both problems simultaneously. It discovers K latent risk types — low-frequency/low-severity, medium, high — and enforces geographic continuity so that nearby postcodes are encouraged to belong to the same risk class. The territory structure emerges from the model rather than being imposed on top of it.

How it works

Each latent class k has:

  • Poisson(lambda_k) claim frequency
  • Gamma(alpha_k, beta_k) severity per claim

A gating network assigns each policyholder a probability of belonging to each class:

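pi_k(x_i) = exp(alpha_k' x_i) / sum_j exp(alpha_j' x_i)

Here alpha_k is the gating coefficient vector for class k, not the Gamma shape parameter of the same name.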
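
As an illustration of the gating formula, here is a minimal numpy sketch of a softmax over class scores. The function name and the array names X and A are hypothetical and are not part of the package API.

```python
import numpy as np

def gating_probabilities(X, A):
    """Softmax gating: pi[i, k] = exp(A[k] @ X[i]) / sum_j exp(A[j] @ X[i]).

    X : (n, p) feature matrix, A : (K, p) gating coefficients.
    Both names are illustrative assumptions, not package attributes.
    """
    logits = X @ A.T                              # (n, K) class scores
    logits -= logits.max(axis=1, keepdims=True)   # softmax is shift-invariant; avoids overflow
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 policyholders, 3 features
A = rng.normal(size=(3, 3))   # K = 3 classes
pi = gating_probabilities(X, A)
print(pi.round(3))            # each row sums to 1
```

Every policyholder's own features x_i appear in both the numerator and the denominator, which is why each row of the result is a proper probability vector.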

The spatial penalty (lambda/2) * trace(alpha_area' L alpha_area), where L is the graph Laplacian of the geographic adjacency graph and lambda is the penalty weight (the lam argument, not the Poisson rate lambda_k), encourages neighbouring areas to have similar class memberships.
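
To see why this penalty favours spatially smooth coefficients, the sketch below evaluates it on a four-area path graph. It relies on the standard identity trace(A' L A) = sum over edges (i, j) of ||a_i - a_j||^2. The function and variable names are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_penalty(alpha_area, W, lam):
    """(lam / 2) * trace(alpha_area' L alpha_area) with L = D - W.

    alpha_area : (n_areas, K) area-level coefficients (hypothetical layout)
    W          : (n_areas, n_areas) symmetric binary adjacency matrix
    """
    W = sp.csr_matrix(W)
    degrees = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(degrees) - W
    # sum of elementwise products equals the trace of alpha' L alpha
    return 0.5 * lam * np.sum(alpha_area * (L @ alpha_area))

# Path graph 0 - 1 - 2 - 3
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

smooth = np.array([[0.0], [0.1], [0.2], [0.3]])   # neighbours differ by 0.1
rough = np.array([[0.0], [1.0], [0.0], [1.0]])    # neighbours differ by 1.0

pen_smooth = laplacian_penalty(smooth, W, lam=1.0)
pen_rough = laplacian_penalty(rough, W, lam=1.0)
print(pen_smooth, pen_rough)   # the rough assignment is penalised 100x more
```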

Estimation uses MM-ADMM: a quadratic MM surrogate (Böhning 1992) handles the non-linear logistic objective, and ADMM (Boyd et al. 2010) solves the penalised quadratic with the sparse Laplacian structure.
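
The MM half of this scheme can be illustrated on a simpler problem. The sketch below applies Böhning's fixed curvature bound to plain binary logistic regression, where the Hessian of the negative log-likelihood is bounded above by (1/4) X'X. It is a simplification for illustration only: it has no spatial penalty, no ADMM step and only two classes, and it is not the package's estimation code.

```python
import numpy as np

def neg_log_lik(beta, X, y):
    """Negative Bernoulli log-likelihood with a logit link."""
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

def bohning_mm(X, y, n_iter=50):
    """Minimise neg_log_lik by MM with the fixed bound B = X'X / 4."""
    beta = np.zeros(X.shape[1])
    B_inv = np.linalg.inv(0.25 * X.T @ X)   # inverted once, reused every iteration
    losses = [neg_log_lik(beta, X, y)]
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (p - y)
        beta = beta - B_inv @ grad          # minimiser of the quadratic surrogate
        losses.append(neg_log_lik(beta, X, y))
    return beta, np.array(losses)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat, losses = bohning_mm(X, y)
print(beta_hat.round(3), losses[0].round(2), losses[-1].round(2))
```

Because the surrogate majorises the objective, the loss never increases from one iteration to the next, and the curvature matrix never has to be recomputed.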

Installation

pip install insurance-scmoe

For building spatial graphs from shapefiles (requires geopandas and libpysal):

pip install insurance-scmoe[spatial]

Quick start

from insurance_scmoe import SCMoE, simulate_scmoe

# Simulate 2,000 policyholders across 100 postcode sectors (10x10 grid)
X, y_freq, y_sev, area_ids, graph, truth = simulate_scmoe(
    n=2000, K=3, n_areas=100, n_rows=10, n_cols=10, seed=42
)

# Fit the model
model = SCMoE(n_components=3, lam=1.0, max_iter=100, random_state=42)
model.fit(X, y_freq, y_sev, graph, area_ids)

# Pure premium predictions
pp = model.predict_pure_premium(X, area_ids)
print(f"Mean pure premium: {pp.mean():.4f}")

# Fitted expert parameters (ordered by ascending frequency rate)
print(f"Poisson rates lambda_k:  {model.expert_.lambda_}")
print(f"Gamma shapes alpha_k:    {model.expert_.alpha_}")
print(f"Gamma rates beta_k:      {model.expert_.beta_}")

# Class membership probabilities
pi = model.predict_proba(X, area_ids)  # shape (n, K)

Building from a real spatial dataset

If you have a GeoDataFrame of postcode sectors:

import geopandas as gpd
from insurance_scmoe import SpatialGraph

geo = gpd.read_file("postcode_sectors.shp")
graph = SpatialGraph.from_geodataframe(geo, id_col="sector_code", method="queen")

Or from a pre-computed adjacency edge list:

graph = SpatialGraph.from_adjacency_csv("adjacency.csv")
# CSV must have columns: area_i, area_j (zero-based integer indices)
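
If no adjacency file exists yet, an edge list in this column layout can be generated with the standard library. The sketch below builds rook adjacency for a regular grid and writes it in memory. Whether the loader expects each undirected edge once or in both directions is not stated above, so listing each edge once is an assumption, and rook_edges is a hypothetical helper.

```python
import csv
import io

def rook_edges(n_rows, n_cols):
    """Yield each undirected rook-adjacency edge (i, j), i < j, exactly once.

    Areas are numbered row by row with zero-based indices.
    """
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                yield i, i + 1          # neighbour to the right
            if r + 1 < n_rows:
                yield i, i + n_cols     # neighbour below

edges = list(rook_edges(10, 10))

# Written to an in-memory buffer here; swap in
# open("adjacency.csv", "w", newline="") to write a real file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["area_i", "area_j"])
writer.writerows(edges)

print(len(edges))                        # number of undirected edges
print(buf.getvalue().splitlines()[:3])   # header plus the first two edges
```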

Model selection

from insurance_scmoe import ModelSelector

sel = ModelSelector(
    k_range=(2, 6),
    lam_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
    max_iter=100,
    random_state=0,
    verbose=True,
)
sel.fit(X, y_freq, y_sev, graph, area_ids)

print(f"Best K: {sel.best_k_},  Best lambda: {sel.best_lam_}")
print(sel.summary())

best_model = sel.best_model_

API reference

SCMoE(n_components, lam, rho, max_iter, tol, admm_max_iter, random_state, verbose)

Main model class.

Method                                        Description
fit(X, y_freq, y_sev, graph, area_ids)        Fit via MM-ADMM ECM
predict_proba(X, area_ids)                    Class membership probabilities, shape (n, K)
predict_frequency(X, area_ids)                E[N | x, area], shape (n,)
predict_severity(X, area_ids)                 E[S | x, area], shape (n,)
predict_pure_premium(X, area_ids)             E[N*S | x, area], shape (n,)
log_likelihood(X, y_freq, y_sev, area_ids)    Observed-data log-likelihood
bic(...) / aic(...)                           Information criteria
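
For intuition about the pure premium output, the sketch below computes a mixture expectation by hand. It assumes that claim count and per-claim severity are independent within a class, so each class contributes lambda_k * alpha_k / beta_k. That assumption, the function name and all parameter values are illustrative and not taken from the package.

```python
import numpy as np

def mixture_pure_premium(pi, lam, alpha, beta):
    """sum_k pi[i, k] * lam[k] * alpha[k] / beta[k] for each policyholder i.

    Assumes within-class independence of frequency and severity.
    """
    class_pp = lam * alpha / beta   # expected loss per class: rate * mean severity
    return pi @ class_pp

# Illustrative values only
pi = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]])
lam = np.array([0.05, 0.10, 0.30])       # Poisson rates
alpha = np.array([2.0, 2.0, 3.0])        # Gamma shapes
beta = np.array([0.004, 0.002, 0.002])   # Gamma rates

pp = mixture_pure_premium(pi, lam, alpha, beta)
print(pp)   # [ 82.5 302.5]
```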

SpatialGraph

Method                            Description
from_adjacency_matrix(W)          From dense or sparse binary matrix
from_adjacency_csv(path)          From edge-list CSV
from_geodataframe(geo_df, ...)    From GeoDataFrame (requires [spatial])
adjacency()                       Returns sparse W
laplacian()                       Returns sparse L = D - W

ModelSelector(k_range, lam_grid, ...)

Grid search over K and lambda by BIC. sel.fit(...) then sel.best_model_, sel.summary().
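
Selection by BIC amounts to taking the candidate with the smallest -2 * loglik + p * log(n). The sketch below applies that textbook formula to a few made-up fits. The log-likelihoods and parameter counts are invented for illustration, and how the package counts parameters under the spatial penalty is not specified here.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Textbook BIC; lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fit results, not output from the package
candidates = [
    {"K": 2, "lam": 1.0, "log_lik": -5200.0, "n_params": 12},
    {"K": 3, "lam": 1.0, "log_lik": -5150.0, "n_params": 19},
    {"K": 4, "lam": 1.0, "log_lik": -5140.0, "n_params": 26},
]

n_obs = 2000
best = min(candidates, key=lambda c: bic(c["log_lik"], c["n_params"], n_obs))
print(best["K"])   # 3: the extra class in K=4 does not pay for its parameters
```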

simulate_scmoe(n, K, n_areas, n_rows, n_cols, lam_spatial, seed, ...)

Generates synthetic portfolio data from a ground-truth K-class SC-MoE. Returns (X, y_freq, y_sev, area_ids, graph, truth).

Design notes

Why not PyTorch? All operations are closed-form ADMM linear systems and weighted MLEs. For K up to 8 components and n up to 500k policyholders, numpy/scipy is faster and has no GPU dependency. The Laplacian linear system is sparse and well-conditioned — spsolve handles it cleanly.
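
The kind of sparse solve involved can be sketched with scipy alone. The system matrix rho * I + lam * L used below is a typical shape for an ADMM subproblem and is an assumption here, not a statement of the package's exact update. The Laplacian on its own is singular, but adding rho * I makes the system positive definite.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def grid_laplacian(n_rows, n_cols):
    """Rook-adjacency graph Laplacian L = D - W for an n_rows x n_cols grid."""
    def path(n):
        # adjacency matrix of a path graph on n nodes
        return sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1])
    W = (sp.kron(sp.identity(n_rows), path(n_cols))
         + sp.kron(path(n_rows), sp.identity(n_cols)))
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return (D - W).tocsc()

L = grid_laplacian(10, 10)
rho, lam = 1.0, 1.0                      # illustrative values
A = (rho * sp.identity(L.shape[0]) + lam * L).tocsc()

rng = np.random.default_rng(0)
b = rng.normal(size=L.shape[0])
z = spsolve(A, b)                        # direct sparse solve

print(A.nnz, np.abs(A @ z - b).max())    # few non-zeros, tiny residual
```

For a fixed graph and fixed rho and lam the matrix does not change between iterations, so a factorisation could be computed once and reused.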

Why not Bayesian (PyMC/Stan)? SC-MoE is frequentist penalised likelihood. The spatial term is regularisation, not a prior. MCMC would be 10-100x slower for no gain given the MM-ADMM algorithm's convergence properties.

Gamma parameterisation: f(x; alpha, beta) = beta^alpha / Gamma(alpha) * x^(alpha-1) * exp(-beta*x). Mean = alpha/beta, Var = alpha/beta^2. Newton-Raphson on the profile log-likelihood gives fast, stable shape estimation.
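
A minimal sketch of this shape estimation follows, assuming the usual profile likelihood in which beta is replaced by alpha / mean(x). The score is then log(alpha) - digamma(alpha) - s with s = log(mean(x)) - mean(log(x)). The function name, the optional weights and the closed-form starting value are illustrative choices, not the package's code.

```python
import numpy as np
from scipy.special import digamma, polygamma

def fit_gamma(x, w=None, n_iter=50, tol=1e-10):
    """Weighted Gamma MLE (shape alpha, rate beta) via Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    mean_x = np.sum(w * x)
    s = np.log(mean_x) - np.sum(w * np.log(x))   # > 0 unless x is constant
    # closed-form approximation used as the starting value
    alpha = (3.0 - s + np.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(n_iter):
        score = np.log(alpha) - digamma(alpha) - s
        hess = 1.0 / alpha - polygamma(1, alpha)   # negative for all alpha > 0
        step = score / hess
        new_alpha = alpha - step
        while new_alpha <= 0:                      # keep the iterate positive
            step *= 0.5
            new_alpha = alpha - step
        converged = abs(new_alpha - alpha) < tol
        alpha = new_alpha
        if converged:
            break
    return float(alpha), float(alpha / mean_x)     # beta from the profile relation

rng = np.random.default_rng(42)
x = rng.gamma(shape=2.0, scale=1.0 / 3.0, size=20000)   # true alpha = 2, beta = 3
alpha_hat, beta_hat = fit_gamma(x)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

In a mixture setting the weights w would be the posterior class responsibilities, so the same routine serves each class in turn.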

Label switching: Components are re-ordered by ascending lambda_k after fitting. This is not perfect — if two classes have identical rates, order is arbitrary — but it gives reproducible output for most practical cases.
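
The re-ordering itself is a single argsort applied consistently to every per-class quantity. The sketch below uses hypothetical array names. A stable sort keeps tied classes in their original relative order, which makes the output deterministic but does not make the tie any more meaningful.

```python
import numpy as np

def order_components(lam, alpha, beta, pi):
    """Permute all per-class arrays so that lam is ascending."""
    order = np.argsort(lam, kind="stable")
    return lam[order], alpha[order], beta[order], pi[:, order]

# Illustrative values only
lam = np.array([0.30, 0.05, 0.10])
alpha = np.array([3.0, 2.0, 2.5])
beta = np.array([0.002, 0.004, 0.003])
pi = np.array([[0.1, 0.7, 0.2],
               [0.6, 0.1, 0.3]])

lam_s, alpha_s, beta_s, pi_s = order_components(lam, alpha, beta, pi)
print(lam_s)   # [0.05 0.1  0.3 ]
print(pi_s)    # columns permuted the same way as the rates
```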

Gating architecture: The ADMM operates at the area level (one alpha vector per postcode sector per class), averaging down to policy level for predictions. This means the spatial penalty acts on the area-level assignment, not on individual policies.

Reference

NAAJ 2025, DOI: 10.1080/10920277.2025.2567283.

LRMoE: Fung, Badescu, Lin (2019). GitHub.

Related packages

  • insurance-spatial: BYM2 spatial random effects in a GLM — territorial smoothing within a single risk model.
  • insurance-nested-glm: Neural embeddings + contiguity-constrained clustering for territory factor construction.
  • insurance-glm-cluster: Fused lasso clustering of factor levels within a GLM.

SC-MoE uniquely combines latent risk type discovery, joint frequency-severity modelling, and geographic continuity enforcement in a single penalised mixture model.

Licence

MIT
