Spatially Clustered Mixture of Experts for joint frequency-severity insurance pricing

insurance-scmoe

Spatially Clustered Mixture of Experts for joint frequency-severity insurance pricing.

The problem

Standard insurance pricing models treat frequency and severity as separate GLMs. This misses the joint structure: high-frequency policyholders are often also high-severity. On top of that, a typical two-GLM approach applies a territorial factor as a post-hoc correction — either a manual zone scheme or a spatial GLM — but this is independent of the risk segmentation.

SC-MoE (Spatially Clustered Mixture of Experts, NAAJ 2025) solves both problems simultaneously. It discovers K latent risk types — low-frequency/low-severity, medium, high — and enforces geographic continuity so that nearby postcodes are encouraged to belong to the same risk class. The territory structure emerges from the model rather than being imposed on top of it.

How it works

Each latent class k has:

Poisson(lambda_k) claim frequency
Gamma(alpha_k, beta_k) severity per claim

A gating network assigns each policyholder a probability of belonging to each class:

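pi_k(x_i) = exp(alpha_k' x_i) / sum_j exp(alpha_j' x_i)

Here alpha_k denotes the gating coefficient vector of class k, not the Gamma shape alpha_k above; the sum in the denominator runs over classes j for the same policyholder i.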
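The gating probabilities are a softmax over class scores, and mixing them with the per-class Poisson and Gamma means gives a pure premium. The sketch below illustrates this with NumPy. `gating_probs` and `mixture_pure_premium` are hypothetical helpers rather than package functions, all numbers are made up, and the premium formula assumes frequency and severity are independent within a class.

```python
import numpy as np

def gating_probs(X, A):
    """Softmax gating: row i holds pi_k(x_i) for every class k.

    X is (n, p) features; A is (p, K), column k being the gating
    coefficient vector of class k (illustrative values only).
    """
    scores = X @ A                                 # (n, K) scores alpha_k' x_i
    scores -= scores.max(axis=1, keepdims=True)    # stabilise exp(); the ratio is unchanged
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)        # normalise over classes for the same x_i

def mixture_pure_premium(pi, lam, shape, rate):
    """sum_k pi_k * lambda_k * alpha_k / beta_k.

    Assumes claim count and claim size are independent within a class.
    """
    return pi @ (lam * shape / rate)

X = np.array([[1.0, 0.2], [1.0, -1.5], [1.0, 3.0]])   # intercept + one feature
A = np.array([[0.0, 0.5, -0.5], [0.0, 1.0, 2.0]])     # K = 3 classes

lam = np.array([0.05, 0.10, 0.30])       # Poisson rates per class
shape = np.array([2.0, 2.0, 2.0])        # Gamma shapes per class
rate = np.array([0.004, 0.002, 0.001])   # Gamma rates per class

pi = gating_probs(X, A)
pp = mixture_pure_premium(pi, lam, shape, rate)
print(pi.round(3))
print(pp.round(2))
```

Because a policyholder's class drives both the Poisson rate and the Gamma mean, frequency and severity are dependent marginally even though they are independent within each class.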

The spatial penalty (lambda/2) * trace(alpha_area' L alpha_area), where L is the graph Laplacian of the geographic adjacency graph and lambda is the penalty weight (the lam argument, distinct from the Poisson rates lambda_k), encourages neighbouring areas to have similar class memberships.
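To see why a Laplacian penalty favours spatial continuity, note that trace(A' L A) equals the sum of squared differences between the rows of A over all adjacent pairs of areas. The sketch below checks this on a 3x3 grid. It uses a dense matrix and rook adjacency for brevity, and assumes alpha_area is laid out as one row per area and one column per class. `grid_adjacency` is a hypothetical helper.

```python
import numpy as np

def grid_adjacency(n_rows, n_cols):
    """Dense binary rook adjacency W for an n_rows x n_cols grid of areas."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:               # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < n_rows:               # neighbour below
                W[i, i + n_cols] = W[i + n_cols, i] = 1.0
    return W

W = grid_adjacency(3, 3)
L = np.diag(W.sum(axis=1)) - W               # graph Laplacian L = D - W

rng = np.random.default_rng(0)
A = rng.normal(size=(9, 3))                  # assumed layout: areas x classes

lam = 1.0
penalty = 0.5 * lam * np.trace(A.T @ L @ A)

# The same quantity edge by edge; np.triu keeps each adjacent pair once
i_idx, j_idx = np.nonzero(np.triu(W))
edge_sum = 0.5 * lam * np.sum((A[i_idx] - A[j_idx]) ** 2)

print(penalty, edge_sum)
```

The penalty is zero exactly when every connected area shares the same gating values, so larger lam pulls neighbouring areas towards a common class profile.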

Estimation uses MM-ADMM: a quadratic MM surrogate (Böhning 1992) handles the non-linear logistic objective, and ADMM (Boyd et al. 2010) solves the penalised quadratic with the sparse Laplacian structure.
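The MM idea can be illustrated on a simplified problem: intercept-only, area-level class logits with fixed responsibilities (as an E-step would supply). The curvature of the multinomial log-likelihood is bounded by I/2 per unit of weight, in the spirit of Böhning's bound, so each update minimises a fixed quadratic surrogate plus the Laplacian penalty. The sketch below is not the package's algorithm: it replaces the ADMM inner solver with a direct dense solve, and every name and number in it is hypothetical.

```python
import numpy as np

def softmax(T):
    """Row-wise softmax with a max-shift for numerical stability."""
    E = np.exp(T - T.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def objective(Theta, R, L, lam):
    """Penalised negative log-likelihood for area-level class logits Theta."""
    nll = -(R * np.log(softmax(Theta))).sum()
    return nll + 0.5 * lam * np.trace(Theta.T @ L @ Theta)

def mm_step(Theta, R, L, lam):
    """One MM update with the fixed curvature bound diag(p) - p p' <= I/2."""
    n_a = R.sum(axis=1)                          # responsibility mass per area
    grad = n_a[:, None] * softmax(Theta) - R     # gradient of the unpenalised part
    D = np.diag(0.5 * n_a)                       # surrogate curvature, constant in Theta
    # Minimiser of the surrogate: (D + lam L) Theta_new = D Theta - grad
    return np.linalg.solve(D + lam * L, D @ Theta - grad)

# Path graph of 4 areas: 0 - 1 - 2 - 3
W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
L = np.diag(W.sum(axis=1)) - W

# Hypothetical responsibility mass per area (rows) and class (columns)
R = np.array([[8.0, 1.0, 1.0],
              [6.0, 3.0, 1.0],
              [1.0, 3.0, 6.0],
              [1.0, 1.0, 8.0]])

Theta = np.zeros_like(R)
history = [objective(Theta, R, L, lam=2.0)]
for _ in range(200):
    Theta = mm_step(Theta, R, L, lam=2.0)
    history.append(objective(Theta, R, L, lam=2.0))

print(f"objective: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because the surrogate majorises the objective, the recorded objective values never increase. The system matrix D + lam L also stays fixed across iterations, which is what makes a sparse factorisation or an ADMM splitting attractive at scale.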

Installation

pip install insurance-scmoe

For building spatial graphs from shapefiles (requires geopandas and libpysal):

pip install insurance-scmoe[spatial]

Quick start

from insurance_scmoe import SCMoE, simulate_scmoe

# Simulate 2,000 policyholders across 100 postcode sectors (10x10 grid)
X, y_freq, y_sev, area_ids, graph, truth = simulate_scmoe(
    n=2000, K=3, n_areas=100, n_rows=10, n_cols=10, seed=42
)

# Fit the model
model = SCMoE(n_components=3, lam=1.0, max_iter=100, random_state=42)
model.fit(X, y_freq, y_sev, graph, area_ids)

# Pure premium predictions
pp = model.predict_pure_premium(X, area_ids)
print(f"Mean pure premium: {pp.mean():.4f}")

# Fitted expert parameters (ordered by ascending frequency rate)
print(f"Poisson rates lambda_k:  {model.expert_.lambda_}")
print(f"Gamma shapes alpha_k:    {model.expert_.alpha_}")
print(f"Gamma rates beta_k:      {model.expert_.beta_}")

# Class membership probabilities
pi = model.predict_proba(X, area_ids)  # shape (n, K)

Building from a real spatial dataset

If you have a GeoDataFrame of postcode sectors:

import geopandas as gpd
from insurance_scmoe import SpatialGraph

geo = gpd.read_file("postcode_sectors.shp")
graph = SpatialGraph.from_geodataframe(geo, id_col="sector_code", method="queen")

Or from a pre-computed adjacency edge list:

graph = SpatialGraph.from_adjacency_csv("adjacency.csv")
# CSV must have columns: area_i, area_j (zero-based integer indices)
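As an illustration of the edge-list format, the sketch below produces CSV text with the area_i and area_j columns for a small grid, using only the standard library. `grid_edges` and `edges_to_csv` are hypothetical helpers. Listing each undirected edge once (with i < j) is an assumption; the source does not say whether both directions are expected.

```python
import csv
import io

def grid_edges(n_rows, n_cols):
    """Rook-adjacency edges (i, j), i < j, using zero-based area indices."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:               # neighbour to the right
                edges.append((i, i + 1))
            if r + 1 < n_rows:               # neighbour below
                edges.append((i, i + n_cols))
    return edges

def edges_to_csv(edges):
    """Serialise edges as CSV text with the header area_i,area_j."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["area_i", "area_j"])
    writer.writerows(edges)
    return buf.getvalue()

text = edges_to_csv(grid_edges(2, 2))
print(text)
```

Writing `text` to adjacency.csv gives a file in the shape described above.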

Model selection

from insurance_scmoe import ModelSelector

sel = ModelSelector(
    k_range=(2, 6),
    lam_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
    max_iter=100,
    random_state=0,
    verbose=True,
)
sel.fit(X, y_freq, y_sev, graph, area_ids)

print(f"Best K: {sel.best_k_},  Best lambda: {sel.best_lam_}")
print(sel.summary())

best_model = sel.best_model_

API reference

`SCMoE(n_components, lam, rho, max_iter, tol, admm_max_iter, random_state, verbose)`

Main model class.

Method	Description
`fit(X, y_freq, y_sev, graph, area_ids)`	Fit via MM-ADMM ECM
`predict_proba(X, area_ids)`	Class membership probabilities, shape (n, K)
`predict_frequency(X, area_ids)`	E[N | x, area], shape (n,)
`predict_severity(X, area_ids)`	E[S | x, area], shape (n,)
`predict_pure_premium(X, area_ids)`	E[N*S | x, area], shape (n,)
`log_likelihood(X, y_freq, y_sev, area_ids)`	Observed-data log-likelihood
`bic(...)` / `aic(...)`	Information criteria

`SpatialGraph`

Method	Description
`from_adjacency_matrix(W)`	From dense or sparse binary matrix
`from_adjacency_csv(path)`	From edge-list CSV
`from_geodataframe(geo_df, ...)`	From GeoDataFrame (requires `[spatial]`)
`adjacency()`	Returns sparse W
`laplacian()`	Returns sparse L = D - W

`ModelSelector(k_range, lam_grid, ...)`

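Grid search over K and lambda, selecting the combination with the lowest BIC. Call sel.fit(...), then read sel.best_k_, sel.best_lam_ and sel.best_model_; sel.summary() reports the grid results.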
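Selection by BIC amounts to penalising each fit's log-likelihood by its parameter count and taking the minimum. The sketch below shows only that arithmetic. The log-likelihoods and parameter counts are invented for illustration, and how the package counts parameters under the spatial penalty is not stated here.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical grid results: (K, lam) -> (log-likelihood, parameter count)
fits = {
    (2, 0.5): (-5210.0, 12),
    (3, 0.5): (-5150.0, 18),
    (4, 0.5): (-5145.0, 24),
}
n_obs = 2000

scores = {key: bic(ll, p, n_obs) for key, (ll, p) in fits.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

In this made-up grid the move from K = 3 to K = 4 gains too little likelihood to pay for the extra parameters, so the smaller model is chosen.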

`simulate_scmoe(n, K, n_areas, n_rows, n_cols, lam_spatial, seed, ...)`

Generates synthetic portfolio data from a ground-truth K-class SC-MoE. Returns (X, y_freq, y_sev, area_ids, graph, truth).

Design notes

Why not PyTorch? All operations are ADMM linear systems and weighted MLEs, closed-form except for the Newton-Raphson update of the Gamma shape. For K up to 8 components and n up to 500k policyholders, numpy/scipy is faster and has no GPU dependency. The Laplacian linear system is sparse and well-conditioned, so spsolve handles it cleanly.

Why not Bayesian (PyMC/Stan)? SC-MoE is frequentist penalised likelihood. The spatial term is regularisation, not a prior. MCMC would be 10-100x slower for no gain given the MM-ADMM algorithm's convergence properties.

Gamma parameterisation: f(x; alpha, beta) = beta^alpha / Gamma(alpha) * x^(alpha-1) * exp(-beta*x). Mean = alpha/beta, Var = alpha/beta^2. Newton-Raphson on the profile log-likelihood gives fast, stable shape estimation.
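With this parameterisation, profiling out the rate gives beta = alpha / mean(x), and the shape solves log(alpha) - digamma(alpha) = log(mean(x)) - mean(log(x)). The sketch below applies Newton-Raphson to that equation with optional weights, using only the standard library. `digamma`, `trigamma` and `fit_gamma` are hypothetical helpers written for this example (`scipy.special` offers library versions of the first two), and the starting value is a common closed-form approximation rather than anything taken from the package.

```python
import math
import random

def digamma(x):
    """psi(x) via upward recurrence, then an asymptotic series for x >= 6."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) via upward recurrence, then an asymptotic series for x >= 6."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def fit_gamma(xs, ws=None, tol=1e-10, max_iter=50):
    """Weighted Gamma MLE (shape alpha, rate beta) by Newton on the profile score."""
    if ws is None:
        ws = [1.0] * len(xs)
    w_sum = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / w_sum
    mean_log = sum(w * math.log(x) for w, x in zip(ws, xs)) / w_sum
    s = math.log(mean_x) - mean_log          # positive unless all x are equal
    alpha = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        score = math.log(alpha) - digamma(alpha) - s
        new_alpha = alpha - score / (1.0 / alpha - trigamma(alpha))
        if new_alpha <= 0.0:                 # keep the iterate in the valid region
            new_alpha = 0.5 * alpha
        converged = abs(new_alpha - alpha) < tol * alpha
        alpha = new_alpha
        if converged:
            break
    return alpha, alpha / mean_x             # beta follows from the profile step

random.seed(0)
# random.gammavariate takes (shape, scale), so scale 0.5 means rate 2
xs = [random.gammavariate(3.0, 0.5) for _ in range(5000)]
alpha_hat, beta_hat = fit_gamma(xs)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```

Passing class responsibilities as `ws` turns the same routine into a weighted M-step for one expert.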

Label switching: Components are re-ordered by ascending lambda_k after fitting. This is not perfect — if two classes have identical rates, order is arbitrary — but it gives reproducible output for most practical cases.
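Reordering by ascending rate is a single permutation applied consistently to every per-class quantity. The sketch below shows this with NumPy on made-up values. `relabel_by_rate` is a hypothetical helper, not the package's internal routine.

```python
import numpy as np

def relabel_by_rate(lam, shape, rate, pi):
    """Permute classes so the Poisson rates are ascending.

    A stable sort keeps tied rates in their original order.
    """
    order = np.argsort(lam, kind="stable")
    return lam[order], shape[order], rate[order], pi[:, order]

lam = np.array([0.30, 0.05, 0.10])
shape = np.array([2.0, 3.0, 4.0])
rate = np.array([0.001, 0.004, 0.002])
pi = np.array([[0.7, 0.1, 0.2],
               [0.2, 0.5, 0.3]])

lam_s, shape_s, rate_s, pi_s = relabel_by_rate(lam, shape, rate, pi)
print(lam_s, shape_s, rate_s)
print(pi_s)
```

The columns of the membership matrix must move together with the expert parameters, otherwise the probabilities would refer to the wrong classes.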

Gating architecture: The ADMM operates at the area level (one alpha vector per postcode sector per class), averaging down to policy level for predictions. This means the spatial penalty acts on the area-level assignment, not on individual policies.

Reference

NAAJ 2025, DOI: 10.1080/10920277.2025.2567283.

LRMoE: Fung, Badescu, Lin (2019). GitHub.

Related packages

insurance-spatial: BYM2 spatial random effects in a GLM — territorial smoothing within a single risk model.
insurance-nested-glm: Neural embeddings + contiguity-constrained clustering for territory factor construction.
insurance-glm-cluster: Fused lasso clustering of factor levels within a GLM.

SC-MoE uniquely combines latent risk type discovery, joint frequency-severity modelling, and geographic continuity enforcement in a single penalised mixture model.

Licence

MIT

This version: 0.1.0 (Mar 13, 2026)
