Interpretable GAM toolkit for insurance pricing — EBM, Neural Additive Models, and Pairwise Interaction Networks

insurance-gam

Interpretable GAM toolkit for insurance pricing. Three modelling approaches, one package.

GLMs have been the industry standard for decades. They're interpretable, well-understood, and regulators like them. But they leave predictive power on the table — particularly on non-linear effects and interactions. This package gives pricing actuaries three production-grade alternatives that sit between a GLM and a black-box gradient booster: all interpretable, all exposure-aware, all tested against realistic insurance data.

Blog post: Your Model Is Either Interpretable or Accurate. insurance-gam Refuses That Trade-Off.

Quick Start

uv add "insurance-gam[ebm]"

import numpy as np
import polars as pl
from insurance_gam.ebm import InsuranceEBM, RelativitiesTable

rng = np.random.default_rng(42)
n = 2000

df = pl.DataFrame({
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "ncd_years":    rng.integers(0, 10, n).astype(float),  # 0-9; standard UK personal lines NCD scale is 0-5 but some products extend to 9
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
    "area":         rng.integers(0, 5, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    + 0.5 * (df["driver_age"].to_numpy() < 25).astype(float)   # young driver load
    - 0.12 * df["ncd_years"].to_numpy()                         # NCD discount
    + 0.3 * (df["vehicle_age"].to_numpy() > 10).astype(float)   # old vehicle load
)
y = rng.poisson(np.exp(log_rate) * exposure)

model = InsuranceEBM(loss="poisson", interactions="3x")
model.fit(df[:1600], y[:1600], exposure=exposure[:1600])

rt = RelativitiesTable(model)
# Per-feature relativities — readable table a pricing team can challenge factor by factor
print(rt.table("ncd_years"))
# shape_value  relativity
# 0.0          1.000
# 3.0          0.694
# 9.0          0.340
print(rt.summary())
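
The relativities above are easier to challenge once the arithmetic behind them is explicit. As a minimal sketch, assume a relativity is the exponential of a level's log-scale contribution minus that of a chosen base level. That is the usual convention for log-link models, but it is an assumption here, not a description of how RelativitiesTable is implemented. The helper name `relativities` is hypothetical, and the quick-start DGP's NCD coefficient (-0.12 per year) stands in for a fitted shape function:

```python
import math

def relativities(log_contributions, base_level):
    """Convert log-scale shape values into multiplicative relativities.

    log_contributions maps a factor level to its additive contribution on the
    log scale; base_level is the level whose relativity is pinned to 1.0.
    """
    base = log_contributions[base_level]
    return {level: math.exp(value - base) for level, value in log_contributions.items()}

# Hypothetical shape function: the true quick-start DGP, -0.12 per NCD year.
ncd_shape = {years: -0.12 * years for years in range(10)}
ncd_relativities = relativities(ncd_shape, base_level=0)

for years in (0, 3, 9):
    print(years, round(ncd_relativities[years], 3))
```

Under these assumptions the true relativities at 3 and 9 NCD years are about 0.698 and 0.340, so a fitted table on this DGP should land close to those values.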

What's inside

insurance_gam.ebm — Explainable Boosting Machine

Wraps interpretML's ExplainableBoostingRegressor with insurance-specific tooling: exposure-aware fit/predict, relativity table extraction, post-fit monotonicity enforcement, and GLM comparison tools. If you want the interpretability of a GLM with the predictive power of a gradient booster, start here.

Requires the [ebm] extra: uv add "insurance-gam[ebm]"

import numpy as np
import polars as pl
from insurance_gam.ebm import InsuranceEBM, RelativitiesTable

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "ncd_years":    rng.integers(0, 10, n).astype(float),
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
    "area":         rng.integers(0, 5, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
# Poisson frequency: base rate 0.08, higher for young drivers and old vehicles
log_rate = (
    -2.5
    + 0.03 * (25 - df["driver_age"].to_numpy()).clip(0, None)   # young driver load, decaying to zero at age 25
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure)

X_train, X_test = df[:800], df[800:]
y_train, y_test = y[:800], y[800:]
exp_train, exp_test = exposure[:800], exposure[800:]

model = InsuranceEBM(loss="poisson", interactions="3x")
model.fit(X_train, y_train, exposure=exp_train)

rt = RelativitiesTable(model)
print(rt.table("driver_age"))
print(rt.summary())
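
Post-fit monotonicity enforcement can be pictured as projecting a fitted shape function onto the nearest monotone curve. The sketch below uses the pool-adjacent-violators algorithm (weighted isotonic regression), which is one standard way to do that. Whether InsuranceEBM uses this method is not stated here; the function name and the example values are hypothetical.

```python
def pool_adjacent_violators(values, weights=None):
    """Return the non-decreasing sequence closest to `values` in weighted least squares."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the newest block violates monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, weight_b, count_b = blocks.pop()
            mean_a, weight_a, count_a = blocks.pop()
            total = weight_a + weight_b
            pooled = (mean_a * weight_a + mean_b * weight_b) / total
            blocks.append([pooled, total, count_a + count_b])
    # Expand the pooled blocks back to one value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical log-scale vehicle-age shape with a small dip at index 2.
raw_shape = [0.00, 0.05, 0.03, 0.08, 0.30]
monotone_shape = pool_adjacent_violators(raw_shape)
print(monotone_shape)
```

The dip is replaced by the average of the two offending points, and every other value is left alone. Passing exposure per bin as `weights` would keep sparsely populated bins from dominating the pooled values.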

insurance_gam.anam — Actuarial Neural Additive Model

Neural Additive Model (Laub, Pho, Wong 2025) adapted for insurance. One MLP subnetwork per feature, additive aggregation, Poisson/Tweedie/Gamma losses, and Dykstra-projected monotonicity constraints. Designed to improve on GLM deviance where effects are non-linear, while producing per-feature shape functions that a pricing team can actually inspect.

Requires the [neural] extra: uv add "insurance-gam[neural]"

import numpy as np
import polars as pl
from insurance_gam.anam import ANAM

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "ncd_years":    rng.integers(0, 10, n).astype(float),
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure).astype(float)

model = ANAM(
    loss="poisson",
    monotone_increasing=["vehicle_age"],  # driver_age is U-shaped for UK motor, not monotone
    n_epochs=100,
)
model.fit(df, y, sample_weight=exposure)

shapes = model.shape_functions()
shapes["vehicle_age"].plot()
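
To make "one subnetwork per feature, additive aggregation" concrete, here is a dependency-free sketch of the forward pass only. It is not the ANAM implementation: the tiny random networks, the helper names, and the choice to apply exposure as a multiplier on the rate are all assumptions for illustration. The example above passes exposure through sample_weight, and how that is used internally is not covered here.

```python
import math
import random

def make_subnetwork(rng, hidden_units=4):
    """Build a tiny one-hidden-layer MLP mapping a scalar feature to a scalar."""
    w1 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_units)]

    def forward(x):
        hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden))

    return forward

def predict_claims(subnetworks, bias, row, exposure):
    """Additive aggregation on the log scale, then scale by exposure."""
    contributions = {name: net(row[name]) for name, net in subnetworks.items()}
    log_rate = bias + sum(contributions.values())
    return exposure * math.exp(log_rate), contributions

rng = random.Random(0)
subnetworks = {name: make_subnetwork(rng) for name in ("driver_age", "vehicle_age")}
# Features are assumed to be pre-scaled to roughly [0, 1].
row = {"driver_age": 0.3, "vehicle_age": 0.8}
expected_claims, contributions = predict_claims(subnetworks, bias=-2.5, row=row, exposure=0.5)
print(expected_claims, contributions)
```

Because each feature's contribution is computed in isolation, plotting a subnetwork's output over a grid of inputs gives that feature's shape function directly.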

insurance_gam.pin — Pairwise Interaction Networks

Neural GA2M (Richman, Scognamiglio, Wüthrich 2025). The prediction decomposes as a sum of pairwise interaction terms — one shared network serving all feature pairs, differentiated by learned interaction tokens. Diagonal terms recover main effects. Captures interactions a GLM would miss while keeping the output interpretable as a sum of 2D shape functions.

Requires the [neural] extra: uv add "insurance-gam[neural]"

import numpy as np
import polars as pl
from insurance_gam.pin import PINModel

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "driver_age":  rng.integers(17, 75, n).astype(float),
    "vehicle_age": rng.integers(0, 15, n).astype(float),
    "area":        rng.integers(0, 5, n),
    "ncd_years":   rng.integers(0, 10, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure).astype(float)

model = PINModel(
    features={"driver_age": "continuous", "vehicle_age": "continuous", "area": 5, "ncd_years": "continuous"},
    loss="poisson",
    max_epochs=200,
)
model.fit(df, y, exposure=exposure)

# Inspect which feature pairs matter
weights = model.interaction_weights()

# Main effect curves — pass the training data as background
effects = model.main_effects(df)
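
The pairwise decomposition described above can be sketched without any neural network library. In this toy version a single shared function serves every feature pair and is steered by a per-pair token, and the diagonal pairs (a feature paired with itself) act as main effects. The shared function, tokens, and weights are hypothetical stand-ins, not the architecture from the paper or from PINModel.

```python
import math
from itertools import combinations_with_replacement

def shared_network(x_j, x_k, token):
    """Stand-in for the shared network: one function for all pairs, steered by the token."""
    scale, shift = token
    return math.tanh(scale * (x_j + x_k) + shift * x_j * x_k)

def pin_log_rate(row, tokens, weights, bias):
    """Sum the shared network over all pairs j <= k, including diagonal terms."""
    terms = {}
    for j, k in combinations_with_replacement(sorted(row), 2):
        terms[(j, k)] = weights[(j, k)] * shared_network(row[j], row[k], tokens[(j, k)])
    return bias + sum(terms.values()), terms

features = ["driver_age", "ncd_years", "vehicle_age"]
pairs = list(combinations_with_replacement(sorted(features), 2))
# Hypothetical learned parameters: one token and one output weight per pair.
tokens = {pair: (0.1 * (i + 1), 0.05 * i) for i, pair in enumerate(pairs)}
weights = {pair: 1.0 / len(pairs) for pair in pairs}

# Features are assumed to be pre-scaled to roughly [0, 1].
row = {"driver_age": 0.4, "ncd_years": 0.2, "vehicle_age": 0.9}
log_rate, terms = pin_log_rate(row, tokens, weights, bias=-2.5)
print(len(terms), log_rate)
```

With three features there are six terms: three diagonal main effects and three off-diagonal interactions. Each term depends on at most two features, which is what keeps the output readable as a sum of 2D shape functions.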

Installation

uv add insurance-gam

With neural subpackages (requires PyTorch):

uv add "insurance-gam[neural]"

With EBM subpackage (requires interpretML):

uv add "insurance-gam[ebm]"

Everything:

uv add "insurance-gam[all]"

Design rationale

The three subpackages are independent by design. Importing insurance_gam.ebm does not load PyTorch. Importing insurance_gam.anam does not load interpretML. This matters in production environments where you might have one modelling platform that has interpretML but not PyTorch, or vice versa.

The subpackages share the same conceptual framework — exposure-aware GLM-family losses, per-feature shape functions, monotonicity constraints — but are otherwise isolated. Pick the one that fits your data, compute budget, and regulatory constraints.
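
Since the subpackages depend on different optional backends, it can help to check which ones an environment supports before importing anything. A minimal sketch using only the standard library follows. The helper name `available_extras` is hypothetical; the import names `interpret` (interpretML) and `torch` (PyTorch) are those libraries' own.

```python
import importlib.util

# Import names of the optional backends: interpretML installs as `interpret`,
# PyTorch as `torch`.
OPTIONAL_BACKENDS = {"ebm": "interpret", "neural": "torch"}

def available_extras():
    """Report which optional backends can be imported, without importing them."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in OPTIONAL_BACKENDS.items()
    }

print(available_extras())
```

`find_spec` only locates the module, so the check itself does not pay the import cost of either backend.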

Repository structure

src/insurance_gam/
├── ebm/     # interpretML EBM wrapper
├── anam/    # Neural Additive Model
└── pin/     # Pairwise Interaction Networks

tests/
├── ebm/
├── anam/
└── pin/

Source repos

This package consolidates three previously separate libraries:

  • insurance-ebm — archived, merged into insurance_gam.ebm
  • insurance-anam — archived, merged into insurance_gam.anam
  • insurance-pin — archived, merged into insurance_gam.pin

Benchmark results

Benchmarked on Databricks serverless (Free Edition), 2026-03-22. Full runnable script: benchmarks/run_benchmark_databricks.py.

Setup: 10,000 synthetic UK motor policies (75/25 train/test). DGP has four non-linear effects a standard GLM cannot fully represent with linear terms: U-shaped driver age hazard (young and old both riskier), exponential NCD discount, hard threshold at vehicle age 8, and log-miles loading. Baseline is a sklearn PoissonRegressor with linear + quadratic driver age terms — a competent, fairly specified GLM, not a strawman.
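
For readers who want a feel for the four effects before opening the benchmark script, here is a sketch of a log-rate function with the same qualitative structure. Every coefficient and functional form below is an illustrative assumption, not a value taken from benchmarks/run_benchmark_databricks.py.

```python
import math

def log_claim_rate(driver_age, ncd_years, vehicle_age, annual_miles):
    """Log claim frequency with the four non-linear effects described above.

    All coefficients are illustrative assumptions, not the benchmark's values.
    """
    u_shaped_age = 0.0008 * (driver_age - 45.0) ** 2           # young and old both riskier
    ncd_discount = -0.5 * (1.0 - math.exp(-0.4 * ncd_years))   # steep early, flattening later
    old_vehicle = 0.3 if vehicle_age > 8 else 0.0              # hard threshold at vehicle age 8
    miles_loading = 0.25 * math.log(annual_miles / 10_000)     # log-miles loading
    return -2.5 + u_shaped_age + ncd_discount + old_vehicle + miles_loading

for age in (18, 45, 80):
    rate = math.exp(log_claim_rate(age, ncd_years=3, vehicle_age=5, annual_miles=10_000))
    print(age, round(rate, 4))
```

A GLM with linear and quadratic age terms can match the U-shape in this sketch, but it has no term for the step at vehicle age 8 or the saturating NCD curve.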

| Model | Poisson deviance | Gini | Deviance gap from oracle |
|---|---|---|---|
| Oracle (true DGP) | 0.2508 | -0.460 | 0 |
| Poisson GLM (linear+quad) | 0.2528 | -0.455 | 0.002 |
| InsuranceEBM (interactions=3x) | see note | -0.329 | see note |

Deviance caveat: EBM exposure handling via offsets can introduce a calibration scale error on some DGPs, producing inflated deviance figures without affecting the shape functions or risk ordering. The Gini is not affected by this and is the reliable comparison. We are tracking this as a known issue.
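
The claim that a scale error inflates deviance while leaving the Gini untouched can be checked on a toy example. The sketch below assumes the calibration error is a constant multiplier on every prediction, and uses made-up data. Gini sign and normalisation conventions vary: this version sorts the highest predicted risk first, so a better ranking gives a larger positive value, which may not match the convention in the benchmark table.

```python
import math

def poisson_deviance(actual, predicted):
    """Mean Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    total = 0.0
    for y, mu in zip(actual, predicted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(actual)

def gini(actual, predicted):
    """Gini from the Lorenz curve of actual claims, policies sorted by prediction (descending)."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    total = sum(actual)
    cumulative, area = 0.0, 0.0
    for i in order:
        previous = cumulative
        cumulative += actual[i] / total
        area += (previous + cumulative) / 2.0 / len(actual)  # trapezoid rule
    return 2.0 * area - 1.0

# Made-up claim counts and predictions that are calibrated in total (both sum to 7).
actual = [0, 0, 1, 0, 2, 0, 1, 3]
predicted = [0.1, 0.2, 0.8, 0.3, 1.6, 0.4, 1.2, 2.4]
inflated = [2.0 * mu for mu in predicted]  # hypothetical calibration scale error

print(poisson_deviance(actual, predicted), poisson_deviance(actual, inflated))
print(gini(actual, predicted), gini(actual, inflated))
```

Doubling every prediction raises the deviance but leaves the ordering, and therefore the Gini, unchanged.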

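Gini gap: the EBM's Gini (-0.329) and the GLM's (-0.455) differ by ~28% in relative terms ((0.455 - 0.329) / 0.455 ≈ 0.28). The benchmark reads this gap as the EBM ranking risks better, concentrating more actual claims among the policies it identifies as high-risk on the Lorenz curve. That reading depends on the sign convention: the oracle scores -0.460 in the same table, and the true DGP should give the best attainable ordering, so confirm the Gini orientation in the benchmark script before relying on the comparison. For an underwriting score or a reinsurance pricing model, risk ordering is the operative metric.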

Where EBM wins: The shape functions for driver age and NCD years are qualitatively more accurate than the GLM's linear + quadratic approximation. The U-shape at both ends of the age distribution and the convex NCD discount curve are recovered without any feature engineering.

Where GLM is competitive: On a correctly specified DGP where a quadratic term captures the main non-linearity, the GLM's deviance is essentially at oracle (0.2528 against 0.2508 here). If your factors are well understood and your transformations are right, a GLM is hard to beat on deviance alone.

When to use InsuranceEBM:

  • When you need the shape functions themselves — the relativities table output is directly auditable by a pricing actuary without post-hoc SHAP
  • When rating factors have confirmed non-linear structure that polynomial terms cannot capture (test with P-splines or MARS first)
  • When risk ordering (Gini) matters more than calibrated counts — reinsurance pricing, underwriting scores, portfolio selection

When NOT to use:

  • When Poisson deviance is the primary production metric and the GLM is already well-specified
  • When exposure calibration accuracy is critical (price-to-burn applications) — validate the init_score exposure handling on your DGP before production use
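
One way to carry out the exposure validation suggested above is to compare actual and predicted claim counts within exposure bands on hold-out data. The sketch below is a hypothetical helper with made-up numbers; it assumes the predicted counts already include exposure and that the last band edge covers the largest exposure in the data.

```python
def calibration_by_band(actual, predicted, exposure, band_edges):
    """Ratio of actual to predicted claim counts within each exposure band."""
    totals = {}
    for y, mu, e in zip(actual, predicted, exposure):
        # Assign the policy to the first band whose upper edge covers its exposure.
        band = next(edge for edge in band_edges if e <= edge)
        observed, expected = totals.get(band, (0.0, 0.0))
        totals[band] = (observed + y, expected + mu)
    return {band: observed / expected for band, (observed, expected) in sorted(totals.items())}

# Made-up hold-out data, chosen only to show the mechanics.
actual = [0, 1, 0, 1, 2, 1]
predicted = [0.2, 0.3, 0.5, 0.6, 1.0, 0.9]
exposure = [0.25, 0.30, 0.55, 0.60, 1.00, 0.95]

ratios = calibration_by_band(actual, predicted, exposure, band_edges=[0.5, 0.75, 1.0])
print(ratios)
```

On a real hold-out set, ratios that sit far from 1.0 or drift steadily with exposure would point to a problem in how exposure enters the model. With six policies the ratios here are noise.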

Performance

Fit times on Databricks serverless (single-node, no GPU): GLM <1s, EBM 60-120s. The EBM is single-threaded in the boosting loop. The fit time cost is a one-off; at scoring time both models are fast.

See benchmarks/run_benchmark_databricks.py for the full benchmark with calibration tables.

Databricks Notebook

A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.

References

  • Laub, Pho, Wong (2025). "An Interpretable Deep Learning Model for General Insurance Pricing." arXiv:2509.08467.
  • Richman, Scognamiglio, Wüthrich (2025). "Tree-like Pairwise Interaction Networks." arXiv:2508.15678.
  • Lou, Caruana, Gehrke, Hooker (2013). "Accurate intelligible models with pairwise interactions." KDD.

Related Libraries

| Library | What it does |
|---|---|
| insurance-glm-tools | GLM tooling including R2VF factor merging; combines naturally with GAM shape functions for the rating factor pipeline |
| insurance-distributional-glm | GAMLSS; extends GAMs to model dispersion and shape parameters as smooth functions of covariates |
| insurance-interactions | GLM interaction detection; identifies where the additive GAM structure needs interaction terms |
