Interpretable GAM toolkit for insurance pricing — EBM, Neural Additive Models, and Pairwise Interaction Networks

insurance-gam

Interpretable GAM toolkit for insurance pricing. Three modelling approaches, one package.

GLMs have been the industry standard for decades. They're interpretable, well understood, and regulators like them. But they leave predictive power on the table, particularly on non-linear effects and interactions. This package gives pricing actuaries three production-grade alternatives that sit between a GLM and a black-box gradient booster: all interpretable, all exposure-aware, all tested against realistic insurance data.

What's inside

insurance_gam.ebm — Explainable Boosting Machine

Wraps InterpretML's ExplainableBoostingRegressor with insurance-specific tooling: exposure-aware fit/predict, relativity table extraction, post-fit monotonicity enforcement, and GLM comparison tools. If you want the interpretability of a GLM with predictive power closer to that of a gradient booster, start here.

Requires the [ebm] extra: pip install "insurance-gam[ebm]"

import numpy as np
import polars as pl
from insurance_gam.ebm import InsuranceEBM, RelativitiesTable

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "ncd_years":    rng.integers(0, 10, n).astype(float),
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
    "area":         rng.integers(0, 5, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
# Poisson frequency: base rate exp(-2.5) ≈ 0.08, rising for each year a driver
# is under 25, falling with NCD years, and higher for vehicles over 8 years old
log_rate = (
    -2.5
    + 0.03 * (25 - df["driver_age"].to_numpy()).clip(0, None)
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure)

X_train, X_test = df[:800], df[800:]
y_train, y_test = y[:800], y[800:]
exp_train, exp_test = exposure[:800], exposure[800:]

model = InsuranceEBM(loss="poisson", interactions="3x")
model.fit(X_train, y_train, exposure=exp_train)

rt = RelativitiesTable(model)
print(rt.table("driver_age"))
print(rt.summary())
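The idea behind a relativity table can be sketched independently of the package. Under a log link, each feature contributes an additive score on the log scale, and exponentiating the difference from a chosen base level gives a multiplicative relativity. The function name, the NCD bands, and the score values below are hypothetical and chosen for illustration; this is not how RelativitiesTable is implemented.

```python
import math

def to_relativities(log_scores, base_level):
    """Turn additive log-scale scores into multiplicative relativities.

    The base level is rescaled to exactly 1.0; every other level is
    expressed relative to it.
    """
    base = log_scores[base_level]
    return {level: math.exp(score - base) for level, score in log_scores.items()}

# Hypothetical shape-function values (log scale) for three NCD bands
ncd_scores = {"0": 0.10, "1-4": 0.00, "5+": -0.20}

rel = to_relativities(ncd_scores, base_level="1-4")
for level, value in rel.items():
    print(f"{level:>4}: {value:.3f}")
```

Choosing a different base level rescales every relativity by the same constant, so the ratios between levels are unchanged.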

insurance_gam.anam — Actuarial Neural Additive Model

Neural Additive Model (Laub, Pho, Wong 2025) adapted for insurance. One MLP subnetwork per feature, additive aggregation, Poisson/Tweedie/Gamma losses, and Dykstra-projected monotonicity constraints. Beats GLMs on deviance metrics while producing per-feature shape functions that a pricing team can actually inspect.

Requires the [neural] extra: pip install "insurance-gam[neural]"

import numpy as np
import polars as pl
from insurance_gam.anam import ANAM

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "ncd_years":    rng.integers(0, 10, n).astype(float),
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure).astype(float)

model = ANAM(
    loss="poisson",
    monotone_increasing=["vehicle_age", "driver_age"],
    n_epochs=100,
)
model.fit(df, y, sample_weight=exposure)

shapes = model.shape_functions()
shapes["vehicle_age"].plot()
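To give a feel for what a monotonicity projection does to a shape function, the sketch below projects a sequence of shape values onto the set of non-decreasing sequences. It deliberately swaps in the simpler pool-adjacent-violators algorithm rather than the Dykstra projection the package describes, and the function name and input values are hypothetical. It should be read as an illustration of the concept, not as the ANAM implementation.

```python
def project_non_decreasing(values):
    """Least-squares projection onto non-decreasing sequences (PAVA)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Pool adjacent blocks while the earlier mean exceeds the later mean
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    projected = []
    for total, count in blocks:
        projected.extend([total / count] * count)
    return projected

# Hypothetical shape values evaluated on an increasing grid of vehicle ages
shape = [0.0, 0.3, 0.2, 0.5, 0.4, 0.9]
projected = project_non_decreasing(shape)
print(projected)
```

Each violating pair is replaced by its average, so the projected curve keeps the same overall level while removing the local dips.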

insurance_gam.pin — Pairwise Interaction Networks

Neural GA2M (Richman, Scognamiglio, Wüthrich 2025). The prediction decomposes as a sum of pairwise interaction terms — one shared network serving all feature pairs, differentiated by learned interaction tokens. Diagonal terms recover main effects. Captures interactions a GLM would miss while keeping the output interpretable as a sum of 2D shape functions.

Requires the [neural] extra: pip install "insurance-gam[neural]"

import numpy as np
import polars as pl
from insurance_gam.pin import PINModel

rng = np.random.default_rng(42)
n = 1000

df = pl.DataFrame({
    "driver_age":  rng.integers(17, 75, n).astype(float),
    "vehicle_age": rng.integers(0, 15, n).astype(float),
    "area":        rng.integers(0, 5, n),
    "ncd_years":   rng.integers(0, 10, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    - 0.02 * df["ncd_years"].to_numpy()
    + 0.04 * (df["vehicle_age"].to_numpy() > 8).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure).astype(float)

model = PINModel(
    features={"driver_age": "continuous", "vehicle_age": "continuous", "area": 5, "ncd_years": "continuous"},
    loss="poisson",
    max_epochs=200,
)
model.fit(df, y, exposure=exposure)

# Inspect which feature pairs matter
weights = model.interaction_weights()

# Main effect curves — pass the training data as background
effects = model.main_effects(df)
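The pairwise decomposition described above can be illustrated without any neural network. In the sketch below, a toy function stands in for the shared network, and the linear predictor is the sum of one term per unordered feature pair, with the diagonal pairs playing the role of main effects. All names, coefficients, and the example policy are hypothetical; the real model learns these terms and is not reproduced here.

```python
import math
from itertools import combinations_with_replacement

def toy_pair_term(name_j, name_k, x_j, x_k):
    """Hypothetical stand-in for the shared network: a small bilinear score."""
    scale = 0.001 if name_j == name_k else 0.0001
    return scale * x_j * x_k

def pairwise_predictor(features, pair_term, bias):
    """Sum one term per unordered feature pair, diagonal pairs included."""
    names = sorted(features)
    terms = {
        (j, k): pair_term(j, k, features[j], features[k])
        for j, k in combinations_with_replacement(names, 2)
    }
    return bias + sum(terms.values()), terms

policy = {"driver_age": 30.0, "vehicle_age": 5.0, "ncd_years": 3.0}
eta, terms = pairwise_predictor(policy, toy_pair_term, bias=-2.5)

# Expected claim count for half a year of exposure under a log link
expected_claims = 0.5 * math.exp(eta)
print(len(terms), round(expected_claims, 4))
```

Because every term depends on at most two features, each one can be plotted as a curve (diagonal pairs) or a surface (off-diagonal pairs).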

Installation

pip install insurance-gam

With neural subpackages (requires PyTorch):

pip install "insurance-gam[neural]"

With EBM subpackage (requires interpretML):

pip install "insurance-gam[ebm]"

Everything:

pip install "insurance-gam[all]"

Design rationale

The three subpackages are independent by design. Importing insurance_gam.ebm does not load PyTorch. Importing insurance_gam.anam does not load interpretML. This matters in production environments where you might have one modelling platform that has interpretML but not PyTorch, or vice versa.

The subpackages share the same conceptual framework — exposure-aware GLM-family losses, per-feature shape functions, monotonicity constraints — but are otherwise isolated. Pick the one that fits your data, compute budget, and regulatory constraints.
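One way to check which subpackages a given environment can support, without importing any heavy dependency, is to look up the backend modules with the standard library. This is a minimal sketch: the helper name is hypothetical, and it assumes the extras map to the import names `interpret` (InterpretML) and `torch` (PyTorch).

```python
from importlib.util import find_spec

def available_backends():
    """Report which optional backends are importable, without importing them."""
    return {
        "ebm": find_spec("interpret") is not None,   # InterpretML
        "neural": find_spec("torch") is not None,    # PyTorch
    }

backends = available_backends()
for name, present in backends.items():
    print(f"{name}: {'available' if present else 'missing'}")
```

A check like this can run at deployment time to fail fast with a clear message instead of an ImportError deep inside a pricing job.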

Repository structure

src/insurance_gam/
├── ebm/     # interpretML EBM wrapper
├── anam/    # Neural Additive Model
└── pin/     # Pairwise Interaction Networks

tests/
├── ebm/     # 136 tests
├── anam/    # 151 tests
└── pin/     # 136 tests

Source repos

This package consolidates three previously separate libraries:

  • insurance-ebm — archived, merged into insurance_gam.ebm
  • insurance-anam — archived, merged into insurance_gam.anam
  • insurance-pin — archived, merged into insurance_gam.pin

Performance

Benchmarked against Poisson GLM (statsmodels, main effects only) and CatBoost Poisson GBM on synthetic UK motor data — 50,000 policies, known DGP, temporal train/test split. Full notebook: notebooks/benchmark.py.

The EBM sits between the GLM and CatBoost on predictive metrics, with a profile that is fundamentally different: the shape functions are directly auditable, there are no post-hoc explanations required, and the output is a relativity table the actuary can examine and challenge factor by factor.

| Metric | Poisson GLM | EBM (insurance-gam) | CatBoost GBM |
|---|---|---|---|
| Poisson deviance | highest | between GLM and GBM | lowest |
| Gini coefficient | lowest | between GLM and GBM | highest |
| Interpretability | full (coefficients) | full (shape functions) | requires post-hoc SHAP |
| Auditability for FCA | straightforward | straightforward | requires explanation layer |

The benchmark measures Poisson deviance, Gini, and double-lift chart on the held-out test set. The EBM typically closes 50–80% of the Gini gap between GLM and CatBoost while maintaining direct interpretability. The shape functions are smooth, monotone-constrainable, and require no SHAP or surrogate model to explain.
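For readers who want to reproduce the headline numbers on their own data, the sketch below shows the standard mean Poisson deviance and the arithmetic behind "share of the Gini gap closed". The function names are hypothetical, and the inputs are made-up illustrative values, not results from the benchmark notebook.

```python
import math

def poisson_deviance(y, mu):
    """Mean Poisson deviance; mu is the expected count (rate times exposure)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # The y * log(y / mu) term is taken as 0 when y == 0
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (log_term - (yi - mi))
    return total / len(y)

def gini_gap_closed(gini_glm, gini_model, gini_gbm):
    """Fraction of the GLM-to-GBM Gini gap recovered by the candidate model."""
    return (gini_model - gini_glm) / (gini_gbm - gini_glm)

# Made-up claim counts and expected counts for four policies
claims = [0, 1, 0, 2]
expected = [0.1, 0.4, 0.05, 0.9]
print(f"mean deviance: {poisson_deviance(claims, expected):.4f}")

# Made-up Gini values for illustration only
share = gini_gap_closed(gini_glm=0.30, gini_model=0.36, gini_gbm=0.40)
print(f"gap closed: {share:.0%}")
```

Lower deviance is better and higher Gini is better, which is why the table ranks the models in opposite directions on the two metrics.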

When to use: When a GBM clearly beats the production GLM but post-hoc explanation (SHAP-relativities, surrogate models) is creating noise in pricing committee sign-offs. The EBM offers comparable or better predictive performance than a GLM with hand-crafted interactions, with a shape function per feature rather than a coefficient per dummy level.

When NOT to use: When the portfolio has strong multiplicative interactions between rating factors that an additive model cannot capture. The EBM handles pairwise interactions via interaction terms, but the hierarchy is still additive and cannot represent three-way interactions without explicit specification.

References

  • Laub, Pho, Wong (2025). "An Interpretable Deep Learning Model for General Insurance Pricing." arXiv:2509.08467.
  • Richman, Scognamiglio, Wüthrich (2025). "Tree-like Pairwise Interaction Networks." arXiv:2508.15678.
  • Lou, Caruana, Gehrke, Hooker (2013). "Accurate intelligible models with pairwise interactions." KDD.

Related Libraries

| Library | What it does |
|---|---|
| insurance-glm-tools | GLM tooling including R2VF factor merging; combines naturally with GAM shape functions for the rating factor pipeline |
| insurance-distributional-glm | GAMLSS: extends GAMs to model dispersion and shape parameters as smooth functions of covariates |
| insurance-interactions | GLM interaction detection: identify where the additive GAM structure needs interaction terms |
