Double GLM (DGLM) for joint modelling of mean and dispersion in insurance pricing

insurance-dispersion

Double GLM (DGLM) for joint modelling of mean and dispersion in non-life insurance pricing.

The problem

Standard GLMs assume a single scalar dispersion parameter phi shared across all observations. For a Gamma severity model, that means your fleet broker policy and your personal lines online policy are assumed to have identical volatility around the fitted mean. That assumption is almost always wrong.

Dispersion varies systematically with the same risk factors that drive the mean — and often with different factors entirely. Broker-sourced business tends to be more volatile (larger phi) because brokers aggregate heterogeneous risks. Fleet accounts have more predictable frequencies. High-limit policies show fat-tailed severity that the Gamma captures poorly with a flat dispersion assumption.

The Double GLM (Smyth 1989) solves this by adding a second regression model for phi:

Mean submodel:        g(mu_i)  = x_i^T beta    [standard GLM]
Dispersion submodel:  h(phi_i) = z_i^T alpha   [new: each obs gets its own phi]

Var[Y_i] = phi_i * V(mu_i)
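
As a quick numerical check of the variance identity above, the sketch below simulates Gamma responses for two channels that share a mean but differ in dispersion. It assumes the usual Gamma parameterisation (shape = 1/phi, scale = mu * phi); the channel values are invented for illustration and the code does not use the library itself.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3000.0                               # common mean (illustrative)
phi = {"direct": 0.4, "broker": 0.8}      # hypothetical dispersion by channel

empirical_var = {}
for channel, phi_c in phi.items():
    # Gamma with mean mu and variance phi * mu^2: shape = 1/phi, scale = mu * phi
    y = rng.gamma(shape=1.0 / phi_c, scale=mu * phi_c, size=200_000)
    empirical_var[channel] = y.var()
    print(channel, "theory:", phi_c * mu**2, "simulated:", round(empirical_var[channel]))
```

Doubling phi doubles the variance while leaving the mean untouched, which is exactly what a single shared phi cannot represent.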

This matters for:

Risk-differentiated pricing: your pure premium estimate is mu_i, but quoting confidence depends on phi_i
Reinsurance pricing: the tail risk on a policy is driven by both mu_i and phi_i
Model validation: a well-specified mean model with poor dispersion fit still mispredicts volatility
Credibility: low-phi risks can be priced more confidently than high-phi risks
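
To make the tail-risk point concrete, here is a small sketch comparing two risks with the same expected severity but different dispersion, using scipy's Gamma distribution. The values of mu and phi are hypothetical, and the Gamma parameterisation (shape = 1/phi, scale = mu * phi) is an assumption of the example.

```python
import numpy as np
from scipy import stats

mu = 3000.0  # same expected severity for both risks (illustrative)
summary = {}
for label, phi in [("low_phi", 0.3), ("high_phi", 1.2)]:
    dist = stats.gamma(a=1.0 / phi, scale=mu * phi)   # mean mu, variance phi * mu^2
    summary[label] = {
        "mean": dist.mean(),
        "cv": np.sqrt(phi),        # Gamma coefficient of variation is sqrt(phi)
        "q99": dist.ppf(0.99),     # 99th percentile of a single claim
    }
    print(label, {k: round(float(v), 2) for k, v in summary[label].items()})
```

Both risks carry the same pure premium, but the high-phi risk has a much larger 99th percentile, so any loading that depends on the tail should differ between them.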

Installation

pip install insurance-dispersion

Or from source:

git clone https://github.com/burning-cost/insurance-dispersion
cd insurance-dispersion
uv pip install -e .

Quick start

import numpy as np
import pandas as pd
from insurance_dispersion import DGLM
import insurance_dispersion.families as fam

# Synthetic claim severity data
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "vehicle_class":  rng.choice(["A", "B", "C"], size=n),
    "age_band":       rng.choice(["17-24", "25-35", "36-60"], size=n),
    "vehicle_value":  rng.uniform(5000, 40000, size=n),
    "channel":        rng.choice(["direct", "broker"], size=n),
    "limit_band":     rng.choice(["50k", "100k", "250k"], size=n),
    "earned_premium": rng.uniform(0.5, 1.0, size=n),
})
df["claim_amount"] = rng.gamma(shape=2.0, scale=1500.0, size=n)

# Fit a Gamma DGLM for claim severity
# Mean model: severity depends on vehicle class and age band
# Dispersion model: volatility depends on distribution channel and limit band
model = DGLM(
    formula="claim_amount ~ C(vehicle_class) + C(age_band) + log(vehicle_value)",
    dformula="~ C(channel) + C(limit_band)",
    family=fam.Gamma(),
    data=df,
    exposure="earned_premium",  # log-offset in mean only
    method="reml",              # REML correction (recommended)
)

result = model.fit()
print(result.summary())

Example output (illustrative; exact values depend on the data):

Double GLM (DGLM) Results
============================================================
Family:      Gamma(link='log')
Method:      REML
Observations:500
Converged:   True (after 8 iterations)
Log-lik:     -4182.3521
AIC:         8398.7042

Mean Submodel Coefficients:
------------------------------------------------------------
                            coef  exp_coef    se       z  p_value
Intercept               2.1543    8.6224  0.0321  67.12    0.0000
C(vehicle_class)[T.B]   0.1823    1.1999  0.0211   8.64    0.0000
...

Dispersion Submodel Coefficients:
------------------------------------------------------------
                          coef  exp_coef    se       z  p_value
Intercept             -0.8234    0.4390  0.0412 -19.99    0.0000
C(channel)[T.broker]   0.6112    1.8426  0.0518  11.80    0.0000
...

Factor tables

# Mean relativities: exp(beta) for each level vs. base
mean_rel = result.mean_relativities()
print(mean_rel[["exp_coef", "se", "p_value"]])

# Dispersion relativities: exp(alpha)
# Broker channel has 1.84x the dispersion of direct channel
disp_rel = result.dispersion_relativities()
print(disp_rel[["exp_coef", "se", "p_value"]])
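
If you want interval estimates alongside the relativities, a Wald interval on the log scale is one simple option. The sketch below builds it by hand from the broker coefficient and standard error shown in the example summary; whether the library exposes intervals directly is not assumed here, and the column names lower_95 and upper_95 are made up for the example.

```python
import numpy as np
import pandas as pd

# Coefficient and standard error copied from the example summary output
coefs = pd.DataFrame(
    {"coef": [0.6112], "se": [0.0518]},
    index=["C(channel)[T.broker]"],
)

z = 1.96  # approximate 95% normal quantile
rel = pd.DataFrame({
    "exp_coef": np.exp(coefs["coef"]),
    "lower_95": np.exp(coefs["coef"] - z * coefs["se"]),
    "upper_95": np.exp(coefs["coef"] + z * coefs["se"]),
})
print(rel.round(4))
```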

Predictions

new_risk = pd.DataFrame({
    "vehicle_class": ["A", "B"],
    "age_band": ["25-35", "17-24"],
    "vehicle_value": [15000, 8000],
    "channel": ["direct", "broker"],
    "limit_band": ["100k", "50k"],
    "earned_premium": [1.0, 1.0],
})

# Expected severity
mu_pred = result.predict(new_risk, which="mean")

# Observation-level dispersion
phi_pred = result.predict(new_risk, which="dispersion")

# Predicted variance = phi_i * V(mu_i) = phi_i * mu_i^2 (Gamma)
var_pred = result.predict(new_risk, which="variance")
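
The three prediction types are linked by simple arithmetic. The sketch below reproduces that arithmetic by hand for a Gamma model with log links on both submodels; the coefficients and design rows are hypothetical, not fitted values, and the code does not call the library.

```python
import numpy as np

# Hypothetical coefficients on the log scale (not fitted values)
beta = np.array([7.5, 0.18])      # mean: intercept, vehicle_class B
alpha = np.array([-0.82, 0.61])   # dispersion: intercept, broker channel

X = np.array([[1.0, 0.0], [1.0, 1.0]])   # mean design rows for two risks
Z = np.array([[1.0, 0.0], [1.0, 1.0]])   # dispersion design rows
exposure = np.array([1.0, 0.5])

mu = exposure * np.exp(X @ beta)   # log link with log(exposure) as offset
phi = np.exp(Z @ alpha)            # log link, no offset
variance = phi * mu**2             # Gamma: V(mu) = mu^2

print(mu, phi, variance)
```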

Overdispersion test

# Likelihood ratio test: constant phi vs. phi = f(channel, limit_band)
test = result.overdispersion_test()
print(f"LRT statistic: {test['statistic']:.2f}")
print(f"df: {test['df']}")
print(f"p-value: {test['p_value']:.4f}")
print(test["conclusion"])
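
For readers who want to see what sits behind such a test, here is a generic likelihood ratio test sketch using scipy. The log-likelihood values are hypothetical and the helper name likelihood_ratio_test is not part of the library; the degrees of freedom equal the number of dispersion parameters added beyond the intercept.

```python
from scipy import stats

def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LRT of a nested null model against a fuller model."""
    statistic = 2.0 * (loglik_full - loglik_null)
    p_value = stats.chi2.sf(statistic, df)   # upper tail of chi-squared(df)
    return {"statistic": statistic, "df": df, "p_value": p_value}

# Hypothetical log-likelihoods: constant phi vs. phi varying by channel and limit band.
# channel (2 levels) + limit_band (3 levels) adds 1 + 2 = 3 dispersion parameters.
lrt = likelihood_ratio_test(loglik_null=-4200.0, loglik_full=-4182.0, df=3)
print(lrt)
```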

Diagnostics

import matplotlib.pyplot as plt
from insurance_dispersion import diagnostics

# Residuals
pearson_r = diagnostics.pearson_residuals(result)
deviance_r = diagnostics.deviance_residuals(result)
qr = diagnostics.quantile_residuals(result)  # ~ N(0,1) under true model

# QQ plot data
qq = diagnostics.qq_plot_data(result)
plt.figure()
plt.scatter(qq["theoretical"], qq["observed"], alpha=0.3, s=10)
plt.plot([-3, 3], [-3, 3], "r--")  # reference line y = x
plt.xlabel("N(0,1) quantiles")
plt.ylabel("Observed quantile residuals")

# Dispersion diagnostic (separate figure so it does not draw over the QQ plot)
diag = diagnostics.dispersion_diagnostic(result)
plt.figure()
plt.scatter(diag["fitted_phi"], diag["scaled_deviance"], alpha=0.2, s=8)
plt.axhline(1.0, color="red", linestyle="--")  # E[delta_i] = 1 under model
plt.xlabel("Fitted phi")
plt.ylabel("Scaled unit deviance")
plt.show()
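
To show why quantile residuals should look standard normal, the sketch below computes them by hand for a Gamma response: push each observation through its fitted CDF, then through the inverse normal CDF. The function gamma_quantile_residuals is a hypothetical stand-in written for this example, not the library's implementation, and the data are simulated from the true model.

```python
import numpy as np
from scipy import stats

def gamma_quantile_residuals(y, mu, phi):
    """Quantile residuals for a Gamma response with mean mu and dispersion phi."""
    u = stats.gamma.cdf(y, a=1.0 / phi, scale=mu * phi)   # probability integral transform
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                    # keep norm.ppf finite
    return stats.norm.ppf(u)

rng = np.random.default_rng(1)
n = 20_000
mu = rng.uniform(1000, 5000, size=n)
phi = rng.choice([0.4, 0.9], size=n)
y = rng.gamma(shape=1.0 / phi, scale=mu * phi)

r = gamma_quantile_residuals(y, mu, phi)
print(round(r.mean(), 3), round(r.std(), 3))
```

When the mean is right but phi is misspecified, these residuals keep a mean near zero while their spread drifts away from 1 in the affected segments.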

Supported families

Family	Default link	Use case
`Gamma()`	log	Claim severity
`InverseGaussian()`	log	Heavy-tail severity
`Tweedie(p=1.5)`	log	Pure premium (compound Poisson-Gamma)
`Gaussian()`	identity	Reserve amounts, Gaussian responses
`Poisson()`	log	Claim frequency (extra-Poisson variation)
`NegativeBinomial(alpha=1.0)`	log	Overdispersed frequency
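
Each family differs in its unit variance function V(mu), which enters Var[Y_i] = phi_i * V(mu_i). The sketch below lists the textbook forms; the dictionary is illustrative only, and the NB2 form mu + alpha * mu^2 for the negative binomial is an assumption about how alpha is parameterised.

```python
import numpy as np

# Textbook unit variance functions V(mu); Var[Y_i] = phi_i * V(mu_i)
VARIANCE_FUNCTIONS = {
    "Gaussian": lambda mu: np.ones_like(mu),
    "Poisson": lambda mu: mu,
    "Gamma": lambda mu: mu**2,
    "InverseGaussian": lambda mu: mu**3,
    "Tweedie(p=1.5)": lambda mu: mu**1.5,
    "NegativeBinomial(alpha=1.0)": lambda mu: mu + 1.0 * mu**2,  # NB2 form (assumed)
}

mu = np.array([0.5, 2.0])
for name, V in VARIANCE_FUNCTIONS.items():
    print(f"{name:30s} V(mu) = {V(mu)}")
```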

Algorithm

Alternating IRLS (Smyth 1989, Smyth & Verbyla 1999):

Initialise mu from intercept-only GLM, phi = 1
Mean step: IRLS for GLM(y ~ X, family, weights = prior_weights / phi_i)
Dispersion step: compute unit deviances d_i; fit Gamma GLM on delta_i = d_i / phi_i with log link
REML correction (method='reml'): subtract hat-matrix diagonal from delta_i before dispersion fit. Recommended when the mean model has many parameters.
Check convergence: relative change in -2*loglik < epsilon
Repeat until convergence or maxit reached

Pure numpy/scipy. No ML frameworks, no statsmodels dependency.
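
The alternating scheme can be sketched in plain numpy for a Gamma response with log links on both submodels. This follows the textbook Smyth (1989) formulation, in which the dispersion step regresses the unit deviances d_i on Z with fitted mean phi_i, and it leaves out the REML correction, prior weights and offsets. The names fit_gamma_dglm and wls are hypothetical and are not the library's internals.

```python
import numpy as np

def wls(X, z, w):
    """Weighted least squares fit of z on X with weights w."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return coef

def fit_gamma_dglm(y, X, Z, maxit=50, tol=1e-8):
    """Alternating IRLS for a Gamma double GLM with log links (no REML)."""
    n = len(y)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())       # assumes column 0 of X is the intercept
    alpha = np.zeros(Z.shape[1])     # phi = 1 initially
    prev = np.inf
    for it in range(maxit):
        phi = np.exp(Z @ alpha)
        # Mean step: one IRLS update with weights 1 / phi_i
        eta = X @ beta
        mu = np.exp(eta)
        beta = wls(X, eta + (y - mu) / mu, 1.0 / phi)
        mu = np.exp(X @ beta)
        # Dispersion step: Gamma GLM on unit deviances d_i with mean phi_i
        d = 2.0 * ((y - mu) / mu - np.log(y / mu))
        alpha = wls(Z, Z @ alpha + (d - phi) / phi, np.full(n, 0.5))
        phi = np.exp(Z @ alpha)
        # Saddlepoint approximation to -2 * loglik, up to a constant
        crit = np.sum(d / phi + np.log(phi))
        if abs(crit - prev) < tol * (abs(crit) + 1.0):
            break
        prev = crit
    return beta, alpha, it + 1

# Simulated check with known coefficients
rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])
true_beta, true_alpha = np.array([8.0, 0.3]), np.array([-1.0, 0.7])
mu, phi = np.exp(X @ true_beta), np.exp(Z @ true_alpha)
y = rng.gamma(shape=1.0 / phi, scale=mu * phi)

beta_hat, alpha_hat, n_iter = fit_gamma_dglm(y, X, Z)
print(beta_hat, alpha_hat, n_iter)
```

Because the deviance-based dispersion step rests on a saddlepoint approximation for the Gamma, the recovered alpha should land close to the simulating values but not exactly on them, while beta is recovered accurately.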

Design choices

formulaic rather than patsy: patsy is no longer under active development. formulaic has an active development community, cleaner model matrix schemas for prediction on new data, and better handling of interactions and transformations.

method='reml' default: the REML correction removes the contribution of estimating beta from the dispersion score. With even 10 mean parameters in a dataset of 500 observations this makes a material difference to the dispersion estimates. The correction is cheap (hat diagonal via QR) and almost always helps.
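
The hat diagonal mentioned above can be obtained from a thin QR decomposition of the weighted design matrix. The sketch below is a generic numpy version with simulated inputs; hat_diagonal is a hypothetical helper, and how the library applies the leverages inside its dispersion step is not shown here.

```python
import numpy as np

def hat_diagonal(X, w):
    """Leverages h_i of the weighted projection onto the columns of X."""
    Q, _ = np.linalg.qr(X * np.sqrt(w)[:, None])   # thin QR of W^{1/2} X
    return np.sum(Q**2, axis=1)                    # h_i = squared row norms of Q

rng = np.random.default_rng(0)
n, p = 500, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = 1.0 / rng.uniform(0.3, 1.5, size=n)   # illustrative working weights 1 / phi_i

h = hat_diagonal(X, w)
print(h.sum(), h.mean(), h.max())
```

The leverages sum to the number of mean parameters p, so the average is p / n, but individual observations in sparse cells can sit well above that average.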

Exposure on mean only: log(exposure) enters as an offset in the mean linear predictor. Dispersion phi_i is per-unit-exposure: a 6-month policy has the same dispersion per claim as a 12-month policy with identical risk characteristics.

Log link for dispersion default: ensures phi_i > 0 always. The identity link is available but requires careful monitoring, because it can produce negative phi_i estimates when extrapolating beyond the range of the training data.
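
A toy illustration of the extrapolation problem, with hypothetical dispersion coefficients on a single scaled covariate. Nothing here is fitted; the numbers are chosen so that both links give phi = 0.9 at the origin.

```python
import numpy as np

# Covariate values; the last one lies outside an assumed training range of [0, 1]
z_new = np.array([0.0, 1.0, 3.0])
Z = np.column_stack([np.ones_like(z_new), z_new])

alpha_identity = np.array([0.9, -0.4])        # identity link: phi = Z @ alpha
alpha_log = np.array([np.log(0.9), -0.6])     # log link: phi = exp(Z @ alpha)

phi_identity = Z @ alpha_identity    # turns negative at z = 3
phi_log = np.exp(Z @ alpha_log)      # strictly positive for any z

print(phi_identity)
print(phi_log)
```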

Databricks Notebook

A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.

Reference

Smyth (1989): "Generalized Linear Models with Varying Dispersion", JRSS-B 51:47-60
Smyth & Verbyla (1999): "Adjusted likelihood methods for modelling dispersion in GLMs", Environmetrics 10:695-709
R dglm package: https://github.com/cran/dglm

Performance

Benchmarked against a constant-phi Tweedie GLM (statsmodels) on synthetic UK commercial property pure premium data: 25,000 policies, known DGP where phi varies 3–6x across distribution channels (direct vs broker SME vs broker large), temporal 70/30 train/test split. See notebooks/benchmark_dispersion.py for full methodology.

Metric	Tweedie GLM (const phi)	DGLM
Tweedie deviance (test)	—	comparable
Phi MAE vs true	higher	lower
Max channel A/E deviation	higher	lower
Variance ratio by channel	miscalibrated in tails	closer to 1.0
Overdispersion LRT p-value	not applicable	< 0.001
Fit time	faster	3–6x slower

The Tweedie GLM assigns the same phi to a direct retail policy and a broker-placed large commercial account. The DGLM captures the 3–6x dispersion difference between channels, which materially improves variance calibration for the uses where it matters most (reinsurance pricing, capital loading). The likelihood ratio test (overdispersion_test()) indicates whether varying phi adds value on your specific portfolio. On homogeneous books a constant-phi Tweedie is adequate and faster.
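
One way to read a per-channel variance ratio is as the mean squared Pearson residual in each channel divided by the phi the model assumes there, so a calibrated model gives values near 1. The sketch below illustrates that reading on simulated data. It is not the benchmark code: it uses a Gamma response rather than Tweedie for simplicity, treats mu as known, and the channel labels and phi values are invented to mirror the 3–6x spread described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 60_000
phi_by_channel = pd.Series({"direct": 0.5, "broker_sme": 1.5, "broker_large": 3.0})

channel = rng.choice(phi_by_channel.index.to_numpy(), size=n)
true_phi = phi_by_channel.loc[channel].to_numpy()
mu = np.full(n, 2000.0)
y = rng.gamma(shape=1.0 / true_phi, scale=mu * true_phi)

# Squared Pearson residuals for the Gamma: (y - mu)^2 / V(mu) with V(mu) = mu^2
df = pd.DataFrame({"channel": channel, "pearson_sq": (y - mu) ** 2 / mu**2})
observed = df.groupby("channel")["pearson_sq"].mean()

const_phi = df["pearson_sq"].mean()         # single pooled dispersion estimate
ratio_const = observed / const_phi          # constant-phi model
ratio_varying = observed / phi_by_channel   # channel-specific phi (aligned by label)

print(pd.DataFrame({"const_phi": ratio_const, "varying_phi": ratio_varying}).round(2))
```

With a pooled phi the direct channel comes out well below 1 and the large-broker channel well above it, while the channel-specific phi brings every ratio close to 1.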

Related Libraries

Library	What it does
insurance-distributional-glm	GAMLSS — the full RS algorithm for jointly modelling mean and all distributional parameters including shape
insurance-frequency-severity	Joint frequency-severity models with Sarmanov copula — extends dispersion modelling to the two-part structure

Release history

0.1.4: Mar 25, 2026
0.1.2: Mar 17, 2026
0.1.1: Mar 15, 2026 (this version)
0.1.0: Mar 11, 2026
