Dependent frequency-severity neural two-part model for insurance pricing

insurance-dependent-fs

A dependent frequency-severity neural two-part model for insurance pricing.

The problem

Standard insurance pricing models fit frequency and severity independently, then multiply them together to get a pure premium. That multiplication step assumes E[N·Y] = E[N]·E[Y], which holds only if claim count and average claim size are uncorrelated (as they would be under statistical independence).

They rarely are. In UK motor, the negative correlation is well-documented: high-frequency policyholders (young drivers, urban, low-NCD) tend to have lower average severity. The whiplash reforms under the UK Civil Liability Act 2018, in force since May 2021, amplified this: frequent small claims are now subject to capped portal payouts, while large structural claims are not. An independence assumption leads to measurable cross-subsidy across your risk factors.

The correction is not huge for an average risk — roughly 2-4% at typical motor frequencies — but it compounds with risk factor interactions, and the tail effect is larger than the mean effect.
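
To make the identity E[N·Y] = E[N]·E[Y] + Cov(N, Y) concrete, here is a toy calculation. The joint distribution below is invented purely for illustration (it is not fitted to any data and says nothing about the size of the effect in a real portfolio); it only shows the direction of the bias when count and severity are negatively related.

```python
# Toy joint distribution of (claim count N, average severity Y) among
# claiming policies. All numbers are made up for illustration.
joint = {
    (1, 3000.0): 0.6,
    (2, 2000.0): 0.3,
    (3, 1000.0): 0.1,
}

e_n = sum(p * n for (n, y), p in joint.items())        # E[N]
e_y = sum(p * y for (n, y), p in joint.items())        # E[Y]
e_ny = sum(p * n * y for (n, y), p in joint.items())   # E[N*Y]
cov = e_ny - e_n * e_y                                 # Cov(N, Y)

print(f"E[N]*E[Y] = {e_n * e_y:.0f}")
print(f"E[N*Y]    = {e_ny:.0f}")
print(f"Cov(N, Y) = {cov:.0f}")
```

In this toy case the covariance is negative, so multiplying the two marginal means overstates the expected aggregate loss.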

What this library does

It trains a single neural network with a shared encoder trunk and two output heads (Poisson frequency, Gamma severity). Gradients from both the Poisson loss and the Gamma loss flow through the shared trunk simultaneously. The trunk learns features that are jointly informative for both tasks, which is exactly where the frequency-severity dependence information lives in the data.
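
As a rough sketch of what "both losses" means here, the snippet below writes out a Poisson negative log-likelihood for the claim count and a Gamma negative log-likelihood for the average severity, then sums them per batch. The function names, the fixed Gamma shape, and the record layout are assumptions made for illustration; the library's own loss may handle dispersion, claim-count weighting, and loss balancing differently.

```python
import math

def poisson_nll(n_claims, log_rate, exposure):
    """Negative log-likelihood of a Poisson count with mean exposure * exp(log_rate)."""
    mean = exposure * math.exp(log_rate)
    return mean - n_claims * math.log(mean) + math.lgamma(n_claims + 1)

def gamma_nll(avg_severity, log_mu, shape=1.0):
    """Negative log-likelihood of a Gamma with mean exp(log_mu) and the given shape."""
    mu = math.exp(log_mu)
    return (math.lgamma(shape) - shape * math.log(shape / mu)
            - (shape - 1.0) * math.log(avg_severity) + shape * avg_severity / mu)

def joint_loss(batch, w_freq=1.0, w_sev=1.0):
    """Weighted sum of the mean Poisson NLL and the mean Gamma NLL.

    Each record is (n_claims, avg_severity, exposure, log_rate, log_mu).
    The severity term only uses policies with at least one claim.
    """
    freq = sum(poisson_nll(n, lr, t) for n, _, t, lr, _ in batch) / len(batch)
    claims = [(y, lm) for n, y, _, _, lm in batch if n > 0]
    sev = sum(gamma_nll(y, lm) for y, lm in claims) / max(len(claims), 1)
    return w_freq * freq + w_sev * sev

# Two hypothetical policies: one claim-free, one with a single claim of size 2.0
batch = [
    (0, 0.0, 1.0, math.log(0.1), 0.0),
    (1, 2.0, 1.0, math.log(0.1), 0.0),
]
print(joint_loss(batch))
```

In a real network, log_rate and log_mu would both be functions of the same trunk output, which is why minimising this single scalar sends gradients from both terms into the shared parameters.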

On top of this implicit latent dependence, you can optionally add the explicit Garrido-Genest-Schulz conditional covariate (log μ += γ·N). This gives you a directly interpretable γ parameter and a semi-analytical pure premium correction via the Poisson moment generating function.

How this differs from insurance-frequency-severity

insurance-frequency-severity models dependence via a Sarmanov copula — a parametric bivariate distribution fitted with EM or profile likelihood. It's interpretable, analytically tractable, and fast to fit. Use it when you want an auditable bivariate density and a single omega parameter for a regulator.

This library uses multi-task neural learning. Use it when you have a large dataset (100k+ policies), suspect nonlinear feature interactions, and want a single model that learns both tasks jointly by gradient descent.

The two libraries model the same economic phenomenon using fundamentally different statistical frameworks.

Installation

pip install insurance-dependent-fs

For diagnostic plots:

pip install "insurance-dependent-fs[plot]"

Quick start

from insurance_dependent_fs import DependentFSModel, make_dependent_claims
from insurance_dependent_fs.benchmarks import feature_cols

# Generate synthetic data with known γ=-0.15 (typical motor pattern)
df_train, df_test = make_dependent_claims(n_policies=50_000, gamma=-0.15)
fc = feature_cols(df_train)

model = DependentFSModel(use_explicit_gamma=True)
model.fit(
    df_train[fc].values,
    df_train["n_claims"].values,
    df_train["avg_severity"].values,
    df_train["exposure"].values,
)

print(f"Recovered γ = {model.gamma_:.4f}  (true: -0.15)")

pp = model.predict_pure_premium(df_test[fc].values, df_test["exposure"].values)

Architecture

x ∈ R^p  →  SharedTrunk  →  h ∈ R^d_latent
                                  │
             ┌────────────────────┴─────────────────────┐
        FrequencyHead                              SeverityHead
        log λ + log t                          log μ [+ γ·N]
             │                                       │
        Poisson NLL                             Gamma NLL
             └──────────── joint backprop ───────────┘
                              (shared trunk)

The shared trunk consists of hidden layers with BatchNorm and ELU activations, with configurable width and depth. The default is two hidden layers of widths [128, 64] followed by a 32-dimensional latent space.
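
The diagram can be traced in a few lines of plain Python. This is only a shape-level sketch with random weights and tiny layer sizes: BatchNorm and dropout are left out, applying ELU to the latent layer is an assumption, and none of the helper names below come from the library.

```python
import math
import random

def elu(v):
    return [x if x > 0 else math.exp(x) - 1.0 for x in v]

def linear(v, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_layer(rng, n_in, n_out):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
p, hidden_dims, latent_dim = 5, [8, 4], 3   # tiny sizes, not the documented defaults
dims = [p] + hidden_dims + [latent_dim]
trunk = [init_layer(rng, a, b) for a, b in zip(dims[:-1], dims[1:])]
freq_head = init_layer(rng, latent_dim, 1)
sev_head = init_layer(rng, latent_dim, 1)
gamma = -0.15  # illustrative value for the explicit dependence coefficient

def forward(x, exposure, n_claims):
    h = x
    for weights, bias in trunk:            # shared trunk
        h = elu(linear(h, weights, bias))
    log_count_mean = linear(h, *freq_head)[0] + math.log(exposure)  # log λ + log t
    log_sev_mean = linear(h, *sev_head)[0] + gamma * n_claims       # log μ + γ·N
    return h, log_count_mean, log_sev_mean

x = [rng.gauss(0.0, 1.0) for _ in range(p)]
h, log_nu, log_mu1 = forward(x, exposure=0.5, n_claims=1)
_, _, log_mu2 = forward(x, exposure=0.5, n_claims=2)
print(len(h), math.exp(log_nu), math.exp(log_mu2 - log_mu1))
```

Both heads read the same vector h, so the only head-specific parameters are the two final linear layers and γ. One extra claim multiplies the predicted mean severity by exp(γ).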

Configuration

from insurance_dependent_fs import DependentFSModel, SharedTrunkConfig
from insurance_dependent_fs.training import TrainingConfig

model = DependentFSModel(
    trunk_config=SharedTrunkConfig(
        hidden_dims=[128, 64],
        latent_dim=32,
        dropout=0.1,
        activation="elu",
        use_batch_norm=True,
    ),
    training_config=TrainingConfig(
        max_epochs=100,
        batch_size=512,
        lr=1e-3,
        auto_balance=True,      # equalise Poisson and Gamma loss magnitudes
        patience=15,            # early stopping
    ),
    use_explicit_gamma=True,    # learn γ·N conditional covariate
    val_fraction=0.1,           # held-out fraction for early stopping
)
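
For readers unfamiliar with the patience setting, the sketch below shows the usual patience-based early stopping rule applied to a list of validation losses. It is an assumption that TrainingConfig follows exactly this rule; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) under a patience-based stopping rule.

    Training stops at the first epoch where the validation loss has not
    improved on its best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses on the held-out fraction
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.69, 0.70]
print(early_stopping_epoch(val_losses, patience=3))
print(early_stopping_epoch(val_losses, patience=15))
```

With a short patience the run stops before reaching the later, slightly better epoch, which is the usual trade-off between training time and the risk of stopping on a noisy plateau.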

Diagnostics

from insurance_dependent_fs import DependentFSDiagnostics

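# Test arrays taken from the quick-start data frames
X_test = df_test[fc].values
n_claims_test = df_test["n_claims"].values
avg_sev_test = df_test["avg_severity"].values
exposure_test = df_test["exposure"].values

diag = DependentFSDiagnostics(model, X_test, n_claims_test, avg_sev_test, exposure_test)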

# Lorenz curve and Gini for frequency and pure premium
gini = diag.gini_summary()

# Calibration in deciles
cal = diag.calibration(target="pure_premium")

# Latent correlation structure
lc = diag.latent_correlation()

# Head-to-head vs independence assumption
comparison = diag.vs_independent()
print(f"MSE reduction vs independence: {comparison['mse_reduction_pct']:.1f}%")

# Plots (requires matplotlib)
fig, ax = diag.plot_lorenz(target="frequency")
fig, ax = diag.plot_calibration(target="pure_premium")
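
For intuition about the Lorenz and Gini diagnostics, here is a minimal stand-alone calculation. It uses one common convention (policies ordered by predicted risk, Gini = 1 − 2 × area under the Lorenz curve); whether gini_summary uses the same convention is an assumption, and lorenz_gini is a hypothetical helper, not part of the library.

```python
def lorenz_gini(actual, predicted, exposure=None):
    """Gini coefficient from the Lorenz curve of actual losses.

    Policies are sorted by predicted risk (lowest first); the curve plots the
    cumulative share of actual losses against the cumulative share of exposure.
    Assumes the total actual loss is positive.
    """
    n = len(actual)
    exposure = exposure if exposure is not None else [1.0] * n
    order = sorted(range(n), key=lambda i: predicted[i])
    total_exp = sum(exposure)
    total_act = sum(actual)
    area = 0.0
    cum_exp = cum_act = 0.0
    for i in order:
        next_exp = cum_exp + exposure[i] / total_exp
        next_act = cum_act + actual[i] / total_act
        area += (next_exp - cum_exp) * (cum_act + next_act) / 2.0  # trapezoid rule
        cum_exp, cum_act = next_exp, next_act
    return 1.0 - 2.0 * area

actual = [0.0, 0.0, 0.0, 100.0]
print(lorenz_gini(actual, predicted=[1, 2, 3, 4]))   # model ranks the loss-making policy last
print(lorenz_gini(actual, predicted=[4, 3, 2, 1]))   # model ranks it first
```

A model that ranks risks well pushes the losses to the end of the ordering and gives a positive Gini; a model that ranks them backwards gives a negative one.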

Pure premium methods

Two methods are available:

Monte Carlo (always available): samples N ~ Poisson(λ) and Y ~ Gamma for each realisation. General, captures all dependence sources.

Semi-analytical (when use_explicit_gamma=True): uses the Poisson MGF closed form from Garrido-Genest-Schulz (2016):

E[Z | x] = λ · exp(SevHead(h) + γ) · exp(λ·(e^γ − 1))

Faster at large portfolio size, but assumes γ·N is the only dependence mechanism (ignores residual latent dependence from the trunk).

pp_mc = model.predict_pure_premium(X, exposure, method="mc", n_mc=5000)
pp_an = model.predict_pure_premium(X, exposure, method="analytical")
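
The closed form follows from the Poisson moment generating function M(t) = exp(λ(e^t − 1)). If Z = N·Y and E[Y | N] = μ·exp(γ·N) with μ = exp(SevHead(h)), then E[Z] = μ·E[N·exp(γ·N)] = μ·M'(γ) = μ·λ·e^γ·exp(λ(e^γ − 1)). The sketch below checks this numerically in two ways: by summing over the Poisson probabilities and by simulation. The parameter values, the Gamma shape, and all function names are assumptions chosen for illustration; this is not the library's implementation.

```python
import math
import random

def analytical_pure_premium(lam, mu, gamma):
    """Closed form E[N*Y] when N ~ Poisson(lam) and E[Y | N] = mu * exp(gamma * N)."""
    return mu * math.exp(gamma) * math.exp(lam * (math.exp(gamma) - 1.0)) * lam

def series_pure_premium(lam, mu, gamma, n_max=60):
    """Same expectation by summing n * mu * exp(gamma * n) over the Poisson pmf."""
    total = 0.0
    pmf = math.exp(-lam)               # P(N = 0)
    for n in range(1, n_max + 1):
        pmf *= lam / n                 # P(N = n) from P(N = n - 1)
        total += n * mu * math.exp(gamma * n) * pmf
    return total

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small lam used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mc_pure_premium(lam, mu, gamma, shape=2.0, n_sims=200_000, seed=0):
    """Monte Carlo estimate: draw N, then an average severity Y given N."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        n = sample_poisson(rng, lam)
        if n > 0:
            mean_y = mu * math.exp(gamma * n)
            total += n * rng.gammavariate(shape, mean_y / shape)  # mean = shape * scale
    return total / n_sims

lam, mu, gamma = 0.1, 1000.0, -0.15    # illustrative values only
pp_closed = analytical_pure_premium(lam, mu, gamma)
pp_series = series_pure_premium(lam, mu, gamma)
pp_sim = mc_pure_premium(lam, mu, gamma)
print(f"closed form {pp_closed:.2f}, series {pp_series:.2f}, simulation {pp_sim:.2f}")
```

The series sum should match the closed form to numerical precision, while the simulated value carries sampling noise that shrinks as n_sims grows.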

References

  • Garrido, Genest, Schulz (2016). Generalized linear models for dependent frequency and severity of insurance claims. Insurance: Mathematics and Economics 70: 205-215.
  • arXiv:2106.10770v2. A Neural Frequency-Severity Model and Its Application to Insurance Claims (NeurFS paper).
  • Shi & Shi (2024). A Sparse Deep Two-part Model for Nonlife Insurance Claims with Dependent Frequency and Severity. Variance 17(1).

Databricks notebook

See notebooks/dependent_fs_demo.py for a full workflow on synthetic data, including model fitting, diagnostics, and comparison against independence.
