Shared frailty models for within-policyholder claim recurrence in insurance pricing

insurance-recurrent

The problem

Your Poisson GLM treats every policy-year as independent once you've conditioned on rating factors. It doesn't know that the policyholder who claimed three times last year is probably going to claim again — beyond what their age, vehicle class, and postcode predict.

This unobserved tendency to repeat-claim is the frailty. Some policyholders are just more claim-prone in ways that no rating factor captures. They're the ones who make fleet insurance portfolios behave badly, who drive up pet insurance renewal costs, who account for the long tail in home claims. A shared frailty model estimates this latent heterogeneity from the claim history and turns it into a credibility-adjusted risk score.

The practical output: a per-policy multiplier (the posterior frailty) that says "after seeing this policyholder's full claim history, we believe they're 1.8x as likely to claim as an average risk with the same rating factors." Use it to load renewal premiums, trigger referrals to underwriters, or identify mis-priced portfolios.

Why there's no existing Python library for this

lifelines is the standard Python survival library. GitHub issue #878 requested shared frailty in 2017 and was closed as "maybe someday." scikit-survival doesn't support recurrent events at all. The production tools are frailtypack and reReg in R.

This library fills that gap.

What it does

  • SharedFrailtyModel: fits a gamma shared frailty model via the EM algorithm. Each policyholder gets a latent frailty term that multiplicatively shifts their claim hazard. The algorithm alternates between computing the posterior frailty given the data (E-step) and updating the regression coefficients and frailty variance (M-step).

  • JointFrailtyModel: extends the shared frailty model to handle informative censoring from lapse. If high-frailty policyholders lapse more often, standard models underestimate their claim rate because you never see their full claim history. The joint model links the claim and lapse processes through the same frailty term.

  • RecurrentEventData: converts policy claims histories to counting-process format. Handles gap time vs calendar time, left truncation (mid-term inception), and multiple claim types.

  • RecurrentEventSimulator: generates synthetic data with known frailty structure. Essential for validating that the EM algorithm recovers the true parameters.

  • Diagnostics: frailty QQ plots, Cox-Snell residuals, event rate by frailty decile.

  • FrailtyReport: HTML report suitable for sharing with pricing teams.
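
The E-step and M-step described for SharedFrailtyModel can be sketched on a deliberately simplified model. This is not the library's implementation: it assumes a constant baseline claim rate and no covariates, so that claim counts are Poisson given the frailty, and the function name fit_gamma_frailty_em is hypothetical. The frailty is gamma-distributed with mean 1 and variance theta.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import digamma, gammaln


def fit_gamma_frailty_em(counts, exposure, theta_init=1.0, max_iter=500, tol=1e-7):
    """EM for N_i | u_i ~ Poisson(u_i * rate * exposure_i), u_i ~ Gamma(mean 1, var theta)."""
    counts = np.asarray(counts, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    n = counts.size
    rate = counts.sum() / exposure.sum()  # plain Poisson starting value
    k = 1.0 / theta_init                  # gamma shape and rate are both 1/theta

    for _ in range(max_iter):
        # E-step: posterior of u_i is Gamma(shape = k + N_i, rate = k + Lambda_i)
        post_shape = k + counts
        post_rate = k + rate * exposure
        e_u = post_shape / post_rate
        e_log_u = digamma(post_shape) - np.log(post_rate)

        # M-step, baseline rate: closed-form Poisson update with E[u_i] as an offset
        new_rate = counts.sum() / (e_u * exposure).sum()

        # M-step, frailty variance: maximise the expected gamma log-density over k
        def neg_q(kk):
            return -(n * (kk * np.log(kk) - gammaln(kk))
                     + (kk - 1.0) * e_log_u.sum() - kk * e_u.sum())

        new_k = minimize_scalar(
            neg_q, bounds=(1e-3, 1e3), method="bounded", options={"xatol": 1e-8}
        ).x

        converged = abs(new_rate - rate) < tol and abs(new_k - k) < tol
        rate, k = new_rate, new_k
        if converged:
            break

    frailty_mean = (k + counts) / (k + rate * exposure)
    return {"rate": rate, "theta": 1.0 / k, "frailty_mean": frailty_mean}


# Simulate 5,000 policies with 5 years of exposure each, then refit
rng = np.random.default_rng(0)
true_u = rng.gamma(shape=1 / 0.6, scale=0.6, size=5000)
exposure = np.full(5000, 5.0)
counts = rng.poisson(true_u * 0.3 * exposure)
fit = fit_gamma_frailty_em(counts, exposure)
print(f"rate={fit['rate']:.3f}, theta={fit['theta']:.3f}")
```

With covariates, the closed-form rate update would become a Poisson regression with log E[u_i] as an offset; the frailty-variance step would keep the same form.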

Bühlmann credibility connection

The posterior frailty mean is exactly the Bühlmann-Straub credibility estimate:

E[u_i | data] = Z_i * (observed_rate_i) + (1 - Z_i) * 1.0

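where Z_i = Lambda_i / (Lambda_i + 1/theta) is the credibility factor, Lambda_i is the policyholder's expected claim count over their exposure given the rating factors (the cumulative hazard), and observed_rate_i = N_i / Lambda_i is the ratio of actual to expected claims. If you're already loading renewal rates with Bühlmann credibility, this model is doing the same thing but in a survival analysis framework that handles exposure, censoring, and covariates. theta is the frailty variance, which plays the role of the between-policyholder variance in credibility theory.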
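
As a worked check of the identity, the snippet below computes the posterior mean twice: directly from the gamma posterior, and through the credibility formula. The function name posterior_frailty and the input values are illustrative assumptions, not part of the library; expected_events stands for Lambda_i and must be positive.

```python
def posterior_frailty(n_events, expected_events, theta):
    """Posterior mean of a gamma frailty (mean 1, variance theta), computed two ways."""
    k = 1.0 / theta
    # Gamma posterior: shape = k + N_i, rate = k + Lambda_i
    direct = (k + n_events) / (k + expected_events)
    # Credibility form: weight Z_i on actual/expected, 1 - Z_i on the prior mean of 1.0
    z = expected_events / (expected_events + k)
    credibility = z * (n_events / expected_events) + (1.0 - z) * 1.0
    return direct, credibility, z


direct, credibility, z = posterior_frailty(n_events=4, expected_events=1.5, theta=0.6)
print(f"{direct:.3f} {credibility:.3f} {z:.3f}")
# 1.789 1.789 0.474
```

Four claims against 1.5 expected is a raw ratio of 2.67, but with under half the weight on own experience the loading shrinks to about 1.79.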

Installation

pip install insurance-recurrent

Requires Python 3.10+, NumPy, SciPy, and pandas. No R dependencies.

Optional for HTML reports:

pip install insurance-recurrent[report]

Quick start

from insurance_recurrent import (
    RecurrentEventSimulator,
    RecurrentEventData,
    SharedFrailtyModel,
    FrailtyReport,
)

# Simulate fleet insurance data: 500 trucks, frailty variance = 0.6
sim = RecurrentEventSimulator(
    n_policies=500,
    theta=0.6,
    baseline_rate=0.3,
    coef={"vehicle_age": 0.4, "driver_age_band": -0.2},
    seed=42,
)
data, true_frailty = sim.simulate(return_true_frailty=True)

print(data.summary())
# RecurrentEventData (gap time)
#   Policies:    487
#   Events:      642
#   Intervals:   1129
#   Events/policy: mean=1.32, max=8

# Fit the model
model = SharedFrailtyModel(theta_init=1.0, max_iter=100)
model.fit(data, covariates=["vehicle_age", "driver_age_band"])
model.print_summary()
# ======================================================
# SharedFrailtyModel Summary
# ======================================================
#   Policies:          487
#   Events:            642
#   Log-likelihood:    -1247.3
#   Frailty variance (theta): 0.5812
#   Converged:         True (47 iters)
#
#   Coefficients:
#     vehicle_age               +0.3891
#     driver_age_band           -0.1947

# Get per-policy frailty scores
frailty_scores = model.predict_frailty(data)
# frailty_scores[0] = {
#     'policy_id': 'P000042',
#     'frailty_mean': 1.84,      # 84% more likely to claim than average
#     'credibility_factor': 0.73, # 73% weight on own history
#     'n_events': 4,
#     'exposure': 2.5,
# }

# Generate an HTML report
report = FrailtyReport(model, data, model_name="Fleet Q1 2026")
report.save("frailty_report.html")
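
One way to act on a frailty score at renewal is sketched below, using the dictionary shape shown in the quick start. The function renewal_action and its floor, cap, and referral threshold are hypothetical values for illustration, not library features or recommended settings.

```python
def renewal_action(score, base_premium, floor=0.8, cap=1.5, referral_threshold=2.0):
    """Turn one frailty score into a capped premium loading and a referral flag."""
    # Clamp the multiplier so that a single policy's history cannot move the price too far
    multiplier = min(max(score["frailty_mean"], floor), cap)
    return {
        "policy_id": score["policy_id"],
        "loaded_premium": round(base_premium * multiplier, 2),
        "refer_to_underwriter": score["frailty_mean"] >= referral_threshold,
    }


score = {"policy_id": "P000042", "frailty_mean": 1.84, "credibility_factor": 0.73}
action = renewal_action(score, base_premium=1000.0)
print(action)
# {'policy_id': 'P000042', 'loaded_premium': 1500.0, 'refer_to_underwriter': False}
```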

Loading claim histories from policy tables

Most actuarial teams hold a policies table and a claims table with inception, expiry, and claim events recorded as calendar dates:

import pandas as pd
from insurance_recurrent import RecurrentEventData, SharedFrailtyModel

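# Parse the date columns up front so they arrive as datetimes rather than strings
policies = pd.read_csv("policies.csv", parse_dates=["inception_date", "expiry_date"])  # policy_id, inception_date, expiry_date, ...
claims = pd.read_csv("claims.csv", parse_dates=["claim_date"])  # policy_id, claim_date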

data = RecurrentEventData.from_policy_claims(
    policies=policies,
    claims=claims,
    covariate_cols=["vehicle_class", "region"],
    time_scale="gap",  # time resets after each claim
)

model = SharedFrailtyModel()
model.fit(data, covariates=["vehicle_class", "region"])
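
To make the gap-time layout concrete, here is a minimal pandas sketch of the conversion. It is an assumption about the general technique, not the library's code: the helper to_gap_time_intervals is hypothetical, it ignores covariates and left truncation, and it measures gaps in days.

```python
import pandas as pd


def to_gap_time_intervals(policies, claims):
    """One row per at-risk interval; the clock restarts after every claim."""
    claim_dates = {
        pid: sorted(dates) for pid, dates in claims.groupby("policy_id")["claim_date"]
    }
    rows = []
    for p in policies.itertuples(index=False):
        start = p.inception_date
        for d in claim_dates.get(p.policy_id, []):
            if not (p.inception_date <= d <= p.expiry_date):
                continue  # ignore claims dated outside the policy period
            rows.append({"policy_id": p.policy_id, "gap_days": (d - start).days, "event": 1})
            start = d
        # The final interval is censored at expiry: no claim was observed in it
        rows.append({"policy_id": p.policy_id, "gap_days": (p.expiry_date - start).days, "event": 0})
    return pd.DataFrame(rows)


policies = pd.DataFrame({
    "policy_id": ["A", "B"],
    "inception_date": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "expiry_date": pd.to_datetime(["2024-12-31", "2024-12-31"]),
})
claims = pd.DataFrame({
    "policy_id": ["A", "A"],
    "claim_date": pd.to_datetime(["2024-06-01", "2024-03-01"]),
})
intervals = to_gap_time_intervals(policies, claims)
print(intervals)
```

Policy A, with two claims, produces three intervals and policy B produces one, so the number of intervals equals policies plus events. The quick-start summary shows the same relationship (487 + 642 = 1129).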

Handling informative lapse

If you suspect that high-risk policyholders leave your book faster (they find cheaper cover elsewhere, or you non-renew them), use the joint model:

from insurance_recurrent import JointFrailtyModel

# claim_data: a RecurrentEventData object built from your claim histories
# lapse_df: one row per policy with lapse_time and lapsed columns
model = JointFrailtyModel(alpha_init=0.5)
model.fit(
    recurrent_data=claim_data,
    lapse_data=lapse_df,
    recurrent_covariates=["vehicle_class"],
)
print(f"Association alpha: {model.association_:.3f}")
# Positive alpha: high-frailty policyholders lapse faster
# => standard model underestimates their true claim rate
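
The effect of informative lapse can be seen in a small simulation. The sketch below is an illustration under assumed parameters, not the joint model itself: each policy's lapse hazard is taken to be lapse_rate * u**alpha, where u is the frailty, and the function name pooled_claim_rate is hypothetical. It compares the pooled claims-per-exposure rate of the observed book with and without the association.

```python
import numpy as np


def pooled_claim_rate(alpha, n_policies=20000, theta=0.6, claim_rate=0.3,
                      lapse_rate=0.2, horizon=5.0, seed=0):
    """Observed claims per unit exposure when lapse hazard scales with frailty**alpha."""
    rng = np.random.default_rng(seed)
    u = rng.gamma(shape=1.0 / theta, scale=theta, size=n_policies)  # mean 1, variance theta
    # Exponential lapse time; alpha > 0 makes high-frailty policies leave sooner
    lapse_time = rng.exponential(scale=1.0 / (lapse_rate * u**alpha))
    exposure = np.minimum(lapse_time, horizon)  # observation stops at lapse or at the horizon
    claims = rng.poisson(u * claim_rate * exposure)
    return claims.sum() / exposure.sum()


independent = pooled_claim_rate(alpha=0.0)
informative = pooled_claim_rate(alpha=1.0)
print(f"alpha=0: {independent:.3f}   alpha=1: {informative:.3f}")
```

With alpha = 0 the pooled rate should sit close to claim_rate. With alpha = 1 it falls below it, because the policyholders who claim most contribute the least exposure.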

When to use this

Good candidates:

  • Fleet insurance: one policy covers multiple vehicles/drivers, frequent events
  • Pet insurance: chronic conditions, repeat treatments
  • Home insurance: maintenance-related claims repeat for the same property

Less useful:

  • Personal motor: most policyholders have 0 or 1 claim per year — not enough within-policy data to estimate individual frailty
  • Single-event products: travel, single-trip

Diagnostics

from insurance_recurrent import (
    frailty_qq_data,
    cox_snell_residuals,
    event_rate_by_frailty_decile,
)

# QQ plot data: compare posterior frailty distribution to gamma prior
qq = frailty_qq_data(model, data)
# Plot qq["theoretical"] vs qq["empirical"] — straight line = good fit

# Cox-Snell residuals: should be approximately Exp(1) if the model is correct
resid = cox_snell_residuals(model, data)

# Lift by frailty decile: key diagnostic for actuarial audiences
decile = event_rate_by_frailty_decile(model, data)
print(decile[["decile", "frailty_mean_avg", "observed_rate", "lift"]])
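
For readers who want to see what a decile lift table involves, the following is a minimal sketch in pandas. It is not the library's event_rate_by_frailty_decile: the function lift_by_frailty_decile is hypothetical, and it assumes a DataFrame with frailty_mean, n_events, and exposure columns. The demo data is simulated with a known theta and baseline rate.

```python
import numpy as np
import pandas as pd


def lift_by_frailty_decile(scores):
    """Observed claim rate per frailty decile, relative to the portfolio rate."""
    df = scores.copy()
    # Rank first so that tied frailty values still split into ten equal-sized bins
    df["decile"] = pd.qcut(df["frailty_mean"].rank(method="first"), 10, labels=False) + 1
    table = df.groupby("decile").agg(
        frailty_mean_avg=("frailty_mean", "mean"),
        n_events=("n_events", "sum"),
        exposure=("exposure", "sum"),
    )
    table["observed_rate"] = table["n_events"] / table["exposure"]
    overall_rate = df["n_events"].sum() / df["exposure"].sum()
    table["lift"] = table["observed_rate"] / overall_rate
    return table.reset_index()


# Simulated scores: posterior means computed with the true theta and baseline rate
rng = np.random.default_rng(1)
n, theta, base_rate = 2000, 0.6, 0.3
exposure = rng.uniform(1.0, 5.0, size=n)
n_events = rng.poisson(rng.gamma(1 / theta, theta, size=n) * base_rate * exposure)
frailty_mean = (1 / theta + n_events) / (1 / theta + base_rate * exposure)
scores = pd.DataFrame({"frailty_mean": frailty_mean, "n_events": n_events, "exposure": exposure})

table = lift_by_frailty_decile(scores)
print(table[["decile", "frailty_mean_avg", "observed_rate", "lift"]])
```

Because the scores here are computed from the same claims they are ranked against, the lift is optimistic. Scoring on earlier years and measuring lift on a later holdout period is a fairer test.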

References

  • Cook, R.J. & Lawless, J.F. (2007). The Statistical Analysis of Recurrent Events. Springer.
  • Rondeau, V. et al. (2003). Maximum penalized likelihood estimation in a gamma-frailty model. Lifetime Data Analysis.
  • Bühlmann, H. & Gisler, A. (2005). A Course in Credibility Theory. Springer.
  • Vaupel, J.W. et al. (1979). The impact of heterogeneity in individual frailty. Demography, 16(3):439–454.
  • Andersen, P.K. & Gill, R.D. (1982). Cox's regression model for counting processes. Annals of Statistics, 10(4):1100–1120.

License

MIT. See LICENSE.
