Shared frailty models for within-policyholder claim recurrence in insurance pricing

insurance-recurrent

The problem

Your Poisson GLM treats every policy-year as independent once you've conditioned on rating factors. It doesn't know that the policyholder who claimed three times last year is probably going to claim again — beyond what their age, vehicle class, and postcode predict.

This unobserved tendency to repeat-claim is the frailty. Some policyholders are just more claim-prone in ways that no rating factor captures. They're the ones who make fleet insurance portfolios behave badly, who drive up pet insurance renewal costs, who account for the long tail in home claims. A shared frailty model estimates this latent heterogeneity from the claim history and turns it into a credibility-adjusted risk score.

The practical output: a per-policy multiplier (the posterior frailty) that says "after seeing this policyholder's full claim history, we believe they're 1.8x as likely to claim as an average risk with the same rating factors." Use it to load renewal premiums, trigger referrals to underwriters, or identify mis-priced portfolios.
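
The independence assumption fails visibly as overdispersion. Here is a minimal simulation sketch using only the standard library (it does not use this package's API, and every name and parameter value in it is an illustrative assumption). With a gamma frailty of mean 1 and variance theta, a policy with expected claim count mu has count variance mu + theta * mu^2 rather than the Poisson value mu, so a crude estimate of theta can be read off the excess variance.

```python
import random
from statistics import fmean, pvariance

def simulate_counts(n_policies, theta, rate, exposure, seed=0):
    """Claim counts per policy under gamma frailty (mean 1, variance theta)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_policies):
        # Gamma with shape 1/theta and scale theta has mean 1 and variance theta.
        u = rng.gammavariate(1.0 / theta, theta)
        # Count events of a Poisson process with rate u * rate on [0, exposure]
        # by accumulating exponential gap times.
        t, n = rng.expovariate(u * rate), 0
        while t <= exposure:
            n += 1
            t += rng.expovariate(u * rate)
        counts.append(n)
    return counts

counts = simulate_counts(n_policies=20000, theta=0.6, rate=0.3, exposure=3.0, seed=1)
mean = fmean(counts)
var = pvariance(counts)
# Method-of-moments estimate of the frailty variance from the excess variance.
theta_mom = (var - mean) / mean**2
print(f"mean={mean:.3f} variance={var:.3f} theta_mom={theta_mom:.3f}")
```

A plain Poisson model would expect the variance to match the mean. The gap between the two is the signal a frailty model works with.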

Why there's no existing Python library for this

lifelines is the standard Python survival library. GitHub issue #878 requested shared frailty in 2017 and was closed as "maybe someday." scikit-survival doesn't support recurrent events at all. The production tools are frailtypack and reReg in R.

This library fills that gap.

What it does

SharedFrailtyModel: fits a gamma shared frailty model via the EM algorithm. Each policyholder gets a latent frailty term that multiplicatively shifts their claim hazard. The EM algorithm alternates between computing the posterior frailty given the data and the current parameter estimates (E-step) and updating the regression coefficients and frailty variance (M-step).
JointFrailtyModel: extends the shared frailty model to handle informative censoring from lapse. If high-frailty policyholders lapse more often, standard models underestimate their claim rate because you never see their full claim history. The joint model links the claim and lapse processes through the same frailty term.
RecurrentEventData: converts policy claim histories to counting-process format. Handles gap time vs calendar time, left truncation (mid-term inception), and multiple claim types.
RecurrentEventSimulator: generates synthetic data with known frailty structure. Essential for validating that the EM algorithm recovers the true parameters.
Diagnostics: frailty QQ plots, Cox-Snell residuals, event rate by frailty decile.
FrailtyReport: HTML report suitable for sharing with pricing teams.
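
To make the E-step and M-step concrete, here is a stripped-down sketch of EM for gamma frailty. It is not this library's implementation: it assumes a constant baseline claim rate and no covariates, works from one (claim count, exposure) pair per policy, and uses only the standard library. The function names, the hand-rolled digamma, and the simulated data are all assumptions made for illustration.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def solve_shape(c):
    """Solve log(k) - digamma(k) = c for k by bisection on log(k)."""
    lo, hi = math.log(1e-3), math.log(1e4)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid - digamma(math.exp(mid)) > c:
            lo = mid  # left side is decreasing in k, so the root is larger
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def fit_gamma_frailty_em(events, exposure, theta=1.0, max_iter=300, tol=1e-6):
    """EM for claims_i ~ Poisson(u_i * rate * t_i), u_i ~ Gamma(mean 1, var theta)."""
    rate = sum(events) / sum(exposure)  # pooled rate as a starting value
    for _ in range(max_iter):
        k = 1.0 / theta
        # E-step: u_i | data ~ Gamma(shape k + n_i, rate k + rate * t_i).
        shape = [k + n_i for n_i in events]
        inv_scale = [k + rate * t_i for t_i in exposure]
        e_u = [a / b for a, b in zip(shape, inv_scale)]
        e_log_u = [digamma(a) - math.log(b) for a, b in zip(shape, inv_scale)]
        # M-step: closed form for the rate, one-dimensional solve for theta.
        rate = sum(events) / sum(u * t for u, t in zip(e_u, exposure))
        c = (sum(e_u) - sum(e_log_u)) / len(events) - 1.0
        new_theta = 1.0 / solve_shape(c)
        done = abs(new_theta - theta) < tol
        theta = new_theta
        if done:
            break
    return rate, theta, e_u  # e_u: posterior frailty means from the last E-step

# Simulated portfolio with known parameters: rate 0.3, theta 0.6.
rng = random.Random(7)
events, exposure = [], []
for _ in range(3000):
    u = rng.gammavariate(1.0 / 0.6, 0.6)
    t_end = rng.uniform(1.0, 5.0)
    t, n_i = rng.expovariate(0.3 * u), 0
    while t <= t_end:
        n_i += 1
        t += rng.expovariate(0.3 * u)
    events.append(n_i)
    exposure.append(t_end)

rate, theta, frailty = fit_gamma_frailty_em(events, exposure)
print(f"rate={rate:.3f} theta={theta:.3f}")
```

With covariates, the rate update would become a weighted regression step with the posterior frailty means as offsets. The theta update stays a one-dimensional problem.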

Bühlmann credibility connection

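For gamma frailty, the posterior frailty mean has exactly the form of a Bühlmann-Straub credibility estimate:

E[u_i | data] = Z_i * (N_i / Lambda_i) + (1 - Z_i) * 1.0

where N_i is the policyholder's observed claim count, Lambda_i is the expected claim count implied by their rating factors and exposure (the cumulative hazard), and Z_i = Lambda_i / (Lambda_i + 1/theta) is the credibility factor. The 1.0 is the prior mean frailty, i.e. an average risk. If you're already loading renewal rates with Bühlmann credibility, this model is doing the same thing, but in a survival analysis framework that accounts for exposure, censoring, and covariates. theta is the variance of the frailty across policyholders, which plays the role of the between-risk variance in credibility theory.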
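
A short worked example of the identity, as a sketch rather than the library's code. The function name and the input values are made up for illustration. It takes the observed claim count, the expected claim count Lambda_i, and theta, and checks that the credibility form (weight Z_i on the ratio of observed to expected claims) matches the conjugate gamma posterior mean.

```python
def posterior_frailty(n_events, expected_events, theta):
    """Posterior mean of a gamma frailty with prior mean 1 and variance theta.

    expected_events is Lambda_i: the expected claim count implied by the
    rating factors over the observed exposure.
    """
    z = expected_events / (expected_events + 1.0 / theta)
    observed_ratio = n_events / expected_events
    credibility_form = z * observed_ratio + (1.0 - z) * 1.0
    # Same quantity from the conjugate gamma posterior:
    # shape 1/theta + n_events, rate 1/theta + expected_events.
    gamma_form = (1.0 / theta + n_events) / (1.0 / theta + expected_events)
    assert abs(credibility_form - gamma_form) < 1e-12
    return credibility_form, z

mean, z = posterior_frailty(n_events=4, expected_events=1.5, theta=0.5)
print(f"frailty_mean={mean:.3f} credibility_factor={z:.3f}")
# frailty_mean=1.714 credibility_factor=0.429

clean_mean, _ = posterior_frailty(n_events=0, expected_events=1.5, theta=0.5)
print(f"frailty_mean={clean_mean:.3f}")
# frailty_mean=0.571
```

Four claims against 1.5 expected is a ratio of 2.67, but with Z at 0.43 the loading is pulled back to 1.71. A claim-free history over the same exposure earns a discount to 0.57, not to zero.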

Installation

pip install insurance-recurrent

Requires Python 3.10+, NumPy, SciPy, and pandas. No R dependencies.

Optional for HTML reports:

pip install insurance-recurrent[report]

Quick start

from insurance_recurrent import (
    RecurrentEventSimulator,
    RecurrentEventData,
    SharedFrailtyModel,
    FrailtyReport,
)

# Simulate fleet insurance data: 500 trucks, frailty variance = 0.6
sim = RecurrentEventSimulator(
    n_policies=500,
    theta=0.6,
    baseline_rate=0.3,
    coef={"vehicle_age": 0.4, "driver_age_band": -0.2},
    seed=42,
)
data, true_frailty = sim.simulate(return_true_frailty=True)

print(data.summary())
# RecurrentEventData (gap time)
#   Policies:    487
#   Events:      642
#   Intervals:   1129
#   Events/policy: mean=1.32, max=8

# Fit the model
model = SharedFrailtyModel(theta_init=1.0, max_iter=100)
model.fit(data, covariates=["vehicle_age", "driver_age_band"])
model.print_summary()
# ======================================================
# SharedFrailtyModel Summary
# ======================================================
#   Policies:          487
#   Events:            642
#   Log-likelihood:    -1247.3
#   Frailty variance (theta): 0.5812
#   Converged:         True (47 iters)
#
#   Coefficients:
#     vehicle_age               +0.3891
#     driver_age_band           -0.1947

# Get per-policy frailty scores
frailty_scores = model.predict_frailty(data)
# frailty_scores[0] = {
#     'policy_id': 'P000042',
#     'frailty_mean': 1.84,      # 84% more likely to claim than average
#     'credibility_factor': 0.73, # 73% weight on own history
#     'n_events': 4,
#     'exposure': 2.5,
# }

# Generate an HTML report
report = FrailtyReport(model, data, model_name="Fleet Q1 2026")
report.save("frailty_report.html")

Loading claim histories from policy tables

Most actuarial teams hold a policies table and a claims table, both keyed by policy_id, with inception, expiry, and claim events recorded as calendar dates:

import pandas as pd
from insurance_recurrent import RecurrentEventData, SharedFrailtyModel

policies = pd.read_csv("policies.csv")  # policy_id, inception_date, expiry_date, ...
claims = pd.read_csv("claims.csv")      # policy_id, claim_date

data = RecurrentEventData.from_policy_claims(
    policies=policies,
    claims=claims,
    covariate_cols=["vehicle_class", "region"],
    time_scale="gap",  # time resets after each claim
)

model = SharedFrailtyModel()
model.fit(data, covariates=["vehicle_class", "region"])
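
If you want to see what counting-process format means for a single policy, here is a plain-Python sketch. It is not how from_policy_claims is implemented: the function names and row layout are assumptions, and left truncation and multiple claim types are ignored. Each claim closes an interval with event=1 and the policy's expiry closes the last interval with event=0. In gap time every interval starts at zero; in calendar time intervals run on a single clock from inception.

```python
from datetime import date

def years_between(a, b):
    """Elapsed time from date a to date b in years."""
    return (b - a).days / 365.25

def to_counting_process(policy_id, inception, expiry, claim_dates, time_scale="gap"):
    """Split one policy's exposure into (start, stop, event) rows."""
    rows, last = [], inception
    claims = sorted(d for d in claim_dates if inception <= d <= expiry)
    for stop_date, event in [(d, 1) for d in claims] + [(expiry, 0)]:
        if time_scale == "gap":
            # Clock resets after each claim.
            start, stop = 0.0, years_between(last, stop_date)
        else:
            # "calendar": time measured from policy inception.
            start = years_between(inception, last)
            stop = years_between(inception, stop_date)
        if event == 1 or stop > start:  # skip an empty censored tail
            rows.append({"policy_id": policy_id, "start": start,
                         "stop": stop, "event": event})
        last = stop_date
    return rows

gap_rows = to_counting_process(
    "P1", date(2024, 1, 1), date(2025, 1, 1),
    [date(2024, 3, 1), date(2024, 9, 1)],
)
cal_rows = to_counting_process(
    "P1", date(2024, 1, 1), date(2025, 1, 1),
    [date(2024, 3, 1), date(2024, 9, 1)], time_scale="calendar",
)
for row in gap_rows:
    print(row)
```

A policy with two claims produces three rows: two event intervals and one censored tail. In general the number of intervals is the number of claims plus one per policy with remaining exposure.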

Handling informative lapse

If you suspect that high-risk policyholders leave your book faster (they buy cheaply elsewhere, or you non-renew them), use the joint model:

from insurance_recurrent import JointFrailtyModel

# claim_data: a RecurrentEventData instance, built as in the sections above
# lapse_df: DataFrame with one row per policy and lapse_time and lapsed columns
model = JointFrailtyModel(alpha_init=0.5)
model.fit(
    recurrent_data=claim_data,
    lapse_data=lapse_df,
    recurrent_covariates=["vehicle_class"],
)
print(f"Association alpha: {model.association_:.3f}")
# Positive alpha: high-frailty policyholders lapse faster
# => standard model underestimates their true claim rate
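
To see why informative lapse matters, here is a small standard-library simulation. It illustrates the bias mechanism only; it is not the joint model's estimation procedure, and the function name and parameter values are assumptions. Lapse hazard is taken to scale with frailty as u**alpha, so alpha = 0 means lapse is unrelated to claim propensity.

```python
import random

def pooled_claim_rate(alpha, n_policies=20000, theta=0.6, claim_rate=0.3,
                      lapse_rate=0.5, horizon=5.0, seed=0):
    """Observed claims per policy-year when lapse hazard scales with u**alpha."""
    rng = random.Random(seed)
    total_events, total_exposure = 0, 0.0
    for _ in range(n_policies):
        # Frailty with mean 1 and variance theta, shared by claims and lapse.
        u = rng.gammavariate(1.0 / theta, theta)
        # Follow-up ends at lapse or at the end of the observation window.
        follow_up = min(rng.expovariate(lapse_rate * u**alpha), horizon)
        # Claims arrive as a Poisson process with rate claim_rate * u.
        t = rng.expovariate(claim_rate * u)
        while t <= follow_up:
            total_events += 1
            t += rng.expovariate(claim_rate * u)
        total_exposure += follow_up
    return total_events / total_exposure

rate_independent = pooled_claim_rate(alpha=0.0)
rate_informative = pooled_claim_rate(alpha=1.0)
print(f"alpha=0: {rate_independent:.3f}  alpha=1: {rate_informative:.3f}")
```

With alpha = 0 the pooled rate sits near the true average of 0.3. With alpha = 1 it falls well below, because the most claim-prone policies leave early and contribute little exposure. An analysis that ignores the link sees a portfolio that looks safer than the population it was written from.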

When to use this

Good candidates:

Fleet insurance: one policy covers multiple vehicles/drivers, frequent events
Pet insurance: chronic conditions, repeat treatments
Home insurance: maintenance-related claims repeat for the same property

Less useful:

Personal motor: most policyholders have 0 or 1 claim per year — not enough within-policy data to estimate individual frailty
Single-event products: travel, single-trip

Diagnostics

from insurance_recurrent import (
    frailty_qq_data,
    cox_snell_residuals,
    event_rate_by_frailty_decile,
)

# QQ plot data: compare posterior frailty distribution to gamma prior
qq = frailty_qq_data(model, data)
# Plot qq["theoretical"] vs qq["empirical"] — straight line = good fit

# Cox-Snell residuals: should be ~Exp(1) if model correct
resid = cox_snell_residuals(model, data)

# Lift by frailty decile: key diagnostic for actuarial audiences
decile = event_rate_by_frailty_decile(model, data)
print(decile[["decile", "frailty_mean_avg", "observed_rate", "lift"]])
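
For reference, the lift table is simple to reproduce by hand. The sketch below is an assumption about what such a table contains, not the implementation of event_rate_by_frailty_decile. It sorts policies by posterior frailty mean, cuts them into equal-sized bins, and compares each bin's observed claim rate with the portfolio rate. The helper name and the toy inputs are hypothetical.

```python
def decile_lift_table(frailty_mean, n_events, exposure, n_bins=10):
    """Observed claim rate and lift by bin of posterior frailty mean."""
    order = sorted(range(len(frailty_mean)), key=lambda i: frailty_mean[i])
    overall_rate = sum(n_events) / sum(exposure)
    table = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if not idx:
            continue  # fewer policies than bins
        rate = sum(n_events[i] for i in idx) / sum(exposure[i] for i in idx)
        table.append({
            "decile": b + 1,
            "frailty_mean_avg": sum(frailty_mean[i] for i in idx) / len(idx),
            "observed_rate": rate,
            "lift": rate / overall_rate,
        })
    return table

# Toy input: 20 policies with two years of exposure each, cut into five bins.
frailty = [0.5 + 0.1 * i for i in range(20)]
claims = [i // 4 for i in range(20)]
years = [2.0] * 20
table = decile_lift_table(frailty, claims, years, n_bins=5)
for row in table:
    print(row["decile"], round(row["lift"], 2))
# 1 0.0
# 2 0.5
# 3 1.0
# 4 1.5
# 5 2.0
```

Bear in mind that posterior frailty is computed from the same claim counts, so in-sample lift is monotone almost by construction. A more demanding check scores frailty on one period and measures claim rates on the next.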

References

Cook, R.J. & Lawless, J.F. (2007). The Statistical Analysis of Recurrent Events. Springer.
Rondeau, V. et al. (2003). Maximum penalized likelihood estimation in a gamma-frailty model. Lifetime Data Analysis.
Bühlmann, H. & Gisler, A. (2005). A Course in Credibility Theory. Springer.
Vaupel, J.W. et al. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3):439–454.
Andersen, P.K. & Gill, R.D. (1982). Cox's regression model for counting processes. Annals of Statistics, 10(4):1100–1120.

License

MIT. See LICENSE.

Version 0.1.0, released Mar 12, 2026.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_recurrent-0.1.0.tar.gz (39.6 kB view details)

Uploaded Mar 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

insurance_recurrent-0.1.0-py3-none-any.whl (34.2 kB view details)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file insurance_recurrent-0.1.0.tar.gz.

File metadata

Download URL: insurance_recurrent-0.1.0.tar.gz
Upload date: Mar 12, 2026
Size: 39.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_recurrent-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a1d5d2d370be73a3f8062d6184e4bab3c1e85aedb71bb9a08504660c23615e26`
MD5	`d3eeae70e5764d7a464a2f9214721936`
BLAKE2b-256	`d6ed4a4a8d424687211bfa94792ddf61ffafb83134f76d99bd1d22ef5e629fe8`

See more details on using hashes here.

File details

Details for the file insurance_recurrent-0.1.0-py3-none-any.whl.

File metadata

Download URL: insurance_recurrent-0.1.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 34.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_recurrent-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a46bc6bb95c0d4ff158e2475de65827f3b66a32657a78638a83ec0e8570acfd`
MD5	`e3dad2834ed5657468878c2bc4c92d8f`
BLAKE2b-256	`e3d592819b704ce0849dac057158d46fa6a979640c21642ff9dc1413399698ba`

See more details on using hashes here.

insurance-recurrent 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

insurance-recurrent

The problem

Why there's no existing Python library for this

What it does

Bühlmann credibility connection

Installation

Quick start

Loading claim histories from policy tables

Handling informative lapse

When to use this

Diagnostics

References

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes