
Fine-Gray subdistribution hazard regression for competing risks — built for insurance pricing

Project description

insurance-competing-risks


The problem

When a policy can exit in more than one way, standard single-event survival models give the wrong answer.

A motor policy that lapses cannot also generate a mid-term cancellation. A house that burns cannot also flood. Once one event happens, the others are permanently prevented. These are competing risks, and they require a different statistical framework.

The standard fix — fitting a separate Cox model per cause and treating the other causes as censored — answers the wrong question. It tells you how the hazard rate among currently-at-risk subjects changes with covariates. It does not tell you how the probability of a specific exit route changes. For pricing, underwriting, and retention analysis, you almost always want the probability.

Fine and Gray (1999) solved this. Their subdistribution hazard model has a one-to-one correspondence with the Cumulative Incidence Function (CIF): the probability that cause k occurs before time t, given covariates. Fit a Fine-Gray model, and you can directly predict "what is the probability this customer lapses within 12 months?" while properly accounting for mid-term cancellation and claim-driven churn as competing events.
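The difference between the two questions shows up even without covariates. The sketch below is plain NumPy, not the package's implementation; the toy durations and the event coding (0 = censored, 1 = lapse, 2 = competing exit) are assumptions for illustration. It compares the Aalen-Johansen CIF with the naive approach of taking one minus a Kaplan-Meier curve in which competing exits are treated as censored.

```python
import numpy as np

def aalen_johansen_cif(T, E, cause):
    """Aalen-Johansen CIF for `cause` at each distinct event time (E == 0 is censored)."""
    T, E = np.asarray(T, float), np.asarray(E)
    times = np.unique(T[E != 0])
    surv, cif, out = 1.0, 0.0, []   # surv = all-cause event-free survival S(t-)
    for t in times:
        at_risk = np.sum(T >= t)
        d_all = np.sum((T == t) & (E != 0))
        d_k = np.sum((T == t) & (E == cause))
        cif += surv * d_k / at_risk      # CIF jump uses survival just before t
        surv *= 1.0 - d_all / at_risk    # then update all-cause survival
        out.append(cif)
    return times, np.array(out)

def naive_one_minus_km(T, E, cause):
    """1 - Kaplan-Meier with every non-`cause` exit treated as censored (biased)."""
    T, E = np.asarray(T, float), np.asarray(E)
    times = np.unique(T[E != 0])
    km, out = 1.0, []
    for t in times:
        at_risk = np.sum(T >= t)
        d_k = np.sum((T == t) & (E == cause))
        km *= 1.0 - d_k / at_risk
        out.append(1.0 - km)
    return times, np.array(out)

T = np.array([1, 2, 3, 4, 5, 6, 7, 8])
E = np.array([1, 2, 1, 2, 0, 1, 2, 0])   # toy data: 1 = lapse, 2 = competing exit
_, aj = aalen_johansen_cif(T, E, cause=1)
_, naive = naive_one_minus_km(T, E, cause=1)
print(aj[-1], naive[-1])   # AJ = 5/12 (about 0.417); naive is about 0.514
```

The naive estimate is higher because it treats policies that already left via a competing route as though they could still lapse later. With no competing exits in the data, the two estimates coincide.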

The gap this fills

At the time of writing, no pure-Python, pip-installable library provides Fine-Gray regression:

  • lifelines: has Aalen-Johansen CIF, no Fine-Gray regression
  • scikit-survival: non-parametric CIF from v0.24, no regression
  • hazardous: gradient-boosted CIF, no interpretable SHRs
  • cmprsk (Python): wraps R via rpy2, requires R runtime
  • pydts: discrete time only

insurance-competing-risks fills the gap with a pure NumPy/SciPy implementation.

Insurance use cases

Home insurance — competing perils: model time-to-first-claim where causes are fire, escape of water, flood, and subsidence. The Fine-Gray CIF gives the probability of each peril being the first reported, accounting for the fact that claiming flood prevents a separate subsidence claim on the same policy.

Retention analysis: a policy exits via lapse, mid-term cancellation (MTC), not taken up (NTU), or claim-driven churn. A Fine-Gray model with premium uplift and tenure as covariates directly estimates the lapse probability at renewal, properly accounting for the competing exits.
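Fitting any of these models first requires each policy's exit route coded as an integer. A minimal pandas sketch, assuming a hypothetical extract with `policy_years` and `exit_reason` columns and an assumed coding (0 = still in force, i.e. censored; 1 = lapse, the event of interest; 2 to 4 = competing exits):

```python
import pandas as pd

# Hypothetical policy-level extract: exit_reason is None while still in force.
policies = pd.DataFrame({
    "policy_years": [0.4, 1.0, 0.1, 0.7, 1.0],
    "exit_reason": ["mtc", "lapse", "ntu", "claim_churn", None],
})

# Assumed coding: 0 = censored, 1 = lapse, 2+ = competing exits.
codes = {"lapse": 1, "mtc": 2, "ntu": 3, "claim_churn": 4}
policies["T"] = policies["policy_years"]
policies["E"] = policies["exit_reason"].map(codes).fillna(0).astype(int)
print(policies[["T", "E"]])
```

Unmapped values (here, in-force policies) become NaN under `map` and are filled with 0, so they are treated as censored at the data cut-off.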

Motor claims: first claim type (own damage, third-party property damage (TPPD), third-party bodily injury (TPBI), windscreen, theft) as competing events. Useful for understanding which perils drive early claims by risk segment.

Installation

pip install insurance-competing-risks

Quick start

import numpy as np
from insurance_competing_risks import FineGrayFitter, AalenJohansenFitter
from insurance_competing_risks.datasets import simulate_insurance_retention

df = simulate_insurance_retention(n=1000, seed=0)

# 1. Non-parametric CIF: what is the marginal lapse probability over time?
aj = AalenJohansenFitter()
aj.fit(df["T"], df["E"], event_of_interest=1)
aj.plot()  # step plot with 95% confidence band

# 2. Regression: how does premium uplift affect lapse probability?
fg = FineGrayFitter()
fg.fit(
    df[["T", "E", "premium_uplift", "tenure_years", "ncd_years"]],
    duration_col="T",
    event_col="E",
    event_of_interest=1,  # lapse
)
print(fg.summary)  # coef, SHR = exp(coef), 95% CI, p-value per covariate

# 3. Predict the CIF for individual customers (here, the first five rows)
times = np.array([0.25, 0.5, 1.0])  # policy years
cif = fg.predict_cumulative_incidence(df.head(5), times=times)
print(cif)  # shape (5, 3): probability of lapsing by each time

# 4. Partial effects: how do a 5% premium cut and 10% and 30% uplifts change lapse risk?
fg.plot_partial_effects_on_outcome("premium_uplift", values=[-0.05, 0.10, 0.30])

Modules

Module      What it does
cif         Aalen-Johansen non-parametric CIF estimator with confidence bands
fine_gray   Fine-Gray regression: FineGrayFitter with lifelines-compatible API
gray_test   Gray's K-sample test for CIF equality across groups
metrics     IPCW Brier score, integrated Brier score, cause-specific C-index, calibration curves
datasets    Bone marrow transplant benchmark; synthetic insurance retention data
plots       Forest plot, stacked CIF, Brier score over time
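To give a feel for what a competing-risks concordance measures, here is a simplified, unweighted sketch. It is an illustration only, not the package's `competing_risks_c_index`, whose exact definition (and any censoring weights it applies) may differ; the function name `cr_concordance` and the toy data are hypothetical.

```python
import numpy as np

def cr_concordance(risk, T, E, cause=1):
    """Unweighted competing-risks concordance (hypothetical helper, no IPCW).

    A pair (i, j) is comparable when i has the `cause` event and j is known to
    be free of that event for longer: either j is still under observation past
    T_i, or j already exited via a competing cause (so it can never have it).
    """
    risk, T, E = np.asarray(risk, float), np.asarray(T, float), np.asarray(E)
    concordant = comparable = 0.0
    for i in np.flatnonzero(E == cause):
        others = np.arange(len(T)) != i
        ok = others & ((T > T[i]) | ((E != 0) & (E != cause)))
        comparable += ok.sum()
        # Higher predicted risk for the earlier cause-k event counts as concordant
        concordant += np.sum(risk[ok] < risk[i]) + 0.5 * np.sum(risk[ok] == risk[i])
    return concordant / comparable

T = np.array([1.0, 2.0, 3.0, 4.0])
E = np.array([1, 2, 1, 0])               # 0 = censored, 1 = cause of interest, 2 = competing
risk = np.array([0.9, 0.1, 0.5, 0.2])    # e.g. predicted CIF at a fixed horizon
print(cr_concordance(risk, T, E))        # 1.0: every comparable pair is ordered correctly
```

The key difference from an ordinary survival C-index is that a subject who exited via a competing cause stays comparable against every cause-k event, however early that competing exit happened.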

Fine-Gray: the key ideas

The subdistribution hazard for cause k is:

lambda_k(t) = -d/dt log(1 - F_k(t))

where F_k(t) is the CIF. This is modelled proportionally:

lambda_k(t | x) = lambda_k0(t) * exp(beta_k' x)

exp(beta_k) is the subdistribution hazard ratio (SHR). Because of the one-to-one relationship between the subdistribution hazard and the CIF, the SHR has a direct probability reading: an SHR of 1.5 for premium uplift means the subdistribution hazard for lapse is 50% higher for each unit increase in premium uplift, so the CIF (the lapse probability) is higher at every time t, though not 50% higher.
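Integrating the subdistribution hazard gives 1 - F_k(t | x) = (1 - F_k0(t))^exp(beta_k' x), where F_k0 is the baseline CIF (all covariates zero). The sketch below applies an SHR of 1.5 to a made-up baseline lapse CIF; the baseline values are hypothetical, chosen only to show how the CIF responds.

```python
import numpy as np

def cif_given_shr(baseline_cif, linear_predictor):
    """F_k(t | x) = 1 - (1 - F_k0(t)) ** exp(beta_k' x) under the Fine-Gray model."""
    baseline_cif = np.asarray(baseline_cif, float)
    return 1.0 - (1.0 - baseline_cif) ** np.exp(linear_predictor)

baseline = np.array([0.05, 0.12, 0.30])          # hypothetical baseline lapse CIF at 3, 6, 12 months
shr = 1.5
lapse_cif = cif_given_shr(baseline, np.log(shr))  # one unit higher premium uplift
ratio = lapse_cif / baseline
print(np.round(lapse_cif, 3))  # [0.074 0.174 0.414]
print(np.round(ratio, 2))      # [1.48 1.45 1.38]
```

The CIF rises at every horizon, but the ratio stays below the SHR and shrinks as the baseline probability grows, which is the "though not 50% higher" in practice.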

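The key estimation challenge is the extended risk set: subjects who have already experienced a competing event stay in the risk set for cause k, reflecting that in the subdistribution sense they are still "at risk" of the cause-k event (even though they can never experience it). When follow-up is censored, their contribution is down-weighted over time using inverse probability of censoring weights (IPCW). This is what makes Fine-Gray different from cause-specific Cox, where such subjects leave the risk set at their competing event.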
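A minimal sketch of those risk-set weights at a single time t, using one common convention: w_j(t) = G(t) / G(T_j) for a subject j whose competing event happened at T_j < t, where G is the Kaplan-Meier estimate of the censoring distribution. This is an illustration under stated assumptions (right-continuous G, no special tie handling), not the package's code.

```python
import numpy as np

def censoring_km(T, E):
    """Right-continuous Kaplan-Meier estimate G(s) of censoring survival (E == 0 is censored)."""
    T, E = np.asarray(T, float), np.asarray(E)
    cens_times = np.unique(T[E == 0])
    factors = [1.0 - np.sum((T == c) & (E == 0)) / np.sum(T >= c) for c in cens_times]
    grid = np.concatenate(([-np.inf], cens_times))
    values = np.concatenate(([1.0], np.cumprod(factors)))
    # G(s) = product of the factors over censoring times <= s
    return lambda s: values[np.searchsorted(grid, s, side="right") - 1]

def fine_gray_weights(T, E, t, cause=1):
    """Weight of every subject in the subdistribution risk set for `cause` at time t."""
    T, E = np.asarray(T, float), np.asarray(E)
    G = censoring_km(T, E)
    w = np.zeros(len(T))                           # censored or cause-k event before t: out
    w[T >= t] = 1.0                                # no exit yet: ordinary risk set
    competing = (T < t) & (E != 0) & (E != cause)  # already exited via a competing cause
    w[competing] = G(t) / G(T[competing])          # stays in, down-weighted by IPCW
    return w

T = np.array([1, 2, 3, 4, 5, 6])
E = np.array([2, 0, 1, 2, 0, 1])   # 0 = censored, 1 = cause of interest, 2 = competing
print(fine_gray_weights(T, E, t=6))  # weights 0.4, 0, 0, 0.5, 0, 1
```

Without any censoring, G is 1 everywhere and subjects with a competing event simply stay in the risk set with weight 1.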

Model summary output

Fine-Gray Subdistribution Hazard Model
Event of interest: 1
Duration column: T
Event column: E
Log partial-likelihood: -487.3201

                coef  exp(coef)  se(coef)      z         p  lower_95%  upper_95%
covariate
premium_uplift  1.52       4.57      0.21   7.24  4.5e-13       1.11       1.93
tenure_years   -0.14       0.87      0.03  -4.81  1.5e-06      -0.20      -0.08
ncd_years      -0.05       0.95      0.02  -2.50  1.2e-02      -0.09      -0.01
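In this example output, lower_95% and upper_95% equal coef ± 1.96 × se, so they are on the log-SHR scale. To report intervals on the SHR scale, exponentiate the endpoints; a sketch using the coefficients and standard errors from the table above:

```python
import numpy as np

# coef and se copied from the example summary: premium_uplift, tenure_years, ncd_years
coef = np.array([1.52, -0.14, -0.05])
se = np.array([0.21, 0.03, 0.02])
z = 1.959964  # 97.5% quantile of the standard normal

shr = np.exp(coef)
shr_lower = np.exp(coef - z * se)
shr_upper = np.exp(coef + z * se)
print(np.round(np.column_stack([shr, shr_lower, shr_upper]), 2))
```

All three intervals exclude 1 on the SHR scale, matching the small p-values.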

Gray's test

Before fitting a regression model, test whether the CIFs differ between groups:

from insurance_competing_risks import gray_test

result = gray_test(df["T"], df["E"], df["rating_band"], event_of_interest=1)
print(result)
# Gray's 3-Sample CIF Test (cause 1)
#   chi^2 = 12.34  df = 2  p = 0.0021
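Gray's K-sample statistic is referred to a chi-squared distribution with K - 1 degrees of freedom, so three rating bands give df = 2. The reported p-value can be reproduced from the statistic with SciPy:

```python
from scipy.stats import chi2

k_groups = 3          # number of rating bands compared
statistic = 12.34     # chi^2 from the example output
p_value = chi2.sf(statistic, df=k_groups - 1)
print(f"p = {p_value:.4f}")  # p = 0.0021
```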

Evaluation

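import numpy as np
from insurance_competing_risks import FineGrayFitter
from insurance_competing_risks.metrics import (
    competing_risks_brier_score,
    integrated_brier_score,
    competing_risks_c_index,
)

# Hold out 30% of policies for evaluation and fit on the rest
rng = np.random.default_rng(0)
is_test = rng.random(len(df)) < 0.3
train_df, test_df = df[~is_test], df[is_test]

fg = FineGrayFitter()
fg.fit(
    train_df[["T", "E", "premium_uplift", "tenure_years", "ncd_years"]],
    duration_col="T",
    event_col="E",
    event_of_interest=1,
)

times = np.linspace(0.1, 2.0, 20)
cif_test = fg.predict_cumulative_incidence(test_df, times=times)

# Brier score at each time
bs = competing_risks_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1
)

# Integrated Brier Score
ibs = integrated_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1
)
print(f"IBS: {ibs:.4f}")  # lower is better; a constant 0.5 prediction scores 0.25, so compare against a covariate-free baseline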
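For intuition about what an IPCW Brier score computes at a single horizon t, here is an independent sketch (not the package's `competing_risks_brier_score`; the helper names are hypothetical). Observed exits by t are weighted by 1 / G(T_i-), subjects still event-free at t by 1 / G(t), and subjects censored before t get weight 0, with G the Kaplan-Meier censoring survival estimated on the training outcomes.

```python
import numpy as np

def censoring_survival(T_train, E_train):
    """Kaplan-Meier censoring survival, returned as G(s) and its left limit G(s-)."""
    T, E = np.asarray(T_train, float), np.asarray(E_train)
    cens_times = np.unique(T[E == 0])
    factors = [1.0 - np.sum((T == c) & (E == 0)) / np.sum(T >= c) for c in cens_times]
    grid = np.concatenate(([-np.inf], cens_times))
    values = np.concatenate(([1.0], np.cumprod(factors)))
    G = lambda s: values[np.searchsorted(grid, s, side="right") - 1]
    G_minus = lambda s: values[np.searchsorted(grid, s, side="left") - 1]
    return G, G_minus

def ipcw_brier_at(cif_at_t, T_test, E_test, T_train, E_train, t, cause=1):
    """IPCW Brier score at horizon t for the CIF of `cause` (E == 0 is censored)."""
    F = np.asarray(cif_at_t, float)
    T, E = np.asarray(T_test, float), np.asarray(E_test)
    G, G_minus = censoring_survival(T_train, E_train)
    y = ((T <= t) & (E == cause)).astype(float)   # observed cause-k status at t
    w = np.zeros(len(T))                          # censored before t: weight 0
    exited = (T <= t) & (E != 0)                  # any observed exit by t
    w[exited] = 1.0 / G_minus(T[exited])
    w[T > t] = 1.0 / G(t)                         # still event-free at t
    return np.mean(w * (y - F) ** 2)

T = np.array([1.0, 2.0, 3.0, 4.0])
E = np.array([1, 2, 1, 1])
cif_at_2_5 = np.array([0.9, 0.2, 0.3, 0.1])   # hypothetical predictions at t = 2.5
print(ipcw_brier_at(cif_at_2_5, T, E, T, E, t=2.5))
```

With no censoring, every weight is 1 and the score reduces to the mean squared error between the predicted CIF and the observed cause-k indicator at t.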

References

Fine, J.P. & Gray, R.J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496–509.

Gray, R.J. (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics, 16(3), 1141–1154.

Milhaud, X. & Dutang, C. (2018). Lapse tables for lapse risk management in insurance: a competing risk approach. European Actuarial Journal, 8(1), 97–126.

Putter, H., Fiocco, M. & Geskus, R.B. (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine, 26(11), 2389–2430.


Part of the Burning Cost insurance pricing library ecosystem.



Download files


Source Distribution

insurance_competing_risks-0.1.0.tar.gz (34.7 kB)

Uploaded Source

Built Distribution


insurance_competing_risks-0.1.0-py3-none-any.whl (28.4 kB)

Uploaded Python 3

File details

Details for the file insurance_competing_risks-0.1.0.tar.gz.

File metadata

  • Download URL: insurance_competing_risks-0.1.0.tar.gz
  • Upload date:
  • Size: 34.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8

File hashes

Hashes for insurance_competing_risks-0.1.0.tar.gz
Algorithm Hash digest
SHA256 667bdd1045f98ffa1f4e7c6c02a9e6e215a4234551029e0aa362eaca9a2e0d9d
MD5 c48d35093517dbfe0087e3b5457a40c4
BLAKE2b-256 7d9b7e69900373e89e0fc3a9970a7735c4a3ccd7933af63339e9af0514646d41


File details

Details for the file insurance_competing_risks-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: insurance_competing_risks-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 28.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8

File hashes

Hashes for insurance_competing_risks-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 82702143efb7c8a8fd8826e69540174571ff6667c7b70e6a9d06ff2ee6fe010d
MD5 0c80fe330c881d75badd3eca886d4f00
BLAKE2b-256 22e68473b66109983a7167c4ca200ffbc243507e4b63a22e078f2e187ab05c52

