insurance-competing-risks
Fine-Gray subdistribution hazard regression for competing risks — built for insurance pricing.
The problem
When a policy can exit in more than one way, standard single-event survival models give misleading probabilities.
A motor policy that lapses cannot also generate a mid-term cancellation. A house that burns cannot also flood. Once one event happens, the others are permanently prevented. These are competing risks, and they require a different statistical framework.
The standard fix — fitting a separate Cox model per cause and treating the other causes as censored — answers the wrong question. It tells you how the hazard rate among currently-at-risk subjects changes with covariates. It does not tell you how the probability of a specific exit route changes. For pricing, underwriting, and retention analysis, you almost always want the probability.
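To make the difference concrete, here is a minimal pure-Python sketch (not this library's code) that compares two estimates of the lapse probability on hypothetical toy data: the complement of a Kaplan-Meier curve that treats the competing exit as censoring, and the Aalen-Johansen cumulative incidence. The function names and the event coding (0 = censored, 1 = lapse, 2 = mid-term cancellation) are assumptions made for illustration.

```python
def km_complement(times, events, cause):
    """1 - Kaplan-Meier for `cause`, treating every other exit as censoring."""
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        surv *= 1.0 - d / at_risk
    return 1.0 - surv


def aalen_johansen(times, events, cause):
    """Aalen-Johansen cumulative incidence for `cause` at the last observed time."""
    overall_surv = 1.0  # probability of no exit of any cause so far
    cif = 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        d_any = sum(1 for u, e in zip(times, events) if u == t and e != 0)
        cif += overall_surv * d_cause / at_risk  # exit by `cause` exactly at t
        overall_surv *= 1.0 - d_any / at_risk
    return cif


# Hypothetical toy data: 0 = censored, 1 = lapse, 2 = mid-term cancellation
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [2, 1, 2, 1, 0, 2, 1, 0]

naive = km_complement(times, events, cause=1)
aj_cif = aalen_johansen(times, events, cause=1)
print(f"1 - KM: {naive:.3f}   Aalen-Johansen: {aj_cif:.3f}")
# 1 - KM: 0.657   Aalen-Johansen: 0.417
```

The Kaplan-Meier complement is larger because it describes a hypothetical world in which mid-term cancellation cannot happen; the Aalen-Johansen estimate is the share of policies that actually lapse.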
Fine and Gray (1999) solved this. Their subdistribution hazard model has a one-to-one correspondence with the Cumulative Incidence Function (CIF): the probability that cause k occurs before time t, given covariates. Fit a Fine-Gray model, and you can directly predict "what is the probability this customer lapses within 12 months?" while properly accounting for mid-term cancellation and claim-driven churn as competing events.
The gap this fills
No pure-Python, pip-installable library provides Fine-Gray regression:
- lifelines: has Aalen-Johansen CIF, no Fine-Gray regression
- scikit-survival: non-parametric CIF from v0.24, no regression
- hazardous: gradient-boosted CIF, no interpretable SHRs
- cmprsk (Python): wraps R via rpy2, requires R runtime
- pydts: discrete time only
insurance-competing-risks fills the gap with a pure NumPy/SciPy implementation.
Insurance use cases
Home insurance — competing perils: model time-to-first-claim where causes are fire, escape of water, flood, and subsidence. The Fine-Gray CIF gives the probability of each peril being the first reported, accounting for the fact that claiming flood prevents a separate subsidence claim on the same policy.
Retention analysis: a policy exits via lapse, mid-term cancellation (MTC), not taken up (NTU), or claim-driven churn. A Fine-Gray model with premium uplift and tenure as covariates directly estimates the lapse probability at renewal while accounting for the competing exits.
Motor claims: first claim type (own damage, third-party property damage (TPPD), third-party bodily injury (TPBI), windscreen, theft) as competing events. Useful for understanding which perils drive early claims by risk segment.
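Each of these use cases reduces to one duration and one integer event code per policy. The sketch below shows one possible way to derive them from policy records. The `EVENT_CODES` mapping, the helper `to_duration_event`, and the convention that 0 means censored are assumptions for illustration, not part of this library's documented API.

```python
from datetime import date

# Hypothetical exit-reason coding; 0 is reserved here for censored policies.
EVENT_CODES = {"lapse": 1, "mtc": 2, "ntu": 3, "claim_churn": 4}


def to_duration_event(start, end, exit_reason, observation_end):
    """Return (duration in policy years, event code) for a single policy."""
    still_active = exit_reason is None or end is None or end > observation_end
    if still_active:
        # No exit seen inside the observation window: censor at its end.
        return (observation_end - start).days / 365.25, 0
    return (end - start).days / 365.25, EVENT_CODES[exit_reason]


cutoff = date(2024, 1, 1)
cancelled = to_duration_event(date(2023, 1, 1), date(2023, 7, 2), "mtc", cutoff)
in_force = to_duration_event(date(2023, 1, 1), None, None, cutoff)
print(cancelled, in_force)
```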
Installation
```
pip install insurance-competing-risks
```
Quick start
```python
import numpy as np

from insurance_competing_risks import FineGrayFitter, AalenJohansenFitter
from insurance_competing_risks.datasets import simulate_insurance_retention

df = simulate_insurance_retention(n=1000, seed=0)

# 1. Non-parametric CIF: what is the marginal lapse probability over time?
aj = AalenJohansenFitter()
aj.fit(df["T"], df["E"], event_of_interest=1)
aj.plot()  # step plot with 95% confidence band

# 2. Regression: how does premium uplift affect lapse probability?
fg = FineGrayFitter()
fg.fit(
    df[["T", "E", "premium_uplift", "tenure_years", "ncd_years"]],
    duration_col="T",
    event_col="E",
    event_of_interest=1,  # lapse
)
print(fg.summary)  # coefficient, SHR (exp(coef)), 95% CI, p-value per covariate

# 3. Predict CIF for new customers
times = np.array([0.25, 0.5, 1.0])  # policy years
cif = fg.predict_cumulative_incidence(df.head(5), times=times)
print(cif)  # shape (5, 3): probability of lapsing by each time

# 4. Partial effects: how do premium changes of -5%, +10% and +30% shift lapse risk?
fg.plot_partial_effects_on_outcome("premium_uplift", values=[-0.05, 0.10, 0.30])
```
Modules
| Module | What it does |
|---|---|
| cif | Aalen-Johansen non-parametric CIF estimator with confidence bands |
| fine_gray | Fine-Gray regression: FineGrayFitter with lifelines-compatible API |
| gray_test | Gray's K-sample test for CIF equality across groups |
| metrics | IPCW Brier score, integrated Brier score, cause-specific C-index, calibration curves |
| datasets | Bone marrow transplant benchmark; synthetic insurance retention data |
| plots | Forest plot, stacked CIF, Brier score over time |
Fine-Gray: the key ideas
The subdistribution hazard for cause k is:
lambda_k(t) = -d/dt log(1 - F_k(t))
where F_k(t) is the CIF. This is modelled proportionally:
lambda_k(t | x) = lambda_k0(t) * exp(beta_k' x)
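Here exp(beta_k) is the subdistribution hazard ratio (SHR). Because the subdistribution hazard maps one-to-one onto the CIF, the sign of beta_k gives the direction of the covariate's effect on the CIF. An SHR of 1.5 for premium uplift means the subdistribution hazard for lapse is 50% higher for each unit increase in premium uplift. That always implies a higher CIF (higher lapse probability), but the CIF itself is not multiplied by 1.5: under the model, 1 - F_k(t | x) = (1 - F_k0(t))^exp(beta_k' x), where F_k0(t) is the baseline CIF, so the relative change in the CIF depends on the baseline level.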
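A short worked example may help here. Integrating the proportional subdistribution hazard gives 1 - F_k(t | x) = (1 - F_k0(t))^SHR for a baseline CIF F_k0(t). The sketch below applies an SHR of 1.5 to three illustrative baseline lapse probabilities; the helper name `cif_under_shr` and the baseline values are hypothetical.

```python
def cif_under_shr(baseline_cif, shr):
    """CIF implied by a proportional subdistribution hazard with ratio `shr`."""
    return 1.0 - (1.0 - baseline_cif) ** shr


baselines = [0.05, 0.20, 0.50]  # illustrative baseline lapse probabilities
ratios = []
for base in baselines:
    shifted = cif_under_shr(base, shr=1.5)
    ratios.append(shifted / base)
    print(f"baseline {base:.2f} -> {shifted:.3f} (CIF ratio {shifted / base:.2f})")
```

The CIF ratio stays below 1.5 in every case and shrinks as the baseline probability grows, which is why an SHR should be read as a direction and rough magnitude rather than as a multiplier on the lapse probability.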
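The key estimation challenge is the extended risk set. Subjects who have already experienced a competing event are not removed: they stay in the risk set for cause k, with inverse-probability-of-censoring weights (IPCW) that shrink over time to reflect the chance that they would have been censored by then. In the subdistribution sense they are still "at risk" of the cause-k event, even though it can no longer happen to them. This is what distinguishes Fine-Gray from cause-specific Cox, where those subjects leave the risk set at their competing event.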
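The weighting can be sketched in a few lines of pure Python. This is an illustration under simplifying assumptions, not this library's implementation: event times are distinct, G is a right-continuous Kaplan-Meier estimate of the censoring distribution, and the names `censoring_survival` and `fine_gray_weight` are hypothetical. A subject with a competing event at time T_i keeps weight G(t) / G(T_i) at later times t.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate G(t) of staying uncensored (censoring is the 'event')."""
    steps = []  # (time, value of G from that time onwards)
    g = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        censored = sum(1 for u, e in zip(times, events) if u == t and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((t, g))

    def G(t):
        value = 1.0
        for s, g_s in steps:
            if s <= t:
                value = g_s
        return value

    return G


def fine_gray_weight(t, t_i, e_i, G, cause=1):
    """Weight of a subject with exit time t_i and event code e_i in the risk set at t."""
    if t_i >= t:
        return 1.0  # still under observation: ordinary risk-set member
    if e_i not in (0, cause):
        return G(t) / G(t_i)  # earlier competing event: kept with a shrinking weight
    return 0.0  # censored, or already had the event of interest


# Hypothetical toy data: 0 = censored, 1 = lapse, 2 = competing exit
times = [1, 2, 3, 4, 5, 6]
events = [2, 0, 1, 0, 2, 1]
G = censoring_survival(times, events)

# The subject with a competing exit at t=1 stays in later risk sets.
weights = [fine_gray_weight(t, 1, 2, G) for t in (1, 3, 5)]
print(weights)
```

With no censoring G is identically 1, so subjects with competing events simply stay in the risk set with full weight.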
Model summary output
```
Fine-Gray Subdistribution Hazard Model
Event of interest: 1
Duration column: T
Event column: E
Log partial-likelihood: -487.3201

                coef  exp(coef)  se(coef)      z        p  lower_95%  upper_95%
covariate
premium_uplift  1.52       4.57      0.21   7.24  4.5e-13       1.11       1.93
tenure_years   -0.14       0.87      0.03  -4.81  1.5e-06      -0.20      -0.08
ncd_years      -0.05       0.95      0.02  -2.50  1.2e-02      -0.09      -0.01
```
Gray's test
Before fitting a regression model, test whether the CIFs differ between groups:
```python
from insurance_competing_risks import gray_test

result = gray_test(df["T"], df["E"], df["rating_band"], event_of_interest=1)
print(result)
# Gray's 3-Sample CIF Test (cause 1)
# chi^2 = 12.34 df = 2 p = 0.0021
```
Evaluation
```python
import numpy as np

from insurance_competing_risks.metrics import (
    competing_risks_brier_score,
    integrated_brier_score,
    competing_risks_c_index,
)

# train_df / test_df: a train/test split of df, with fg fitted on train_df
times = np.linspace(0.1, 2.0, 20)
cif_test = fg.predict_cumulative_incidence(test_df, times=times)

# Brier score at each time
bs = competing_risks_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1,
)

# Integrated Brier score
ibs = integrated_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1,
)
print(f"IBS: {ibs:.4f}")  # lower is better; a constant 0.5 prediction scores 0.25
```
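When reading these scores, it can help to compare against a covariate-free benchmark rather than only against 0.25. The sketch below ignores censoring weights and uses a hypothetical helper, `constant_brier`, to show that a constant 0.5 prediction always scores 0.25, while predicting the event rate itself scores rate * (1 - rate), which is much lower when lapses are rare.

```python
def constant_brier(prediction, event_rate):
    """Expected Brier score of one constant prediction (no censoring assumed)."""
    return event_rate * (1.0 - prediction) ** 2 + (1.0 - event_rate) * prediction ** 2


event_rate = 0.10  # illustrative 12-month lapse rate
coin_flip = constant_brier(0.5, event_rate)
null_model = constant_brier(event_rate, event_rate)
print(f"constant 0.5: {coin_flip:.3f}   constant event rate: {null_model:.3f}")
```

A fitted model is only adding value if it beats the second figure, so a score well below 0.25 is not evidence of skill on its own.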
References
Fine, J.P. & Gray, R.J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496–509.
Gray, R.J. (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics, 16(3), 1141–1154.
Milhaud, X. & Dutang, C. (2018). Lapse tables for lapse risk management in insurance: a competing risk approach. European Actuarial Journal, 8(1), 97–126.
Putter, H., Fiocco, M. & Geskus, R.B. (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine, 26(11), 2389–2430.
Part of the Burning Cost insurance pricing library ecosystem.