Joint frequency-severity modelling for insurance pricing: Sarmanov copula, compound distributions, and neural two-part models for UK personal lines claims.
insurance-frequency-severity
Sarmanov copula joint frequency-severity modelling — analytical premium correction without refitting your GLMs.
The problem
Every UK motor pricing team multiplies a Poisson frequency GLM by a Gamma severity GLM and calls it pure premium. This assumes claim count and average severity are independent given the rating factors — they are not.
In UK motor, the NCD structure suppresses borderline claims: policyholders aware of the NCD threshold do not report near-miss incidents. The result is a systematic negative correlation between claim count and average severity. Ignoring this biases the pure premium, and the bias concentrates in your highest-risk accounts. Vernic, Bolancé and Alemany (2022) measured this at €5–55+ per policyholder on a Spanish auto book. The directional effect in UK motor is the same.
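The size of this bias is easy to see on simulated data. A minimal sketch in plain NumPy (nothing from this library; the 40% severity shrink per extra claim is a deliberately exaggerated, hypothetical NCD effect):

```python
import numpy as np

# Toy book: policies with more claims get systematically lower average
# severity, mimicking NCD-driven suppression of small claims.
rng = np.random.default_rng(0)
n_policies = 200_000
counts = rng.poisson(0.3, n_policies)
claimed = counts > 0

# Hypothetical effect: mean severity shrinks 40% per extra claim.
scale = 800.0 * 0.6 ** (counts[claimed] - 1)
avg_sev = rng.gamma(3.0, scale)

true_pp = (counts[claimed] * avg_sev).sum() / n_policies  # E[N*S]
indep_pp = counts.mean() * avg_sev.mean()                 # E[N] * E[S | N>0]
print(f"independence overstates pure premium by {indep_pp / true_pp - 1:.1%}")
```

Under negative dependence the product E[N] × E[S] sits above the true pure premium; the correction factor discussed below is exactly the ratio that removes this gap.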
Blog post: Your Frequency-Severity Independence Assumption Is Costing You Premium
Why this library?
Standard copulas (Gaussian, Clayton) require a probability integral transform for the discrete frequency margin, and the copula guaranteed by Sklar's theorem is not unique for discrete margins. The Sarmanov bivariate distribution sidesteps this entirely by working directly with the joint density, giving you analytically closed-form per-policy correction factors without simulation.
IFM estimation means you plug in your already-fitted statsmodels GLM objects. The library estimates the dependence parameter omega on top of your existing models. You do not refit the marginals.
Compared to alternatives
| | Independent GLM multiplication | Gaussian copula | Tweedie single model | insurance-frequency-severity |
|---|---|---|---|---|
| Handles discrete-continuous margins correctly | No (assumption) | Partial (PIT approximation) | N/A | Yes (Sarmanov) |
| Per-policy correction factors | No | Portfolio average only | N/A | Yes |
| Uses existing GLM objects | Yes | Requires refitting | No | Yes (IFM) |
| Test for dependence first | No | No | No | Yes (DependenceTest) |
| AIC/BIC copula comparison | No | No | No | Yes |
| HTML model report | No | No | No | Yes (JointModelReport) |
Quickstart
```shell
uv add insurance-frequency-severity
```
```python
import pandas as pd
from insurance_frequency_severity import JointFreqSev, DependenceTest

# Test for dependence before committing to a correction
test = DependenceTest()
test.fit(n=claim_count[claims_mask], s=avg_severity[claims_mask])
print(test.summary())  # Kendall tau, Spearman rho, permutation p-values

# Fit joint model on top of your existing fitted GLMs
policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})
model = JointFreqSev(freq_glm=my_nb_glm, sev_glm=my_gamma_glm, copula="sarmanov")
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")
corrections = model.premium_correction()
print(corrections[["mu_n", "mu_s", "correction_factor", "premium_joint"]].describe())
```
The three methods
Sarmanov copula (primary) — the recommended approach for books with enough data (≥20,000 policyholder-years, ≥2,000 claims). Handles the discrete-continuous mixed margins problem correctly. Per-policy analytical correction factors, no simulation.
Gaussian copula (comparison) — the standard actuarial approach. Uses PIT approximation for the discrete frequency margin. Good for presenting results in familiar terms, or for comparing rho estimates. Returns a portfolio-average correction factor, not per-policy factors.
Garrido conditional fallback (ConditionalFreqSev) — adds claim count N as a covariate in the severity GLM. One extra GLM parameter. More stable on small books where omega estimation from the Sarmanov would be unreliable.
Complete example
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from insurance_frequency_severity import (
    JointFreqSev,
    ConditionalFreqSev,
    DependenceTest,
    compare_copulas,
    JointModelReport,
)

rng = np.random.default_rng(42)
n_policies = 5000
claim_count = rng.poisson(0.10, size=n_policies)
avg_severity = np.where(
    claim_count > 0,
    rng.gamma(shape=3.0, scale=800.0, size=n_policies),
    np.nan,
)
X = pd.DataFrame({
    "age": rng.normal(35, 8, n_policies),
    "ncb": rng.normal(5, 2, n_policies),
})
X_const = sm.add_constant(X)
claims_mask = claim_count > 0

my_nb_glm = sm.GLM(
    claim_count, X_const,
    family=sm.families.NegativeBinomial(alpha=0.8),
).fit()
my_gamma_glm = sm.GLM(
    avg_severity[claims_mask], X_const[claims_mask],
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

# Step 1: test for dependence
test = DependenceTest(n_permutations=1000)
test.fit(claim_count[claims_mask], avg_severity[claims_mask])
print(test.summary())

# Step 2: compare copula families
comparison = compare_copulas(claim_count, avg_severity, my_nb_glm, my_gamma_glm)
print(comparison)  # sorted by AIC: sarmanov, gaussian, fgm

# Step 3: fit and correct
policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})
model = JointFreqSev(freq_glm=my_nb_glm, sev_glm=my_gamma_glm, copula="sarmanov")
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")
print(model.dependence_summary())  # omega, CI, Spearman rho, AIC/BIC
corrections = model.premium_correction()

# Step 4: generate model report
report = JointModelReport(model, dependence_test=test, copula_comparison=comparison)
report.to_html("pricing_review.html", n=claim_count, s=avg_severity, correction_df=corrections)
```
Garrido conditional fallback
```python
from insurance_frequency_severity import ConditionalFreqSev

policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})
model = ConditionalFreqSev(my_nb_glm, my_gamma_glm)
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")
model.premium_correction()
```
Use this when you have fewer than 1,000 claims and cannot reliably estimate omega.
Reading the correction factors
premium_correction() returns the factor E[N×S] / (E[N] × E[S]) per policy:
- `< 1.0`: negative dependence. High-count policyholders have lower severity than independence predicts, so the independence model overstates their risk.
- `= 1.0`: independence holds.
- `> 1.0`: positive dependence, valid in some commercial lines where large customers have both high frequency and high severity.
For UK motor with typical NCD structure, expect the average correction to be 0.93–0.98, with larger corrections at the high-frequency tail.
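To make the arithmetic concrete, here is how a sub-unity factor feeds through, assuming (as the quickstart output suggests) that premium_joint is the independence product mu_n × mu_s scaled by correction_factor; the numbers are made up:

```python
import pandas as pd

# Two hypothetical policies: the naive product times the correction factor
# gives the dependence-adjusted premium.
corrections = pd.DataFrame({
    "mu_n": [0.12, 0.08],               # expected claim count
    "mu_s": [2400.0, 1900.0],           # expected average severity
    "correction_factor": [0.94, 0.99],  # E[N*S] / (E[N] * E[S])
})
corrections["premium_indep"] = corrections["mu_n"] * corrections["mu_s"]
corrections["premium_joint"] = (
    corrections["premium_indep"] * corrections["correction_factor"]
)
print(corrections[["premium_indep", "premium_joint"]])
```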
Validated performance
On a 30,000-policy synthetic UK motor book with planted Sarmanov dependence (omega=3.5):
| Metric | Independence | Sarmanov copula |
|---|---|---|
| Portfolio premium bias | −3% to −8% | ~0% |
| High-risk decile correction factor | 1.00 | 1.05–1.15× |
| Omega recovery relative error | — | 10–20% |
| Fit time | < 1s | < 1s |
In a benchmark on 12,000 synthetic policies with latent freq-sev dependence, the Sarmanov correction reduced pure premium MAE vs oracle by 28.6% and portfolio bias from +22.95% to −6.77%.
Always run DependenceTest before fitting. If independence cannot be rejected (p > 0.05) and your book has fewer than 1,000 claims, use ConditionalFreqSev instead.
Full validation notebook: notebooks/databricks_validation.py.
Data requirements
Stable omega estimation requires approximately 20,000 policyholder-years with at least 2,000 claims. The library warns at < 1,000 policies and < 500 claims. Zero-claim policies contribute no information about the dependence parameter — only observed (n > 0, s) pairs enter the likelihood.
Theoretical background
The Sarmanov bivariate distribution:
f(n, s) = f_N(n) * f_S(s) * [1 + omega * phi_1(n) * phi_2(s)]
where phi_1 and phi_2 are bounded kernel functions with zero mean under their marginals. When omega = 0 this reduces to the independence model. The key advantage: the Sarmanov construction needs no probability integral transform for the discrete frequency margin; Gaussian and Clayton copulas require one, and the PIT is not well-defined for discrete distributions.
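For concreteness: one kernel choice common in the Sarmanov literature is the exponential kernel phi_1(n) = exp(-n) - E[exp(-N)] (whether this library uses exactly that kernel is an assumption here). Its zero-mean property under a Poisson margin follows from the probability generating function and can be checked numerically:

```python
from math import exp, factorial

# For N ~ Poisson(lam), E[exp(-N)] = exp(lam * (exp(-1) - 1)).
lam = 0.10
laplace_n = exp(lam * (exp(-1) - 1))

# Sum phi_1(n) against the Poisson pmf; the truncation tail is negligible.
mean_phi = sum(
    exp(-lam) * lam**n / factorial(n) * (exp(-n) - laplace_n)
    for n in range(60)
)
print(abs(mean_phi) < 1e-12)  # True
```

The zero-mean property is what keeps the marginals intact: integrating the joint density over either variable kills the omega term.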
IFM estimation: fit frequency GLM → fit severity GLM → profile likelihood over omega using only observed (n > 0, s) pairs. Closed-form, no simulation.
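A sketch of that profile step, assuming the margins are already fitted and reduced to kernel evaluations phi_1(n_i), phi_2(s_i) per observed claim (the kernel values below are made-up illustrations, not output of this library):

```python
import numpy as np

def profile_loglik(omega, phi1_vals, phi2_vals):
    """Omega-dependent part of the Sarmanov log-likelihood over claim pairs."""
    terms = 1.0 + omega * phi1_vals * phi2_vals
    if np.any(terms <= 0):  # omega outside the admissible range
        return -np.inf
    return float(np.sum(np.log(terms)))

# Hypothetical kernel evaluations for three observed (n, s) pairs
phi1 = np.array([0.2, -0.1, 0.15])
phi2 = np.array([-0.3, -0.25, 0.1])

grid = np.linspace(-15, 15, 3001)
lls = [profile_loglik(w, phi1, phi2) for w in grid]
omega_hat = grid[int(np.argmax(lls))]
```

At omega = 0 the profile term vanishes, so a likelihood-ratio test of independence falls out of the same computation.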
Reference: Vernic, Bolancé, Alemany (2022), Insurance: Mathematics and Economics, 102, 111–125.
Limitations
- Stable omega estimation requires ≥20,000 policyholder-years and ≥2,000 claims. Smaller books produce wide confidence intervals. Always check `DependenceTest` first.
- Per-policy analytical corrections are only available with `copula="sarmanov"`. Gaussian and FGM copulas return a portfolio-average factor only.
- The library wraps statsmodels GLM objects. Non-statsmodels models may work via `.predict()`, but kernel parameters are inferred from statsmodels-specific attributes.
- The correction is not recalibrated as the portfolio evolves. If the NCD scale is restructured, re-estimate omega on recent data.
Part of the Burning Cost stack
Takes claims data and your existing fitted GLMs, and feeds Sarmanov-corrected joint premium estimates into insurance-optimise and insurance-conformal. See the full stack:
| Library | Description |
|---|---|
| insurance-conformal | Distribution-free prediction intervals — joint frequency-severity coverage guarantees |
| insurance-credibility | Bühlmann-Straub credibility — blends frequency and severity estimates for thin segments |
| insurance-monitoring | Model drift detection — monitors frequency and severity calibration separately |
| insurance-governance | Model validation and MRM governance — sign-off pack for joint frequency-severity models |
References
Sarmanov copula foundations
- Sarmanov, O.V. (1966). "Generalized normal correlation and two-dimensional Fréchet classes." Soviet Mathematics Doklady, 7, 596–599. (Original Sarmanov bivariate distribution construction.)
- Lee, M.T. & Cha, J.H. (2015). "On two general classes of discrete bivariate distributions." The American Statistician, 69(3), 221–230. doi:10.1080/00031305.2015.1044710 (Sarmanov family properties relevant to count-continuous joint models.)
Insurance frequency-severity joint modelling
- Vernic, R., Bolancé, C. & Alemany, R. (2022). "Sarmanov distribution for modeling dependence between the frequency and the average severity of insurance claims." Insurance: Mathematics and Economics, 102, 111–125. doi:10.1016/j.insmatheco.2021.11.003
- Garrido, J., Genest, C. & Schulz, J. (2016). "Generalized linear models for dependent frequency and severity of insurance claims." Insurance: Mathematics and Economics, 70, 205–215. doi:10.1016/j.insmatheco.2016.06.006
- Lee, G. & Shi, P. (2019). "A dependent frequency-severity approach to modeling longitudinal insurance claims." Insurance: Mathematics and Economics, 87, 115–129. doi:10.1016/j.insmatheco.2019.04.004
- Czado, C., Kastenmeier, R., Brechmann, E.C. & Min, A. (2012). "A mixed copula model for insurance claims and claim sizes." Scandinavian Actuarial Journal, 4, 278–305. doi:10.1080/03461238.2010.546009
- Frees, E.W. & Valdez, E.A. (1998). "Understanding Relationships Using Copulas." North American Actuarial Journal, 2(1), 1–25. doi:10.1080/10920277.1998.10595667 (Foundational copula reference for actuarial dependence modelling.)
Community
- Questions? Start a Discussion
- Found a bug? Open an Issue
- Blog and tutorials: burning-cost.github.io
- Training course: Insurance Pricing in Python — Module 4 covers frequency-severity modelling. £97 one-time.
Licence
MIT