
insurance-frequency-severity

Sarmanov copula joint frequency-severity modelling — analytical premium correction without refitting your GLMs.



The problem

Every UK motor pricing team multiplies a Poisson frequency GLM by a Gamma severity GLM and calls it pure premium. This assumes claim count and average severity are independent given the rating factors — they are not.

In UK motor, the NCD structure suppresses borderline claims: policyholders aware of the NCD threshold do not report near-miss incidents. The result is a systematic negative correlation between claim count and average severity. Ignoring this biases the pure premium, and the bias concentrates in your highest-risk accounts. Vernic, Bolancé and Alemany (2022) measured this at €5–55+ per policyholder on a Spanish auto book. The directional effect in UK motor is the same.

Blog post: Your Frequency-Severity Independence Assumption Is Costing You Premium


Why this library?

Standard copulas (Gaussian, Clayton) require a probability integral transform for the discrete frequency margin — and the copula guaranteed by Sklar's theorem is not unique when a margin is discrete. The Sarmanov bivariate distribution sidesteps this entirely by working directly with the joint density, giving closed-form per-policy correction factors without simulation.

IFM estimation means you plug in your already-fitted statsmodels GLM objects. The library estimates the dependence parameter omega on top of your existing models. You do not refit the marginals.


Compared to alternatives

|                                               | Independent GLM multiplication | Gaussian copula             | Tweedie single model | insurance-frequency-severity |
|-----------------------------------------------|--------------------------------|-----------------------------|----------------------|------------------------------|
| Handles discrete-continuous margins correctly | No (assumption)                | Partial (PIT approximation) | N/A                  | Yes (Sarmanov)               |
| Per-policy correction factors                 | No                             | Portfolio average only      | N/A                  | Yes                          |
| Uses existing GLM objects                     | Yes                            | Requires refitting          | No                   | Yes (IFM)                    |
| Test for dependence first                     | No                             | No                          | No                   | Yes (DependenceTest)         |
| AIC/BIC copula comparison                     | No                             | No                          | No                   | Yes                          |
| HTML model report                             | No                             | No                          | No                   | Yes (JointModelReport)       |

Quickstart

uv add insurance-frequency-severity

import pandas as pd
from insurance_frequency_severity import JointFreqSev, DependenceTest

# Test for dependence before committing to a correction
test = DependenceTest()
test.fit(n=claim_count[claims_mask], s=avg_severity[claims_mask])
print(test.summary())  # Kendall tau, Spearman rho, permutation p-values

# Fit joint model on top of your existing fitted GLMs
policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})
model = JointFreqSev(freq_glm=my_nb_glm, sev_glm=my_gamma_glm, copula="sarmanov")
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")

corrections = model.premium_correction()
print(corrections[["mu_n", "mu_s", "correction_factor", "premium_joint"]].describe())

The three methods

Sarmanov copula (primary) — the recommended approach for books with enough data (≥20,000 policyholder-years, ≥2,000 claims). Handles the discrete-continuous mixed margins problem correctly. Per-policy analytical correction factors, no simulation.

Gaussian copula (comparison) — the standard actuarial approach. Uses PIT approximation for the discrete frequency margin. Good for presenting results in familiar terms, or for comparing rho estimates. Returns a portfolio-average correction factor, not per-policy factors.
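
For intuition, an exact PIT for a discrete margin needs randomisation (the distributional transform). A minimal numpy sketch of that transform, not necessarily what the library does internally:

import numpy as np
from scipy.stats import poisson

# Randomised PIT: draw U uniformly on [F(n-1), F(n)], so U is exactly
# Uniform(0, 1) even though N is discrete; a deterministic PIT leaves atoms.
rng = np.random.default_rng(0)
mu = 0.10
n = rng.poisson(mu, size=10_000)
u = poisson.cdf(n - 1, mu) + rng.uniform(size=n.size) * poisson.pmf(n, mu)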

Garrido conditional fallback (ConditionalFreqSev) — adds claim count N as a covariate in the severity GLM. One extra GLM parameter. More stable on small books where omega estimation from the Sarmanov would be unreliable.


Complete example

import numpy as np
import pandas as pd
import statsmodels.api as sm
from insurance_frequency_severity import (
    JointFreqSev,
    ConditionalFreqSev,
    DependenceTest,
    compare_copulas,
    JointModelReport,
)

rng = np.random.default_rng(42)
n_policies = 5000
claim_count = rng.poisson(0.10, size=n_policies)
avg_severity = np.where(
    claim_count > 0,
    rng.gamma(shape=3.0, scale=800.0, size=n_policies),
    np.nan,
)
X = pd.DataFrame({
    "age": rng.normal(35, 8, n_policies),
    "ncb": rng.normal(5, 2, n_policies),
})
X_const = sm.add_constant(X)
claims_mask = claim_count > 0

my_nb_glm = sm.GLM(
    claim_count, X_const,
    family=sm.families.NegativeBinomial(alpha=0.8),
).fit()
my_gamma_glm = sm.GLM(
    avg_severity[claims_mask], X_const[claims_mask],
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

# Step 1: test for dependence
test = DependenceTest(n_permutations=1000)
test.fit(claim_count[claims_mask], avg_severity[claims_mask])
print(test.summary())

# Step 2: compare copula families
comparison = compare_copulas(claim_count, avg_severity, my_nb_glm, my_gamma_glm)
print(comparison)  # sorted by AIC: sarmanov, gaussian, fgm

# Step 3: fit and correct
policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})
model = JointFreqSev(freq_glm=my_nb_glm, sev_glm=my_gamma_glm, copula="sarmanov")
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")
print(model.dependence_summary())  # omega, CI, Spearman rho, AIC/BIC
corrections = model.premium_correction()

# Step 4: generate model report
report = JointModelReport(model, dependence_test=test, copula_comparison=comparison)
report.to_html("pricing_review.html", n=claim_count, s=avg_severity, correction_df=corrections)

Garrido conditional fallback

from insurance_frequency_severity import ConditionalFreqSev

policy_df = pd.DataFrame({"claim_count": claim_count, "avg_severity": avg_severity})

model = ConditionalFreqSev(my_nb_glm, my_gamma_glm)
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")
model.premium_correction()

Use this when you have fewer than 1,000 claims and cannot reliably estimate omega.
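
For intuition, the conditional approach amounts to roughly the following plain statsmodels fit: a sketch of the Garrido et al. (2016) idea reusing the Complete example's variables, not the library's internals.

import statsmodels.api as sm

# Add claim count N as one extra covariate in the severity GLM; its single
# coefficient captures the frequency-severity dependence.
X_sev = X_const[claims_mask].copy()
X_sev["claim_count"] = claim_count[claims_mask]
cond_sev_glm = sm.GLM(
    avg_severity[claims_mask], X_sev,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(cond_sev_glm.params["claim_count"])  # negative suggests NCD-style suppression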


Reading the correction factors

premium_correction() returns the factor E[N×S] / (E[N] × E[S]) per policy:

  • < 1.0: negative dependence. High-count policyholders have lower severity than independence predicts. Independence model overstates their risk.
  • = 1.0: independence holds.
  • > 1.0: positive dependence — valid in some commercial lines where large customers have both high frequency and high severity.

For UK motor with typical NCD structure, expect the average correction to be 0.93–0.98, with larger corrections at the high-frequency tail.
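
To see where the correction bites, slice the corrections frame from the Quickstart. A sketch, assuming premium_joint is the corrected per-policy pure premium as the column names suggest:

# Policies where the independence model overstates risk by more than 5%
overstated = corrections[corrections["correction_factor"] < 0.95]
print(f"{len(overstated)} policies with correction factor < 0.95")

# Portfolio-level premium impact of dropping the independence assumption
independent_premium = corrections["mu_n"] * corrections["mu_s"]
print((corrections["premium_joint"] - independent_premium).sum())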


Validated performance

On a 30,000-policy synthetic UK motor book with planted Sarmanov dependence (omega=3.5):

| Metric                             | Independence | Sarmanov copula |
|------------------------------------|--------------|-----------------|
| Portfolio premium bias             | −3% to −8%   | ~0%             |
| High-risk decile correction factor | 1.00         | 1.05–1.15×      |
| Omega recovery relative error      | —            | 10–20%          |
| Fit time                           | < 1s         | < 1s            |

In a benchmark on 12,000 synthetic policies with latent freq-sev dependence, the Sarmanov correction reduced pure premium MAE vs oracle by 28.6% and portfolio bias from +22.95% to −6.77%.

Always run DependenceTest before fitting. If independence cannot be rejected (p > 0.05) and your book has fewer than 1,000 claims, use ConditionalFreqSev instead.
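
That gate can be scripted. A sketch, where p_value is a hypothetical stand-in read off test.summary() rather than a documented attribute of the library:

# Hypothetical decision gate; thresholds mirror the guidance above.
p_value = 0.21                     # permutation p-value read off test.summary()
n_claims = int(claims_mask.sum())  # observed claims on the book

if p_value > 0.05 and n_claims < 1_000:
    model = ConditionalFreqSev(my_nb_glm, my_gamma_glm)
else:
    model = JointFreqSev(freq_glm=my_nb_glm, sev_glm=my_gamma_glm, copula="sarmanov")
model.fit(policy_df, n_col="claim_count", s_col="avg_severity")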

Full validation notebook: notebooks/databricks_validation.py.


Data requirements

Stable omega estimation requires approximately 20,000 policyholder-years with at least 2,000 claims. The library warns at < 1,000 policies and < 500 claims. Zero-claim policies contribute no information about the dependence parameter — only observed (n > 0, s) pairs enter the likelihood.
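
A quick pre-flight check against those thresholds, in plain pandas (assuming one row per policyholder-year, as in policy_df above):

# Thresholds from above: ~20,000 policyholder-years and >= 2,000 claims for a
# stable omega; the library itself warns below 1,000 policies / 500 claims.
n_policy_years = len(policy_df)
n_claims = int(policy_df["claim_count"].sum())
n_claim_rows = int((policy_df["claim_count"] > 0).sum())  # rows entering the likelihood

if n_policy_years < 20_000 or n_claims < 2_000:
    print("Book may be too thin for a stable omega; consider ConditionalFreqSev")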


Theoretical background

The Sarmanov bivariate distribution:

f(n, s) = f_N(n) * f_S(s) * [1 + omega * phi_1(n) * phi_2(s)]

where phi_1 and phi_2 are bounded kernel functions with zero mean under their marginals. When omega=0 this reduces to the independence model. The key advantage: no probability integral transform is needed for the discrete frequency margin; Gaussian and Clayton copulas require one, and it is not well defined for discrete distributions.
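
Taking the cross moment under this density shows where the correction factor comes from: the omega term separates into marginal moments,

E[N*S] = E[N]*E[S] + omega * E[N*phi_1(N)] * E[S*phi_2(S)]

so

E[N*S] / (E[N]*E[S]) = 1 + omega * E[N*phi_1(N)] * E[S*phi_2(S)] / (E[N]*E[S])

Every factor on the right is a moment of a fitted marginal, which is why premium_correction() can return closed-form per-policy factors.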

IFM estimation: fit frequency GLM → fit severity GLM → profile likelihood over omega using only observed (n > 0, s) pairs. Closed-form, no simulation.
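
A minimal numpy sketch of that profile step, with illustrative exponential-type kernels and a grid search (the library's actual kernels and optimiser may differ):

import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for observed (n > 0, s) pairs; in practice these are claim counts
# and average severities on the claiming subset of the book.
n_obs = rng.poisson(1.0, size=500) + 1
s_obs = rng.gamma(3.0, 800.0, size=500)

# Illustrative exponential kernels, empirically centred to (near) zero mean;
# proper Sarmanov kernels are centred under the fitted marginals instead.
k1 = np.exp(-n_obs)
k2 = np.exp(-s_obs / s_obs.mean())
phi1, phi2 = k1 - k1.mean(), k2 - k2.mean()

def profile_loglik(omega):
    # With the marginals fixed by IFM, only this term varies with omega.
    term = 1.0 + omega * phi1 * phi2
    return np.log(term).sum() if np.all(term > 0) else -np.inf

grid = np.linspace(-100.0, 100.0, 4001)
omega_hat = max(grid, key=profile_loglik)
print(omega_hat)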

Reference: Vernic, Bolancé, Alemany (2022), Insurance: Mathematics and Economics, 102, 111–125.


Limitations

  • Stable omega estimation requires ≥20,000 policyholder-years and ≥2,000 claims. Smaller books produce wide confidence intervals. Always check DependenceTest first.
  • Per-policy analytical corrections are only available with copula="sarmanov". Gaussian and FGM copulas return a portfolio-average factor only.
  • The library wraps statsmodels GLM objects. Non-statsmodels models may work via .predict() but kernel parameters are inferred from statsmodels-specific attributes.
  • The correction is not recalibrated as the portfolio evolves. If the NCD scale is restructured, re-estimate omega on recent data.

Part of the Burning Cost stack

Takes claims data and your existing fitted GLMs. Feeds Sarmanov-corrected joint premium estimates into insurance-optimise and insurance-conformal. See the full stack.

| Library              | Description                                                                            |
|----------------------|----------------------------------------------------------------------------------------|
| insurance-conformal  | Distribution-free prediction intervals — joint frequency-severity coverage guarantees  |
| insurance-credibility | Bühlmann-Straub credibility — blends frequency and severity estimates for thin segments |
| insurance-monitoring | Model drift detection — monitors frequency and severity calibration separately         |
| insurance-governance | Model validation and MRM governance — sign-off pack for joint frequency-severity models |

References

Sarmanov copula foundations

  • Sarmanov, O.V. (1966). "Generalized normal correlation and two-dimensional Fréchet classes." Soviet Mathematics Doklady, 7, 596–599. (Original Sarmanov bivariate distribution construction.)
  • Lee, M.T. & Cha, J.H. (2015). "On two general classes of discrete bivariate distributions." The American Statistician, 69(3), 221–230. doi:10.1080/00031305.2015.1044710 (Sarmanov family properties relevant to count-continuous joint models.)

Insurance frequency-severity joint modelling

  • Vernic, R., Bolancé, C. & Alemany, R. (2022). "Sarmanov distribution for modeling dependence between the frequency and the average severity of insurance claims." Insurance: Mathematics and Economics, 102, 111–125. doi:10.1016/j.insmatheco.2021.11.003
  • Garrido, J., Genest, C. & Schulz, J. (2016). "Generalized linear models for dependent frequency and severity of insurance claims." Insurance: Mathematics and Economics, 70, 205–215. doi:10.1016/j.insmatheco.2016.06.006
  • Lee, G. & Shi, P. (2019). "A dependent frequency-severity approach to modeling longitudinal insurance claims." Insurance: Mathematics and Economics, 87, 115–129. doi:10.1016/j.insmatheco.2019.04.004
  • Czado, C., Kastenmeier, R., Brechmann, E.C. & Min, A. (2012). "A mixed copula model for insurance claims and claim sizes." Scandinavian Actuarial Journal, 4, 278–305. doi:10.1080/03461238.2010.546009
  • Frees, E.W. & Valdez, E.A. (1998). "Understanding Relationships Using Copulas." North American Actuarial Journal, 2(1), 1–25. doi:10.1080/10920277.1998.10595667 (Foundational copula reference for actuarial dependence modelling.)


Licence

MIT
