Bunching estimators for insurance threshold gaming detection — exposure-weighted density discontinuity analysis with FCA Consumer Duty reporting

insurance-bunching

Bunching estimators for insurance threshold gaming detection.

The problem

Insurance pricing creates sharp incentives at threshold boundaries. When premiums jump at 10,000 miles, policyholders declare 9,999. When NCD resets at 60%, insurers see mass just below the threshold. When sum-insured bands change at £50,000, you get a spike of policies at exactly £50,000.

This is bunching — excess mass in a distribution at a known threshold — and it matters because:

Adverse selection: The risk pool is mispriced if customers gaming the threshold have different claims rates than the segment they appear to be in.
Premium leakage: Policyholders who understate their mileage generate more claims cost than their premium reflects.
FCA Consumer Duty: You need evidence that your pricing creates fair outcomes. Unexplained bunching at kinks is a red flag in a regulatory review.

The technique comes from public economics (Saez 2010, Kleven 2016) where it was used to detect taxpayer responses to income tax kinks. The methodology is rigorous, peer-reviewed, and now adapted here for insurance.

To our knowledge, no Python implementation existed before this library.

What it does

Fit a polynomial counterfactual to the binned density away from the threshold z*. Compare what you observe inside the exclusion window around z* with what the smooth counterfactual predicts. The difference, normalised by the counterfactual density at z*, is the excess mass B.

B = 0: no bunching. B = 1: excess mass equal to one bin-width of counterfactual density. B >> 1: strong, systematic bunching.
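The calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `naive_excess_mass`, its defaults and the synthetic uniform data are hypothetical, the fit is unweighted, and no correction for missing mass is applied.

```python
import numpy as np
from numpy.polynomial import Polynomial


def naive_excess_mass(z, threshold, binwidth, poly_degree=7, excl_left=2, excl_right=2):
    """Uncorrected normalised excess mass B at `threshold` (illustrative only)."""
    z = np.asarray(z, dtype=float)
    # Bin k covers [z* + k*h, z* + (k+1)*h), so the bin edges are anchored at z*.
    k = np.floor((z - threshold) / binwidth).astype(int)
    bins = np.arange(k.min(), k.max() + 1)
    counts = np.bincount(k - k.min(), minlength=bins.size).astype(float)

    # The exclusion window [z* - L*h, z* + R*h) corresponds to bins -L .. R-1.
    in_window = (bins >= -excl_left) & (bins < excl_right)

    # Fit the smooth counterfactual to the bins outside the window only.
    poly = Polynomial.fit(bins[~in_window], counts[~in_window], deg=poly_degree)
    counterfactual = poly(bins)

    excess = np.sum(counts[in_window] - counterfactual[in_window])
    # Normalise by the counterfactual count in the bin that contains z*.
    return excess / float(poly(0))


# Synthetic data (assumption): flat density of about 1,000 per bin plus 2,000 at z* = 50.
rng = np.random.default_rng(0)
z_demo = np.concatenate([rng.uniform(0, 100, 100_000), np.full(2_000, 50.0)])
B_demo = naive_excess_mass(z_demo, threshold=50.0, binwidth=1.0)
print(f"B = {B_demo:.2f}")
```

With roughly 1,000 observations per bin and 2,000 added at the threshold, B should land near 2, which matches the reading of B as excess mass in units of bins of counterfactual density.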

The iterative correction handles the fact that bunching mass must come from somewhere: observations that moved to z* are missing from just above it. Without this correction, the polynomial is fitted to depleted bins above z*, so it understates the counterfactual there and overstates B.
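One common way to implement such a correction in the bunching literature is an integration-constraint adjustment: scale up the bins above the window so that they absorb the current excess-mass estimate, refit, and repeat until the estimate stops changing. The sketch below assumes that formulation; the function name, the tolerance and the synthetic bin counts are hypothetical, and the library may do this differently.

```python
import numpy as np
from numpy.polynomial import Polynomial


def corrected_excess_mass(bins, counts, poly_degree=7, excl_left=2, excl_right=2,
                          tol=1e-8, max_iter=100):
    """Normalised excess mass with an iterative missing-mass correction (sketch)."""
    bins = np.asarray(bins)
    counts = np.asarray(counts, dtype=float)
    in_window = (bins >= -excl_left) & (bins < excl_right)
    above = bins >= excl_right
    mass_above = counts[above].sum()

    excess = 0.0
    for _ in range(max_iter):
        # Inflate the bins above the window by the current excess estimate.
        adjusted = counts * (1.0 + above * excess / mass_above)
        poly = Polynomial.fit(bins[~in_window], adjusted[~in_window], deg=poly_degree)
        previous = excess
        excess = float(np.sum(counts[in_window] - poly(bins[in_window])))
        if abs(excess - previous) <= tol * max(1.0, abs(excess)):
            break
    return excess / float(poly(0))


# Synthetic bins (assumption): flat counterfactual of 1,000 per bin; 4% of each
# bin at or above the window edge relocates to the bin containing z*.
bins_demo = np.arange(-50, 50)
counts_demo = np.full(bins_demo.size, 1000.0)
counts_demo[bins_demo >= 2] *= 0.96
counts_demo[bins_demo == 0] += 0.04 * 1000 * 48  # 48 bins lose 40 each

B_uncorrected = corrected_excess_mass(bins_demo, counts_demo, max_iter=1)
B_corrected = corrected_excess_mass(bins_demo, counts_demo)
print(f"uncorrected B = {B_uncorrected:.3f}, corrected B = {B_corrected:.3f}")
```

Calling the function with `max_iter=1` stops after the first fit, which gives the uncorrected estimate for comparison. In this example the true relocated mass is 1,920 against a counterfactual of 1,000 per bin.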

Install

pip install insurance-bunching

Quick start

import numpy as np
from insurance_bunching import BunchingEstimator

rng = np.random.default_rng(42)
# 9,000 policies from smooth distribution + 1,000 bunched at £50,000
z = np.concatenate([rng.normal(50_000, 8_000, 9_000), [50_000] * 1_000])

est = BunchingEstimator(z, threshold=50_000, n_boot=200, seed=42)
result = est.fit()
print(result.summary())
fig = est.plot()

Insurance-specific usage

The ExposureWeightedBunching class handles policy DataFrames directly:

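import pandas as pd
from insurance_bunching import ExposureWeightedBunching

# `policies` is a pandas DataFrame with one row per policy; it must contain
# the running-variable and exposure columns named below.
ewb = ExposureWeightedBunching(
    policies,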
    running_var="annual_mileage",
    exposure_col="earned_years",   # exposure weighting — critical for motor
    threshold=10_000,
    threshold_type="kink",         # or "notch" for discrete price jumps
    round_numbers=[5_000, 15_000, 20_000],  # control for rounding at other values
    n_boot=500,
)
result = ewb.fit()
fig = ewb.plot()

Why exposure weighting? A policy file counts policies, not risk-years. A policy renewing in June contributes about 0.5 earned years to a calendar-year extract. Without weighting, short-period and long-period policies are treated identically. The density you actually care about is per unit of exposure, not per policy.
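To make the difference concrete, the sketch below bins the same six policies twice with `numpy.histogram`, once by policy count and once weighted by earned years. The mileage and exposure figures are made up for illustration.

```python
import numpy as np

# Hypothetical policies: declared mileage and earned exposure in years.
mileage = np.array([9_500, 9_900, 9_950, 10_200, 12_000, 12_500])
earned_years = np.array([1.0, 0.25, 0.25, 1.0, 1.0, 0.5])
edges = np.array([9_000, 10_000, 11_000, 12_000, 13_000])

# Per-policy density: every row counts as 1.
policy_counts, _ = np.histogram(mileage, bins=edges)

# Per-exposure density: every row counts as its earned years.
exposure, _ = np.histogram(mileage, bins=edges, weights=earned_years)

print(policy_counts)  # [3 1 0 2]
print(exposure)       # [1.5 1.  0.  1.5]
```

By policy count the bin just below 10,000 looks three times as heavy as the bin above it; by exposure the ratio is 1.5, because two of the three policies below the threshold are short-period.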

Scan all thresholds

Test multiple pricing boundaries simultaneously and apply Benjamini-Hochberg FDR correction to control false discoveries:

from insurance_bunching import MultiThresholdScanner

scanner = MultiThresholdScanner(
    z,                     # running variable, e.g. declared annual mileage
    thresholds=[5_000, 10_000, 15_000, 20_000, 25_000],
    n_boot=300,
    fdr_level=0.05,
)
summary = scanner.scan()  # polars DataFrame, sorted by significance
print(summary)
print("Significant thresholds:", scanner.significant_thresholds())
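For readers unfamiliar with the Benjamini-Hochberg step, here is a minimal stand-alone sketch of the adjustment. The function name and the p-values are hypothetical and are not taken from the library or from real data.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr_level=0.05):
    """Return BH-adjusted p-values and a boolean mask of discoveries."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted, adjusted <= fdr_level


thresholds = [5_000, 10_000, 15_000, 20_000, 25_000]
raw_p = [0.039, 0.001, 0.60, 0.008, 0.041]  # made-up p-values
adjusted, significant = benjamini_hochberg(raw_p, fdr_level=0.05)

for t, p_raw, p_adj, sig in zip(thresholds, raw_p, adjusted, significant):
    print(f"{t:>6}: raw p = {p_raw:.3f}, adjusted p = {p_adj:.5f}, significant = {sig}")
```

In this example the thresholds at 5,000 and 25,000 have raw p-values below 0.05 but adjusted p-values of 0.05125, so they drop out after correction. Only 10,000 and 20,000 remain.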

FCA Consumer Duty report

Generate a self-contained HTML report for regulatory submission:

from insurance_bunching import BunchingReport

report = BunchingReport(
    results=[result_mileage, result_sum_insured, result_ncd],
    title="Annual Bunching Analysis — UK Private Motor 2024",
    product_line="UK Private Motor",
    fdr_level=0.05,
)
report.save("bunching_analysis_2024.html")

The report includes: embedded density plots, BH-corrected p-value table, regulatory interpretation boilerplate, methodology appendix with equations and references.

API

BunchingEstimator

BunchingEstimator(
    z,                    # running variable (array-like)
    threshold,            # z* — the kink/notch point
    binwidth=None,        # auto from Silverman/10 if None
    poly_degree=9,        # polynomial degree for counterfactual
    excl_left=2,          # bins excluded left of z*
    excl_right=2,         # bins excluded right of z*
    weights=None,         # per-observation exposure weights
    round_numbers=None,   # list of round numbers to control for
    n_boot=200,           # bootstrap replications
    notch=False,          # True = notch estimator (infer z**)
    t0=None,              # pre-threshold marginal rate (for elasticity)
    t1=None,              # post-threshold marginal rate
    seed=None,
)
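The `binwidth=None` default is described only as "Silverman/10". One plausible reading, shown here purely as an assumption, is Silverman's rule-of-thumb kernel bandwidth divided by 10. The function name and the `divisor` parameter are hypothetical, and the library's exact formula may differ.

```python
import numpy as np


def silverman_binwidth(z, divisor=10.0):
    """Silverman's rule-of-thumb bandwidth divided by `divisor` (assumed reading)."""
    z = np.asarray(z, dtype=float)
    sd = z.std(ddof=1)
    q25, q75 = np.percentile(z, [25, 75])
    # Use the smaller of the standard deviation and the scaled interquartile range.
    spread = min(sd, (q75 - q25) / 1.34)
    bandwidth = 0.9 * spread * z.size ** (-0.2)
    return bandwidth / divisor


binwidth_demo = silverman_binwidth(np.arange(1, 101))
print(f"binwidth = {binwidth_demo:.4f}")
```

In practice it is often worth passing an explicit `binwidth` that lines up with how the running variable is declared (for example 500 or 1,000 miles), so that bin edges do not split round-number heaps.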

BunchingResult attributes:

B — normalised excess mass
B_se — bootstrap standard error
B_ci — (lower, upper) 95% CI
p_value — two-sided p-value for H0: B=0
elasticity — kink elasticity (if t0/t1 provided)
marginal_buncher — z** (if notch=True)
counterfactual — pd.DataFrame with per-bin observed/counterfactual/excess
.summary() — text summary
.plot() — matplotlib Figure

ExposureWeightedBunching

DataFrame-first wrapper. Takes column names instead of arrays. Same output as BunchingEstimator.

MultiThresholdScanner

Runs BunchingEstimator at each threshold, applies Benjamini-Hochberg FDR correction. Returns a polars DataFrame.

BunchingReport

Jinja2 HTML report. Embeds plots as base64 PNG. Applies BH correction across all results.
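The self-contained property comes from embedding each plot as a base64 data URI instead of linking to an image file. The sketch below shows that idea with the standard library only. It uses `string.Template` as a stand-in for Jinja2, and the function name and page layout are hypothetical.

```python
import base64
from html import escape
from string import Template

# Minimal page template; a real report would carry tables and commentary too.
PAGE = Template(
    "<html><body><h1>$title</h1>"
    '<img alt="$title" src="data:image/png;base64,$png_b64">'
    "</body></html>"
)


def render_self_contained(title, png_bytes):
    """Return an HTML page with the PNG embedded inline as base64."""
    png_b64 = base64.b64encode(png_bytes).decode("ascii")
    return PAGE.substitute(title=escape(title), png_b64=png_b64)


# Placeholder bytes (just the PNG file signature); in practice these would
# come from saving a figure to an in-memory buffer.
html_doc = render_self_contained("Density at 10,000 miles", b"\x89PNG\r\n\x1a\n")
print(html_doc)
```

Because the image data travels inside the HTML, the file can be emailed or archived as a single artefact with no external assets.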

Algorithm

1. Bin z into bins of width h anchored at z*.
2. Fit a p-th degree polynomial (WLS, exposure-weighted) to the bins outside the exclusion window [z* - L·h, z* + R·h].
3. Iterative correction: estimate excess mass → redistribute missing mass above z* → refit → repeat until convergence.
4. B_hat = Σ_window (observed - counterfactual) / counterfactual(z*).
5. Bootstrap SE: resample observations, re-bin, re-estimate.
6. Kink elasticity: e = B / [z* × log((1-t0)/(1-t1))].
7. Notch: infer z** from the cumulative missing mass above the threshold.
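The bootstrap and kink-elasticity steps can be sketched as follows. This is an illustration under assumptions rather than the library's code: `bootstrap_se` and `kink_elasticity` are hypothetical names, the bootstrap takes any estimator callable (a simple share-at-threshold statistic stands in for the full bunching estimate), and the elasticity function assumes B and z* are expressed in the same units.

```python
import math

import numpy as np


def bootstrap_se(z, estimator, n_boot=200, seed=None):
    """Nonparametric bootstrap: resample observations and re-estimate each time."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(z, size=z.size, replace=True)
        draws[b] = estimator(sample)
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return draws.std(ddof=1), (lower, upper)


def kink_elasticity(B, z_star, t0, t1):
    """e = B / [z* x log((1 - t0) / (1 - t1))], valid for 0 <= t0 < t1 < 1."""
    if not 0 <= t0 < t1 < 1:
        raise ValueError("expected 0 <= t0 < t1 < 1")
    return B / (z_star * math.log((1 - t0) / (1 - t1)))


# Stand-in statistic (assumption): share of observations sitting exactly at z* = 50.
rng = np.random.default_rng(1)
z_demo = np.concatenate([rng.uniform(0, 100, 9_000), np.full(1_000, 50.0)])
se, (ci_low, ci_high) = bootstrap_se(z_demo, lambda s: np.mean(s == 50.0), seed=1)
print(f"SE = {se:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")

# Hypothetical inputs: B = 2 bins, z* = 10 bins, marginal rate rising from 20% to 30%.
e_demo = kink_elasticity(B=2.0, z_star=10.0, t0=0.2, t1=0.3)
print(f"elasticity = {e_demo:.3f}")
```

A percentile interval is used here for simplicity. A normal-approximation interval built from the bootstrap standard error would be an equally reasonable choice for a sketch.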

References

Saez, E. (2010). Do taxpayers bunch at kink points? AEJ: Economic Policy, 2(3), 180-212.
Kleven, H. J. (2016). Bunching. Annual Review of Economics, 8, 435-464.
Einav, L., Finkelstein, A., & Cullen, M. (2010). Estimating welfare in insurance markets using variation in prices. QJE, 125(3), 877-921.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. JRSS-B, 57(1), 289-300.

License

MIT. Built by Burning Cost.

Release history

0.1.0 (Mar 11, 2026)
