Regression Discontinuity Design for insurance pricing rating thresholds

These details have not been verified by PyPI

Project links

Project description

insurance-rdd

Regression Discontinuity Design for insurance pricing thresholds.

UK insurers embed pricing thresholds — age 25 young driver cliffs, NCD step boundaries, vehicle age cutoffs, territory lines. These generate discontinuous premiums assumed to reflect discontinuous underlying risk. That assumption is almost never formally tested.

Regression Discontinuity Design is the only credible method for this test. It uses the discontinuity in treatment assignment at the threshold to identify a causal effect locally, without randomisation. The question for each threshold: does crossing it cause a corresponding jump in claims outcomes, or is the premium jump larger than the risk justifies?

This library fills the Python gap. rdrobust handles Gaussian outcomes. We add exposure weighting, Poisson/Gamma local GLM, geographic territory boundary RDD, multi-cutoff pooling, and FCA Consumer Duty output.

The problem this solves

Standard RDD packages (rdrobust, rddensity) assume:

Outcome is approximately Gaussian
Observations have equal weight

Insurance data is neither. Claim counts are Poisson with variable exposure (0.1–1.0 policy-years). Running OLS on claims per policy without accounting for exposure differences produces biased estimates and wrong standard errors. There is no existing Python package that handles this correctly.

Additionally, no Python implementation of geographic RDD exists. SpatialRDD is R-only. If you want to test whether a territory boundary discontinuity in claims matches the boundary in premiums, you previously had no Python tool.

Installation

pip install insurance-rdd

For geographic RDD:

pip install insurance-rdd[geo]

Quick start

Age 25 threshold (Poisson, exposure offset)

from insurance_rdd import InsuranceRD, presets

rd = InsuranceRD(
    outcome='claim_count',
    running_var='driver_age_months',
    cutoff=300,                     # 25 * 12 months
    data=df,
    outcome_type='poisson',
    exposure='exposure_years',
    preset=presets.AGE_25,          # loads: donut=3mo, recommended bandwidth, FCA context
)
result = rd.fit()
print(result.summary())
print(result.rate_ratio())          # exp(tau): causal rate ratio at cutoff
print(result.regulatory_report(tariff_relativity=0.70))

Output:

InsuranceRD — claim_count ~ RD at driver_age_months = 300.0
Outcome type : poisson
Bandwidth (h): 28.451  [left / right]
N in bandwidth: 1842 (921 left, 921 right)

                                        Estimate    Std Err              95% CI    p-value
----------------------------------------------------------------------------------------
  tau (robust BC)                         -0.3214     0.0421   [-0.4039, -0.2389]     0.0000

  Rate ratio (exp(tau)): 0.7245  (95% CI: 0.6679, 0.7869)
  Interpretation: crossing the cutoff multiplies the claims rate by this factor.

NCD transitions (multi-cutoff)

from insurance_rdd import MultiCutoffRD

mc_rd = MultiCutoffRD(
    outcome='claim_count',
    running_var='ncd_level',
    cutoffs=[1, 2, 3, 4, 5],
    data=df,
    outcome_type='poisson',
    exposure='exposure_years',
)
result = mc_rd.fit()
print(result.summary())
result.cutoff_effects_df()          # per-cutoff table
result.pooled_effect()              # inverse-variance weighted pooled estimate

Local Poisson GLM (correct likelihood, not weighted OLS)

from insurance_rdd.outcomes import PoissonRD

rd = PoissonRD(
    outcome='claim_count',
    running_var='driver_age_months',
    cutoff=300,
    data=df,
    exposure='exposure_years',
    n_boot=500,
)
result = rd.fit()
print(result.summary())
print(result.rate_ratio())

PoissonRD fits the Poisson log-likelihood directly rather than transforming to rates and running weighted OLS. The difference matters when exposure varies substantially across the bandwidth, or when claim counts are sparse.

Claim severity (Gamma outcome)

from insurance_rdd.outcomes import GammaRD

rd = GammaRD(
    outcome='claim_amount',
    running_var='driver_age_months',
    cutoff=300,
    data=df[df.claim_amount > 0],   # severity conditioning on claim
)
result = rd.fit()
print(result.rate_ratio())          # severity ratio at the cutoff

Territory boundary (Geographic RDD)

from insurance_rdd import GeographicRD

# If you have pre-computed signed distances:
geo_rd = GeographicRD(
    outcome='claim_count',
    treatment_col='territory_band',
    data=df,
    outcome_type='poisson',
    exposure='exposure_years',
    signed_distance_col='dist_to_boundary_m',  # negative = territory A
    border_segment_fes=False,
)

# With a boundary shapefile (requires geopandas):
geo_rd = GeographicRD(
    outcome='claim_count',
    treatment_col='territory_band',
    data=df,
    boundary_file='territory_AB.geojson',
    lat_col='lat',
    lon_col='lon',
    outcome_type='poisson',
    exposure='exposure_years',
    border_segment_fes=True,
    n_segments=10,
)
result = geo_rd.fit()
print(result.summary())

Validity checks

from insurance_rdd.validity import DensityTest, CovariateBalance, PlaceboTest

# McCrary manipulation test.
dt = DensityTest(running_var='driver_age_months', cutoff=300, data=df)
print(dt.fit().summary())

# Covariate balance (predetermined covariates should be smooth at cutoff).
cb = CovariateBalance(
    covariates=['vehicle_group', 'years_licensed'],
    running_var='driver_age_months',
    cutoff=300,
    data=df,
)
print(cb.fit().summary())

# Placebo test at false cutoffs.
pt = PlaceboTest(
    outcome='claim_count',
    running_var='driver_age_months',
    cutoff=300,
    data=df,
    placebo_cutoffs=[264, 282, 318, 336],
    true_tau=result.tau_bc,
)
print(pt.fit().summary())

Regulatory report

from insurance_rdd.report import ThresholdReport, ThresholdReportData

report = ThresholdReport(
    ThresholdReportData(
        rd_result=result,
        density_result=density_result,
        balance_result=balance_result,
        tariff_relativity=0.70,     # your tariff factor at this threshold
        threshold_name='age 25 driver boundary',
    )
)
print(report.markdown())
report.save('age25_threshold_report.md')
report.save('age25_threshold_report.html')

The report compares your tariff relativity to the empirical causal rate ratio. If the tariff factor implies a larger premium reduction than the data supports, the report flags it as a potential FCA Consumer Duty concern and quantifies the discrepancy.

Presets

from insurance_rdd import presets

presets.AGE_25          # age 25 threshold, running variable in months
presets.AGE_25_YEARS    # age 25, running variable in years (discrete)
presets.NCD_STEP        # any NCD level transition
presets.NCD_MAX         # NCD 4→5 (manipulation expected, documented)
presets.VEHICLE_AGE_10  # vehicle age 10 years
presets.VEHICLE_AGE_15  # vehicle age 15 years
presets.TENURE_3        # policy tenure 3 years
presets.SUM_INSURED_100K  # sum insured £100k band

Each preset loads: cutoff value, recommended kernel, donut radius, bandwidth selector, FCA context, and methodological notes.

Design choices

Exposure weighting: InsuranceRD passes exposure as weights to rdrobust and converts counts to rates. This is the weighted OLS approximation, which is correct when the mean-variance relationship is approximately proportional. For sparse count data or highly variable exposure, use PoissonRD which fits the Poisson log-likelihood directly.

Bootstrap CIs for GLM outcomes: PoissonRD and GammaRD use nonparametric bootstrap (500 replications by default). Analytical bias correction for non-Gaussian local regression requires adapting the CCT derivation to the score function of the GLM — non-trivial and deferred to v0.2. The bootstrap is honest and interpretation is straightforward.

Donut RDD: Many insurance running variables are reported at integer precision (driver age in years, vehicle year). This creates heaping at the integers nearest to the cutoff. The donut exclusion removes a neighbourhood around the cutoff before estimation. The default donut for AGE_25 is 3 months.

Geographic RDD: We compute signed distance to the boundary line using geopandas and pyproj, reproject to the local UTM zone for accurate metre-scale distances, then run standard rdrobust with signed distance as the running variable. Border segment fixed effects are added by dividing the boundary into segments and one-hot encoding segment membership as covariates.

FCA Consumer Duty context

FCA Consumer Duty (PS22/9, effective July 2023) requires insurers to demonstrate that pricing factors are risk-reflective. RDD provides causal evidence at each pricing threshold. The formal test is:

H0: log(tariff relativity at c) = tau_hat_poisson

If the tariff applies a 30% premium reduction at age 25 (relativity = 0.70, log = -0.357) and the empirical Poisson RDD estimates tau = -0.32 (rate ratio = 0.726), these are consistent. If the tariff applies a 50% reduction (relativity = 0.50, log = -0.693) for a 25% empirical claims reduction (tau = -0.29), that is evidence of over-pricing at the threshold.

Methodology

Core estimation uses rdrobust's CCT bias-corrected robust inference (Calonico, Cattaneo, Titiunik 2014, Econometrica 82(6):2295-2326). We do not reimplement the CCT bandwidth selection or bias correction — rdrobust does this correctly and efficiently.

Our additions:

Local Poisson/Gamma GLM via scipy.optimize (correct log-likelihood, not OLS on log Y)
Exposure offset in Poisson log-likelihood
Geographic distance computation and border segment fixed effects
Multi-cutoff inverse-variance pooling
Insurance domain presets and FCA regulatory output

References

Calonico, Cattaneo, Titiunik (2014) Econometrica 82(6):2295-2326 — CCT bias-corrected inference
Lee & Lemieux (2010) Journal of Economic Literature 48(2):281-355 — comprehensive survey
Keele & Titiunik (2015) Political Analysis 23(1):127-155 — geographic RDD
Souza et al. (2025) Health Economics DOI:10.1002/hec.4898 — only published insurance RDD
Cattaneo, Jansson, Ma (2018) — rddensity manipulation test
Artis, Ayuso, Guillén (2002) — NCD claim withholding evidence

Burning Cost — pricing tools that actuaries can actually use.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Mar 15, 2026

0.1.0

Mar 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_rdd-0.1.1.tar.gz (59.4 kB view details)

Uploaded Mar 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

insurance_rdd-0.1.1-py3-none-any.whl (47.1 kB view details)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file insurance_rdd-0.1.1.tar.gz.

File metadata

Download URL: insurance_rdd-0.1.1.tar.gz
Upload date: Mar 15, 2026
Size: 59.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_rdd-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`0065dc61a6ed54a4055305103a6f73dcaa4a411ec4744a3e5677cc225acfbb22`
MD5	`d6b4a62ac4004ee1f04c16bf8df27491`
BLAKE2b-256	`ee2f7a4a9806c769f6ff1b37aecdea93466a32db48d87eb5b246a3362cdf5c71`

See more details on using hashes here.

File details

Details for the file insurance_rdd-0.1.1-py3-none-any.whl.

File metadata

Download URL: insurance_rdd-0.1.1-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 47.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_rdd-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5eac0e7d3da7b24c8c2b11138749b9b0f2af622abf739624a9f2c54fd6b8adb2`
MD5	`2900b55230be07f5e941f4292f9bc80d`
BLAKE2b-256	`f440753a68f21771e8afae5d841e95d4819309455a9843c17b7c7b7a0773d782`

See more details on using hashes here.

insurance-rdd 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

insurance-rdd

The problem this solves

Installation

Quick start

Age 25 threshold (Poisson, exposure offset)

NCD transitions (multi-cutoff)

Local Poisson GLM (correct likelihood, not weighted OLS)

Claim severity (Gamma outcome)

Territory boundary (Geographic RDD)

Validity checks

Regulatory report

Presets

Design choices

FCA Consumer Duty context

Methodology

References

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes