Skip to main content

Automatic Debiased ML via Riesz Representers for continuous treatment causal inference in UK personal lines insurance pricing

Project description

insurance-autodml

Automatic Debiased ML via Riesz Representers for continuous treatment causal inference in UK personal lines insurance pricing.

The problem

You want to know: "If I increase this policyholder's premium by £20, by how much does their claim probability change?" Or: "What happens to average claims if I raise all renewals 5%?"

These are causal questions. OLS on observed premiums and claims is biased because premiums are set by an underwriting model that already incorporates risk — high-risk policyholders are charged more, so premium and claims are positively correlated through confounding, not causal structure.

Standard Double ML handles this for discrete or well-behaved treatments. But in UK motor/home insurance, the treatment (premium) is continuous, and the standard approach requires estimating the generalised propensity score (GPS) — the conditional density p(D|X). This is numerically unstable when:

  • Renewal rates vary from 80% at low premiums to 20% at high premiums (selection creates heavy tails)
  • Premium distributions are multimodal (tiered pricing bands)
  • High-premium policyholders are sparse but influential

The Riesz representer approach (Chernozhukov et al. 2022) bypasses the GPS entirely. It directly estimates the reweighting functional via a minimax regression, which is stable even at the extremes of the treatment distribution.

What this library estimates

Average Marginal Effect (AME): E[dE[Y|D,X]/dD] — the average derivative of the outcome with respect to premium. This is your price elasticity.

Dose-response curve: E[Y(d)] for a grid of premium values. Answers "what would average claims be if everyone paid £d?"

Policy shift effect: E[Y(D*(1+delta))] - E[Y]. Answers "what if we raised all premiums 5%?"

Selection-corrected elasticity: All of the above, but corrected for the renewal selection bias problem — claims are only observed for policies that renew.

Installation

pip install insurance-autodml

For CatBoost nuisance models:

pip install "insurance-autodml[catboost]"

For HTML reports:

pip install "insurance-autodml[reports]"

Quick start

from insurance_autodml import PremiumElasticity, SyntheticContinuousDGP

# Generate synthetic data (or use your own)
dgp = SyntheticContinuousDGP(n=5000, outcome_family="gaussian", random_state=42)
X, D, Y, _ = dgp.generate()

# Fit the AME estimator
model = PremiumElasticity(
    outcome_family="gaussian",
    n_folds=5,
    random_state=0,
)
model.fit(X, D, Y)
result = model.estimate()

print(result.summary())
# estimate=-0.0021  se=0.0003  95% CI=[-0.0027, -0.0015]  p=0.0000***

# True AME for comparison
print(f"True AME: {dgp.true_ame_:.4f}")

Price elasticity with exposure (motor claims)

from insurance_autodml import PremiumElasticity

# D: annual premium (£), Y: claim count, exposure: years at risk
model = PremiumElasticity(
    outcome_family="poisson",
    n_folds=5,
)
model.fit(X, D, Y_claims, exposure=years_at_risk)
result = model.estimate()
# Interpretation: change in claim RATE per £1 premium increase

Dose-response curve

from insurance_autodml import DoseResponseCurve
import numpy as np

model = DoseResponseCurve(outcome_family="gaussian", n_folds=5)
model.fit(X, D, Y)

d_grid = np.linspace(200, 700, 50)
result = model.predict(d_grid)

# Plot
model.plot(d_grid=d_grid, xlabel="Annual Premium (£)", ylabel="Claim Rate")

Policy shift

from insurance_autodml import PolicyShiftEffect

model = PolicyShiftEffect(outcome_family="gaussian", n_folds=5)
model.fit(X, D, Y)

# What happens if all premiums increase 5%?
result = model.estimate(delta=0.05)
print(result.summary())

# Full curve of effects
effects = model.estimate_curve(np.linspace(-0.10, 0.10, 21))

Handling renewal selection bias

from insurance_autodml import SelectionCorrectedElasticity

# S: renewal indicator (1=renewed, 0=lapsed)
# Y: claims (observed only for renewals; set to 0 or NaN for lapses)
model = SelectionCorrectedElasticity(
    outcome_family="gaussian",
    n_folds=5,
)
model.fit(X, D, Y_observed, S=renewal_indicator)
result = model.estimate()

# Sensitivity analysis: how robust is this to unobserved selection confounding?
bounds = model.sensitivity_bounds(gamma_grid=np.array([1.0, 1.5, 2.0, 3.0]))
for gamma, b in bounds.items():
    print(f"Gamma={gamma}: AME in [{b['lower']:.4f}, {b['upper']:.4f}]")

Segment-level effects

# No refitting required — segments computed from EIF scores
age_bands = pd.cut(age_feature, bins=[17, 25, 35, 50, 65, 100], labels=["17-25", "26-35", "36-50", "51-65", "66+"])
segment_results = model.effect_by_segment(age_bands)

for sr in segment_results:
    print(f"{sr.segment_name}: {sr.result.summary()}")

FCA evidence report

from insurance_autodml import ElasticityReport

report = ElasticityReport(
    estimator=model,
    segment_results=segment_results,
    sensitivity_bounds=bounds,
    analyst="Pricing Team",
)
report.to_html("elasticity_report.html")
report.to_json("elasticity_report.json")

Design choices

Why not GPS-based double ML? The GPS (p(D|X)) requires density estimation in high dimensions. In renewal portfolios, the treatment density has long tails and selection-induced gaps. The Riesz minimax regression is a regression problem — more stable, standard ML machinery applies directly.

Why ForestRiesz over genriesz? We implement our own forest-based Riesz regressor rather than depending on genriesz (which requires JAX). The scikit-learn RandomForest is sufficient for the derivative estimation task and avoids GPU/JAX dependency issues in production insurance environments.

Why 5-fold cross-fitting? Standard in the DML literature. 3 folds for n < 2000; 5 folds is the default sweet spot. More folds give smaller bias but higher variance in the nuisance estimates.

Outcome families: The library uses GradientBoostingRegressor for all families by default (transforming Y for Poisson/Gamma to ensure positivity). CatBoost's native Poisson loss is available via the catboost extra and gives better calibration for claim count models.

References

  • Chernozhukov et al. (2022). Automatic Debiased Machine Learning of Causal and Structural Effects. Econometrica 90(3):967-1027.
  • Colangelo & Lee (2020). Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments. arXiv:2004.03036.
  • Hirshberg & Wager (2021). Augmented minimax linear estimation. Annals of Statistics 49(6):3206-3227.
  • arXiv:2601.08643. Automatic debiased machine learning and sensitivity analysis for sample selection models.

Related libraries


Built by Burning Cost — insurance pricing tools for practitioners.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_autodml-0.1.0.tar.gz (44.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

insurance_autodml-0.1.0-py3-none-any.whl (39.9 kB view details)

Uploaded Python 3

File details

Details for the file insurance_autodml-0.1.0.tar.gz.

File metadata

  • Download URL: insurance_autodml-0.1.0.tar.gz
  • Upload date:
  • Size: 44.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_autodml-0.1.0.tar.gz
Algorithm Hash digest
SHA256 96c1127509681c3ad6d192ef37960f8e25e19515fd67ebd8de61d0447d66bce8
MD5 c3d7eb38cc4565bfbabc14b4053b04ac
BLAKE2b-256 aaaf13ac9e303c669bd25d7fe20e5e31d76a60f564ae35aac9c4b42f4fe622aa

See more details on using hashes here.

File details

Details for the file insurance_autodml-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: insurance_autodml-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_autodml-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 14bcc1ef0161f454b263f859566995e24529f3bb0e464d93002691e4f8b732d7
MD5 7e7787ecf9e9585d24de8672e4f460d9
BLAKE2b-256 5195e6e36d0d2fcd184d557c150554b9133cbe88c97d25cfc97eadadfa0ed503

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page