Automatic Debiased ML via Riesz Representers for continuous treatment causal inference in UK personal lines insurance pricing

insurance-autodml

The problem

You want to know: "If I increase this policyholder's premium by £20, by how much does their claim probability change?" Or: "What happens to average claims if I raise all renewals 5%?"

These are causal questions. A plain OLS regression of claims on observed premiums is biased, because premiums come from an underwriting model that already prices in risk: high-risk policyholders are charged more, so premium and claims are positively correlated through confounding rather than through any causal effect of price.
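To make the confounding argument concrete, here is a minimal simulation sketch. Every number in it (the risk score, the pricing rule, the assumed true effect of -0.002 per £1) is invented for illustration and does not come from the library or from real portfolio data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical latent risk score that the underwriting model can see
risk = rng.normal(size=n)
# Riskier customers are charged more: this is the confounding path
premium = 400 + 80 * risk + rng.normal(scale=30, size=n)

true_effect = -0.002  # assumed causal effect of +£1 premium on the outcome
claims = 0.5 + true_effect * premium + 0.3 * risk + rng.normal(scale=0.1, size=n)

# Naive OLS of claims on premium only: picks up the risk channel as well
naive_slope = np.polyfit(premium, claims, 1)[0]

# OLS that also adjusts for the confounder recovers the causal slope
design = np.column_stack([np.ones(n), premium, risk])
adjusted_slope = np.linalg.lstsq(design, claims, rcond=None)[0][1]

print(f"naive slope:    {naive_slope:+.4f}")
print(f"adjusted slope: {adjusted_slope:+.4f}")
print(f"true effect:    {true_effect:+.4f}")
```

In this toy design the naive slope even has the wrong sign, because the positive risk channel outweighs the negative causal effect. Real portfolios have many confounders and non-linear pricing rules, which is where flexible nuisance models come in.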

Standard Double ML handles this for discrete or well-behaved treatments. But in UK motor/home insurance, the treatment (premium) is continuous, and the standard approach requires estimating the generalised propensity score (GPS) — the conditional density p(D|X). This is numerically unstable when:

Renewal rates vary from 80% at low premiums to 20% at high premiums (selection creates heavy tails)
Premium distributions are multimodal (tiered pricing bands)
High-premium policyholders are sparse but influential

The Riesz representer approach (Chernozhukov et al. 2022) bypasses the GPS entirely. It estimates the reweighting function (the Riesz representer) directly via a minimax regression, which is stable even at the extremes of the treatment distribution.
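The idea can be illustrated on a toy design where the answer is known in closed form. For the average derivative, the Riesz representer alpha satisfies E[alpha(D,X) g(D,X)] = E[dg(D,X)/dD] for every g, and it equals minus the derivative of log p(D|X) with respect to D. The sketch below assumes a Gaussian treatment model so that alpha = D - 0.5 X exactly, then recovers the same function from a least-squares Riesz loss over a small linear basis without ever estimating a density. The linear basis is a deliberate simplification chosen for this example; it is not the library's forest-based minimax estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.normal(loc=1.0, size=n)      # a single confounder
d = 0.5 * x + rng.normal(size=n)     # toy treatment model: D | X ~ N(0.5 X, 1)

# Closed-form representer for this design: -d/dD log p(D|X) = D - 0.5 X
alpha_true = d - 0.5 * x

# 1) Check the defining property on a test function g(d, x) = d^2 + x d
g = d**2 + x * d
dg_dd = 2 * d + x
lhs = np.mean(alpha_true * g)   # E[alpha * g]
rhs = np.mean(dg_dd)            # E[dg/dD]
print(f"E[alpha g] = {lhs:.3f},  E[dg/dD] = {rhs:.3f}")

# 2) Learn alpha by regression only. With alpha = b(D, X) @ rho, minimising
#    E[alpha^2 - 2 * d(alpha)/dD] gives rho = E[b b']^{-1} E[db/dD].
B = np.column_stack([np.ones(n), d, x])                          # basis b(D, X)
dB_dd = np.column_stack([np.zeros(n), np.ones(n), np.zeros(n)])  # db/dD
rho = np.linalg.solve(B.T @ B / n, dB_dd.mean(axis=0))
print("estimated coefficients on [1, D, X]:", np.round(rho, 3))
```

The estimated coefficients should be close to [0, 1, -0.5], matching the closed form. No density was fitted at any point, which is the practical appeal when p(D|X) is multimodal or sparse in the tails.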

What this library estimates

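Average Marginal Effect (AME): E[∂E[Y|D,X]/∂D], the average derivative of the outcome with respect to premium. This is the price sensitivity that the library reports as price elasticity; note that it is measured in outcome units per £1 of premium, not as a percentage-on-percentage ratio.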

Dose-response curve: E[Y(d)] for a grid of premium values. Answers "what would average claims be if everyone paid £d?"

Policy shift effect: E[Y(D*(1+delta))] - E[Y]. Answers "what if we raised all premiums 5%?"

Selection-corrected elasticity: All of the above, but corrected for the renewal selection bias problem — claims are only observed for policies that renew.
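The first three estimands can be read as different averages of the same outcome regression g(d, x) = E[Y | D=d, X=x]. The sketch below computes naive plug-in versions of each from a simple quadratic regression on simulated data. The data-generating process, the `basis` helper and the plug-in approach are all assumptions made for this example; the library's estimators add a Riesz-representer correction on top of the plug-in, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

x = rng.normal(size=n)
d = 400 + 60 * x + rng.normal(scale=50, size=n)   # premium (£) depends on x
u = d / 100.0
y = 2.0 - 0.3 * u + 0.02 * u**2 + 0.5 * x + rng.normal(scale=0.2, size=n)

def basis(d_val, x_val):
    """Quadratic-in-premium basis (premium rescaled to hundreds of pounds)."""
    s = d_val / 100.0
    return np.column_stack([np.ones_like(s), s, s**2, x_val])

beta = np.linalg.lstsq(basis(d, x), y, rcond=None)[0]

def g_hat(d_val, x_val):
    return basis(d_val, x_val) @ beta

def dg_hat_dd(d_val):
    # Analytic derivative of the fitted quadratic with respect to d (per £1)
    return (beta[1] + 2 * beta[2] * d_val / 100.0) / 100.0

# AME: average derivative over the observed (D, X)
ame = dg_hat_dd(d).mean()

# Dose-response: set everyone's premium to d, average over observed X
d_grid = np.array([300.0, 400.0, 500.0])
dose_response = np.array([g_hat(np.full(n, dv), x).mean() for dv in d_grid])

# Policy shift: scale every premium by (1 + delta), compare with the status quo
delta = 0.05
policy_shift = g_hat(d * (1 + delta), x).mean() - y.mean()

print(f"AME per £1:        {ame:.5f}")
print(f"dose-response:     {np.round(dose_response, 3)}")
print(f"policy shift (5%): {policy_shift:.4f}")
```

Plug-in estimates like these inherit any regularisation bias of the outcome model, which is the motivation for the debiased versions.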

Installation

pip install insurance-autodml

For CatBoost nuisance models:

pip install "insurance-autodml[catboost]"

For HTML reports:

pip install "insurance-autodml[reports]"

Quick start

from insurance_autodml import PremiumElasticity, SyntheticContinuousDGP

# Generate synthetic data (or use your own)
dgp = SyntheticContinuousDGP(n=5000, outcome_family="gaussian", random_state=42)
X, D, Y, _ = dgp.generate()

# Fit the AME estimator
model = PremiumElasticity(
    outcome_family="gaussian",
    n_folds=5,
    random_state=0,
)
model.fit(X, D, Y)
result = model.estimate()

print(result.summary())
# estimate=-0.0021  se=0.0003  95% CI=[-0.0027, -0.0015]  p=0.0000***

# True AME for comparison
print(f"True AME: {dgp.true_ame_:.4f}")
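The numbers in the sample summary line are consistent with a normal-approximation (Wald) interval, estimate ± z × se. The short check below reproduces them using only the standard library. That the library computes its interval this way is an assumption inferred from the printed output, not something the README states.

```python
from statistics import NormalDist

estimate, se = -0.0021, 0.0003           # values from the sample summary line

z = NormalDist().inv_cdf(0.975)          # two-sided 95% critical value
lower, upper = estimate - z * se, estimate + z * se

z_stat = estimate / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))   # two-sided p-value

print(f"95% CI=[{lower:.4f}, {upper:.4f}]  p={p_value:.4f}")
```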

Price elasticity with exposure (motor claims)

from insurance_autodml import PremiumElasticity

# D: annual premium (£), Y: claim count, exposure: years at risk
model = PremiumElasticity(
    outcome_family="poisson",
    n_folds=5,
)
model.fit(X, D, Y_claims, exposure=years_at_risk)
result = model.estimate()
# Interpretation: change in claim RATE per £1 premium increase
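For readers unfamiliar with exposure handling, a common way to model a claim rate is to fit a Poisson model to `claims / exposure` with `exposure` as the sample weight, which yields the same estimating equations as a count model with a log-exposure offset. The sketch below shows this with scikit-learn's `PoissonRegressor` on simulated data. It is a plain (non-causal) GLM chosen for brevity, and whether the library handles `exposure` in exactly this way internally is an assumption.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(2)
n = 50_000

premium = rng.uniform(200, 700, size=n)        # annual premium (£)
exposure = rng.uniform(0.25, 1.0, size=n)      # years at risk
true_rate = np.exp(-1.5 - 0.002 * (premium - 450))
claims = rng.poisson(true_rate * exposure)     # observed claim counts

# Model the annualised rate, weighting each policy by its exposure
rate = claims / exposure
z = ((premium - 450) / 100).reshape(-1, 1)     # centred, in hundreds of pounds

glm = PoissonRegressor(alpha=0.0, max_iter=1000)
glm.fit(z, rate, sample_weight=exposure)

slope_per_pound = glm.coef_[0] / 100           # change in log claim rate per £1
print(f"log-rate slope per £1: {slope_per_pound:.5f}  (simulated truth: -0.00200)")
```

Ignoring exposure and regressing raw counts would treat a three-month policy with no claims the same as a full-year policy with no claims, which understates the rate for short exposures.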

Dose-response curve

from insurance_autodml import DoseResponseCurve
import numpy as np

model = DoseResponseCurve(outcome_family="gaussian", n_folds=5)
model.fit(X, D, Y)

d_grid = np.linspace(200, 700, 50)
result = model.predict(d_grid)

# Plot
model.plot(d_grid=d_grid, xlabel="Annual Premium (£)", ylabel="Claim Rate")

Policy shift

from insurance_autodml import PolicyShiftEffect
import numpy as np

model = PolicyShiftEffect(outcome_family="gaussian", n_folds=5)
model.fit(X, D, Y)

# What happens if all premiums increase 5%?
result = model.estimate(delta=0.05)
print(result.summary())

# Full curve of effects
effects = model.estimate_curve(np.linspace(-0.10, 0.10, 21))

Handling renewal selection bias

from insurance_autodml import SelectionCorrectedElasticity
import numpy as np

# S: renewal indicator (1=renewed, 0=lapsed)
# Y: claims (observed only for renewals; set to 0 or NaN for lapses)
model = SelectionCorrectedElasticity(
    outcome_family="gaussian",
    n_folds=5,
)
model.fit(X, D, Y_observed, S=renewal_indicator)
result = model.estimate()

# Sensitivity analysis: how robust is this to unobserved selection confounding?
bounds = model.sensitivity_bounds(gamma_grid=np.array([1.0, 1.5, 2.0, 3.0]))
for gamma, b in bounds.items():
    print(f"Gamma={gamma}: AME in [{b['lower']:.4f}, {b['upper']:.4f}]")

Segment-level effects

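import pandas as pd

# No refitting required: segments are computed from the EIF scores of the fitted model.
# include_lowest=True keeps 17-year-olds in the first band (pd.cut bins are right-closed by default).
age_bands = pd.cut(age_feature, bins=[17, 25, 35, 50, 65, 100], labels=["17-25", "26-35", "36-50", "51-65", "66+"], include_lowest=True)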
segment_results = model.effect_by_segment(age_bands)

for sr in segment_results:
    print(f"{sr.segment_name}: {sr.result.summary()}")
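One common way to get segment effects without refitting is to average the per-policy influence-function scores within each segment and use their within-segment standard deviation for the standard error. The sketch below does this with pandas on simulated scores. The `psi` values are invented, and the assumption that `effect_by_segment` works this way internally is mine, not the README's.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 6_000

age = rng.integers(17, 90, size=n)
# Hypothetical per-policy scores: younger drivers are given a larger effect
psi = rng.normal(loc=np.where(age < 26, -0.006, -0.002), scale=0.01, size=n)

bands = pd.cut(
    age,
    bins=[17, 25, 35, 50, 65, 100],
    labels=["17-25", "26-35", "36-50", "51-65", "66+"],
    include_lowest=True,   # keep age 17 in the first band
)

summary = (
    pd.DataFrame({"band": bands, "psi": psi})
    .groupby("band", observed=True)["psi"]
    .agg(estimate="mean", sd="std", n="size")
)
summary["se"] = summary["sd"] / np.sqrt(summary["n"])
summary["ci_lower"] = summary["estimate"] - 1.96 * summary["se"]
summary["ci_upper"] = summary["estimate"] + 1.96 * summary["se"]

print(summary[["estimate", "se", "ci_lower", "ci_upper", "n"]])
```

Small segments produce wide intervals, so it is worth checking the `n` column before reading anything into differences between bands.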

FCA evidence report

from insurance_autodml import ElasticityReport

report = ElasticityReport(
    estimator=model,
    segment_results=segment_results,
    sensitivity_bounds=bounds,
    analyst="Pricing Team",
)
report.to_html("elasticity_report.html")
report.to_json("elasticity_report.json")

Design choices

Why not GPS-based double ML? The GPS, p(D|X), requires conditional density estimation with high-dimensional covariates. In renewal portfolios the treatment density has long tails and selection-induced gaps. The Riesz approach replaces density estimation with a regression problem, which is more stable and lets standard ML machinery apply directly.

Why ForestRiesz over genriesz? We implement our own forest-based Riesz regressor rather than depending on genriesz (which requires JAX). The scikit-learn RandomForest is sufficient for the derivative estimation task and avoids GPU/JAX dependency issues in production insurance environments.

Why 5-fold cross-fitting? It is standard in the DML literature and is the default here. Use 3 folds for n < 2000; 5 folds is the sweet spot otherwise. More folds give smaller bias but higher variance in the nuisance estimates.
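Cross-fitting means that the nuisance functions used to score an observation are always trained on folds that exclude it. The sketch below runs a 5-fold debiased AME estimate on simulated data, using linear bases for both the outcome regression and the Riesz representer so that it stays short. The data-generating process, the `basis` and `dbasis_dd` helpers and the linear nuisance models are assumptions of this example; the library uses gradient boosting and a forest-based Riesz regressor instead.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n = 5_000

x = rng.normal(size=n)
d = 0.5 * x + rng.normal(size=n)
y = 1.0 - 0.8 * d + 0.3 * d * x + x + rng.normal(scale=0.5, size=n)
# Simulated truth: AME = E[-0.8 + 0.3 X] = -0.8

def basis(d_val, x_val):
    return np.column_stack([np.ones_like(d_val), d_val, x_val, d_val * x_val, d_val**2])

def dbasis_dd(d_val, x_val):
    # Derivative of each basis column with respect to d
    return np.column_stack([np.zeros_like(d_val), np.ones_like(d_val),
                            np.zeros_like(d_val), x_val, 2 * d_val])

psi = np.empty(n)  # per-observation debiased scores
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    B_tr = basis(d[train_idx], x[train_idx])
    dB_tr = dbasis_dd(d[train_idx], x[train_idx])

    # Nuisance 1: outcome regression g, fitted on the training folds only
    beta = np.linalg.lstsq(B_tr, y[train_idx], rcond=None)[0]
    # Nuisance 2: Riesz representer alpha via the least-squares Riesz loss
    rho = np.linalg.solve(B_tr.T @ B_tr / len(train_idx), dB_tr.mean(axis=0))

    # Score the held-out fold: plug-in derivative + alpha * residual
    B_te = basis(d[test_idx], x[test_idx])
    dB_te = dbasis_dd(d[test_idx], x[test_idx])
    psi[test_idx] = dB_te @ beta + (B_te @ rho) * (y[test_idx] - B_te @ beta)

theta = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"debiased AME = {theta:.3f}  (se = {se:.3f}, simulated truth = -0.8)")
```

With flexible learners the out-of-fold scoring matters much more than in this linear toy case, since in-sample predictions from an overfitted model would make the residual correction term artificially small.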

Outcome families: The library uses GradientBoostingRegressor for all families by default (transforming Y for Poisson/Gamma to ensure positivity). CatBoost's native Poisson loss is available via the catboost extra and gives better calibration for claim count models.

References

Chernozhukov et al. (2022). Automatic Debiased Machine Learning of Causal and Structural Effects. Econometrica 90(3):967-1027.
Colangelo & Lee (2020). Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments. arXiv:2004.03036.
Hirshberg & Wager (2021). Augmented minimax linear estimation. Annals of Statistics 49(6):3206-3227.
arXiv:2601.08643. Automatic debiased machine learning and sensitivity analysis for sample selection models.

Related libraries

insurance-causal — binary treatment effects via DoubleML
insurance-elasticity — GLM-based elasticity without causal identification

Built by Burning Cost — insurance pricing tools for practitioners.

Version 0.1.0, released Mar 13, 2026
