
insurance-elasticity


Causal price elasticity estimation and FCA PS21/5-compliant renewal pricing optimisation for UK personal lines — because naively regressing renewal flag on price in a formula-rated book measures confounding, not elasticity.


Why bother

Benchmarked against naive OLS elasticity (logistic regression with confounders) on 50,000 synthetic UK motor renewal records with known DGP. Results from Databricks run, 2026-03-16.

| Metric | OLS naive | DML (HistGBM nuisance) |
|---|---|---|
| ATE relative bias (prob scale) | 24.5% | 21.8% |
| NCD GATE RMSE | 0.0855 | 0.0448 (-47.6%) |
| 95% CI covers true ATE | No | Yes |
| Fit time (35k train) | 2.5 s | 6.4 s |

OLS in a formula-rated book measures the correlation between risk level and renewal propensity, not the causal price effect. DML residualises both outcome and price on the same confounder set, recovering a credible causal semi-elasticity. The key advantage is in segment-level heterogeneous effects: DML recovers NCD-band elasticity gradients 47.6% more accurately.




The problem

UK motor and home insurance pricing teams want to know one thing: if we increase this customer's renewal price by 10%, how much does their probability of renewing fall?

The naive answer — run a logistic regression of renewal flag on price, read off the coefficient — is wrong. Risk factors drive both the price (because we re-rate them into the premium) and the renewal decision (because higher-risk customers may also have fewer alternatives). Ordinary regression conflates the two.

Double Machine Learning (DML) separates them. It residualises both the outcome and the treatment on the same set of observable confounders, then estimates the causal effect from what's left. Applied to renewal data, it gives a semi-elasticity: the expected change in renewal probability per unit change in log price, controlling for everything in your rating factors.
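The residualisation step can be sketched with plain scikit-learn on a simulated book (a toy sketch, not this library's implementation; the true effect theta = -0.2 is known by construction, so both estimators can be checked against it):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                        # observable rating factors
risk = X @ np.array([0.5, -0.3, 0.2])              # underlying risk score
D = 0.5 * risk + 0.5 * rng.normal(size=n)          # log price change: formula plus test noise
theta = -0.2                                       # true causal semi-elasticity
Y = theta * D - 0.6 * risk + 0.1 * rng.normal(size=n)  # renewal propensity

# Naive regression of outcome on treatment picks up the risk channel too.
naive = LinearRegression().fit(D.reshape(-1, 1), Y).coef_[0]

# DML: cross-fitted residualisation of BOTH outcome and treatment on X,
# then a final regression of residual on residual.
D_res = D - cross_val_predict(GradientBoostingRegressor(random_state=0), X, D, cv=5)
Y_res = Y - cross_val_predict(GradientBoostingRegressor(random_state=0), X, Y, cv=5)
theta_hat = LinearRegression(fit_intercept=False).fit(
    D_res.reshape(-1, 1), Y_res
).coef_[0]

print(f"naive: {naive:.2f}  DML: {theta_hat:.2f}  truth: {theta}")
```

The naive coefficient absorbs the risk-driven correlation between price and renewal and lands far from the truth; the residual-on-residual regression recovers it.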

This library wraps EconML's CausalForestDML and LinearDML to do exactly that, with insurance-specific defaults and an FCA-compliant pricing optimiser built in.

Blog post: Your Renewal Pricing Is Flying Blind — why OLS gives the wrong answer, the near-deterministic price problem in practice, and how to interpret CATE heterogeneity for UK personal lines.


What you get

  • Heterogeneous elasticity estimates: per-customer CATE and segment-level GATE (group average treatment effects by NCD band, age, channel, etc.)
  • Treatment variation diagnostics: flags the near-deterministic price problem before you fit — if your pricing grid leaves no residual variation, the results are meaningless
  • Elasticity surface: heatmap and bar chart of elasticity across two dimensions simultaneously
  • FCA PS21/5-compliant optimiser: maximises profit subject to the ENBP constraint (offer price <= equivalent new business price)
  • ENBP audit: per-policy FCA ICOBS 6B.2 compliance flag
  • Portfolio demand curve: renewal rate and expected profit across a sweep of price changes

Install

uv add "insurance-elasticity[all]"
# or
pip install "insurance-elasticity[all]"

Core dependencies: polars, numpy, scipy, scikit-learn. Optional (for fitting): econml>=0.15, catboost>=1.2. Optional (for plotting): matplotlib>=3.7.


Quick start

from insurance_elasticity.data import make_renewal_data
from insurance_elasticity.fit import RenewalElasticityEstimator
from insurance_elasticity.surface import ElasticitySurface
from insurance_elasticity.optimise import RenewalPricingOptimiser
from insurance_elasticity.diagnostics import ElasticityDiagnostics
from insurance_elasticity.demand import demand_curve

# 1. Load data (or use the synthetic generator for testing)
df = make_renewal_data(n=50_000)

# 2. Check treatment variation before fitting
diag = ElasticityDiagnostics()
report = diag.treatment_variation_report(
    df,
    treatment="log_price_change",
    confounders=["age", "ncd_years", "vehicle_group", "region", "channel"],
)
print(report.summary())
# If report.weak_treatment is True, read the suggestions before proceeding.

# 3. Fit the elasticity model
confounders = ["age", "ncd_years", "vehicle_group", "region", "channel"]
est = RenewalElasticityEstimator(
    cate_model="causal_forest",   # non-parametric CATE surface
    n_estimators=200,
    catboost_iterations=500,
    n_folds=5,
)
est.fit(df, outcome="renewed", treatment="log_price_change", confounders=confounders)

# 4. Average treatment effect
ate, lb, ub = est.ate()
print(f"ATE: {ate:.3f}  95% CI: [{lb:.3f}, {ub:.3f}]")
# A 1-unit increase in log price change shifts renewal probability by ATE
# (probability scale), so a 10% price increase (log change approx 0.095)
# shifts it by approx ATE * 0.095.

# 5. Segment-level elasticity
gate = est.gate(df, by="ncd_years")
print(gate)

# 6. Elasticity surface and plots
surface = ElasticitySurface(est)
fig = surface.plot_surface(df, dims=["ncd_years", "age_band"])
fig.savefig("elasticity_surface.png", dpi=150, bbox_inches="tight")

fig2 = surface.plot_gate(df, by="channel")
fig2.savefig("gate_by_channel.png", dpi=150, bbox_inches="tight")

# 7. FCA-compliant pricing optimisation
opt = RenewalPricingOptimiser(
    est,
    technical_premium_col="tech_prem",
    enbp_col="enbp",
    floor_loading=1.0,
)
priced_df = opt.optimise(df, objective="profit")

# 8. Compliance audit
audit = opt.enbp_audit(priced_df)
print(f"Breaches: {(audit['compliant'] == False).sum()} / {len(audit)}")

# 9. Portfolio demand curve
demand_df = demand_curve(est, df, price_range=(-0.25, 0.25, 50))
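Step 9's demand curve is conceptually a sweep of counterfactual price changes through the fitted effect. Under a constant semi-elasticity it reduces to a few lines (a hand-rolled sketch with hypothetical portfolio numbers, not the library's `demand_curve`):

```python
import numpy as np

theta = -0.18                 # hypothetical portfolio semi-elasticity, probability scale
p0 = 0.80                     # baseline renewal rate at zero price change
premium, cost = 420.0, 350.0  # average premium and expected claims plus expenses

d = np.linspace(-0.25, 0.25, 11)                  # sweep of log price changes
renewal = np.clip(p0 + theta * d, 0.0, 1.0)       # first-order counterfactual renewal rate
profit = renewal * (premium * np.exp(d) - cost)   # expected profit per policy

best = d[np.argmax(profit)]
# Corner solution at the top of the sweep: single-period profit keeps rising
# with price at plausible elasticities, which is exactly why the ENBP cap
# (and lifetime-value considerations) must bind in a real optimiser.
print(f"profit-maximising log price change in sweep: {best:+.2f}")
```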

Worked Example

price_elasticity_optimisation.py covers the complete DML workflow: elasticity estimation on a synthetic 50,000-policy motor book, heterogeneous CATE broken down by NCD band, channel, and age, an ENBP-constrained profit-maximising optimiser, and an efficient frontier showing the renewal rate versus expected profit trade-off across price change scenarios. Run it before fitting on your own data to understand how each component behaves.


The near-deterministic price problem

Insurance re-rating makes the offered price nearly a deterministic function of the observable risk factors. When Var(D - E[D|X]) / Var(D) < 10%, i.e. less than 10% of the price variation survives conditioning on X, DML has almost nothing to work with: the confidence intervals blow up and the point estimate is noise.

Always run ElasticityDiagnostics.treatment_variation_report() first. If weak_treatment is True, do not proceed to fitting without addressing it.

The report's suggestions cover the main remedies: A/B price tests, panel data with within-customer variation, quasi-experiments from bulk re-rates, and the PS21/5 regression discontinuity.
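The Var(D - E[D|X]) / Var(D) ratio itself is easy to approximate with any cross-fitted regressor (a hedged sketch, not the library's diagnostic; the 10% threshold follows the text above):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

def residual_variation_ratio(X, d, cv=5):
    """Share of treatment variance surviving conditioning on confounders:
    Var(d - E[d|X]) / Var(d). Below roughly 0.10, the DML final stage is starved."""
    d_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, d, cv=cv)
    return np.var(d - d_hat) / np.var(d)

rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 4))
grid = X @ np.array([0.4, 0.3, -0.2, 0.1])       # purely formula-rated price change
tested = grid + 0.4 * rng.normal(size=4000)      # same grid plus A/B test noise

r_grid = residual_variation_ratio(X, grid)       # close to 0: weak treatment
r_test = residual_variation_ratio(X, tested)     # healthy residual share
print(f"grid only: {r_grid:.2f}  with test: {r_test:.2f}")
```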


FCA PS21/5 and ENBP

Since January 2022, UK GI firms must not quote a renewing customer a price above the equivalent new business price (ENBP). The RenewalPricingOptimiser enforces this as a hard per-policy constraint. The enbp_audit() method returns a per-row compliance flag for reporting to the compliance function.
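The hard constraint itself is a per-policy clip (a plain-numpy sketch with hypothetical values; the library's optimiser additionally trades price against the estimated elasticity):

```python
import numpy as np

def apply_enbp_cap(offer, enbp, floor=None):
    """Cap the renewal offer at the equivalent new business price (ICOBS 6B.2).
    An optional floor (e.g. technical premium) is applied after the cap; note
    that if floor > enbp the two constraints conflict and the cap is breached."""
    capped = np.minimum(offer, enbp)
    if floor is not None:
        capped = np.maximum(capped, floor)
    return capped

offer = np.array([520.0, 480.0, 610.0])   # proposed renewal offers
enbp = np.array([500.0, 500.0, 650.0])    # equivalent new business prices
tech = np.array([450.0, 490.0, 600.0])    # technical premium floor

priced = apply_enbp_cap(offer, enbp, floor=tech)
compliant = priced <= enbp
print(priced)      # [500. 490. 610.]
print(compliant)   # [ True  True  True]
```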


Treatment variable

The standard treatment is log(offer_price / last_year_price). This gives a semi-elasticity directly: a 1-unit change in D (price multiplied by e, roughly 2.7x) changes renewal probability by theta on the probability scale. For the typical 5-20% renewal re-rates in UK personal lines, interpret as: a 10% increase changes renewal probability by approximately theta * log(1.1) approx theta * 0.095.
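Worked through with an illustrative theta of -0.18 (a hypothetical estimate, purely to show the arithmetic):

```python
import math

theta = -0.18             # hypothetical semi-elasticity, probability scale
uplift = 0.10             # a 10% renewal price increase

d = math.log1p(uplift)    # log price change
delta_p = theta * d       # first-order change in P(renew)
print(f"log change: {d:.4f}")                  # 0.0953
print(f"renewal prob change: {delta_p:.4f}")   # -0.0172, i.e. about -1.7 pp
```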


Model choices

CausalForestDML (default): non-parametric, requires no pre-specified feature interactions, provides valid pointwise confidence intervals via honest splitting. Right for the elasticity surface. Computationally heavier.

LinearDML: assumes constant elasticity (or heterogeneity only through explicitly interacted features). Much faster. Right for quick portfolio-level ATE estimation.

CatBoost nuisance models: UK insurance data is full of categoricals (region, vehicle group, occupation, payment method). CatBoost is the default nuisance model choice. Note: the library currently one-hot encodes categorical columns before fitting (via _extract_arrays), so the native CatBoost categorical handling is not active. Passing pre-encoded features or a custom outcome_model / treatment_model will get you there faster than the default path.


Performance

Benchmarked against naive OLS elasticity (logistic regression with confounders) on 50,000 synthetic UK motor renewal records with known DGP (70/15/15 train/cal/test split). Run on Databricks serverless compute, 2026-03-16. See benchmarks/run_benchmark.py for full methodology.

DGP design: High-risk customers (low NCD, young, group D-F vehicle) face larger systematic price increases AND have lower base renewal probability. This creates positive confounding bias in OLS — the naive regression conflates the risk effect with the price effect.

Metrics are on the probability scale (average marginal effect: expected change in P(renew) per unit of log price change). True ATEs by NCD band range from -0.28 (NCD 0, most elastic) to -0.10 (NCD 5, least elastic).

| Metric | OLS logistic AME | DML (HistGBM nuisance) |
|---|---|---|
| Portfolio ATE estimate | -0.194 | -0.122 |
| True portfolio ATE | -0.156 | -0.156 |
| ATE relative bias | 24.5% | 21.8% |
| 95% CI covers true ATE | N/A | Yes |
| NCD GATE RMSE | 0.0855 | 0.0448 (-47.6%) |
| Fit time (35k train) | 2.5 s | 6.4 s |

Where DML wins: The 47.6% reduction in NCD GATE RMSE is the key result. OLS misranks the NCD bands because the confounding is unevenly distributed (low-NCD customers face both the largest systematic price increases and the lowest base renewal rates). DML's cross-fitting removes this. The segment-level heterogeneity — who is most price-sensitive — is what actually feeds into pricing decisions, not the portfolio average.

The ATE comparison: Both methods have meaningful bias in this partially-observable setting. The important difference is that DML provides a valid 95% confidence interval (covers the true value) while OLS has no interval at all. OLS is a point estimate from a mis-specified model; DML is an estimate with honest uncertainty quantification.

When to expect larger OLS bias: The benchmark uses price_variation_sd=0.08 — enough exogenous variation to identify the effect. In a tighter pricing grid (less A/B testing, more formula-driven re-rating), OLS bias will increase substantially while DML remains consistent.


References

  • Chernozhukov et al. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1).
  • Athey & Wager (2019). Estimating treatment effects with causal forests. Annals of Statistics, 47(2).
  • Guelman & Guillén (2014). A causal inference approach to measure price elasticity in automobile insurance. Expert Systems with Applications, 41(2).
  • FCA PS21/5 (2021). General Insurance Pricing Practices Policy Statement.

Worked Example

price_elasticity_optimisation.py — DML elasticity estimation from renewal data, ENBP-constrained optimiser, efficient frontier visualisation.

A Databricks-importable version is also available: Databricks notebook.


Related Libraries

| Library | What it does |
|---|---|
| insurance-demand | Conversion, retention, and demand curve modelling — elasticity estimates feed directly into demand curve construction |
| insurance-optimise | Constrained rate change optimisation — consumes elasticity estimates to find profit-maximising factor adjustments |
| insurance-causal | Double Machine Learning for causal treatment effects — the methodological foundation for causal elasticity estimation |

Other Burning Cost libraries

Model building

| Library | Description |
|---|---|
| shap-relativities | Extract rating relativities from GBMs using SHAP |
| insurance-interactions | Automated GLM interaction detection via CANN and NID scores |
| insurance-cv | Walk-forward cross-validation respecting IBNR structure |

Uncertainty quantification

| Library | Description |
|---|---|
| insurance-conformal | Distribution-free prediction intervals for Tweedie models |
| bayesian-pricing | Hierarchical Bayesian models for thin-data segments |
| insurance-credibility | Bühlmann-Straub credibility weighting |

Deployment and optimisation

| Library | Description |
|---|---|
| insurance-deploy | Champion/challenger framework with ENBP audit logging |
| insurance-optimise | Constrained rate change optimisation with FCA PS21/5 compliance |

Governance

| Library | Description |
|---|---|
| insurance-fairness | Proxy discrimination auditing for UK insurance models |
| insurance-governance | PRA SS1/23 model governance and validation reports |
| insurance-monitoring | Model monitoring: PSI, A/E ratios, Gini drift test |

All libraries and blog posts


Licence

MIT. Built by Burning Cost.


Need help implementing this in production? Talk to us.
