Bayesian Causal Forests for insurance pricing — heterogeneous treatment effects with FCA audit reporting

These details have not been verified by PyPI

Project description

insurance-bcf

Bayesian Causal Forests for UK insurance pricing teams.

The problem

A motor insurer applies an 8% rate increase across the book. Aggregate lapse rises 1.8pp. The GLM says the elasticity is −0.22. Job done?

No. That number is the average. Young PCW customers may lapse at 3x the rate of mature direct customers under the same rate increase. If you price to the average elasticity, you overshoot on the sensitive segments and leave margin on the insensitive ones.

BCF (Bayesian Causal Forests) estimates the treatment effect for every policy in the portfolio — not an average. The output is a posterior distribution over the lapse effect for each segment, with credible intervals suitable for FCA audit documentation.

When to use this vs. insurance-elasticity

Use BCF (this library) when:

Treatment is binary or categorical: rate increase applied yes/no, NCD tier change, telematics policy
You want posterior uncertainty over segment effects for FCA EP25/2 audit documentation
Strong confounding is suspected: the risk model drives both the premium and the renewal probability
You want counterfactual analysis: what would have happened if we had not applied the increase to this segment?

Use DML (insurance-elasticity) when:

Treatment is the actual premium level (continuous)
You have exogenous price variation from an A/B test or natural experiment
You want a single elasticity scalar to feed into a rate optimiser

The methods are complementary. Run both. Divergence in segment rankings flags model misspecification in one or both.

The method

BCF runs two separate Bayesian tree ensembles:

Y_i = mu(x_i, pi_hat(x_i)) + tau(x_i) * z_i + epsilon_i

mu — the prognostic function — captures the renewal probability under control (250 trees, expressive prior). tau — the treatment effect function — captures CATE (50 trees, shrink-to-homogeneity prior with alpha=0.25, beta=3).

Including pi_hat explicitly in mu corrects Regularization-Induced Confounding (RIC): the mechanism by which standard BART over-shrinks mu and incorrectly attributes unexplained outcome variance to the treatment. This is not optional for insurance observational data where the risk model drives both premium assignment and renewal probability.

Reference: Hahn, Murray, Carvalho (2020) Bayesian Analysis 15(3): 965-1056.

Engine: stochtree 0.4.0 — the reference Python BCF implementation by the original paper authors (Herren, Hahn, Murray, Carvalho 2025/2026).

Quick start

from insurance_bcf import BayesianCausalForest, ElasticityEstimator, BCFAuditReport
from insurance_bcf.simulate import simulate_renewal, SimulationParams

# Simulate a UK motor renewal dataset (or load your own)
data = simulate_renewal(SimulationParams(n_policies=10_000, random_seed=42))

# Fit the BCF model
model = BayesianCausalForest(
    outcome='binary',       # binary renewal flag
    num_mcmc=500,           # posterior samples
    num_gfr=10,             # GFR warm-start iterations
    random_seed=42,
)
model.fit(
    X=data.X,               # pd.DataFrame of rating factors
    treatment=data.treatment,  # binary: rate increase applied (1) or not (0)
    outcome=data.outcome,   # renewal flag (0/1)
)

# CATE: posterior mean + 95% credible interval per policy
cate_df = model.cate(data.X)
print(cate_df.head())
#    cate_mean  cate_lower  cate_upper  cate_std
# 0   -0.0612     -0.0741     -0.0483    0.0066
# 1   -0.0421     -0.0510     -0.0332    0.0045
# ...

# Segment effects
est = ElasticityEstimator(model)
seg = est.segment_effects(data.X, segment_cols=['age_band', 'channel'])
print(seg)
#   age_band  channel  effect_mean  effect_lower  effect_upper  n_policies
# 0        0        1       -0.082        -0.094        -0.071        1241
# 1        0        0       -0.041        -0.049        -0.033         420
# 2        1        1       -0.035        -0.041        -0.029        3410
# 3        5        0       -0.011        -0.018        -0.004        1892

Young PCW customers (age_band=0, channel=1) are 7.5x more lapse-sensitive than mature direct customers. That is the heterogeneity the GLM missed.

Rate adjustment recommendations

import pandas as pd
import numpy as np

current_premium = pd.Series(np.random.uniform(400, 1200, len(data.X)))

adj = est.optimal_rate_adjustment(
    data.X,
    target_margin=0.05,
    current_premium=current_premium,
    max_adjustment=0.20,
)
print(adj[['suggested_adjustment', 'adjustment_confidence']].head())

Partial dependence

How does the CATE vary with a single feature, after averaging over the distribution of other covariates?

pd_df = est.partial_dependence(data.X, feature='ncb_steps', grid_points=6)
print(pd_df)
# feature_value  pdp_mean  pdp_lower  pdp_upper
#             0    -0.071     -0.082     -0.060
#             1    -0.065     -0.074     -0.055
#             5    -0.031     -0.039     -0.023

Customers with higher NCB are less lapse-sensitive to rate increases — they have more to lose by switching insurer.

FCA EP25/2 audit report

report = BCFAuditReport(model, est)

# Protected characteristic check: does tau vary by age band?
pc_df = report.protected_characteristic_check(
    data.X,
    protected_cols=['age_band'],
)
print(pc_df[['characteristic', 'group', 'effect_mean', 'flag']])

# Render HTML report
report.render(
    output_path='bcf_audit_2024Q4.html',
    X=data.X,
    Z=data.treatment,
    protected_cols=['age_band'],
    segment_cols=[['age_band'], ['channel'], ['age_band', 'channel']],
)

The report documents model configuration, MCMC convergence, segment effects, protected characteristic moderation, and a methodology appendix. It is designed for internal model governance, not FCA submission.

Using pre-computed propensity scores

For insurance applications, passing an external propensity score is preferred over letting BCF estimate it internally. You have domain knowledge about what drives treatment assignment.

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(data.X, data.treatment)
pi_hat = lr.predict_proba(data.X)[:, 1]

model.fit(
    X=data.X,
    treatment=data.treatment,
    outcome=data.outcome,
    propensity=pi_hat,
)

GIPP date warning

If your dataset spans January 2022 (the FCA GIPP implementation date), BCF will warn you:

import pandas as pd

X_with_dates = data.X.copy()
X_with_dates['renewal_date'] = pd.date_range('2021-06-01', periods=len(data.X), freq='D')

model.fit(
    X_with_dates, data.treatment, data.outcome,
    gipp_date_col='renewal_date'
)
# GIPPBreakWarning: Column 'renewal_date' spans the GIPP implementation date (January 2022).

Serialisation

# Save
json_str = model.to_json()

# Load
model2 = BayesianCausalForest.from_json(json_str, outcome='binary')
cate_df = model2.cate(data.X)

API reference

`BayesianCausalForest`

Parameter	Default	Notes
`outcome`	`'binary'`	`'binary'` activates probit link (tau on latent scale); `'continuous'` for loss ratio
`treatment_trees`	50	Shrink-to-homogeneity prior — do not increase without testing
`prognostic_trees`	250	Expressive prior for mu
`num_mcmc`	500	Retained posterior samples
`num_gfr`	10	GFR warm-start iterations (eliminates burn-in)
`num_chains`	1	Set to 4 for R-hat diagnostics (requires arviz)
`propensity_covariate`	`'prognostic'`	Never `'none'` for observational data
`random_seed`	`None`
`positivity_threshold`	0.05	Propensity scores outside [0.05, 0.95]
`positivity_max_fraction`	0.05	Fraction allowed to violate before error

`ElasticityEstimator`

Method	Returns	Notes
`segment_effects(X, segment_cols)`	`pd.DataFrame`	CATE aggregated by segment
`partial_dependence(X, feature)`	`pd.DataFrame`	CATE vs. single feature
`optimal_rate_adjustment(X, target_margin, current_premium)`	`pd.DataFrame`	Elasticity-weighted adjustments
`portfolio_summary(X)`	`pd.DataFrame`	Aggregate CATE statistics

`BCFAuditReport`

Method	Returns	Notes
`protected_characteristic_check(X, protected_cols)`	`pd.DataFrame`	FCA EP25/2 protected group analysis
`render(output_path, X, ...)`	None	HTML report

Installation

pip install insurance-bcf

stochtree requires a C++ build. Wheels are available for Linux x86_64, macOS (Intel + Apple Silicon), and Windows x86_64. On other architectures, the library falls back to a mock implementation for testing.

pip install stochtree>=0.4.0  # C++ backend
pip install insurance-bcf

For MCMC convergence diagnostics with multi-chain sampling:

pip install insurance-bcf[diagnostics]  # includes arviz

Databricks demo

See notebooks/insurance_bcf_demo.py for the full workflow on synthetic data. Upload to your Databricks workspace and run on any cluster with ML runtime >= 13.0.

Methodology note on binary outcomes

When outcome='binary', BCF uses a probit link function. The treatment effect tau(x) is on the latent normal scale, not the probability scale. The relationship between latent-scale tau and probability-scale lapse effect depends on mu(x):

P(Y=1 | X, Z=1) - P(Y=1 | X, Z=0) = Phi(mu(x) + tau(x)) - Phi(mu(x))

For audit reporting, use posterior_samples(X, marginalise_probit=True) to apply the standard normal CDF approximation. For precise marginalisation, use mu and tau posteriors jointly.

References

Hahn, P.R., Murray, J.S., Carvalho, C.M. (2020). Bayesian Regression Tree Models for Causal Inference. Bayesian Analysis 15(3): 965-1056.
Herren, A., Hahn, P.R., Murray, J.S., Carvalho, C.M. (2025/2026). StochTree. arXiv:2512.12051v2.
Chipman, H.A., George, E.I., McCulloch, R.E. (2010). BART. Annals of Applied Statistics 4(1): 266-298.
He, J., Hahn, P.R. (2021). GFR warm-start algorithm for BART MCMC.
FCA Evaluation Paper EP25/2 (2025). Evaluation of GIPP Remedies.

Built by Burning Cost. Practitioner tools for UK insurance pricing teams.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Mar 12, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insurance_bcf-0.1.0.tar.gz (43.3 kB view details)

Uploaded Mar 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

insurance_bcf-0.1.0-py3-none-any.whl (33.6 kB view details)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file insurance_bcf-0.1.0.tar.gz.

File metadata

Download URL: insurance_bcf-0.1.0.tar.gz
Upload date: Mar 12, 2026
Size: 43.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_bcf-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f04d44394872ed943e9f04cb062b725a69de9e79d64c0a265a1149f46d747e3e`
MD5	`e5ab58a05514c2c5bbc45ffc8d0bdc86`
BLAKE2b-256	`ae75fe91b88c970b351cde96a6fb90c99a6b254983f71ff399fd6adbd76a4e12`

See more details on using hashes here.

File details

Details for the file insurance_bcf-0.1.0-py3-none-any.whl.

File metadata

Download URL: insurance_bcf-0.1.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 33.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for insurance_bcf-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c2689d18301e213cbf5d73d930bb1482738d2476c880dc103f5a3d32e8fad0b`
MD5	`a0bbb9fcddfd1d880b76f80263d7f583`
BLAKE2b-256	`5370fab82a97ca681c02363c687de14a64ae7e585fff281aae9c7a3405f2a0f8`

See more details on using hashes here.

insurance-bcf 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

insurance-bcf

The problem

When to use this vs. insurance-elasticity

The method

Quick start

Rate adjustment recommendations

Partial dependence

FCA EP25/2 audit report

Using pre-computed propensity scores

GIPP date warning

Serialisation

API reference

BayesianCausalForest

ElasticityEstimator

BCFAuditReport

Installation

Databricks demo

Methodology note on binary outcomes

References

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`BayesianCausalForest`

`ElasticityEstimator`

`BCFAuditReport`