Discrimination-free insurance pricing via Lindholm marginalisation, causal path decomposition, and Wasserstein barycenter correction
insurance-fairness-ot
The problem
UK insurers face a live regulatory obligation, not a theoretical one. The FCA Consumer Duty (PRIN 2A, live July 2023), Equality Act 2010 Section 19, and ICOBS 6B together require demonstrating that pricing models do not systematically disadvantage customers with protected characteristics. The key word is demonstrate — annual board attestation, documented methodology, sub-group monitoring.
The hard part is that the regulatory standard is conditional fairness (equal price for equal risk), not demographic parity. Young drivers genuinely have more accidents; equalising their premium distribution with older drivers would be actuarially wrong, not fair. Most fairness tooling — including the nearest Python library, EquiPy — targets demographic parity and would over-correct your model.
The correct mathematical framework comes from Lindholm, Richman, Tsanakas and Wüthrich (2022): the discrimination-free price is a marginalisation of the model over the unconditional distribution of the protected attribute, equivalent to the causal do-operator. This library implements that, plus the causal path decomposition from Côté, Genest and Abdallah (2025) to separate direct discrimination, proxy discrimination, and actuarially justified effects.
What it solves that EquiPy doesn't
| Requirement | EquiPy | this library |
|---|---|---|
| Correct fairness criterion (conditional) | No (demographic parity) | Yes (Lindholm) |
| Exposure weighting | No | Yes |
| Causal graph — direct/proxy/justified decomposition | No | Yes |
| GLM-compatible relativity output | No | Yes |
| Frequency/severity decomposition | No | Yes |
| Portfolio bias correction (3 methods) | Implicit | Explicit |
| UK regulatory output (FCA format) | No | Yes |
| Polars-native | No (pandas) | Yes |
Install
```bash
pip install insurance-fairness-ot
```
Dependencies: numpy, scipy, statsmodels, networkx, POT (Python Optimal Transport), polars.
Quickstart
```python
import polars as pl
import numpy as np
from insurance_fairness_ot import (
    CausalGraph,
    DiscriminationFreePrice,
    FairnessReport,
    FCAReport,
)

# 1. Specify the causal structure of your pricing model
graph = (
    CausalGraph()
    .add_protected("gender")
    .add_justified_mediator("claims_history", parents=["gender"])
    .add_proxy("annual_mileage", parents=["gender"])
    .add_outcome("claim_freq")
    .add_edge("claims_history", "claim_freq")
    .add_edge("annual_mileage", "claim_freq")
)

# 2. Your trained model (must include gender in training)
def my_model(df: pl.DataFrame) -> np.ndarray:
    # Replace with your fitted model's prediction call,
    # e.g. catboost_model.predict(df) or glm.predict(df)
    raise NotImplementedError("plug in your trained model here")

# 3. Fit the corrector on calibration data
X_calib = pl.read_parquet("calibration_features.parquet")
D_calib = X_calib.select(["gender"])
exposure_calib = X_calib["exposure"].to_numpy()

dfp = DiscriminationFreePrice(
    graph=graph,
    combined_model_fn=my_model,
    correction="lindholm",          # primary: conditional fairness
    bias_correction="proportional",
)
dfp.fit(X_calib, D_calib, exposure=exposure_calib)

# 4. Apply to new business
X_new = pl.read_parquet("new_business.parquet")
D_new = X_new.select(["gender"])
result = dfp.transform(X_new, D_new)
print(result.fair_premium)            # discrimination-free premium
print(result.bias_correction_factor)  # should be close to 1.0

# 5. FCA compliance report
report = FCAReport(
    result,
    report_metadata={
        "firm_name": "Acme Insurance",
        "model_name": "Motor Frequency GLM v3",
        "reporting_date": "2026-03-10",
        "model_version": "3.0",
    },
)
report.save("fca_fair_value_assessment.md", format="markdown")
report.save("fca_fair_value_assessment.json", format="json")
```
The math
Lindholm marginalisation (primary correction):
$$h^*(x_i) = \sum_d \hat{\mu}(x_i, d)\, P(D = d)$$
For each policyholder, predict what the model would output if they were in each protected group, then average those predictions using the portfolio proportions P(D=d) as weights. Because the weights are the unconditional P(D=d) rather than P(D=d | X=x_i), the price no longer depends on what x_i reveals about D. This removes both direct and proxy discrimination while preserving actuarially justified effects.
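A minimal NumPy sketch of the marginalisation formula, independent of this library's internals. `lindholm_price`, `portfolio_proportions` and the toy model `toy_mu` are hypothetical names; the model is assumed to accept a feature matrix plus a single protected value d, and P(D=d) is estimated as exposure-weighted shares, which is an assumption rather than documented library behaviour.

```python
import numpy as np


def portfolio_proportions(d_values, groups, exposure=None):
    """Exposure-weighted shares P(D=d), one per entry of `groups` (assumed estimator)."""
    d_values = np.asarray(d_values)
    w = np.ones(len(d_values)) if exposure is None else np.asarray(exposure, dtype=float)
    return np.array([w[d_values == g].sum() for g in groups]) / w.sum()


def lindholm_price(mu_hat, X, groups, p_d):
    """h*(x_i) = sum_d mu_hat(x_i, d) * P(D=d) for every row x_i of X."""
    p_d = np.asarray(p_d, dtype=float)
    if not np.isclose(p_d.sum(), 1.0):
        raise ValueError("P(D=d) must sum to 1")
    # Predict every row as if it belonged to each protected group in turn
    preds = np.column_stack([mu_hat(X, d) for d in groups])  # shape (n, n_groups)
    # Average across groups with the unconditional weights P(D=d)
    return preds @ p_d


# Hypothetical toy model: one rating factor plus a multiplicative gender effect
def toy_mu(X, d):
    gender_effect = {"F": 0.0, "M": 0.2}[d]
    return 0.1 * np.exp(0.5 * X[:, 0] + gender_effect)


X = np.array([[0.0], [1.0]])
p_d = portfolio_proportions(["F", "M", "M"], ["F", "M"], exposure=[0.9, 0.5, 0.6])
h_star = lindholm_price(toy_mu, X, groups=["F", "M"], p_d=p_d)
```

The rating-factor effect survives untouched (the second row is exactly e^0.5 times the first), while the gender effect is replaced by its portfolio average.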
Portfolio bias correction: marginalisation generally shifts the portfolio mean, so E[h*(X)] no longer equals E[Y]. Three options:
- `proportional` (default): multiply all fair premiums by E[Y] / E[h*(X)]; preserves relativity ordering and is compatible with GLM tables
- `uniform`: additive shift
- `kl`: KL-optimal reweighting of P*(D=d); maximum entropy approach
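The proportional and uniform options can be sketched as follows, assuming observed outcomes y are on the same scale as the premiums and that portfolio means are exposure-weighted. The function names are hypothetical and the `kl` option is not shown.

```python
import numpy as np


def proportional_correction(fair, y, exposure):
    """Multiply by E[Y] / E[h*(X)] (exposure-weighted) so the portfolio mean matches."""
    factor = np.average(y, weights=exposure) / np.average(fair, weights=exposure)
    return fair * factor, factor


def uniform_correction(fair, y, exposure):
    """Add the same amount to every premium so the portfolio mean matches."""
    shift = np.average(y, weights=exposure) - np.average(fair, weights=exposure)
    return fair + shift, shift


fair = np.array([0.10, 0.20, 0.30])  # marginalised premiums h*(x_i)
y = np.array([0.12, 0.18, 0.36])     # observed claim frequencies (hypothetical)
exposure = np.array([1.0, 1.0, 1.0])

prop, factor = proportional_correction(fair, y, exposure)
unif, shift = uniform_correction(fair, y, exposure)
```

The proportional version keeps every premium ratio (and hence every relativity) intact, while the uniform shift keeps premium differences intact but changes ratios.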
Wasserstein barycenter (secondary, for multi-attribute simultaneous correction):
$$m^*(x_i) = \bar{Q}\big(F_{d_i}(\hat{\mu}(x_i))\big)$$
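where Q_bar is the average of the per-group quantile functions, weighted by the group proportions, and F_{d_i} is the distribution function of model predictions within policyholder i's group. The result achieves demographic parity, a stronger requirement than conditional fairness. Use it after Lindholm for multi-attribute cases.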
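A sample-based sketch of the barycenter map for a single protected attribute, written in plain NumPy rather than with POT. It assumes the empirical CDF is taken from mid-point plotting positions within each group, that group weights are simple counts, and that NumPy's default interpolated quantiles are acceptable; `wasserstein_barycenter_correction` is a hypothetical name and the library may estimate F and Q differently.

```python
import numpy as np


def wasserstein_barycenter_correction(pred, d, groups=None):
    """m*(x_i) = sum_g P(D=g) * Q_g(F_{d_i}(pred_i)), using empirical CDFs and quantiles."""
    pred = np.asarray(pred, dtype=float)
    d = np.asarray(d)
    if groups is None:
        groups = np.unique(d)
    weights = np.array([np.mean(d == g) for g in groups])  # group proportions P(D=g)

    # Empirical CDF value of each prediction within its own group
    u = np.empty_like(pred)
    for g in groups:
        mask = d == g
        # Ranks 0..n_g-1; stable sorts break ties by position
        ranks = np.argsort(np.argsort(pred[mask], kind="stable"), kind="stable")
        u[mask] = (ranks + 0.5) / mask.sum()

    # Barycentric quantile function: weighted average of the per-group quantile functions
    out = np.zeros_like(pred)
    for w, g in zip(weights, groups):
        out += w * np.quantile(pred[d == g], u)
    return out


pred = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0])
d = np.array(["A"] * 4 + ["B"] * 4)
corrected = wasserstein_barycenter_correction(pred, d)
```

In the example, both groups end up with the same set of corrected values, which is the demographic parity property; within each group the ordering of policyholders is unchanged.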
Causal graph
The graph classifies variables into four roles:
- Protected (S): gender, disability, ethnicity. Their effect must be removed from the price.
- Proxy (V): variables that proxy S with no independent causal justification, e.g. postcode in some applications, or vehicle colour as a proxy for age.
- Justified mediator (R): variables caused by or correlated with S but actuarially legitimate, e.g. claims history, NCB years.
- Outcome (Y): claims frequency × severity.
The Lindholm marginalisation handles all three paths from S to Y (direct, through a proxy V, and through a justified mediator R) without you needing to intervene on them manually.
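To see what the three paths look like in a concrete graph, the sketch below enumerates every directed path from the protected attribute to the outcome and labels it by the roles of its intermediate nodes. It uses plain Python rather than `networkx`; the role strings, the hypothetical direct edge `gender -> claim_freq`, and the rule that a path through any proxy counts as a proxy path are assumptions for illustration.

```python
def enumerate_paths(edges, source, target):
    """All directed paths from source to target, given (parent, child) edge pairs."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for nxt in children.get(node, []):
            if nxt not in path:  # guard against cycles
                stack.append(path + [nxt])
    return paths


def classify_path(path, roles):
    """Label a protected -> outcome path by the roles of its intermediate nodes."""
    middle = [roles[n] for n in path[1:-1]]
    if not middle:
        return "direct"
    if "proxy" in middle:  # assumed rule: any proxy on the path makes it a proxy path
        return "proxy"
    return "justified"


edges = [
    ("gender", "claims_history"),
    ("gender", "annual_mileage"),
    ("claims_history", "claim_freq"),
    ("annual_mileage", "claim_freq"),
    ("gender", "claim_freq"),  # hypothetical direct edge, not in the quickstart graph
]
roles = {"gender": "protected", "claims_history": "justified",
         "annual_mileage": "proxy", "claim_freq": "outcome"}

paths = enumerate_paths(edges, "gender", "claim_freq")
labels = {" -> ".join(p): classify_path(p, roles) for p in paths}
```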
Frequency/severity split
```python
dfp = DiscriminationFreePrice(
    graph=graph,
    frequency_model_fn=freq_model,
    severity_model_fn=sev_model,
    correction="lindholm",
)
result = dfp.fit_transform(X, D, exposure=exposure, y_freq=observed_freq)
# result.freq_fair and result.sev_fair are available separately
```
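Marginalising frequency and severity separately is not the same as marginalising their product: the two differ by the covariance of frequency and severity across protected groups. The sketch below shows this for a single policyholder with hypothetical per-group predictions; how the library combines `result.freq_fair` and `result.sev_fair` is not stated here.

```python
import numpy as np

p_d = np.array([0.45, 0.55])       # P(D=d) for two groups (hypothetical)
freq = np.array([0.10, 0.14])      # mu_freq(x, d) for one policyholder, per group
sev = np.array([2000.0, 1800.0])   # mu_sev(x, d) for the same policyholder

freq_fair = freq @ p_d                 # marginalised frequency
sev_fair = sev @ p_d                   # marginalised severity
product_of_fair = freq_fair * sev_fair
fair_of_product = (freq * sev) @ p_d   # marginalised pure premium

# Covariance of frequency and severity across groups, under P(D=d)
cov = ((freq - freq_fair) * (sev - sev_fair)) @ p_d
```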
GLM relativities
If your downstream system expects multiplicative rating factors, not flat premiums:
```python
from insurance_fairness_ot import LindholmCorrector  # import path assumed

corrector = LindholmCorrector(["gender"])
corrector.fit(my_model, X_calib, D_calib)
base_profile = {"vehicle_group": 3, "age_band": "35-44", "ncb": 5, "gender": "F"}
relativities = corrector.get_relativities(my_model, X_new, D_new, base_profile)
# Load these into your GLM parameter table
```
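One way to think about relativities from a marginalised price is to divide the fair premium at each factor level by the fair premium at the base profile. The sketch assumes a hypothetical multiplicative model `toy_mu` and hypothetical helpers; it is not the implementation of `get_relativities`. Because gender enters multiplicatively here, its factor cancels and the relativities equal the vehicle-group factors.

```python
P_D = {"F": 0.45, "M": 0.55}  # hypothetical portfolio proportions


def toy_mu(profile, d):
    """Hypothetical multiplicative model: base rate x vehicle-group factor x gender factor."""
    vehicle_factor = {1: 0.8, 3: 1.0, 5: 1.3}[profile["vehicle_group"]]
    gender_factor = {"F": 1.0, "M": 1.2}[d]
    return 0.1 * vehicle_factor * gender_factor


def fair_price(profile):
    """Lindholm price for one risk profile: average over d with weights P(D=d)."""
    return sum(toy_mu(profile, d) * p for d, p in P_D.items())


def relativities(factor, levels, base_profile):
    """Fair premium at each level of `factor`, divided by the fair premium at the base."""
    base = fair_price(base_profile)
    return {level: fair_price({**base_profile, factor: level}) / base for level in levels}


base_profile = {"vehicle_group": 3}  # gender is averaged out, so it is not part of the base
rels = relativities("vehicle_group", [1, 3, 5], base_profile)
```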
FCA report output
FCAReport.render() produces nine sections covering PS21/11, EP25/2, and Consumer Duty:
- Executive summary with discrimination metrics before/after
- Protected characteristics assessed with portfolio shares
- Methodology explanation in plain English
- Premium impact by group
- Causal path attribution
- Bias correction documentation
- Limitations and governance notes
- Equality Act proportionality analysis (template text)
- Consumer Duty fair value assessment
Available in markdown, JSON, and HTML.
D paradox
The Lindholm formula requires your model to have been trained with the protected attribute as a feature, because you need to predict mu_hat(x, d) for every value of d. This is intentional: including d in training maximises predictive accuracy (the "corrective" fairness family) and stops correlated rating factors from absorbing the effect of d; marginalisation at prediction time then removes the discriminatory effect.
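The difference between dropping d and marginalising over it shows up in a two-level toy example. In the population limit, a model trained without d prices at sum_d mu_hat(x, d) P(D=d | X=x), so it weights by what x reveals about d, whereas the Lindholm price uses P(D=d). The joint distribution and `mu_hat` below are hypothetical.

```python
# Hypothetical joint distribution of a 0/1 rating factor x and protected attribute d
joint = {(0, "F"): 0.30, (0, "M"): 0.15, (1, "F"): 0.15, (1, "M"): 0.40}
groups = ["F", "M"]


def mu_hat(x, d):
    """Hypothetical best-estimate model trained WITH d."""
    return 0.1 * (1.0 + 0.5 * x) * (1.2 if d == "M" else 1.0)


p_d = {d: sum(p for (_, dd), p in joint.items() if dd == d) for d in groups}


def p_d_given_x(d, x):
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return joint[(x, d)] / p_x


def unaware_price(x):
    # Population limit of a model trained without d: it infers d from x
    return sum(mu_hat(x, d) * p_d_given_x(d, x) for d in groups)


def discrimination_free_price(x):
    # Lindholm: same model, but the weights P(D=d) do not depend on x
    return sum(mu_hat(x, d) * p_d[d] for d in groups)


unaware_ratio = unaware_price(1) / unaware_price(0)
fair_ratio = discrimination_free_price(1) / discrimination_free_price(0)
```

Here the discrimination-free relativity between x = 1 and x = 0 is exactly the model's own 1.5, while the unaware relativity is higher because x = 1 is concentrated in the higher-rated group.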
If you cannot collect a protected attribute (common for ethnicity in UK insurance), you must impute P(D|X) from external data (e.g. census postcode distributions). This library flags the gap in the FCA report but does not yet implement the imputation.
Known test values (Lindholm 2022, Example 8)
On the synthetic gender/smoking health insurance example:
- h*(smoker) = 0.200, the weighted average 0.2406 (women smoker rate) × 0.4482 + 0.1667 (men smoker rate) × 0.5518
- h*(non-smoker) = 0.184
- Portfolio bias = 110.77 / 112.0 = 0.989
- Proportional correction factor = 1.011
These are implemented as regression tests in tests/test_correction.py.
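The arithmetic behind these figures can be checked directly. The sketch only reproduces the numbers quoted above; it assumes 110.77 and 112.0 are the marginalised and observed portfolio amounts respectively, as implied by the proportional factor E[Y] / E[h*(X)].

```python
# Figures quoted above (Lindholm et al. 2022, Example 8)
smoker_rate = {"women": 0.2406, "men": 0.1667}
weights = {"women": 0.4482, "men": 0.5518}  # sum to 1, playing the role of P(D=d)

h_smoker = sum(smoker_rate[d] * weights[d] for d in weights)  # about 0.1998
portfolio_bias = 110.77 / 112.0      # marginalised / observed (assumed reading)
correction_factor = 112.0 / 110.77   # E[Y] / E[h*(X)]
```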
References
- Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55–89.
- Côté, Genest, Abdallah (2025). A fair price to pay: Exploiting causal graphs for fairness in insurance. Journal of Risk and Insurance 92(1), 33–75.
- Charpentier, Hu, Ratz (2023). Mitigating Discrimination in Insurance with Wasserstein Barycenters. arXiv:2306.12912.
Licence
MIT