Discrimination-free insurance pricing via Lindholm marginalisation, causal path decomposition, and Wasserstein barycenter correction

insurance-fairness-ot

The problem

UK insurers face a live regulatory obligation, not a theoretical one. The FCA Consumer Duty (PRIN 2A, in force since July 2023), Section 19 of the Equality Act 2010, and ICOBS 6B together require firms to demonstrate that pricing models do not systematically disadvantage customers with protected characteristics. The key word is demonstrate: annual board attestation, documented methodology, sub-group monitoring.

The hard part is that the regulatory standard is conditional fairness (equal price for equal risk), not demographic parity. Young drivers genuinely have more accidents; equalising their premium distribution with older drivers would be actuarially wrong, not fair. Most fairness tooling — including the nearest Python library, EquiPy — targets demographic parity and would over-correct your model.

The correct mathematical framework comes from Lindholm, Richman, Tsanakas and Wüthrich (2022): the discrimination-free price is a marginalisation of the model over the unconditional distribution of the protected attribute, which coincides with the causal do-operator under the causal assumptions set out in that paper. This library implements that, plus the causal path decomposition from Côté, Genest and Abdallah (2025) to separate direct discrimination, proxy discrimination, and actuarially justified effects.

What it solves that EquiPy doesn't

Requirement	EquiPy	this library
Correct fairness criterion (conditional)	No (demographic parity)	Yes (Lindholm)
Exposure weighting	No	Yes
Causal graph — direct/proxy/justified decomposition	No	Yes
GLM-compatible relativity output	No	Yes
Frequency/severity decomposition	No	Yes
Portfolio bias correction (3 methods)	Implicit	Explicit
UK regulatory output (FCA format)	No	Yes
Polars-native	No (pandas)	Yes

Install

pip install insurance-fairness-ot

Dependencies: numpy, scipy, statsmodels, networkx, POT (Python Optimal Transport), polars.

Quickstart

import polars as pl
import numpy as np
from insurance_fairness_ot import (
    CausalGraph,
    DiscriminationFreePrice,
    FairnessReport,
    FCAReport,
)

# 1. Specify the causal structure of your pricing model
graph = (CausalGraph()
    .add_protected("gender")
    .add_justified_mediator("claims_history", parents=["gender"])
    .add_proxy("annual_mileage", parents=["gender"])
    .add_outcome("claim_freq")
    .add_edge("claims_history", "claim_freq")
    .add_edge("annual_mileage", "claim_freq"))

# 2. Your trained model (must include gender in training)
def my_model(df: pl.DataFrame) -> np.ndarray:
    # e.g. catboost_model.predict(df) or glm.predict(df)
    ...

# 3. Fit the corrector on calibration data
X_calib = pl.read_parquet("calibration_features.parquet")
D_calib = X_calib.select(["gender"])
exposure_calib = X_calib["exposure"].to_numpy()

dfp = DiscriminationFreePrice(
    graph=graph,
    combined_model_fn=my_model,
    correction="lindholm",        # primary: conditional fairness
    bias_correction="proportional",
)
dfp.fit(X_calib, D_calib, exposure=exposure_calib)

# 4. Apply to new business
X_new = pl.read_parquet("new_business.parquet")
D_new = X_new.select(["gender"])
result = dfp.transform(X_new, D_new)

print(result.fair_premium)         # discrimination-free premium
print(result.bias_correction_factor)  # should be close to 1.0

# 5. FCA compliance report
report = FCAReport(
    result,
    report_metadata={
        "firm_name": "Acme Insurance",
        "model_name": "Motor Frequency GLM v3",
        "reporting_date": "2026-03-10",
        "model_version": "3.0",
    }
)
report.save("fca_fair_value_assessment.md", format="markdown")
report.save("fca_fair_value_assessment.json", format="json")

The math

Lindholm marginalisation (primary correction):

h*(x_i) = sum_d mu_hat(x_i, d) * P(D=d)

For each policyholder, predict what the model would output if they were in each protected group, then average those predictions weighted by the portfolio proportions P(D=d). Because the weights do not depend on x_i, the price cannot use X to infer D. This removes both direct and proxy discrimination while preserving actuarially justified effects.
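The formula above can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: `marginalise`, `toy_model`, and the 50/50 group shares are hypothetical names and values chosen for the example.

```python
import numpy as np

def marginalise(predict, X, d_values, d_probs):
    """Return h*(x_i) = sum_d predict(X, d) * P(D=d) for every row of X.

    predict is a hypothetical callable (X, d) -> array that scores every
    row as if its protected attribute were d.
    """
    fair = np.zeros(len(X))
    for d, p in zip(d_values, d_probs):
        fair += p * predict(X, d)
    return fair

# Toy best-estimate model (assumption): frequency rises with mileage and
# carries a direct 20% loading for group "M".
def toy_model(X, d):
    base = 0.05 + 0.01 * X  # X is annual mileage in thousands
    return base * (1.2 if d == "M" else 1.0)

X = np.array([5.0, 10.0, 20.0])
h_star = marginalise(toy_model, X, ["F", "M"], [0.5, 0.5])
```

Every policyholder receives the same blended loading regardless of their own group, so two policyholders with identical X get identical prices.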

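Portfolio bias correction: marginalisation introduces a small portfolio-level bias, because the fair premiums h*(X) no longer average to the expected claims cost E[Y]. Three options: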

proportional (default): multiply all fair premiums by E[Y] / E[h*(X)] — preserves relativity ordering, compatible with GLM tables
uniform: additive shift
kl: KL-optimal reweighting of P*(D=d) — maximum entropy approach
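The first two options can be sketched as follows. The function names are hypothetical, and reading "uniform" as adding one constant to every premium is an assumption based on the description above. The kl option is not sketched here.

```python
import numpy as np

def proportional_correction(fair, observed, exposure):
    """Scale fair premiums so their exposure-weighted mean equals E[Y]."""
    target = np.average(observed, weights=exposure)
    factor = target / np.average(fair, weights=exposure)
    return fair * factor, factor

def uniform_correction(fair, observed, exposure):
    """Shift fair premiums by one constant so the weighted means agree."""
    shift = np.average(observed, weights=exposure) - np.average(fair, weights=exposure)
    return fair + shift, shift

# Illustrative numbers only.
fair = np.array([90.0, 110.0, 130.0])
observed = np.array([100.0, 105.0, 131.0])
exposure = np.ones(3)

fair_prop, factor = proportional_correction(fair, observed, exposure)
fair_unif, shift = uniform_correction(fair, observed, exposure)
```

The proportional version leaves every ratio between two premiums unchanged, which is why it suits multiplicative GLM tables. The additive version changes those ratios.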

Wasserstein barycenter (secondary, for multi-attribute simultaneous correction):

m*(x_i) = Q_bar(F_{d_i}(mu_hat(x_i)))

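where Q_bar is the weighted average of the per-group quantile functions and F_{d_i} is the distribution function of the predictions within policyholder i's own group. This achieves demographic parity, not conditional fairness, so treat it as secondary: use it after Lindholm, for multi-attribute cases.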
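For a single protected attribute, the mapping m*(x_i) can be sketched with empirical CDFs and quantiles. This is a minimal one-dimensional illustration using NumPy only. `barycenter_correct` is a hypothetical name, and neither the library's POT-based implementation nor the multi-attribute case is shown.

```python
import numpy as np

def barycenter_correct(pred, group, weights=None):
    """Push each prediction through its own group's empirical CDF, then
    through the weighted average of all per-group quantile functions."""
    pred = np.asarray(pred, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    if weights is None:
        # Default to the observed group shares.
        weights = {g: np.mean(group == g) for g in labels}
    out = np.empty_like(pred)
    for g in labels:
        mask = group == g
        own = np.sort(pred[mask])
        # Mid-rank CDF level of each prediction within its own group.
        u = (np.searchsorted(own, pred[mask], side="right") - 0.5) / own.size
        # Q_bar(u): weighted average of every group's quantile at level u.
        out[mask] = sum(weights[h] * np.quantile(pred[group == h], u) for h in labels)
    return out

rng = np.random.default_rng(0)
group = np.repeat(["A", "B"], 500)
pred = np.concatenate([rng.gamma(2.0, 50.0, 500), rng.gamma(2.0, 80.0, 500)])
adjusted = barycenter_correct(pred, group)
```

After the correction both groups share the same premium distribution, while the ranking of policyholders within each group is kept.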

Causal graph

The graph classifies variables into four roles:

Protected (S): gender, disability, ethnicity — must be removed from pricing effect
Proxy (V): variables that proxy S with no independent causal justification — postcode in some applications, vehicle colour as age proxy
Justified mediator (R): variables caused by or correlated with S but actuarially legitimate — claims history, NCB years
Outcome (Y): claims frequency × severity

The Lindholm marginalisation handles all three paths (direct, proxy, and justified) correctly without you needing to manually intervene on them.
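To make the three path types concrete, the sketch below enumerates every directed path from the protected attribute to the outcome and labels it by the roles of the nodes in between. It uses plain Python rather than the library's CausalGraph. The labelling rule (any proxy on the path makes it a proxy path) and the direct gender-to-outcome edge are assumptions added for illustration.

```python
def paths(edges, source, target, prefix=None):
    """Yield every directed path from source to target in a DAG."""
    prefix = (prefix or []) + [source]
    if source == target:
        yield prefix
        return
    for child in edges.get(source, []):
        yield from paths(edges, child, target, prefix)

def classify(path, roles):
    """Label a path by the roles of its intermediate nodes."""
    middle = [roles[node] for node in path[1:-1]]
    if not middle:
        return "direct"
    if "proxy" in middle:
        return "proxy"
    return "justified"

# Same variables as the Quickstart graph, plus a direct edge for illustration.
edges = {
    "gender": ["claims_history", "annual_mileage", "claim_freq"],
    "claims_history": ["claim_freq"],
    "annual_mileage": ["claim_freq"],
}
roles = {
    "gender": "protected",
    "claims_history": "justified",
    "annual_mileage": "proxy",
    "claim_freq": "outcome",
}
labelled = {
    " -> ".join(p): classify(p, roles)
    for p in paths(edges, "gender", "claim_freq")
}
```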

Frequency/severity split

dfp = DiscriminationFreePrice(
    graph=graph,
    frequency_model_fn=freq_model,
    severity_model_fn=sev_model,
    correction="lindholm",
)
result = dfp.fit_transform(X, D, exposure=exposure, y_freq=observed_freq)
# result.freq_fair and result.sev_fair are available separately
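One point worth checking when splitting frequency and severity is that marginalising each component and then multiplying is not the same as marginalising the pure premium. The source does not say which route the library takes, so the sketch below only illustrates the difference. All numbers are hypothetical.

```python
import numpy as np

d_probs = {"F": 0.5, "M": 0.5}

# Hypothetical per-group predictions for two policies.
freq = {"F": np.array([0.10, 0.20]), "M": np.array([0.14, 0.20])}
sev = {"F": np.array([1000.0, 1500.0]), "M": np.array([1200.0, 1500.0])}

# Marginalise each component separately, then combine.
freq_fair = sum(p * freq[d] for d, p in d_probs.items())
sev_fair = sum(p * sev[d] for d, p in d_probs.items())
product_of_marginals = freq_fair * sev_fair

# Marginalise the pure premium (frequency x severity) directly.
marginal_of_product = sum(p * freq[d] * sev[d] for d, p in d_probs.items())
```

The two results agree for the second policy, where the group has no effect, and differ for the first (132 against 134), where both frequency and severity depend on D.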

GLM relativities

If your downstream system expects multiplicative rating factors, not flat premiums:

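from insurance_fairness_ot import LindholmCorrector  # assumed top-level export; not confirmed by the Quickstart imports

corrector = LindholmCorrector(["gender"])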
corrector.fit(my_model, X_calib, D_calib)

base_profile = {"vehicle_group": 3, "age_band": "35-44", "ncb": 5, "gender": "F"}
relativities = corrector.get_relativities(my_model, X_new, D_new, base_profile)
# Load these into your GLM parameter table
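As a rough picture of what a relativity is, the sketch below divides the mean fair premium of each factor level by the mean of a base level. This is a one-way approximation with a hypothetical `relativities` helper and made-up premiums. It is not how `get_relativities` is implemented, which the source does not describe.

```python
import numpy as np

def relativities(fair_premium, levels, base_level):
    """Mean fair premium per level divided by the base level's mean."""
    fair_premium = np.asarray(fair_premium, dtype=float)
    levels = np.asarray(levels)
    means = {lvl: fair_premium[levels == lvl].mean() for lvl in np.unique(levels)}
    return {str(lvl): float(m / means[base_level]) for lvl, m in means.items()}

# Illustrative fair premiums for six policies across three age bands.
fair = np.array([200.0, 220.0, 300.0, 330.0, 150.0, 165.0])
age_band = np.array(["35-44", "35-44", "17-24", "17-24", "45-54", "45-54"])
rel = relativities(fair, age_band, base_level="35-44")
```

The base level always has relativity 1.0, and the other levels are expressed as multiples of it.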

FCA report output

FCAReport.render() produces nine sections covering PS21/11, EP25/2, and Consumer Duty:

Executive summary with discrimination metrics before/after
Protected characteristics assessed with portfolio shares
Methodology explanation in plain English
Premium impact by group
Causal path attribution
Bias correction documentation
Limitations and governance notes
Equality Act proportionality analysis (template text)
Consumer Duty fair value assessment

Available in markdown, JSON, and HTML.

D paradox

The Lindholm formula requires your model to have been trained with the protected attribute as a feature — you need to predict mu_hat(x, d) for all values of d. This is intentional: including d in training maximises predictive accuracy (the "corrective" fairness family), and marginalisation at prediction time removes the discriminatory effect.

If you cannot collect a protected attribute (common for ethnicity in UK insurance), you must impute P(D|X) from external data (e.g. census postcode distributions). This library flags the gap in the FCA report but does not yet implement the imputation.
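One piece of that imputation, the portfolio-level P(D=d), follows from the law of total probability once external shares are available. The sketch below assumes hypothetical postcode-area shares and a hypothetical `portfolio_shares` helper. It does not address how to obtain mu_hat(x, d) when D was never observed in training.

```python
import numpy as np

# Hypothetical external shares P(D=d | postcode area), e.g. from census tables.
area_shares = {
    "AB1": {"group_a": 0.8, "group_b": 0.2},
    "CD2": {"group_a": 0.4, "group_b": 0.6},
}
policy_area = ["AB1", "AB1", "CD2", "CD2"]
exposure = np.array([1.0, 0.5, 1.0, 0.5])

def portfolio_shares(policy_area, exposure, area_shares):
    """Exposure-weighted average of P(D=d | area) over the portfolio."""
    groups = sorted({g for shares in area_shares.values() for g in shares})
    probs = np.array([[area_shares[a][g] for g in groups] for a in policy_area])
    return dict(zip(groups, np.average(probs, axis=0, weights=exposure)))

p_d = portfolio_shares(policy_area, exposure, area_shares)
```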

Known test values (Lindholm 2022, Example 8)

On the synthetic gender/smoking health insurance example:

h*(smoker) = 0.200 — weighted average of 0.2406 (women smoker rate) × 0.4482 + 0.1667 (men smoker rate) × 0.5518
h*(non-smoker) = 0.184
Portfolio bias = 110.77/112.0 = 0.989
Proportional correction factor = 112.0/110.77 = 1.011

These are implemented as regression tests in tests/test_correction.py.
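The arithmetic behind these values can be reproduced directly from the figures quoted above. This is a standalone check, not the contents of tests/test_correction.py.

```python
# Weighted average of the two group-specific smoker rates.
h_smoker = 0.2406 * 0.4482 + 0.1667 * 0.5518

# Ratio of the fair premium total to the observed total, and its inverse.
portfolio_bias = 110.77 / 112.0
correction_factor = 1.0 / portfolio_bias

print(round(h_smoker, 3), round(portfolio_bias, 3), round(correction_factor, 3))
```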

References

Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55–89.
Côté, Genest, Abdallah (2025). A fair price to pay: Exploiting causal graphs for fairness in insurance. Journal of Risk and Insurance 92(1), 33–75.
Charpentier, Hu, Ratz (2023). Mitigating Discrimination in Insurance with Wasserstein Barycenters. arXiv:2306.12912.

Licence

MIT

Project details

Version 0.1.0, released Mar 10, 2026
