Local Differential Privacy for discrimination-free insurance pricing. Implements the Zhang/Liu/Shi (2025) correction matrix framework — the insurer never sees the true sensitive attribute.
insurance-fairness-ldp
Discrimination-free insurance pricing using Local Differential Privacy (LDP). The insurer never sees the true sensitive attribute.
The problem
UK insurers face a genuine bind on ethnicity pricing. Ethnicity is special category data under UK GDPR Article 9, so collecting it requires a specific lawful condition and carries legal risk. The FCA's 2025 ethnicity penalty analysis (EP25/2) found a residual £28/year gap in motor premiums that isn't explained by claims risk. The FCA Consumer Duty requires demonstrable fair value. And Section 19 of the Equality Act 2010 exposes insurers to indirect discrimination risk via postcode rating.
The standard fairness toolkit (audit models, run counterfactuals, apply Lindholm corrections) requires the insurer to hold the sensitive attribute at some point. That creates the GDPR Article 9 exposure in the first place.
LDP flips the architecture. Policyholders submit a privatised version of their sensitive attribute — one that satisfies epsilon-LDP before it leaves their hands. The insurer never sees the true value. The mathematical correction happens on the privatised data, and the result is a discrimination-free premium that is actuarially valid.
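To make the policyholder-side step concrete, here is a minimal NumPy sketch of k-ary randomised response (k-RR). It assumes the standard k-RR parameterisation: report the true category with probability pi = e^epsilon / (e^epsilon + k - 1), otherwise report one of the other k - 1 categories uniformly at random, which makes the ratio of any two report probabilities at most e^epsilon. The helper krr_privatise is hypothetical and is not this library's API.

```python
import numpy as np


def krr_privatise(d, k, epsilon, rng):
    """Hypothetical sketch of k-RR on category indices 0..k-1 (not the library API)."""
    d = np.asarray(d)
    pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)  # probability of a truthful report
    truthful = rng.random(d.shape) < pi
    # Otherwise report one of the other k - 1 categories uniformly: shift by 1..k-1 mod k
    shift = rng.integers(1, k, size=d.shape)
    return np.where(truthful, d, (d + shift) % k)


rng = np.random.default_rng(0)
true_d = rng.integers(0, 4, size=100_000)
s = krr_privatise(true_d, k=4, epsilon=1.0, rng=rng)
# Expected to be close to e / (e + 3), roughly 0.475
print("share reported truthfully:", np.mean(s == true_d))
```

Only the privatised s would leave the policyholder; the insurer (or TTP) sees a value that is wrong more often than not at epsilon = 1 with four categories.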
What this library implements
The Zhang/Liu/Shi (arXiv:2504.11775, 2025) correction matrix framework for the Lindholm (2022) discrimination-free pricing formula, operating exclusively on privatised sensitive attributes.
The core formula is Lindholm's:
h*(X) = sum_k f_k(X) * P*(D=k)
where each group model f_k(X) is trained using LDP-corrected sample weights derived from the Pi^{-1} correction matrix, and P*(D) is a reference distribution estimated from the debiased noisy frequencies via T^{-1}.
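As a numeric illustration of two of these ingredients, the sketch below builds the k-RR transition matrix T (assuming the standard parameterisation: truth probability pi on the diagonal, (1 - pi)/(k - 1) off it), recovers P(D) from expected noisy category frequencies by applying T^{-1}, and then averages hypothetical group-model predictions f_k(X) with those weights. It does not reproduce the Pi^{-1} sample-weighting step, and all numbers are made up for illustration.

```python
import numpy as np

k, epsilon = 4, 1.0
pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
q = (1 - pi) / (k - 1)
# T[i, j] = P(S = j | D = i): pi on the diagonal, q elsewhere
T = np.full((k, k), q) + (pi - q) * np.eye(k)

p_true = np.array([0.80, 0.09, 0.06, 0.05])  # illustrative P(D)
noisy = T.T @ p_true                          # expected frequencies of privatised S
p_star = np.linalg.solve(T.T, noisy)          # debias: apply T^{-1}
p_star = np.clip(p_star, 0, None)             # guard against negative estimates
p_star /= p_star.sum()

# Hypothetical group-model predictions f_k(X): 3 policyholders (rows) x k groups
f = np.array([
    [410.0, 430.0, 455.0, 420.0],
    [520.0, 515.0, 560.0, 530.0],
    [300.0, 310.0, 325.0, 305.0],
])
h_star = f @ p_star  # h*(X) = sum_k f_k(X) * P*(D=k)
print(p_star.round(3), h_star.round(2))
```

With exact expected frequencies the debiasing is exact; with real samples the observed frequencies are noisy, which is why clipping and renormalising are included.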
No existing Python package implements this. OpenDP does k-RR but not the group-specific pricing correction. Fairlearn and AIF360 require the true sensitive attribute. InsurFair (R) implements Lindholm but not under LDP.
Architecture warning
The formal LDP privacy guarantee requires a Trusted Third Party (TTP) architecture: policyholders submit their privatised responses directly to the TTP, not to the insurer. When a single organisation runs this code, the formal privacy guarantee does not apply in the same sense. This library provides the correct mathematical framework for the multi-party case and is suitable for research, simulation, and compliance demonstration. Deploying as a live privacy guarantee requires proper TTP infrastructure.
Quick start
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from insurance_fairness_ldp import (
    KaryRandomisedResponse,
    LDPDiscriminationFreePrice,
    LDPFairnessReport,
)
# X (rating factors), y (claims cost) and true_ethnicity_array are your own data.
# Step 1: Define the LDP mechanism (epsilon controls the privacy/accuracy trade-off)
krr = KaryRandomisedResponse(
    epsilon=1.0,
    categories=["White", "Asian", "Black", "Other"],
)
# Step 2: In a real deployment, policyholders apply k-RR themselves.
# In simulation or research, we apply it:
S_private = krr.privatise(true_ethnicity_array, random_state=42)
# Split rating factors, privatised attribute and outcomes together so rows stay aligned
X_train, X_test, S_private_train, S_private_test, y_train, y_test = train_test_split(
    X, S_private, y, test_size=0.3, random_state=42
)
# Step 3: Fit discrimination-free pricing model
model = LDPDiscriminationFreePrice(
    base_estimator=Ridge(),
    mechanism=krr,
    reference_dist="marginal",  # or supply P*(D) directly
)
model.fit(X_train, S_private_train, y_train)
# Step 4: Generate discrimination-free premiums
premiums = model.predict(X_test)
# Step 5: Generate regulatory report
report = LDPFairnessReport.from_model(
    model, X_test, S_private_test, y=y_test
)
report.to_markdown("ldp_fairness_report.md")
print(report.summary())
Unknown epsilon
When epsilon is not known (because privatisation was done externally), use anchor-point estimation:
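from sklearn.linear_model import Ridge
from insurance_fairness_ldp import LDPDiscriminationFreePrice, NoiseRateEstimator
# Anchor: observations where you know the true category with near-certainty.
# Illustrative rule only: "column 0 > 65" stands in for whatever criterion
# identifies policyholders in the anchor category ("White", index 0).
def anchor_selector(X):
    return X[:, 0] > 65
estimator = NoiseRateEstimator(
    categories=["White", "Asian", "Black", "Other"],
    anchor_category="White",
    anchor_selector=anchor_selector,
)
estimator.fit(S_private, X=X)
print(estimator.summary())
# Convert to mechanism and use in pricing
krr_estimated = estimator.to_mechanism()
model = LDPDiscriminationFreePrice(
    base_estimator=Ridge(),
    mechanism=krr_estimated,
    reference_dist="marginal",
)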
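The idea behind anchor-point estimation can be sketched directly. If the anchor rows really are in the anchor category, the share of them whose privatised response equals that category estimates pi, and inverting the standard k-RR formula pi = e^epsilon / (e^epsilon + k - 1) gives epsilon = ln(pi (k - 1) / (1 - pi)). This is a hypothetical illustration: estimate_pi_from_anchors is not a library function, it takes a boolean anchor mask rather than a selector on X, and NoiseRateEstimator may differ (for example in how it computes bootstrap standard errors).

```python
import numpy as np


def estimate_pi_from_anchors(s_private, is_anchor, anchor_category, k):
    """Hypothetical sketch: estimate the k-RR truth probability and epsilon from anchors."""
    s_anchor = np.asarray(s_private)[np.asarray(is_anchor)]
    n_anchor = s_anchor.size
    pi_hat = np.mean(s_anchor == anchor_category)
    std_error = np.sqrt(pi_hat * (1 - pi_hat) / n_anchor)  # binomial standard error
    # Invert pi = e^eps / (e^eps + k - 1); gives a positive epsilon only if 1/k < pi_hat < 1
    epsilon_hat = np.log(pi_hat * (k - 1) / (1 - pi_hat))
    return pi_hat, epsilon_hat, std_error, n_anchor


# Simulated check: 20,000 anchors, all truly "White", privatised with epsilon = 1, k = 4
rng = np.random.default_rng(1)
categories = np.array(["White", "Asian", "Black", "Other"])
k, epsilon = 4, 1.0
pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
truthful = rng.random(20_000) < pi
other = rng.choice(categories[1:], size=20_000)  # a uniformly chosen other category
s = np.where(truthful, "White", other)
print(estimate_pi_from_anchors(s, np.ones(20_000, dtype=bool), "White", k))
```

The estimate is only as good as the anchor rule: if some anchor rows are not truly in the anchor category, pi_hat is biased downwards and epsilon is underestimated.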
Choosing epsilon
| epsilon | pi (k=2) | C1 (k=2) | Privacy | Accuracy |
|---|---|---|---|---|
| 0.5 | 0.622 | 2.45 | Very strong | Poor |
| 1.0 | 0.731 | 1.73 | Strong | Acceptable |
| 2.0 | 0.880 | 1.27 | Moderate | Good |
| 5.0 | 0.993 | 1.01 | Minimal | Excellent |
For UK insurance research, epsilon=1 to 2 gives meaningful privacy with acceptable accuracy loss. The accuracy constant C1 tells you how much the LDP correction inflates the generalisation error bound relative to direct observation: C1=2 means the bound is 2x worse.
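The pi column follows from the standard k-RR truth probability pi = e^epsilon / (e^epsilon + k - 1). The short sketch below (the helper name krr_truth_probability is illustrative) recomputes it for k = 2 and for the four-category example used in this README; C1 is not computed here.

```python
import math


def krr_truth_probability(epsilon: float, k: int) -> float:
    """P(S = d | D = d) under k-ary randomised response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


for eps in (0.5, 1.0, 2.0, 5.0):
    print(f"epsilon={eps}: pi(k=2)={krr_truth_probability(eps, 2):.4f}, "
          f"pi(k=4)={krr_truth_probability(eps, 4):.4f}")
```

With four categories the truth probability is much lower at the same epsilon (about 0.475 at epsilon = 1, against 0.731 for k = 2), so each response carries less information about the true category.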
API reference
KaryRandomisedResponse(epsilon, categories)
k-ary Randomised Response mechanism. Perturbs a sensitive categorical attribute to satisfy epsilon-LDP.
- .privatise(s, random_state): apply k-RR to an array of true values
- .correction_matrix(): return the k x k transition matrix T
- .pi: truth probability P(S=d | D=d)
- .k, .epsilon, .categories
CorrectionMatrix(pi, k)
Computes the LDP correction matrices.
- .T_inv(): inverse of T; used to debias frequency distributions
- .Pi_inv(group_probs): group-reweighted correction; used in loss weighting
- .debias_probs(noisy_probs, clip=True): apply T^{-1} to a frequency vector
- .accuracy_constant(): C1 value
- CorrectionMatrix.from_mechanism(krr): factory from a KaryRandomisedResponse
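For k-RR the inverse behind .T_inv() has a simple closed form, which helps explain why small epsilon hurts accuracy. Writing q = (1 - pi)/(k - 1), the matrix is T = (pi - q) I + q 11^T, and because each row sums to one, T^{-1} = (I - q 11^T) / (pi - q). As pi approaches 1/k, q approaches pi as well and the factor 1/(pi - q) blows up, amplifying sampling noise. The sketch below checks this closed form against NumPy; it is an illustration, not the library's code.

```python
import numpy as np


def krr_T_inv_closed_form(pi: float, k: int) -> np.ndarray:
    """Closed-form inverse of the k-RR transition matrix (illustrative sketch)."""
    q = (1 - pi) / (k - 1)
    return (np.eye(k) - q * np.ones((k, k))) / (pi - q)


k, epsilon = 4, 1.0
pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
q = (1 - pi) / (k - 1)
T = np.full((k, k), q) + (pi - q) * np.eye(k)  # T[i, j] = P(S = j | D = i)
T_inv = krr_T_inv_closed_form(pi, k)
print(np.allclose(T_inv, np.linalg.inv(T)))  # True
```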
LDPDiscriminationFreePrice(base_estimator, mechanism, reference_dist)
Main pricing class. sklearn-compatible.
- .fit(X, S_private, y, exposure=None): train with LDP-corrected sample weights
- .predict(X): return h*(X) discrimination-free premiums
- .predict_group(X, category): return f_k(X) for a single group
- .group_models_: dict of fitted group models
- .reference_dist_: P*(D) used in the Lindholm formula
NoiseRateEstimator(categories, anchor_category, anchor_selector)
Anchor-point estimation of pi (unknown epsilon case).
- .fit(S_private, X, bootstrap, n_bootstrap, random_state)
- .pi_, .epsilon_, .std_error_, .n_anchor_: fitted attributes
- .to_mechanism(categories): convert to KaryRandomisedResponse
- .summary(): text summary
LDPFairnessReport
Structured report with summary() and to_markdown() methods.
- LDPFairnessReport.from_model(model, X, S_private, y, h_naive, notes): build a report from a fitted model
Functions
- privatise(s, epsilon, categories, random_state): convenience wrapper
- discrimination_free_indicator(h_star, h_naive, norm): pricing distance metric
- group_loss_corrected(y_true, y_pred, S_private, categories, Pi_inv): LDP-corrected group loss
- calibration_by_group_ldp(y_true, y_pred, S_private, categories): calibration check
- c1_adjusted_error_bound(base_bound, c1, k, p_s_k_star): bound inflation
- debiased_group_means(y, S_private, categories, T_inv): unbiased conditional means
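The idea behind debiased group means can be sketched without observing D. Assuming the privatised S depends on Y only through D, and that T is the standard symmetric k-RR matrix, the vector of E[Y 1{S = j}] equals T^T applied to the vector of P(D = k) E[Y | D = k], and the frequencies of S equal T^T applied to P(D). Solving both systems and dividing recovers E[Y | D = k]. The function debiased_means_sketch is hypothetical and may not match how debiased_group_means is implemented.

```python
import numpy as np


def debiased_means_sketch(y, s_idx, k, pi):
    """Hypothetical sketch: estimate E[Y | D = k] from privatised category indices."""
    y, s_idx = np.asarray(y, dtype=float), np.asarray(s_idx)
    q = (1 - pi) / (k - 1)
    T = np.full((k, k), q) + (pi - q) * np.eye(k)  # T[i, j] = P(S = j | D = i)
    freq_s = np.bincount(s_idx, minlength=k) / len(y)             # P(S = j)
    mass_s = np.bincount(s_idx, weights=y, minlength=k) / len(y)  # E[Y 1{S = j}]
    p_d = np.linalg.solve(T.T, freq_s)     # P(D = i)
    mass_d = np.linalg.solve(T.T, mass_s)  # P(D = i) * E[Y | D = i]
    return mass_d / p_d


# Simulated check with known group means 100, 150, 200, 250
rng = np.random.default_rng(2)
k, epsilon, n = 4, 2.0, 200_000
pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
d = rng.choice(k, size=n, p=[0.3, 0.3, 0.2, 0.2])
y = np.array([100.0, 150.0, 200.0, 250.0])[d] + rng.normal(0, 20, size=n)
truthful = rng.random(n) < pi
s = np.where(truthful, d, (d + rng.integers(1, k, size=n)) % k)
print(debiased_means_sketch(y, s, k, pi).round(1))
```

Naive per-S means would be pulled towards the overall average because misreported rows contaminate every group; the T^{-1} step undoes that mixing on average, at the price of higher variance for small epsilon.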
UK regulatory context
- GDPR Article 9 / DPA 2018 Schedule 1: If the insurer receives only privatised S, there is a defensible argument they have not "processed" special category data in the Article 9 sense. The TTP processes it; the insurer receives noise.
- FCA EP25/2 (2025): The FCA found a £28/year residual ethnicity gap in motor after risk adjustment. This library provides a technical route to demonstrate non-discrimination even when ethnicity data is unavailable.
- Equality Act 2010, Section 19: The Lindholm reference distribution P*(D) being independent of X removes the indirect discrimination mechanism.
- Test-Achats (2012): UK insurers have been prohibited from using gender in pricing since December 2012. LDP extends this architecture to ethnicity and disability.
- Data (Use and Access) Act 2025: Keeping the protected attribute privatised reduces the sensitivity of the decision pathway, supporting compliance with the Act's automated decision-making (ADM) provisions.
How this fits the Burning Cost stack
| Library | Requires true D? | Purpose |
|---|---|---|
| insurance-fairness-diag | Yes | Diagnose proxy leakage |
| insurance-fairness | Yes | Audit model discrimination |
| insurance-fairness-ot | Yes | Wasserstein discrimination-free prices |
| insurance-fairness-ldp | No | Discrimination-free prices without ever seeing D |
The natural workflow: run insurance-fairness-diag to detect proxy leakage, then use insurance-fairness-ldp to correct for it without requiring access to the restricted attribute.
Installation
pip install insurance-fairness-ldp
Optional CatBoost support:
pip install "insurance-fairness-ldp[catboost]"
References
Zhang, Liu, Shi (2025). Discrimination-Free Insurance Pricing under Local Differential Privacy. arXiv:2504.11775.
Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55-89.
Makhlouf et al. (2024). A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness. arXiv:2405.14725. CSF 2024.
Warner (1965). Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. JASA 60(309), 63-69.
Licence
MIT. Copyright Burning Cost, 2026.