Local Differential Privacy for discrimination-free insurance pricing. Implements the Zhang/Liu/Shi (2025) correction matrix framework — the insurer never sees the true sensitive attribute.

insurance-fairness-ldp

Discrimination-free insurance pricing using Local Differential Privacy (LDP). The insurer never sees the true sensitive attribute.

The problem

UK insurers face a genuine bind on ethnicity pricing. GDPR Article 9 makes it legally uncomfortable to collect ethnicity data. FCA's 2025 ethnicity penalty analysis (EP25/2) found a residual £28/year gap in motor premiums that isn't explained by claims risk. The FCA Consumer Duty requires demonstrable fair value. And the Equality Act 2010 Section 19 exposes insurers to indirect discrimination risk via postcode rating.

The standard fairness toolkit (audit models, run counterfactuals, apply Lindholm corrections) requires the insurer to hold the sensitive attribute at some point. That creates the GDPR Article 9 exposure in the first place.

LDP flips the architecture. Policyholders submit a privatised version of their sensitive attribute — one that satisfies epsilon-LDP before it leaves their hands. The insurer never sees the true value. The mathematical correction happens on the privatised data, and the result is a discrimination-free premium that is actuarially valid.
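
To make the privatisation step concrete, here is a minimal sketch of k-ary randomised response in NumPy. It assumes the standard k-RR form, in which the true category is reported with probability pi = e^epsilon / (e^epsilon + k - 1) and each other category with probability (1 - pi) / (k - 1). The function name `k_rr_privatise` is hypothetical and is not part of this library's API.

```python
import numpy as np

def k_rr_privatise(true_values, categories, epsilon, seed=None):
    """Report the true category with probability pi, else a uniformly chosen other one."""
    rng = np.random.default_rng(seed)
    k = len(categories)
    pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)

    index = {c: i for i, c in enumerate(categories)}
    true_idx = np.array([index[v] for v in true_values])

    keep = rng.random(true_idx.size) < pi
    # Adding an offset in 1..k-1 (mod k) always lands on a different category.
    offsets = rng.integers(1, k, size=true_idx.size)
    reported_idx = np.where(keep, true_idx, (true_idx + offsets) % k)
    return np.asarray(categories)[reported_idx]

categories = ["White", "Asian", "Black", "Other"]
truth = np.array(["White"] * 5000 + ["Asian"] * 5000)
reported = k_rr_privatise(truth, categories, epsilon=1.0, seed=0)
print((reported == truth).mean())  # close to pi for k=4, epsilon=1
```

In the intended architecture this step runs on the policyholder's side, so only `reported` ever leaves their hands.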

What this library implements

The Zhang/Liu/Shi (arXiv:2504.11775, 2025) correction matrix framework for the Lindholm (2022) discrimination-free pricing formula, operating exclusively on privatised sensitive attributes.

The core formula is Lindholm's:

h*(X) = sum_k f_k(X) * P*(D=k)

where each group model f_k(X) is trained using LDP-corrected sample weights derived from the Pi^{-1} correction matrix, and P*(D) is a reference distribution estimated from the debiased noisy frequencies via T^{-1}.
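
As a small worked example of the formula above, the sketch below averages per-group predictions with a fixed reference distribution. The prediction values and the reference distribution are made-up numbers for illustration, not output from this library.

```python
import numpy as np

# Hypothetical predictions f_k(X) for 3 policyholders (rows) and 2 groups (columns).
group_predictions = np.array([
    [400.0, 460.0],
    [520.0, 500.0],
    [310.0, 350.0],
])
# Hypothetical reference distribution P*(D=k); it does not depend on X.
reference_dist = np.array([0.7, 0.3])

# h*(X) = sum_k f_k(X) * P*(D=k), one premium per policyholder.
h_star = group_predictions @ reference_dist
print(h_star)  # [418. 514. 322.]
```

Because the same weights are applied to every policyholder, the premium cannot vary with how strongly X predicts group membership.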

No existing Python package implements this. OpenDP does k-RR but not the group-specific pricing correction. Fairlearn and AIF360 require the true sensitive attribute. InsurFair (R) implements Lindholm but not under LDP.

Architecture warning

The formal LDP privacy guarantee requires a Trusted Third Party (TTP) architecture: policyholders submit their privatised responses directly to the TTP, not to the insurer. When a single organisation runs this code, the formal privacy guarantee does not apply in the same sense. This library provides the correct mathematical framework for the multi-party case and is suitable for research, simulation, and compliance demonstration. Deploying as a live privacy guarantee requires proper TTP infrastructure.

Quick start

import numpy as np
from sklearn.linear_model import Ridge
from insurance_fairness_ldp import (
    KaryRandomisedResponse,
    LDPDiscriminationFreePrice,
    LDPFairnessReport,
)

# Step 1: Define the LDP mechanism (epsilon controls privacy/accuracy trade-off)
krr = KaryRandomisedResponse(
    epsilon=1.0,
    categories=["White", "Asian", "Black", "Other"],
)

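# Step 2: In a real deployment, policyholders apply k-RR themselves.
# In simulation or research, we apply it to the true labels of each split:
S_private_train = krr.privatise(true_ethnicity_train, random_state=42)
S_private_test = krr.privatise(true_ethnicity_test, random_state=43)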

# Step 3: Fit discrimination-free pricing model
model = LDPDiscriminationFreePrice(
    base_estimator=Ridge(),
    mechanism=krr,
    reference_dist="marginal",  # or supply P*(D) directly
)
model.fit(X_train, S_private_train, y_train)

# Step 4: Generate discrimination-free premiums
premiums = model.predict(X_test)

# Step 5: Generate regulatory report
report = LDPFairnessReport.from_model(
    model, X_test, S_private_test, y=y_test
)
report.to_markdown("ldp_fairness_report.md")
print(report.summary())

Unknown epsilon

When epsilon is not known (because privatisation was done externally), use anchor-point estimation:

from insurance_fairness_ldp import NoiseRateEstimator

# Anchor: observations where you know the true category with near-certainty
anchor_selector = lambda X: X[:, 0] > 65  # illustrative rule selecting rows known to be in the anchor category

estimator = NoiseRateEstimator(
    categories=["White", "Asian", "Black", "Other"],
    anchor_category="White",
    anchor_selector=anchor_selector,
)
estimator.fit(S_private, X=X)
print(estimator.summary())

# Convert to mechanism and use in pricing
krr_estimated = estimator.to_mechanism()
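
The idea behind anchor-point estimation can be sketched in a few lines. If every anchor observation truly belongs to the anchor category, the share of anchors whose privatised response still equals that category estimates pi, and epsilon follows by inverting pi = e^epsilon / (e^epsilon + k - 1). This is an illustrative reading under the standard k-RR assumption; the helper `estimate_pi_from_anchors` is hypothetical, and the library's `NoiseRateEstimator` may differ in detail (for example in how it computes standard errors).

```python
import numpy as np

def estimate_pi_from_anchors(s_private, anchor_mask, anchor_category, k):
    """Estimate pi and epsilon from observations whose true category is known."""
    anchors = np.asarray(s_private)[np.asarray(anchor_mask)]
    pi_hat = float(np.mean(anchors == anchor_category))
    # Invert pi = e^eps / (e^eps + k - 1); only valid for 1/k < pi_hat < 1.
    eps_hat = float(np.log(pi_hat * (k - 1) / (1.0 - pi_hat)))
    return pi_hat, eps_hat

# Simulated check: 20,000 anchors, all truly "White", privatised with epsilon = 1.5.
rng = np.random.default_rng(0)
categories = np.array(["White", "Asian", "Black", "Other"])
k, true_eps = len(categories), 1.5
true_pi = np.exp(true_eps) / (np.exp(true_eps) + k - 1)
keep = rng.random(20_000) < true_pi
others = rng.integers(1, k, size=20_000)  # index of a non-anchor category
s_private = np.where(keep, "White", categories[others])
pi_hat, eps_hat = estimate_pi_from_anchors(s_private, np.ones(20_000, bool), "White", k)
print(round(pi_hat, 3), round(eps_hat, 3))
```

The estimate is only as good as the anchor rule: anchors that do not actually belong to the anchor category bias pi downwards.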

Choosing epsilon

epsilon	pi (k=2)	C1 (k=2)	Privacy	Accuracy
0.5	0.622	2.45	Very strong	Poor
1.0	0.731	1.73	Strong	Acceptable
2.0	0.881	1.27	Moderate	Good
5.0	0.993	1.01	Minimal	Excellent

For UK insurance research, epsilon=1 to 2 gives meaningful privacy with acceptable accuracy loss. The accuracy constant C1 tells you how much the LDP correction inflates the generalisation error bound relative to direct observation: C1=2 means the bound is 2x worse.
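
The pi column can be recomputed from epsilon, assuming the standard k-RR truth probability pi = e^epsilon / (e^epsilon + k - 1). The helper name `truth_probability` is hypothetical. C1 is not reproduced here because its formula is not stated in this README. The sketch also prints pi for k=4, the number of categories used in the quick start, to show how the truth probability falls as categories are added.

```python
import math

def truth_probability(epsilon, k):
    """pi = e^epsilon / (e^epsilon + k - 1) for k-ary randomised response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)

for eps in (0.5, 1.0, 2.0, 5.0):
    print(f"epsilon={eps}: pi(k=2)={truth_probability(eps, 2):.3f}, "
          f"pi(k=4)={truth_probability(eps, 4):.3f}")
```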

API reference

`KaryRandomisedResponse(epsilon, categories)`

k-ary Randomised Response mechanism. Perturbs a sensitive categorical attribute to satisfy epsilon-LDP.

.privatise(s, random_state) — apply k-RR to an array of true values
.correction_matrix() — return the k x k transition matrix T
.pi — truth probability P(S=d | D=d)
.k, .epsilon, .categories

`CorrectionMatrix(pi, k)`

Computes the LDP correction matrices.

.T_inv() — inverse of T; used to debias frequency distributions
.Pi_inv(group_probs) — group-reweighted correction; used in loss weighting
.debias_probs(noisy_probs, clip=True) — apply T^{-1} to a frequency vector
.accuracy_constant() — C1 value
CorrectionMatrix.from_mechanism(krr) — factory from a KaryRandomisedResponse
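
For intuition, the transition matrix and the frequency debiasing step can be written out directly in NumPy. This is a standalone sketch that assumes the standard k-RR transition matrix (pi on the diagonal, (1 - pi) / (k - 1) elsewhere); `transition_matrix` and `debias` are hypothetical names, not the library's implementation.

```python
import numpy as np

def transition_matrix(pi, k):
    """k x k matrix with T[d, s] = P(S=s | D=d): pi on the diagonal, (1-pi)/(k-1) elsewhere."""
    off = (1.0 - pi) / (k - 1)
    return np.full((k, k), off) + (pi - off) * np.eye(k)

def debias(noisy_probs, pi, k, clip=True):
    """Solve T^T p = q for the true frequencies p given noisy frequencies q."""
    T = transition_matrix(pi, k)
    p = np.linalg.solve(T.T, np.asarray(noisy_probs, dtype=float))
    if clip:
        # Sampling noise can push estimates slightly below zero.
        p = np.clip(p, 0.0, None)
        p = p / p.sum()
    return p

k, epsilon = 4, 1.0
pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
true_probs = np.array([0.80, 0.10, 0.06, 0.04])
noisy_probs = transition_matrix(pi, k).T @ true_probs  # what the insurer observes in expectation
print(debias(noisy_probs, pi, k))  # recovers true_probs up to floating-point error
```

The matrix is invertible whenever pi differs from 1/k, which holds for any epsilon > 0. Small epsilon pushes pi towards 1/k, so the inverse amplifies sampling noise more strongly.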

`LDPDiscriminationFreePrice(base_estimator, mechanism, reference_dist)`

Main pricing class. sklearn-compatible.

.fit(X, S_private, y, exposure=None) — train with LDP-corrected sample weights
.predict(X) — return h*(X) discrimination-free premiums
.predict_group(X, category) — return f_k(X) for a single group
.group_models_ — dict of fitted group models
.reference_dist_ — P*(D) used in the Lindholm formula

`NoiseRateEstimator(categories, anchor_category, anchor_selector)`

Anchor-point estimation of pi (unknown epsilon case).

.fit(S_private, X, bootstrap, n_bootstrap, random_state)
.pi_, .epsilon_, .std_error_, .n_anchor_
.to_mechanism(categories) — convert to KaryRandomisedResponse
.summary() — text summary

`LDPFairnessReport`

Structured report with summary() and to_markdown() methods.

LDPFairnessReport.from_model(model, X, S_private, y, h_naive, notes)

Functions

privatise(s, epsilon, categories, random_state) — convenience wrapper
discrimination_free_indicator(h_star, h_naive, norm) — pricing distance metric
group_loss_corrected(y_true, y_pred, S_private, categories, Pi_inv) — LDP-corrected group loss
calibration_by_group_ldp(y_true, y_pred, S_private, categories) — calibration check
c1_adjusted_error_bound(base_bound, c1, k, p_s_k_star) — bound inflation
debiased_group_means(y, S_private, categories, T_inv) — unbiased conditional means
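
To illustrate what debiasing a conditional mean involves, the sketch below recovers per-group means from privatised labels. It assumes the standard k-RR transition matrix and that the noise is independent of the outcome given the true group. The function `debiased_means` and its signature are hypothetical and do not match the library's `debiased_group_means`.

```python
import numpy as np

def debiased_means(y, s_private_idx, pi, k):
    """Estimate E[Y | D=d] from outcomes and privatised group indices."""
    off = (1.0 - pi) / (k - 1)
    T = np.full((k, k), off) + (pi - off) * np.eye(k)  # T[d, s] = P(S=s | D=d)
    n = len(y)
    # Noisy-group shares and noisy-group outcome mass, both divided by n.
    q = np.bincount(s_private_idx, minlength=k) / n
    m = np.bincount(s_private_idx, weights=y, minlength=k) / n
    # Both satisfy (noisy) = T^T (true) because k-RR noise is independent of Y given D.
    p_true = np.linalg.solve(T.T, q)
    m_true = np.linalg.solve(T.T, m)
    return m_true / p_true

# Deterministic toy data: 3 groups of 1,000 with constant claims cost per group,
# relabelled in exactly the proportions pi = 0.8 and (1 - pi) / 2 = 0.1.
means = [100.0, 150.0, 200.0]
y, s = [], []
for d, mu in enumerate(means):
    for s_val, count in ((d, 800), ((d + 1) % 3, 100), ((d + 2) % 3, 100)):
        y += [mu] * count
        s += [s_val] * count
y, s = np.array(y), np.array(s)
print(debiased_means(y, s, pi=0.8, k=3))  # recovers the per-group means
```

For comparison, simply averaging `y` within privatised group 0 in this toy data gives 115 rather than 100, because 20% of that group's members truly belong to the other two groups.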

UK regulatory context

GDPR Article 9 / DPA 2018 Schedule 1: If the insurer receives only privatised S, there is a defensible argument they have not "processed" special category data in the Article 9 sense. The TTP processes it; the insurer receives noise.
FCA EP25/2 (2025): The FCA found a £28/year residual ethnicity gap in motor after risk adjustment. This library provides a technical route to demonstrate non-discrimination even when ethnicity data is unavailable.
Equality Act 2010, Section 19: The Lindholm reference distribution P*(D) being independent of X removes the indirect discrimination mechanism.
Test-Achats (2012): UK insurers have been prohibited from using gender in pricing for 13 years. LDP extends this architecture to ethnicity and disability.
Data (Use and Access) Act 2025: Reduces the sensitivity of the protected-attribute decision pathway, supporting compliance with the rules on automated decision-making (ADM).

How this fits the Burning Cost stack

Library	Requires true D?	Purpose
insurance-fairness-diag	Yes	Diagnose proxy leakage
insurance-fairness	Yes	Audit model discrimination
insurance-fairness-ot	Yes	Wasserstein discrimination-free prices
insurance-fairness-ldp	No	Discrimination-free prices without ever seeing D

The natural workflow: run insurance-fairness-diag to detect proxy leakage, then use insurance-fairness-ldp to correct for it without requiring access to the restricted attribute.

Installation

pip install insurance-fairness-ldp

Optional CatBoost support:

pip install insurance-fairness-ldp[catboost]

References

Zhang, Liu, Shi (2025). Discrimination-Free Insurance Pricing under Local Differential Privacy. arXiv:2504.11775.

Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55-89.

Makhlouf et al. (2024). A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness. arXiv:2405.14725. CSF 2024.

Warner (1965). Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. JASA 60(309), 63-69.

Licence

MIT. Copyright Burning Cost, 2026.

Release history

0.1.1 (this version): Mar 15, 2026
0.1.0: Mar 11, 2026
