Local Differential Privacy for discrimination-free insurance pricing. Implements the Zhang/Liu/Shi (2025) correction matrix framework — the insurer never sees the true sensitive attribute.

insurance-fairness-ldp

Discrimination-free insurance pricing using Local Differential Privacy (LDP). The insurer never sees the true sensitive attribute.

The problem

UK insurers face a genuine bind on ethnicity pricing. GDPR Article 9 makes it legally uncomfortable to collect ethnicity data. FCA's 2025 ethnicity penalty analysis (EP25/2) found a residual £28/year gap in motor premiums that isn't explained by claims risk. The FCA Consumer Duty requires demonstrable fair value. And the Equality Act 2010 Section 19 exposes insurers to indirect discrimination risk via postcode rating.

The standard fairness toolkit (audit models, run counterfactuals, apply Lindholm corrections) requires the insurer to hold the sensitive attribute at some point. That creates the GDPR Article 9 exposure in the first place.

LDP flips the architecture. Policyholders submit a privatised version of their sensitive attribute — one that satisfies epsilon-LDP before it leaves their hands. The insurer never sees the true value. The mathematical correction happens on the privatised data, and the result is a discrimination-free premium that is actuarially valid.
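As a rough illustration of what "privatised before it leaves their hands" means, the sketch below implements textbook k-ary randomised response in plain NumPy. It is not the library's code: the function name `krr_privatise` and the integer category codes are assumptions made for this example.

```python
import numpy as np


def krr_privatise(true_codes, epsilon, k, rng):
    """Apply k-ary randomised response to integer category codes 0..k-1."""
    true_codes = np.asarray(true_codes)
    # Probability of reporting the true category under k-RR.
    pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(true_codes.shape[0]) < pi
    # For flipped responses, add a uniform offset in 1..k-1 (mod k) so the
    # reported category is uniform over the k-1 other categories.
    offsets = rng.integers(1, k, size=true_codes.shape[0])
    flipped = (true_codes + offsets) % k
    return np.where(keep, true_codes, flipped), pi


rng = np.random.default_rng(0)
true_codes = rng.integers(0, 4, size=100_000)
reported, pi = krr_privatise(true_codes, epsilon=1.0, k=4, rng=rng)
truth_rate = np.mean(reported == true_codes)
print(f"pi = {pi:.3f}, empirical truth rate = {truth_rate:.3f}")
```

The empirical share of truthful reports should sit close to pi. In a real deployment this step would run on the policyholder's side, not the insurer's.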

What this library implements

The Zhang/Liu/Shi (arXiv:2504.11775, 2025) correction matrix framework for the Lindholm (2022) discrimination-free pricing formula, operating exclusively on privatised sensitive attributes.

The core formula is Lindholm's:

h*(X) = sum_k f_k(X) * P*(D=k)

where each group model f_k(X) is trained using LDP-corrected sample weights derived from the Pi^{-1} correction matrix, and P*(D) is a reference distribution estimated from the debiased noisy frequencies via T^{-1}.
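The averaging step of the formula can be sketched with made-up numbers. The prediction matrix and reference distribution below are hypothetical values chosen for illustration. In the library, the f_k would come from the LDP-corrected group models and P*(D) from the debiased frequencies.

```python
import numpy as np

# Rows: policyholders; columns: group-specific predictions f_k(X) for k = 0, 1, 2.
group_predictions = np.array([
    [400.0, 450.0, 500.0],
    [300.0, 310.0, 290.0],
])
# Reference distribution P*(D=k); it sums to 1 and does not depend on X.
reference_dist = np.array([0.5, 0.3, 0.2])

# h*(X) = sum_k f_k(X) * P*(D=k), evaluated row by row.
h_star = group_predictions @ reference_dist
print(h_star)
```

Because the same weights P*(D=k) are applied to every policyholder, the premium cannot pick up group membership through X.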

To our knowledge, no existing Python package implements this. OpenDP does k-RR but not the group-specific pricing correction. Fairlearn and AIF360 require the true sensitive attribute. InsurFair (R) implements Lindholm but not under LDP.

Architecture warning

The formal LDP privacy guarantee requires a Trusted Third Party (TTP) architecture: policyholders submit their privatised responses directly to the TTP, not to the insurer. When a single organisation runs this code, the formal privacy guarantee does not apply in the same sense. This library provides the correct mathematical framework for the multi-party case and is suitable for research, simulation, and compliance demonstration. Deploying as a live privacy guarantee requires proper TTP infrastructure.

Quick start

import numpy as np
from sklearn.linear_model import Ridge
from insurance_fairness_ldp import (
    KaryRandomisedResponse,
    LDPDiscriminationFreePrice,
    LDPFairnessReport,
)

# Step 1: Define the LDP mechanism (epsilon controls privacy/accuracy trade-off)
krr = KaryRandomisedResponse(
    epsilon=1.0,
    categories=["White", "Asian", "Black", "Other"],
)

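# Step 2: In a real deployment, policyholders apply k-RR themselves.
# In simulation or research, we apply it ourselves. X_train/X_test,
# y_train/y_test and the true_ethnicity_* arrays are assumed to exist already.
S_private_train = krr.privatise(true_ethnicity_train, random_state=42)
S_private_test = krr.privatise(true_ethnicity_test, random_state=43)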

# Step 3: Fit discrimination-free pricing model
model = LDPDiscriminationFreePrice(
    base_estimator=Ridge(),
    mechanism=krr,
    reference_dist="marginal",  # or supply P*(D) directly
)
model.fit(X_train, S_private_train, y_train)

# Step 4: Generate discrimination-free premiums
premiums = model.predict(X_test)

# Step 5: Generate regulatory report
report = LDPFairnessReport.from_model(
    model, X_test, S_private_test, y=y_test
)
report.to_markdown("ldp_fairness_report.md")
print(report.summary())
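The quick start assumes that feature, target, and true-attribute arrays already exist. For a dry run, a synthetic setup along the following lines could supply them. All distributions, coefficients, and the 80/20 split are invented for illustration and carry no actuarial meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
categories = np.array(["White", "Asian", "Black", "Other"])

# Hypothetical rating factors: driver age and vehicle value.
X = np.column_stack([rng.uniform(18, 80, n), rng.lognormal(9.5, 0.4, n)])
# Hypothetical true sensitive attribute (simulation only).
true_ethnicity_array = rng.choice(categories, size=n, p=[0.8, 0.1, 0.05, 0.05])
# Hypothetical claims cost that depends on X only.
y = 200 + 3.0 * (80 - X[:, 0]) + 0.01 * X[:, 1] + rng.normal(0, 25, n)

# Simple 80/20 split by position.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
true_ethnicity_train = true_ethnicity_array[:split]
true_ethnicity_test = true_ethnicity_array[split:]
```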

Unknown epsilon

When epsilon is not known (because privatisation was done externally), use anchor-point estimation:

from insurance_fairness_ldp import NoiseRateEstimator

# Anchor: observations where you know the true category with near-certainty
def anchor_selector(X):
    # Illustrative rule: rows assumed to belong to the anchor category ("White")
    return X[:, 0] > 65

estimator = NoiseRateEstimator(
    categories=["White", "Asian", "Black", "Other"],
    anchor_category="White",
    anchor_selector=anchor_selector,
)
estimator.fit(S_private, X=X)
print(estimator.summary())

# Convert to mechanism and use in pricing
krr_estimated = estimator.to_mechanism()
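The idea behind anchor-point estimation can be sketched without the library. If the true category of the anchor rows is known, the share of them that still report that category estimates pi, and epsilon follows by inverting the k-RR formula. The helper `estimate_pi_from_anchors` below is hypothetical, and the library's `NoiseRateEstimator` may differ in detail (for example in how it bootstraps the standard error).

```python
import numpy as np


def estimate_pi_from_anchors(s_private, anchor_mask, anchor_category, k):
    """Estimate pi and epsilon from rows whose true category is assumed known."""
    s_anchor = np.asarray(s_private)[np.asarray(anchor_mask)]
    # Share of anchor rows that still report the anchor category.
    pi_hat = np.mean(s_anchor == anchor_category)
    # Invert pi = e^eps / (e^eps + k - 1).
    epsilon_hat = np.log(pi_hat * (k - 1) / (1.0 - pi_hat))
    # Binomial standard error of pi_hat.
    std_error = np.sqrt(pi_hat * (1.0 - pi_hat) / s_anchor.shape[0])
    return pi_hat, epsilon_hat, std_error


rng = np.random.default_rng(1)
k, epsilon = 4, 1.0
pi_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
n_anchor = 50_000
# Simulate anchors whose true category is 0: keep it with probability pi_true,
# otherwise report one of the other k-1 categories uniformly.
keep = rng.random(n_anchor) < pi_true
s_private_sim = np.where(keep, 0, rng.integers(1, k, size=n_anchor))
anchor_mask = np.ones(n_anchor, dtype=bool)

pi_hat, epsilon_hat, std_error = estimate_pi_from_anchors(
    s_private_sim, anchor_mask, anchor_category=0, k=k
)
print(f"pi_hat={pi_hat:.3f}, epsilon_hat={epsilon_hat:.3f}, se={std_error:.4f}")
```

The estimate is only as good as the anchor rule: if some anchor rows do not truly belong to the anchor category, pi is underestimated.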

Choosing epsilon

| epsilon | pi (k=2) | C1 (k=2) | Privacy | Accuracy |
|---|---|---|---|---|
| 0.5 | 0.622 | 2.45 | Very strong | Poor |
| 1.0 | 0.731 | 1.73 | Strong | Acceptable |
| 2.0 | 0.881 | 1.27 | Moderate | Good |
| 5.0 | 0.993 | 1.01 | Minimal | Excellent |

For UK insurance research, epsilon=1 to 2 gives meaningful privacy with acceptable accuracy loss. The accuracy constant C1 tells you how much the LDP correction inflates the generalisation error bound relative to direct observation: C1=2 means the bound is 2x worse.
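The pi column follows from the standard k-RR truth probability, pi = e^epsilon / (e^epsilon + k - 1). The short sketch below evaluates it for other values of k; `truth_probability` is a name chosen for this example. C1 is not reproduced because its formula is not stated here.

```python
import numpy as np


def truth_probability(epsilon, k):
    """Probability that k-RR reports the true category."""
    return np.exp(epsilon) / (np.exp(epsilon) + k - 1)


for eps in (0.5, 1.0, 2.0, 5.0):
    print(
        f"epsilon={eps}: pi(k=2)={truth_probability(eps, 2):.3f}, "
        f"pi(k=4)={truth_probability(eps, 4):.3f}"
    )
```

For a fixed epsilon, pi falls as k grows, so a four-category attribute is noisier than a binary one at the same privacy level.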

API reference

KaryRandomisedResponse(epsilon, categories)

k-ary Randomised Response mechanism. Perturbs a sensitive categorical attribute to satisfy epsilon-LDP.

  • .privatise(s, random_state) — apply k-RR to an array of true values
  • .correction_matrix() — return the k x k transition matrix T
  • .pi — truth probability P(S=d | D=d)
  • .k, .epsilon, .categories

CorrectionMatrix(pi, k)

Computes the LDP correction matrices.

  • .T_inv() — inverse of T; used to debias frequency distributions
  • .Pi_inv(group_probs) — group-reweighted correction; used in loss weighting
  • .debias_probs(noisy_probs, clip=True) — apply T^{-1} to a frequency vector
  • .accuracy_constant() — C1 value
  • CorrectionMatrix.from_mechanism(krr) — factory from a KaryRandomisedResponse
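To make the role of T and T^{-1} concrete, here is a minimal NumPy sketch of the k-RR transition matrix and the frequency debiasing step. The function names are hypothetical, and clipping followed by renormalisation is only an assumption about what `clip=True` might do in the library.

```python
import numpy as np


def krr_transition_matrix(pi, k):
    """k x k matrix with pi on the diagonal and (1 - pi) / (k - 1) elsewhere."""
    off = (1.0 - pi) / (k - 1)
    return np.full((k, k), off) + (pi - off) * np.eye(k)


def debias_frequencies(noisy_probs, pi, k, clip=True):
    """Recover true category frequencies from noisy ones via T^{-1}."""
    T = krr_transition_matrix(pi, k)
    # T is symmetric, so noisy = T @ true and true = T^{-1} @ noisy.
    est = np.linalg.solve(T, np.asarray(noisy_probs, dtype=float))
    if clip:
        # Finite samples can give small negative estimates; clip and renormalise.
        est = np.clip(est, 0.0, None)
        est = est / est.sum()
    return est


true_probs = np.array([0.8, 0.1, 0.05, 0.05])
pi = np.exp(1.0) / (np.exp(1.0) + 3)
T = krr_transition_matrix(pi, 4)
noisy_probs = T @ true_probs
recovered = debias_frequencies(noisy_probs, pi, 4)
print(noisy_probs.round(3), recovered.round(3))
```

With exact noisy frequencies the recovery is exact. With sampled frequencies the error is amplified by T^{-1}, more so at small epsilon.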

LDPDiscriminationFreePrice(base_estimator, mechanism, reference_dist)

Main pricing class. sklearn-compatible.

  • .fit(X, S_private, y, exposure=None) — train with LDP-corrected sample weights
  • .predict(X) — return h*(X) discrimination-free premiums
  • .predict_group(X, category) — return f_k(X) for a single group
  • .group_models_ — dict of fitted group models
  • .reference_dist_ — P*(D) used in the Lindholm formula

NoiseRateEstimator(categories, anchor_category, anchor_selector)

Anchor-point estimation of pi (unknown epsilon case).

  • .fit(S_private, X, bootstrap, n_bootstrap, random_state)
  • .pi_, .epsilon_, .std_error_, .n_anchor_
  • .to_mechanism(categories) — convert to KaryRandomisedResponse
  • .summary() — text summary

LDPFairnessReport

Structured report with summary() and to_markdown() methods.

  • LDPFairnessReport.from_model(model, X, S_private, y, h_naive, notes)

Functions

  • privatise(s, epsilon, categories, random_state) — convenience wrapper
  • discrimination_free_indicator(h_star, h_naive, norm) — pricing distance metric
  • group_loss_corrected(y_true, y_pred, S_private, categories, Pi_inv) — LDP-corrected group loss
  • calibration_by_group_ldp(y_true, y_pred, S_private, categories) — calibration check
  • c1_adjusted_error_bound(base_bound, c1, k, p_s_k_star) — bound inflation
  • debiased_group_means(y, S_private, categories, T_inv) — unbiased conditional means
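As an illustration of the debiasing idea behind these helpers, the sketch below recovers group means of y from privatised labels alone. It assumes the privatised label is independent of y given the true group, which holds for k-RR, and it is not the library's implementation. The name `debiased_group_means_sketch` and the simulated data are invented for this example.

```python
import numpy as np


def debiased_group_means_sketch(y, s_private, pi, k):
    """Estimate E[y | D=k] from privatised integer labels via T^{-1}."""
    y = np.asarray(y, dtype=float)
    s_private = np.asarray(s_private)
    off = (1.0 - pi) / (k - 1)
    T = np.full((k, k), off) + (pi - off) * np.eye(k)  # symmetric k-RR matrix
    onehot = np.eye(k)[s_private]                      # shape (n, k)
    noisy_freq = onehot.mean(axis=0)                   # estimates P(S=j)
    noisy_mass = (onehot * y[:, None]).mean(axis=0)    # estimates E[y * 1{S=j}]
    true_freq = np.linalg.solve(T, noisy_freq)         # estimates P(D=k)
    true_mass = np.linalg.solve(T, noisy_mass)         # estimates E[y * 1{D=k}]
    return true_mass / true_freq


rng = np.random.default_rng(2)
n, k, pi = 300_000, 3, 0.8
d = rng.integers(0, k, size=n)                         # true groups (simulation only)
group_means = np.array([100.0, 200.0, 300.0])
y = group_means[d] + rng.normal(0, 10, n)
keep = rng.random(n) < pi
s_private = np.where(keep, d, (d + rng.integers(1, k, size=n)) % k)

naive = np.array([y[s_private == j].mean() for j in range(k)])
debiased = debiased_group_means_sketch(y, s_private, pi, k)
print("naive   :", naive.round(1))
print("debiased:", debiased.round(1))
```

The naive means are pulled toward the overall average because each privatised group is a mixture of true groups. The debiased means should land near the simulated values.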

UK regulatory context

  • GDPR Article 9 / DPA 2018 Schedule 1: If the insurer receives only privatised S, there is a defensible argument they have not "processed" special category data in the Article 9 sense. The TTP processes it; the insurer receives noise.
  • FCA EP25/2 (2025): The FCA found a £28/year residual ethnicity gap in motor after risk adjustment. This library provides a technical route to demonstrate non-discrimination even when ethnicity data is unavailable.
  • Equality Act 2010, Section 19: The Lindholm reference distribution P*(D) being independent of X removes the indirect discrimination mechanism.
  • Test-Achats (2012): UK insurers have been prohibited from using gender in pricing since 21 December 2012. LDP extends this architecture to ethnicity and disability.
  • Data (Use and Access) Act 2025: Reduces the sensitivity of the protected-attribute decision pathway, supporting compliance with the rules on automated decision-making (ADM).

How this fits the Burning Cost stack

| Library | Requires true D? | Purpose |
|---|---|---|
| insurance-fairness-diag | Yes | Diagnose proxy leakage |
| insurance-fairness | Yes | Audit model discrimination |
| insurance-fairness-ot | Yes | Wasserstein discrimination-free prices |
| insurance-fairness-ldp | No | Discrimination-free prices without ever seeing D |

The natural workflow: run insurance-fairness-diag to detect proxy leakage, then use insurance-fairness-ldp to correct for it without requiring access to the restricted attribute.

Installation

pip install insurance-fairness-ldp

Optional CatBoost support:

pip install insurance-fairness-ldp[catboost]

References

Zhang, Liu, Shi (2025). Discrimination-Free Insurance Pricing under Local Differential Privacy. arXiv:2504.11775.

Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55-89.

Makhlouf et al. (2024). A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness. arXiv:2405.14725. CSF 2024.

Warner (1965). Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. JASA 60(309), 63-69.

Licence

MIT. Copyright Burning Cost, 2026.
