
Achieve error-rate parity between protected groups for any predictor

Project description

error-parity


This work was presented as an oral at ICLR 2024, under the title "Unprocessing Seven Years of Algorithmic Fairness".

Fast postprocessing of any score-based predictor to meet fairness criteria.

The error-parity package can enforce strict or relaxed fairness constraints, which makes it useful for comparing ML models at equal levels of fairness.

Package documentation is available here.


Installing

Install package from PyPI:

pip install error-parity

Or, for development, you can clone the repo and install from local sources:

git clone https://github.com/socialfoundations/error-parity.git
pip install ./error-parity

Getting started

See the detailed example notebooks in the examples folder and in the package documentation.

from error_parity import RelaxedThresholdOptimizer

# Given any trained model that outputs real-valued scores
fair_clf = RelaxedThresholdOptimizer(
    predictor=lambda X: model.predict_proba(X)[:, -1],   # for sklearn API
    # predictor=model,            # use this for a callable model
    constraint="equalized_odds",  # other constraints are available
    tolerance=0.05,               # fairness constraint tolerance
)

# Fit the fairness adjustment on some data
# This will find the optimal _fair classifier_
fair_clf.fit(X=X, y=y, group=group)

# Now you can use `fair_clf` like any other classifier;
# note that group membership must be provided to compute fair predictions
y_pred_test = fair_clf(X=X_test, group=group_test)
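
For a fully runnable starting point, the sketch below fits the adjustment end-to-end; the synthetic data-generating process and the logistic-regression model are illustrative assumptions, not part of the package.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from error_parity import RelaxedThresholdOptimizer

# Illustrative synthetic data: 5 features, binary labels, binary sensitive group
rng = np.random.default_rng(42)
n = 10_000
X = rng.normal(size=(n, 5))
group = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * group + rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, group, test_size=0.3, random_state=42)

# Any score-based model works; logistic regression is just an example
model = LogisticRegression().fit(X_train, y_train)

fair_clf = RelaxedThresholdOptimizer(
    predictor=lambda X: model.predict_proba(X)[:, -1],
    constraint="equalized_odds",
    tolerance=0.05,
)
fair_clf.fit(X=X_train, y=y_train, group=g_train)

y_pred_test = fair_clf(X=X_test, group=g_test)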

How it works

Given a callable score-based predictor (i.e., y_pred = predictor(X)), and some (X, Y, S) data to fit, RelaxedThresholdOptimizer will:

  1. Compute group-specific ROC curves and their convex hulls;
  2. Compute the $r$-relaxed optimal solution for the chosen fairness criterion (using cvxpy);
  3. Find the set of group-specific binary classifiers that match the optimal solution found.
    • each group-specific classifier is made up of (possibly randomized) group-specific thresholds over the given predictor;
    • if a group's target ROC point lies strictly inside its ROC convex hull (rather than on the frontier), partial randomization of its predictions may be necessary (see the sketch below).
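
As a conceptual illustration of that last point (not the package's internal implementation): a target ROC point lying between the ROC points of two deterministic thresholds can be realized by randomizing between them. The threshold values and mixing probability below are assumptions for illustration only.

import numpy as np

def randomized_threshold_predict(scores, t1, t2, p, rng=None):
    # Realizes an ROC point on the segment between the ROC points of the two
    # deterministic thresholds t1 and t2: each sample is classified with
    # threshold t1 with probability p, and with threshold t2 otherwise, so the
    # resulting (FPR, TPR) is the convex combination p*ROC(t1) + (1-p)*ROC(t2).
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores)
    use_t1 = rng.random(scores.shape[0]) < p
    thresholds = np.where(use_t1, t1, t2)
    return (scores >= thresholds).astype(int)

# Example: mix a strict and a lenient threshold for one group's predictions
# y_pred_group_a = randomized_threshold_predict(scores_a, t1=0.7, t2=0.3, p=0.25)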

Fairness constraints

You can choose a specific fairness constraint via the constraint keyword argument of the RelaxedThresholdOptimizer constructor. The equation under each constraint details how it is evaluated, where $r$ is the relaxation (or tolerance) and $\mathcal{S}$ is the set of sensitive groups. A short sketch after the list below shows how each quantity can be computed from a set of predictions.

Currently implemented fairness constraints:

  • equalized odds (Hardt et al., 2016) [default];
    • i.e., equal group-specific TPR and FPR;
    • use constraint="equalized_odds";
    • $\max_{a, b \in \mathcal{S}} \max_{y \in \{0, 1\}} \left( \mathbb{P}[\hat{Y}=1 | S=a, Y=y] - \mathbb{P}[\hat{Y}=1 | S=b, Y=y] \right) \leq r$
    • other relaxations available by changing the l_p_norm parameter;
  • equal opportunity;
    • i.e., equal group-specific TPR;
    • use constraint="true_positive_rate_parity";
    • $\max_{a, b \in \mathcal{S}} \left( \mathbb{P}[\hat{Y}=1 | S=a, Y=1] - \mathbb{P}[\hat{Y}=1 | S=b, Y=1] \right) \leq r$
  • predictive equality;
    • i.e., equal group-specific FPR;
    • use constraint="false_positive_rate_parity";
    • $\max_{a, b \in \mathcal{S}} \left( \mathbb{P}[\hat{Y}=1 | S=a, Y=0] - \mathbb{P}[\hat{Y}=1 | S=b, Y=0] \right) \leq r$
  • demographic parity;
    • i.e., equal group-specific predicted prevalence;
    • use constraint="demographic_parity";
    • $\max_{a, b \in \mathcal{S}} \left( \mathbb{P}[\hat{Y}=1 | S=a] - \mathbb{P}[\hat{Y}=1 | S=b] \right) \leq r$
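
The sketch below is a direct transcription of the four equations above into code for evaluating a set of binary predictions. It is not part of the error-parity API, just an illustration of how each constraint is measured; it assumes every group contains both positive and negative examples.

import numpy as np

def max_pairwise_diff(rates):
    # Largest difference between any two group-wise rates (the max over a, b above)
    rates = np.asarray(list(rates), dtype=float)
    return float(rates.max() - rates.min())

def constraint_violations(y_true, y_pred, group):
    # Evaluate each constraint's left-hand side from binary predictions
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    groups = np.unique(group)

    tpr = [y_pred[(group == g) & (y_true == 1)].mean() for g in groups]   # P[Yhat=1 | S=g, Y=1]
    fpr = [y_pred[(group == g) & (y_true == 0)].mean() for g in groups]   # P[Yhat=1 | S=g, Y=0]
    prevalence = [y_pred[group == g].mean() for g in groups]              # P[Yhat=1 | S=g]

    return {
        "equalized_odds": max(max_pairwise_diff(tpr), max_pairwise_diff(fpr)),
        "true_positive_rate_parity": max_pairwise_diff(tpr),
        "false_positive_rate_parity": max_pairwise_diff(fpr),
        "demographic_parity": max_pairwise_diff(prevalence),
    }

# A classifier fitted with tolerance r should satisfy, up to sampling noise, e.g.:
# constraint_violations(y_test, y_pred_test, group_test)["equalized_odds"] <= r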

We welcome community contributions for cvxpy implementations of other fairness constraints.

Equalized odds relaxations

When using constraint="equalized_odds", different relaxations can be chosen by altering the l_p_norm parameter (which dictates how the distance between group-specific ROC points is computed); a usage sketch follows at the end of this section.

A few useful values:

  • l_p_norm=np.inf [default] evaluates equalized-odds as the maximum between group-wise TPR and FPR differences (as shown above);
  • l_p_norm=1 evaluates equalized-odds as the sum of the absolute differences in group-wise TPR and FPR;
    • this corresponds to twice the "average absolute odds" metric;
    • accordingly, set the tolerance to twice the desired bound on average_abs_odds_difference;

The actual equalized odds constraint implemented is:

$\max_{a, b \in \mathcal{S}} \left\lVert ROC_a - ROC_b \right\rVert_p \leq r,$ where $ROC_a$ is the ROC point of group $S=a$ and $ROC_b$ is the ROC point of group $S=b$.
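
Below is a short sketch of selecting a relaxation, assuming l_p_norm is passed to the RelaxedThresholdOptimizer constructor alongside the constraint; the placeholder score function is purely illustrative.

import numpy as np
from error_parity import RelaxedThresholdOptimizer

# Placeholder score function; any callable mapping X to real-valued scores works
predict_scores = lambda X: np.asarray(X)[:, 0]

# Default l-infinity relaxation: max of group-wise TPR and FPR differences <= 0.05
linf_clf = RelaxedThresholdOptimizer(
    predictor=predict_scores,
    constraint="equalized_odds",
    tolerance=0.05,
    l_p_norm=np.inf,
)

# l-1 relaxation: sum of absolute TPR and FPR differences <= 0.10,
# i.e., average absolute odds difference <= 0.05
l1_clf = RelaxedThresholdOptimizer(
    predictor=predict_scores,
    constraint="equalized_odds",
    tolerance=0.10,
    l_p_norm=1,
)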

Citing

@inproceedings{
  cruz2024unprocessing,
  title={Unprocessing Seven Years of Algorithmic Fairness},
  author={Andr{\'e} Cruz and Moritz Hardt},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=jr03SfWsBS}
}



Download files


Source Distribution

error_parity-0.3.11.tar.gz (40.9 kB)

Uploaded Source

Built Distribution

error_parity-0.3.11-py3-none-any.whl (44.2 kB)

Uploaded Python 3

File details

Details for the file error_parity-0.3.11.tar.gz.

File metadata

  • Download URL: error_parity-0.3.11.tar.gz
  • Upload date:
  • Size: 40.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for error_parity-0.3.11.tar.gz
  • SHA256: e105d4196c8ef4c028fea0d77cdccf8bd066d43c69ef11808a2164ff7c525704
  • MD5: bc435ba28069ae1df51197ea48fb4e1a
  • BLAKE2b-256: d909cc40f372a4f53872241e7db0216d6bac6daf715c7baa2a2e86b03b6b1bae


File details

Details for the file error_parity-0.3.11-py3-none-any.whl.


File hashes

Hashes for error_parity-0.3.11-py3-none-any.whl
  • SHA256: efe0d9366bd8afa30584e9c30f78aef21be79b1669080f68ebcce0fdf8851cf6
  • MD5: 8aaa1996bdc530de994f3f5eaa23ca78
  • BLAKE2b-256: bee565b9cec0ca880ef9565e07d0f423b65551838891b0b86b11072b94a36e3c

