FairGBM Python Package
FairGBM is an easy-to-use and lightweight fairness-aware ML algorithm with state-of-the-art performance on tabular datasets.
FairGBM builds upon the popular LightGBM algorithm and adds customizable constraints for group-wise fairness (e.g., equal opportunity, predictive equality) and other global goals (e.g., specific Recall or FPR prediction targets).
Please consult the paper for further details.
Install
Currently, compatibility is only maintained with Linux OS.
FairGBM can be installed from PyPI:
pip install fairgbm
Or from GitHub:
git clone --recurse-submodules https://github.com/feedzai/fairgbm.git
pip install fairgbm/python-package/
Note: installation requires CMake and an up-to-date C++ compiler (gcc, clang, or MinGW). You may need to install wheel first via
pip install wheel
For Linux users, glibc >= 2.14 is required. For more details see LightGBM's installation guide, or follow this link for the Python package installation instructions.
Getting started
You can get FairGBM up and running in just a few lines of Python code:
from fairgbm import FairGBMClassifier
# Instantiate
fairgbm_clf = FairGBMClassifier(
constraint_type="FNR", # constraint on equal group-wise TPR (equal opportunity)
n_estimators=200, # core parameters from vanilla LightGBM
random_state=42, # ...
)
# Train using features (X), labels (Y), and sensitive attributes (S)
fairgbm_clf.fit(X, Y, constraint_group=S)
# NOTE: labels (Y) and sensitive attributes (S) must be in numeric format
# Predict
Y_test_pred = fairgbm_clf.predict_proba(X_test)[:, -1] # Compute continuous class probabilities (recommended)
# Y_test_pred = fairgbm_clf.predict(X_test) # Or compute discrete class predictions
A more in-depth explanation and other usage examples can be found in the examples folder.
For Python examples see the notebooks folder.
Parameter list
The following parameters can be used as keyword arguments for the FairGBMClassifier Python class.
| Name | Description | Default |
|---|---|---|
| constraint_type | The type of fairness (group-wise equality) constraint to use (if any). | FPR,FNR |
| global_constraint_type | The type of global equality constraint to use (if any). | None |
| multiplier_learning_rate | The learning rate for the gradient ascent step (w.r.t. Lagrange multipliers). | 0.1 |
| constraint_fpr_tolerance | The slack when fulfilling group-wise FPR constraints. | 0.01 |
| constraint_fnr_tolerance | The slack when fulfilling group-wise FNR constraints. | 0.01 |
| global_target_fpr | Target rate for the global FPR (inequality) constraint. | None |
| global_target_fnr | Target rate for the global FNR (inequality) constraint. | None |
| constraint_stepwise_proxy | Differentiable proxy for the step-wise function in group-wise constraints. | cross_entropy |
| objective_stepwise_proxy | Differentiable proxy for the step-wise function in global constraints. | cross_entropy |
| stepwise_proxy_margin | Intercept value for the proxy function: its value at logodds=0.0. | 1.0 |
| score_threshold | Score threshold used when assessing group-wise FPR or FNR in training. | 0.5 |
| global_score_threshold | Score threshold used when assessing global FPR or FNR in training. | 0.5 |
| init_multipliers | The initial value of the Lagrange multipliers. | 0 for each constraint |
| ... | Any core LGBMClassifier parameter can be used with FairGBM as well. | |
Please consult this list for a detailed view of all vanilla LightGBM parameters (e.g., n_estimators, n_jobs, ...).
Note: the objective is the only core LightGBM parameter that cannot be changed when using FairGBM, as you must use the constrained loss function objective="constrained_cross_entropy". Using a standard non-constrained objective will fall back to standard LightGBM.
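As a sketch of how these parameters combine (the values below are illustrative, not recommendations), each table entry maps directly to a constructor keyword argument:

```python
# Illustrative FairGBM configuration (values are examples, not recommendations)
params = {
    "constraint_type": "FPR,FNR",        # equalize both FPR and FNR (equal odds)
    "constraint_fpr_tolerance": 0.02,    # 2 p.p. slack on group-wise FPR equality
    "constraint_fnr_tolerance": 0.02,    # 2 p.p. slack on group-wise FNR equality
    "multiplier_learning_rate": 0.05,    # gradient-ascent step on the multipliers
    "n_estimators": 500,                 # vanilla LightGBM parameters work too
    "random_state": 42,
}

# The dict can then be unpacked into the classifier:
# from fairgbm import FairGBMClassifier
# fairgbm_clf = FairGBMClassifier(**params)
```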
fit(X, Y, constraint_group=S)
In addition to the usual fit arguments, features X and labels Y, FairGBM takes in the sensitive attributes column S for training.
Regarding the sensitive attributes column S:
- It should be in numeric format, with each different protected group taking a different integer value, starting at 0.
- It is not restricted to binary sensitive attributes: you can use two or more different groups encoded in the same column.
- It is only required for training, not for computing predictions.
Here is an example pre-processing for the sensitive attributes on the UCI Adult dataset:
# Given X, Y, S
X, Y, S = load_dataset()
# The sensitive attributes S must be in numeric format
S = [1. if val == "Female" else 0. for val in S]
# The labels Y must be binary and in numeric format: {0, 1}
Y = [1. if val == ">50K" else 0. for val in Y]
# And the features X may be numeric or categorical, but make sure categorical columns are in the correct format
X: Union[pd.DataFrame, np.ndarray] # any array-like can be used
# Train FairGBM
fairgbm_clf.fit(X, Y, constraint_group=S)
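Since S is not restricted to binary attributes, a column with several protected groups can be mapped to 0-based integer codes, for instance with pandas (the group names below are made up):

```python
import pandas as pd

# Hypothetical multi-group sensitive attribute (illustrative values)
S_raw = pd.Series(["groupA", "groupB", "groupA", "groupC"])

# Map each distinct group to a distinct integer starting at 0,
# as FairGBM requires for constraint_group
codes, groups = pd.factorize(S_raw, sort=True)

print(list(groups))       # ['groupA', 'groupB', 'groupC']
print(codes.tolist())     # [0, 1, 0, 2]
```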
Features
FairGBM enables you to train a GBM model to minimize a loss function (e.g., cross-entropy) subject to fairness constraints (e.g., equal opportunity).
Namely, you can target equality of performance metrics (FPR, FNR, or both) across instances from two or more different protected groups (see fairness constraints section). Simultaneously (and optionally), you can add global constraints on specific metrics (see global constraints section).
Fairness constraints
You can use FairGBM to equalize the following metrics across two or more protected groups:
- Equalize FNR (equivalent to equalizing TPR or Recall), also known as equal opportunity (Hardt et al., 2016)
- Equalize FPR (equivalent to equalizing TNR or Specificity), also known as predictive equality (Corbett-Davies et al., 2017)
- Equalize both FNR and FPR simultaneously, also known as equal odds (Hardt et al., 2016)
Example for equality of opportunity in college admissions: your likelihood of getting admitted to a certain college (predicted positive) given that you're a qualified candidate (label positive) should be the same regardless of your race (sensitive attribute).
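To illustrate what this constraint measures, here is a small numpy sketch (with made-up labels, predictions, and group memberships) that computes the group-wise TPR whose gap an FNR constraint aims to shrink:

```python
import numpy as np

# Toy data (illustrative): true labels, binarized predictions, group membership
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def group_tpr(y_true, y_pred, group, g):
    """TPR (recall) among label-positive instances of group g."""
    mask = (group == g) & (y_true == 1)
    return y_pred[mask].mean()

# Equal opportunity holds when these values are (approximately) equal
for g in np.unique(group):
    print(f"group {g}: TPR = {group_tpr(y_true, y_pred, group, g):.2f}")
```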
Global constraints
You can also target specific FNR or FPR goals.
For example, in cases where high accuracy is trivially achieved (e.g., problems with high class imbalance),
you may want to maximize TPR with a constraint on FPR (e.g., "maximize TPR with at most 5% FPR").
You can set a constraint on global FPR ≤ 0.05 by using global_target_fpr=0.05 and global_constraint_type="FPR".
You can simultaneously set constraints on group-wise metrics (fairness constraints) and constraints on global metrics.
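As an illustration of the "maximize TPR at 5% FPR" goal, the following numpy sketch (synthetic scores; the helper name tpr_at_fpr is ours, not part of the FairGBM API) evaluates the TPR attainable within a global FPR budget:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic scores (illustrative): negatives centered lower than positives
y_true = np.concatenate([np.zeros(1000), np.ones(1000)])
scores = np.concatenate([rng.normal(0.3, 0.1, 1000),
                         rng.normal(0.7, 0.1, 1000)])

def tpr_at_fpr(y_true, scores, max_fpr=0.05):
    """Pick the lowest threshold whose global FPR is still <= max_fpr,
    then report the TPR achieved at that threshold."""
    neg_scores = np.sort(scores[y_true == 0])
    # threshold such that at most max_fpr of negatives score above it
    idx = int(np.ceil((1 - max_fpr) * len(neg_scores))) - 1
    threshold = neg_scores[idx]
    tpr = (scores[y_true == 1] > threshold).mean()
    return tpr, threshold

tpr, thr = tpr_at_fpr(y_true, scores)
print(f"TPR at 5% FPR: {tpr:.3f} (threshold {thr:.3f})")
```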
Technical Details
FairGBM is a framework that enables constrained optimization of Gradient Boosting Machines (GBMs). This way, we can train a GBM model to minimize some loss function (usually the binary cross-entropy) subject to a set of constraints that should be met in the training dataset (e.g., equality of opportunity).
FairGBM applies the method of Lagrange multipliers, and uses iterative and interleaved steps of gradient descent (on the function space, by adding new trees to the GBM model) and gradient ascent (on the space of Lagrange multipliers, Λ).
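The ascent step on the multipliers can be sketched as follows; this is a minimal illustration of projected gradient ascent under our own naming, not FairGBM's internal implementation:

```python
import numpy as np

def multiplier_ascent_step(multipliers, constraint_violations, lr=0.1):
    """One projected gradient-ascent step on the Lagrange multipliers:
    a multiplier grows when its constraint is violated (violation > 0)
    and shrinks otherwise, clipped at zero to stay non-negative."""
    return np.maximum(0.0, multipliers + lr * constraint_violations)

lam = np.zeros(2)                     # one multiplier per constraint, init at 0
violations = np.array([0.08, -0.02])  # e.g., group FPR gap above/below tolerance
lam = multiplier_ascent_step(lam, violations, lr=0.1)
print(lam)  # first multiplier increases, second stays clipped at zero
```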
The main obstacle with enforcing fairness constraints in training is that these constraints are often non-differentiable. To side-step this issue, we use a differentiable proxy of the step-wise function. The following plot shows an example of hinge-based and cross-entropy-based proxies for the false positive value of a label negative instance.
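A rough sketch of the step function and two such proxies is below. The exact proxy definitions used by FairGBM are given in the paper; the scaling here (proxy value equal to margin at logodds 0, cf. the stepwise_proxy_margin parameter) is our illustrative choice:

```python
import math

def step_fp(f):
    """Actual, non-differentiable false-positive indicator for a
    label-negative instance: 1 iff the log-odds score f is positive."""
    return 1.0 if f > 0 else 0.0

def hinge_proxy_fp(f, margin=1.0):
    """Hinge-based upper bound of the step function (differentiable a.e.)."""
    return max(0.0, f + margin)

def xent_proxy_fp(f, margin=1.0):
    """Cross-entropy-based smooth upper bound, scaled so its value
    at f=0 equals `margin` (illustrative form)."""
    return margin * math.log1p(math.exp(f)) / math.log(2.0)

for f in (-2.0, 0.0, 2.0):
    print(f"f={f:+.1f}  step={step_fp(f):.0f}  "
          f"hinge={hinge_proxy_fp(f):.3f}  xent={xent_proxy_fp(f):.3f}")
```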
For a more in-depth explanation of FairGBM please consult the paper.
Contact
For commercial uses of FairGBM please contact oss-licenses@feedzai.com.
Citing FairGBM
The paper is publicly available at this arXiv link.
@misc{cruz2022fairgbm,
doi = {10.48550/ARXIV.2209.07850},
url = {https://arxiv.org/abs/2209.07850},
author = {Cruz, Andr{\'{e}} F and Bel{\'{e}}m, Catarina and Bravo, Jo{\~{a}}o and Saleiro, Pedro and Bizarro, Pedro},
keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computers and Society (cs.CY), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {FairGBM: Gradient Boosting with Fairness Constraints},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}