fairaudit
fairaudit is a Python package for fairness auditing with statistical guarantees.
Given a hold-out set $\{(x_i, y_i)\}_{i = 1}^n$, a model $f(\cdot)$, and a collection of groups $\mathcal{G}$, fairaudit issues simultaneously valid guarantees on group-wise performance and discovers groups with disparate performance for an arbitrary metric, $\ell(f(x), y)$.
Formally, let
$$\epsilon(G) := \mathbb{E}_P[\ell(f(X), Y) \mid (X, Y) \in G] - \theta_P,$$
where $\theta_P$ is a performance baseline, e.g., the population-average loss $\mathbb{E}_P[\ell(f(X), Y)]$. The certification task corresponds to issuing a simultaneously valid confidence set (an upper bound, lower bound, or interval) for $\epsilon(G)$ over every group $G \in \mathcal{G}$, while the flagging task corresponds to discovering groups $G$ for which $\epsilon(G)$ violates some tolerance threshold. The latter discoveries satisfy an asymptotic false discovery rate (FDR) guarantee.
Installation
fairaudit can be installed (locally for now) with pip. Navigate to the root directory of this repository and run:
$ pip install .
Examples
The easiest way to start using fairaudit may be to go through the following notebook:
Usage
The Auditor class
The Auditor class has the following API:
class Auditor:
    def __init__(
        self,
        x: np.ndarray,
        y: np.ndarray,
        z: np.ndarray,
        metric: fairaudit.Metric
    ): ...
    def calibrate_groups(
        self,
        alpha: float,
        type: str,
        groups: Union[np.ndarray, str],
        epsilon: float = None,
        bootstrap_params: dict = {}
    ) -> None:
        """
        Obtain bootstrap critical values for a specific group collection.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
            See the epsilon documentation for what these correspond to.
        groups : Union[np.ndarray, str]
            Either a string naming a supported collection of groups or a numpy array,
            typically obtained by calling `get_intersections` or `get_rectangles`
            from groups.py. Array dimensions should be (n_points, n_groups).
        epsilon : float = None
            epsilon = None calibrates for issuing confidence bounds:
                type = "upper" issues lower confidence bounds,
                type = "lower" issues upper confidence bounds,
                type = "interval" issues confidence intervals.
            If a non-null value is passed in, we issue a Boolean certificate:
                type = "upper" tests the null that epsilon(G) >= epsilon,
                type = "lower" tests the null that epsilon(G) <= epsilon,
                type = "interval" tests the null that |epsilon(G)| >= epsilon.
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """
    def query_group(
        self,
        group: Union[np.ndarray, int]
    ) -> Tuple[List[Union[float, bool]], List[float], List[float]]:
        """
        Query the calibrated auditor for a certificate for a particular group.

        Parameters
        ----------
        group : Union[np.ndarray, int]
            Accepts an index into the groups originally passed in, or a Boolean
            array if the calibrated collection was infinite.

        Returns
        -------
        certificate : List[Union[float, bool]]
            Boolean certificates or confidence bounds for each metric audited
        value : List[float]
            Empirical value of epsilon(G) for each metric audited
        threshold : List[float]
            Estimate of theta for each metric audited
        """
    def calibrate_rkhs(
        self,
        alpha: float,
        type: str,
        kernel: str,
        kernel_params: dict = {},
        bootstrap_params: dict = {}
    ) -> None:
        """
        Obtain bootstrap critical value for a specified RKHS.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
            type = "upper" issues lower confidence bounds,
            type = "lower" issues upper confidence bounds,
            type = "interval" issues confidence intervals.
        kernel : str
            Name of the scikit-learn kernel the user would like to use.
            Suggested kernels: 'rbf', 'laplacian', 'sigmoid'
        kernel_params : dict = {}
            Additional parameters required to specify the kernel,
            e.g. {'gamma': 1} for the RBF kernel
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """
    def query_rkhs(self, weights: np.ndarray) -> Tuple[List[float], List[float]]:
        """
        Query the calibrated auditor for a certificate for a particular RKHS
        function.

        Parameters
        ----------
        weights : np.ndarray
            RKHS function weights, i.e. f(x_i) = (Kw)_i

        Returns
        -------
        certificate : List[float]
            Confidence bounds for each metric queried.
        value : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """
    def flag_groups(
        self,
        groups: np.ndarray,
        type: str,
        alpha: float,
        epsilon: float = 0,
        bootstrap_params: dict = {"student": "mad", "student_threshold": -np.inf}
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns flags and estimates of epsilon(G) for each group in some finite
        collection.

        Parameters
        ----------
        groups : np.ndarray
            Boolean numpy array of dimension (n_points, n_groups)
        type : str
            Takes one of three values ('upper', 'lower', 'interval').
            'upper' tests the null, epsilon(G) >= epsilon
            'lower' tests the null, epsilon(G) <= epsilon
            'interval' tests the null, |epsilon(G)| <= epsilon
        alpha : float
            FDR level
        epsilon : float = 0
            See the 'type' documentation
        bootstrap_params : dict = {"student": "mad", "student_threshold": -np.inf}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.

        Returns
        -------
        flags : List[bool]
            One flag is returned per group; at least one metric must be flagged
            for the group to receive a True flag.
        values : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """
The Metric class
A Metric object should be instantiated for any performance metric the auditor may be interested in querying. Its methods are never called directly by the user, but its constructor is part of the public API:
class Metric:
    def __init__(
        self,
        name: str,
        evaluation_function: Callable[[np.ndarray, np.ndarray], np.ndarray] = None,
        threshold_function: Callable[[np.ndarray, np.ndarray], float] = None,
        metric_params: dict = {}
    ) -> None:
        """
        Constructs the Metric object used by the Auditor object to recompute
        performance over bootstrap samples.

        Parameters
        ----------
        name : str
            Name of the metric.
        evaluation_function : Callable[[np.ndarray, np.ndarray], np.ndarray] = None
            Function applied to model predictions (Z) and true labels (Y) that returns
            an array of metric values, e.g. the evaluation_function for
            mean squared error is lambda z, y: (z - y)**2
        threshold_function : Callable[[np.ndarray, np.ndarray], float] = None
            Function applied to model predictions (Z) and true labels (Y) that returns
            a single threshold for comparison, e.g. when comparing to the population
            average, the threshold_function for MSE is lambda z, y: np.mean((z - y)**2)
        metric_params : dict = {}
            Additional parameters for metrics that require separate error
            tracking depending on the predicted value or true label.
            For 'calibration'-type metrics, the key 'calibration_bins' should map to a
            list that determines how the predicted values (Z) are digitized/binned.
            For 'equalized odds'-type metrics, the key 'y_values' should map to a list
            so that the metric is calculated separately for each value of y in that list.
        """
The groups module
We provide two methods in the groups module for constructing collections of groups that can be audited.
def get_intersections(
    X: np.ndarray,
    discretization: dict = {},
    depth: int = None
) -> np.ndarray:
    """
    Construct groups formed by intersections of attributes.

    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X.
        Values specify the "bins" argument of np.digitize(...).
    depth : int = None
        If None, we consider all intersections; otherwise we consider only
        intersections up to the specified depth.

    Returns
    -------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """
def get_rectangles(
    X: np.ndarray,
    discretization: dict = {}
) -> np.ndarray:
    """
    Construct rectangular groups formed by attributes.

    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X.
        Values specify the "bins" argument of np.digitize(...).

    Returns
    -------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """
Citing
If you use this code in a research project, please cite the following paper:
@article{cherian2023statistical,
  title = {Statistical inference for fairness auditing},
  author = {John Cherian and Emmanuel Cand\`es},
  publisher = {arXiv},
  year = {2023},
  note = {arXiv:2305.03712 [stat.ME]},
  url = {https://arxiv.org/abs/2305.03712},
}