
fairaudit

fairaudit is a Python package for fairness auditing with statistical guarantees.

Given a hold-out set $\{(x_i, y_i)\}_{i = 1}^n$, a model $f(\cdot)$, and a collection of groups $\mathcal{G}$, fairaudit issues simultaneously valid guarantees on group-wise performance and discovers groups with disparate performance for an arbitrary metric, $\ell(f(x), y)$.

Formally, let

$$\epsilon(G) := \mathbb{E}_P[\ell(f(X), Y) \mid (X, Y) \in G] - \theta_P.$$

Here, $\theta_P$ is a comparison threshold, e.g. the population-average performance. The certification task corresponds to issuing a simultaneously valid confidence set (an upper bound, lower bound, or interval) for $\epsilon(G)$, while the flagging task corresponds to discovering groups $G$ for which $\epsilon(G)$ fails to meet some tolerance threshold. The latter discoveries satisfy an asymptotic false discovery rate (FDR) guarantee.
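For instance, with squared-error loss and the population-average threshold, $\epsilon(G)$ is the excess mean squared error of group $G$:

$$\epsilon(G) = \mathbb{E}_P\left[(f(X) - Y)^2 \mid (X, Y) \in G\right] - \mathbb{E}_P\left[(f(X) - Y)^2\right].$$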

Installation

fairaudit can be installed (locally, for now) with pip.

To install, navigate to the root directory of this repository and run:

$ pip install . 

Examples

The easiest way to start using fairaudit is to work through the following notebook:

Usage

The Auditor Class

The Auditor class has the following API:

class Auditor:
    def __init__(
        self,
        x: np.ndarray,        # hold-out features
        y: np.ndarray,        # true labels
        z: np.ndarray,        # model predictions, z_i = f(x_i)
        metric: fairaudit.Metric
    ): ...

    def calibrate_groups(
        self, 
        alpha : float,
        type : str,
        groups : Union[np.ndarray, str],
        epsilon : float = None,
        bootstrap_params : dict = {}
    ) -> None:
        """
        Obtain bootstrap critical values for a specific group collection.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
            See the epsilon documentation for what these correspond to.
        groups : Union[np.ndarray, str]
            Either a string for a supported collection of groups or a numpy array
            typically obtained by calling `get_intersections` or `get_rectangles`
            from the groups module.
            Array dimensions should be (n_points, n_groups)
        epsilon : float = None
            epsilon = None calibrates for issuing confidence bounds. 
                type = "upper" issues lower confidence bounds, 
                type = "lower" issues upper confidence bounds, 
                type = "interval" issues confidence intervals.
            If a non-null value is passed in, we issue a Boolean certificate. 
                type = "upper" tests the null that epsilon(G) >= epsilon
                type = "lower" tests the null that epsilon(G) <= epsilon
                type = "interval" tests the null that |epsilon(G)| >= epsilon
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """

    def query_group(
        self, group : Union[np.ndarray, int]
    ) -> Tuple[List[Union[float, bool]], List[float], List[float]]:
        """
        Query the calibrated auditor for a certificate for a particular group.
        
        Parameters
        ----------
        group : Union[np.ndarray, int]
            Accepts an index into the groups originally passed in, or a Boolean
            array if the calibrated collection was infinite.

        Returns
        -------
        certificate : List[Union[float, bool]]
            Boolean certificates or confidence bounds for each metric audited
        value : List[float]
            Empirical value of epsilon(G) for each metric audited
        threshold : List[float]
            Estimate of theta for each metric audited
        """

    def calibrate_rkhs(
        self,
        alpha : float,
        type : str,
        kernel : str,
        kernel_params : dict = {},
        bootstrap_params : dict = {}
    ) -> None:
        """
        Obtain bootstrap critical value for a specified RKHS.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
                type = "upper" issues lower confidence bounds, 
                type = "lower" issues upper confidence bounds, 
                type = "interval" issues confidence intervals.
        kernel : str
            Name of scikit-learn kernel the user would like to use. 
            Suggested kernels: 'rbf', 'laplacian', 'sigmoid'
        kernel_params : dict = {}
            Additional parameters required to specify the kernel, 
            e.g. {'gamma': 1} for RBF kernel
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """

    def query_rkhs(self, weights : np.ndarray) -> Tuple[List[float], List[float]]:
        """
        Query calibrated auditor for certificate for a particular RKHS
        function.
        
        Parameters
        ----------
        weights : np.ndarray
            RKHS function weights, i.e. f(x_i) = (Kw)_i
        Returns
        -------
        certificate : List[float]
            Confidence bounds for each metric queried.
        value : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """

    def flag_groups(
        self,
        groups : np.ndarray,
        type : str,
        alpha : float,
        epsilon : float = 0,
        bootstrap_params : dict = {"student" : "mad", "student_threshold" : -np.inf}  
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns flags and estimates of epsilon(G) for each group in some finite 
        collection. 

        Parameters
        ----------
        groups : np.ndarray
            Boolean numpy array of dimension (n_points, n_groups)
        type : str
            Takes values ('upper', 'lower', 'interval')
            'upper' tests the null, epsilon(G) >= epsilon
            'lower' tests the null, epsilon(G) <= epsilon
            'interval' tests the null, |epsilon(G)| <= epsilon
        alpha : float
            FDR level
        epsilon : float = 0
            See 'type' documentation
        bootstrap_params : dict = {"student" : "mad", "student_threshold" : -np.inf} 
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        
        Returns
        -------
        flags : List[bool]
            One flag is raised for each group; at least one metric must be flagged
            for the group to receive a True flag.
        values : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """

The Metric class

A Metric object should be instantiated for each performance metric the auditor may be interested in querying. Its methods are never called directly by the user; only its constructor needs to be used.

class Metric:
    def __init__(
        self, 
        name : str, 
        evaluation_function : Callable[[np.ndarray, np.ndarray], np.ndarray] = None,
        threshold_function : Callable[[np.ndarray, np.ndarray], float] = None,
        metric_params : dict = {}
    ) -> None:
        """
        Constructs the Metric object used by the Auditor object to recompute performance
        over bootstrap samples.

        Parameters
        ----------
        name : str
        evaluation_function : Callable[[np.ndarray, np.ndarray], np.ndarray] = None
            Function applied to model predictions (Z) and true labels (Y) that returns
            an array of metric values, e.g. the evaluation_function for
            mean squared error is lambda z, y: (z - y)**2
        threshold_function : Callable[[np.ndarray, np.ndarray], float]
            Function applied to model predictions (Z) and true labels (Y) that returns
            a single threshold for comparison, e.g. when comparing to the population average,
            the threshold_function for MSE is lambda z, y: np.mean((z - y)**2)
        metric_params : dict = {}
            Additional parameters may be required for metrics that require separate error
            tracking depending on the predicted value or true label.

            For 'calibration'-type metrics, the key 'calibration_bins' should map to a list
            that determines how the predicted values (Z) should be digitized/binned

            For 'equalized odds'-type metrics, the key 'y_values' should map to a list
            so that the metric is calculated separately for each value of y in that list
        """

The groups module

We provide two methods in the groups module for constructing collections of groups that can be audited.

def get_intersections(
    X : np.ndarray, 
    discretization : dict = {},
    depth : int = None
) -> np.ndarray:
    """
    Construct groups formed by intersections of other attributes.
    
    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X
        Values specify input to the "bins" argument of np.digitize(...)
    depth : int = None
        If None, we consider all intersections; otherwise,
        we only consider intersections up to the specified depth.
    Returns
    ---------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """

def get_rectangles(X : np.ndarray, discretization : dict = {}) -> np.ndarray:
    """
    Construct rectangles formed by attributes.

    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X
        Values specify input to the "bins" argument of np.digitize(...)

    Returns
    ---------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """

Citing

If you use this code in a research project, please cite the following paper.

@article{cherian2023statistical,
    title={Statistical inference for fairness auditing},
    author = {John Cherian and Emmanuel Cand\`es},
    publisher = {arXiv},
    year = {2023},
    note = {arXiv:2305.03712 [stat.ME]},
    url = {https://arxiv.org/abs/2305.03712},
}
