
This package enables fairness auditing with statistical guarantees.

Project description

fairaudit

fairaudit is a Python package for fairness auditing with statistical guarantees.

Given a hold-out set $\{(x_i, y_i)\}_{i=1}^n$, a model $f(\cdot)$, and a collection of groups $\mathcal{G}$, fairaudit issues simultaneously valid guarantees on group-wise performance and discovers groups with disparate performance for an arbitrary metric, $\ell(f(x), y)$.

Formally, let

$$\epsilon(G) := \mathbb{E}_P[\ell(f(X), Y) \mid (X, Y) \in G] - \theta_P.$$

Then, the certification task corresponds to issuing a simultaneously valid confidence set (can be an upper bound, lower bound, or interval) for $\epsilon(G)$, while the flagging task corresponds to discovering $G$ for which $\epsilon(G)$ fails to meet some tolerance threshold. The latter discoveries satisfy an asymptotic FDR guarantee.
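
For concreteness, one possible instantiation: taking $\ell(f(x), y) = (f(x) - y)^2$ and $\theta_P = \mathbb{E}_P[(f(X) - Y)^2]$, $\epsilon(G)$ is the excess mean squared error of group $G$ relative to the population average. A certificate then bounds this excess, while the flagging task reports the groups whose excess exceeds a chosen tolerance.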

Installation

fairaudit can be installed (locally, for now) with pip.

To install, navigate to the root directory of this repository and run

$ pip install . 

Examples

The easiest way to start using fairaudit may be to go through the example notebook included with the repository. A minimal sketch of the basic workflow is also shown below.
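
The sketch below assumes the API documented under Usage; the import paths (from fairaudit import Auditor, Metric and from fairaudit.groups import get_intersections), the synthetic data, and the parameter choices are illustrative assumptions rather than canonical usage.

import numpy as np
from fairaudit import Auditor, Metric            # assumed import paths
from fairaudit.groups import get_intersections   # assumed import path

# Hold-out features X, true labels y, and model predictions z (e.g. z = f(X)).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
beta = np.array([1.0, -0.5, 0.25])
y = X @ beta + rng.normal(size=1000)
z = X @ beta                                      # stand-in model predictions

# Mean squared error, compared against the population-average MSE.
mse = Metric(
    name="mse",
    evaluation_function=lambda z, y: (z - y) ** 2,
    threshold_function=lambda z, y: np.mean((z - y) ** 2),
)

# Groups formed by intersecting discretized features (see the groups module below).
groups = get_intersections(X, discretization={0: [0.0], 1: [0.0]}, depth=2)

# Calibrate simultaneous confidence intervals for epsilon(G) at level alpha = 0.05.
auditor = Auditor(X, y, z, mse)
auditor.calibrate_groups(alpha=0.05, type="interval", groups=groups)

# Query the certificate for the first group in the collection.
certificate, value, threshold = auditor.query_group(0)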

Usage

The Auditor Class

The Auditor class has the following API:

class Auditor:
    def __init__(
        self,
        x: np.ndarray,
        y: np.ndarray,
        z: np.ndarray,
        metric: fairaudit.Metric): ...

    def calibrate_groups(
        self, 
        alpha : float,
        type : str,
        groups : Union[np.ndarray, str],
        epsilon : float = None,
        bootstrap_params : dict = {}
    ) -> None:
        """
        Obtain bootstrap critical values for a specific group collection.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
            See epsilon documentation for what these correspond to.
        groups : Union[np.ndarray, str]
            Either a string for a supported collection of groups or a numpy array
            likely obtained by calling `get_intersections` or `get_rectangles` 
            from group.py
            Array dimensions should be (n_points, n_groups)
        epsilon : float = None
            epsilon = None calibrates for issuing confidence bounds. 
                type = "upper" issues lower confidence bounds, 
                type = "lower" issues upper confidence bounds, 
                type = "interval" issues confidence intervals.
            If a non-null value is passed in, we issue a Boolean certificate. 
                type = "upper" tests the null that epsilon(G) >= epsilon
                type = "lower" tests the null that epsilon(G) <= epsilon
                type = "interval" tests the null that |epsilon(G)| >= epsilon
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """

    def query_group(
        self, group : Union[np.ndarray, int]
    ) -> Tuple[List[Union[float, bool]], List[float], List[float]]:
        """
        Query calibrated auditor for certificate for a particular group
        
        Parameters
        ----------
        group : Union[np.ndarray, int]
            Will accept index into groups originally passed in or Boolean 
            array if calibrated collection was infinite

        Returns
        -------
        certificate : List[Union[float, bool]]
            Boolean certificates or confidence bounds for each metric audited
        value : List[float]
            Empirical value of epsilon(G) for each metric audited
        threshold : List[float]
            Estimate of theta for each metric audited
        """

    def calibrate_rkhs(
        self,
        alpha : float,
        type : str,
        kernel : str,
        kernel_params : dict = {},
        bootstrap_params : dict = {}
    ) -> None:
        """
        Obtain bootstrap critical value for a specified RKHS.

        Parameters
        ----------
        alpha : float
            Type I error threshold
        type : str
            Takes one of three values ('lower', 'upper', 'interval').
                type = "upper" issues lower confidence bounds, 
                type = "lower" issues upper confidence bounds, 
                type = "interval" issues confidence intervals.
        kernel : str
            Name of scikit-learn kernel the user would like to use. 
            Suggested kernels: 'rbf', 'laplacian', 'sigmoid'
        kernel_params : dict = {}
            Additional parameters required to specify the kernel, 
            e.g. {'gamma': 1} for RBF kernel
        bootstrap_params : dict = {}
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        """

    def query_rkhs(self, weights : np.ndarray) -> Tuple[List[float], List[float]]:
        """
        Query calibrated auditor for certificate for a particular RKHS
        function.
        
        Parameters
        ----------
        weights : np.ndarray
            RKHS function weights, i.e. f(x_i) = (Kw)_i
        Returns
        -------
        certificate : List[float]
            Confidence bounds for each metric queried.
        value : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """

    def flag_groups(
        self,
        groups : np.ndarray,
        type : str,
        alpha : float,
        epsilon : float = 0,
        bootstrap_params : dict = {"student" : "mad", "student_threshold" : -np.inf}  
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns flags and estimates of epsilon(G) for each group in some finite 
        collection. 

        Parameters
        ----------
        groups : np.ndarray
            Boolean numpy array of dimension (n_points, n_groups)
        type : str
            Takes values ('upper', 'lower', 'interval')
            'upper' tests the null, epsilon(G) >= epsilon
            'lower' tests the null, epsilon(G) <= epsilon
            'interval' tests the null, |epsilon(G)| <= epsilon
        alpha : float
            FDR level
        epsilon : float = 0
            See 'type' documentation
        bootstrap_params : dict = {"student" : "mad", "student_threshold" : -np.inf} 
            Allows the user to specify a random seed, number of bootstrap resamples,
            and studentization parameters for the bootstrap process.
        
        Returns
        -------
        flags : List[bool]
            One flag is raised for each group - at least one metric must be flagged
            for the group to receive a True flag.
        values : List[float]
            Empirical value of epsilon(G) for each metric queried.
        """

The Metric class

A Metric object should be instantiated for each performance metric the auditor will query. Its methods are never called directly by the user; only its constructor is part of the user-facing API.

class Metric:
    def __init__(
        self, 
        name : str, 
        evaluation_function : Callable[[np.ndarray, np.ndarray], np.ndarray] = None,
        threshold_function : Callable[[np.ndarray, np.ndarray], float] = None,
        metric_params : dict = {}
    ) -> None:
        """
        Constructs the Metric object used by the Auditor object to recompute performance
        over bootstrap samples.

        Parameters
        ----------
        name : str
        evaluation_function : Callable[[np.ndarray, np.ndarray], np.ndarray] = None
            Function applied to model predictions (Z) and true labels (Y) that returns
            an array of metric values, e.g. the evaluation_function for
            mean squared error is lambda z, y: (z - y)**2
        threshold_function : Callable[[np.ndarray, np.ndarray], float] = None
            Function applied to model predictions (Z) and true labels (Y) that returns
            a single threshold for comparison, e.g. when comparing to the population average,
            the threshold_function for MSE is lambda z, y: np.mean((z - y)**2)
        metric_params : dict = {}
            Additional parameters may be required for metrics that require separate error
            tracking depending on the predicted value or true label.

            For 'calibration'-type metrics, the key 'calibration_bins' should map to a list
            that determines how the predicted values (Z) should be digitized/binned

            For 'equalized odds'-type metrics, the key 'y_values' should map to a list
            so that the metric is calculated separately for each value of y in that list
        """

The groups module

We provide two methods in the groups module for constructing collections of groups that can be audited.

def get_intersections(
    X : np.ndarray, 
    discretization : dict = {},
    depth : int = None
) -> np.ndarray:
    """
    Construct groups formed by intersections of other attributes.
    
    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X
        Values specify input to the "bins" argument of np.digitize(...)
    depth : int = None
        If None, we consider all intersections; otherwise,
        we only consider intersections up to the specified depth.
    Returns
    ---------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """

def get_rectangles(X : np.ndarray, discretization : dict = {}) -> np.ndarray:
    """
    Construct rectangles formed by attributes.

    Parameters
    ----------
    X : np.ndarray
    discretization : dict = {}
        Keys index columns of X
        Values specify input to the "bins" argument of np.digitize(...)

    Returns
    ---------
    groups : np.ndarray
        Boolean numpy array of size (n_points, n_groups)
    """

Citing

If you use this code in a research project, please cite the following paper.

@article{cherian2023statistical,
    title={Statistical inference for fairness auditing},
    author={John Cherian and Emmanuel Cand\`es},
    publisher = {arXiv},
    year = {2023},
    note = {arXiv:2305.03712 [stat.ME]},
    url = {https://arxiv.org/abs/2305.03712},
}

