Copyright (c) 2018. The University of Chicago (''Chicago''). All Rights Reserved.

Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the ''Program'') for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.

Created by Data Science and Public Policy, University of Chicago

The Program is copyrighted by Chicago. The Program is supplied ''as is'', without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.

IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

###################################
The Bias and Fairness Audit Toolkit
###################################

.. figure:: src/aequitas_webapp/static/images/aequitas_header.png
   :scale: 50 %



Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.


.. figure:: src/aequitas_webapp/static/images/use_aequitas.png
   :scale: 50 %



`Visit the Aequitas project website <http://dsapp.uchicago.edu/aequitas/>`_.


`Try out the Aequitas web application <http://aequitas.dssg.io/>`_.


Documentation
=============

Find documentation `here <https://dssg.github.io/aequitas/>`_.

For usage examples of the Python library, see our `demo notebook <https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb>`_, which applies Aequitas to the ProPublica COMPAS Recidivism Risk Assessment dataset.


Installation
============

Aequitas is compatible with: **Python 3.6+**

Install Aequitas using pip::

   pip install aequitas


Or install current Aequitas master from source::

   python setup.py install

...or named as an installation requirement, *e.g.* via ``pip``::

   python -m pip install git+https://github.com/dssg/aequitas.git


You may then import the ``aequitas`` module from Python:

.. code-block:: python

   import aequitas

...or execute the auditor from the command line::

   aequitas-report

...or launch the Web front-end from the command line::

   python -m serve


Containerization
================

To build a Docker container of Aequitas::

   docker build -t aequitas .

...or simply via ``manage``::

   manage container build

The Docker image's container defaults to launching the development Web server, though this can be overridden via the Docker "command" and/or "entrypoint".

To run such a container, supporting the Web server, on-the-fly::

   docker run -p 5000:5000 -e "HOST=0.0.0.0" aequitas

...or, manage a development container via ``manage``::

   manage container [create|start|stop]

To contact the team, please email us at [aequitas at uchicago dot edu]




30 Seconds to Aequitas
================================

**CLI**

With ``aequitas-report``, uncovering bias is as simple as running a single command on a CSV::

   aequitas-report --input compas_for_aequitas.csv


**Python API**

To get started, preprocess your input data. Input data has slightly different requirements depending on whether you are using Aequitas via the webapp, CLI or Python package. See `general input requirements <#input-data>`_ and specific requirements for the `web app <#input-data-for-webapp>`_, `CLI <#input-data-for-cli>`_, and `Python API <#input-data-for-python-api>`_ in the section immediately below.

.. code-block:: python

   from aequitas.preprocessing import preprocess_input_df

   # input_data is a pandas DataFrame; 'categorical_column_name' is a placeholder.
   input_data['categorical_column_name'] = input_data['categorical_column_name'].astype(str)
   df, _ = preprocess_input_df(input_data)

The Aequitas ``Group()`` class creates a crosstab of your preprocessed data, calculating absolute group metrics from the truth status of each score and label value (true/false positives and true/false negatives):

.. code-block:: python

   from aequitas.group import Group

   g = Group()
   xtab, _ = g.get_crosstabs(df)
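
As a rough illustration of what the crosstab contains, the sketch below tallies true/false positives and negatives per group in plain Python and derives four of the metrics named in this section. It is not Aequitas's implementation: the toy rows, the function names and the metric formulas as written here (``ppr`` as a group's share of all predicted positives, ``pprev`` as the share of a group that is predicted positive) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy rows: (score, label_value, group). Not real data.
rows = [
    (1, 1, "A"), (1, 0, "A"), (0, 0, "A"), (0, 1, "A"),
    (1, 1, "B"), (0, 0, "B"), (0, 0, "B"), (0, 1, "B"),
]


def group_counts(rows):
    """Tally confusion-matrix cells (tp, fp, tn, fn) per group value."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for score, label, group in rows:
        if score == 1:
            cell = "tp" if label == 1 else "fp"
        else:
            cell = "fn" if label == 1 else "tn"
        counts[group][cell] += 1
    return dict(counts)


def group_metrics(counts):
    """Derive a few absolute group metrics from the tallies."""
    total_pp = sum(c["tp"] + c["fp"] for c in counts.values())
    metrics = {}
    for group, c in counts.items():
        pp = c["tp"] + c["fp"]          # predicted positives in this group
        size = sum(c.values())          # all rows in this group
        negatives = c["fp"] + c["tn"]   # rows whose label is 0
        positives = c["fn"] + c["tp"]   # rows whose label is 1
        metrics[group] = {
            "ppr": pp / total_pp if total_pp else None,
            "pprev": pp / size if size else None,
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return metrics


counts = group_counts(rows)
metrics = group_metrics(counts)
```

With these toy rows, group ``A`` has a false positive rate of 0.5 and group ``B`` of 0.0, while both groups share a false negative rate of 0.5.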

The ``Plot()`` class can visualize a single group metric with ``plot_group_metric()``, or a list of bias metrics with ``plot_group_metric_all()``:

.. code-block:: python

   from aequitas.plotting import Plot

   aqp = Plot()
   selected_metrics = aqp.plot_group_metric_all(xtab, metrics=['ppr', 'pprev', 'fnr', 'fpr'], ncols=4)


.. figure:: docs/_static/selected_group_metrics.png
   :scale: 100%

The crosstab dataframe is augmented by every succeeding class with additional layers of information about biases, starting with bias disparities in the ``Bias()`` class. There are three ``get_disparity`` functions, one for each of the three ways to select a reference group. ``get_disparity_min_metric()`` and ``get_disparity_major_group()`` methods calculate a reference group automatically based on your data, while the user specifies reference groups for ``get_disparity_predefined_groups()``.

.. code-block:: python

   from aequitas.bias import Bias

   b = Bias()
   bdf = b.get_disparity_predefined_groups(xtab, original_df=df, ref_groups_dict={'race': 'Caucasian', 'sex': 'Male', 'age_cat': '25 - 45'}, alpha=0.05, mask_significance=True)

`Learn more about reference group selection. <https://dssg.github.io/aequitas/config.html>`_
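
A disparity can be read as the ratio between a group's metric and the same metric for the reference group, so the reference group itself always scores 1.0. The plain-Python sketch below applies that reading to the three reference-group strategies described above. It assumes "major group" means the largest group and "min metric" means the group with the lowest metric value; the numbers and function names are hypothetical and are not Aequitas output.

```python
# Hypothetical per-group values for one attribute; not real results.
fpr = {"A": 0.50, "B": 0.25, "C": 0.10}
group_size = {"A": 400, "B": 500, "C": 100}


def disparities(metric_by_group, ref_group):
    """Ratio of each group's metric to the reference group's metric.

    Assumes the reference group's metric is non-zero.
    """
    ref_value = metric_by_group[ref_group]
    return {g: v / ref_value for g, v in metric_by_group.items()}


# Three ways to pick the reference group (assumed interpretations):
predefined_ref = "A"                                  # chosen by the user
majority_ref = max(group_size, key=group_size.get)    # largest group
min_metric_ref = min(fpr, key=fpr.get)                # lowest metric value

by_predefined = disparities(fpr, predefined_ref)
by_majority = disparities(fpr, majority_ref)
by_min_metric = disparities(fpr, min_metric_ref)
```

The same metric values yield different disparities depending on the reference: group ``B`` scores 0.5 against ``A`` but 2.5 against ``C``, which is why the choice of reference group matters when interpreting results.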


The ``Plot()`` class visualizes disparities as treemaps colored by their relationship to a given `fairness threshold <https://dssg.github.io/aequitas/config.html>`_, either one at a time with ``plot_disparity()`` or several at once with ``plot_disparity_all()``:

.. code-block:: python

   j = aqp.plot_disparity_all(bdf, metrics=['ppr_disparity', 'pprev_disparity', 'fnr_disparity', 'fpr_disparity', 'precision_disparity', 'fdr_disparity'], attributes=['race'], significance_alpha=0.05)

.. figure:: docs/_static/selected_treemaps.png
   :scale: 100%


Now you're ready to obtain metric parities with the ``Fairness()`` class:

.. code-block:: python

   from aequitas.fairness import Fairness

   f = Fairness()
   fdf = f.get_group_value_fairness(bdf)

You now have parity determinations for your models that can be leveraged in model selection!
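
One common way to turn a disparity into a parity determination is to check whether it falls inside a band around 1.0 set by the fairness threshold. The sketch below assumes a threshold ``tau`` of 0.8, so a disparity counts as fair when it lies between 0.8 and 1.25. The function name, the default value and the toy disparities are assumptions for illustration, not the ``Fairness()`` implementation.

```python
def is_fair(disparity, tau=0.8):
    """True when the disparity lies within [tau, 1 / tau]."""
    return tau <= disparity <= 1 / tau


# Hypothetical disparities relative to a reference group (which scores 1.0).
fpr_disparity = {"A": 1.0, "B": 0.5, "C": 1.2}
fpr_parity = {group: is_fair(d) for group, d in fpr_disparity.items()}
```

Here groups ``A`` and ``C`` pass the check and group ``B`` fails it; tightening ``tau`` towards 1.0 narrows the band and flags more groups.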

To visualize fairness, use ``Plot()`` class fairness methods.

To visualize parity determinations for all absolute group bias metrics, pass ``metrics='all'``:

.. code-block:: python

   fg = aqp.plot_fairness_group_all(fdf, ncols=5, metrics='all')


.. figure:: docs/_static/all_fairness_group.png
   :scale: 100%


To visualize parity treemaps for multiple disparities, pass metrics of interest as a list:

.. code-block:: python

   f_maps = aqp.plot_fairness_disparity_all(fdf, metrics=['pprev_disparity', 'ppr_disparity'])

.. figure:: docs/_static/fairness_selected_disparities_race.png
   :scale: 100%



Input Data
==========
In general, input data is a single table with the following columns:

- ``score``
- ``label_value`` (for error-based metrics only)
- at least one attribute e.g. ``race``, ``sex`` and ``age_cat`` (attribute categories defined by user)

=====  ===========  ================  ====  ===  ======
score  label_value  race              sex   age  income
=====  ===========  ================  ====  ===  ======
0      1            African-American  Male  25   18000
1      1            Caucasian         Male  37   34000
=====  ===========  ================  ====  ===  ======
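
Before handing a file to any of the interfaces, it can help to check that it has this shape. The sketch below does so with the standard ``csv`` module. It is deliberately stricter than the general requirements above: it treats ``label_value`` as required and both ``score`` and ``label_value`` as binary, as the webapp expects. ``check_input`` and the inline CSV text are hypothetical and not part of Aequitas.

```python
import csv
import io

REQUIRED = {"score", "label_value"}

# Hypothetical CSV text mirroring the table above.
csv_text = """score,label_value,race,sex,age,income
0,1,African-American,Male,25,18000
1,1,Caucasian,Male,37,34000
"""


def check_input(handle):
    """Return (rows, attribute_columns) or raise ValueError."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = REQUIRED - set(columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Every column that is not score/label_value is treated as an attribute.
    attributes = [c for c in columns if c not in REQUIRED]
    if not attributes:
        raise ValueError("need at least one attribute column")
    rows = list(reader)
    for row in rows:
        if row["score"] not in {"0", "1"} or row["label_value"] not in {"0", "1"}:
            raise ValueError(f"non-binary score or label_value: {row}")
    return rows, attributes


rows, attributes = check_input(io.StringIO(csv_text))
```

For the two-row table above this yields the attribute columns ``race``, ``sex``, ``age`` and ``income``; a file without a ``score`` column raises ``ValueError``.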

`Back to 30 Seconds to Aequitas <#30-seconds-to-aequitas>`_

Input data for Webapp
---------------------

The webapp requires a single CSV with columns for a binary ``score``, a binary ``label_value`` and an arbitrary number of attribute columns. Each row is associated with a single observation.

.. figure:: docs/_static/webapp_input.png
   :height: 240px
   :width: 320px


``score``
---------
The Aequitas webapp assumes the ``score`` column is a binary decision (0 or 1).


``label_value``
---------------
This is the ground truth value of a binary decision. As with ``score``, the data must be binary (0 or 1).


attributes (e.g. ``race``, ``sex``, ``age``, ``income``)
---------------------------------------------------------
Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group value. If continuous, Aequitas will first bin the data into quartiles and then create crosstabs with the newly defined categories.
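
To show what quartile binning does to a continuous attribute, the sketch below assigns each value to one of four bins using the standard library. The bin labels, the helper name, the ``inclusive`` quantile method and the toy incomes are assumptions for the example; the exact bin edges and labels Aequitas produces may differ.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_labels(values):
    """Assign each value to one of four quartile bins (Q1 lowest, Q4 highest)."""
    cuts = quantiles(values, n=4, method="inclusive")  # three cut points
    labels = [f"Q{bisect_left(cuts, v) + 1}" for v in values]
    return labels, cuts


# Hypothetical incomes; not real data.
incomes = [18000, 34000, 22000, 51000, 47000, 29000, 75000, 40000]
labels, cuts = quartile_labels(incomes)
```

Each of the eight toy incomes lands in a bin of two values, and the resulting string labels can then be treated like any other categorical attribute.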

`Back to 30 Seconds to Aequitas <#30-seconds-to-aequitas>`_


Input data for CLI
------------------

The CLI accepts CSV files and accommodates database calls defined in configuration files.

.. figure:: docs/_static/CLI_input.png
   :height: 240px
   :width: 320px


``score``
---------
By default, the Aequitas CLI assumes the ``score`` column is a binary decision (0 or 1). Alternatively, the ``score`` column can contain a continuous score (e.g. the output from a logistic regression applied to the data). In this case, the user sets a threshold to determine the binary decision. `See configurations <https://dssg.github.io/aequitas/config.html>`_ for more on thresholds.


``label_value``
---------------
As with the webapp, this is the ground truth value of a binary decision. The data must be binary 0 or 1.


attributes (e.g. ``race``, ``sex``, ``age``, ``income``)
---------------------------------------------------------
Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group value. If continuous, Aequitas will first bin the data into quartiles.

``model_id``
------------
``model_id`` is an identifier tied to the output of a specific model. With a ``model_id`` column you can test the bias of multiple models at once. This feature is available using the CLI or the Python package.
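
Conceptually, a ``model_id`` column means every metric is computed per model and per group rather than per group alone. The plain-Python sketch below illustrates that idea for the false positive rate; the rows and the function name are hypothetical, and this is not how Aequitas computes its crosstabs internally.

```python
from collections import defaultdict

# Hypothetical rows: (model_id, score, label_value, sex). Not real data.
rows = [
    (1, 1, 0, "F"), (1, 0, 0, "F"), (1, 1, 0, "M"), (1, 1, 0, "M"),
    (2, 0, 0, "F"), (2, 0, 0, "F"), (2, 1, 0, "M"), (2, 0, 0, "M"),
]


def fpr_by_model_and_group(rows):
    """False positive rate for every (model_id, group) pair."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for model_id, score, label, group in rows:
        if label == 0:                      # only true negatives and false positives count
            key = (model_id, group)
            negatives[key] += 1
            false_positives[key] += score   # score is 0 or 1
    return {key: false_positives[key] / n for key, n in negatives.items()}


fpr = fpr_by_model_and_group(rows)
```

In this toy data, model 2 has a lower false positive rate than model 1 for both groups, which is the kind of side-by-side comparison a ``model_id`` column makes possible.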


Reserved column names:
----------------------

* ``id``
* ``model_id``
* ``entity_id``
* ``rank_abs``
* ``rank_pct``


`Back to 30 Seconds to Aequitas <#30-seconds-to-aequitas>`_


Input data for Python API
-------------------------
Python input data can be handled identically to CLI by using ``preprocess_input_df()``. Otherwise, you must discretize continuous attribute columns prior to passing the data to ``Group().get_crosstabs()``.

.. code-block:: python

   from aequitas.preprocessing import preprocess_input_df

   # input_data is a pandas DataFrame that matches CLI input data norms.
   df, _ = preprocess_input_df(input_data)


.. figure:: docs/_static/python_input.png
   :height: 240px
   :width: 320px

``score``
---------
By default, Aequitas assumes the ``score`` column is a binary decision (0 or 1). If the ``score`` column contains a non-binary score (e.g. the output from a logistic regression applied to the data), the user sets a threshold to determine the binary decision. Thresholds are passed to ``get_crosstabs()`` as a dictionary of the form ``{'rank_abs': [300], 'rank_pct': [1.0, 5.0, 10.0]}``. `See configurations <https://dssg.github.io/aequitas/config.html>`_ for more on thresholds.
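
To make the threshold dictionary concrete, the sketch below converts continuous scores into binary decisions in plain Python. It assumes ``rank_abs`` means "flag the top *k* rows by score" and ``rank_pct`` means "flag the top *p* percent of rows". Those readings, the helper names and the toy scores are assumptions for illustration, and Aequitas may handle ties and rounding differently.

```python
import math


def top_k_decisions(scores, k):
    """Mark the k highest-scoring rows as 1 and the rest as 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def decisions_for(scores, thresholds):
    """Binary decisions for each rank_abs / rank_pct threshold."""
    out = {}
    for k in thresholds.get("rank_abs", []):
        out[("rank_abs", k)] = top_k_decisions(scores, k)
    for pct in thresholds.get("rank_pct", []):
        # Assumed: pct is a percentage of all rows, rounded up.
        k = math.ceil(len(scores) * pct / 100)
        out[("rank_pct", pct)] = top_k_decisions(scores, k)
    return out


# Hypothetical model scores; not real data.
scores = [0.91, 0.15, 0.67, 0.42, 0.88, 0.05, 0.73, 0.30, 0.59, 0.21]
result = decisions_for(scores, {"rank_abs": [3], "rank_pct": [20.0, 50.0]})
```

Each threshold produces its own list of decisions, so a single set of scores can be audited at several cut-offs.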

``label_value``
---------------
This is the ground truth value of a binary decision. The data must be binary (0 or 1).

attributes (e.g. ``race``, ``sex``, ``age``, ``income``)
---------------------------------------------------------
Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group value. If continuous, Aequitas will first bin the data into quartiles.

If you plan to bin or discretize continuous features manually, note that ``get_crosstabs()`` expects attribute columns to be of type 'string'. This excludes the ``pandas`` 'categorical' data type, which is the default output of certain ``pandas`` discretizing functions. You can recast 'categorical' columns to strings:

.. code-block:: python

   df['categorical_column_name'] = df['categorical_column_name'].astype(str)

``model_id``
------------
``model_id`` is an identifier tied to the output of a specific model. With a ``model_id`` column you can test the bias of multiple models at once. This feature is available using the CLI or the Python package.


Reserved column names:
----------------------
* ``id``
* ``model_id``
* ``entity_id``
* ``rank_abs``
* ``rank_pct``


`Back to 30 Seconds to Aequitas <#30-seconds-to-aequitas>`_



Development
===========

Provision your development environment via the shell script ``develop``::

   ./develop

Common development tasks, such as deploying the webapp, may then be handled via ``manage``::

   manage --help


Citing Aequitas
===============

If you use Aequitas in a scientific publication, we would appreciate citations to the following paper:

Pedro Saleiro, Benedict Kuester, Abby Stevens, Ari Anisfeld, Loren Hinkson, Jesse London, Rayid Ghani, Aequitas: A Bias and Fairness Audit Toolkit, arXiv preprint arXiv:1811.05577 (2018). (`PDF <https://arxiv.org/pdf/1811.05577.pdf>`_)

::

   @article{2018aequitas,
     title={Aequitas: A Bias and Fairness Audit Toolkit},
     author={Saleiro, Pedro and Kuester, Benedict and Stevens, Abby and Anisfeld, Ari and Hinkson, Loren and London, Jesse and Ghani, Rayid},
     journal={arXiv preprint arXiv:1811.05577},
     year={2018}}

|
|
|
|
|
|


© 2018 Center for Data Science and Public Policy - University of Chicago

Keywords: fairness bias aequitas
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
