Fairness-Agnostic Data Optimization

FairDo is a Python package for mitigating bias in data. Its approaches are fairness-agnostic: they can optimize a variety of fairness criteria that quantify discrimination within a dataset, and they produce bias-reduced datasets. The framework handles non-binary protected attributes such as nationality, race, and gender, which arise naturally in many applications. Because any of the available fairness metrics can be chosen, it is possible to optimize for the least fortunate group (Rawls' A Theory of Justice [2]) or for the general utility of all groups (utilitarianism).
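
To make the difference between a worst-case and an aggregate criterion concrete, here is a minimal sketch. It assumes statistical parity as the underlying notion (the positive-label rate of each group) and plain Python lists as input. All function names are hypothetical and are not part of FairDo's API. Reading the maximum as the Rawlsian view and the mean as the utilitarian view is a loose interpretation, not a statement about how FairDo defines its metrics.

```python
from collections import defaultdict
from itertools import combinations


def positive_rates(labels, groups):
    """Return {group: share of positive labels} for binary 0/1 labels."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y, g in zip(labels, groups):
        counts[g][0] += y
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def pairwise_gaps(labels, groups):
    """Absolute differences in positive rate for every pair of groups."""
    rates = list(positive_rates(labels, groups).values())
    return [abs(a - b) for a, b in combinations(rates, 2)]


def max_gap(labels, groups):
    """Worst-case view: the largest gap between any two groups."""
    return max(pairwise_gaps(labels, groups), default=0.0)


def mean_gap(labels, groups):
    """Aggregate view: the average gap over all pairs of groups."""
    gaps = pairwise_gaps(labels, groups)
    return sum(gaps) / len(gaps) if gaps else 0.0


# Toy data with a non-binary protected attribute (three groups).
labels = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "b", "b", "b", "c", "c"]

print(max_gap(labels, groups))   # driven only by the most unequal pair
print(mean_gap(labels, groups))  # averages over all pairs
```

The maximum reacts only to the most disadvantaged pairing, while the mean can improve even if the worst pair stays unchanged.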

Installation

Dependencies

Python (>=3.8, <4), numpy, pandas, scikit-learn, sdv

Setup Python Environment

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
# On Windows:
.venv\Scripts\activate

# On macOS and Linux:
source .venv/bin/activate

PyPI Distribution

The package is distributed via PyPI and can be installed with:

pip install fairdo
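
To confirm that the installation succeeded, the installed version can be queried from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if fairdo is installed in the active
# environment, otherwise None.
print(installed_version("fairdo"))
```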

Manual Installation

python setup.py install

Development Installation

pip install -e .

Example Usage

Genetic Algorithms

The following example uses the COMPAS [1] dataset. The protected attribute is race and the label is recidivism. A genetic algorithm removes discriminatory samples from the merged original and synthetic dataset:

# Standard library
from functools import partial

# Related third-party imports
# Note: sdv.tabular is the pre-1.0 sdv API
from sdv.tabular import GaussianCopula
import pandas as pd

# fairdo package
from fairdo.utils.dataset import load_data
from fairdo.preprocessing import HeuristicWrapper
from fairdo.optimize.geneticalgorithm import genetic_algorithm
from fairdo.metrics import statistical_parity_abs_diff_max

# Load a sample dataset, already encoded for use with FairDo
# data is a pandas DataFrame
data, label, protected_attributes = load_data('compas')

# Create synthetic data
gc = GaussianCopula()
gc.fit(data)
data_syn = gc.sample(data.shape[0])

# Merge/concat original and synthetic data
data = pd.concat([data, data_syn.copy()], axis=0)

# Initial settings for the Genetic Algorithm
ga = partial(genetic_algorithm,
             pop_size=100,
             num_generations=100)
             
# Optimization step
preprocessor = HeuristicWrapper(heuristic=ga,
                                protected_attribute=protected_attributes[0],
                                label=label,
                                disc=statistical_parity_abs_diff_max)
data_fair = preprocessor.fit_transform(dataset=data,
                                       approach='remove')                                
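
The `approach='remove'` setting above can be pictured as choosing which rows to keep. The following is a minimal sketch of that idea under stated assumptions: each candidate solution is a binary mask over rows, and its fitness is the discrimination measured on the kept rows. The toy genetic algorithm, the penalty for dropping a whole group, and all names here are illustrative and hypothetical. They do not describe FairDo's internal implementation.

```python
import random
from collections import defaultdict


def max_gap(labels, groups):
    """Largest absolute difference in positive-label rate between groups."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y, g in zip(labels, groups):
        counts[g][0] += y
        counts[g][1] += 1
    rates = [pos / total for pos, total in counts.values()]
    return max(rates) - min(rates) if rates else 0.0


def fitness(mask, labels, groups):
    """Discrimination of the rows kept by the mask (lower is better)."""
    kept = [(y, g) for keep, y, g in zip(mask, labels, groups) if keep]
    # Assumption of this sketch: masks that erase a whole group get the
    # worst possible score, so every group stays represented.
    if {g for _, g in kept} != set(groups):
        return 1.0
    return max_gap([y for y, _ in kept], [g for _, g in kept])


def toy_genetic_algorithm(labels, groups, pop_size=20, num_generations=30, seed=0):
    """Search for a row mask with low discrimination (needs pop_size >= 4)."""
    rng = random.Random(seed)
    n = len(labels)  # needs n >= 2 for the crossover below
    # Start from the "keep everything" mask plus random masks.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(num_generations):
        population.sort(key=lambda m: fitness(m, labels, groups))
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)  # single bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return min(population, key=lambda m: fitness(m, labels, groups))


# Toy data: three groups with different positive-label rates.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
groups = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

full = [1] * len(labels)
best = toy_genetic_algorithm(labels, groups)
print("before:", fitness(full, labels, groups))
print("after: ", fitness(best, labels, groups))
```

Because the best masks are carried over unchanged into each new generation, the result is never worse than keeping all rows. This sketch only minimizes discrimination and ignores how many rows are dropped.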

Documentation

The package follows the PEP 8 style guide and is documented with NumPy-style docstrings. To view the HTML pages of the documentation, follow these instructions:

Activate the virtual environment and install Sphinx:

# Activate the virtual environment
# On Windows:
.venv\Scripts\activate

# On macOS and Linux:
source .venv/bin/activate

# Install Sphinx and a required theme
pip install sphinx furo

Run the documentation generation script:

# Move to the docs directory (from the repository root)
cd docs

# Run script to generate documentation
bash generate_docs.sh

The HTML pages are then located in docs/_build/html. Open docs/_build/html/index.html to view the front page.

Citation

When using FairDo in your work, cite our paper:

@inproceedings{duong2023framework,
  title={Towards Fairness and Privacy: A Novel Data Pre-processing Optimization Framework for Non-binary Protected Attributes},
  author={Duong, Manh Khoi and Conrad, Stefan},
  booktitle={The 21st Australasian Data Mining Conference 2023},
  year={2023},
  organization={Springer Nature}
}

References

[1] Larson, J., Angwin, J., Mattu, S., Kirchner, L.: Machine bias (May 2016), https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

[2] Rawls, J.: A Theory of Justice (1971), Belknap Press, ISBN: 978-0-674-00078-0
