A Python package that generates synthetic datasets with different types of bias
Bias on Demand
Biasondemand is a Python package that generates synthetic datasets with different types of bias. This package is based on the research paper "Bias on Demand: A Modelling Framework That Generates Synthetic Data With Bias" published at the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2023.
Authors & Contributors
Joachim Baumann, Alessandro Castelnovo, Riccardo Crupi, Nicole Inverardi, Daniele Regoli
Installation
To install biasondemand, run:
pip install biasondemand
Usage
Generating Synthetic Datasets
To generate a synthetic dataset with no bias, use the following Python script:
import biasondemand
biasondemand.generate_dataset(path='my_unbiased_dataset', dim=1000)
Alternatively, you can also run it directly from the command line using the following command:
bias_on_demand_generate_dataset -p my_unbiased_dataset -dim 1000
This will generate a dataset with 1000 rows and save it in the directory datasets/my_unbiased_dataset/.
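To check what was written, you can list the contents of the output directory. The sketch below is not part of biasondemand: list_dataset_files is a hypothetical helper, and it makes no assumption about the names or formats of the generated files. It only relies on the datasets/<name>/ location stated above, relative to the current working directory.

```python
from pathlib import Path


def list_dataset_files(name, root="datasets"):
    """Return the sorted relative paths of all files under <root>/<name>/.

    Hypothetical helper, not part of biasondemand. It only assumes the
    output location datasets/<name>/ described in this README.
    """
    dataset_dir = Path(root) / name
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"No dataset directory at {dataset_dir}")
    # Walk the directory recursively and keep regular files only.
    return sorted(
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.is_file()
    )
```

After generating the dataset, calling list_dataset_files("my_unbiased_dataset") from the same working directory returns the generated file names, which you can then load with the tool of your choice.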
You can introduce different types of bias into the dataset by passing additional arguments. For example, to generate a dataset with measurement bias on the label Y (magnitude: 1.5) and historical bias on the feature R (magnitude: 2), use the following Python script:
import biasondemand
biasondemand.generate_dataset(path='my_biased_dataset', dim=1000, l_m_y=1.5, l_h_r=2)
Or, again, if you prefer the command line, just use:
bias_on_demand_generate_dataset -p my_biased_dataset -dim 1000 -l_m_y 1.5 -l_h_r 2
This will generate a biased dataset with 1000 rows and save it in the directory datasets/my_biased_dataset/.
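The two examples above suggest that each Python keyword argument maps to a command line flag of the same name with a single leading dash. Here is a minimal sketch of that mapping, handy if you want to script the command line tool from Python. build_cli_command is a hypothetical helper, not part of the package, and the "-name value" pattern is an assumption taken from the flags shown above (-p, -dim, -l_m_y, -l_h_r). Boolean options such as l_r and l_o may be parsed differently, so check them before relying on this.

```python
import shlex


def build_cli_command(path, **params):
    """Build the argument list for bias_on_demand_generate_dataset.

    Hypothetical helper. Assumes every option follows the "-name value"
    pattern seen in the README examples.
    """
    cmd = ["bias_on_demand_generate_dataset", "-p", str(path)]
    for name, value in params.items():
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = build_cli_command("my_biased_dataset", dim=1000, l_m_y=1.5, l_h_r=2)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: bias_on_demand_generate_dataset -p my_biased_dataset -dim 1000 -l_m_y 1.5 -l_h_r 2
```

To actually run the command, pass the list to subprocess.run(cmd, check=True).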
The following command line arguments are available to specify properties of the dataset:
- dim: Dimension of the dataset
- sy: Standard deviation of the noise of Y
- l_q: Lambda coefficient for importance of Q for Y
- l_r_q: Lambda coefficient that quantifies the influence from R to Q
- thr_supp: Correlation threshold for discarding features that are too strongly correlated with s
Furthermore, the following command line arguments are available to specify the types of biases to be introduced in the dataset:
- l_y: Lambda coefficient for historical bias on the target y
- l_m_y: Lambda coefficient for measurement bias on the target y
- l_h_r: Lambda coefficient for historical bias on R
- l_h_q: Lambda coefficient for historical bias on Q
- l_m: Lambda coefficient for measurement bias on the feature R. If l_m != 0, P substitutes R.
- p_u: Percentage of undersampling for instances with A=1
- l_r: Boolean for inducing representation bias, that is, undersampling conditional on a variable, e.g. R
- l_o: Boolean variable for excluding an important variable (omitted variable bias), e.g. R (or its proxy)
- l_y_b: Lambda coefficient for interaction proxy bias, i.e., historical bias on the label y with lower values of y for individuals in group A=1 with high values for the feature R
Note that the biases are introduced w.r.t. individuals in the group A=1. For most types of bias, larger values mean more bias. The only exceptions are undersampling and representation bias (which can be seen as a specific type of undersampling conditional on the feature R), where smaller values correspond to more (conditional) undersampling, i.e., more bias.
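Since most lambda coefficients scale the amount of bias, a natural way to use them is to sweep over several magnitudes and generate one dataset per combination. The sketch below only builds the grid of keyword arguments. bias_grid and the sweep_<i> directory names are hypothetical, and treating 0 as "no bias" is an assumption suggested by the unbiased example, which passes no bias arguments.

```python
from itertools import product


def bias_grid(**levels):
    """Yield one dict of keyword arguments per combination of bias levels.

    Hypothetical helper, not part of biasondemand.
    """
    names = list(levels)
    for values in product(*(levels[name] for name in names)):
        yield dict(zip(names, values))


# Two magnitudes each for measurement bias on Y and historical bias on R.
configs = list(bias_grid(l_m_y=[0, 1.5], l_h_r=[0, 2]))
for i, cfg in enumerate(configs):
    print(f"sweep_{i}", cfg)
    # With biasondemand installed, each configuration could be generated with:
    # biasondemand.generate_dataset(path=f"sweep_{i}", dim=1000, **cfg)
```

Keeping the grid separate from the generation call makes it easy to log which configuration produced which dataset directory.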
Run experiments using biasondemand
In the repo https://github.com/rcrupiISP/BiasOnDemand we provide the code and instructions to run a set of experiments for investigating bias, fairness, and mitigation techniques. You can also check out our paper for more details on this topic.
Python version
Biasondemand requires Python 3.7 or later.
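If you call biasondemand from your own scripts, you may want to fail early on unsupported interpreters. A minimal sketch using only the standard library follows. python_is_supported and MIN_PYTHON are hypothetical names, and the 3.7 minimum is taken from the requirement stated above.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    info = sys.version_info if version_info is None else version_info
    # Compare only the (major, minor) pair.
    return tuple(info[:2]) >= MIN_PYTHON


if not python_is_supported():
    raise RuntimeError("biasondemand requires Python 3.7 or later")
```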
Citation
If you use biasondemand in your research, please cite our paper:
@inproceedings{baumann2023bias,
title={Bias on Demand: A Modelling Framework That Generates Synthetic Data With Bias},
author={Baumann, Joachim and Castelnovo, Alessandro and Crupi, Riccardo and Inverardi, Nicole and Regoli, Daniele},
booktitle={Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency},
doi={10.1145/3593013.3594058},
year={2023}
}