A GAN-based approach for fairness-aware synthetic data generation
TabFairGAN
This repository contains the code for the paper TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks. TabFairGAN is a synthetic tabular data generator that can produce synthetic data with or without a fairness constraint. The model uses a Wasserstein Generative Adversarial Network to produce high-quality synthetic data.
Usage
TabFairGAN is used programmatically in Python. You can generate synthetic data either with or without fairness constraints. The package now provides a more modular interface.
Basic Usage
- Without Fairness Constraints: If you do not need fairness constraints, simply omit the fairness_config parameter.
- With Fairness Constraints: To enforce fairness constraints, you must pass a dictionary with specific parameters as explained below.
Example 1: Without Fairness Constraints
import pandas as pd
from tabfairgan import TFG
# Load your dataset
df = pd.read_csv("adult.csv")
# Initialize TabFairGAN without fairness constraints
tfg = TFG(df, epochs=200, batch_size=256, device='cuda:0')
# Train the model
tfg.train()
# Generate synthetic data
fake_df = tfg.generate_fake_df(num_rows=32561)
In this case, the model will focus solely on generating high-quality synthetic data without considering fairness.
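The example above hard-codes device='cuda:0', which presumes a GPU is present. As a minimal sketch, assuming the package runs on PyTorch (as the device string suggests) and that the device argument also accepts "cpu" (the README does not state this), a fallback could look like the following. The helper name pick_device is hypothetical and not part of tabfairgan.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper, not part of tabfairgan.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU to query, so fall back to CPU.
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


device = pick_device()
print(device)
```

The result could then be passed as TFG(df, epochs=200, batch_size=256, device=device), under the assumption stated above.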
Example 2: With Fairness Constraints
To generate fair synthetic data, you need to pass a dictionary containing the following parameters:
- fair_epochs: Number of fair training epochs (integer).
- lamda: Lambda parameter controlling the trade-off between fairness and accuracy (float).
- S: Protected attribute (string, e.g., "sex").
- Y: Decision label (string, e.g., "income").
- S_under: Value representing the underprivileged group for the protected attribute (string, e.g., " Female").
- Y_desire: Desired value for the label (string, e.g., " >50K").
import pandas as pd
from tabfairgan import TFG
# Load your dataset
df = pd.read_csv("adult/adult.csv")
# Define fairness configuration
fairness_config = {
    'fair_epochs': 50,
    'lamda': 0.5,
    'S': 'sex',
    'Y': 'income',
    'S_under': ' Female',
    'Y_desire': ' >50K'
}
# Initialize TabFairGAN with fairness constraints
tfg = TFG(df, epochs=200, batch_size=256, device='cuda:0', fairness_config=fairness_config)
# Train the model
tfg.train()
# Generate synthetic data
fake_df = tfg.generate_fake_df(num_rows=32561)
In this case, the model will generate synthetic data that not only preserves high quality but also enforces fairness with respect to the specified protected attribute and decision label.
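One way to check whether the generated data is fairer than the original is to compare how often each group receives the desired label. The sketch below computes a demographic parity gap, which is one common measure; the README does not say which metric the package optimizes, so treat this as an assumption. The function name demographic_parity_gap and the toy data are hypothetical.

```python
def demographic_parity_gap(s_values, y_values, s_under, y_desire):
    """Return P(Y = y_desire | S != s_under) - P(Y = y_desire | S == s_under).

    Hypothetical helper, not part of tabfairgan. Accepts any two
    equally long iterables, such as lists or pandas Series.
    """
    under_total = under_hits = other_total = other_hits = 0
    for s, y in zip(s_values, y_values):
        if s == s_under:
            under_total += 1
            under_hits += (y == y_desire)
        else:
            other_total += 1
            other_hits += (y == y_desire)
    if under_total == 0 or other_total == 0:
        raise ValueError("both groups must contain at least one row")
    return other_hits / other_total - under_hits / under_total


# Toy data: 2 of 4 males and 1 of 4 females have the desired label.
sex = [" Male"] * 4 + [" Female"] * 4
income = [" >50K", " >50K", " <=50K", " <=50K",
          " >50K", " <=50K", " <=50K", " <=50K"]
toy_gap = demographic_parity_gap(sex, income, " Female", " >50K")
print(toy_gap)  # 0.25
```

With the Adult example, the same call would be demographic_parity_gap(fake_df['sex'], fake_df['income'], ' Female', ' >50K'), run once on df and once on fake_df. A gap closer to zero on fake_df would indicate that the two groups receive the desired label at more similar rates.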
Important Notes:
- Fairness Configuration: If you want to use fairness constraints, you must provide a dictionary containing all the required fairness parameters: fair_epochs, lamda, S, Y, S_under, and Y_desire.
- Without Fairness: If no fairness_config is provided, the model will default to generating synthetic data without fairness constraints.
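Since all six keys are required, it can help to check the dictionary before starting a long training run. The following is a minimal sketch of such a check; check_fairness_config and REQUIRED_KEYS are hypothetical names, not part of tabfairgan, and how the package itself reacts to a missing key is not described here.

```python
# The six keys listed in this README as required for fairness_config.
REQUIRED_KEYS = ("fair_epochs", "lamda", "S", "Y", "S_under", "Y_desire")


def check_fairness_config(config, columns=None):
    """Raise ValueError if keys are missing or S/Y are not dataset columns.

    Hypothetical helper, not part of tabfairgan. `columns` is optional
    and can be any iterable of column names, such as df.columns.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"fairness_config is missing keys: {missing}")
    if columns is not None:
        known = list(columns)
        unknown = [config[key] for key in ("S", "Y") if config[key] not in known]
        if unknown:
            raise ValueError(f"columns not found in the dataset: {unknown}")
    return config


example_config = {
    "fair_epochs": 50,
    "lamda": 0.5,
    "S": "sex",
    "Y": "income",
    "S_under": " Female",
    "Y_desire": " >50K",
}
checked = check_fairness_config(example_config, columns=["age", "sex", "income"])
```

In practice the call would be check_fairness_config(fairness_config, df.columns) before constructing TFG. Note that the example values ' Female' and ' >50K' carry a leading space, so the values in the dictionary need to match the dataset's strings exactly.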
Citing TabFairGAN
If you use TabFairGAN, please cite the paper:
@article{rajabi2022tabfairgan,
title={Tabfairgan: Fair tabular data generation with generative adversarial networks},
author={Rajabi, Amirarsalan and Garibay, Ozlem Ozmen},
journal={Machine Learning and Knowledge Extraction},
volume={4},
number={2},
pages={488--501},
year={2022},
publisher={MDPI}
}