A differentially private data synthesizer and fairness intervention benchmark framework
DP+Fair Benchmarking Framework
This repository provides a Python framework for benchmarking fairness mechanisms on differentially private (DP) synthetic data.
Features
- ⚡ Simple, reproducible setup for benchmarking algorithms
- 🧩 Flexible API to plug in any classifier implementing fit, predict, and predict_proba
- 📊 Ready-to-use datasets included under data/
- 🔬 Configurable experiment settings: dataset schema, dataset synthesizer, seeds, privacy budget, inputs/outputs, classifier, and data pre-processing
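To make the classifier requirement concrete, here is a minimal sketch of a custom class exposing fit, predict, and predict_proba. The class name MajorityClassifier and its smoothing parameter are hypothetical and not part of BenchmarkDPFair; the sketch assumes the framework only relies on these three duck-typed methods and on being able to construct the class from keyword arguments, which the README implies but does not spell out.

```python
from collections import Counter


class MajorityClassifier:
    """Toy baseline (hypothetical): always predicts the most frequent training label."""

    def __init__(self, smoothing=0.0):
        # Additive smoothing applied to the class counts when estimating priors.
        self.smoothing = smoothing

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        total = len(y) + self.smoothing * len(self.classes_)
        # Class priors, in the same order as self.classes_.
        self.priors_ = [(counts[c] + self.smoothing) / total for c in self.classes_]
        self.majority_ = max(self.classes_, key=lambda c: counts[c])
        return self

    def predict(self, X):
        # One label per input row; the features are ignored by this baseline.
        return [self.majority_ for _ in range(len(X))]

    def predict_proba(self, X):
        # One row of class probabilities per input row.
        return [list(self.priors_) for _ in range(len(X))]
```

Following the pattern in the Quick Start, such a class would be passed uninstantiated as classifier, with its constructor arguments (for example {"smoothing": 1.0}) supplied as classifier_kwargs. A real classifier would normally return NumPy arrays rather than lists.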
Installation
From source
Clone the repository and install the package with its dependencies in editable mode:
git clone https://github.com/vinicius-verona/dp-fair-intervention-benchmark.git
cd dp-fair-intervention-benchmark
pip install -e .
Using PyPI (recommended)
pip install BenchmarkDPFair
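After either installation route, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. This is a small sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name is assumed to match the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("BenchmarkDPFair"))
```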
Repository Structure
├── data/ # Bundled datasets
├── src/ # Core source code
├── examples/ # Example scripts
├── tests/ # Unit tests
└── README.md
Quick Start
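The following minimal example generates DP synthetic versions of the COMPAS dataset and then benchmarks a logistic regression classifier on them: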
import argparse
from typing import List, Union

from sklearn.linear_model import LogisticRegression

from BenchmarkDPFair.DataGenerator import generate_data, DatasetGeneratorConfig
from BenchmarkDPFair.Benchmark import benchmark, BenchmarkInfo, BenchmarkDatasetConfig

# NOTE: binary_encode and filter_compas (used below) are user-supplied callables
# that this snippet does not define; define or import them before running.

# Keyword arguments for the classifier (passed as classifier_kwargs below)
ESTIMATOR_PARAMS = {
    'max_iter': 10000,
    'solver': 'saga',
    'l1_ratio': 0.5,
    'C': 0.8
}

lr = LogisticRegression  # the class itself, not an instance
classifiers = [lr]
ckwargs = [
    ESTIMATOR_PARAMS,
]
classifier_name = ["LR"]

# (classifier index, synthesizer index) pairs to benchmark
combinations = [
    (0, 0),
    (0, 1),
]
synths = ["aim", "mst"]

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Arguments of Data Generation for Compas")
    parser.add_argument(
        "--seeds", "-s",
        nargs="+",  # 1 or more values
        type=int,  # convert automatically to int
        required=True,  # the loops below iterate over the seeds
    )
    args = parser.parse_args()
    seeds = args.seeds
    eps: List[Union[int, float]] = [.05, .1]  # privacy budgets

    # Data generation: one call per synthesizer and seed
    for synthesizer in synths:
        for s in seeds:
            data_conf = DatasetGeneratorConfig(
                name="Compas",
                target="two_year_recid",
                synthesizer=synthesizer,
                root_dir="./data",
                sensitive_attr="race",
                categorical_cols=['race', 'score_text', 'c_charge_degree', 'age', 'sex', 'two_year_recid'],
                sensitive_cols=['race', 'sex'],
                ordinal_cols=['priors_count'],
                privacy_budgets=eps,
                binary_encoder=binary_encode,
                seed=s,
                test_split_size=0.4,
                data_filter=filter_compas
            )
            generate_data("compas.csv", "", data_conf, "./data", verbose=True)

    # Benchmarking: one run per (classifier, synthesizer) combination
    for clf_idx, syn_idx in combinations:
        classifier = classifiers[clf_idx]
        synth = synths[syn_idx]
        benchmark_config = BenchmarkInfo(
            dp_method=synth,
            output_dir=f"./output/Dummy-Compas/{classifier_name[clf_idx]}/",
            seeds=seeds,
            eps=eps,
            classifier=classifier,
            classifier_kwargs=ckwargs[clf_idx]
        )
        benchmark_dataset = BenchmarkDatasetConfig(
            name="Compas",
            target="two_year_recid",
            root_dir="./data",
            sensitive_attr="race",
            index_col="Unnamed: 0",
            categorical_cols=['race', 'score_text', 'c_charge_degree', 'age', 'sex', 'two_year_recid'],
            ordinal_cols=["priors_count"],
            sensitive_cols=['race', 'sex'],
        )
        benchmark(benchmark_info=benchmark_config, data_conf=benchmark_dataset)
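The script takes its seeds from the command line through an argument declared with nargs="+", so it would be invoked along the lines of python quickstart.py --seeds 0 1 2 (the file name is hypothetical). The sketch below isolates that parsing step and enumerates the synthesizer, seed, and privacy-budget settings such a run covers. The helper parse_seeds is illustrative only, and the idea that each (synthesizer, seed, epsilon) triple corresponds to one synthetic dataset is an assumption about the framework, not something the README states.

```python
import argparse
from itertools import product


def parse_seeds(argv):
    """Parse one or more integer seeds from an explicit argument list."""
    parser = argparse.ArgumentParser(description="Seed parsing sketch")
    parser.add_argument("--seeds", "-s", nargs="+", type=int, required=True)
    return parser.parse_args(argv).seeds


# Passing an explicit list keeps the sketch independent of sys.argv.
seeds = parse_seeds(["--seeds", "0", "1", "2"])
synths = ["aim", "mst"]
eps = [0.05, 0.1]

# Every (synthesizer, seed, epsilon) setting covered by such a run.
grid = list(product(synths, seeds, eps))
print(seeds)      # [0, 1, 2]
print(len(grid))  # 12
```

Enumerating the grid up front makes it easy to estimate how long a run will take before starting it, since the number of settings grows multiplicatively with each added seed or privacy budget.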
More detailed examples can be found in the examples/ directory.
License
License: MIT
Contributing
Contributions are welcome:
- Open an issue for bug reports or feature requests