Unfair Data Generator
📋 About • 📦 Installation • 🚀 Usage • ⚖️ Supported Equality Types • 📜 License
📋 About
Unfair Data Generator is a Python library designed for generating biased classification datasets with intentional unfairness patterns. This tool extends scikit-learn's make_classification function to include sensitive group information and fairness constraints, allowing users to create controlled datasets with specific bias patterns for testing and developing fairness algorithms. ⚖️🧪
Unfair Data Generator can deliberately violate several fairness criteria and includes utilities for visualizing bias patterns and evaluating model fairness, which makes it useful for fairness research and teaching. 💡
- Free software: MIT license
- Documentation: https://unfair-data-generator.readthedocs.io
- Python: 3.11, 3.12
- Operating systems: Windows, Ubuntu, macOS
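As a rough illustration of the underlying idea (not the library's implementation), the sketch below calls scikit-learn's `make_classification` once per sensitive group, giving each group its own class balance, and attaches a group label array `Z`. The helper `make_toy_unfair_classification` and its `positive_rates` parameter are hypothetical names chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification


def make_toy_unfair_classification(n_samples_per_group=1000,
                                   positive_rates=(0.7, 0.4, 0.2),
                                   random_state=0):
    """Hypothetical helper: one make_classification call per sensitive group,
    each with its own class balance, so positive base rates differ by group."""
    X_parts, y_parts, z_parts = [], [], []
    for group, rate in enumerate(positive_rates):
        X_g, y_g = make_classification(
            n_samples=n_samples_per_group,
            n_features=5,
            n_informative=3,
            n_redundant=0,
            weights=[1.0 - rate],  # share of class 0; class 1 gets the remainder
            flip_y=0.0,            # no random label noise, so rates stay close to target
            random_state=random_state + group,
        )
        X_parts.append(X_g)
        y_parts.append(y_g)
        z_parts.append(np.full(n_samples_per_group, group))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(z_parts)


X, y, Z = make_toy_unfair_classification()
for group in np.unique(Z):
    print(f"group {group}: positive rate = {y[Z == group].mean():.2f}")
```

Because the label distribution itself differs by group, a model that learns it faithfully will tend to violate demographic parity.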
✨ Features
- Biased Dataset Generation: Create classification datasets with intentional bias across sensitive groups. 🗃️
- Fairness Evaluation: Built-in tools for evaluating model fairness across different groups. ⚖️
- Visualization: Visualization capabilities for understanding bias patterns and fairness metrics. 📈
- Flexible Configuration: Support for several equality types (equal quality, demographic parity, equal opportunity, equalized odds). ⚙️
- Leaky Features: Generate features that leak sensitive information to simulate real-world bias. 🔓
- Multiple Groups: Support for 2-5 sensitive groups with intuitive weather-based naming. 🌦️
- Scikit-learn Compatible: Extends familiar scikit-learn patterns and interfaces. 🎯
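To make the notion of a leaky feature concrete, here is a small NumPy sketch: a column whose mean shifts with the sensitive group, plus a crude leakage check that predicts the group from that single column. `add_leaky_feature`, `leakage_accuracy` and the `strength` parameter are hypothetical names for illustration; this is not how the library's `n_leaky` option is necessarily implemented.

```python
import numpy as np


def add_leaky_feature(X, Z, strength=1.0, random_state=0):
    """Hypothetical helper: append a column whose mean depends on the group Z."""
    rng = np.random.default_rng(random_state)
    leaky = strength * Z + rng.normal(size=len(Z))  # group-dependent shift plus noise
    return np.column_stack([X, leaky])


def leakage_accuracy(feature, Z):
    """Predict Z by assigning each sample to the group with the nearest mean value."""
    groups = np.unique(Z)
    means = np.array([feature[Z == g].mean() for g in groups])
    predicted = groups[np.abs(feature[:, None] - means[None, :]).argmin(axis=1)]
    return (predicted == Z).mean()


rng = np.random.default_rng(42)
Z = rng.integers(0, 3, size=3000)   # three sensitive groups
X = rng.normal(size=(3000, 4))      # base features independent of Z
X_leaky = add_leaky_feature(X, Z, strength=3.0)

print("chance level:", 1 / 3)
print("independent feature:", leakage_accuracy(X_leaky[:, 0], Z))
print("leaky feature:", leakage_accuracy(X_leaky[:, -1], Z))
```

An independent feature recovers the group at roughly chance level, while the leaky column recovers it far more often, which is how such features let a model pick up group membership even when `Z` itself is not used as an input.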
📦 Installation
Install the package from PyPI with pip: `pip install unfair-data-generator`
🚀 Usage
The following example demonstrates how to generate a biased dataset and evaluate fairness using unfair-data-generator. More examples can be found in the examples directory.
```python
from unfair_data_generator.unfair_classification import make_unfair_classification
from unfair_data_generator.util.helpers import get_params_for_certain_equality_type
from unfair_data_generator.util.model_trainer import train_and_evaluate_model_with_classifier
from unfair_data_generator.util.visualizer import (
    visualize_TPR_FPR_metrics,
    visualize_accuracy,
    visualize_group_classes,
    visualize_groups_separately,
)

# Configure dataset parameters
fairness_type = "Demographic parity"
n_sensitive_groups = 3

# Generate group-specific parameters that violate the chosen fairness criterion
group_params = get_params_for_certain_equality_type(fairness_type, n_sensitive_groups)

# Generate the biased dataset (Z holds the sensitive group labels)
X, y, Z, centroids = make_unfair_classification(
    n_samples=5000,
    n_features=10,
    n_informative=3,
    n_leaky=2,
    random_state=42,
    group_params=group_params,
    return_sensitive_group_centroids=True,
)

# Visualize group-specific patterns
visualize_groups_separately(X, y, Z)
visualize_group_classes(X, y, Z, centroids)

# Train a model and evaluate fairness across groups
metrics = train_and_evaluate_model_with_classifier(X, y, Z)

# Visualize fairness metrics
title = f"{fairness_type} with {n_sensitive_groups} sensitive groups"
visualize_TPR_FPR_metrics(metrics, title)
visualize_accuracy(metrics, title)
```
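If you would rather train your own model, a minimal plain scikit-learn sketch follows. It assumes `X`, `y` and `Z` are NumPy arrays like those returned by `make_unfair_classification`; random stand-in data is generated here so the snippet runs on its own, and the model settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data; replace with the X, y, Z returned by make_unfair_classification
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3, random_state=42)
Z = np.random.default_rng(42).integers(0, 3, size=len(y))

# Split features, labels and group labels together so they stay aligned;
# the training-set group labels are not needed here
X_train, X_test, y_train, y_test, _, Z_test = train_test_split(
    X, y, Z, test_size=0.3, random_state=42, stratify=Z
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy overall and within each sensitive group
overall_accuracy = accuracy_score(y_test, y_pred)
group_accuracy = {
    int(g): accuracy_score(y_test[Z_test == g], y_pred[Z_test == g])
    for g in np.unique(Z_test)
}
print(f"overall: {overall_accuracy:.3f}")
for g, acc in group_accuracy.items():
    print(f"group {g}: {acc:.3f}")
```

Large differences between the per-group accuracies indicate a violation of equal quality.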
⚖️ Supported Equality Types
The library supports generating datasets that systematically violate specific fairness criteria. Each type creates different bias patterns:
- Equal quality: Different classification performance across groups.
- Demographic parity: Unequal positive prediction rates across groups.
- Equal opportunity: Unequal true positive rates across groups.
- Equalized odds: Unequal true positive and false positive rates across groups.
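Each criterion comes down to comparing a few per-group rates. The NumPy sketch below computes them, assuming binary labels with 1 as the positive class; `group_fairness_report` and `max_gap` are hypothetical helpers for illustration, not part of the library's API.

```python
import numpy as np


def group_fairness_report(y_true, y_pred, groups):
    """Per-group rates behind each equality type (binary labels, 1 = positive).

    A group with no positive (or no negative) samples yields NaN for TPR (or FPR).
    """
    report = {}
    for g in np.unique(groups):
        in_group = groups == g
        yt, yp = y_true[in_group], y_pred[in_group]
        report[int(g)] = {
            "accuracy": np.mean(yt == yp),       # equal quality
            "selection_rate": np.mean(yp == 1),  # demographic parity
            "tpr": np.mean(yp[yt == 1] == 1),    # equal opportunity, equalized odds
            "fpr": np.mean(yp[yt == 0] == 1),    # equalized odds
        }
    return report


def max_gap(report, metric):
    """Largest difference in a metric between any two groups (0 means parity)."""
    values = [rates[metric] for rates in report.values()]
    return max(values) - min(values)


y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

report = group_fairness_report(y_true, y_pred, groups)
for metric in ("accuracy", "selection_rate", "tpr", "fpr"):
    print(f"{metric}: gap = {max_gap(report, metric):.2f}")
```

In this toy example both groups reach an accuracy of 0.75, so equal quality holds, while the selection rate, TPR and FPR gaps are all 0.5, so demographic parity, equal opportunity and equalized odds are violated.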
📜 License
This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.
Disclaimer
This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!
Download files
- Source distribution: unfair_data_generator-0.1.0.tar.gz
- Built distribution: unfair_data_generator-0.1.0-py3-none-any.whl
File details
Details for the file unfair_data_generator-0.1.0.tar.gz.
File metadata
- Download URL: unfair_data_generator-0.1.0.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.2 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0205317c16af366ce624aedbc62a3e2bb3fb8f4ee8c0956bef4a634c4af9051a |
| MD5 | 0853ad6edd2f222ecbfdaec913542bdf |
| BLAKE2b-256 | 34558cbc93ee54f6eeb51539c3421b4f850be308233d68384f59dedae4753c14 |
File details
Details for the file unfair_data_generator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: unfair_data_generator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.2 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ae89b0cf9eb904145e27bf8365e852eb6c44931cdce24bc074d7748e370d8eb |
| MD5 | 4cd8c886b8022e582c36e285f4de7c86 |
| BLAKE2b-256 | c35a029d1d0546c0fc0432cc23c85aaeab26770116a6f1c6d4e6a7c997dfcdc0 |
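To check a downloaded file against the published SHA256 digests, a short standard-library sketch using `hashlib` is shown below. `sha256_of`, `verify` and `EXPECTED_SHA256` are hypothetical names; the digests are copied from the file tables for version 0.1.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for version 0.1.0
EXPECTED_SHA256 = {
    "unfair_data_generator-0.1.0.tar.gz":
        "0205317c16af366ce624aedbc62a3e2bb3fb8f4ee8c0956bef4a634c4af9051a",
    "unfair_data_generator-0.1.0-py3-none-any.whl":
        "4ae89b0cf9eb904145e27bf8365e852eb6c44931cdce24bc074d7748e370d8eb",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

After downloading, `verify("unfair_data_generator-0.1.0-py3-none-any.whl")` should return `True` for an unmodified wheel. pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same check at install time.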