
Unfair Data Generator


📋 About • 📦 Installation • 🚀 Usage • ⚖️ Supported Equality Types • 📜 License

📋 About

Unfair Data Generator is a Python library designed for generating biased classification datasets with intentional unfairness patterns. This tool extends scikit-learn's make_classification function to include sensitive group information and fairness constraints, allowing users to create controlled datasets with specific bias patterns for testing and developing fairness algorithms. ⚖️🧪

Unfair Data Generator supports violations of several fairness criteria and provides comprehensive visualization and evaluation utilities, making it an essential tool for fairness research and education. 💡

✨ Features

  • Biased Dataset Generation: Create classification datasets with intentional bias across sensitive groups. 🗃️
  • Fairness Evaluation: Built-in tools for evaluating model fairness across different groups. ⚖️
  • Visualization: Plotting utilities for understanding bias patterns and fairness metrics. 📈
  • Flexible Configuration: Support for various equality types (equal quality, demographic parity, equal opportunity, equalized odds). ⚙️
  • Leaky Features: Generate features that leak sensitive information to simulate real-world bias. 🔓
  • Multiple Groups: Support for 2-5 sensitive groups with intuitive weather-based naming. 🌦️
  • Scikit-learn Compatible: Extends familiar scikit-learn patterns and interfaces. 🎯
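
The "Leaky Features" idea can be sketched in a few lines of plain NumPy. This is an illustration of the concept only, not the library's implementation; the variable names and the leak strength are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 5000

# Sensitive group membership (binary here for simplicity)
Z = rng.integers(0, 2, size=n_samples)

# A "clean" feature carries no group information...
clean = rng.normal(size=n_samples)

# ...while a "leaky" feature mixes group membership into its signal,
# so a model can infer Z from it even when Z itself is withheld.
leak_strength = 1.5
leaky = rng.normal(size=n_samples) + leak_strength * Z

# The leaky feature separates the groups; the clean one does not.
print(abs(leaky[Z == 1].mean() - leaky[Z == 0].mean()))  # ≈ leak_strength
print(abs(clean[Z == 1].mean() - clean[Z == 0].mean()))  # ≈ 0
```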

📦 Installation
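
The package can be installed from PyPI (assuming the distribution name matches the project name shown in the files below):

```shell
pip install unfair-data-generator
```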

🚀 Usage

The following example demonstrates how to generate a biased dataset and evaluate fairness using unfair-data-generator. More examples can be found in the examples directory.

from unfair_data_generator.unfair_classification import make_unfair_classification
from unfair_data_generator.util.helpers import get_params_for_certain_equality_type
from unfair_data_generator.util.model_trainer import train_and_evaluate_model_with_classifier
from unfair_data_generator.util.visualizer import (
    visualize_TPR_FPR_metrics,
    visualize_accuracy,
    visualize_groups_separately,
    visualize_group_classes,
)

# Configure dataset parameters
fairness_type = "Demographic parity"
n_sensitive_groups = 3

# Generate group-specific parameters for fairness violation
group_params = get_params_for_certain_equality_type(fairness_type, n_sensitive_groups)

# Generate biased dataset
X, y, Z, centroids = make_unfair_classification(
    n_samples=5000,
    n_features=10,
    n_informative=3,
    n_leaky=2,
    random_state=42,
    group_params=group_params,
    return_sensitive_group_centroids=True,
)

# Visualize group-specific patterns
visualize_groups_separately(X, y, Z)
visualize_group_classes(X, y, Z, centroids)

# Train model and evaluate fairness
metrics = train_and_evaluate_model_with_classifier(X, y, Z)

# Visualize fairness metrics
title = f"{fairness_type} with {n_sensitive_groups} sensitive groups"
visualize_TPR_FPR_metrics(metrics, title)
visualize_accuracy(metrics, title)
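
For intuition, the demographic-parity violation the library visualizes can also be quantified by hand. The sketch below uses plain NumPy; `y_pred` and `Z` are small illustrative arrays, not outputs of the library:

```python
import numpy as np

def demographic_parity_gap(y_pred, Z):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [y_pred[Z == g].mean() for g in np.unique(Z)]
    return max(rates) - min(rates)

# Illustrative predictions for three groups with different positive rates
y_pred = np.array([1, 1, 1, 0,   1, 0, 0, 0,   0, 0, 0, 0])
Z      = np.array([0, 0, 0, 0,   1, 1, 1, 1,   2, 2, 2, 2])

print(demographic_parity_gap(y_pred, Z))  # 0.75 (group 0 at 0.75, group 2 at 0.0)
```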

⚖️ Supported Equality Types

The library supports generating datasets that systematically violate specific fairness criteria. Each type creates different bias patterns:

  • Equal quality
    Different classification performance across groups.
  • Demographic parity
    Unequal positive prediction rates across groups.
  • Equal opportunity
    Unequal true positive rates across groups.
  • Equalized odds
    Unequal true positive and false positive rates across groups.
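
For reference, the per-group true positive and false positive rates behind "equal opportunity" and "equalized odds" can be computed directly. This is an illustrative sketch, independent of the library's own metric helpers:

```python
import numpy as np

def group_rates(y_true, y_pred, Z):
    """Per-group (TPR, FPR); equalized odds requires both to match across groups."""
    out = {}
    for g in np.unique(Z):
        yt, yp = y_true[Z == g], y_pred[Z == g]
        tpr = yp[yt == 1].mean()  # P(pred=1 | true=1, group=g)
        fpr = yp[yt == 0].mean()  # P(pred=1 | true=0, group=g)
        out[g] = (tpr, fpr)
    return out

# Toy labels and predictions for two groups
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
Z      = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(group_rates(y_true, y_pred, Z))
# group 0: TPR 1.0, FPR 0.5; group 1: TPR 0.5, FPR 0.0 → equalized odds violated
```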

📜 License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

Download files

Download the file for your platform.

Source Distribution

unfair_data_generator-0.1.0.tar.gz (16.6 kB)

Uploaded Source

Built Distribution


unfair_data_generator-0.1.0-py3-none-any.whl (17.5 kB)

Uploaded Python 3

File details

Details for the file unfair_data_generator-0.1.0.tar.gz.

File metadata

  • Download URL: unfair_data_generator-0.1.0.tar.gz
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.2 Windows/10

File hashes

Hashes for unfair_data_generator-0.1.0.tar.gz:

  • SHA256: 0205317c16af366ce624aedbc62a3e2bb3fb8f4ee8c0956bef4a634c4af9051a
  • MD5: 0853ad6edd2f222ecbfdaec913542bdf
  • BLAKE2b-256: 34558cbc93ee54f6eeb51539c3421b4f850be308233d68384f59dedae4753c14

File details

Details for the file unfair_data_generator-0.1.0-py3-none-any.whl.

File hashes

Hashes for unfair_data_generator-0.1.0-py3-none-any.whl:

  • SHA256: 4ae89b0cf9eb904145e27bf8365e852eb6c44931cdce24bc074d7748e370d8eb
  • MD5: 4cd8c886b8022e582c36e285f4de7c86
  • BLAKE2b-256: c35a029d1d0546c0fc0432cc23c85aaeab26770116a6f1c6d4e6a7c997dfcdc0
