
Fairness-aware tools with a scikit-learn compatible API

Project description

scikit-fair

Fairness-aware preprocessing for machine learning with a scikit-learn compatible API.

scikit-fair (skfair) is a Python library providing a suite of fairness preprocessing algorithms for binary classification. It integrates with scikit-learn pipelines and follows the imbalanced-learn API for sampling methods, so it drops into existing sklearn workflows.

This is an initial implementation that will be expanded in the near future.


Installation from source

git clone https://github.com/jmcfig/scikit-fair.git
cd scikit-fair
pip install -e .

Requirements: Python ≥ 3.9, numpy ≥ 1.22, pandas ≥ 1.5, scikit-learn ≥ 1.3, imbalanced-learn ≥ 0.12, cvxpy ≥ 1.3.


Quick start

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from skfair.datasets import load_adult
from skfair.preprocessing import Massaging
from skfair.metrics import accuracy, disparate_impact, statistical_parity_difference

# 1. Load data
X, y = load_adult(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Baseline — no fairness preprocessing
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

sens = X_test["sex"].values
print(f"Baseline — Accuracy: {accuracy(y_test.values, y_pred):.3f}  "
      f"DI: {disparate_impact(y_test.values, y_pred, sens):.3f}  "
      f"SPD: {statistical_parity_difference(y_test.values, y_pred, sens):.3f}")

# 3. Apply Massaging to reduce label bias
sampler = Massaging(sens_attr="sex", priv_group=1)
X_fair, y_fair = sampler.fit_resample(X_train, y_train)

clf_fair = LogisticRegression(max_iter=1000)
clf_fair.fit(X_fair, y_fair)
y_pred_fair = clf_fair.predict(X_test)

print(f"Fair    — Accuracy: {accuracy(y_test.values, y_pred_fair):.3f}  "
      f"DI: {disparate_impact(y_test.values, y_pred_fair, sens):.3f}  "
      f"SPD: {statistical_parity_difference(y_test.values, y_pred_fair, sens):.3f}")

Algorithms

| Class | Family | Reference |
| --- | --- | --- |
| Reweighing | Weighting | Kamiran & Calders (2012) |
| FairBalance | Weighting | Yu et al. (2024) |
| ReweighingClassifier | Meta-estimator | |
| FairBalanceClassifier | Meta-estimator | |
| Massaging | Label modification | Kamiran & Calders (2012) |
| FairwayRemover | Label modification | Fairway (2019) |
| FairOversampling | Oversampling | Dablan et al. |
| FairSmote | Oversampling | Chakraborty et al. (2021) |
| FAWOS | Oversampling | Salazar et al. (2021) |
| HeterogeneousFOS | Oversampling | Sonoda et al. (2023) |
| GeometricFairnessRepair | Feature transformation | Feldman et al. (2015) |
| OptimizedPreprocessing | Feature transformation | Calmon et al. (2017) |
| LearningFairRepresentations | Feature transformation | Zemel et al. (2013) |
| FairMask | Meta-estimator | Peng et al. (2021) |
| IntersectionalBinarizer | Utility | |
| DropColumns | Utility | |

Usage patterns

Each family of algorithms has its own API contract.

Samplers — fit_resample(X, y)

Label-modification and oversampling methods return a resampled dataset. They extend imblearn.BaseSampler and work directly inside an imblearn.Pipeline.

from skfair.preprocessing import FairSmote

sampler = FairSmote(sens_attr="sex", random_state=0)
X_resampled, y_resampled = sampler.fit_resample(X_train, y_train)

Weighting methods — fit_transform(X, y)

Reweighing and FairBalance return the original X unchanged alongside a weight Series. Pass the weights to your classifier via sample_weight.

from skfair.preprocessing import Reweighing

rw = Reweighing(sens_attr="sex", priv_group=1)
X_unchanged, weights = rw.fit_transform(X_train, y_train)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_unchanged, y_train, sample_weight=weights)
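
Independently of skfair, the Kamiran & Calders (2012) weights are simple to compute by hand: w(s, y) = P(S=s) · P(Y=y) / P(S=s, Y=y), which makes the sensitive attribute and the label look statistically independent under the weighted data. The sketch below (toy data, plain numpy, not the library's code; `reweighing_weights` is an illustrative helper) derives such weights and shows sklearn's standard mechanism for routing them to a pipeline step via the `clf__sample_weight` fit parameter:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def reweighing_weights(s, y):
    """Kamiran & Calders (2012): w(s, y) = P(S=s) * P(Y=y) / P(S=s, Y=y).
    Under-represented (s, y) cells get weights > 1, over-represented < 1."""
    s, y = np.asarray(s), np.asarray(y)
    n = len(y)
    w = np.empty(n, dtype=float)
    for sv in np.unique(s):
        for yv in np.unique(y):
            mask = (s == sv) & (y == yv)
            if mask.any():
                w[mask] = (np.mean(s == sv) * np.mean(y == yv)) / (mask.sum() / n)
    return w

# Toy data: sensitive attribute correlated with the label
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=200)
y = (rng.random(200) < np.where(s == 1, 0.7, 0.3)).astype(int)
X = rng.normal(size=(200, 3)) + y[:, None]

weights = reweighing_weights(s, y)

# sklearn routes fit params to a named step via "<step>__<param>"
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X, y, clf__sample_weight=weights)
```

Under these weights the positive rate is the same in both groups, which is exactly the condition the Weighting family targets.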

Classifier wrappers — standard fit / predict

ReweighingClassifier and FairBalanceClassifier encapsulate the weighting step inside a full sklearn-compatible classifier, including sample_weight handling.

from skfair.preprocessing import ReweighingClassifier

clf = ReweighingClassifier(
    estimator=LogisticRegression(max_iter=1000),
    sens_attr="sex",
    priv_group=1,
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

Feature transformers — fit_transform(X)

GeometricFairnessRepair, OptimizedPreprocessing, and LearningFairRepresentations transform X directly and slot into sklearn.Pipeline as standard transformers.

from skfair.preprocessing import GeometricFairnessRepair

repair = GeometricFairnessRepair(
    sensitive_attribute="sex",
    repair_columns=["age", "hours-per-week"],
    lambda_param=1.0,
)
X_repaired = repair.fit_transform(X_train)
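
Leaving skfair's implementation aside, the quantile-matching idea behind geometric repair (Feldman et al., 2015) can be sketched in a few lines: each group's values are mapped toward the pooled distribution's quantiles, with the repair amount controlled by a lambda parameter. `geometric_repair` below is an illustrative stand-alone helper, not the library's class:

```python
import numpy as np
import pandas as pd

def geometric_repair(x, group, lam=1.0):
    """Move each group's values toward the pooled distribution's quantiles.
    lam=0 leaves x unchanged; lam=1 makes the groups' distributions
    (approximately) identical, erasing group information in this feature."""
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    out = x.copy()
    pooled = np.sort(x)
    for g in np.unique(group):
        idx = np.where(group == g)[0]
        # each value's within-group rank, expressed as a quantile in (0, 1)
        q = (pd.Series(x[idx]).rank(method="average").to_numpy() - 0.5) / len(idx)
        out[idx] = (1 - lam) * x[idx] + lam * np.quantile(pooled, q)
    return out

# Two groups whose feature distributions differ by a constant shift
hours = np.concatenate([np.arange(10.0), np.arange(10.0) + 10.0])
group = np.array([0] * 10 + [1] * 10)
repaired = geometric_repair(hours, group, lam=1.0)
```

With full repair (lam=1.0) the two groups end up with matching means and quantiles; intermediate lambda values trade fairness against fidelity to the original feature.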

Example pipeline

Combine preprocessing with downstream estimators, optionally using DropColumns to remove the sensitive attribute just before the classifier.

from imblearn.pipeline import Pipeline
from skfair.preprocessing import FairSmote, DropColumns

pipe = Pipeline([
    ("fair_smote", FairSmote(sens_attr="sex", random_state=42)),
    ("drop_sens", DropColumns("sex")),  # optional: drop the sensitive attribute before fitting
    ("classifier", LogisticRegression(solver="liblinear", max_iter=1000, random_state=42)),
])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

Intersectional privilege

Define complex, multi-column privilege criteria with IntersectionalBinarizer.

from skfair.preprocessing import IntersectionalBinarizer

binarizer = IntersectionalBinarizer(
    privileged_definition={"race": "White", "sex": "Male"},
    group_col_name="_is_privileged",
)
X_with_group = binarizer.fit_transform(X_train)
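
The underlying logic is a conjunction over (column, value) pairs: a row counts as privileged only when it matches every entry of the privileged definition. A minimal pandas sketch of that computation (illustrative, not the binarizer's actual code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "race": ["White", "White", "Black", "Black"],
    "sex":  ["Male", "Female", "Male", "Female"],
})
privileged = {"race": "White", "sex": "Male"}

# Privileged only if every (column, value) pair matches
mask = np.logical_and.reduce([df[c].eq(v) for c, v in privileged.items()])
df["_is_privileged"] = mask.astype(int)

print(df["_is_privileged"].tolist())  # [1, 0, 0, 0]
```

The resulting binary column can then serve as the sensitive attribute for any of the samplers, weighting methods, or metrics above.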

Metrics

The five group-fairness metrics share a unified signature, metric(y_true, y_pred, sensitive_attr); the six performance metrics take metric(y_true, y_pred).

Fairness metrics

| Function | Definition | Perfect value |
| --- | --- | --- |
| disparate_impact | P(Ŷ=1 \| S=0) / P(Ŷ=1 \| S=1) | 1.0 |
| statistical_parity_difference | P(Ŷ=1 \| S=0) − P(Ŷ=1 \| S=1) | 0.0 |
| equal_opportunity_difference | TPR(S=0) − TPR(S=1) | 0.0 |
| average_odds_difference | 0.5 × [(FPR diff) + (TPR diff)] | 0.0 |
| true_negative_rate_difference | TNR(S=0) − TNR(S=1) | 0.0 |
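
The first two definitions are easy to verify by hand: they depend only on the predictions and the group membership, not on y_true. A self-contained numpy version of those formulas (named `di` and `spd` here to avoid confusion with the library functions, whose signature also takes y_true):

```python
import numpy as np

def di(y_pred, s):
    """Disparate impact: P(Yhat=1 | S=0) / P(Yhat=1 | S=1); 1.0 is perfectly fair."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return y_pred[s == 0].mean() / y_pred[s == 1].mean()

def spd(y_pred, s):
    """Statistical parity difference: P(Yhat=1 | S=0) - P(Yhat=1 | S=1); 0.0 is perfectly fair."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return y_pred[s == 0].mean() - y_pred[s == 1].mean()

# Unprivileged group (S=0) receives positives at half the privileged rate
y_pred = np.array([1, 0, 0, 0,   1, 1, 0, 0])
s      = np.array([0, 0, 0, 0,   1, 1, 1, 1])

print(di(y_pred, s))   # 0.5
print(spd(y_pred, s))  # -0.25
```

The common "80% rule" flags a model when disparate impact falls below 0.8, as it does here.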

Performance metrics

accuracy, true_positive_rate, false_positive_rate, true_negative_rate, false_negative_rate, balanced_accuracy.

from skfair.metrics import (
    disparate_impact,
    statistical_parity_difference,
    equal_opportunity_difference,
    accuracy,
    balanced_accuracy,
)

sens = X_test["sex"].values
print(f"Accuracy:          {accuracy(y_test.values, y_pred):.3f}")
print(f"Balanced accuracy: {balanced_accuracy(y_test.values, y_pred):.3f}")
print(f"Disparate impact:  {disparate_impact(y_test.values, y_pred, sens):.3f}")
print(f"Stat. parity diff: {statistical_parity_difference(y_test.values, y_pred, sens):.3f}")
print(f"Equal opp. diff:   {equal_opportunity_difference(y_test.values, y_pred, sens):.3f}")

Datasets

Three standard fairness benchmarks are bundled so far.

| Loader | Samples | Features | Sensitive attribute | Label |
| --- | --- | --- | --- | --- |
| load_adult | 48 842 | 14 | sex (1 = male) | income > 50k |
| load_german | 1 000 | 20 | sex | credit risk |
| load_heart_disease | 740 | 13 | sex | heart disease |

from skfair.datasets import load_adult, load_german, load_heart_disease

X, y = load_adult(preprocessed=True)  # pipeline-ready Adult data with simple preprocessing applied
X, y = load_german()
X, y = load_heart_disease()

Download files


Source Distribution

scikit_fair-0.0.1.tar.gz (42.5 kB)

Uploaded Source

Built Distribution


scikit_fair-0.0.1-py3-none-any.whl (59.1 kB)

Uploaded Python 3

File details

Details for the file scikit_fair-0.0.1.tar.gz.

File metadata

  • Download URL: scikit_fair-0.0.1.tar.gz
  • Upload date:
  • Size: 42.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for scikit_fair-0.0.1.tar.gz
Algorithm Hash digest
SHA256 7b43d00026096605181729c1c3efbb118b6c7f1abffd777aa0da2fa84c4fd5eb
MD5 a107fc2a1ce86ff1db120c7e16fe88c8
BLAKE2b-256 80315794f01625a343b95eec28bdb1a0edf1a5a0cf1faa3d42f132822525b9a0


File details

Details for the file scikit_fair-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: scikit_fair-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 59.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for scikit_fair-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fba94e7278eeca13bcb2551c5f3e6904401fec82e5691ef9c9c0d4a701f54389
MD5 d396f6e1cd67d1c93a64e80c985c3a28
BLAKE2b-256 31b77b0b6f9df499835a95d10521787693ff8191808f7e63288f2880971a3110

