
Lightweight ML bias detection toolkit for building fairer AI systems


FairLens

PyPI version Python 3.8+ License: MIT

A lightweight toolkit for detecting bias in ML models and datasets.

What is this?

FairLens started as a side project after I got frustrated with how complicated existing fairness tools are. I wanted something where you could just point it at a dataset or model and get a quick sense of whether there might be bias issues worth investigating.

It's not trying to replace comprehensive tools like AIF360 or Fairlearn - those are great if you need the full research toolkit. This is more for the "let me quickly check this before I ship it" use case.

Installation

pip install fairlens-kit

For visualization support:

pip install fairlens-kit[viz]

Basic Usage

Dataset Analysis

import fairlens as fl
import pandas as pd

df = pd.read_csv("your_data.csv")

# Check for potential bias
report = fl.check_dataset(
    df, 
    target='outcome', 
    protected=['gender', 'race']
)
print(report)

This gives you a breakdown of label rates across groups, flags large disparities, and checks for potential proxy variables.
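
For intuition, the per-group label-rate breakdown that the report summarizes is roughly what you could compute by hand with pandas (a quick illustration using the columns from the snippet above, not the library's internals):

# Manual sketch of per-group positive label rates (illustration only)
rates = df.groupby('gender')['outcome'].mean()
print(rates)
print("min/max ratio:", rates.min() / rates.max())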

Model Auditing

import fairlens as fl
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)

# Audit the model; `test_data` here is the test-set DataFrame that
# still contains the protected attribute column (it may not be part of X_test)
result = fl.audit_model(
    model,
    X_test,
    y_test,
    protected=test_data['gender']
)
print(result)

Output looks something like:

============================================================
FAIRNESS AUDIT REPORT - UNFAIR
============================================================

Model: Model
Protected Attribute: gender
Groups: Female, Male

GROUP FAIRNESS METRICS
----------------------------------------
Demographic Parity Ratio: 0.672 (threshold: >=0.8)
Equalized Odds Ratio: 0.734 (threshold: >=0.8)

ISSUES DETECTED
----------------------------------------
  - Demographic parity ratio (0.672) below threshold (0.8)
  - 'Female' receives positive predictions 32.8% less often than 'Male'

RECOMMENDATIONS
----------------------------------------
  - Consider rebalancing training data or using threshold adjustment

Visualization

import fairlens as fl

fl.plot_bias(df, target='hired', protected='gender')

Built-in Datasets

The library includes some common fairness benchmark datasets so you can test things out:

import fairlens as fl

adult = fl.datasets.load_adult()       # Income prediction
compas = fl.datasets.load_compas()     # Recidivism (the ProPublica one)  
credit = fl.datasets.load_german_credit()
bank = fl.datasets.load_bank_marketing()

These are synthetic versions for quick offline testing. If you want the real data:

adult = fl.fetch_adult()          # Real UCI Adult from OpenML (48k rows)
compas = fl.fetch_compas()        # Real ProPublica COMPAS (7k rows)
credit = fl.fetch_german_credit() # Real German Credit from OpenML (1k rows)

Fetchers download the data once and cache it under ~/.fairlens/datasets/. If the network is unavailable, they fall back to the synthetic versions automatically.

Metrics

Group Fairness

from fairlens.metrics import (
    demographic_parity_ratio,
    demographic_parity_difference,
    equalized_odds_ratio,
    equalized_odds_difference,
)

# Demographic parity - are positive prediction rates similar across groups?
dpr = demographic_parity_ratio(y_pred, protected)

# Equalized odds - are TPR and FPR similar across groups?
eor = equalized_odds_ratio(y_true, y_pred, protected)
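
The *_difference variants imported above report absolute gaps rather than ratios; assuming they mirror the ratio functions' arguments (an assumption, not verified against the API reference), usage looks like:

# Assumed to take the same arguments as the ratio versions
dpd = demographic_parity_difference(y_pred, protected)
eod = equalized_odds_difference(y_true, y_pred, protected)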

Calibration

from fairlens.metrics import expected_calibration_error, brier_score

ece = expected_calibration_error(y_true, y_prob)
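
brier_score is imported above but not used; assuming it takes the same (y_true, y_prob) arguments as expected_calibration_error, it would be called as:

brier = brier_score(y_true, y_prob)  # assumed signature, mirroring ECE above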

Individual Fairness

from fairlens.metrics import consistency_score

# Do similar individuals get similar predictions?
score = consistency_score(X, y_pred, n_neighbors=5)

Intersectional Fairness

Single-attribute analysis can miss disparities. Checking gender and race separately might look fine, but "Black women" as a group could be getting significantly worse predictions:

from fairlens import compute_intersectional_metrics

report = compute_intersectional_metrics(
    y_true, y_pred,
    {'gender': gender_arr, 'race': race_arr}
)
print(report)
# Shows metrics for all cross-groups (M_White, F_Black, etc.)
# Plus per-attribute DP ratios for comparison

Bootstrap Confidence Intervals

Point estimates of fairness metrics can be misleading on small datasets. Wrap any metric with bootstrap resampling to get a confidence interval:

from fairlens import bootstrap_metric, demographic_parity_ratio

ci = bootstrap_metric(
    demographic_parity_ratio,
    y_pred, protected,
    n_bootstrap=1000,
    random_state=42,
)
print(f"DP Ratio: {ci.estimate:.3f}, 95% CI: [{ci.lower:.3f}, {ci.upper:.3f}]")
print(f"Statistically unfair: {ci.upper < 0.8}")

Multi-class Fairness

For classification beyond binary (e.g., job recommendation with multiple roles), fairness is computed per class via one-vs-rest decomposition:

from fairlens import compute_multiclass_fairness

report = compute_multiclass_fairness(y_true, y_pred, protected)
print(report.worst_class)       # Which class has the worst DP ratio
print(report.macro_avg_dp_ratio) # Average across all classes

Fairness Thresholds

The commonly used thresholds (following the "80% rule" from disparate impact law):

Metric                      Threshold   What it means
Demographic Parity Ratio    >= 0.8      Positive rates within 20% of each other
Equalized Odds Ratio        >= 0.8      TPR/FPR ratios within 20%
Demographic Parity Diff     <= 0.1      Absolute difference in rates < 10%

These aren't magic numbers - they're starting points. What counts as "fair enough" depends heavily on context.
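
As a concrete sketch of applying these thresholds (using the metric functions shown earlier with your own y_true / y_pred / protected arrays; the PASS/FAIL wrapper is not a library feature):

from fairlens.metrics import demographic_parity_ratio, equalized_odds_ratio

dpr = demographic_parity_ratio(y_pred, protected)
eor = equalized_odds_ratio(y_true, y_pred, protected)

# Default thresholds; tighten or relax them for your context
for name, value, ok in [
    ("demographic parity ratio", dpr, dpr >= 0.8),
    ("equalized odds ratio", eor, eor >= 0.8),
]:
    print(f"{name}: {value:.3f} -> {'PASS' if ok else 'FAIL'}")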

Report Generation

from fairlens.audit import generate_html_report, generate_markdown_report

result = fl.audit_model(model, X_test, y_test, protected)

generate_html_report(result, "fairness_report.html")
generate_markdown_report(result, "fairness_report.md")

Bias Mitigation

Threshold Optimizer (post-processing)

Finds group-specific classification thresholds to equalize positive prediction rates:

from fairlens import ThresholdOptimizer

opt = ThresholdOptimizer(objective='demographic_parity')
opt.fit(y_true, y_prob, protected)
fair_preds = opt.predict(y_prob, protected)

print(opt.get_results())
# Shows per-group thresholds and DP ratio improvement
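
A quick way to sanity-check the improvement is to recompute the DP ratio before and after (a sketch; the single 0.5 cutoff used as the baseline and y_prob being a NumPy array are assumptions):

from fairlens.metrics import demographic_parity_ratio

baseline_preds = (y_prob >= 0.5).astype(int)  # naive single-threshold baseline (assumed)
print("DP ratio before:", demographic_parity_ratio(baseline_preds, protected))
print("DP ratio after: ", demographic_parity_ratio(fair_preds, protected))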

Reweighter (pre-processing)

Computes sample weights so the weighted label distribution is independent of the protected attribute. Use these weights when retraining:

from fairlens import Reweighter

rw = Reweighter()
weights = rw.fit_transform(y_train, protected_train)
model.fit(X_train, y_train, sample_weight=weights)
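
To see what the weights accomplish, check the weighted positive-label rate per group; after reweighting it should come out roughly equal across groups (a pandas sketch, not part of the library):

import pandas as pd

check = pd.DataFrame({'y': y_train, 'group': protected_train, 'w': weights})
for group, d in check.groupby('group'):
    # Weighted positive rate per group; values should be close to each other
    print(group, round((d['y'] * d['w']).sum() / d['w'].sum(), 3))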

Mitigation Suggestions

The library can also suggest strategies based on what issues it finds:

from fairlens.mitigation import print_suggestions

print_suggestions(result.fairness_issues, include_code=True)

Comparison with Other Tools

Tool           Good for                                        Less good for
AIF360         Comprehensive research, many algorithms         Quick checks, simple use cases
Fairlearn      Integration with sklearn                        Non-Microsoft ecosystems
What-If Tool   Visual exploration                              Non-TensorFlow models
FairLens       Quick audits, simple API, built-in mitigation   Deep research, large-scale production pipelines

If you need cutting-edge research algorithms or large-scale production fairness pipelines, AIF360 or Fairlearn are probably better choices. FairLens is more about making fairness checks and basic mitigation accessible without a steep learning curve.

Limitations

  • Individual fairness metrics are computationally expensive on large datasets (see the subsampling sketch after this list)
  • Mitigation algorithms (threshold optimizer, reweighter) cover common cases but aren't as extensive as AIF360
  • Bootstrap confidence intervals add computation time proportional to n_bootstrap
  • The built-in synthetic datasets are approximations; use fetch_* for real data when possible
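
For the first point, one workaround is to score a random subsample instead of the full dataset (a sketch assuming X is a DataFrame and y_pred a NumPy array; subsampling is not a built-in option):

import numpy as np
from fairlens.metrics import consistency_score

# Subsample rows before the expensive nearest-neighbor consistency check
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=min(5000, len(X)), replace=False)
score = consistency_score(X.iloc[idx], y_pred[idx], n_neighbors=5)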

References

Papers and other resources that informed this:

  • Hardt et al. 2016 - "Equality of Opportunity in Supervised Learning"
  • Barocas, Hardt, Narayanan - "Fairness and Machine Learning" (free online textbook, highly recommend)
  • The ProPublica COMPAS investigation (2016)

Related tools:

  • AIF360
  • Fairlearn
  • What-If Tool

License

MIT

