FairLens
A lightweight toolkit for detecting bias in ML models and datasets.
What is this?
FairLens started as a side project after I got frustrated with how complicated existing fairness tools are. I wanted something where you could just point it at a dataset or model and get a quick sense of whether there might be bias issues worth investigating.
It's not trying to replace comprehensive tools like AIF360 or Fairlearn - those are great if you need the full research toolkit. This is more for the "let me quickly check this before I ship it" use case.
Installation
pip install fairlens-kit
For visualization support:
pip install fairlens-kit[viz]
Basic Usage
Dataset Analysis
import fairlens as fl
import pandas as pd
df = pd.read_csv("your_data.csv")
# Check for potential bias
report = fl.check_dataset(
    df,
    target='outcome',
    protected=['gender', 'race']
)
print(report)
This gives you a breakdown of label rates across groups, flags large disparities, and checks for potential proxy variables.
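The disparity flagging boils down to comparing positive label rates between groups. A rough pandas equivalent of that core check, purely illustrative and not the library's internals (assumes the 'gender' and 'outcome' columns from the example above):

import pandas as pd

df = pd.read_csv("your_data.csv")

# Positive label rate for each gender group
rates = df.groupby('gender')['outcome'].mean()
print(rates)

# "80% rule" style ratio: lowest group rate vs highest
print(f"Label rate ratio: {rates.min() / rates.max():.3f}")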
Model Auditing
import fairlens as fl
from sklearn.ensemble import RandomForestClassifier
# Assumes X_train, X_test, y_train, y_test and test_data come from your own train/test split
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Audit the model
result = fl.audit_model(
    model,
    X_test,
    y_test,
    protected=test_data['gender']
)
print(result)
Output looks something like:
============================================================
FAIRNESS AUDIT REPORT - UNFAIR
============================================================
Model: Model
Protected Attribute: gender
Groups: Female, Male
GROUP FAIRNESS METRICS
----------------------------------------
Demographic Parity Ratio: 0.672 (threshold: >=0.8)
Equalized Odds Ratio: 0.734 (threshold: >=0.8)
ISSUES DETECTED
----------------------------------------
- Demographic parity ratio (0.672) below threshold (0.8)
- 'Female' receives positive predictions 32.8% less often than 'Male'
RECOMMENDATIONS
----------------------------------------
- Consider rebalancing training data or using threshold adjustment
Visualization
import fairlens as fl
fl.plot_bias(df, target='hired', protected='gender')
Built-in Datasets
The library includes some common fairness benchmark datasets so you can test things out:
import fairlens as fl
adult = fl.datasets.load_adult() # Income prediction
compas = fl.datasets.load_compas() # Recidivism (the ProPublica one)
credit = fl.datasets.load_german_credit()
bank = fl.datasets.load_bank_marketing()
These are synthetic versions for quick offline testing. If you want the real data:
adult = fl.fetch_adult() # Real UCI Adult from OpenML (48k rows)
compas = fl.fetch_compas() # Real ProPublica COMPAS (7k rows)
credit = fl.fetch_german_credit() # Real German Credit from OpenML (1k rows)
Fetchers download and cache locally in ~/.fairlens/datasets/. If the network is unavailable, they fall back to the synthetic versions automatically.
Metrics
Group Fairness
from fairlens.metrics import (
    demographic_parity_ratio,
    demographic_parity_difference,
    equalized_odds_ratio,
    equalized_odds_difference,
)
# Demographic parity - are positive prediction rates similar across groups?
dpr = demographic_parity_ratio(y_pred, protected)
# Equalized odds - are TPR and FPR similar across groups?
eor = equalized_odds_ratio(y_true, y_pred, protected)
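For reference, the demographic parity ratio is simply the lowest group's positive prediction rate divided by the highest. A minimal numpy sketch of that idea (illustrative, not the library's implementation):

import numpy as np

def dp_ratio_sketch(y_pred, protected):
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    # Positive prediction rate per protected group
    rates = [y_pred[protected == g].mean() for g in np.unique(protected)]
    # "80% rule" form: lowest rate divided by highest rate
    return min(rates) / max(rates)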
Calibration
from fairlens.metrics import expected_calibration_error, brier_score
ece = expected_calibration_error(y_true, y_prob)
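Expected calibration error bins predictions by predicted probability and averages the gap between the observed positive rate and the mean predicted probability in each bin, weighted by bin size. A minimal sketch assuming binary labels and equal-width bins (not necessarily the exact binning this library uses):

import numpy as np

def ece_sketch(y_true, y_prob, n_bins=10):
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Assign each prediction to an equal-width probability bin
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            acc = y_true[mask].mean()     # observed positive rate in the bin
            conf = y_prob[mask].mean()    # mean predicted probability in the bin
            ece += mask.mean() * abs(acc - conf)
    return ece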
Individual Fairness
from fairlens.metrics import consistency_score
# Do similar individuals get similar predictions?
score = consistency_score(X, y_pred, n_neighbors=5)
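The usual consistency definition (in the spirit of Zemel et al.) compares each prediction to the average prediction of its nearest neighbors in feature space. A rough scikit-learn sketch, not necessarily the exact formula used here:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def consistency_sketch(X, y_pred, n_neighbors=5):
    y_pred = np.asarray(y_pred, dtype=float)
    # n_neighbors + 1 because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_mean = y_pred[idx[:, 1:]].mean(axis=1)
    # 1.0 means similar individuals always get identical predictions
    return 1.0 - np.mean(np.abs(y_pred - neighbor_mean))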
Intersectional Fairness
Single-attribute analysis can miss disparities. Checking gender and race separately might look fine, but "Black women" as a group could be getting significantly worse predictions:
from fairlens import compute_intersectional_metrics
report = compute_intersectional_metrics(
    y_true, y_pred,
    {'gender': gender_arr, 'race': race_arr}
)
print(report)
# Shows metrics for all cross-groups (M_White, F_Black, etc.)
# Plus per-attribute DP ratios for comparison
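To see where the cross-groups come from, you can spot-check positive prediction rates per (gender, race) combination yourself; a quick pandas sketch using the same gender_arr, race_arr and y_pred arrays (the df_x name is just for illustration):

import pandas as pd

df_x = pd.DataFrame({'gender': gender_arr, 'race': race_arr, 'pred': y_pred})
rates = df_x.groupby(['gender', 'race'])['pred'].mean()   # positive rate per cross-group
print(rates)
print(f"Intersectional DP ratio: {rates.min() / rates.max():.3f}")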
Bootstrap Confidence Intervals
Point estimates of fairness metrics can be misleading on small datasets. Wrap any metric with bootstrap resampling to get a confidence interval:
from fairlens import bootstrap_metric, demographic_parity_ratio
ci = bootstrap_metric(
    demographic_parity_ratio,
    y_pred, protected,
    n_bootstrap=1000,
    random_state=42,
)
print(f"DP Ratio: {ci.estimate:.3f}, 95% CI: [{ci.lower:.3f}, {ci.upper:.3f}]")
print(f"Statistically unfair: {ci.upper < 0.8}")
Multi-class Fairness
For classification beyond binary (e.g., job recommendation with multiple roles), fairness is computed per class via one-vs-rest decomposition:
from fairlens import compute_multiclass_fairness
report = compute_multiclass_fairness(y_true, y_pred, protected)
print(report.worst_class) # Which class has the worst DP ratio
print(report.macro_avg_dp_ratio) # Average across all classes
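One-vs-rest here means treating each class label as the "positive" outcome in turn and computing an ordinary binary DP ratio for it. A sketch of that decomposition (hypothetical helper, not the library's code):

import numpy as np

def per_class_dp_ratios(y_pred, protected):
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    ratios = {}
    for cls in np.unique(y_pred):
        positive = (y_pred == cls)                # one-vs-rest "positive" predictions
        rates = [positive[protected == g].mean() for g in np.unique(protected)]
        ratios[cls] = min(rates) / max(rates)
    return ratios                                 # worst class: min(ratios, key=ratios.get)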
Fairness Thresholds
Commonly used thresholds, following the "80% rule" from disparate-impact law:
| Metric | Threshold | What it means |
|---|---|---|
| Demographic Parity Ratio | >= 0.8 | Positive rates within 20% of each other |
| Equalized Odds Ratio | >= 0.8 | TPR/FPR ratios within 20% |
| Demographic Parity Difference | <= 0.1 | Absolute difference in positive rates within 10 percentage points |
These aren't magic numbers - they're starting points. What counts as "fair enough" depends heavily on context.
Report Generation
from fairlens.audit import generate_html_report, generate_markdown_report
result = fl.audit_model(model, X_test, y_test, protected)
generate_html_report(result, "fairness_report.html")
generate_markdown_report(result, "fairness_report.md")
Bias Mitigation
Threshold Optimizer (post-processing)
Finds group-specific classification thresholds to equalize positive prediction rates:
from fairlens import ThresholdOptimizer
opt = ThresholdOptimizer(objective='demographic_parity')
opt.fit(y_true, y_prob, protected)
fair_preds = opt.predict(y_prob, protected)
print(opt.get_results())
# Shows per-group thresholds and DP ratio improvement
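The underlying idea of group-specific thresholds: choose a per-group cutoff on the score so that each group ends up with roughly the same positive prediction rate. A simplified quantile-based sketch of a demographic-parity objective (not the optimizer's actual search):

import numpy as np

def group_thresholds_sketch(y_prob, protected, target_rate):
    """Pick a cutoff per group so each group's positive rate is roughly target_rate."""
    y_prob = np.asarray(y_prob)
    protected = np.asarray(protected)
    thresholds = {}
    for g in np.unique(protected):
        scores = y_prob[protected == g]
        # Scores above the (1 - target_rate) quantile make up ~target_rate of the group
        thresholds[g] = np.quantile(scores, 1.0 - target_rate)
    return thresholds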
Reweighter (pre-processing)
Computes sample weights so the weighted label distribution is independent of the protected attribute. Use these weights when retraining:
from fairlens import Reweighter
rw = Reweighter()
weights = rw.fit_transform(y_train, protected_train)
model.fit(X_train, y_train, sample_weight=weights)
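This is the classic reweighing scheme from Kamiran and Calders: each (group, label) cell is weighted by P(label) * P(group) / P(group, label), so the weighted label distribution becomes independent of the protected attribute. A sketch of the weight computation (illustrative, not the library's code):

import numpy as np

def reweighing_weights_sketch(y, protected):
    y = np.asarray(y)
    protected = np.asarray(protected)
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(protected):
        for label in np.unique(y):
            cell = (protected == g) & (y == label)
            if cell.any():
                # expected proportion under independence / observed proportion
                expected = (protected == g).mean() * (y == label).mean()
                weights[cell] = expected / cell.mean()
    return weights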
Mitigation Suggestions
The library can also suggest strategies based on what issues it finds:
from fairlens.mitigation import print_suggestions
print_suggestions(result.fairness_issues, include_code=True)
Comparison with Other Tools
| Tool | Good for | Less good for |
|---|---|---|
| AIF360 | Comprehensive research, many algorithms | Quick checks, simple use cases |
| Fairlearn | Integration with sklearn | Non-Microsoft ecosystems |
| What-If Tool | Visual exploration | Non-TensorFlow models |
| FairLens | Quick audits, simple API, built-in mitigation | Deep research, large-scale production pipelines |
If you need cutting-edge research algorithms or large-scale production fairness pipelines, AIF360 or Fairlearn are probably better choices. FairLens is more about making fairness checks and basic mitigation accessible without a steep learning curve.
Limitations
- Individual fairness metrics are computationally expensive on large datasets
- Mitigation algorithms (threshold optimizer, reweighter) cover common cases but aren't as extensive as AIF360
- Bootstrap confidence intervals add computation time proportional to n_bootstrap
- The built-in synthetic datasets are approximations; use fetch_* for real data when possible
References
Papers that informed this:
- Hardt et al. 2016 - "Equality of Opportunity in Supervised Learning"
- Barocas, Hardt, Narayanan - "Fairness and Machine Learning" (free online textbook, highly recommend)
- The ProPublica COMPAS investigation (2016)
Related tools:
- AIF360
- Fairlearn
- What-If Tool
License
MIT