Measure bias from data and machine learning models.
Parity
Overview
This repository contains code that demonstrates the use of fairness metrics, bias mitigation techniques, and an explainability tool.
Installation
Install using:
foo@bar:~$ pip install parity-fairness
Bias Measurement Usage
Set up the data so that the target column is a binary string target. Then identify which features are the privileged categories and which of their values are the privileged values. Afterwards, pass them to the show_bias function, like so:
from parity.fairness_metrics import show_bias
priv_category = 'Race-White'
priv_value = 'True'
target_label = 'high pay'
unencoded_target_label = 'True'
cols_to_drop = ''
show_bias(data, priv_category, priv_value, target_label, unencoded_target_label, cols_to_drop)
Bias and Fairness
A common problem with most machine learning models is bias inherited from the data. This notebook shows how to measure those biases and perform bias mitigation. A Python package called aif360 provides metrics and algorithms for bias measurement and mitigation.
Metrics
- Statistical Parity Difference
- Equal Opportunity Difference
- Average Absolute Odds Difference
- Disparate Impact
- Theil Index
Statistical Parity Difference
This measure is based on the following formula:
$SPD = P(Y = 1 \mid D = \text{unprivileged}) - P(Y = 1 \mid D = \text{privileged})$
Statistical parity difference is the difference between the probability that a random individual drawn from the unprivileged group is labeled 1 (here, that their income is above 50K) and the probability that a random individual drawn from the privileged group is labeled 1.
Fairer scores are close to 0.
More documentation: "One definition of algorithmic fairness: statistical parity".
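As a rough illustration of the definition above, the difference can be computed by hand on a toy dataset. This is a minimal sketch in plain Python, not the package's implementation; the function names, the toy labels, and the group markers "u" and "p" are all made up for the example.

```python
def positive_rate(labels, groups, group):
    """Share of the members of `group` whose label is 1."""
    selected = [y for y, g in zip(labels, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_difference(labels, groups, unprivileged, privileged):
    """P(Y = 1 | unprivileged) - P(Y = 1 | privileged)."""
    return (positive_rate(labels, groups, unprivileged)
            - positive_rate(labels, groups, privileged))


# Hypothetical toy data: 1 means the favorable label (e.g. "high pay").
labels = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["u", "u", "u", "u", "p", "p", "p", "p"]

spd = statistical_parity_difference(labels, groups, "u", "p")
print(spd)  # 0.25 - 0.75 = -0.5
```

A negative value means the unprivileged group receives the favorable label less often than the privileged group.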
Equal Opportunity Difference
This metric is the difference between the true positive rate of the unprivileged group and the true positive rate of the privileged group.
Fairer scores are close to 0.
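The calculation can be sketched on a handful of predictions. The code below is an illustrative plain-Python version under assumed toy data; the helper name true_positive_rate and the group markers are hypothetical and not part of the package.

```python
def true_positive_rate(y_true, y_pred, groups, group):
    """TP / (TP + FN), restricted to the members of `group`."""
    preds_for_positives = [
        p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1
    ]
    return sum(preds_for_positives) / len(preds_for_positives)


# Hypothetical toy data: first four rows unprivileged, last four privileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["u"] * 4 + ["p"] * 4

eod = (true_positive_rate(y_true, y_pred, groups, "u")
       - true_positive_rate(y_true, y_pred, groups, "p"))
print(eod)  # 0.5 - 1.0 = -0.5
```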
Average Absolute Odds Difference
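This measure uses both the false positive rate and the true positive rate to calculate the bias: it is the average of the absolute difference in false positive rates and the absolute difference in true positive rates between the unprivileged and privileged groups.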
Fairer scores are close to 0.
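A small worked example may help here. The sketch below computes the metric by hand in plain Python; the helper predicted_positive_rate and the toy data are assumptions made for illustration only.

```python
def predicted_positive_rate(y_true, y_pred, groups, group, actual):
    """Share of predictions equal to 1 among rows of `group` whose true label is `actual`.

    With actual=1 this is the true positive rate; with actual=0, the false positive rate.
    """
    preds = [
        p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == actual
    ]
    return sum(preds) / len(preds)


def average_abs_odds_difference(y_true, y_pred, groups, unprivileged, privileged):
    """0.5 * (|FPR_unpriv - FPR_priv| + |TPR_unpriv - TPR_priv|)."""
    fpr_gap = (predicted_positive_rate(y_true, y_pred, groups, unprivileged, 0)
               - predicted_positive_rate(y_true, y_pred, groups, privileged, 0))
    tpr_gap = (predicted_positive_rate(y_true, y_pred, groups, unprivileged, 1)
               - predicted_positive_rate(y_true, y_pred, groups, privileged, 1))
    return 0.5 * (abs(fpr_gap) + abs(tpr_gap))


# Hypothetical toy data: first four rows unprivileged, last four privileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["u"] * 4 + ["p"] * 4

aaod = average_abs_odds_difference(y_true, y_pred, groups, "u", "p")
print(aaod)  # 0.5 * (|0.0 - 0.5| + |0.5 - 1.0|) = 0.5
```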
Disparate Impact
For this metric we use the following formula:
$DI = \dfrac{P(Y = 1 \mid D = \text{unprivileged})}{P(Y = 1 \mid D = \text{privileged})}$
Like the first metric, it uses the probabilities that a random individual drawn from the unprivileged or the privileged group has a label of 1, but here as a ratio rather than a difference.
Fairer scores are close to 1.
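To make the ratio concrete, here is a minimal plain-Python sketch on assumed toy data. The function names and the group markers "u" and "p" are hypothetical and only serve the example.

```python
def positive_rate(labels, groups, group):
    """Share of the members of `group` whose label is 1."""
    selected = [y for y, g in zip(labels, groups) if g == group]
    return sum(selected) / len(selected)


def disparate_impact(labels, groups, unprivileged, privileged):
    """P(Y = 1 | unprivileged) / P(Y = 1 | privileged)."""
    return (positive_rate(labels, groups, unprivileged)
            / positive_rate(labels, groups, privileged))


# Hypothetical toy data: 1 means the favorable label.
labels = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["u", "u", "u", "u", "p", "p", "p", "p"]

di = disparate_impact(labels, groups, "u", "p")
print(round(di, 3))  # 0.25 / 0.75, about 0.333
```

Note that the ratio is undefined when no privileged individual has label 1; this sketch does not guard against that case.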
Theil Index
This measure is the generalized entropy index with $\alpha$ equal to 1. More information: Generalized Entropy Index.
Fairer scores are close to 0.
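The sketch below shows one way to compute the index by hand. It assumes the per-individual "benefit" convention that, to my understanding, aif360 uses, namely b_i = prediction_i - label_i + 1; treat that convention, the function name, and the toy data as assumptions rather than a description of the package's internals.

```python
import math


def theil_index(y_true, y_pred):
    """Generalized entropy index with alpha = 1 over per-individual benefits.

    Assumed benefit definition: b_i = y_pred_i - y_true_i + 1, so a false
    negative yields 0, a correct prediction 1, and a false positive 2.
    """
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mean_benefit = sum(benefits) / len(benefits)
    total = 0.0
    for b in benefits:
        if b > 0:  # the term for b = 0 is taken as 0 (limit of x * log x)
            ratio = b / mean_benefit
            total += ratio * math.log(ratio)
    return total / len(benefits)


# Hypothetical toy data: one false negative and one false positive.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]

theil = theil_index(y_true, y_pred)
print(round(theil, 4))  # ln(2) / 2, about 0.3466
```

When every prediction is correct, all benefits equal 1 and the index is 0.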
Some metrics need predictions, while others need only the original dataset. This is why we use two classes from the aif360 package: ClassificationMetric and BinaryLabelDatasetMetric.
For metrics that require predictions:
- Equal Opportunity Difference: equal_opportunity_difference()
- Average Absolute Odds Difference: average_abs_odds_difference()
- Theil Index: theil_index()
For metrics that don't require predictions:
- Statistical Parity Difference: statistical_parity_difference()
- Disparate Impact: disparate_impact()
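The two classes might be combined roughly as follows. This is a sketch, not the package's own code: the wrapper function compute_fairness_metrics, its parameters, and the assumption that the protected attribute and the label are already encoded as 0/1 (with 1 meaning privileged and favorable) are all hypothetical. It also assumes the DataFrame is fully numeric with no missing values.

```python
def compute_fairness_metrics(df, y_pred, label_col, protected_col):
    """Return the five metrics for a numeric DataFrame and 0/1 predictions.

    Hypothetical helper: assumes `protected_col` and `label_col` hold 0/1
    values, with 1 as the privileged value and the favorable label.
    """
    # Imported lazily so this snippet can be loaded without aif360 installed.
    import numpy as np
    from aif360.datasets import BinaryLabelDataset
    from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

    unprivileged = [{protected_col: 0}]
    privileged = [{protected_col: 1}]

    dataset_true = BinaryLabelDataset(
        df=df,
        label_names=[label_col],
        protected_attribute_names=[protected_col],
        favorable_label=1,
        unfavorable_label=0,
    )

    # Same features and protected attributes, with labels swapped for predictions.
    dataset_pred = dataset_true.copy(deepcopy=True)
    dataset_pred.labels = np.asarray(y_pred, dtype=float).reshape(-1, 1)

    # Metrics that only need the original dataset.
    data_metric = BinaryLabelDatasetMetric(
        dataset_true,
        unprivileged_groups=unprivileged,
        privileged_groups=privileged,
    )
    # Metrics that compare true labels with predictions.
    clf_metric = ClassificationMetric(
        dataset_true,
        dataset_pred,
        unprivileged_groups=unprivileged,
        privileged_groups=privileged,
    )

    return {
        "statistical_parity_difference": data_metric.statistical_parity_difference(),
        "disparate_impact": data_metric.disparate_impact(),
        "equal_opportunity_difference": clf_metric.equal_opportunity_difference(),
        "average_abs_odds_difference": clf_metric.average_abs_odds_difference(),
        "theil_index": clf_metric.theil_index(),
    }
```

Calling it would look something like compute_fairness_metrics(df, model.predict(X), "label", "sex"), where df, model, X, and the column names stand in for your own objects.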
Sample Display
The fairness metrics should display like this: