A comprehensive set of fairness metrics, datasets and algorithms to address and mitigate bias in Natural Language Processing.
Fair Language Processing (FairLangProc)
The Fair Language Processing package is an extensible open-source Python library containing techniques developed by the research community to help detect and mitigate bias in Natural Language Processing throughout the AI application lifecycle.
The FairLangProc package includes:
- Data sets to test for biases in NLP models.
- Metrics based on different philosophies to quantify said biases.
- Algorithms to mitigate biases.
The package has been created with the intention of encouraging the use of bias mitigation strategies in the NLP community, and with the hope of democratizing these tools for the ever-increasing set of NLP practitioners. We invite you to use it and improve it.
The companion paper provides a comprehensive introduction to the concepts and capabilities, with all code available in notebooks.
We have developed the package with extensibility in mind. This library is still in development. We encourage your contributions.
Supported fairness datasets
| Data Set | Size | Reference |
|---|---|---|
| BBQ | 58,492 | Parrish et al., 2021 |
| BEC-Pro | 5,400 | Bartl et al., 2020 |
| BOLD | 23,679 | Dhamala et al., 2021 |
| BUG | 108,419 | Levy et al., 2021 |
| CrowS-Pairs | 1,508 | Nangia et al., 2020 |
| GAP | 8,908 | Webster et al., 2018 |
| HolisticBias | 460,000 | Smith et al., 2022 |
| HONEST | 420 | Nozza et al., 2021 |
| StereoSet | 16,995 | Nadeem et al., 2020 |
| UnQover | 30 | Li et al., 2020 |
| WinoBias+ | 1,367 | Vanmassenhove et al., 2021 |
| WinoBias | 3,160 | Zhao et al., 2018 |
| WinoGender | 720 | Rudinger et al., 2018 |
Supported fairness metrics
- Generalized association tests (WEAT) (Caliskan et al., 2016)
- Log Probability Bias Score (LPBS) (Kurita et al., 2019)
- Categorical Bias Score (CBS) (Ahn et al., 2021)
- CrowS-Pairs Score (CPS) (Nangia et al., 2020)
- All Unmasked Likelihood (AUL) (Kaneko et al., 2021)
- Demographic Representation (DR) (Liang et al., 2022)
- Stereotypical Association (SA) (Liang et al., 2022)
- HONEST (Nozza et al., 2021)
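As an illustration of what the association tests in this list measure, here is a minimal NumPy sketch of the WEAT effect size. It is not FairLangProc's implementation or API: the function names `cosine_matrix` and `weat_effect_size` are hypothetical, the toy 2-D vectors stand in for real word embeddings, and the sample standard deviation (`ddof=1`) is just one common convention.

```python
import numpy as np


def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Every argument is a 2-D array holding one embedding per row.
    """
    def association(W):
        # s(w, A, B): mean similarity to A minus mean similarity to B
        return cosine_matrix(W, A).mean(axis=1) - cosine_matrix(W, B).mean(axis=1)

    s_x, s_y = association(X), association(Y)
    pooled = np.concatenate([s_x, s_y])
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)


# Toy vectors (assumption): X leans towards A, Y leans towards B.
A = np.array([[1.0, 0.0], [1.0, 0.1]])
B = np.array([[0.0, 1.0], [0.1, 1.0]])
X = np.array([[1.0, 0.2], [0.9, 0.1]])
Y = np.array([[0.2, 1.0], [0.1, 0.9]])
print(weat_effect_size(X, Y, A, B))
```

A positive value means the X targets sit closer to the A attributes than the Y targets do; swapping X and Y flips the sign.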
Supported bias mitigation algorithms
- Counterfactual Data Augmentation (CDA) (Webster et al., 2020)
- Projection-based debiasing (Bolukbasi et al., 2016)
- Bias removaL wIth No Demographics (BLIND) (Orgad et al., 2023)
- Adapter-based DEbiasing of LanguagE models (ADELE) (Lauscher et al., 2021)
- Modular Debiasing with Diff Subnetworks (Hauzenberger et al., 2023)
- Entropy Attention Temperature (EAT) scaling (Zayed et al., 2023)
- Entropy Attention Regularizer (EAR) (Attanasio et al., 2022)
- Embedding based regularizer (Liu et al., 2020)
- Selective unfreezing (Gira et al., 2024)
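To give a feel for two of the simpler ideas in this list, the sketch below shows a toy version of counterfactual data augmentation and of projection-based debiasing. It is an illustration under stated assumptions and not the package's own code: the word-pair list `PAIRS` is a tiny hypothetical stand-in for a curated list, and the names `counterfactual`, `augment` and `remove_direction` are made up for this example.

```python
import re

import numpy as np

# Hypothetical, deliberately tiny word-pair list (assumption, not from the package).
PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("king", "queen")]
SWAP = {**{a: b for a, b in PAIRS}, **{b: a for a, b in PAIRS}}


def counterfactual(sentence):
    """Swap every listed word for its counterpart, keeping initial capitals."""
    def swap(match):
        word = match.group(0)
        repl = SWAP.get(word.lower())
        if repl is None:
            return word  # not in the list: leave the word untouched
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+", swap, sentence)


def augment(corpus):
    """Return the corpus plus a counterfactual copy of each changed sentence."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        flipped = counterfactual(sentence)
        if flipped != sentence:
            augmented.append(flipped)
    return augmented


def remove_direction(E, d):
    """Project each row of E onto the orthogonal complement of direction d."""
    d = d / np.linalg.norm(d)
    return E - np.outer(E @ d, d)


print(augment(["He is a father.", "The sky is blue."]))
```

The augmentation only adds a sentence when a swap actually changed it, so neutral sentences are not duplicated. In `remove_direction`, `d` would typically be an estimated bias direction, for example the difference between two embeddings; how that direction is estimated is outside the scope of this sketch.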
Setup
Python
FairLangProc has been tested and run with Python 3.13. Compatibility with older versions is possible and expected, although no other configurations have been tested.
To install the latest stable version from PyPI, run:
pip install FairLangProc
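One way to confirm that the installation succeeded is to ask Python's standard `importlib.metadata` module for the installed version. This is a small sketch, not part of the package; the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("FairLangProc"))
```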
Manual installation
Clone the latest version of this repository:
git clone https://github.com/arturo-perez-peralta/FairLangProc
Using FairLangProc
The notebooks directory contains a diverse collection of Jupyter notebooks that showcase how to use the different processors, metrics and data sets. Some examples require the data sets; to run them, download the data sets and place them in a folder named Fair-LLM-Benchmarks inside the 'FairLangProc/datasets' path, or simply clone the repository from Gallegos et al.
Download files
File details
Details for the file fairlangproc-0.1.0.tar.gz.
File metadata
- Download URL: fairlangproc-0.1.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52bac2d4c1d2e1b282cad9e9476d36e81fc18392960ce7c1fbef8d5bab78a369 |
| MD5 | 015fbb599e913d2edd97d0bed672c044 |
| BLAKE2b-256 | 6322c503db389256417ddfa0ec69f0274b3e6a0f2530620d757fe11229ec71be |
File details
Details for the file fairlangproc-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fairlangproc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03d9db52aca377e5236db38ede378b37e366d7d6d5317ded50f4d13ef4991875 |
| MD5 | 365bf4a46e42e1f9bdb8b940d48a822a |
| BLAKE2b-256 | 0785ff91df57b6716fd38d08f5ab79fcae781dbfba91eae59ec025e3b71c9fcc |
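The digests listed above can be used to check a downloaded file. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches` are hypothetical, and the example assumes the source distribution has been saved in the current directory under its listed file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed local file name; the check is skipped if the file is absent.
sdist = Path("fairlangproc-0.1.0.tar.gz")
expected = "52bac2d4c1d2e1b282cad9e9476d36e81fc18392960ce7c1fbef8d5bab78a369"
if sdist.exists():
    print("digest matches" if matches(sdist, expected) else "digest mismatch")
```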