A comprehensive set of fairness metrics, datasets and algorithms to address and mitigate bias in Natural Language Processing.
Fair Language Processing (FairLangProc)
The Fair Language Processing package is an extensible open-source Python library containing techniques developed by the research community to help detect and mitigate bias in Natural Language Processing throughout the AI application lifecycle.
The FairLangProc package includes:
- Data sets to test for biases in NLP models.
- Metrics based on different philosophies to quantify said biases.
- Algorithms to mitigate biases.
It has been created with the intention of encouraging the use of bias mitigation strategies in the NLP community, and with the hope of democratizing these tools for the ever-increasing set of NLP practitioners. We invite you to use it and improve it.
- Companion paper: https://arxiv.org/abs/2508.03677.
- Source code: https://github.com/arturo-perez-peralta/FairLangProc/tree/main/FairLangProc.
- Notebooks with examples: https://github.com/arturo-perez-peralta/FairLangProc/tree/main/notebooks.
- Documentation: https://fairlangproc.readthedocs.io/en/latest/.
We have developed the package with extensibility in mind. This library is still in development. We encourage your contributions.
Supported fairness datasets
| Data Set | Size | Reference |
|---|---|---|
| BBQ | 58,492 | Parrish et al., 2021 |
| BEC-Pro | 5,400 | Bartl et al., 2020 |
| BOLD | 23,679 | Dhamala et al., 2021 |
| BUG | 108,419 | Levy et al., 2021 |
| CrowS-Pairs | 1,508 | Nangia et al., 2020 |
| GAP | 8,908 | Webster et al., 2018 |
| HolisticBias | 460,000 | Smith et al., 2022 |
| HONEST | 420 | Nozza et al., 2021 |
| StereoSet | 16,995 | Nadeem et al., 2020 |
| UnQover | 30 | Li et al., 2020 |
| WinoBias+ | 1,367 | Vanmassenhove et al., 2021 |
| WinoBias | 3,160 | Zhao et al., 2018 |
| WinoGender | 720 | Rudinger et al., 2018 |
Supported fairness metrics
- Generalized association tests (WEAT) (Caliskan et al., 2016)
- Log Probability Bias Score (LPBS) (Kurita et al., 2019)
- Categorical Bias Score (CBS) (Ahn et al., 2021)
- CrowS-Pairs Score (CPS) (Nangia et al., 2020)
- All Unmasked Likelihood (AUL) (Kaneko et al., 2021)
- Demographic Representation (DR) (Liang et al., 2022)
- Stereotypical Association (SA) (Liang et al., 2022)
- HONEST (Nozza et al., 2021)
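As an illustration of the embedding-based family of metrics listed above, the following is a minimal sketch of the WEAT effect size computed on toy vectors. It is not FairLangProc's API: the function names (`cosine`, `association`, `weat_effect_size`) and the two-dimensional vectors are hypothetical, and the use of the sample standard deviation is an assumption, since implementations differ on this point.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of X and Y, scaled by the std dev over X and Y."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Sample standard deviation over the pooled targets (an assumption, see lead-in).
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy two-dimensional "embeddings" (hypothetical, for illustration only).
targets_x = [[1.0, 0.1], [0.9, 0.2]]   # target words leaning towards attribute A
targets_y = [[0.1, 1.0], [0.2, 0.9]]   # target words leaning towards attribute B
attrs_a = [[1.0, 0.0]]
attrs_b = [[0.0, 1.0]]

effect = weat_effect_size(targets_x, targets_y, attrs_a, attrs_b)
print(f"WEAT effect size on toy data: {effect:.3f}")
```

A positive value indicates that the X targets are more strongly associated with attribute set A than the Y targets are; swapping X and Y flips the sign.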
Supported bias mitigation algorithms
- Counterfactual Data Augmentation (CDA) (Webster et al., 2020)
- Projection-based debiasing (Bolukbasi et al., 2016)
- Bias removaL wIth No Demographics (BLIND) (Orgad et al., 2023)
- Adapter-based DEbiasing of LanguagE models (ADELE) (Lauscher et al., 2021)
- Modular Debiasing with Diff Subnetworks (Hauzenberger et al., 2023)
- Entropy Attention Temperature (EAT) scaling (Zayed et al., 2023)
- Entropy Attention Regularizer (EAR) (Attanasio et al., 2022)
- Embedding-based regularizer (Liu et al., 2020)
- Selective unfreezing (Gira et al., 2024)
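To give a feel for the simplest of the algorithms listed above, here is a minimal sketch of two-sided Counterfactual Data Augmentation. It is not the package's implementation: the word-pair list, `counterfactual`, and `augment` are hypothetical names chosen for illustration, and a real word list would be far larger and would need to handle ambiguous words such as "her".

```python
import re

# Hypothetical, deliberately tiny word-pair list; real CDA relies on curated lexicons.
PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("king", "queen")]

# Build a bidirectional lookup so each word maps to its counterpart.
SWAP = {}
for first, second in PAIRS:
    SWAP[first] = second
    SWAP[second] = first

# Match any listed word as a whole word, ignoring case.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b", re.IGNORECASE)


def counterfactual(sentence):
    """Swap every listed word for its counterpart, keeping initial capitalisation."""
    def swap(match):
        word = match.group(0)
        replacement = SWAP[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, sentence)


def augment(corpus):
    """Return the corpus with a counterfactual copy after each sentence that changes."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        flipped = counterfactual(sentence)
        if flipped != sentence:
            augmented.append(flipped)
    return augmented


print(augment(["He is a father.", "It rains."]))
```

Keeping both the original and the swapped sentence is what makes this the two-sided variant; the one-sided variant would keep only the swapped sentence.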
Setup
Python
To install the latest stable version from PyPI, run:
pip install FairLangProc
The package has been tested and run with both Python 3.10 and Python 3.13. Compatibility with older versions is possible and expected, although we are still testing older configurations. The minimum tested versions of the requirements are:
- pandas>=2.2.3
- scikit-learn>=1.6.1
- torch>=2.6.0
- transformers>=4.47.1
- datasets>=3.4.1
- adapter-transformers>=1.1.0
- accelerate>=0.26.0
- pytest>=8.4.1
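The minimum versions above can be checked against the current environment with the standard library alone. The following is a minimal sketch, not part of FairLangProc: `version_tuple`, `at_least`, and `check_requirements` are hypothetical helper names, and the version parsing is intentionally simplified (it ignores pre-release and local suffixes).

```python
from importlib import metadata

# Minimum tested versions, copied from the list above.
MINIMUM_VERSIONS = {
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "transformers": "4.47.1",
    "datasets": "3.4.1",
    "adapter-transformers": "1.1.0",
    "accelerate": "0.26.0",
    "pytest": "8.4.1",
}


def version_tuple(version):
    """Turn '2.6.0+cu124' into (2, 6, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums):
    """Return {package: status}, where status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


for package, status in check_requirements(MINIMUM_VERSIONS).items():
    print(f"{package}: {status}")
```

For strict version semantics, a dedicated parser such as the one in the `packaging` library is the safer choice; the tuple comparison here only aims to catch obviously outdated installs.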
Manual installation
Clone the latest version of this repository:
git clone https://github.com/arturo-perez-peralta/FairLangProc
Using FairLangProc
The notebooks directory contains a diverse collection of Jupyter notebooks that showcase how to use the different processors, metrics and data sets. If you'd like to run the examples that require the data sets, download them now and place them in a folder named Fair-LLM-Benchmarks inside the 'FairLangProc/datasets' path, or simply clone the repository from Gallegos et al.
Using the BiasDataLoader
To run the BiasDataLoader, you first need to download the data sets from I. Gallegos's repository, https://github.com/i-gallegos/Fair-LLM-Benchmark. To do this, first find the path of the installed package, which can be done with the following command:
python -c "import FairLangProc; print(FairLangProc.__file__)"
Now you only need to download the datasets:
git clone https://github.com/i-gallegos/Fair-LLM-Benchmark [absolute path to your Python packages folder]/FairLangProc/datasets
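The two steps above (locating the installed package and cloning the benchmark repository next to it) can also be scripted. The following is a minimal sketch under stated assumptions: `datasets_dir`, `clone_command`, and `clone_benchmarks` are hypothetical helpers, not FairLangProc functions, and the target directory mirrors the command above, so adjust it if your layout uses a Fair-LLM-Benchmarks subfolder.

```python
import subprocess
from importlib.util import find_spec
from pathlib import Path

BENCHMARK_URL = "https://github.com/i-gallegos/Fair-LLM-Benchmark"


def datasets_dir(package="FairLangProc"):
    """Return <package directory>/datasets, or None if the package cannot be found."""
    spec = find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent / "datasets"


def clone_command(target):
    """Build the git command that clones the benchmark repository into target."""
    return ["git", "clone", BENCHMARK_URL, str(target)]


def clone_benchmarks(target):
    """Run the clone; git refuses to clone into an existing non-empty directory."""
    subprocess.run(clone_command(target), check=True)


if __name__ == "__main__":
    target = datasets_dir()
    if target is None:
        print("FairLangProc is not installed in this environment.")
    else:
        # Only print the command; call clone_benchmarks(target) to actually run it.
        print("Would run:", " ".join(clone_command(target)))
```

Printing the command first makes it easy to confirm the resolved path before anything is written to the packages folder.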
Credits
For attribution in academic contexts, please use the BibTeX entry below:
@misc{pérezperalta2025fairlangprocpythonpackagefairness,
title={FairLangProc: A Python package for fairness in NLP},
author={Arturo Pérez-Peralta and Sandra Benítez-Peña and Rosa E. Lillo},
year={2025},
eprint={2508.03677},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.03677},
}
We thank Víctor Agulló for his input on the many questions that arose during the making of the package, as well as for his contributions to different parts of the code, especially those related to the BiasDataLoader method.