A comprehensive set of fairness metrics, datasets and algorithms to address and mitigate bias in Natural Language Processing.

Fair Language Processing (FairLangProc)

License: MIT PyPI version Docs Python 3.10 Tests

The Fair Language Processing package is an extensible open-source Python library containing techniques developed by the research community to help detect and mitigate bias in Natural Language Processing throughout the AI application lifecycle.

The FairLangProc package includes:

  1. Data sets to test for biases in NLP models.
  2. Metrics based on different philosophies to quantify said biases.
  3. Algorithms to mitigate biases.

It has been created with the intention of encouraging the use of bias mitigation strategies in the NLP community, and with the hope of democratizing these tools for the ever-increasing set of NLP practitioners. We invite you to use it and improve it.

We have developed the package with extensibility in mind. This library is still in development. We encourage your contributions.

Supported fairness datasets

Data Set        Size       Reference
BBQ             58,492     Parrish et al., 2021
BEC-Pro         5,400      Bartl et al., 2020
BOLD            23,679     Dhamala et al., 2021
BUG             108,419    Levy et al., 2021
CrowS-Pairs     1,508      Nangia et al., 2020
GAP             8,908      Webster et al., 2018
HolisticBias    460,000    Smith et al., 2022
HONEST          420        Nozza et al., 2021
StereoSet       16,995     Nadeem et al., 2020
UnQover         30         Li et al., 2020
WinoBias+       1,367      Vanmassenhove et al., 2021
WinoBias        3,160      Zhao et al., 2018
WinoGender      720        Rudinger et al., 2018

Supported fairness metrics

Supported bias mitigation algorithms

Setup

Python

To install the latest stable version from PyPI, run:

pip install FairLangProc

The package has been tested and run with both Python 3.13 and Python 3.10. Compatibility with older versions is possible and expected, although we are still testing older configurations. The minimum tested versions of the requirements are:

  • pandas>=2.2.3
  • scikit-learn>=1.6.1
  • torch>=2.6.0
  • transformers>=4.47.1
  • datasets>=3.4.1
  • adapter-transformers>=1.1.0
  • accelerate>=0.26.0
  • pytest>=8.4.1
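
The minimum tested versions listed above can be compared against the current environment. The following is a minimal sketch that uses only the Python standard library. The helper names version_tuple and check_requirements are hypothetical and are not part of FairLangProc, and the version parsing is deliberately simple: it compares only the first three numeric components.

```python
import re
from importlib import metadata

# Minimum tested versions, copied from the requirements list above.
MINIMUM_TESTED = {
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "transformers": "4.47.1",
    "datasets": "3.4.1",
    "adapter-transformers": "1.1.0",
    "accelerate": "0.26.0",
    "pytest": "8.4.1",
}


def version_tuple(version, width=3):
    """Turn a version string such as '2.6.0+cu124' into (2, 6, 0).

    Hypothetical helper: only the leading digits of the first `width`
    dot-separated pieces are kept, and missing pieces are padded with zeros.
    """
    numbers = []
    for piece in version.split("+")[0].split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers + [0] * (width - len(numbers)))


def check_requirements(minimums=MINIMUM_TESTED):
    """Map each distribution name to (installed version or None, meets minimum)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        meets_minimum = version_tuple(installed) >= version_tuple(minimum)
        report[name] = (installed, meets_minimum)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or below the minimum tested version"
        print(f"{name}: {installed} ({status})")
```

A package reported as below the minimum is not necessarily incompatible, since the versions above are only the oldest ones that have been tested.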

Manual installation

Clone the latest version of this repository:

git clone https://github.com/arturo-perez-peralta/FairLangProc

Using FairLangProc

The notebooks directory contains a collection of Jupyter notebooks that showcase how to use the different processors, metrics and data sets. To run the examples that need the benchmark data sets, download them and place them in a folder named Fair-LLM-Benchmarks inside the 'FairLangProc/datasets' path, or simply clone the repository from Gallegos et al.

Using the BiasDataLoader

To run the BiasDataLoader, you first need to download the datasets from I. Gallegos's repository, https://github.com/i-gallegos/Fair-LLM-Benchmark. Start by finding the path where the package is installed, which can be done with the following command:

python -c "import FairLangProc; print(FairLangProc.__file__)" 

Now you only need to download the datasets:

git clone https://github.com/i-gallegos/Fair-LLM-Benchmark [absolute path to your Python packages folder]/FairLangProc/datasets
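
The two steps above (locating the installed package, then cloning the benchmark repository into its datasets folder) can also be scripted. The following is a rough sketch under stated assumptions: the helper names package_dir, clone_command and download_benchmarks are hypothetical and not part of FairLangProc, git is assumed to be on the PATH, and the target path simply mirrors the command shown above.

```python
import importlib.util
import subprocess
from pathlib import Path

# Repository URL taken from the instructions above.
BENCHMARK_URL = "https://github.com/i-gallegos/Fair-LLM-Benchmark"


def package_dir(package_name):
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package_name} is not installed")
    # spec.origin points at the package's __init__.py file.
    return Path(spec.origin).parent


def clone_command(target, url=BENCHMARK_URL):
    """Build the git command that clones the benchmark repository into target."""
    return ["git", "clone", url, str(target)]


def download_benchmarks(package_name="FairLangProc"):
    """Clone the benchmarks into <package>/datasets, as in the command above."""
    target = package_dir(package_name) / "datasets"
    subprocess.run(clone_command(target), check=True)
    return target


# Example usage (performs a network download, so it is left commented out):
# download_benchmarks()
```

Note that git refuses to clone into a directory that already exists and is not empty, so if the datasets folder already has content, a subfolder may be needed as the target instead.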

Credits

For attribution in academic contexts, please use the BibTeX entry below:

@misc{pérezperalta2025fairlangprocpythonpackagefairness,
      title={FairLangProc: A Python package for fairness in NLP}, 
      author={Arturo Pérez-Peralta and Sandra Benítez-Peña and Rosa E. Lillo},
      year={2025},
      eprint={2508.03677},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.03677}, 
}

We thank Víctor Agulló for his input on the many questions that arose during the making of the package, as well as for his contributions to different parts of the code, especially those related to the BiasDataLoader method.

