LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

Project description

LangFair: Use-Case Level LLM Bias and Fairness Assessments

LangFair is a Python library designed for conducting bias and fairness assessments of large language model (LLM) use cases. The repository includes a framework for choosing bias and fairness metrics, along with demo notebooks and a technical playbook that discusses LLM bias and fairness risks, evaluation metrics, and best practices.

Explore our documentation site for detailed instructions on using LangFair.

🚀 Why Choose LangFair?

Static benchmark assessments are typically assumed to be sufficiently representative, but they often fall short of capturing the risks associated with the full range of LLM use cases. LLMs are increasingly used in applications such as recommendation systems, classification, text generation, and summarization, and evaluating them without considering use-case-specific prompts can lead to misleading assessments of their performance, especially regarding bias and fairness risks.

LangFair addresses this gap by adopting a Bring Your Own Prompts (BYOP) approach, allowing users to tailor bias and fairness evaluations to their specific use cases. This ensures that the metrics computed reflect the true performance of the LLMs in real-world scenarios, where prompt-specific risks are critical. Additionally, LangFair's focus is on output-based metrics that are practical for governance audits and real-world testing, without needing access to internal model states.

⚡ Quickstart Guide

(Optional) Create a virtual environment for using LangFair

We recommend creating a new virtual environment with venv before installing LangFair. To do so, please follow the instructions here.

Installing LangFair

The latest version can be installed from PyPI:

pip install langfair

Usage Example

Below is sample code illustrating how to use LangFair's AutoEval class for text generation and summarization use cases. The example assumes the user has already defined the parameters DEPLOYMENT_NAME, API_KEY, API_BASE, API_TYPE, and API_VERSION, along with a list of prompts from their use case stored in a variable named prompts.

Create a LangChain LLM object.

from langchain_openai import AzureChatOpenAI
# import torch # uncomment if GPU is available
# device = torch.device("cuda") # uncomment if GPU is available

llm = AzureChatOpenAI(
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    azure_endpoint=API_BASE,
    openai_api_type=API_TYPE,
    openai_api_version=API_VERSION,
    temperature=0.4 # User to set temperature
)
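
AutoEval takes a LangChain chat model object, so providers other than Azure OpenAI should also work. Below is a minimal sketch assuming a standard OpenAI API key is available in the environment; the model name is only an example and is not prescribed by LangFair.

from langchain_openai import ChatOpenAI

# Alternative LLM object (assumes OPENAI_API_KEY is set in the environment).
# The model name below is illustrative only.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.4,
)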

Run AutoEval for automated bias and fairness evaluation.

from langfair.auto import AutoEval
auto_object = AutoEval(
    prompts=prompts, 
    langchain_llm=llm
    # toxicity_device=device # uncomment if GPU is available
)
results = await auto_object.evaluate()  # await requires an async context (e.g., a Jupyter notebook or asyncio.run)

Export the results to a .txt file and print them to the console.

auto_object.export_results(file_name="metric_values.txt")
auto_object.print_results()
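
The evaluate call also returns the metric values as a dictionary, so they can be inspected programmatically. A minimal sketch is below; the exact key names in the returned dictionary are an assumption and may vary by version, so please verify against the documentation site.

# Inspect returned metric values (the "metrics" key is an assumption; check the docs).
metric_values = results.get("metrics", {})
for group, values in metric_values.items():
    print(group, values)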

📚 Example Notebooks

Explore our demo notebooks to see how to use LangFair to compute various bias and fairness evaluation metrics.

🛠 Choosing Bias and Fairness Metrics for an LLM Use Case

Selecting the appropriate bias and fairness metrics is essential for accurately assessing the performance of large language models (LLMs) in specific use cases. Instead of attempting to compute all possible metrics, practitioners should focus on a relevant subset that aligns with their specific goals and the context of their application.

Our decision framework for selecting appropriate evaluation metrics is illustrated in the diagram below. For more details, refer to our technical playbook.

Note: Fairness through unawareness means that none of the prompts for an LLM use case mention protected attribute words.
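
As a rough illustration of checking fairness through unawareness, the sketch below scans a list of prompts for a small, hand-picked set of protected attribute words. The word list and helper function are hypothetical and far from exhaustive; in practice, a curated lexicon or LangFair's own tooling should be preferred.

# Hypothetical, illustrative FTU check; not part of the LangFair API.
PROTECTED_ATTRIBUTE_WORDS = ["male", "female", "muslim", "christian", "black", "white", "asian"]

def mentions_protected_attribute(prompt: str) -> bool:
    """Return True if the prompt contains any word from the illustrative protected list."""
    tokens = prompt.lower().split()
    return any(word in tokens for word in PROTECTED_ATTRIBUTE_WORDS)

example_prompts = [
    "Summarize the customer's complaint in two sentences.",
    "Write a short bio for a female engineer.",
]
ftu_satisfied = not any(mentions_protected_attribute(p) for p in example_prompts)
print("Fairness through unawareness satisfied:", ftu_satisfied)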

📊 Supported Bias and Fairness Metrics

Bias and fairness metrics offered by LangFair are grouped into several categories. The full suite of metrics is displayed below.

Toxicity Metrics
Counterfactual Fairness Metrics
Stereotype Metrics
Recommendation (Counterfactual) Fairness Metrics
Classification Fairness Metrics
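
For users who want to compute a single metric group outside of AutoEval, a minimal sketch is below. It assumes the use-case prompts and corresponding model responses are already available as parallel lists of strings; the import path, class name, and evaluate signature follow the toxicity metrics group but may differ across versions, so please confirm against the documentation site.

from langfair.metrics.toxicity import ToxicityMetrics

# prompts and responses are assumed to be parallel lists of strings
# generated for the use case (e.g., with the LLM object defined above).
tm = ToxicityMetrics()
toxicity_results = tm.evaluate(prompts=prompts, responses=responses)
print(toxicity_results)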

📖 Associated Research

A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting them are contained in this paper. If you use our framework for selecting evaluation metrics, we would appreciate a citation to the following paper:

@misc{bouchard2024actionableframeworkassessingbias,
      title={An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases}, 
      author={Dylan Bouchard},
      year={2024},
      eprint={2407.10853},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.10853}, 
}

📄 Code Documentation

Please refer to our documentation site for more details on how to use LangFair.

🤝 Development Team

The open-source version of LangFair is the culmination of extensive work carried out by a dedicated team of developers. While the internal commit history will not be made public, we believe it is essential to acknowledge the significant contributions of our development team, who were instrumental in bringing this project to fruition.

🤗 Contributing

Contributions are welcome. Please refer here for instructions on how to contribute to LangFair.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langfair-0.1.2a0.tar.gz (50.6 kB)

Uploaded Source

Built Distribution

langfair-0.1.2a0-py3-none-any.whl (89.1 kB)

Uploaded Python 3

File details

Details for the file langfair-0.1.2a0.tar.gz.

File metadata

  • Download URL: langfair-0.1.2a0.tar.gz
  • Upload date:
  • Size: 50.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.15 Linux/5.10.0-33-cloud-amd64

File hashes

Hashes for langfair-0.1.2a0.tar.gz

  • SHA256: fe411f37137e7ff7ad6ae587a1fc952ebb6f40d2d0011403fbb25e6501a8644f
  • MD5: c03ef23938aee5945b2f893a7eabd468
  • BLAKE2b-256: 037e5cdf1533c1babd9a2dafd27a5afa08fa2d571541db50b956611112343da7

See more details on using hashes here.

File details

Details for the file langfair-0.1.2a0-py3-none-any.whl.

File metadata

  • Download URL: langfair-0.1.2a0-py3-none-any.whl
  • Upload date:
  • Size: 89.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.15 Linux/5.10.0-33-cloud-amd64

File hashes

Hashes for langfair-0.1.2a0-py3-none-any.whl

  • SHA256: 372f5afc873b176a228c8ade78bc4d82fb647125bf24d920a5e5380c1849cfcc
  • MD5: 2bf58a8c0a2076e43b3e3019a37a2bac
  • BLAKE2b-256: ad6e9da1c4dde29671dcfd69efd24a04d0f87879285bddad7eea4608776f85d6

See more details on using hashes here.
