GuardBench: A Large-Scale Benchmark for Guardrail Models

These details have not been verified by PyPI

Project description

GuardBench

🔥 News

[October 9, 2025] GuardBench now supports four additional datasets: JBB Behaviors, NicheHazardQA, HarmEval, and TechHazardQA. Also, it now allows for choosing the metrics to show at the end of the evaluation. Supported metrics are: precision (Precision), recall (Recall), f1 (F1), mcc (Matthews Correlation Coefficient), auprc (AUPRC), sensitivity (Sensitivity), specificity (Specificity), g_mean (G-Mean), fpr (False Positive Rate), fnr (False Negative Rate).

⚡️ Introduction

GuardBench is a Python library for the evaluation of guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions. GuardBench provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability. It also allows to quickly compare results and export LaTeX tables for scientific publications. GuardBench's benchmarking pipeline can also be leveraged on custom datasets.

GuardBench was featured in EMNLP 2024. The related paper is available here.

GuardBench has a public leaderboard available on HuggingFace.

You can find the list of supported datasets here. A few of them requires authorization. Please, read this.

If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.

✨ Features

40 datasets for guardrail models evaluation.
Automated evaluation pipeline.
User-friendly.
Extendable.
Reproducible and sharable evaluation.
Exportable evaluation reports.

🔌 Requirements

python>=3.10

💾 Installation

pip install guardbench

💡 Usage

from guardbench import benchmark

def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional `kwargs` as needed
) -> list[float]:
    # do moderation
    # return list of floats (unsafe probabilities)

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=1,              # Default value
    datasets="all",            # Default value
    metrics=["f1", "recall"],  # Default value
    # Note: you can pass additional `kwargs` for `moderate`
)

📖 Examples

Follow our tutorial on benchmarking Llama Guard with GuardBench.
More examples are available in the scripts folder.

📚 Documentation

Browse the documentation for more details about:

The datasets and how to obtain them.
The data format used by GuardBench.
How to use the Report class to compare models and export results as LaTeX tables.
How to leverage GuardBench's benchmarking pipeline on custom datasets.

🏆 Leaderboard

You can find GuardBench's leaderboard here. If you want to submit your results, please contact us.

👨‍💻 Authors

Elias Bassani (European Commission - Joint Research Centre)

🎓 Citation

@inproceedings{guardbench,
    title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
    author = "Bassani, Elias  and
      Sanchez, Ignacio",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1022",
    doi = "10.18653/v1/2024.emnlp-main.1022",
    pages = "18393--18409",
}

🎁 Feature Requests

Would you like to see other features implemented? Please, open a feature request.

📄 License

GuardBench is provided as open-source software licensed under EUPL v1.2.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.0.1

Oct 9, 2025

1.0.0

Nov 12, 2024

0.0.1

Aug 2, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

guardbench-1.0.1.tar.gz (56.1 kB view details)

Uploaded Oct 9, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

guardbench-1.0.1-py3-none-any.whl (83.4 kB view details)

Uploaded Oct 9, 2025 Python 3

File details

Details for the file guardbench-1.0.1.tar.gz.

File metadata

Download URL: guardbench-1.0.1.tar.gz
Upload date: Oct 9, 2025
Size: 56.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for guardbench-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`9082f44b79e697de3d50e299da95921172e8a5de42a003cd68d23bcf9b226135`
MD5	`0b21657e03dbc01bfb69d06913217081`
BLAKE2b-256	`e51bf4e50f45840790b4136029bb12c05805755b8d1e5cb04b15622e4f4fb539`

See more details on using hashes here.

File details

Details for the file guardbench-1.0.1-py3-none-any.whl.

File metadata

Download URL: guardbench-1.0.1-py3-none-any.whl
Upload date: Oct 9, 2025
Size: 83.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for guardbench-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a701929ba6b8748e47c2a7b1eddd56b1bf6890fa670ac19673f1dbce66cd4046`
MD5	`32e2984f1ad936c1674810f8ff579161`
BLAKE2b-256	`e79bbeb6b98ee85c5741e4a632fe3f44573c13313d9edc1f3c813ccd8a7da8de`

See more details on using hashes here.

guardbench 1.0.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

GuardBench

🔥 News

⚡️ Introduction

✨ Features

🔌 Requirements

💾 Installation

💡 Usage

📖 Examples

📚 Documentation

🏆 Leaderboard

👨‍💻 Authors

🎓 Citation

🎁 Feature Requests

📄 License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes