GuardBench: A Large-Scale Benchmark for Guardrail Models
Project description
GuardBench
🔥 News
- [October 9, 2025] GuardBench now supports four additional datasets: JBB Behaviors, NicheHazardQA, HarmEval, and TechHazardQA. Also, it now allows for choosing the metrics to show at the end of the evaluation. Supported metrics are:
precision(Precision),recall(Recall),f1(F1),mcc(Matthews Correlation Coefficient),auprc(AUPRC),sensitivity(Sensitivity),specificity(Specificity),g_mean(G-Mean),fpr(False Positive Rate),fnr(False Negative Rate).
⚡️ Introduction
GuardBench is a Python library for the evaluation of guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions.
GuardBench provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability.
It also allows to quickly compare results and export LaTeX tables for scientific publications.
GuardBench's benchmarking pipeline can also be leveraged on custom datasets.
GuardBench was featured in EMNLP 2024.
The related paper is available here.
GuardBench has a public leaderboard available on HuggingFace.
You can find the list of supported datasets here. A few of them requires authorization. Please, read this.
If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.
✨ Features
- 40 datasets for guardrail models evaluation.
- Automated evaluation pipeline.
- User-friendly.
- Extendable.
- Reproducible and sharable evaluation.
- Exportable evaluation reports.
🔌 Requirements
python>=3.10
💾 Installation
pip install guardbench
💡 Usage
from guardbench import benchmark
def moderate(
conversations: list[list[dict[str, str]]], # MANDATORY!
# additional `kwargs` as needed
) -> list[float]:
# do moderation
# return list of floats (unsafe probabilities)
benchmark(
moderate=moderate, # User-defined moderation function
model_name="My Guardrail Model",
batch_size=1, # Default value
datasets="all", # Default value
metrics=["f1", "recall"], # Default value
# Note: you can pass additional `kwargs` for `moderate`
)
📖 Examples
- Follow our tutorial on benchmarking
Llama GuardwithGuardBench. - More examples are available in the
scriptsfolder.
📚 Documentation
Browse the documentation for more details about:
- The datasets and how to obtain them.
- The data format used by
GuardBench. - How to use the
Reportclass to compare models and export results asLaTeXtables. - How to leverage
GuardBench's benchmarking pipeline on custom datasets.
🏆 Leaderboard
You can find GuardBench's leaderboard here. If you want to submit your results, please contact us.
👨💻 Authors
- Elias Bassani (European Commission - Joint Research Centre)
🎓 Citation
@inproceedings{guardbench,
title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
author = "Bassani, Elias and
Sanchez, Ignacio",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1022",
doi = "10.18653/v1/2024.emnlp-main.1022",
pages = "18393--18409",
}
🎁 Feature Requests
Would you like to see other features implemented? Please, open a feature request.
📄 License
GuardBench is provided as open-source software licensed under EUPL v1.2.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file guardbench-1.0.1.tar.gz.
File metadata
- Download URL: guardbench-1.0.1.tar.gz
- Upload date:
- Size: 56.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9082f44b79e697de3d50e299da95921172e8a5de42a003cd68d23bcf9b226135
|
|
| MD5 |
0b21657e03dbc01bfb69d06913217081
|
|
| BLAKE2b-256 |
e51bf4e50f45840790b4136029bb12c05805755b8d1e5cb04b15622e4f4fb539
|
File details
Details for the file guardbench-1.0.1-py3-none-any.whl.
File metadata
- Download URL: guardbench-1.0.1-py3-none-any.whl
- Upload date:
- Size: 83.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a701929ba6b8748e47c2a7b1eddd56b1bf6890fa670ac19673f1dbce66cd4046
|
|
| MD5 |
32e2984f1ad936c1674810f8ff579161
|
|
| BLAKE2b-256 |
e79bbeb6b98ee85c5741e4a632fe3f44573c13313d9edc1f3c813ccd8a7da8de
|