Pluralistic alignment evaluation benchmark for LLMs
PERSONA Bench
Reproducible Testbed for Evaluating and Improving Language Model Alignment with Diverse User Values
📄 Paper | 🗃️ Research Visualizations | 🤗 Hugging Face [Coming Soon] | 📚 Documentation
🌐 SynthLabs Research | 👥 Join the Team | 🤝 Let's Collaborate
PERSONA Bench is an extension of the PERSONA framework introduced in Castricato et al. 2024. It provides a reproducible testbed for evaluating and improving the alignment of language models with diverse user values.
Introduction
PERSONA established a strong correlation between the judgments of human evaluators and those of language models on persona-based personalization tasks. Building on this foundation, we've developed a suite of evaluations that test a model's ability to perform personalization-related tasks. This repository gives practitioners tools to assess and improve the pluralistic alignment of their language models.
Our evaluation suite uses inspect-ai to run a range of assessments on persona-based tasks, reporting model performance across demographic intersections, feature importance, and personalization capability.
Key Features
- 🎭 Main Evaluation: Assess personalized response generation
- 🧩 Leave One Out Analysis: Measure attribute impact on performance
- 🌐 Intersectionality: Evaluate model performance across different demographic intersections
- 🎯 Pass@K: Determine attempts needed for successful personalization
Quick Start
- Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
- Install the package:
poetry add persona-bench
- Use in your Python script:
from persona_evaluation import evaluate_model
results = evaluate_model("gpt-3.5-turbo", evaluation_type="main")
print(results.summary())
Development Setup
- Clone the repository:
git clone https://github.com/SynthLabs/PERSONA.git
cd PERSONA
- Install dependencies:
poetry install
- Install pre-commit hooks:
poetry run pre-commit install
- Set up HuggingFace authentication:
huggingface-cli login
- Set up environment variables:
cp .env.example .env
vim .env
Detailed Evaluations
Main Evaluation
The main evaluation script assesses a model's ability to generate personalized responses for given personas, using our custom-filtered version of the PRISM dataset.
- Load PRISM dataset
- Generate utterances using target model with random personas
- Evaluate using GPT-4 as a critic model via a debate approach
- Analyze personalization effectiveness
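The final step reduces the critic's judgments to a personalization score. A minimal sketch of that aggregation, assuming each judgment is stored as a record stating whether the critic preferred the persona-conditioned answer. The `Verdict` record, `personalization_rate`, and `rate_by_persona` are hypothetical names; the repository's actual scoring and log format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical record of one critic judgment (not the package's schema)."""

    persona_id: str
    question_id: str
    personalized_wins: bool  # True if the critic preferred the persona-aware answer


def personalization_rate(verdicts: list[Verdict]) -> float:
    """Fraction of judgments in which the critic preferred the personalized answer."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return sum(v.personalized_wins for v in verdicts) / len(verdicts)


def rate_by_persona(verdicts: list[Verdict]) -> dict[str, float]:
    """Per-persona breakdown, useful for spotting personas the model handles poorly."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for v in verdicts:
        totals[v.persona_id] += 1
        wins[v.persona_id] += int(v.personalized_wins)
    return {pid: wins[pid] / totals[pid] for pid in totals}


if __name__ == "__main__":
    demo = [
        Verdict("p1", "q1", True),
        Verdict("p1", "q2", False),
        Verdict("p2", "q1", True),
        Verdict("p2", "q2", True),
    ]
    print(personalization_rate(demo), rate_by_persona(demo))
```

Reporting a per-persona breakdown next to the overall rate keeps a few hard personas from disappearing into a single average.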
Leave One Out Analysis
This evaluation measures the impact of individual attributes on personalization performance.
- Uses sub-personas formed by leaving out one LOO attribute at a time
- Tests on multiple personas and PRISM questions
- Analyzes feature importance
Available attributes include age, sex, race, education, employment status, and many more. See example_loo_attributes.json for the full list.
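The leave-one-out idea can be sketched as: score the full persona, re-score each sub-persona with one attribute removed, and read the drop in score as that attribute's importance. This sketch assumes personas are flat attribute dictionaries and that the caller supplies a scoring function (in practice, an evaluation run). `loo_sub_personas`, `loo_importance`, and the toy scorer are hypothetical, and the repository may define sub-personas or importance differently.

```python
from typing import Callable, Mapping

Persona = Mapping[str, str]


def loo_sub_personas(persona: Persona, attributes: list[str]) -> dict[str, dict[str, str]]:
    """One sub-persona per LOO attribute, each missing exactly that attribute."""
    return {
        attr: {k: v for k, v in persona.items() if k != attr}
        for attr in attributes
        if attr in persona  # skip attributes this persona does not have
    }


def loo_importance(
    persona: Persona,
    attributes: list[str],
    score: Callable[[Persona], float],
) -> dict[str, float]:
    """Importance of an attribute = score with the full persona minus score without it."""
    full = score(persona)
    return {
        attr: full - score(sub)
        for attr, sub in loo_sub_personas(persona, attributes).items()
    }


if __name__ == "__main__":
    persona = {"age": "34", "sex": "female", "education": "PhD"}
    # Toy scorer standing in for a real evaluation run (hypothetical).
    toy_score = lambda p: 0.5 + 0.25 * ("education" in p) + 0.125 * ("age" in p)
    print(loo_importance(persona, ["age", "education"], toy_score))
```

A larger drop means the model leaned more heavily on that attribute when personalizing.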
Intersectionality
Evaluate model performance across different demographic intersections.
- Define intersections using JSON configuration
- Measure personalization across disjoint populations
- Analyze model performance for specific demographic combinations
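Splitting personas into disjoint intersection groups from a JSON definition might look like the sketch below. The README does not show the configuration format, so the schema here (a list of named intersections, each mapping attributes to allowed values) is an assumption for illustration, as are `matches` and `assign_intersections`.

```python
import json

# Hypothetical config: each intersection is a name plus attribute -> allowed values.
CONFIG_TEXT = """
{
  "intersections": [
    {"name": "young_women", "where": {"sex": ["female"], "age_group": ["18-24", "25-34"]}},
    {"name": "older_men", "where": {"sex": ["male"], "age_group": ["65+"]}}
  ]
}
"""


def matches(persona: dict[str, str], where: dict[str, list[str]]) -> bool:
    """A persona belongs to an intersection if every constrained attribute matches."""
    return all(persona.get(attr) in allowed for attr, allowed in where.items())


def assign_intersections(
    personas: list[dict[str, str]], config: dict
) -> dict[str, list[dict[str, str]]]:
    """Group personas by intersection; raise if any persona falls into two groups."""
    groups = {spec["name"]: [] for spec in config["intersections"]}
    for persona in personas:
        hits = [s["name"] for s in config["intersections"] if matches(persona, s["where"])]
        if len(hits) > 1:
            # Populations must be disjoint for per-group comparisons to be meaningful.
            raise ValueError(f"intersections overlap for {persona}: {hits}")
        if hits:
            groups[hits[0]].append(persona)
    return groups


config = json.loads(CONFIG_TEXT)

if __name__ == "__main__":
    people = [
        {"sex": "female", "age_group": "25-34"},
        {"sex": "male", "age_group": "65+"},
        {"sex": "male", "age_group": "25-34"},  # matches neither group
    ]
    print({name: len(ms) for name, ms in assign_intersections(people, config).items()})
```

Raising on overlap, rather than silently picking one group, is a design choice that keeps the populations disjoint as the evaluation requires.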
Pass@K
Determines how many attempts a model needs to successfully personalize for a given persona.
- Reruns the main evaluation K times
- Counts the attempts needed for successful personalization
- Provides insights into model consistency and reliability
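Counting attempts to first success and turning that into a pass@K figure can be sketched as follows. This assumes each persona's reruns are recorded as an ordered list of booleans (whether the critic judged that attempt successfully personalized); `attempts_to_success` and `pass_at_k` are hypothetical helpers, and the repository may compute pass@K differently.

```python
from typing import Optional, Sequence


def attempts_to_success(attempts: Sequence[bool]) -> Optional[int]:
    """1-based index of the first successful attempt, or None if all attempts failed."""
    for i, ok in enumerate(attempts, start=1):
        if ok:
            return i
    return None


def pass_at_k(runs: dict[str, Sequence[bool]], k: int) -> float:
    """Fraction of personas personalized successfully within the first k attempts."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not runs:
        raise ValueError("no runs provided")
    solved = 0
    for attempts in runs.values():
        first = attempts_to_success(attempts)
        if first is not None and first <= k:
            solved += 1
    return solved / len(runs)


if __name__ == "__main__":
    runs = {
        "p1": [True, True, False],
        "p2": [False, False, True],
        "p3": [False, False, False],
    }
    for k in (1, 2, 3):
        print(k, pass_at_k(runs, k))
```

Tracking the first-success index per persona, not just a single pass/fail, lets the same runs answer pass@K for every K up to the number of reruns.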
Usage
Configure your .env file before running the scripts. You can set the generate mode to one of the following:
- baseline: Generate an answer directly, without the persona
- output_only: Generate an answer given the persona, without chain of thought
- chain_of_thought: Generate a chain of thought before answering, given the persona
- demographic_summary: Generate a summary of the persona before answering
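The four generate modes differ in whether the model sees the persona and what it produces before its final answer. A minimal sketch of that distinction as prompt templates, assuming a simple string-building approach. The `GenerateMode` enum, `build_prompt`, and the template wording are hypothetical; they do not reproduce the repository's prompts or the name of the .env variable that selects the mode.

```python
from enum import Enum


class GenerateMode(str, Enum):
    BASELINE = "baseline"
    OUTPUT_ONLY = "output_only"
    CHAIN_OF_THOUGHT = "chain_of_thought"
    DEMOGRAPHIC_SUMMARY = "demographic_summary"


def build_prompt(mode: GenerateMode, question: str, persona: str) -> str:
    """Assemble an illustrative prompt for each mode (wording is hypothetical)."""
    if mode is GenerateMode.BASELINE:
        # The persona is deliberately withheld in baseline mode.
        return f"Answer the question.\n\nQuestion: {question}"
    header = f"You are answering for this user:\n{persona}\n\nQuestion: {question}\n\n"
    if mode is GenerateMode.OUTPUT_ONLY:
        return header + "Give your answer directly."
    if mode is GenerateMode.CHAIN_OF_THOUGHT:
        return header + "Think step by step about the user's needs, then give your answer."
    if mode is GenerateMode.DEMOGRAPHIC_SUMMARY:
        return header + "First summarize the user's demographics and values, then answer."
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # A mode string read from configuration converts directly to the enum.
    mode = GenerateMode("chain_of_thought")
    print(build_prompt(mode, "How should I plan my retirement?", "age: 34; occupation: nurse"))
```

Because the enum subclasses `str`, a raw value such as `"output_only"` read from configuration converts with `GenerateMode(value)`, and an unknown value fails immediately with a `ValueError`.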
# Activate the poetry environment
poetry shell
# Main Evaluation
inspect eval src/persona_evaluation/main_evaluation.py --model {model}
# Leave One Out Analysis
inspect eval src/persona_evaluation/main_loo.py --model {model}
# Intersectionality Evaluation
inspect eval src/persona_evaluation/main_intersectionality.py --model {model}
# Pass@K Evaluation
inspect eval src/persona_evaluation/main_pass_at_k.py --model {model}
Visualization
We provide scripts for visualizing evaluation results:
- visualization_loo.py: Leave One Out analysis
- visualization_intersection.py: Intersectionality evaluation
- visualization_pass_at_k.py: Pass@K evaluation
These scripts use the most recent log file by default. Use the --log parameter to specify a different log file.
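Selecting the newest log unless --log is given can be sketched with pathlib and argparse. This assumes logs are written to a ./logs directory (inspect-ai's default location) with .json or .eval extensions, depending on the inspect-ai version; the directory, the extensions, and the `latest_log` and `resolve_log` helpers are assumptions, not the scripts' actual code.

```python
import argparse
from pathlib import Path
from typing import Optional

LOG_SUFFIXES = {".json", ".eval"}  # assumption: inspect-ai log file extensions


def latest_log(log_dir: Path) -> Path:
    """Return the most recently modified log file in log_dir."""
    candidates = [p for p in log_dir.iterdir() if p.is_file() and p.suffix in LOG_SUFFIXES]
    if not candidates:
        raise FileNotFoundError(f"no log files found in {log_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def resolve_log(argv: Optional[list[str]] = None, log_dir: Path = Path("logs")) -> Path:
    """Use --log if given, otherwise fall back to the newest file in log_dir."""
    parser = argparse.ArgumentParser(description="Visualize an evaluation log (sketch).")
    parser.add_argument("--log", type=Path, default=None, help="path to a specific log file")
    args = parser.parse_args(argv)
    return args.log if args.log is not None else latest_log(log_dir)


if __name__ == "__main__":
    print(f"Using log: {resolve_log()}")
```

Ranking by modification time rather than by file name avoids depending on any particular log naming scheme.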
Dependencies
Key dependencies include:
- inspect-ai
- datasets
- pandas
- openai
- instructor
- seaborn
For development:
- tiktoken
- transformers
See pyproject.toml for a complete list of dependencies.
Citation
If you use PERSONA in your research, please cite our paper:
@misc{castricato2024personareproducibletestbedpluralistic,
title={PERSONA: A Reproducible Testbed for Pluralistic Alignment},
author={Louis Castricato and Nathan Lile and Rafael Rafailov and Jan-Philipp Fränken and Chelsea Finn},
year={2024},
eprint={2407.17387},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.17387},
}
Community & Support
Join our Discord community for discussions, support, and updates, or reach out to us at https://www.synthlabs.ai/contact.
Acknowledgements
This research is supported by SynthLabs. We thank our collaborators and the open-source community for their valuable contributions.
Copyright © 2024, SynthLabs. Released under the Apache License.
Download files
File details
Details for the file persona_bench-0.0.2.tar.gz.
File metadata
- Download URL: persona_bench-0.0.2.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 835c1f14d8831c9a582f08ab537c74b296603b3e94830a7c63aa033d4a7df2ce |
|
MD5 | 52a77746ca821dbe683feaad88ef510d |
|
BLAKE2b-256 | cd0617452701ca57972e5fae31a6cb31fc16c3814b12f5a3c6335fca945a7a25 |
File details
Details for the file persona_bench-0.0.2-py3-none-any.whl.
File metadata
- Download URL: persona_bench-0.0.2-py3-none-any.whl
- Upload date:
- Size: 28.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | be2aaf51e0a9520be1ef12d769521a4f5c8c6eb8fcad2fd4508b9c623c30252f |
|
MD5 | fb8e74bb9719046b25374a4b23108e07 |
|
BLAKE2b-256 | 1906eab6b6bc4aafc2dad938423e62c069c6a520a73a35aac37c8afe79b359e8 |
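To check a downloaded file against the published digests, hash it locally and compare. A minimal sketch using Python's standard hashlib; the digests are copied from the SHA256 rows for the two files listed here, while `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for persona_bench 0.0.2.
EXPECTED_SHA256 = {
    "persona_bench-0.0.2.tar.gz": "835c1f14d8831c9a582f08ab537c74b296603b3e94830a7c63aa033d4a7df2ce",
    "persona_bench-0.0.2-py3-none-any.whl": "be2aaf51e0a9520be1ef12d769521a4f5c8c6eb8fcad2fd4508b9c623c30252f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    print(verify(Path("persona_bench-0.0.2.tar.gz")))
```

A mismatch means the file is corrupted or is not the published release, and it should not be installed.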