
Pluralistic alignment evaluation benchmark for LLMs


PERSONA Bench

Reproducible Testbed for Evaluating and Improving Language Model Alignment with Diverse User Values

SynthLabs.ai/research/persona


📄 Paper | 🗃️ Research Visualizations | 🤗 Hugging Face [Coming Soon] | 📚 Documentation

🌐 SynthLabs Research | 👥 Join the Team | 🤝 Let's Collaborate

PERSONA Bench is an extension of the PERSONA framework introduced in Castricato et al. 2024. It provides a reproducible testbed for evaluating and improving the alignment of language models with diverse user values.

Introduction

PERSONA established a strong correlation between human judges and language models in persona-based personalization tasks. Building on this foundation, we've developed a suite of robust evaluations to test a model's ability to perform personalization-related tasks. This repository provides practitioners with tools to assess and improve the pluralistic alignment of their language models.

Our evaluation suite uses inspect-ai to perform various assessments on persona-based tasks, offering insights into model performance across different demographic intersections, feature importance, and personalization capabilities.

Key Features

  • 🎭 Main Evaluation: Assess personalized response generation
  • 🧩 Leave One Out Analysis: Measure attribute impact on performance
  • 🌐 Intersectionality: Evaluate model performance across different demographic intersections
  • 🎯 Pass@K: Determine attempts needed for successful personalization

Quick Start

  1. Install Poetry if you haven't already:

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Install the package:

    poetry add persona-bench
    
  3. Use in your Python script:

    from dotenv import load_dotenv
    from persona_bench import evaluate_model
    
    # optional, you can also pass the environment variables directly to evaluate_model
    load_dotenv()
    
    # named `result` rather than `eval` to avoid shadowing Python's built-in eval()
    result = evaluate_model("gpt-3.5-turbo", evaluation_type="main")
    print(result.results.model_dump())
    

Development Setup

  1. Clone the repository:

    git clone https://github.com/SynthLabs/PERSONA.git
    cd PERSONA
    
  2. Install dependencies:

    poetry install
    
  3. Install pre-commit hooks:

    poetry run pre-commit install
    
  4. Set up HuggingFace authentication:

    huggingface-cli login
    
  5. Set up environment variables:

    cp .env.example .env
    vim .env
    

Detailed Evaluations

Main Evaluation

The main evaluation script assesses a model's ability to generate personalized responses based on given personas from our custom filtered PRISM dataset.

  1. Load PRISM dataset
  2. Generate utterances using target model with random personas
  3. Evaluate using GPT-4 as a critic model via a debate approach
  4. Analyze personalization effectiveness
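
The four steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the package's implementation: `run_main_evaluation`, `toy_generate`, and `toy_critic` are hypothetical names, the critic is reduced to a yes/no verdict, and the debate between critic models is not modeled.

```python
import random
from typing import Callable, Dict, List

Persona = Dict[str, str]


def run_main_evaluation(
    questions: List[str],
    personas: List[Persona],
    generate: Callable[[str, Persona], str],
    critic: Callable[[str, Persona, str], bool],
    seed: int = 0,
) -> float:
    """Return the fraction of responses the critic accepts as personalized."""
    rng = random.Random(seed)  # fixed seed keeps persona sampling reproducible
    verdicts = []
    for question in questions:
        persona = rng.choice(personas)          # pick a random persona
        response = generate(question, persona)  # utterance from the target model
        verdicts.append(critic(question, persona, response))  # critic verdict
    return sum(verdicts) / len(verdicts)        # aggregate effectiveness


# Toy stand-ins (hypothetical) so the sketch runs without any API access.
def toy_generate(question: str, persona: Persona) -> str:
    return f"As someone aged {persona['age']}, my answer to '{question}' is ..."


def toy_critic(question: str, persona: Persona, response: str) -> bool:
    return persona["age"] in response


score = run_main_evaluation(
    ["Q1", "Q2"], [{"age": "30"}, {"age": "61"}], toy_generate, toy_critic
)
print(score)
```

In a real run, `generate` would call the target model and `critic` would wrap the GPT-4 judgment; the loop structure is the only part this sketch is meant to convey.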

Leave One Out Analysis

This evaluation measures the impact of individual attributes on personalization performance.

  • Uses sub-personas separated by LOO attributes
  • Tests on multiple personas and PRISM questions
  • Analyzes feature importance

Available attributes include age, sex, race, education, employment status, and many more. See example_LOO_JSON.json for the full list.
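
One way to read the leave-one-out idea is sketched below: for each attribute of interest, build a sub-persona with that attribute withheld, score it, and compare against the full persona. The function names and the "score drop" definition of importance are assumptions made for illustration; the repository's own scripts may compute importance differently.

```python
from typing import Dict, List, Tuple

Persona = Dict[str, str]


def leave_one_out(persona: Persona, attributes: List[str]) -> List[Tuple[str, Persona]]:
    """Return (dropped_attribute, sub_persona) pairs, one per attribute present."""
    variants = []
    for attr in attributes:
        if attr not in persona:
            continue  # nothing to withhold for this persona
        sub_persona = {k: v for k, v in persona.items() if k != attr}
        variants.append((attr, sub_persona))
    return variants


def attribute_importance(full_score: float, ablated_scores: Dict[str, float]) -> Dict[str, float]:
    """Importance of an attribute, taken here as the score drop when it is withheld."""
    return {attr: full_score - score for attr, score in ablated_scores.items()}


persona = {"age": "34", "sex": "female", "education": "bachelor's degree"}
for dropped, sub in leave_one_out(persona, ["age", "education"]):
    print(dropped, sub)
```

A larger drop for an attribute suggests the model relies on it more heavily when personalizing.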

Intersectionality

Evaluate model performance across different demographic intersections.

  • Define intersections using JSON configuration
  • Measure personalization across disjoint populations
  • Analyze model performance for specific demographic combinations
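
As a rough sketch of how a JSON-defined intersection could select personas, consider the following. The configuration schema (`name`, `where`) and the attribute names are hypothetical and chosen only for this example; consult the repository for the actual format.

```python
import json
from typing import Dict, List

Persona = Dict[str, str]

# Hypothetical configuration: each intersection maps attribute -> allowed values.
CONFIG = json.loads("""
[
  {"name": "young_urban", "where": {"age_group": ["18-24"], "area": ["urban"]}},
  {"name": "older_rural", "where": {"age_group": ["65+"], "area": ["rural"]}}
]
""")


def matches(persona: Persona, where: Dict[str, List[str]]) -> bool:
    """A persona belongs to an intersection only if every listed attribute matches."""
    return all(persona.get(attr) in allowed for attr, allowed in where.items())


def split_by_intersection(personas: List[Persona], config: List[dict]) -> Dict[str, List[Persona]]:
    """Group personas by the intersections they satisfy."""
    groups: Dict[str, List[Persona]] = {spec["name"]: [] for spec in config}
    for persona in personas:
        for spec in config:
            if matches(persona, spec["where"]):
                groups[spec["name"]].append(persona)
    return groups


personas = [
    {"age_group": "18-24", "area": "urban"},
    {"age_group": "65+", "area": "rural"},
    {"age_group": "18-24", "area": "rural"},
]
groups = split_by_intersection(personas, CONFIG)
print({name: len(members) for name, members in groups.items()})
```

Because the two example intersections share no allowed values, the resulting groups are disjoint, and the third persona falls into neither.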

Pass@K

Determines how many attempts are required to successfully personalize for a given persona.

  • Reruns main evaluation K times
  • Counts attempts needed for successful personalization
  • Provides insights into model consistency and reliability
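
The counting described above might look like the sketch below. It assumes each attempt yields a simple success flag; `attempts_until_success` and `pass_at_k` are illustrative names, and the rate computed here is a plain empirical fraction rather than any particular estimator the package may use.

```python
from typing import Callable, List, Optional


def attempts_until_success(attempt: Callable[[int], bool], k: int) -> Optional[int]:
    """Run up to k attempts; return the 1-based index of the first success, else None."""
    for i in range(1, k + 1):
        if attempt(i):
            return i
    return None


def pass_at_k(attempts_needed: List[Optional[int]], k: int) -> float:
    """Fraction of personas personalized successfully within k attempts."""
    solved = sum(1 for n in attempts_needed if n is not None and n <= k)
    return solved / len(attempts_needed)


# Toy example: four personas, where None means no success within the budget.
needed = [1, 3, None, 2]
print(pass_at_k(needed, k=2))
```

A model whose rate at K=1 is close to its rate at larger K is behaving consistently; a large gap suggests success depends on resampling.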

Usage

Configure your .env file before running the scripts. You can set the generate mode to one of the following:

  • baseline: Generate an answer directly, not given the persona
  • output_only: Generate answer given the persona, without chain of thought
  • chain_of_thought: Generate chain of thought before answering, given the persona
  • demographic_summary: Generate a summary of the persona before answering

    # Activate the poetry environment
    poetry shell

    # Main Evaluation
    inspect eval src/persona_bench/main_evaluation.py --model {model}

    # Leave One Out Analysis
    inspect eval src/persona_bench/main_loo.py --model {model}

    # Intersectionality Evaluation
    inspect eval src/persona_bench/main_intersectionality.py --model {model}

    # Pass@K Evaluation
    inspect eval src/persona_bench/main_pass_at_k.py --model {model}
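
To make the difference between the four generate modes concrete, here is a minimal sketch of how each could shape the prompt sent to the target model. The prompt wording and the `build_prompt` helper are hypothetical and written only for illustration; the package's actual templates are not shown in this README.

```python
from typing import Dict

MODES = ("baseline", "output_only", "chain_of_thought", "demographic_summary")


def build_prompt(mode: str, question: str, persona: Dict[str, str]) -> str:
    """Assemble an illustrative prompt for the given generate mode."""
    if mode not in MODES:
        raise ValueError(f"unknown generate mode: {mode!r}")
    if mode == "baseline":
        return question  # the persona is withheld entirely
    persona_text = "\n".join(f"{key}: {value}" for key, value in persona.items())
    header = f"Persona:\n{persona_text}\n\nQuestion: {question}\n\n"
    if mode == "output_only":
        return header + "Answer as this persona would."
    if mode == "chain_of_thought":
        return header + "Reason step by step about this persona's likely view, then answer."
    # demographic_summary
    return header + "First summarize the persona's demographics, then answer."


print(build_prompt("chain_of_thought", "Should cities ban cars?", {"age": "30"}))
```

The baseline mode gives a reference point: any gain the other modes show over it can be attributed to the persona information.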

Visualization

We provide scripts for visualizing evaluation results:

  • visualization_loo.py: Leave One Out analysis
  • visualization_intersection.py: Intersectionality evaluation
  • visualization_pass_at_k.py: Pass@K evaluation

These scripts use the most recent log file by default. Use the --log parameter to specify a different log file.
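
The "most recent log unless --log is given" behavior can be sketched with argparse and pathlib as follows. The `logs` directory name and both helper functions are assumptions for this example, not a description of the scripts' internals.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def latest_log(log_dir: Path) -> Path:
    """Return the most recently modified file in log_dir."""
    logs = [p for p in log_dir.iterdir() if p.is_file()]
    if not logs:
        raise FileNotFoundError(f"no log files found in {log_dir}")
    return max(logs, key=lambda p: p.stat().st_mtime)


def resolve_log(argv: Optional[List[str]] = None, log_dir: Path = Path("logs")) -> Path:
    """Use --log when provided; otherwise fall back to the newest log file."""
    parser = argparse.ArgumentParser(description="Pick a log file to visualize")
    parser.add_argument("--log", type=Path, default=None, help="path to a specific log file")
    args = parser.parse_args(argv)
    return args.log if args.log is not None else latest_log(log_dir)


# An explicit --log bypasses the directory lookup entirely.
print(resolve_log(["--log", "logs/example_run.json"]))
```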

Dependencies

Key dependencies include:

  • inspect-ai
  • datasets
  • pandas
  • openai
  • instructor
  • seaborn

For development:

  • tiktoken
  • transformers

See pyproject.toml for a complete list of dependencies.

Citation

If you use PERSONA in your research, please cite our paper:

@misc{castricato2024personareproducibletestbedpluralistic,
      title={PERSONA: A Reproducible Testbed for Pluralistic Alignment},
      author={Louis Castricato and Nathan Lile and Rafael Rafailov and Jan-Philipp Fränken and Chelsea Finn},
      year={2024},
      eprint={2407.17387},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.17387},
}

Community & Support

Join our Discord community for discussions, support, and updates, or reach out to us at https://www.synthlabs.ai/contact.

Acknowledgements

This research is supported by SynthLabs. We thank our collaborators and the open-source community for their valuable contributions.


Copyright © 2024, SynthLabs. Released under the Apache License.

