
Annotation-Efficient Preference Optimization


This repository implements the Annotation-Efficient Preference Optimization (AEPO) algorithm.

The code is tested on Ubuntu 20.04 using Python 3.9 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04).

Install

You can install aepo via pip.

pip install aepo

Installing from source is also supported. Clone this repository and run pip install .:

git clone git@github.com:CyberAgentAILab/annotation-efficient-po.git
cd annotation-efficient-po
pip install .

Usage

A command-line interface is available. The input dataset can be a CSV file or a dataset hosted on the Hugging Face Hub. The dataset should have a column named prompt or instruction; aepo treats it as the user prompt given to the system and the remaining columns as the responses generated by the system.
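As an illustration of that expected shape, the following sketch writes a tiny input CSV with the standard library. The column names response_1 and response_2 are made up for this example; per the description above, any column other than the prompt column is treated as a response.

```python
import csv

# A toy input dataset: one "prompt" column plus one column per
# candidate response (here, two responses per instruction).
rows = [
    {"prompt": "Name a primary color.",
     "response_1": "Red.",
     "response_2": "Blue is a primary color."},
    {"prompt": "What is 2 + 2?",
     "response_1": "4",
     "response_2": "The answer is four."},
]

with open("toy_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "response_1", "response_2"])
    writer.writeheader()
    writer.writerows(rows)
```

A file in this shape can then be passed to aepo in place of the bundled example dataset.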

An example dataset is provided in dataset/alpaca_samples.csv. The CSV file contains 128 responses generated by HuggingFaceH4/mistral-7b-sft-beta for each instruction in the alpaca_human_preference split of tatsu-lab/alpaca_farm. You can try aepo on this dataset with the following command:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10

--num_responses is the number of input responses to use; the dataset must contain at least --num_responses responses per instruction. --num_annotations is the number of responses kept after the subsampling process. It is also the number of times the reward model is queried per instruction.
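Under the parameters above, the annotation budget is easy to reason about. This is a back-of-the-envelope sketch, based only on the statement that the reward model is queried --num_annotations times per instruction:

```python
num_instructions = 10  # --num_instructions
num_responses = 8      # --num_responses: candidates read per instruction
num_annotations = 2    # --num_annotations: responses kept after subsampling

# Total reward-model queries for the whole run:
total_queries = num_instructions * num_annotations
print(total_queries)  # 20
```

So the command above annotates 2 of the 8 candidate responses per instruction, for 20 reward-model queries in total rather than the 80 a full ranking of all candidates would require.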

Example: Running AEPO

You can generate a pair of responses for each instruction with aepo using the following command:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10

To subsample four responses (e.g., for LiPO), set --num_annotations to 4:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 4 --num_instructions 10

Example: Running West-of-N over 8 samples

West-of-N is a strategy that picks the Best-of-N as the chosen response and the Worst-of-N as the rejected response. It has been shown to be effective for DPO and reward modeling. You can run West-of-N with this package by setting --num_annotations equal to --num_responses:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10

This command generates a dataset with 8 responses per instruction, ranked by their rewards. If you only need the best and worst of the N samples, use the --west_of_n option:

aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10 --west_of_n

This picks the best and worst responses as the chosen and rejected responses, respectively, and discards the rest, which is useful for constructing a pairwise preference dataset.
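The selection rule described above can be sketched as follows. This is a minimal illustration, not the package's internal code; rewards stands in for the scores a reward model would assign to each response:

```python
def west_of_n(responses, rewards):
    """Pick the highest-reward response as chosen and the
    lowest-reward response as rejected; discard the rest."""
    ranked = sorted(zip(rewards, responses), key=lambda pair: pair[0])
    rejected = ranked[0][1]   # Worst-of-N
    chosen = ranked[-1][1]    # Best-of-N
    return chosen, rejected

responses = ["resp_a", "resp_b", "resp_c", "resp_d"]
rewards = [0.2, 0.9, 0.5, 0.1]
chosen, rejected = west_of_n(responses, rewards)
# chosen == "resp_b", rejected == "resp_d"
```

Each (chosen, rejected) pair then forms one row of a pairwise preference dataset.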

Reference

Jinnai, Y., Honda, U. (2024). Annotation-Efficient Preference Optimization for Language Model Alignment. arXiv preprint arXiv:2405.13541.

Bibtex:

@misc{jinnai2024annotationefficient,
      title={Annotation-Efficient Preference Optimization for Language Model Alignment}, 
      author={Yuu Jinnai and Ukyo Honda},
      year={2024},
      eprint={2405.13541},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.

Acknowledgements

The AlpacaFarm dataset is licensed under CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International).
