Annotation-Efficient Preference Optimization
This repository implements the Annotation-Efficient Preference Optimization (AEPO) algorithm.
The code is tested on Ubuntu 20.04 using Python 3.9 and CUDA 11.0 (Docker image nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04).
Install
You can install aepo via pip.
pip install aepo
Source install is available too. Clone this repository and run pip install . in its root directory.
git clone git@github.com:CyberAgentAILab/annotation-efficient-po.git
cd annotation-efficient-po
pip install .
Usage
A command line interface is available. The input dataset can be a CSV file or a dataset uploaded to the Hugging Face Hub. The dataset should have a column named prompt or instruction. aepo treats that column as the user prompt given to the system and treats the remaining columns as the responses generated by the system.
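As a minimal sketch of the CSV layout described above, the following writes one prompt column followed by one column per response. The response column names (response_0, response_1, ...), the helper name write_aepo_csv, and the output file name are hypothetical; the text only requires a column named prompt or instruction and treats every other column as a response.

```python
import csv
import tempfile
from pathlib import Path


def write_aepo_csv(path, rows, num_responses):
    """Write (prompt, [responses...]) rows as a CSV: one prompt column,
    then one column per response."""
    # Only "prompt" is named in the README; the response column names
    # below are hypothetical placeholders.
    header = ["prompt"] + [f"response_{i}" for i in range(num_responses)]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for prompt, responses in rows:
            if len(responses) != num_responses:
                raise ValueError("every prompt needs the same number of responses")
            writer.writerow([prompt, *responses])
    return header


rows = [
    ("Name a primary color.", ["Red.", "Blue is a primary color.", "Yellow"]),
    ("What is 2 + 2?", ["4", "The answer is 4.", "Four"]),
]
out_path = Path(tempfile.gettempdir()) / "my_samples.csv"
header = write_aepo_csv(out_path, rows, num_responses=3)
```

A file written this way could then be passed to aepo in place of dataset/alpaca_samples.csv, with --num_responses no larger than the number of response columns.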
I prepared an example dataset in dataset/alpaca_samples.csv. The CSV file includes 128 responses generated by HuggingFaceH4/mistral-7b-sft-beta for each instruction of the alpaca_human_preference split of tatsu-lab/alpaca_farm.
You can try aepo using this dataset with the following command:
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10
--num_responses is the number of input responses to use per instruction. The dataset must contain at least --num_responses responses per instruction. --num_annotations is the number of responses that remain after the subsampling process. It is also the number of times the reward model is queried per instruction.
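The relationship between these options can be sketched as a small budget check. This is an illustration of the constraints stated above, not code from the package: the function name total_reward_queries is hypothetical, and the requirement that --num_annotations be at least 1 and at most --num_responses is an assumption that follows from subsampling.

```python
def total_reward_queries(num_available, num_responses, num_annotations, num_instructions):
    """Return the total number of reward model queries for a run.

    num_available is the number of response columns in the dataset.
    """
    # Stated in the README: the dataset needs at least num_responses responses.
    if num_responses > num_available:
        raise ValueError("dataset has fewer responses than --num_responses")
    # Assumption: the subsample cannot be larger than the input set.
    if not 1 <= num_annotations <= num_responses:
        raise ValueError("--num_annotations must be between 1 and --num_responses")
    # The reward model is queried num_annotations times per instruction.
    return num_annotations * num_instructions


# The example command above: 128 responses available, 8 used, 2 annotated,
# 10 instructions.
queries = total_reward_queries(128, 8, 2, 10)
print(queries)  # 20
```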
Example: Running AEPO
You can generate a pair of responses for each instruction with the following command.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 2 --num_instructions 10
To subsample four responses (e.g., for LiPO), set --num_annotations to four.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 4 --num_instructions 10
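If you want to try several annotation budgets, the commands above can be generated from Python. The sketch below only builds the command lines from the flags shown in this README; the helper name aepo_command is hypothetical, and nothing is executed.

```python
import shlex


def aepo_command(dataset, num_responses, num_annotations, num_instructions):
    """Build the aepo argument list using only the flags shown in this README."""
    return [
        "aepo", dataset,
        "--num_responses", str(num_responses),
        "--num_annotations", str(num_annotations),
        "--num_instructions", str(num_instructions),
    ]


# One command per annotation budget (2 for a pair, 4 for e.g. LiPO).
commands = [
    shlex.join(aepo_command("dataset/alpaca_samples.csv", 8, k, 10))
    for k in (2, 4)
]
for cmd in commands:
    print(cmd)
```

To actually run a command, the argument list returned by aepo_command could be passed to subprocess.run(args, check=True), assuming aepo is installed and on the PATH.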
Example: Running West-of-N over 8 samples
West-of-N is a strategy that picks the Best-of-N response as the chosen response and the Worst-of-N response as the rejected response. It has been shown to be effective for DPO and reward modeling.
You can run West-of-N using this package by setting --num_annotations == --num_responses.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10
This command will generate a dataset with 8 responses, ranked by their rewards. If you only need the best and worst of the N samples, use the --west_of_n option.
aepo dataset/alpaca_samples.csv --num_responses 8 --num_annotations 8 --num_instructions 10 --west_of_n
This will pick the best and worst responses as the chosen and rejected responses. The rest of the responses are discarded. This is useful for constructing a pairwise preference dataset.
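The West-of-N selection described above can be sketched in a few lines. This is an illustration of the idea, not the package's implementation: the function name west_of_n, the reward values, and the chosen/rejected dictionary keys are hypothetical, and the rewards are assumed to come from a reward model that has already scored every response.

```python
def west_of_n(responses, rewards):
    """Pick the highest-reward response as chosen and the lowest as rejected."""
    if len(responses) != len(rewards):
        raise ValueError("responses and rewards must have the same length")
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    indices = range(len(rewards))
    best = max(indices, key=rewards.__getitem__)
    worst = min(indices, key=rewards.__getitem__)
    # All other responses are discarded.
    return {"chosen": responses[best], "rejected": responses[worst]}


# Hypothetical rewards for four responses to one instruction.
responses = ["answer A", "answer B", "answer C", "answer D"]
rewards = [0.1, 0.9, -0.3, 0.5]
pair = west_of_n(responses, rewards)
print(pair)  # {'chosen': 'answer B', 'rejected': 'answer C'}
```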
Reference
TBA. Yuu Jinnai and Honda Ukyo. Annotation-Efficient Preference Optimization for Language Model Alignment, 2024.
Contact
For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.