Evaluate the Quality of Critique
The Critique of Critique
This is the official repository for The Critique of Critique.
Introduction
We introduce MetaCritique, a new judge that can effectively evaluate human-written or LLM-generated critiques by generating a critique of each critique. It reports three scores:
- Meta-P: the precision score of MetaCritique, which evaluates the factuality of the hypothesis critique.
- Meta-R: the recall score of MetaCritique, which evaluates the comprehensiveness of the hypothesis critique.
- Meta-F1: the overall rating, computed as the harmonic mean of the precision and recall scores.
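To make the relationship between the three scores concrete, here is a minimal Python sketch. It assumes, for illustration only, that precision is the share of units in the hypothesis critique judged factual and recall is the share of reference units that the hypothesis critique covers. The helper names `fraction_true` and `harmonic_f1` and the verdict lists are hypothetical and are not part of the meta-critique package.

```python
from typing import Sequence


def fraction_true(judgments: Sequence[bool]) -> float:
    """Share of units judged True; returns 0.0 for an empty list (a convention chosen here)."""
    return sum(judgments) / len(judgments) if judgments else 0.0


def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical verdicts for one hypothesis critique.
# Precision side: is each unit of the hypothesis critique factual?
factual = [True, True, False, True]
# Recall side: is each reference unit covered by the hypothesis critique?
covered = [True, False, False, True, True]

p = fraction_true(factual)
r = fraction_true(covered)
f1 = harmonic_f1(p, r)
print(f"Meta-P={p:.2f} Meta-R={r:.2f} Meta-F1={f1:.3f}")
```

Because it is a harmonic mean, Meta-F1 is pulled toward the lower of the two scores: a critique that is accurate but misses most reference points still receives a low overall rating.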
Leaderboard
We release the benchmarking results of multiple critique models.
| Critique Model | Meta-Precision | Meta-Recall | Meta-F1 |
|---|---|---|---|
| AUTO-J | 76.43 | 70.65 | 71.14 |
| GPT 3.5 | 80.79 | 64.27 | 68.72 |
| UltraCM | 73.64 | 66.77 | 67.79 |
| Human Critique from Shepherd | 83.19 | 60.65 | 64.02 |
| SelFee | 69.56 | 51.05 | 54.22 |
Quick Start
Installation
```bash
pip install meta-critique
```
Usage
```python
from meta_critique import MetaCritique

api_key = ...  # your OpenAI API key

# Each input pairs a question and a model response with the critique to be evaluated.
inputs = [
    {"question": "<question>", "response": "<response1>", "hypothesis_critique": "<hypothesis_critique>"},
    {"question": "<question>", "response": "<response2>", "hypothesis_critique": "<hypothesis_critique>"},
    # ...
]

meta_critique_instance = MetaCritique(
    model_type="gpt-4",
    batch_size=5,
    api_key=api_key,
    api_base=None,
    seed=None,
    cache_dir="tmp_cache",
)
precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)
```
where

- `question`: The user query to which the model generated the response.
- `response`: The response generated by the model.
- `hypothesis_critique`: The critique written by either a human or an LLM.
- `reference_answer`: (Optional) The reference answer.
- `reference_critique`: (Optional) The reference critique, given in one of two forms:
  - str: a critique text
  - dict: `{"critique": <reference_critique>, "aius": <optional_aius_from_reference_critique>}`
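Before calling `score`, it can help to check that every input dict carries the fields listed above. The sketch below uses a hypothetical helper, `check_input`, that is not part of the meta-critique package; it checks only the field names and types described here and leaves out the optional `aius` entry, whose format is not specified in this section.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("question", "response", "hypothesis_critique")


def check_input(sample: Dict[str, Any]) -> List[str]:
    """Return the problems found in one input dict (an empty list if none).

    Hypothetical helper mirroring the field list above; not part of meta-critique.
    """
    problems = []
    # Required fields must be present and be strings.
    for key in REQUIRED_KEYS:
        if not isinstance(sample.get(key), str):
            problems.append(f"missing or non-string field: {key}")
    # The optional reference answer, if given, is plain text.
    if "reference_answer" in sample and not isinstance(sample["reference_answer"], str):
        problems.append("reference_answer must be a str")
    # The optional reference critique is either a str or a dict with a "critique" str.
    if "reference_critique" in sample:
        ref = sample["reference_critique"]
        ok = isinstance(ref, str) or (isinstance(ref, dict) and isinstance(ref.get("critique"), str))
        if not ok:
            problems.append("reference_critique must be a str or a dict with a 'critique' str")
    return problems


base = {
    "question": "What is 2 + 2?",
    "response": "5",
    "hypothesis_critique": "The answer is wrong: 2 + 2 = 4.",
    "reference_answer": "4",
}
with_str_ref = dict(base, reference_critique="The response is incorrect; the sum is 4, not 5.")
with_dict_ref = dict(base, reference_critique={"critique": "The response is incorrect; the sum is 4, not 5."})
missing_critique = {"question": "What is 2 + 2?", "response": "5"}

for sample in (with_str_ref, with_dict_ref, missing_critique):
    print(check_input(sample))
```

Collecting problems into a list, rather than raising on the first one, makes it easy to report every malformed sample in a batch at once.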
You can find test samples in eval_examples/test_samples.json.
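A minimal sketch for loading those samples, assuming the file holds a JSON array of input dicts in the format described above; check the file itself before relying on this. `load_samples` is a hypothetical helper, not a function provided by the package.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_samples(path: str) -> List[Dict[str, Any]]:
    """Load evaluation inputs from a JSON file.

    Assumes the file holds a JSON array of input dicts; adapt this
    function if the file is shaped differently.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(data).__name__}")
    return data


# Hypothetical usage, reusing `meta_critique_instance` from the Usage example:
# inputs = load_samples("eval_examples/test_samples.json")
# precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)
```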
Citation
If you find our work useful or use meta-critique, please cite our paper:
```bibtex
@article{sun2024metacritique,
  title={The Critique of Critique},
  author={Shichao Sun and Junlong Li and Weizhe Yuan and Ruifeng Yuan and Wenjie Li and Pengfei Liu},
  journal={arXiv preprint arXiv:2401.04518},
  year={2024}
}
```