
Evaluate the Quality of Critique


The Critique of Critique

This is the official repository for The Critique of Critique.


Introduction

We introduce MetaCritique, a new judge that can effectively evaluate human-written or LLM-generated critiques by generating critiques of those critiques. MetaCritique reports three scores:

  • Meta-P: the precision score of MetaCritique, which evaluates the factuality of a hypothesis critique.
  • Meta-R: the recall score of MetaCritique, which evaluates the comprehensiveness of a hypothesis critique.
  • Meta-F1: the overall rating, computed as the harmonic mean of the precision and recall scores.
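
For reference, the harmonic mean works out to the usual formula (a minimal sketch of the arithmetic, not part of the package API):

def meta_f1(meta_p: float, meta_r: float) -> float:
    # Harmonic mean of Meta-P and Meta-R; defined as 0 when both scores are 0.
    if meta_p + meta_r == 0:
        return 0.0
    return 2 * meta_p * meta_r / (meta_p + meta_r)

Note that the scores are presumably computed per sample and then averaged, so a reported Meta-F1 can be lower than the harmonic mean of the reported averages (e.g., for AUTO-J below, 2 · 76.43 · 70.65 / (76.43 + 70.65) ≈ 73.4 versus the reported 71.14).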

Leaderboard

We release benchmarking results for multiple critique models.

| Critique Model               | Meta-Precision | Meta-Recall | Meta-F1 |
|------------------------------|----------------|-------------|---------|
| AUTO-J                       | 76.43          | 70.65       | 71.14   |
| GPT-3.5                      | 80.79          | 64.27       | 68.72   |
| UltraCM                      | 73.64          | 66.77       | 67.79   |
| Human Critique from Shepherd | 83.19          | 60.65       | 64.02   |
| SelFee                       | 69.56          | 51.05       | 54.22   |

Quick Start

Installation

pip install meta-critique

Usage

from meta_critique import MetaCritique

api_key = ...  # your OpenAI API key

inputs = [
    {"question": "<question>", "response": "<response1>", "hypothesis_critique": "<hypothesis_critique>"},
    {"question": "<question>", "response": "<response2>", "hypothesis_critique": "<hypothesis_critique>"},
    ...
]

meta_critique_instance = MetaCritique(
    model_type="gpt-4",     # OpenAI model used for evaluation
    batch_size=5,
    api_key=api_key,
    api_base=None,          # set this if you use a custom OpenAI-compatible endpoint
    seed=None,
    cache_dir="tmp_cache",  # intermediate results are cached here
)
precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)

where

  • question: the user query that the model responds to.
  • response: the response generated by the model.
  • hypothesis_critique: the critique to be evaluated, written by either a human or an LLM.
  • reference_answer: (optional) the reference answer to the question.
  • reference_critique: (optional) the reference critique, given in either of two forms (see the example after this list):
    • str: a critique text
    • dict: {"critique": <reference_critique>, "aius": <optional_aius_from_reference_critique>}, where AIUs are the atomic information units extracted from the reference critique
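
For example, an input that supplies both optional reference fields might look like this (all values are placeholders, and the dict form of reference_critique follows the schema above):

inputs = [
    {
        "question": "<question>",
        "response": "<response>",
        "hypothesis_critique": "<hypothesis_critique>",
        # Optional fields:
        "reference_answer": "<reference_answer>",
        "reference_critique": {
            "critique": "<reference_critique>",
            "aius": ["<aiu_1>", "<aiu_2>"],  # optional AIUs from the reference critique
        },
    },
]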

You can find test samples in eval_examples/test_samples.json.

Citation

If you find our work useful or use meta-critique, please cite our paper:

@article{sun2024metacritique,
  title={The Critique of Critique},
  author={Shichao Sun and Junlong Li and Weizhe Yuan and Ruifeng Yuan and Wenjie Li and Pengfei Liu},
  journal={arXiv preprint arXiv:2401.04518},
  year={2024}
}
