
Evaluate the Quality of Critique


The Critique of Critique

This is the official repository for The Critique of Critique.


Introduction

We introduce MetaCritique, a new judge that evaluates human-written or LLM-generated critiques by generating critiques of its own. It reports three scores:

  • Meta-P: the precision score of MetaCritique, which evaluates the factuality of the hypothesis critique.

  • Meta-R: the recall score of MetaCritique, which evaluates the comprehensiveness of the hypothesis critique.

  • Meta-F1: the overall rating, computed as the harmonic mean of the precision and recall scores.
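For intuition, the harmonic mean underlying Meta-F1 can be sketched as a small helper (illustrative only, not part of the meta_critique package):

```python
def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, i.e. the classic F1 formula."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is dominated by the smaller of the two scores,
# so a critique must be both factual and comprehensive to score well.
print(harmonic_mean(0.8, 0.6))
```

Note that the leaderboard Meta-F1 values below are averaged per sample, so they need not equal the harmonic mean of the aggregate Meta-Precision and Meta-Recall columns.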

Leaderboard

We release benchmarking results for multiple critique models.

| Critique Model               | Meta-Precision | Meta-Recall | Meta-F1 |
|------------------------------|----------------|-------------|---------|
| AUTO-J                       | 76.43          | 70.65       | 71.14   |
| GPT 3.5                      | 80.79          | 64.27       | 68.72   |
| UltraCM                      | 73.64          | 66.77       | 67.79   |
| Human Critique from Shepherd | 83.19          | 60.65       | 64.02   |
| SelFee                       | 69.56          | 51.05       | 54.22   |

Quick Start

Installation

pip install meta-critique

Usage

from meta_critique import MetaCritique

api_key = ...  # your OpenAI API key
inputs = [
    {"question": "<question>", "response": "<response1>", "hypothesis_critique": "<hypothesis_critique>"},
    {"question": "<question>", "response": "<response2>", "hypothesis_critique": "<hypothesis_critique>"},
    ...
]

meta_critique_instance = MetaCritique(
    model_type="gpt-4",
    batch_size=5,
    api_key=api_key,
    api_base=None,
    seed=None,
    cache_dir="tmp_cache",
)
precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)

where

  • question: The user query the model responded to.
  • response: The response generated by the model.
  • hypothesis_critique: The critique to be evaluated, written by either a human or an LLM.
  • reference_answer: (Optional) The reference answer.
  • reference_critique: (Optional) The reference critique, given as either:
    • str: a critique text
    • dict: {"critique": <reference_critique>, "aius": <optional_aius_from_reference_critique>}

You can find a test sample in eval_examples/test_samples.json.

Citation

If you find our work useful or use MetaCritique, please cite our paper:

@article{sun2024metacritique,
  title={The Critique of Critique},
  author={Sun, Shichao and Li, Junlong and Yuan, Weizhe and Yuan, Ruifeng and Li, Wenjie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2401.04518},
  year={2024}
}
