SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes

SEScore2 is a self-supervised learning (SSL) method for training a metric for general text generation tasks without human ratings. We develop a technique to synthesize candidate sentences with varying levels of mistakes for training. To make these self-constructed samples realistic, we introduce retrieval-augmented synthesis on anchor text. SEScore2 outperforms SEScore on four text generation tasks across three languages (the overall Kendall correlation improves by 14.3%).
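For context on the reported number: a Kendall correlation measures how often a metric orders pairs of outputs the same way human raters do. The following is a minimal sketch in plain Python, assuming the simple tau-a variant without tie correction; the paper's exact evaluation protocol may differ. The function name `kendall_tau_a` and the sample scores are illustrative and not taken from the paper or the package.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product shows whether both lists order the pair alike.
        product = (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical values for illustration only.
metric = [-1.0, -5.0, -12.0, -3.0]
human = [0.0, -5.0, -10.0, -1.0]
print(kendall_tau_a(metric, human))
```

A value near 1 means the metric ranks outputs almost exactly as the human ratings do, and a value near -1 means the ranking is reversed.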

Paper: https://arxiv.org/abs/2212.09305

Author Email: wendaxu@cs.ucsb.edu

Maintainer Email: zihan_ma@ucsb.edu

Install all dependencies:

```
pip install sescore2
```

Instructions to score sentences using SEScore2:

Currently, the PyPI version only supports the English (en) and German (de) checkpoints. The model checkpoint is trained using mT5-xl and fine-tuned on human rating data.

To run SEScore2 for text generation evaluation:

```
from sescore2 import SEScore2

scorer = SEScore2('en')  # Download and load the metric for the specified language: en (English), de (German), ja (Japanese)

refs = ["Jova becomes Western Hemisphere's strongest hurricane so far in 2023 ... for now", "Jova becomes Western Hemisphere's strongest hurricane so far in 2023 ... for now"]

outs = ["Jova set to become Western Hemisphere's most powerful hurricane in 2023...so far", "Jova set to become Western Hemisphere's weakest hurricane in 2023"]

scores_ls = scorer.score(refs, outs, 1)
```
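When scoring many sentences it can help to validate the inputs and keep each output next to its score. The sketch below is one possible wrapper, not part of the package: `score_pairs` and `DummyScorer` are hypothetical names, and it assumes that the third argument of `scorer.score` is a batch size and that the call returns one numeric score per reference/output pair. The stand-in scorer only exists so the example runs without downloading a checkpoint.

```python
def score_pairs(scorer, refs, outs, batch_size=1):
    """Validate inputs, call scorer.score, and pair each output with its score."""
    if len(refs) != len(outs):
        raise ValueError(f"got {len(refs)} references but {len(outs)} outputs")
    # Assumption: the third positional argument is a batch size and the
    # return value is a sequence with one numeric score per (ref, out) pair.
    scores = scorer.score(refs, outs, batch_size)
    return [(out, float(s)) for out, s in zip(outs, scores)]


class DummyScorer:
    """Hypothetical stand-in so the sketch runs without a model checkpoint."""

    def score(self, refs, outs, batch_size):
        # Toy rule: 0.0 for an exact match, -1.0 otherwise.
        return [0.0 if r == o else -1.0 for r, o in zip(refs, outs)]


pairs = score_pairs(DummyScorer(), ["a cat", "a dog"], ["a cat", "a bird"])
for out, score in pairs:
    print(f"{score:6.2f}  {out}")
```

If the assumptions hold, replacing `DummyScorer()` with a loaded `SEScore2` instance should work the same way.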

GitHub Page

If you want to reproduce the synthetic data and use the original XLM/RemBERT SEScore2 weights, please refer to the GitHub repository: https://github.com/xu1998hz/SEScore2

Install all dependencies for GitHub version:

```
pip install -r requirement/requirements.txt

# To evaluate WMT shared metric task using official script
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .

# Download evaluation data for WMT20, 21 and 22
alias mtme='python3 -m mt_metrics_eval.mtme'
mtme --download  # Puts ~1G of data into $HOME/.mt-metrics-eval.
```

Score sentences using SEScore2 for GitHub version:

Download the weights and data from Google Drive (https://drive.google.com/drive/folders/1I9oji2_rwvifuUSqO-59Fi_vIok_Wvq8?usp=sharing). We support five languages: English, German, Spanish, Chinese, and Japanese.

```
from SEScore2 import SEScore2
from train.regression import *

scorer = SEScore2('en')  # Load the metric for the specified language, e.g. en (English), de (German), ja (Japanese)

refs = ["SEScore is a simple but effective next generation text generation evaluation metric", "SEScore it really works"]

outs = ["SEScore is a simple effective text evaluation metric for next generation", "SEScore is not working"]

scores_ls = scorer.score(refs, outs, 1)
```

Release history

1.0.8: Nov 27, 2023
1.0.7: Nov 27, 2023 (this version)
1.0.6: Oct 18, 2023
1.0.5: Oct 18, 2023
1.0.4: Oct 5, 2023
1.0.3: Sep 19, 2023
1.0.2: Sep 8, 2023
1.0.1: Sep 7, 2023
1.0.0: Sep 7, 2023 (yanked; reason: old version)

Download files

Source Distribution

sescore2-1.0.7.tar.gz (76.9 kB)

Uploaded Nov 27, 2023 Source

File details

Details for the file sescore2-1.0.7.tar.gz.

File metadata

Download URL: sescore2-1.0.7.tar.gz
Upload date: Nov 27, 2023
Size: 76.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for sescore2-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`5b2d89d9c054deaf12581fb2fa7a9a4904c517d779723c949f759f5fe1a02b27`
MD5	`4c4e2d2f9ff83bdddd008b6883bb770b`
BLAKE2b-256	`5a5136a2a1a0cbf2f5b3e5d5c0f9e4f47a8178dde60a10f6310a3aff40b0c6a3`
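A downloaded archive can be checked against the published SHA256 digest. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the usage line assumes the archive was saved to the current directory under its original name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Published SHA256 digest for sescore2-1.0.7.tar.gz.
EXPECTED_SHA256 = "5b2d89d9c054deaf12581fb2fa7a9a4904c517d779723c949f759f5fe1a02b27"

# Usage (assumes the archive is in the current directory):
# print(matches_expected("sescore2-1.0.7.tar.gz", EXPECTED_SHA256))
```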
