🚀ReMedy: Machine Translation Evaluation via Reward Modeling
Learning High-Quality Machine Translation Evaluation from Human Preferences with Reward Modeling
✨ About ReMedy
ReMedy is a new state-of-the-art machine translation (MT) evaluation framework that reframes the task as reward modeling rather than direct regression. Instead of relying on noisy human scores, ReMedy learns from pairwise human preferences, leading to better alignment with human judgments.
- 📈 State-of-the-art accuracy on WMT22–24 (39 language pairs, 111 systems)
- ⚖️ Segment- and system-level evaluation, outperforming GPT-4, PaLM-540B, Finetuned-PaLM2, MetricX-13B, and XCOMET
- 🔍 More robust on low-quality and out-of-domain translations (ACES, MSLC benchmarks)
- 🧠 Can be used as a reward model in RLHF pipelines to improve MT systems
ReMedy demonstrates that reward modeling with pairwise preferences offers a more reliable and human-aligned approach for MT evaluation.
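The reward-modeling idea can be sketched with a standard Bradley–Terry pairwise loss: the model is trained so that the human-preferred translation receives a higher reward than the rejected one. This is a minimal illustration of the objective, not ReMedy's actual training code:

```python
import math

def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log sigmoid(margin).

    The loss shrinks as the model assigns a larger reward margin
    to the human-preferred translation over the rejected one.
    """
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin between preferred and rejected yields a smaller loss:
small_margin_loss = pairwise_preference_loss(0.5, 0.0)
large_margin_loss = pairwise_preference_loss(3.0, 0.0)
```

Because only the *ordering* of translations is supervised, the objective sidesteps the scale noise of raw human scores.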
📚 Contents
- 📦 Quick Installation
- ⚙️ Requirements
- 🚀 Usage
- ⚙️ Full Argument List
- 🧠 Model Variants
- 🔁 Reproducing WMT Results
- 📚 Citation
📦 Quick Installation
ReMedy requires Python ≥ 3.12 and leverages vLLM for fast inference.
✅ Recommended: Install via pip
pip install --upgrade pip
pip install remedy-mt-eval
🛠️ Install from Source
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
pip install -e .
📜 Install via Poetry
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
poetry install
⚙️ Requirements
- Python ≥ 3.12
- transformers ≥ 4.51.1
- vllm ≥ 0.8.5
- torch ≥ 2.6.0
- (See pyproject.toml for full dependencies)
🚀 Usage
💾 Download ReMedy Models
Before use, download a model from the HuggingFace Hub:
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ShaomuTan/ReMedy-9B-22 --local-dir Models/ReMedy-9B-22
You can replace ReMedy-9B-22 with other variants like ReMedy-9B-23.
🔹 Basic Usage
remedy-score \
--model ShaomuTan/ReMedy-9B-22 \
--src_file testcase/en.src \
--mt_file testcase/en-de.hyp \
--ref_file testcase/de.ref \
--src_lang en --tgt_lang de \
--cache_dir Models \
--save_dir testcase \
--num_gpus 4 \
--calibrate
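The source, hypothesis, and reference files are expected to be plain text and line-aligned: line i of each file must refer to the same segment. A quick sanity check before scoring (a hypothetical helper, not part of the package):

```python
import tempfile
from pathlib import Path

def check_line_aligned(*paths):
    """Return the shared line count, or raise if the files disagree."""
    counts = {str(p): len(Path(p).read_text(encoding="utf-8").splitlines())
              for p in paths}
    if len(set(counts.values())) != 1:
        raise ValueError(f"files are not line-aligned: {counts}")
    return next(iter(counts.values()))

# Write two tiny aligned files into a temp directory and verify them.
tmp = Path(tempfile.mkdtemp())
(tmp / "en.src").write_text("Hello world\nGood morning\n", encoding="utf-8")
(tmp / "en-de.hyp").write_text("Hallo Welt\nGuten Morgen\n", encoding="utf-8")
n_segments = check_line_aligned(tmp / "en.src", tmp / "en-de.hyp")
```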
🔹 Reference-Free Mode (Quality Estimation)
remedy-score \
--model ShaomuTan/ReMedy-9B-22 \
--src_file testcase/en.src \
--mt_file testcase/en-de.hyp \
--no_ref \
--src_lang en --tgt_lang de \
--cache_dir Models \
--save_dir testcase/QE \
--num_gpus 4 \
--calibrate
📄 Output Files
- src-tgt_raw_scores.txt
- src-tgt_sigmoid_scores.txt
- src-tgt_calibration_scores.txt
- src-tgt_detailed_results.tsv
- src-tgt_result.json
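The three score files are increasingly post-processed views of the same model output: the unbounded reward, a sigmoid-squashed version in (0, 1), and a calibrated version. The exact calibration procedure is internal to ReMedy; the sketch below assumes simple temperature scaling (one common choice, consistent with the calibration_temp field in the JSON output, but an assumption here):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

raw_score = 4.5029                             # unbounded reward-model output
sigmoid_score = sigmoid(raw_score)             # squashed into (0, 1)
temp = 1.8                                     # hypothetical calibration temperature
calibration_score = sigmoid(raw_score / temp)  # softened by a temperature > 1
```

With a temperature above 1, high raw scores are pulled away from the extremes, which tempers overconfident segment scores.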
Inspired by SacreBLEU, ReMedy provides JSON-style results to ensure transparency and comparability.
📘 Example JSON Output
{
"metric_name": "remedy-9B-22",
"raw_score": 4.502863049214531,
"sigmoid_score": 0.9613502018042875,
"calibration_score": 0.9029647169507162,
"calibration_temp": 1.7999999999999998,
"signature": "metric_name:remedy-9B-22|lp:en-de|ref:yes|version:0.1.1",
"language_pair": "en-de",
"source_language": "en",
"target_language": "de",
"segments": 2037,
"version": "0.1.1",
"args": {
"src_file": "testcase/en.src",
"mt_file": "testcase/en-de.hyp",
"src_lang": "en",
"tgt_lang": "de",
"model": "Models/remedy-9B-22",
"cache_dir": "Models",
"save_dir": "testcase",
"ref_file": "testcase/de.ref",
"no_ref": false,
"calibrate": true,
"num_gpus": 4,
"num_seqs": 256,
"max_length": 4096,
"enable_truncate": false,
"version": false,
"list_languages": false
}
}
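Because results are persisted as JSON, downstream scripts can consume them directly with the standard library. A minimal sketch (the field names follow the example above; the embedded string stands in for reading the src-tgt_result.json file from disk):

```python
import json

# In practice this string would be read from the save_dir's result JSON file.
result_json = """
{
  "metric_name": "remedy-9B-22",
  "calibration_score": 0.9029647169507162,
  "language_pair": "en-de",
  "segments": 2037
}
"""

result = json.loads(result_json)
score = result["calibration_score"]
print(f"{result['metric_name']} on {result['language_pair']}: {score:.4f}")
```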
⚙️ Full Argument List
🔸 Required
--src_file # Path to source file
--mt_file # Path to MT output file
--src_lang # Source language code
--tgt_lang # Target language code
--model # Model path or HuggingFace ID
--save_dir # Output directory
🔸 Optional
--ref_file # Reference file path
--no_ref # Reference-free mode
--cache_dir # Cache directory
--calibrate # Enable calibration
--num_gpus # Number of GPUs
--num_seqs # Number of sequences (default: 256)
--max_length # Max token length (default: 4096)
--enable_truncate # Truncate sequences
--version # Print version
--list_languages # List supported languages
🧠 Model Variants
| Model | Size | Base Model | Ref/QE | Download |
|---|---|---|---|---|
| ReMedy-9B-22 | 9B | Gemma-2-9B | Both | 🤗 HuggingFace |
| ReMedy-9B-23 | 9B | Gemma-2-9B | Both | 🤗 HuggingFace |
| ReMedy-9B-24 | 9B | Gemma-2-9B | Both | 🤗 HuggingFace |
More variants coming soon...
🔁 Reproducing WMT Results
1. Clone ReMedy repo
git clone https://github.com/Smu-Tan/Remedy
cd Remedy
2. Install mt-metrics-eval
# Install MTME and download WMT data
git clone https://github.com/google-research/mt-metrics-eval.git
cd mt-metrics-eval
pip install .
cd ..
python3 -m mt_metrics_eval.mtme --download
3. Run ReMedy on WMT data
sbatch wmt/wmt22.sh
sbatch wmt/wmt23.sh
sbatch wmt/wmt24.sh
📄 The results are directly comparable with those of other metrics reported in the WMT Metrics shared tasks.
📚 Citation
If you use ReMedy, please cite the following paper:
@article{tan2024remedy,
title={ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling},
author={Tan, Shaomu and Monz, Christof},
journal={arXiv preprint},
year={2024}
}