A visual platform for contrastive evaluation of machine translation systems
MT-Telescope
MT-Telescope is a toolkit for comparative analysis of MT systems that adds rigor and depth to MT evaluation. The package aims to make it easier for researchers and industry practitioners to compare MT systems by providing easy access to:
- SOTA MT evaluation metrics such as COMET (Rei et al., 2020).
- Statistical tests such as bootstrap resampling (Koehn, 2004).
- Dynamic filters to select parts of your test set with specific phenomena.
- Visual interface/plots to compare systems side by side, segment by segment.
We highly recommend reading the following papers to learn more about how to perform better MT evaluation:
- Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers
- To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
Install:
Via pip:
pip install mt-telescope
Note: This is a pre-release currently.
Locally:
Create a virtual environment and make sure you have Poetry installed.
Then run:
git clone https://github.com/Unbabel/MT-Telescope
cd MT-Telescope
poetry install --no-dev
Scoring:
To get the system-level scores for a particular MT system, run telescope score:
telescope score -s {path/to/sources} -t {path/to/translations} -r {path/to/references} -l {target_language} -m COMET -m chrF
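Before scoring, it can help to confirm that the source, translation, and reference files are aligned. The following is a minimal sketch, assuming each file holds one segment per line (an assumption, not something this README states); the helper names `count_segments` and `check_parallel` are hypothetical and are not part of MT-Telescope.

```python
def count_segments(path):
    """Count the lines (assumed: one segment per line) in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(paths):
    """Return the shared segment count, or raise if the files disagree.

    Hypothetical helper: MT-Telescope may perform its own validation.
    """
    counts = {str(path): count_segments(path) for path in paths}
    unique = set(counts.values())
    if len(unique) != 1:
        raise ValueError(f"Segment counts differ or no files given: {counts}")
    return unique.pop()
```

Calling `check_parallel(["src.txt", "mt.txt", "ref.txt"])` (placeholder file names) before running telescope score would surface a misaligned file early rather than partway through metric computation.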
Comparing two systems:
To compare two systems, you can run telescope using:
- The command line interface
- A web browser
Command Line Interface (CLI):
To run system comparisons from the CLI, use the telescope compare command.
Usage: telescope compare [OPTIONS]
Options:
  -s, --source FILENAME           Source segments. [required]
  -x, --system_x FILENAME         System X MT outputs. [required]
  -y, --system_y FILENAME         System Y MT outputs. [required]
  -r, --reference FILENAME        Reference segments. [required]
  -l, --language TEXT             Language of the evaluated text. [required]
  -m, --metric [COMET|sacreBLEU|chrF|ZeroEdit|BERTScore|TER|Prism|GLEU]
                                  MT metric to run. [required]
  -f, --filter [named-entities|duplicates]
                                  Filter to apply to the test set.
  --seg_metric [COMET|ZeroEdit|BLEURT|BERTScore|Prism|GLEU]
                                  Segment-level metric to use for segment-level analysis.
  -o, --output_folder TEXT        Folder you wish to use to save plots.
  --bootstrap
  --num_splits INTEGER            Number of random partitions used in bootstrap resampling.
  --sample_ratio FLOAT            Proportion of the test set sampled in each partition.
  --help                          Show this message and exit.
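To give a feel for what the --bootstrap, --num_splits and --sample_ratio options control, here is a minimal sketch of paired bootstrap resampling in the spirit of Koehn (2004). It is not MT-Telescope's implementation: it assumes the system-level score is the mean of segment-level scores (reasonable for a metric like COMET, not for corpus-level BLEU), it assumes the sample ratio is the fraction of segments drawn per partition, and the function name and default values are hypothetical.

```python
import random
from statistics import fmean


def paired_bootstrap(scores_x, scores_y, num_splits=300, sample_ratio=0.5, seed=0):
    """Estimate how often system X beats system Y on resampled test sets.

    scores_x / scores_y: segment-level scores for the same segments, in order.
    Illustrative only; defaults are assumptions, not MT-Telescope's values.
    """
    if len(scores_x) != len(scores_y):
        raise ValueError("Both systems must be scored on the same segments.")
    if not scores_x:
        raise ValueError("At least one segment is required.")

    rng = random.Random(seed)
    n = len(scores_x)
    sample_size = max(1, int(n * sample_ratio))
    x_wins = y_wins = ties = 0

    for _ in range(num_splits):
        # Draw segment indices with replacement; the same indices are used
        # for both systems, which is what makes the test "paired".
        indices = [rng.randrange(n) for _ in range(sample_size)]
        mean_x = fmean(scores_x[i] for i in indices)
        mean_y = fmean(scores_y[i] for i in indices)
        if mean_x > mean_y:
            x_wins += 1
        elif mean_y > mean_x:
            y_wins += 1
        else:
            ties += 1

    return {
        "x_win_ratio": x_wins / num_splits,
        "y_win_ratio": y_wins / num_splits,
        "tie_ratio": ties / num_splits,
    }
```

A win ratio close to 1.0 for one system across the partitions suggests that the observed difference is unlikely to be an artifact of the particular test set; ratios near 0.5 suggest the two systems are hard to separate on this data.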
Example 1: Running several metrics
Running BLEU, chrF, BERTScore and COMET to compare two systems:
telescope compare \
-s path/to/src/file.txt \
-x path/to/system-x/file.txt \
-y path/to/system-y \
-r path/to/ref/file.txt \
-l en \
-m sacreBLEU -m chrF -m BERTScore -m COMET
Example 2: Saving a comparison report
telescope compare \
-s path/to/src/file.txt \
-x path/to/system-x/file.txt \
-y path/to/system-y \
-r path/to/ref/file.txt \
-l en \
-m COMET \
--output_folder FOLDER-PATH
Web Interface
To launch the web interface, run:
telescope streamlit
Some metrics, such as COMET, can take a while to run inside Streamlit. You can switch to a more lightweight COMET model by setting the following environment variable:
export COMET_MODEL=wmt21-cometinho-da
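If you would rather not export the variable in your shell, one option is to set it only for the launched process. The sketch below does that from Python using the standard library; the helper names `streamlit_env` and `launch_web_interface` are hypothetical, and it assumes the telescope executable is on your PATH.

```python
import os
import subprocess


def streamlit_env(comet_model="wmt21-cometinho-da", base_env=None):
    """Return a copy of the environment with COMET_MODEL set.

    The caller's environment (or base_env) is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COMET_MODEL"] = comet_model
    return env


def launch_web_interface(comet_model="wmt21-cometinho-da"):
    """Run `telescope streamlit` with COMET_MODEL set for that process only."""
    return subprocess.run(
        ["telescope", "streamlit"],
        env=streamlit_env(comet_model),
        check=True,  # raise CalledProcessError if the command fails
    )
```

Calling `launch_web_interface()` starts the web interface with the lightweight model, while other shells and processes keep whatever COMET_MODEL value they already had.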
Cite:
@inproceedings{rei-etal-2021-mt,
title = "{MT}-{T}elescope: {A}n interactive platform for contrastive evaluation of {MT} systems",
author = {Rei, Ricardo and Stewart, Craig and Farinha, Ana C and Lavie, Alon},
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-demo.9",
doi = "10.18653/v1/2021.acl-demo.9",
pages = "73--80",
}