High-quality Machine Translation Evaluation

Project description



Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

Pre-release of version 1.0:

pip install unbabel-comet==1.0.0rc4

To develop locally, install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
poetry install

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet-score -s src.de -t hyp.en -r ref.en

You can select another model/metric with the --model flag; for reference-free (QE-as-a-metric) models you don't need to pass a reference:

comet-score -s src.de -t hyp.en --model wmt20-comet-qe-da

Following the work on Uncertainty-Aware MT Evaluation, you can use the --mc_dropout flag to get a variance/uncertainty value for each segment score. A high value means the metric is less confident in that prediction.

comet-score -s src.de -t hyp.en -r ref.en --mc_dropout 30
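Conceptually, MC dropout runs several stochastic forward passes per segment and reports the mean and variance of the resulting scores. A minimal sketch of that aggregation step, with made-up scores standing in for the model's forward passes (not the comet package's actual implementation):

```python
import statistics

def mc_dropout_aggregate(runs):
    """Aggregate scores from repeated dropout-enabled forward passes.

    `runs` is a list of score lists, one list per pass;
    returns a (mean, variance) pair per segment.
    """
    per_segment = list(zip(*runs))  # group the passes by segment
    return [(statistics.mean(s), statistics.pvariance(s)) for s in per_segment]

# Three hypothetical passes over two segments:
runs = [[0.61, 0.20], [0.58, 0.35], [0.64, 0.05]]
print(mc_dropout_aggregate(runs))
```

The second segment's scores spread much more across passes, so its variance is higher, signalling lower confidence.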

When comparing two MT systems, we encourage you to run the comet-compare command to test for statistical significance with paired bootstrap resampling (Koehn, 2004).

comet-compare --help
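For intuition, paired bootstrap resampling repeatedly redraws segments with replacement and counts how often one system's aggregate score beats the other's. A self-contained sketch of the idea, with hypothetical per-segment scores (this is an illustration, not comet-compare's code):

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A outscores system B.

    Both score lists are per-segment and aligned; segments are drawn
    with replacement, using the same indices for both systems.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples

# Hypothetical per-segment COMET scores for two MT systems:
sys_a = [0.71, 0.65, 0.80, 0.55, 0.90]
sys_b = [0.60, 0.66, 0.70, 0.50, 0.85]
print(f"A beats B in {paired_bootstrap(sys_a, sys_b):.0%} of resamples")
```

A win rate close to 100% (or 0%) suggests the difference between the systems is unlikely to be an artifact of the particular test set.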

For even more detailed MT contrastive evaluation please take a look at our new tool MT-Telescope.

Scoring within Python:

from comet import download_model, load_from_checkpoint

model_path = download_model("wmt20-comet-da")
model = load_from_checkpoint(model_path)
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]
seg_scores, sys_score = model.predict(data, batch_size=8, gpus=1)
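predict returns one score per segment plus a system-level score; for these models the system score is simply the mean of the segment scores. A tiny sketch of that relationship with dummy numbers in place of real model output:

```python
def system_score(segment_scores):
    """Corpus-level score as the plain mean of per-segment scores."""
    return sum(segment_scores) / len(segment_scores)

seg_scores = [0.19, 0.92]  # stand-ins for model.predict segment output
print(system_score(seg_scores))
```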

Languages Covered:

All of the above models are built on top of XLM-R, which covers the following languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

Thus, results for language pairs containing uncovered languages are unreliable!
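One way to guard against this in a pipeline is to validate language pairs before scoring. A hypothetical helper, not part of the comet package, with the supported set abbreviated here (in practice, fill it with the full list above):

```python
# Abbreviated set of supported languages (ISO 639-1 codes); the full
# XLM-R coverage list is given above.
XLMR_LANGUAGES = {"de", "en", "fr", "pt", "ru", "zh"}

def check_pair(src_lang, tgt_lang):
    """Raise if either side of the language pair is not covered by XLM-R."""
    missing = {src_lang, tgt_lang} - XLMR_LANGUAGES
    if missing:
        raise ValueError(f"Unsupported language(s): {sorted(missing)}; "
                         "COMET scores will be unreliable.")

check_pair("de", "en")  # covered pair: no error
```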

COMET Models:

We recommend the two following models to evaluate your translations:

  • wmt20-comet-da: DEFAULT Reference-based Regression model built on top of XLM-R (large) and trained on Direct Assessments from WMT17 to WMT19. Same as wmt-large-da-estimator-1719 from previous versions.
  • wmt20-comet-qe-da: Reference-FREE Regression model built on top of XLM-R (large) and trained on Direct Assessments from WMT17 to WMT19. Same as wmt-large-qe-estimator-1719 from previous versions.

These two models were developed to participate in the WMT20 Metrics shared task (Mathur et al. 2020). To date, they are the best-performing metrics at segment level on the MQM data recently released by Google (Freitag et al. 2020). Also, in a large-scale study performed by Microsoft Research, these two metrics ranked 1st and 2nd in terms of system-level decision accuracy (Kocmi et al. 2020).

For more information about the available COMET models, we invite you to read our metrics descriptions here.

Train your own Metric:

Instead of using pretrained models, you can train your own model with the following command:

comet-train --cfg configs/models/{your_model_config}.yaml

unittest:

To run the toolkit tests, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m
