
High-quality Machine Translation Evaluation

Project description




Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

Pre-release of version 1.0:

pip install unbabel-comet==1.0.0rc3

To develop locally, install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
cd COMET
poetry install

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." > src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" > hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" > ref.en
comet-score -s src.de -t hyp.en -r ref.en
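The files above hold one segment per line, and line i of each file belongs to the same segment. If you would rather score the same files from Python (see Scoring within Python), they can be turned into the list of dictionaries that model.predict receives. The following is a minimal sketch: read_samples is a hypothetical helper, not part of COMET, and it assumes the "src"/"mt"/"ref" keys used in the Python example of this README. Leaving out the reference file produces samples without "ref", which is the shape a reference-free (QE) model expects.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_samples(
    src_path: str, mt_path: str, ref_path: Optional[str] = None
) -> List[Dict[str, str]]:
    """Read line-aligned src/mt(/ref) files into COMET-style sample dicts.

    Hypothetical helper: the keys mirror the Python example in this README.
    "ref" is omitted when no reference file is given (reference-free models).
    """
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    mt = Path(mt_path).read_text(encoding="utf-8").splitlines()
    ref = None
    if ref_path is not None:
        ref = Path(ref_path).read_text(encoding="utf-8").splitlines()

    # Every file must contain one segment per line, in the same order.
    lengths = {len(src), len(mt)}
    if ref is not None:
        lengths.add(len(ref))
    if len(lengths) != 1:
        raise ValueError(f"files are not line-aligned: lengths {sorted(lengths)}")

    samples = []
    for i, (s, m) in enumerate(zip(src, mt)):
        sample = {"src": s, "mt": m}
        if ref is not None:
            sample["ref"] = ref[i]
        samples.append(sample)
    return samples
```

For example, read_samples("src.de", "hyp.en", "ref.en") yields two dictionaries matching the data list in the Python example, and read_samples("src.de", "hyp.en") yields the same pairs without references.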

You can select another model/metric with the --model flag. Reference-free (QE-as-a-metric) models do not need a reference, so you can omit -r:

comet-score -s src.de -t hyp.en --model wmt20-comet-qe-da

Following the work on Uncertainty-Aware MT Evaluation, you can use the --mc_dropout flag to get a variance/uncertainty value for each segment score. The flag takes the number of forward passes to run with dropout enabled (30 in the example below). A high value means that the metric is less confident in that prediction.

comet-score -s src.de -t hyp.en -r ref.en --mc_dropout 30
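Conceptually, MC dropout scores every segment several times with dropout left on, and the spread of those scores serves as the uncertainty estimate. The sketch below shows only that aggregation step, assuming you already have the per-run segment scores as a list of lists. mc_dropout_summary is a hypothetical name, not a COMET function, and the choice of population standard deviation (whose square is the variance) is an assumption of this sketch.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def mc_dropout_summary(runs: List[List[float]]) -> List[Tuple[float, float]]:
    """Summarise repeated stochastic scoring runs, segment by segment.

    runs[k][i] is the score of segment i in the k-th forward pass made with
    dropout enabled. Returns (mean, std) per segment: the mean acts as the
    segment score and a larger std signals lower confidence.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_segments = len(runs[0])
    if any(len(run) != n_segments for run in runs):
        raise ValueError("every run must score the same segments")

    summary = []
    for i in range(n_segments):
        scores = [run[i] for run in runs]
        summary.append((mean(scores), pstdev(scores)))
    return summary
```

With two runs such as [[0.5, 0.9], [0.7, 0.9]], the first segment gets a mean of about 0.6 with a spread of about 0.1, while the second segment has zero spread because both passes agree.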

Languages Covered:

All the models above are built on top of XLM-R, which covers the following languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

Results for language pairs that include a language not listed above are therefore unreliable!

Scoring within Python:

COMET models implement the PyTorch Lightning model interface. model.predict sets up the Lightning trainer it needs for inference internally, so you do not have to create one yourself:

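from comet import download_model, load_from_checkpoint

# Download the model checkpoint and load it.
model_path = download_model("wmt20-comet-da")
model = load_from_checkpoint(model_path)

# One dictionary per segment: source, MT output and reference.
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]

# gpus=1 runs inference on one GPU; set gpus=0 to run on CPU.
# predictions holds one score per segment; system_score is the corpus-level score.
predictions, system_score = model.predict(data, batch_size=8, gpus=1)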
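A common next step after model.predict is to look at the segments the metric scored lowest. This is a minimal sketch, assuming predictions is a list of per-segment scores in the same order as data (as in the example above) and that higher scores mean better translations. lowest_scoring is a hypothetical helper, not part of COMET.

```python
from typing import Dict, List, Tuple


def lowest_scoring(
    data: List[Dict[str, str]], predictions: List[float], k: int = 5
) -> List[Tuple[float, Dict[str, str]]]:
    """Return up to k (score, segment) pairs, lowest score first.

    Assumes predictions[i] is the score for data[i].
    """
    if len(data) != len(predictions):
        raise ValueError("data and predictions must have the same length")
    # Sort on the score only, so ties never try to compare dictionaries.
    paired = sorted(zip(predictions, data), key=lambda pair: pair[0])
    return paired[:k]
```

For instance, lowest_scoring(data, predictions, k=1) returns the single weakest segment together with its score, which is a quick way to spot translations worth reviewing by hand.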

Model Zoo:

Model Description
wmt20-comet-da RECOMMENDED: Regression model built on top of XLM-R (large) trained on DA (Direct Assessment) scores from WMT17 to WMT19. This model was presented at the WMT20 Metrics shared task: Rei et al., 2020. Same as wmt-large-da-estimator-1719 from previous versions.
emnlp20-comet-rank Translation ranking model built on top of XLM-R (base) trained with DARR from WMT17 and WMT18. This model was presented at EMNLP20: Rei et al., 2020.

Note: scores from different models are not comparable! Each model learns its own score distribution, so the scale may differ from one model to another.

QE-as-a-metric:

Model Description
wmt20-comet-qe-da Reference-free regression model built on top of XLM-R (large) trained on DA (Direct Assessment) scores from WMT17 to WMT19. This model was presented at the WMT20 Metrics shared task: Rei et al., 2020. Same as wmt-large-qe-estimator-1719 from previous versions.

Train your own Metric:

Instead of using pretrained models, you can train your own model with the following command:

comet-train --cfg configs/models/{your_model_config}.yaml

TensorBoard:

Launch TensorBoard with:

tensorboard --logdir="lightning_logs/"

unittest:

To run the toolkit tests, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m

Publications



Download files


Source Distribution

unbabel-comet-1.0.0rc3.tar.gz (27.3 kB)

Uploaded Source

Built Distribution

unbabel_comet-1.0.0rc3-py3-none-any.whl (44.5 kB)

Uploaded Python 3

File details

Details for the file unbabel-comet-1.0.0rc3.tar.gz.

File metadata

  • Download URL: unbabel-comet-1.0.0rc3.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.10

File hashes

Hashes for unbabel-comet-1.0.0rc3.tar.gz
Algorithm Hash digest
SHA256 0ad69ff5a3ccefb01d71850b1b0096dc4c959cd4776f005f767b51f9cf0de891
MD5 265df9145a1ee048349136e37a85745a
BLAKE2b-256 b97eeb80dafddb3c9d17ba931bedcda30387197b87c84220386ca90d1731fb2a


File details

Details for the file unbabel_comet-1.0.0rc3-py3-none-any.whl.

File metadata

  • Download URL: unbabel_comet-1.0.0rc3-py3-none-any.whl
  • Upload date:
  • Size: 44.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.10

File hashes

Hashes for unbabel_comet-1.0.0rc3-py3-none-any.whl
Algorithm Hash digest
SHA256 51060e8b703b973fc46442bb1ecb72451ece0cb9e87388988d4bb4c405ae70dd
MD5 5cf7ec41bbe74daf1b851ef8e2a7e29c
BLAKE2b-256 6896fcb3fe689b3b366159d323fec11ad23eb9d06c71189bc10a0ff65c0750c3
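To check that a downloaded distribution file matches the SHA256 digests listed in the File hashes tables, you can hash it locally with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of and verify_download are illustrative, not part of COMET or pip, and the digests are copied from the tables above.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "unbabel-comet-1.0.0rc3.tar.gz": "0ad69ff5a3ccefb01d71850b1b0096dc4c959cd4776f005f767b51f9cf0de891",
    "unbabel_comet-1.0.0rc3-py3-none-any.whl": "51060e8b703b973fc46442bb1ecb72451ece0cb9e87388988d4bb4c405ae70dd",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Return True if the file's SHA256 hex digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, verify_download("unbabel-comet-1.0.0rc3.tar.gz", EXPECTED_SHA256["unbabel-comet-1.0.0rc3.tar.gz"]) should return True for an intact download. pip can also enforce digests at install time through its hash-checking mode.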

