High-quality Machine Translation Evaluation


Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

pip install unbabel-comet

To develop locally, install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
cd COMET
poetry install

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet-score -s src.de -t hyp.en -r ref.en
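If the segments already live in Python, the three input files can also be written programmatically. The following is a minimal sketch, assuming one segment per line and the same file names as in the example above. The helper `write_parallel_files` is hypothetical and not part of COMET; the command is only assembled here, not executed.

```python
from pathlib import Path
import tempfile


def write_parallel_files(segments, out_dir):
    """Write src.de, hyp.en and ref.en with one segment per line.

    `segments` is a list of (source, hypothesis, reference) tuples.
    Hypothetical helper for illustration, not part of the COMET package.
    """
    out_dir = Path(out_dir)
    names = ("src.de", "hyp.en", "ref.en")
    paths = [out_dir / name for name in names]
    for column, path in enumerate(paths):
        lines = [seg[column] for seg in segments]
        # A newline inside a segment would shift every later line.
        if any("\n" in line for line in lines):
            raise ValueError(f"segment with embedded newline in {path.name}")
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return paths


segments = [
    ("Dem Feuer konnte Einhalt geboten werden",
     "The fire could be stopped",
     "They were able to control the fire."),
    ("Schulen und Kindergärten wurden eröffnet.",
     "Schools and kindergartens were open",
     "Schools and kindergartens opened"),
]

with tempfile.TemporaryDirectory() as tmp:
    src, hyp, ref = write_parallel_files(segments, tmp)
    # All three files must have the same number of lines.
    line_counts = [
        len(p.read_text(encoding="utf-8").splitlines()) for p in (src, hyp, ref)
    ]
    # Same flags as the command above, built from the generated paths.
    command = ["comet-score", "-s", str(src), "-t", str(hyp), "-r", str(ref)]
```

Writing all three files from the same list of tuples keeps source, hypothesis, and reference aligned line by line.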

You can select another model/metric with the --model flag. For reference-less (QE-as-a-metric) models, you don't need to pass a reference.

comet-score -s src.de -t hyp.en --model refless-wmt21-large-da-1520

Following the work on Uncertainty-Aware MT Evaluation, you can use the --mc_dropout flag to get a variance/uncertainty value for each segment score. A high value means that the metric has less confidence in that prediction.

comet-score -s src.de -t hyp.en -r ref.en --mc_dropout 100
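To illustrate where the variance comes from, here is a toy sketch of Monte Carlo dropout: the same input is scored several times with dropout left active, and the spread of the resulting scores is reported alongside their mean. This is not COMET's implementation. A hypothetical linear scorer stands in for the neural model, and `mc_dropout_scores`, its parameters, and all numbers are assumptions made for the example.

```python
import random
import statistics


def mc_dropout_scores(features, weights, passes=100, drop_prob=0.1, seed=0):
    """Toy Monte Carlo dropout over a linear scorer (illustration only).

    Each pass zeroes every feature with probability `drop_prob`, rescales
    the survivors, and computes a weighted sum. Returns (mean, variance)
    over all passes.
    """
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - drop_prob)
    samples = []
    for _ in range(passes):
        score = 0.0
        for x, w in zip(features, weights):
            if rng.random() >= drop_prob:  # feature survives this pass
                score += x * w * keep_scale
        samples.append(score)
    return statistics.mean(samples), statistics.pvariance(samples)


features = [1.0, 1.0, 1.0]  # made-up inputs
weights = [0.5, -0.2, 0.3]  # made-up weights

# With dropout disabled every pass is identical, so the variance is zero.
mean_off, var_off = mc_dropout_scores(features, weights, drop_prob=0.0)
# With dropout enabled the passes disagree, which yields a non-zero variance.
mean_on, var_on = mc_dropout_scores(features, weights, drop_prob=0.1)
```

In this sketch, the number of passes plays the role of the value given to --mc_dropout.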

Languages Covered:

All of the above-mentioned models are built on top of XLM-R, which covers the following languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

Thus, results for language pairs containing uncovered languages are unreliable!
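A simple guard can flag such language pairs before scoring. The sketch below is only an illustration: `COVERED_LANGUAGES` is deliberately abbreviated and would need to be filled in from the full list above, and the helper `uncovered` is hypothetical, not part of COMET.

```python
# Abbreviated on purpose for the example; extend with the full list above.
COVERED_LANGUAGES = {
    "English",
    "German",
    "Portuguese",
    "Chinese (Simplified)",
    "Swahili",
}


def uncovered(source_lang, target_lang, covered=COVERED_LANGUAGES):
    """Return the languages of the pair that are missing from `covered`."""
    return [lang for lang in (source_lang, target_lang) if lang not in covered]


ok_pair = uncovered("German", "English")
# Yoruba does not appear in the list of covered languages.
bad_pair = uncovered("German", "Yoruba")
```

An empty result means both sides of the pair are covered; anything else names the languages for which scores should be treated with caution.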

Scoring within Python:

COMET implements the PyTorch Lightning model interface, which means that you'll need to initialize a trainer in order to run inference.

import torch
from comet import download_model, load_from_checkpoint
from pytorch_lightning.trainer.trainer import Trainer
from torch.utils.data import DataLoader

# Download the checkpoint and load the model from it.
model = load_from_checkpoint(
    download_model("wmt21-small-da-152012")
)

# One dict per segment: source, MT hypothesis, and reference.
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]

# prepare_sample turns each batch of dicts into model inputs.
dataloader = DataLoader(
    dataset=data,
    batch_size=16,
    collate_fn=lambda x: model.prepare_sample(x, inference=True),
    num_workers=4,
)

# gpus=1 runs inference on a single GPU.
trainer = Trainer(gpus=1, deterministic=True, logger=False)
predictions = trainer.predict(
    model, dataloaders=dataloader, return_predictions=True
)
# Concatenate the per-batch tensors into a flat list of segment scores.
predictions = torch.cat(predictions, dim=0).tolist()

Note: Using the Python interface, you will get a list of segment-level scores. You can obtain the corpus-level score by averaging the segment-level scores.
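The averaging step can be sketched as follows. The scores are made-up placeholder values, not real COMET output, and `corpus_score` is a hypothetical helper name.

```python
from statistics import mean


def corpus_score(segment_scores):
    """Average segment-level scores into a single corpus-level score."""
    if not segment_scores:
        raise ValueError("no segment scores to average")
    return mean(segment_scores)


segment_scores = [0.25, 0.75]  # placeholder values, one per segment
system_score = corpus_score(segment_scores)
```

With the inference example above, the `predictions` list would take the place of `segment_scores`.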

Model Zoo:

:TODO: Update model zoo after the shared task.

| Model | Description |
| --- | --- |
| wmt21-large-da-1520 | RECOMMENDED: Regression model built on top of XLM-R (large), trained on DAs from WMT15 to WMT20. |
| wmt21-small-da-152012 | Same as the model above, but trained on a small version of XLM-R that was distilled from XLM-R large. |

QE-as-a-metric:

| Model | Description |
| --- | --- |
| refless-wmt21-large-da-1520 | Reference-less model trained on top of XLM-R large with DAs from WMT15 to WMT20. |

Train your own Metric:

Instead of using pretrained models, you can train your own model with the following command:

comet-train -cfg configs/models/{your_model_config}.yaml

Tensorboard:

Launch tensorboard with:

tensorboard --logdir="lightning_logs/"

unittest:

To run the toolkit tests and report coverage, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m

Publications

@inproceedings{rei-etal-2020-comet,
    title = "{COMET}: A Neural Framework for {MT} Evaluation",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.213",
    pages = "2685--2702",
}
@inproceedings{rei-EtAl:2020:WMT,
  author    = {Rei, Ricardo  and  Stewart, Craig  and  Farinha, Ana C  and  Lavie, Alon},
  title     = {Unbabel's Participation in the WMT20 Metrics Shared Task},
  booktitle      = {Proceedings of the Fifth Conference on Machine Translation},
  month          = {November},
  year           = {2020},
  address        = {Online},
  publisher      = {Association for Computational Linguistics},
  pages     = {909--918},
}
@inproceedings{stewart-etal-2020-comet,
    title = "{COMET} - Deploying a New State-of-the-art {MT} Evaluation Metric in Production",
    author = "Stewart, Craig  and
      Rei, Ricardo  and
      Farinha, Catarina  and
      Lavie, Alon",
    booktitle = "Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)",
    month = oct,
    year = "2020",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://www.aclweb.org/anthology/2020.amta-user.4",
    pages = "78--109",
}
