High-quality Machine Translation Evaluation

Project description




Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

pip install unbabel-comet

To develop locally, install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
poetry install
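
As a quick sanity check, the package should now be importable. A minimal check, using the same entry point as the Python examples below:

# Sanity check: this import succeeds if the installation worked.
from comet.models import download_model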

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet score -s src.de -h hyp.en -r ref.en

You can export your results to a JSON file using the --to_json flag and select another model/metric with --model.

comet score -s src.de -h hyp.en -r ref.en --model wmt-large-hter-estimator --to_json segments.json
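
The exported file can then be loaded programmatically. A minimal sketch; the exact JSON schema written by --to_json is not documented here, so inspect the loaded structure before relying on specific fields:

import json

# Load the segment-level results written by --to_json.
with open("segments.json") as f:
    segments = json.load(f)
print(segments)  # structure depends on the COMET version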

Via Python:

from comet.models import download_model
model = download_model("wmt-large-da-estimator-1719")
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]
model.predict(data, cuda=True, show_progress=True)

A simple Pythonic way to convert lists of segments to model inputs:

source = ["Dem Feuer konnte Einhalt geboten werden", "Schulen und Kindergärten wurden eröffnet."]
hypothesis = ["The fire could be stopped", "Schools and kindergartens were open"]
reference = ["They were able to control the fire.", "Schools and kindergartens opened"]

data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]

model.predict(data, cuda=True, show_progress=True)

Note: Using the Python interface, you will get a list of segment-level scores. You can obtain the corpus-level score by averaging the segment-level scores, as in the sketch below.
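
For example, continuing the Python example above (reusing model and data), a minimal sketch of corpus-level aggregation, assuming predict returns one score per segment as the note describes:

# Average segment-level scores into a corpus-level score,
# assuming predict() returns one float per segment (see note above).
segment_scores = model.predict(data, cuda=True, show_progress=True)
corpus_score = sum(segment_scores) / len(segment_scores)
print(f"Corpus-level score: {corpus_score:.4f}")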

Model Zoo:

Model | Description
wmt-large-da-estimator-1719 | RECOMMENDED: Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19.
wmt-base-da-estimator-1719 | Estimator model built on top of XLM-R (base), trained on DA from WMT17, WMT18 and WMT19.
wmt-large-hter-estimator | Estimator model built on top of XLM-R (large), trained to regress on HTER.
wmt-base-hter-estimator | Estimator model built on top of XLM-R (base), trained to regress on HTER.
emnlp-base-da-ranker | Translation ranking model that uses XLM-R to encode sentences. Trained with WMT17 and WMT18 Direct Assessments Relative Ranks (DARR).

QE-as-a-metric:

Model | Description
wmt-large-qe-estimator-1719 | Quality Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19.
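
Any model name from the tables above can be passed to download_model (or to the --model flag shown earlier). For example, to load the QE-as-a-metric model instead of the default estimator:

from comet.models import download_model

# Download and load a model from the Model Zoo by name.
model = download_model("wmt-large-qe-estimator-1719")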

Train your own Metric:

Instead of using a pretrained model, you can train your own model with the following command:

comet train -f {config_file_path}.yaml

Supported encoders:

Tensorboard:

Launch tensorboard with:

tensorboard --logdir="experiments/"

Download Command:

To download publicly available corpora for training your own models, use the download command. For example, to download the APEQUEST HTER corpus, run:

comet download -d apequest --saving_path data/

Unit tests:

To run the toolkit's tests and produce a coverage report, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m

Publications

@inproceedings{rei-etal-2020-comet,
    title = "{COMET}: A Neural Framework for {MT} Evaluation",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.213",
    pages = "2685--2702",
}
@inproceedings{rei-EtAl:2020:WMT,
    title = "Unbabel's Participation in the {WMT}20 Metrics Shared Task",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    pages = "909--918",
}
@inproceedings{stewart-etal-2020-comet,
    title = "{COMET} - Deploying a New State-of-the-art {MT} Evaluation Metric in Production",
    author = "Stewart, Craig  and
      Rei, Ricardo  and
      Farinha, Catarina  and
      Lavie, Alon",
    booktitle = "Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)",
    month = oct,
    year = "2020",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://www.aclweb.org/anthology/2020.amta-user.4",
    pages = "78--109",
}



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unbabel-comet-0.1.0.tar.gz (38.6 kB)


Built Distribution

unbabel_comet-0.1.0-py3-none-any.whl (53.7 kB)


File details

Details for the file unbabel-comet-0.1.0.tar.gz.

File metadata

  • Download URL: unbabel-comet-0.1.0.tar.gz
  • Size: 38.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.6.6 Darwin/18.7.0

File hashes

Hashes for unbabel-comet-0.1.0.tar.gz
Algorithm | Hash digest
SHA256 | ee9b3621b7aec6411a72fc58d7de71b14a302a30122c780749dd7c78521d2c6e
MD5 | 52a46329ef81a99bd5d0a6ef3441b959
BLAKE2b-256 | 21849abc8ac0b60ffe75c06939d3b33f913953913c5d1b66e7624d6cfa6b295e
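
If you download the file manually, you can verify it against the SHA256 digest above using Python's standard hashlib module:

import hashlib

# Verify the downloaded sdist against the published SHA256 digest.
expected = "ee9b3621b7aec6411a72fc58d7de71b14a302a30122c780749dd7c78521d2c6e"
with open("unbabel-comet-0.1.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()
assert actual == expected, "hash mismatch: the file may be corrupted"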


File details

Details for the file unbabel_comet-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: unbabel_comet-0.1.0-py3-none-any.whl
  • Size: 53.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.6.6 Darwin/18.7.0

File hashes

Hashes for unbabel_comet-0.1.0-py3-none-any.whl
Algorithm | Hash digest
SHA256 | fb5dc0fe067b1d7613d50a268f9a05606de9ae59f74354b03cfa95408d9250e9
MD5 | 2edf721363c6625e9fb2c9d2351827c1
BLAKE2b-256 | 95aca094d9343a868f5eacc617a33c495f17e6599b8f5137fdc981c3b5f0e2a1

