High-quality Machine Translation Evaluation

Project description

Note: This is a pre-release version. We are currently working on results for the WMT2020 shared task and will likely update the repository at the beginning of October (after the shared task results).

Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

To install COMET as a package, simply run

pip install unbabel-comet

Scoring MT outputs:

Via Bash:

comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt
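
Each of the three files is assumed to be plain text with one segment per line, aligned line by line across files (an assumption consistent with the Python example below). For instance:

sources.txt:
Hello world!
This is a sample

hypothesis.txt:
Oi mundo!
este é um exemplo

references.txt:
Olá mundo!
isto é um exemplo!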

You can export your results to a JSON file using the --to_json flag and select another model/metric with --model.

comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt --model wmt-large-hter-estimator --to_json output.json

Via Python:

from comet.models import download_model

# Download the model checkpoint (if needed) and load it.
model = download_model("wmt-large-da-estimator-1719", "path/where/to/save/models/")

# Each sample is a dict with the source segment, the MT hypothesis
# and the reference translation.
data = [
    {
        "src": "Hello world!",
        "mt": "Oi mundo!",
        "ref": "Olá mundo!"
    },
    {
        "src": "This is a sample",
        "mt": "este é um exemplo",
        "ref": "isto é um exemplo!"
    }
]
model.predict(data)

A simple Pythonic way to convert lists of segments into model inputs:

# Line-aligned lists: the i-th entries of the three lists belong together.
source = ["Hello world!", "This is a sample"]
hypothesis = ["Oi mundo!", "este é um exemplo"]
reference = ["Olá mundo!", "isto é um exemplo!"]

# Zip the three lists into a list of {"src", "mt", "ref"} dicts.
data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]

model.predict(data)
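
An equivalent and arguably more readable way to build the same list of dicts is a single list comprehension:

data = [
    {"src": s, "mt": m, "ref": r}
    for s, m, r in zip(source, hypothesis, reference)
]
model.predict(data)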

Model Zoo:

| Model | Description |
| :---- | :---------- |
| wmt-large-da-estimator-1719 | RECOMMENDED: Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19. |
| wmt-base-da-estimator-1719 | Estimator model built on top of XLM-R (base), trained on DA from WMT17, WMT18 and WMT19. |
| wmt-large-da-estimator-1718 | Estimator model built on top of XLM-R (large), trained on DA from WMT17 and WMT18. |
| wmt-base-da-estimator-1718 | Estimator model built on top of XLM-R (base), trained on DA from WMT17 and WMT18. |
| wmt-large-hter-estimator | Estimator model built on top of XLM-R (large), trained to regress on HTER. |
| wmt-base-hter-estimator | Estimator model built on top of XLM-R (base), trained to regress on HTER. |
| emnlp-base-da-ranker | Translation ranking model that uses XLM-R to encode sentences; trained with WMT17 and WMT18 Direct Assessments Relative Ranks (DARR). |

QE-as-a-metric:

| Model | Description |
| :---- | :---------- |
| wmt-large-qe-estimator-1719 | Quality Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19. |
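
Since the QE estimator is reference-free, it scores a hypothesis against the source alone. A minimal sketch, assuming the same download_model/predict API as above with the "ref" field simply omitted (this omission is our assumption, not documented behaviour):

from comet.models import download_model

model = download_model("wmt-large-qe-estimator-1719", "path/where/to/save/models/")

# Assumption: QE models take only "src" and "mt"; no reference is needed.
data = [
    {"src": "Hello world!", "mt": "Oi mundo!"},
    {"src": "This is a sample", "mt": "este é um exemplo"}
]
model.predict(data)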

Train your own Metric:

Instead of using pretrained models, you can train your own model with the following command:

comet train -f {config_file_path}.yaml
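
The YAML file configures the trainer, the encoder and the data. As a purely hypothetical sketch (every key below is an illustrative assumption, not the actual schema; see the Full Documentation and the example configs in the repository for real files):

# hypothetical-config.yaml: all keys below are illustrative assumptions.
model: CometEstimator          # assumed: model class to train
encoder_model: XLM-RoBERTa     # assumed: pretrained encoder family
pretrained_model: xlm-roberta-base
batch_size: 16
epochs: 2
train_path: data/train.csv     # assumed: training corpus location
val_path: data/dev.csv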

Supported encoders:

All the models in the zoo above are built on top of XLM-R; the full list of supported encoders can be found in the Full Documentation.

TensorBoard:

Launch TensorBoard with:

tensorboard --logdir="experiments/lightning_logs/"

Download Command:

To download publicly available corpora for training your own models, use the download command. For example, to download the APEQUEST HTER corpus, run:

comet download -d apequest --saving_path data/

unittest:

pip install coverage

To run the toolkit tests, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m

Code Style:

To make sure all the code follows the same style, we use Black.
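
For example, to apply the formatting locally (this is Black's standard command-line usage; running it from the repository root is our assumption):

pip install black
black .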
