High-quality Machine Translation Evaluation
Project description
Note: This is a pre-release version. We are currently working on results for the WMT2020 shared task and will likely update the repository at the beginning of October (after the shared task results are out).
Quick Installation
Detailed usage examples and instructions can be found in the Full Documentation.
To install COMET as a package, simply run
pip install unbabel-comet
Scoring MT outputs:
Via Bash:
comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt
You can export your results to a JSON file using the --to_json flag and select another model/metric with --model:
comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt --model wmt-large-hter-estimator --to_json output.json
Via Python:
from comet.models import download_model
model = download_model("wmt-large-da-estimator-1719", "path/where/to/save/models/")
data = [
{
"src": "Hello world!",
"mt": "Oi mundo!",
"ref": "Olá mundo!"
},
{
"src": "This is a sample",
"mt": "este é um exemplo",
"ref": "isto é um exemplo!"
}
]
model.predict(data)
A simple Pythonic way to convert lists of segments to model inputs:
source = ["Hello world!", "This is a sample"]
hypothesis = ["Oi mundo!", "este é um exemplo"]
reference = ["Olá mundo!", "isto é um exemplo!"]
data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]
model.predict(data)
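The dict/zip one-liner above is equivalent to the following explicit list comprehension, which may be easier to read; it pairs up the i-th source, hypothesis and reference into one dict per segment:

```python
source = ["Hello world!", "This is a sample"]
hypothesis = ["Oi mundo!", "este é um exemplo"]
reference = ["Olá mundo!", "isto é um exemplo!"]

# Pair the i-th source, hypothesis and reference into one dict per segment,
# producing the list-of-dicts format that model.predict expects.
data = [
    {"src": s, "mt": m, "ref": r}
    for s, m, r in zip(source, hypothesis, reference)
]
```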
Model Zoo:
Model | Description |
---|---|
wmt-large-da-estimator-1719 | RECOMMENDED: Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19 |
wmt-base-da-estimator-1719 | Estimator model built on top of XLM-R (base), trained on DA from WMT17, WMT18 and WMT19 |
wmt-large-da-estimator-1718 | Estimator model built on top of XLM-R (large), trained on DA from WMT17 and WMT18 |
wmt-base-da-estimator-1718 | Estimator model built on top of XLM-R (base), trained on DA from WMT17 and WMT18 |
wmt-large-hter-estimator | Estimator model built on top of XLM-R (large), trained to regress on HTER |
wmt-base-hter-estimator | Estimator model built on top of XLM-R (base), trained to regress on HTER |
emnlp-base-da-ranker | Translation ranking model that uses XLM-R to encode sentences, trained on WMT17 and WMT18 Direct Assessments Relative Ranks (DARR) |
QE-as-a-metric:
Model | Description |
---|---|
wmt-large-qe-estimator-1719 | Quality Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19 |
Train your own Metric:
Instead of using pretrained models, you can train your own model with the following command:
comet train -f {config_file_path}.yaml
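The config file is YAML. As a purely hypothetical sketch (every key name below is an assumption for illustration, not COMET's actual schema; consult the Full Documentation for the real fields), such a file might look like:

```yaml
# Hypothetical training config -- key names are illustrative only;
# check the Full Documentation for the fields COMET actually expects.
model: estimator
encoder: xlm-roberta-large   # one of the supported encoders listed below
learning_rate: 1e-5
batch_size: 16
train_path: data/train.csv
```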
Supported encoders:
- Learning Joint Multilingual Sentence Representations with Neural Machine Translation
- Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
Tensorboard:
Launch tensorboard with:
tensorboard --logdir="experiments/lightning_logs/"
Download Command:
To download publicly available corpora for training your own models, you can use the download
command. For example, to download the APEQUEST HTER corpus, run the following command:
comet download -d apequest --saving_path data/
Unit Tests:
To run the toolkit tests, first install coverage and then run the test suite:
pip install coverage
coverage run --source=comet -m unittest discover
coverage report -m
Code Style:
To make sure all the code follows the same style, we use Black.
Source Distribution
File details
Details for the file unbabel-comet-0.0.3.tar.gz.
File metadata
- Download URL: unbabel-comet-0.0.3.tar.gz
- Upload date:
- Size: 40.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9a2622d027d16557b8e1c4d5b4c01a6f11a90b5619f959497d6494725890aba5 |
MD5 | 89c0813dc3b6ef4d4d01c3d4d0e5eb4e |
BLAKE2b-256 | eb2381f44a732c99783f5a2de22a851e0e2d01b0ab4b74fff6b6305359f6be7b |