High-quality Machine Translation Evaluation
Reason this release was yanked: missing MANIFEST with requirements.
Project description
Quick Installation
To install COMET as a package, simply run
pip install unbabel-comet
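If you want to confirm the install from Python, a minimal import check (not part of the toolkit itself, just a generic sanity check) is:
import comet  # should succeed after `pip install unbabel-comet`
print(comet.__file__)  # path where the package was installed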
Scoring MT outputs:
Via Bash:
comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt --model wmt-large-da-estimator-1719
You can export your results to a JSON file using the --to_json flag.
comet score -s path/to/sources.txt -h path/to/hypothesis.txt -r path/to/references.txt --model wmt-large-da-estimator-1719 --to_json output.json
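Assuming the exported file is standard JSON, it can be inspected with ordinary tooling; the sketch below simply loads and pretty-prints it without assuming anything about its schema:
import json

# Read whatever `comet score ... --to_json output.json` wrote.
with open("output.json") as f:
    results = json.load(f)
print(json.dumps(results, indent=2, ensure_ascii=False))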
Via Python:
from comet.models import download_model
model = download_model("wmt-large-da-estimator-1719", "path/where/to/save/models")
data = [
    {
        "src": "Hello world!",
        "mt": "Oi mundo!",
        "ref": "Olá mundo!"
    },
    {
        "src": "This is a sample",
        "mt": "este é um exemplo",
        "ref": "isto é um exemplo!"
    }
]
model.predict(data)
A simple Pythonic way to convert lists of segments to model inputs:
source = ["Hello world!", "This is a sample"]
hypothesis = ["Oi mundo!", "este é um exemplo"]
reference = ["Olá mundo!", "isto é um exemplo!"]
data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]
model.predict(data)
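The dict/zip one-liner above is equivalent to the more explicit comprehension below, which may be easier to read:
# Equivalent, more explicit construction of the same model inputs.
data = [
    {"src": s, "mt": m, "ref": r}
    for s, m, r in zip(source, hypothesis, reference)
]
model.predict(data)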
Model Zoo:
Model | Description
---|---
wmt-large-da-estimator-1719 | RECOMMENDED: Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19
wmt-base-da-estimator-1719 | Estimator model built on top of XLM-R (base), trained on DA from WMT17, WMT18 and WMT19
wmt-large-da-estimator-1718 | Estimator model built on top of XLM-R (large), trained on DA from WMT17 and WMT18
wmt-base-da-estimator-1718 | Estimator model built on top of XLM-R (base), trained on DA from WMT17 and WMT18
wmt-large-hter-estimator | Estimator model built on top of XLM-R (large), trained to regress on HTER
wmt-base-hter-estimator | Estimator model built on top of XLM-R (base), trained to regress on HTER
emnlp-base-da-ranker | Translation ranking model that uses XLM-R to encode sentences. This model was trained with WMT17 and WMT18 Direct Assessments Relative Ranks (DARR)
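Any of the model names above can be passed to download_model from the earlier example; for instance, the base-sized estimator is a lighter alternative to the recommended large one (the save path below is just illustrative):
from comet.models import download_model

# Base-sized estimator: smaller and faster than the large variant.
model = download_model("wmt-base-da-estimator-1719", "saved_models/")
scores = model.predict([
    {"src": "Hello world!", "mt": "Oi mundo!", "ref": "Olá mundo!"}
])
print(scores)  # the exact structure of the returned scores is not documented here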
QE-as-a-metric:
Model | Description
---|---
wmt-large-qe-estimator-1719 | Quality Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19
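QE-as-a-metric models are meant to score translations without a reference; the sketch below assumes (this is not confirmed anywhere on this page) that the ref field can simply be omitted from the input dictionaries:
from comet.models import download_model

model = download_model("wmt-large-qe-estimator-1719", "saved_models/")  # illustrative path
# Assumption: reference-free input with only "src" and "mt" fields.
data = [{"src": "Hello world!", "mt": "Oi mundo!"}]
print(model.predict(data))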
Train your own Metric:
Instead of using pretrained models, you can train your own model with the following command:
comet train -f {config_file_path}.yaml
Supported encoders:
- Learning Joint Multilingual Sentence Representations with Neural Machine Translation
- Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
Tensorboard:
Launch tensorboard with:
tensorboard --logdir="experiments/lightning_logs/"
Download Command:
To download publicly available corpora to train your new models, you can use the download
command. For example, to download the APEQUEST HTER corpus, run the following command:
comet download -d apequest --saving_path data/
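The corpus is saved under data/; its exact layout is not documented here, but a quick way to see what was downloaded is:
from pathlib import Path

# List everything under data/ after running `comet download -d apequest --saving_path data/`.
for path in sorted(Path("data").rglob("*")):
    print(path)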
unittest:
In order to run the toolkit tests, first install coverage:
pip install coverage
Then run the following commands:
coverage run --source=comet -m unittest discover
coverage report -m
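If you prefer a browsable report, coverage can also generate one (written to htmlcov/ by default):
coverage html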
Code Style:
To make sure all the code follows the same style, we use Black.
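For example, running Black over the project root formats everything in place:
black .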
Download files
Source Distribution
File details
Details for the file unbabel-comet-0.0.1.tar.gz.
File metadata
- Download URL: unbabel-comet-0.0.1.tar.gz
- Upload date:
- Size: 39.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88cda5e41c8df4cee98e318688547d7aa918107f5cf6873c01d4da48b1267dbf
MD5 | 8703c4635a4ddaef07367060a5a7a647
BLAKE2b-256 | 02e88cfa04c6945a2cd393c2ec9aee4a87378145ace29e5fbf697394aa49f0a6