Transformer-based translation quality estimation
TransQuest: Translation Quality Estimation with Cross-lingual Transformers.
TransQuest provides state-of-the-art models for translation quality estimation.
TransQuest was the winning solution in the WMT 2020 Quality Estimation Shared Task: Sentence-Level Direct Assessment.
Features
- Sentence-level translation quality estimation for both aspects: predicting post-editing effort and direct assessment.
- Performs significantly better than current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all the languages experimented with.
- Pre-trained quality estimation models for seven languages.
Installation
You first need to install PyTorch. The recommended PyTorch version is 1.5. Please refer to the PyTorch installation page for the specific install command for your platform.
When PyTorch has been installed, you can install TransQuest from source or from pip.
From Source
git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt
From pip
pip install transquest
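After installing, it can be useful to confirm which versions ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `major_minor` are hypothetical and are not part of TransQuest. It assumes the packages are distributed under the names `torch` and `transquest`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.5.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# Report what is installed; nothing is imported, so this is safe without a GPU.
for name in ("torch", "transquest"):
    print(name, installed_version(name) or "not installed")
```

Comparing `major_minor(installed_version("torch"))` against `(1, 5)` is one way to see whether the environment matches the recommended PyTorch version.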
Run the examples
Examples are included in the repository but are not shipped with the library.
- WMT 2020 Sentence-level Direct Assessment QE Shared Task
- WMT 2020 Sentence-level Post-Editing Effort QE Shared Task
- WMT 2019 Sentence-level Post-Editing Effort QE Shared Task
- WMT 2018 Sentence-level Post-Editing Effort QE Shared Task
TransQuest Model Zoo
The following pre-trained models have been released. We will keep releasing new models, so please stay in touch.
| Language Pair | Objective | Algorithm | Model Link | Data | Pearson | MAE | RMSE |
|---|---|---|---|---|---|---|---|
| Romanian-English (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.8982 | 0.3121 | 0.4097 |
| | | SiameseTransQuest | model.zip | WMT 2020 | 0.8501 | 0.3637 | 0.4932 |
| | | OpenKiwi | | WMT 2020 | 0.6845 | 0.7596 | 1.0522 |
| Estonian-English (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.7748 | 0.5904 | 0.7321 |
| | | SiameseTransQuest | model.zip | WMT 2020 | 0.6804 | 0.7047 | 0.9022 |
| | | OpenKiwi | | WMT 2020 | 0.4770 | 0.9176 | 1.1382 |
| Nepalese-English (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.7914 | 0.3975 | 0.5078 |
| | | SiameseTransQuest | model.zip | WMT 2020 | 0.6081 | 0.6531 | 0.7950 |
| | | OpenKiwi | | WMT 2020 | 0.3860 | 0.7353 | 0.8713 |
| Sinhala-English (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.6525 | 0.4510 | 0.5570 |
| | | SiameseTransQuest | model.zip | WMT 2020 | 0.5957 | 0.5078 | 0.6466 |
| | | OpenKiwi | | WMT 2020 | 0.3737 | 0.7517 | 0.8978 |
| Russian-English (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.7734 | 0.5076 | 0.6856 |
| | | SiameseTransQuest | model.zip | WMT 2020 | | | |
| | | OpenKiwi | | WMT 2020 | 0.5479 | 0.8253 | 1.1930 |
| English-German (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.4669 | 0.6474 | 0.7762 |
| | | SiameseTransQuest | model.zip | WMT 2020 | | | |
| | | OpenKiwi | | WMT 2020 | 0.1455 | 0.6791 | 0.9670 |
| | HTER | TransQuest | model.zip | WMT 2020 | 0.4994 | 0.1486 | 0.1842 |
| | | SiameseTransQuest | model.zip | WMT 2020 | | | |
| | | OpenKiwi | | WMT 2020 | 0.3916 | 0.1500 | 0.1896 |
| English-Chinese (NMT) | Direct | TransQuest | model.zip | WMT 2020 | 0.4779 | 0.9865 | 1.1338 |
| | | SiameseTransQuest | model.zip | WMT 2020 | 0.4067 | 1.0389 | 1.1973 |
| | | OpenKiwi | | WMT 2020 | 0.1676 | 0.6559 | 0.8503 |
| | HTER | TransQuest | model.zip | WMT 2020 | 0.5910 | 0.1400 | 0.1717 |
| | | SiameseTransQuest | model.zip | WMT 2020 | | | |
| | | OpenKiwi | | WMT 2020 | 0.5058 | 0.1470 | 0.1814 |
| English-Latvian (SMT) | HTER | TransQuest | model.zip | WMT 2018 | 0.7141 | 0.1041 | 0.1420 |
| | | SiameseTransQuest | | WMT 2018 | | | |
| | | Quest++ | | WMT 2018 | 0.3528 | 0.1554 | 0.1919 |
| English-Latvian (NMT) | HTER | TransQuest | model.zip | WMT 2018 | 0.7450 | 0.1162 | 0.1601 |
| | | SiameseTransQuest | | WMT 2018 | | | |
| | | Quest++ | | WMT 2018 | 0.4435 | 0.1625 | 0.2164 |
| English-German (SMT) | HTER | TransQuest | model.zip | WMT 2018 | 0.7355 | 0.0967 | 0.1300 |
| | | SiameseTransQuest | | WMT 2018 | | | |
| | | Quest++ | | WMT 2018 | 0.3653 | 0.1402 | 0.1772 |
| English-Czech (SMT) | HTER | TransQuest | model.zip | WMT 2018 | 0.7150 | 0.1198 | 0.1611 |
| | | SiameseTransQuest | | WMT 2018 | | | |
| | | Quest++ | | WMT 2018 | 0.3943 | 0.1651 | 0.2110 |
| German-English (SMT) | HTER | TransQuest | model.zip | WMT 2018 | 0.7878 | 0.0934 | 0.1277 |
| | | SiameseTransQuest | | WMT 2018 | | | |
| | | Quest++ | | WMT 2018 | 0.3323 | 0.1508 | 0.1928 |
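
The Pearson, MAE and RMSE columns in the table compare predicted quality scores with gold scores. As a rough illustration of how such figures can be computed for your own predictions, here is a minimal pure-Python sketch. It is not the evaluation code used to produce the table; the function names and the sample scores are made up for the example.

```python
import math
from typing import Sequence


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and gold scores."""
    n = len(gold)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    return cov / (std_p * std_g)


def mae(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def rmse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


# Illustrative scores only; these are not taken from any WMT data set.
gold_scores = [0.0, 1.0, 2.0]
predicted_scores = [0.5, 1.5, 2.5]

print("Pearson:", pearson(predicted_scores, gold_scores))
print("MAE:", mae(predicted_scores, gold_scores))
print("RMSE:", rmse(predicted_scores, gold_scores))
```

A higher Pearson value and lower MAE and RMSE values indicate predictions closer to the gold scores. Note that `pearson` divides by zero if either list is constant.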
After downloading and unzipping a model, you can load it as follows:
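import torch

# "path" is the directory of the unzipped model; transformer_config holds the model arguments.
model = QuestModel("xlmroberta", "path", num_labels=1,
                   use_cuda=torch.cuda.is_available(), args=transformer_config)
# Alternatively, load a Siamese model:
model = SiameseTransQuestModel("path")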
Citations
Please consider citing us if you use the library.
@InProceedings{transquest:2020,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest: Translation Quality Estimation with Cross-lingual Transformers},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
year = {2020}
}
The task-specific paper on the WMT 2020 sentence-level DA task, which won first place in the competition:
@InProceedings{transquest:2020b,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
booktitle = {Proceedings of the Fifth Conference on Machine Translation},
year = {2020}
}