
Translation Edit Rate on the character level


CharacTER

CharacTER: Translation Edit Rate on Character Level

CharacTer is a novel character-level metric inspired by the widely used translation edit rate (Ter). It is defined as the minimum number of character edits required to change a hypothesis until it exactly matches the reference, normalized by the length of the hypothesis sentence. CharacTer computes the edit distance on the character level while performing the shift edit on the word level. Unlike Ter, which requires exact matches, CharacTer treats a hypothesis word as matching a reference word, and therefore eligible for shifting, if the edit distance between the two is below a threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is then computed on the character level. The edit distance is normalized by the length of the hypothesis rather than the reference, which counters the tendency of shorter translations to achieve lower Ter.
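The two ingredients described above, the relaxed word match and the hypothesis-normalized character edit distance, can be illustrated with a minimal sketch. It is not the code in CharacTER.py: the word-level shift search is left out entirely, the function names and the `threshold` default are hypothetical, and treating the sentence as a raw character sequence (spaces included) is an assumption made for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def words_match(hyp_word, ref_word, threshold=2):
    """Relaxed match: edit distance strictly below a threshold.

    The default threshold is a hypothetical value chosen for this sketch.
    """
    return levenshtein(hyp_word, ref_word) < threshold


def char_edit_rate(hyp, ref):
    """Character edit distance normalized by the hypothesis length.

    No shifts are applied here, so this is only the final scoring step.
    """
    if not hyp:
        # Convention chosen for this sketch only: empty hypothesis.
        return 1.0 if ref else 0.0
    return levenshtein(hyp, ref) / len(hyp)
```

For example, `char_edit_rate("abcd", "abcf")` gives 0.25: one substitution divided by a hypothesis length of four characters. Dividing by the hypothesis length means a very short hypothesis is not rewarded simply for having few characters to edit.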

The paper can be found under ./WMT2016_CharacTer.pdf.

The implementation is in CharacTER.py.

You may have to install the Python package "python-Levenshtein" first.

usage: CharacTER.py [-h] -r REF -o HYP [-v]

CharacTER: Character Level Translation Edit Rate

optional arguments:
  -h, --help         show this help message and exit
  -r REF, --ref REF  Reference file
  -o HYP, --hyp HYP  Hypothesis file
  -v, --verbose      Print score of each sentence
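Although -r and -o appear under "optional arguments", the usage line shows them without brackets, so both are required. A minimal sketch of an argparse parser that would produce this kind of interface follows. It is an assumption about how such a parser could be written, not a copy of the parser in CharacTER.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="CharacTER.py",
        description="CharacTER: Character Level Translation Edit Rate",
    )
    # required=True removes the brackets in the usage line, but argparse
    # still lists flag-style arguments in its options group.
    parser.add_argument("-r", "--ref", required=True, metavar="REF",
                        help="Reference file")
    parser.add_argument("-o", "--hyp", required=True, metavar="HYP",
                        help="Hypothesis file")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print score of each sentence")
    return parser
```

With this sketch, `build_parser().parse_args(["-r", "ref.txt", "-o", "hyp.txt", "-v"])` yields a namespace with `ref`, `hyp` and `verbose` attributes; the file names are placeholders.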

If a UnicodeEncodeError occurs, set the PYTHONIOENCODING environment variable (for example, to utf-8) before running the script.
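One way to set the variable for a single run, without changing the shell environment, is to launch the script from Python with a modified environment. The helper below is a sketch; `run_with_utf8` is a hypothetical name and is not part of this package.

```python
import os
import subprocess


def run_with_utf8(cmd):
    """Run a command with PYTHONIOENCODING=utf-8 set for the child process."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # check=True raises CalledProcessError if the command exits with an error.
    return subprocess.run(cmd, env=env, capture_output=True, check=True)
```

A call could look like `run_with_utf8([sys.executable, "CharacTER.py", "-r", "ref.txt", "-o", "hyp.txt"])` after `import sys`, where the two file names are placeholders. The captured output is available as bytes in the `stdout` attribute of the returned object.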

Modifications by Bram Vanroy

Bram Vanroy packaged this library to be compatible with PyPI. Some packaging modifications were therefore made, but the implementation itself is unchanged. The PDF file of the paper was removed in favor of a CITATION file.

The original license applies, i.e., GPL v3.
