
Library to automate machine translation evaluation


mteval

Introduction

This library enables easy, automated machine translation evaluation using the evaluation tools sacreBLEU and COMET. While these tools readily provide command-line access, they lack dataset handling and the ability to translate datasets with the major online machine translation services. mteval provides both, along with code that logs evaluation results and makes it easier to automate evaluation of multiple datasets and MT systems from Python.

Install

Installing the library from PyPI

pip install mteval

Setting up Cloud authentication and parameters in the environment

This library currently supports the cloud translation services Amazon Translate, DeepL, Google Translate and Microsoft Translator. To authenticate with the services and configure them, you need to set the following environment variables:

export GOOGLE_APPLICATION_CREDENTIALS='/path/to/google/credentials/file.json'
export GOOGLE_PROJECT_ID=''
export MS_SUBSCRIPTION_KEY=''
export MS_REGION=''
export AWS_DEFAULT_REGION=''
export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''
export DEEPL_API_KEY=''
export MMT_API_KEY=''

How to obtain subscription credentials

You can set these environment variables by adding the export statements above to your .bashrc file on Linux, or, in a Jupyter notebook, by adding them to the kernel configuration file kernel.json.
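Before starting a translation run, it can save time to verify that the variables for the services you plan to use are actually set. The following sketch is illustrative only (the REQUIRED mapping and missing_credentials helper are not part of mteval, and the "modernmt" label for MMT_API_KEY is an assumption):

```python
import os

# Environment variables each service expects, taken from the export
# statements above; this mapping and helper are hypothetical, not mteval API.
REQUIRED = {
    "google": ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_PROJECT_ID"],
    "microsoft": ["MS_SUBSCRIPTION_KEY", "MS_REGION"],
    "amazon": ["AWS_DEFAULT_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "deepl": ["DEEPL_API_KEY"],
    "modernmt": ["MMT_API_KEY"],
}

def missing_credentials(service):
    """Return the names of required variables that are unset or empty."""
    return [var for var in REQUIRED[service] if not os.getenv(var)]

for service in REQUIRED:
    missing = missing_credentials(service)
    if missing:
        print(f"{service}: missing {missing}")
```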

This library has only been tested on Linux, not on Windows or macOS.

On Google Colab: Loading the environment from a .env file

Google Colab, a hosted cloud solution for Jupyter notebooks with GPU runtimes, doesn't support persistent environment variables. Instead, the variables can be stored in a .env file on Google Drive and loaded at each start of a notebook.

import os
# Detect whether this notebook is running on Google Colab
running_in_colab = 'google.colab' in str(get_ipython())
if running_in_colab:
    # Mount Google Drive so the .env file and credentials are accessible
    from google.colab import drive
    drive.mount('/content/drive')
    homedir = "/content/drive/MyDrive"
else:
    homedir = os.getenv('HOME')

Run the following cell to install mteval from PyPI

!pip install mteval

Alternatively, run the following cell to install mteval from the GitHub repository

!pip install git+https://github.com/polyglottech/mteval.git

Then load the environment variables from the .env file:

from dotenv import load_dotenv

if running_in_colab:
    # Colab doesn't have a mechanism to set environment variables other
    # than python-dotenv, so load them from the .env file on Google Drive
    env_file = homedir+'/secrets/.env'
    load_dotenv(env_file)

Also make sure to store the Google Cloud credentials JSON file on Google Drive, e.g. in the /content/drive/MyDrive/secrets/ folder.
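Because GOOGLE_APPLICATION_CREDENTIALS must point at that file, the corresponding .env entry (or a notebook cell like the sketch below) should reference the Drive path. The filename credentials.json is a placeholder for your own key file:

```python
import os

# Point the Google Cloud client libraries at the credentials file on Drive.
# "credentials.json" is a placeholder name, not a file mteval provides.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = \
    "/content/drive/MyDrive/secrets/credentials.json"
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```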

How to use

This is a short example of how to translate a few sentences and score the machine translations with BLEU against human reference translations. See the reference documentation for a complete list of functions.

from mteval.microsoftmt import *
from mteval.bleu import *
import json
sources = ["Puissiez-vous passer une semaine intéressante et enrichissante avec nous.",
           "Honorables sénateurs, je connais, bien entendu, les références du ministre de l'Environnement et je pense que c'est une personne admirable.",
           "Il est certain que le renforcement des forces de maintien de la paix et l'envoi d'autres casques bleus ne suffiront pas, compte tenu du mauvais fonctionnement des structures de contrôle et de commandement là-bas."]
references = ["May you have an interesting and useful week with us.",
              "Honourable senators, I am, of course, familiar with the credentials of the Minister of the Environment and consider him an admirable person.",
              "Surely, strengthening and adding more peacekeepers is not sufficient when we know the command and control structures are not working."]

hypotheses = []
msmt = microsofttranslate()
for source in sources:
    # Translate each French source segment into English
    translation = msmt.translate_text("fr","en",source)
    print(translation)
    hypotheses.append(translation)

# measure_bleu returns its result as a JSON string
score = json.loads(measure_bleu(hypotheses,references,"en"))
print(score)

The source texts and references are from the Canadian Hansard corpus. For a real-world evaluation, the test set should contain at least 100-200 segments.
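measure_bleu delegates the actual scoring to sacreBLEU. To illustrate what the score measures, here is a simplified, unsmoothed sentence-level BLEU; this is a sketch for intuition only, not the sacreBLEU implementation:

```python
import math
from collections import Counter

def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty. Unsmoothed, so it is 0
    whenever any n-gram order has no match."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i+n]) for i in range(len(hyp)-n+1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref)-n+1))
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / sum(hyp_ngrams.values())))
    # Brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n) * 100

ref = "May you have an interesting and useful week with us ."
print(simple_bleu(ref, ref))  # identical strings score 100.0
```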
