# SLTev
SLTev is an open-source tool for comprehensively assessing the quality of spoken language translation (SLT). Based on a timestamped reference transcript and a reference translation into the target language, SLTev reports the quality, delay, and stability of a given SLT candidate output.
SLTev can also evaluate the intermediate steps alone: the output of automatic speech recognition (ASR) and machine translation (MT).
## Requirements
- Python 3.6 or higher
- some pip-installed modules: sacreBLEU [3], requests, gitpython, gitdir, filelock
- mwerSegmenter [1]
- mosestokenizer [2]
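The pip-installed modules can also be set up directly; a minimal sketch, assuming the PyPI package names match the module names listed above (mwerSegmenter is a standalone tool and is not installed via pip):

```
pip3 install sacrebleu mosestokenizer requests gitpython gitdir filelock
```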
## File Naming Convention
Depending on whether your system produces spoken language translation (SLT) or just speech recognition (ASR), name your input and output files according to the following templates.
### Reference Transcripts: .OSt, .OStt
- <file-name> . <language> . <OSt/OStt>
- e.g. kaccNlwi6lUCEM.en.OSt, kaccNlwi6lUCEM.cs.OStt

### Word Alignment for Better Estimation: .align
- <file-name> . <source-language> . <target-language> . <align>
- e.g. kaccNlwi6lUCEM.en.de.align

### System Outputs from Translation: .slt, .mt
- <file-name> . <source-language> . <target-language> . <slt/mt>
- e.g. kaccNlwi6lUCEM.en.de.slt, kaccNlwi6lUCEM.cs.en.mt

### System Outputs from ASR: .asr
- <file-name> . <source-language> . <source-language> . <asr>
- e.g. kaccNlwi6lUCEM.en.en.asr
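For illustration, the example files above would sit together in a single evaluation folder; a sketch of such a layout, using only the file names already shown:

```
kaccNlwi6lUCEM.en.OSt        # reference transcript
kaccNlwi6lUCEM.cs.OStt       # reference transcript with word-level timestamps
kaccNlwi6lUCEM.en.de.align   # word alignment (optional, for better estimation)
kaccNlwi6lUCEM.en.de.slt     # SLT system output (en -> de)
kaccNlwi6lUCEM.cs.en.mt      # MT system output (cs -> en)
kaccNlwi6lUCEM.en.en.asr     # ASR system output (en)
```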
## Installation
Install the Python module (Python 3 only):

```
pip3 install SLTev
```

Alternatively, you can install from source:

```
python3 setup.py install
```
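A sketch of the source install, assuming the code is cloned from the project repository first (the repository URL below is an assumption, not taken from this page):

```
git clone https://github.com/ELITR/SLTev.git   # assumed repository location
cd SLTev
python3 setup.py install
```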
## Package Overview
- SLTev: contains scripts for running SLTev
- sample-data: contains sample input and output files
## Evaluating
SLTev scoring relies on reference outputs (golden transcript for ASR, reference translation for MT and SLT).
You can run SLTev and provide it with your custom reference outputs, or you can pick the easier option: use our provided test set (elitr-testset) to evaluate your system on our inputs. The added benefit of elitr-testset scoring is that it makes your results comparable to others (subject to SLTev and test set versions, of course).
### Evaluating on elitr-testset
SLTev works best if you want to evaluate your system on files provided in elitr-testset (https://github.com/ELITR/elitr-testset).
The procedure is simple:

1. Choose an “index”, i.e. a subset of files that you want to test on, from https://github.com/ELITR/elitr-testset/tree/master/indices. We illustrate the rest with SLTev-sample as the index.

2. Ask SLTev to provide you with the current version of the input files:

```
SLTev -g SLTev-sample --outdir my-evaluation-run-1
# To use your existing checkout of elitr-testset, add -T /PATH/TO/YOUR/elitr-testset
```

3. Run your models on the files in my-evaluation-run-1 and put the outputs into the same directory, with filename suffixes as described above.

4. Run SLTev to get the scores:

```
SLTev -e my-evaluation-run-1/
```
### Evaluating with Your Custom Reference Files
Put all input files (*.OSt, *.OStt, *.align) and output files (*.slt/*.asr/*.mt) into one folder (e.g. My-folder) and evaluate it the same way as in step 4 above (`SLTev -e My-folder`); see the sketch below.
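A minimal sketch of such a custom run, reusing the example file names from the naming convention above (My-folder is just a placeholder name):

```
# My-folder/ contains, e.g.:
#   kaccNlwi6lUCEM.en.OSt  kaccNlwi6lUCEM.cs.OStt
#   kaccNlwi6lUCEM.en.de.align  kaccNlwi6lUCEM.en.de.slt
SLTev -e My-folder
```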
## Terminology and Abbreviations
In the following, we use this notation:
OS … original speech (sound)
OSt … original speech manually transcribed
OStt … original speech manually transcribed with word-level timestamps
IS … human interpreter’s speech (sound)
ISt … IS manually transcribed with word-level timestamps
TT … human textual translation, created from transcribed original speech (OSt); corresponds sentence-by-sentence to OSt
ASR … the unrevised output of speech recognition system; timestamped at the word level
SLT … the unrevised output of spoken language translation, i.e. sentences in the target language corresponding to sentences in the source language; the source of SLT is OS
MT … the unrevised output of text-based translation; the source of MT is ASR (machine-transcribed OS) or OSt (human-transcribed OS)
## References
[1] Evgeny Matusov, Gregor Leusch, Oliver Bender, and Hermann Ney. 2005. Evaluating Machine Translation Output with Automatic Sentence Segmentation. In International Workshop on Spoken Language Translation, pages 148–154, Pittsburgh, PA, USA.

[2] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the ACL (Association for Computational Linguistics), demonstration session.

[3] Matt Post. 2018. A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation (WMT), pages 186–191. Association for Computational Linguistics.