5 projects
subset2evaluate
Find informative examples to efficiently (human-)evaluate NLG models.
pearmut
A tool for evaluating model outputs, primarily machine translation (MT).
mt-thresholds
Tool to check how metric deltas for machine translation are reflected in system-level human accuracies.
tokenization-scorer
Package for evaluating text tokenizations.
sacrecomet
Tool to guide you through reporting the use of COMET for machine translation evaluation.