
Multi-metric evaluation toolkit supporting MT, ASR, TTS, SimulST, VC, and Paralinguistics with optimized CJK language support


MultiMetric-Eval



MultiMetric-Eval is an evaluation toolkit centered on translation and speech translation. It provides a unified way to score text translation quality, speech output quality, preservation-related properties, and streaming latency.

What It Can Be Used For

This toolkit is best suited to the following use cases:

  • MT or S2TT text-side evaluation with BLEU, chrF++, COMET, and BLEURT
  • S2ST evaluation by combining text quality, speech quality, speaker similarity, and latency
  • Streaming or simultaneous speech translation latency evaluation with a custom agent
  • Preservation analysis for speech translation outputs, including speaker similarity, emotion, and paralinguistic similarity

Capability Boundary

MultiMetric-Eval is an evaluator, not a model training or inference framework.

It is a good fit when you already have model outputs and want to score them in a consistent way.

It is not designed to be:

  • a general-purpose ASR toolkit
  • a general-purpose TTS toolkit
  • a model serving framework
  • a replacement for task-specific toolkits in unrelated speech domains

Core Modules

  • TranslationEvaluator – text-side translation quality (sacreBLEU, chrF++, COMET, BLEURT)
  • SpeechQualityEvaluator – naturalness and text-speech consistency (UTMOS, WER_Consistency, CER_Consistency)
  • SpeakerSimilarityEvaluator – speaker preservation (wavlm_similarity, resemblyzer_similarity)
  • EmotionEvaluator – emotion preservation or classification accuracy (Emotion2Vec_Cosine_Similarity, Audio_Emotion_Accuracy)
  • ParalinguisticEvaluator – non-verbal and paralinguistic similarity (Paralinguistic_Fidelity_Cosine)
  • LatencyEvaluator – streaming / simultaneous translation latency (StartOffset, ATD, CustomATD, RTF)

Installation

Basic install:

pip install multimetriceval

Optional extras:

pip install "multimetriceval[comet]"
pip install "multimetriceval[whisper]"
pip install "multimetriceval[emotion]"
pip install "multimetriceval[paralinguistics]"
pip install "multimetriceval[all]"

If you need BLEURT:

pip install git+https://github.com/lucadiliello/bleurt-pytorch.git

Import

PyPI package name:

multimetriceval

Python import name:

multimetric_eval

Example:

from multimetric_eval import TranslationEvaluator, SpeechQualityEvaluator
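Because the distribution name (multimetriceval) differs from the import name (multimetric_eval), a quick stdlib check can confirm the import name resolves before a pipeline is run:

```python
import importlib.util

def module_available(import_name: str) -> bool:
    """Return True if `import_name` can be imported in this environment."""
    return importlib.util.find_spec(import_name) is not None

# Expected to be True after `pip install multimetriceval`:
module_available("multimetric_eval")
```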

Quick Start

Text Translation

from multimetric_eval import TranslationEvaluator

evaluator = TranslationEvaluator(
    use_bleu=True,
    use_chrf=True,
    use_comet=False,
    use_bleurt=False,
    device="cuda",
)

results = evaluator.evaluate_all(
    reference=["我喜欢看电影。"],
    target_text=["我喜欢看电影。"],
    source=["I like watching movies."],
    target_lang="zh",
)

print(results)
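The quick-start snippets print the returned results directly. Assuming evaluate_all returns a flat dict of metric names to float scores (the shape the example suggests, though this README does not confirm it), a small stdlib helper can render it more readably:

```python
def format_results(results: dict) -> str:
    """Render a {metric_name: score} dict as aligned, two-decimal lines."""
    width = max(len(name) for name in results)
    return "\n".join(
        f"{name:<{width}}  {score:.2f}" for name, score in results.items()
    )

print(format_results({"BLEU": 42.3, "chrF++": 61.0}))
```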

Speech Quality

from multimetric_eval import SpeechQualityEvaluator

evaluator = SpeechQualityEvaluator(
    use_wer=True,
    use_utmos=True,
    whisper_model="medium",
    device="cuda",
)

results = evaluator.evaluate_all(
    target_audio="./generated_wavs",
    target_text=["你好世界", "这是一个测试"],
    target_lang="zh",
)

print(results)
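When target_audio is given as a folder path, the pairing between audio files and target_text entries presumably follows sorted filename order (an assumption; verify against your own data layout). A sketch that makes that ordering explicit:

```python
from pathlib import Path

def collect_wavs(folder: str) -> list:
    """List .wav files in `folder`, sorted by filename for stable pairing."""
    return sorted(str(p) for p in Path(folder).glob("*.wav"))
```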

Examples

Runnable examples live in the examples/ directory.

Python Examples

  • examples/python/translation_eval.py
  • examples/python/speech_quality_eval.py
  • examples/python/speaker_similarity_eval.py
  • examples/python/emotion_eval.py
  • examples/python/paralinguistic_eval.py
  • examples/python/latency_eval.py

Bash Examples

  • examples/bash/install_extras.sh
  • examples/bash/run_latency_cli.sh

Full Evaluation Pipelines

For larger end-to-end evaluation scripts, see test/:

  • test/run_full_eval_seamless.py
  • test/run_full_eval_vallex.py
  • test/run_full_eval_simulmega.py
  • test/run_full_eval_cascade.py

Input Conventions

Text inputs can be provided as:

  • Python List[str]
  • .txt files with one sample per line
  • .json files

Audio inputs can be provided as:

  • a folder path
  • a Python List[str] of audio file paths
  • .txt files
  • .json files
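The text-input conventions above can be normalized with one loader. This is a hypothetical helper, not part of the package API: it accepts a Python list as-is, reads a .txt file line by line, and parses a .json file containing a list of strings.

```python
import json
from pathlib import Path

def load_text_inputs(source) -> list:
    """Normalize a List[str], .txt path, or .json path into a list of samples."""
    if isinstance(source, list):
        return source
    path = Path(source)
    if path.suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    # .txt convention: one sample per line
    return path.read_text(encoding="utf-8").splitlines()
```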

Notes

  • For zh / ja / ko, the toolkit uses CJK-aware handling for text-side evaluation.
  • SpeechQualityEvaluator returns CER_Consistency for zh / ja / ko, and WER_Consistency for most other languages.
  • ParalinguisticEvaluator currently reports only Paralinguistic_Fidelity_Cosine, an embedding-based continuous similarity metric between source and target audio.
  • Some modules require optional dependencies; in offline environments they may also need local model paths configured.
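The CER/WER note above amounts to a language-based switch. A minimal sketch of that rule (the CJK language codes come from the note; the function name is hypothetical):

```python
CJK_LANGS = {"zh", "ja", "ko"}

def consistency_metric(target_lang: str) -> str:
    """Pick the consistency metric name by target language, per the note above."""
    return "CER_Consistency" if target_lang.lower() in CJK_LANGS else "WER_Consistency"
```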

License

MIT License
