Multi-metric evaluation toolkit supporting MT, ASR, TTS, SimulST, VC, and Paralinguistics with optimized CJK language support

MultiMetric-Eval

Python 3.8+ | License: MIT

MultiMetric-Eval is an evaluation toolkit centered on translation and speech translation. It provides a unified way to score text translation quality, speech output quality, preservation of speaker identity, emotion, and paralinguistic cues, and streaming latency.

What It Can Be Used For

The toolkit is best suited to the following tasks:

  • MT or S2TT text-side evaluation with BLEU, chrF++, COMET, and BLEURT
  • S2ST evaluation by combining text quality, speech quality, speaker similarity, and latency
  • Streaming or simultaneous speech translation latency evaluation with a custom agent
  • Preservation analysis for speech translation outputs, including speaker similarity, emotion, and paralinguistic similarity

Core Modules

| Module | Main Use | Typical Metrics |
| --- | --- | --- |
| TranslationEvaluator | Text-side translation quality | sacreBLEU, chrF++, COMET, BLEURT |
| SpeechQualityEvaluator | Naturalness and text-speech consistency | UTMOS, WER_Consistency, CER_Consistency |
| SpeakerSimilarityEvaluator | Speaker preservation | wavlm_similarity, resemblyzer_similarity |
| EmotionEvaluator | Emotion preservation or classification accuracy | Emotion2Vec_Cosine_Similarity, Audio_Emotion_Accuracy |
| ParalinguisticEvaluator | Non-verbal and paralinguistic preservation | Paralinguistic_Fidelity_Cosine, Acoustic_Event_Preservation_Rate, Acoustic_Event_Preservation_Macro_F1, Acoustic_Event_Preservation_Macro_Recall |
| LatencyEvaluator | Streaming / simultaneous translation latency | StartOffset, ATD, CustomATD, RTF, Model_Generate_RTF |
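WER_Consistency and CER_Consistency are both error rates, counted over words or over characters respectively. The sketch below shows only the generic edit-distance arithmetic behind such a rate. It is not the toolkit's implementation: the helper names (`edit_distance`, `tokenize`, `error_rate`) are hypothetical, and how SpeechQualityEvaluator obtains the transcript it compares against is not covered here.

```python
from typing import List, Sequence

# Languages scored at character level (CER); others use word level (WER).
CJK_LANGS = {"zh", "ja", "ko"}


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tokenize(text: str, lang: str) -> List[str]:
    """Characters (whitespace dropped) for zh/ja/ko, whitespace-split words otherwise."""
    if lang in CJK_LANGS:
        return [ch for ch in text if not ch.isspace()]
    return text.split()


def error_rate(reference: str, hypothesis: str, lang: str) -> float:
    """Edit distance divided by the number of reference tokens."""
    ref = tokenize(reference, lang)
    hyp = tokenize(hypothesis, lang)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# One inserted word against a three-word reference: 1/3.
wer = error_rate("the cat sat", "the cat sat down", "en")
# One inserted character against a five-character reference: 1/5.
cer = error_rate("今天天气好", "今天天气很好", "zh")
```

Counting characters for zh / ja / ko avoids depending on a word segmenter, since these scripts are not consistently whitespace-delimited.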

Installation

Basic install:

pip install multimetriceval

Optional extras:

pip install "multimetriceval[comet]"
pip install "multimetriceval[whisper]"
pip install "multimetriceval[emotion]"
pip install "multimetriceval[paralinguistics]"
pip install "multimetriceval[all]"

If you need BLEURT:

pip install git+https://github.com/lucadiliello/bleurt-pytorch.git

Import

PyPI package name:

multimetriceval

Python import name:

multimetric_eval

Example:

from multimetric_eval import TranslationEvaluator, SpeechQualityEvaluator
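Because the distribution name and the import name differ, it can help to check both before running an evaluation. The following is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper, not part of the package.

```python
import importlib.util
from importlib import metadata
from typing import Dict

DIST_NAME = "multimetriceval"     # name used with pip
IMPORT_NAME = "multimetric_eval"  # name used in import statements


def describe_install(dist_name: str = DIST_NAME,
                     import_name: str = IMPORT_NAME) -> Dict[str, object]:
    """Report the installed distribution version and whether the module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment
    importable = importlib.util.find_spec(import_name) is not None
    return {"version": version, "importable": importable}


if __name__ == "__main__":
    print(describe_install())
```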

Quick Start

Quick-start scripts live under examples/.

Python examples:

  • examples/python/translation_eval.py
  • examples/python/speech_quality_eval.py
  • examples/python/speaker_similarity_eval.py
  • examples/python/emotion_eval.py
  • examples/python/paralinguistic_eval.py
  • examples/python/paralinguistic_identity_baseline.py
  • examples/python/latency_eval.py

Shell examples:

  • examples/bash/install_extras.sh
  • examples/bash/run_latency_cli.sh

Latency output now distinguishes two RTF variants:

  • Real_Time_Factor_(RTF): system-level RTF. This includes agent policy overhead, pre/post-processing, and other runtime costs around model inference.
  • Model_Generate_RTF: model-level RTF. This is reported only when the agent explicitly records model inference time via record_model_inference_time(...) or returns it in Segment.config["model_inference_time"].
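To make the distinction concrete, the sketch below computes both variants from timing values, using the conventional definition of real-time factor (processing time divided by audio duration). Whether the toolkit uses exactly this formula is an assumption, and `TimingRecord` and `summarize` are hypothetical names, not part of the LatencyEvaluator API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TimingRecord:
    audio_duration_s: float               # length of the source audio
    wall_time_s: float                    # total time spent by the whole system
    model_time_s: Optional[float] = None  # time inside model inference, if recorded


def real_time_factor(elapsed_s: float, audio_duration_s: float) -> float:
    """Conventional RTF: processing time divided by audio duration."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_duration_s


def summarize(record: TimingRecord) -> Dict[str, float]:
    """System-level RTF always; model-level RTF only when model time was recorded."""
    out = {"Real_Time_Factor_(RTF)": real_time_factor(record.wall_time_s,
                                                      record.audio_duration_s)}
    if record.model_time_s is not None:
        out["Model_Generate_RTF"] = real_time_factor(record.model_time_s,
                                                     record.audio_duration_s)
    return out


# 10 s of audio, 4 s end to end, 2.5 s of which was model inference.
scores = summarize(TimingRecord(audio_duration_s=10.0, wall_time_s=4.0, model_time_s=2.5))
```

Under this definition, the gap between the two values is the overhead outside model inference, and a record without model time yields only the system-level entry.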

Examples

Examples have been moved into the examples/ directory.

Input Conventions

Text inputs can be given as:

  • Python List[str]
  • .txt files with one sample per line
  • .json files
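As an illustration of these conventions, the sketch below normalizes the three text input forms into a plain list of strings. `load_texts` is a hypothetical helper, not the toolkit's loader, and the JSON layout (a top-level array of strings) is an assumption, since the expected schema is not specified here.

```python
import json
from pathlib import Path
from typing import List, Sequence, Union

TextInput = Union[Sequence[str], str, Path]


def load_texts(source: TextInput) -> List[str]:
    """Return one string per sample from a list, a .txt path, or a .json path."""
    if isinstance(source, (str, Path)):
        path = Path(source)
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # One sample per line; strip only the trailing newline.
            with path.open(encoding="utf-8") as fh:
                return [line.rstrip("\n") for line in fh]
        if suffix == ".json":
            # Assumed layout: a JSON array of strings.
            data = json.loads(path.read_text(encoding="utf-8"))
            if not (isinstance(data, list) and all(isinstance(x, str) for x in data)):
                raise ValueError("expected a JSON array of strings")
            return data
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    # Anything else is treated as an in-memory sequence of samples.
    return list(source)
```

Note that a bare string is always interpreted as a file path in this sketch, so a single sentence should be wrapped in a list.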

Audio inputs can be given as:

  • folder path
  • Python List[str]
  • .txt files
  • .json files

Notes

  • For zh / ja / ko, the toolkit uses CJK-aware handling for text-side evaluation.
  • SpeechQualityEvaluator returns CER_Consistency for zh / ja / ko, and WER_Consistency for most other languages.
  • ParalinguisticEvaluator always supports Paralinguistic_Fidelity_Cosine, a continuous CLAP-based audio similarity score between source and target speech.
  • The discrete branch of ParalinguisticEvaluator is an utterance-level single-label preservation task. With source-side gold labels, it reports Acoustic_Event_Preservation_Rate, Acoustic_Event_Preservation_Macro_F1, and Acoustic_Event_Preservation_Macro_Recall.
  • The discrete branch does not use timestamps. It answers whether the source-side acoustic event is preserved somewhere in the target utterance, not whether it is aligned at the same time position.
  • If source-side gold labels are not available, the evaluator can still run in prediction-only mode and reports Predicted_Event_Consistency_Rate, Predicted_Event_Consistency_Macro_F1, and Predicted_Event_Consistency_Macro_Recall.
  • The default discrete predictor is a closed-set CLAP classifier over candidate_labels. Users may replace it with any custom predictor object that implements predict(audio_paths, candidate_labels).
  • Dataset-specific label mapping is intentionally outside the core package. Pass candidate_labels and label_normalizer at call time so the same evaluator works across datasets without changing core code.
  • clap_model_path accepts a Hugging Face repo id or a local model directory or snapshot. Use a local path in offline environments.
  • In S2S latency evaluation, alignment prefers the model's native transcript when available. If the model is audio-only, the evaluator can optionally use ASR fallback to prepare alignment text.
  • For S2S forced alignment, pass language-appropriate MFA models through alignment_acoustic_model and alignment_dictionary_model. The defaults are English.
  • Some modules rely on optional dependencies or local model paths in offline environments.
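The custom-predictor and label-normalizer notes above can be sketched as follows. The only interface taken from this README is the method signature predict(audio_paths, candidate_labels). Everything else is an assumption made for illustration: that predict returns one label string per audio path, that a label normalizer is a str-to-str callable, and that preservation is scored by comparing normalized source gold labels with target predictions. `FilenameKeywordPredictor` and `preservation_scores` are hypothetical names, and the formulas may differ from the toolkit's.

```python
from typing import Callable, Dict, List, Sequence


class FilenameKeywordPredictor:
    """Toy predictor: picks the first candidate label found in the file name."""

    def __init__(self, fallback: str = "none"):
        self.fallback = fallback

    def predict(self, audio_paths: Sequence[str],
                candidate_labels: Sequence[str]) -> List[str]:
        # Assumed return type: one label per audio path.
        predictions = []
        for path in audio_paths:
            name = path.lower()
            match = next((lab for lab in candidate_labels if lab.lower() in name),
                         self.fallback)
            predictions.append(match)
        return predictions


def preservation_scores(gold: Sequence[str], predicted: Sequence[str],
                        normalizer: Callable[[str], str] = str.lower) -> Dict[str, float]:
    """Match rate, macro recall and macro F1 over the labels present in gold."""
    gold_n = [normalizer(g) for g in gold]
    pred_n = [normalizer(p) for p in predicted]
    if not gold_n or len(gold_n) != len(pred_n):
        raise ValueError("gold and predicted must be non-empty and equally long")
    pairs = list(zip(gold_n, pred_n))
    rate = sum(g == p for g, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for label in sorted(set(gold_n)):
        tp = sum(g == label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        recall = tp / (tp + fn)  # tp + fn > 0 because label occurs in gold
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {"rate": rate,
            "macro_recall": sum(recalls) / len(recalls),
            "macro_f1": sum(f1s) / len(f1s)}


source_gold = ["Laugh", "laugh", "cough", "sigh"]
target_paths = ["out/laugh_01.wav", "out/cough_02.wav", "out/cough_03.wav", "out/clip_04.wav"]
predicted = FilenameKeywordPredictor().predict(target_paths, ["laugh", "cough", "sigh"])
scores = preservation_scores(source_gold, predicted)
```

Keeping the normalizer as a call-time argument mirrors the note that dataset-specific label mapping stays outside the core package.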

License

MIT License
