
MultiMetric-Eval

Multi-metric evaluation toolkit supporting MT, ASR, TTS, SimulST, VC, and Paralinguistics with optimized CJK language support

English | 中文

PyPI version · Python 3.8+ · License: MIT

MultiMetric-Eval is an evaluation toolkit centered on translation and speech translation. It provides a unified way to score text translation quality, speech output quality, preservation-related properties, and streaming latency.

What It Can Be Used For

This toolkit is best suited to the following use cases:

  • MT or S2TT text-side evaluation with BLEU, chrF++, COMET, and BLEURT
  • S2ST evaluation by combining text quality, speech quality, speaker similarity, and latency
  • Streaming or simultaneous speech translation latency evaluation with a custom agent
  • Preservation analysis for speech translation outputs, including speaker similarity, emotion, and paralinguistic similarity

Capability Boundary

MultiMetric-Eval is an evaluator, not a model training or inference framework.

It is a good fit when you already have model outputs and want to score them in a consistent way.

It is not designed to be:

  • a general-purpose ASR toolkit
  • a general-purpose TTS toolkit
  • a model serving framework
  • a replacement for task-specific toolkits in unrelated speech domains

Core Modules

| Module | Main Use | Typical Metrics |
| --- | --- | --- |
| TranslationEvaluator | Text-side translation quality | sacreBLEU, chrF++, COMET, BLEURT |
| SpeechQualityEvaluator | Naturalness and text-speech consistency | UTMOS, WER_Consistency, CER_Consistency |
| SpeakerSimilarityEvaluator | Speaker preservation | wavlm_similarity, resemblyzer_similarity |
| EmotionEvaluator | Emotion preservation or classification accuracy | Emotion2Vec_Cosine_Similarity, Audio_Emotion_Accuracy |
| ParalinguisticEvaluator | Non-verbal and paralinguistic similarity | Paralinguistic_Fidelity_Cosine, Discrete_Acoustic_Event_F1_Strict, Discrete_Acoustic_Event_F1_Relaxed |
| LatencyEvaluator | Streaming / simultaneous translation latency | StartOffset, ATD, CustomATD, RTF, Model_Generate_RTF |
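SpeakerSimilarityEvaluator and EmotionEvaluator are not covered in the Quick Start below. As a rough sketch, assuming they follow the same constructor-plus-evaluate_all pattern and the source_audio / target_audio parameter names used by ParalinguisticEvaluator (both are assumptions; see examples/python/speaker_similarity_eval.py and examples/python/emotion_eval.py for authoritative usage):

from multimetric_eval import SpeakerSimilarityEvaluator

# Sketch only: the evaluate_all signature below is inferred from the
# other evaluators in this README, not confirmed by it.
evaluator = SpeakerSimilarityEvaluator(device="cuda")

results = evaluator.evaluate_all(
    source_audio=["./src_wavs/sample_001.wav"],
    target_audio=["./tgt_wavs/sample_001.wav"],
)

print(results)  # expected to include wavlm_similarity / resemblyzer_similarity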

Installation

Basic install:

pip install multimetriceval

Optional extras:

pip install "multimetriceval[comet]"
pip install "multimetriceval[whisper]"
pip install "multimetriceval[emotion]"
pip install "multimetriceval[paralinguistics]"
pip install "multimetriceval[all]"

If you need BLEURT:

pip install git+https://github.com/lucadiliello/bleurt-pytorch.git

Import

PyPI package name:

multimetriceval

Python import name:

multimetric_eval

Example:

from multimetric_eval import TranslationEvaluator, SpeechQualityEvaluator

Quick Start

Text Translation

from multimetric_eval import TranslationEvaluator

evaluator = TranslationEvaluator(
    use_bleu=True,
    use_chrf=True,
    use_comet=False,
    use_bleurt=False,
    device="cuda",
)

results = evaluator.evaluate_all(
    reference=["我喜欢看电影。"],
    target_text=["我喜欢看电影。"],
    source=["I like watching movies."],
    target_lang="zh",
)

print(results)

Speech Quality

from multimetric_eval import SpeechQualityEvaluator

evaluator = SpeechQualityEvaluator(
    use_wer=True,
    use_utmos=True,
    whisper_model="medium",
    device="cuda",
)

results = evaluator.evaluate_all(
    target_audio="./generated_wavs",
    target_text=["你好世界", "这是一个测试"],
    target_lang="zh",
)

print(results)

Latency

from multimetric_eval import GenericAgent, LatencyEvaluator, ReadAction, WriteAction


class WaitUntilEndAgent(GenericAgent):
    def policy(self, states=None):
        states = states or self.states

        if not states.source_finished:
            return ReadAction()

        if not states.target_finished:
            prediction = "hello world"
            self.record_model_inference_time(0.12)
            return WriteAction(prediction, finished=True)

        return ReadAction()


agent = WaitUntilEndAgent()
evaluator = LatencyEvaluator(agent, segment_size=20)
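The snippet above only constructs the evaluator; this README does not show how a run is launched. A hypothetical invocation, assuming LatencyEvaluator exposes the same evaluate_all entry point as the other modules (the actual entry point and parameters may differ; see examples/python/latency_eval.py and examples/bash/run_latency_cli.sh):

# Hypothetical: parameter names are assumptions modeled on the other modules.
results = evaluator.evaluate_all(
    source_audio="./src_wavs",   # streamed to the agent in segment_size chunks
    reference=["hello world"],
)
print(results)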

Paralinguistics

from multimetric_eval import ParalinguisticEvaluator

evaluator = ParalinguisticEvaluator(
    use_continuous_fidelity=True,
    use_discrete_event_f1=True,
    discrete_event_config={
        "detector_backend": "panns",
        "score_threshold": 0.3,
    },
    device="cuda",
)

results = evaluator.evaluate_all(
    source_audio=["./src_wavs/sample_001.wav"],
    target_audio=["./tgt_wavs/sample_001.wav"],
    source_event_annotations=[
        [
            {"label": "laugh", "start_ms": 1200, "end_ms": 1850},
            {"label": "cough", "start_ms": 4200, "end_ms": 4550},
        ]
    ],
    event_label_mapping={
        "Laughter": "laugh",
        "Giggle": "laugh",
        "Cough": "cough",
    },
)

print(results)

Latency output distinguishes two RTF variants (a timing sketch follows this list):

  • Real_Time_Factor_(RTF): system-level RTF. This includes agent policy overhead, pre/post-processing, and other runtime costs around model inference.
  • Model_Generate_RTF: model-level RTF. This is reported only when the agent explicitly records model inference time via record_model_inference_time(...) or returns it in Segment.config["model_inference_time"].
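In practice, Model_Generate_RTF requires the agent to time its own model call. A minimal sketch built on the agent API shown above, where my_model and states.source are placeholders for your own inference call and input buffer:

import time

from multimetric_eval import GenericAgent, ReadAction, WriteAction


class TimedAgent(GenericAgent):
    def policy(self, states=None):
        states = states or self.states

        if not states.source_finished:
            return ReadAction()

        t0 = time.perf_counter()
        prediction = my_model.generate(states.source)  # placeholder model call
        # Record only the model's inference time, excluding policy and
        # pre/post-processing overhead, so Model_Generate_RTF is reported
        # alongside the system-level RTF.
        self.record_model_inference_time(time.perf_counter() - t0)
        return WriteAction(prediction, finished=True)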

Examples

Runnable examples live in the examples/ directory.

Python Examples

  • examples/python/translation_eval.py
  • examples/python/speech_quality_eval.py
  • examples/python/speaker_similarity_eval.py
  • examples/python/emotion_eval.py
  • examples/python/paralinguistic_eval.py
  • examples/python/latency_eval.py

Bash Examples

  • examples/bash/install_extras.sh
  • examples/bash/run_latency_cli.sh

Full Evaluation Pipelines

For larger end-to-end evaluation scripts, see test/:

  • test/run_full_eval_seamless.py
  • test/run_full_eval_vallex.py
  • test/run_full_eval_simulmega.py
  • test/run_full_eval_cascade.py

Input Conventions

Common text inputs accept any of the following forms (see the sketch after these lists):

  • Python List[str]
  • .txt files with one sample per line
  • .json files

Common audio inputs accept:

  • folder path
  • Python List[str]
  • .txt files
  • .json files
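For example, a sketch mixing these forms in a single TranslationEvaluator call (whether every parameter accepts every form is an assumption based on the conventions above, and the file names are placeholders):

results = evaluator.evaluate_all(
    reference="refs.zh.txt",        # .txt file, one sample per line
    target_text=["我喜欢看电影。"],   # in-memory List[str]
    source="sources.json",          # .json file
    target_lang="zh",
)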

Notes

  • For zh / ja / ko, the toolkit uses CJK-aware handling for text-side evaluation.
  • SpeechQualityEvaluator returns CER_Consistency for zh / ja / ko, and WER_Consistency for most other languages.
  • ParalinguisticEvaluator reports Paralinguistic_Fidelity_Cosine through CLAP and can also report discrete event preservation with Discrete_Acoustic_Event_F1_Strict and Discrete_Acoustic_Event_F1_Relaxed.
  • The built-in discrete event detector currently uses a PANNs backend and requires the paralinguistics extra.
  • For discrete event F1, source-side event labels are expected to be canonical; event_label_mapping is applied on target-side detector labels so users can adapt different datasets or label ontologies.
  • Samples with no reference events and no predicted events are skipped for discrete event F1 aggregation.
  • In S2S latency evaluation, alignment prefers the model's native transcript when available. If the model is audio-only, the evaluator can optionally use ASR fallback to prepare alignment text.
  • For S2S forced alignment, pass language-appropriate MFA models through alignment_acoustic_model and alignment_dictionary_model; the defaults are English (see the sketch after these notes).
  • Some modules rely on optional dependencies or local model paths in offline environments.

License

MIT License
