Unified multidimensional evaluation toolkit for S2TT and S2ST systems in offline and streaming speech translation settings

OpenSTBench

OpenSTBench is a multidimensional evaluation toolkit for speech translation. It is designed for heterogeneous systems: speech-to-text translation (S2TT) and speech-to-speech translation (S2ST), in both offline and streaming settings.

The toolkit organizes evaluation into three dimensions:

Translation Quality: whether the translated text preserves the source meaning.
Speech Quality: whether generated speech is natural, text-consistent, speaker-preserving, emotion-preserving, and faithful to non-verbal or paralinguistic events.
Temporal Quality: whether generated speech preserves duration structure and, for streaming systems, whether output is responsive.

Installation

pip install OpenSTBench

For local development:

pip install -e .

Optional extras:

pip install "OpenSTBench[comet]"
pip install "OpenSTBench[whisper]"
pip install "OpenSTBench[speech_quality]"
pip install "OpenSTBench[emotion]"
pip install "OpenSTBench[paralinguistics]"
pip install "OpenSTBench[all]"

BLEURT is installed separately:

pip install git+https://github.com/lucadiliello/bleurt-pytorch.git

Package Names

PyPI package: OpenSTBench
Python import: openstbench
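
The two names are looked up through different mechanisms: the distribution name is what pip and package metadata use, while the import name is what the Python import system resolves. The following is a minimal standard-library sketch that reports both for the current environment. The helper name `describe_install` is hypothetical and not part of OpenSTBench.

```python
from importlib import metadata, util


def describe_install(dist_name: str = "OpenSTBench",
                     module_name: str = "openstbench") -> dict:
    """Report the installed distribution version and whether the module is importable."""
    try:
        # Looked up by the PyPI distribution name.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Looked up by the Python import name, without importing the module.
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    print(describe_install())
```

If `version` is `None` or `importable` is `False`, the package is most likely not installed in the active environment.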

Evaluation Dimensions

Dimension	Evaluator	System type	Main outputs
Translation Quality	`TranslationEvaluator`	S2TT, S2ST transcripts	`sacreBLEU`, `chrF++`, `COMET`, `BLEURT`
Speech Quality	`SpeechQualityEvaluator`	S2ST	`UTMOS`, `WER_Consistency`, `CER_Consistency`
Speech Quality	`SpeakerSimilarityEvaluator`	S2ST	`average_wavlm_large_similarity`, `average_resemblyzer_similarity`
Speech Quality	`EmotionEvaluator`	S2ST	`Emotion2Vec_Cosine_Similarity`, `Audio_Emotion_Accuracy`
Speech Quality	`ParalinguisticEvaluator`	S2ST	`Acoustic_Event_Count_F1`, `Acoustic_Event_Localization_F1`, `Acoustic_Event_Onset_Error`
Temporal Quality	`TemporalConsistencyEvaluator`	S2ST	`Duration_Consistency_SLC_0.2`, `Duration_Consistency_SLC_0.4`
Temporal Quality	`LatencyEvaluator`	Streaming S2TT/S2ST	`First_Audio_Delay_(StartOffset_ms)`, `Overall_Translation_Delay_(ATD_ms)`, `End_Action_Delay_(CustomATD_ms)`, `Real_Time_Factor_(RTF)`
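
To make two of the temporal outputs in the table above more concrete, here is a small sketch of how such quantities are commonly computed. It assumes that `Duration_Consistency_SLC_p` is the share of utterances whose target/source duration ratio lies within 1 ± p, and that the real-time factor is processing time divided by audio duration. Both definitions are assumptions taken from common usage; the function names are hypothetical, and OpenSTBench's own implementation may differ.

```python
from typing import Sequence


def speech_length_compliance(source_durations: Sequence[float],
                             target_durations: Sequence[float],
                             tolerance: float) -> float:
    """Share of utterances whose target/source duration ratio is within 1 +/- tolerance."""
    if len(source_durations) != len(target_durations):
        raise ValueError("source and target duration lists must have the same length")
    if not source_durations:
        raise ValueError("at least one utterance is required")
    compliant = 0
    for src, tgt in zip(source_durations, target_durations):
        if src <= 0:
            raise ValueError("source durations must be positive")
        # Relative deviation of the target duration from the source duration.
        if abs(tgt / src - 1.0) <= tolerance:
            compliant += 1
    return compliant / len(source_durations)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    src = [2.0, 4.0, 5.0, 10.0]   # source durations in seconds
    tgt = [2.2, 5.5, 5.0, 13.5]   # generated speech durations in seconds
    print(speech_length_compliance(src, tgt, 0.2))  # 0.5
    print(speech_length_compliance(src, tgt, 0.4))  # 1.0
    print(real_time_factor(3.0, 12.0))              # 0.25
```

Under this assumed definition, a looser tolerance can only keep or raise the score, so the 0.4 value is never below the 0.2 value.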

Offline and streaming are supported system settings, not separate metric dimensions. Use the evaluators that match the available outputs: text, generated speech, source/target audio pairs, event annotations, or streaming traces.
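
One way to read this rule is as a lookup from available outputs to applicable evaluators. The sketch below uses the evaluator names from the table, but the required-input sets and the helper `applicable_evaluators` are illustrative assumptions, not OpenSTBench's documented input contract; check the files in examples/ for the actual inputs of each evaluator.

```python
# Evaluator names come from the table above. The required-input sets are
# illustrative assumptions, not the toolkit's documented contract.
EVALUATOR_INPUTS = {
    "TranslationEvaluator": {"text"},
    "SpeechQualityEvaluator": {"generated_speech", "text"},
    "SpeakerSimilarityEvaluator": {"source_target_audio_pairs"},
    "EmotionEvaluator": {"source_target_audio_pairs"},
    "ParalinguisticEvaluator": {"generated_speech", "event_annotations"},
    "TemporalConsistencyEvaluator": {"source_target_audio_pairs"},
    "LatencyEvaluator": {"streaming_traces"},
}


def applicable_evaluators(available_outputs):
    """Return evaluator names whose assumed inputs are all available."""
    available = set(available_outputs)
    return [name for name, needed in EVALUATOR_INPUTS.items() if needed <= available]


if __name__ == "__main__":
    # An offline S2TT system typically produces text only.
    print(applicable_evaluators(["text"]))
    # A streaming S2TT system also produces streaming traces.
    print(applicable_evaluators(["text", "streaming_traces"]))
```

The point of the sketch is that a streaming system does not get a separate set of quality metrics: it uses the same evaluators as an offline system and adds latency on top.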

Datasets

The paper uses the following datasets. Please follow the license and access terms of each original dataset.

Dataset	Used for	Link
MSLT dev	Translation quality, speech quality, temporal consistency, latency	Microsoft Speech Language Translation Corpus
LibriTTS-based paired speaker set	Speaker preservation	The constructed OpenSTBench paired set will be released through GitHub Releases; the source corpus is LibriTTS
RAVDESS	Emotion preservation	Audio_Speech_Actors_01-24.zip from the RAVDESS Zenodo record
MCAE-SPPS	Emotion preservation	MCAE-SPPS on OSF
NonverbalTTS test	Paralinguistic fidelity	deepvk/NonverbalTTS
SynParaSpeech	Paralinguistic fidelity	shawnpi/SynParaSpeech

Quick Start

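from openstbench import TranslationEvaluator

# Enable only the lexical metrics here. COMET needs the "comet" extra,
# and BLEURT is installed separately (see Installation).
evaluator = TranslationEvaluator(
    use_bleu=True,
    use_chrf=True,
    use_comet=False,
    use_bleurt=False,
    device="cuda",  # assumes a CUDA-capable GPU is available
)

# The lists are aligned by index: one reference translation, one system
# output (target_text), and one source sentence per sample.
scores = evaluator.evaluate_all(
    reference=["我喜欢看电影。", "今天天气很好。"],
    target_text=["我喜欢看电影。", "今天天气很好。"],
    source=["I like watching movies.", "The weather is nice today."],
    target_lang="zh",
)

print(scores)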
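
Because the inputs in the call above are index-aligned, a length mismatch between them is an easy mistake to make when loading data from separate files. The following is a small pre-flight check, assuming nothing about OpenSTBench beyond that alignment; the helper `check_parallel` is hypothetical and not part of the toolkit.

```python
from typing import Optional, Sequence


def check_parallel(**columns: Optional[Sequence[str]]) -> int:
    """Return the shared sample count, or raise ValueError if the columns disagree."""
    # Columns passed as None are treated as "not provided" and skipped.
    lengths = {name: len(values) for name, values in columns.items() if values is not None}
    if not lengths:
        raise ValueError("no inputs were provided")
    if len(set(lengths.values())) != 1:
        raise ValueError(f"inputs are not aligned: {lengths}")
    return next(iter(lengths.values()))


if __name__ == "__main__":
    n = check_parallel(
        reference=["我喜欢看电影。", "今天天气很好。"],
        target_text=["我喜欢看电影。", "今天天气很好。"],
        source=["I like watching movies.", "The weather is nice today."],
    )
    print(n)
```

Running a check like this before evaluation turns a silent misalignment into an explicit error that names the offending input.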

Examples

Complete parameter templates are kept in examples/. The README intentionally stays compact; use these files for configurable parameters, input formats, and output fields.

examples/python/translation_eval.py
examples/python/speech_quality_eval.py
examples/python/speaker_similarity_eval.py
examples/python/emotion_eval.py
examples/python/paralinguistic_eval.py
examples/python/paralinguistic_identity_baseline.py
examples/python/temporal_consistency_eval.py
examples/python/latency_eval.py
examples/bash/install_extras.sh
examples/bash/run_latency_cli.sh

Latency can also be run from the module CLI:

python -m openstbench.latency.cli --help

Conventions

Text inputs generally accept list[str], one-sample-per-line .txt files, and .json files where supported by the evaluator.
Audio inputs generally accept folders, list[str], .txt path lists, and .json path lists where supported by the evaluator.
For zh, ja, and ko, speech consistency reports CER_Consistency; other languages report WER_Consistency.
Evaluators that accept pretrained model sources use a local-first rule. If the supplied local path exists, OpenSTBench uses it; otherwise it falls back to the configured remote model id.
Optional dependencies are loaded only when the corresponding evaluator needs them.
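
Three of the conventions above can be sketched in a few lines of standard-library Python. This is an illustration of the described behavior under stated assumptions, not OpenSTBench's code: the function names are hypothetical, and the .json layout (a flat array of strings) is an assumption.

```python
import json
from pathlib import Path

# Languages for which the convention above reports CER instead of WER.
CHARACTER_LEVEL_LANGS = {"zh", "ja", "ko"}


def consistency_metric_name(target_lang: str) -> str:
    """Pick the consistency metric name for a target language code."""
    if target_lang.lower() in CHARACTER_LEVEL_LANGS:
        return "CER_Consistency"
    return "WER_Consistency"


def resolve_model_source(local_path, remote_id: str) -> str:
    """Local-first rule: use local_path if it exists, otherwise the remote model id."""
    if local_path and Path(local_path).exists():
        return str(local_path)
    return remote_id


def load_text_samples(source) -> list:
    """Accept list[str], a one-sample-per-line .txt file, or a .json array of strings."""
    if isinstance(source, (list, tuple)):
        return [str(item) for item in source]
    path = Path(source)
    if path.suffix == ".txt":
        return path.read_text(encoding="utf-8").splitlines()
    if path.suffix == ".json":
        # Assumed layout: a flat JSON array of strings.
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of strings")
        return [str(item) for item in data]
    raise ValueError(f"unsupported text input: {source!r}")


if __name__ == "__main__":
    print(consistency_metric_name("zh"))
    print(consistency_metric_name("en"))
    print(resolve_model_source("/no/such/model/dir", "remote-org/remote-model"))
    print(load_text_samples(["first sample", "second sample"]))
```

The local-first rule makes the same configuration usable both on machines with pre-downloaded checkpoints and on machines that fetch models on demand.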

License

OpenSTBench's original code is released under the MIT License. See LICENSE.

Some latency evaluation components include code adapted from SimulEval, which is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). Those adapted portions are distributed under CC BY-SA 4.0. See THIRD_PARTY_NOTICES.md for details.

The datasets referenced by OpenSTBench, including the datasets used in the paper, are not covered by the OpenSTBench code license. They are provided by their original authors or distributors under their own licenses and terms of use. Some datasets are restricted to research or non-commercial use.

Release history

Version	Release date
1.2.0 (this version)	Jun 10, 2026
1.1.0	May 28, 2026
1.0.0	May 27, 2026
0.3.3	May 1, 2026
0.3.2	Apr 30, 2026
0.3.1	Apr 26, 2026
0.3.0	Apr 23, 2026
0.2.0	Apr 21, 2026

