
A server to run simultaneous/streaming experiments and demos

simulstream

simulstream is a Python library for simultaneous/streaming speech recognition and translation. It supports both simulation on existing audio files to score systems, as in the SimulEval project, and running live demos in a browser.

simulstream provides a WebSocket server and utilities for running streaming speech processing experiments and demos. It supports real-time transcription and translation from streaming audio input. By streaming, we mean that the library by default assumes the input is an unbounded speech signal, rather than many short speech segments as in simultaneous speech processing. The simultaneous setting can easily be addressed by pre-segmenting the audio into many small segments and feeding each segment to simulstream.
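For example, such pre-segmentation can be as simple as slicing the waveform into fixed-length windows. The helper below is a minimal sketch, not a simulstream API; real pipelines would typically segment at voice-activity or sentence boundaries instead:

```python
def segment_audio(samples: list[int], sample_rate: int,
                  segment_seconds: float) -> list[list[int]]:
    """Split a long waveform (16-bit samples) into fixed-length segments
    that can each be fed to simulstream independently. Hypothetical
    helper for illustration, not part of the library."""
    step = int(sample_rate * segment_seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

# 2.5 seconds of silence at 16 kHz, split into 1-second segments
audio = [0] * 40000
segments = segment_audio(audio, sample_rate=16000, segment_seconds=1.0)
print([len(s) for s in segments])  # [16000, 16000, 8000]
```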

The repository is tested with Python 3.11. It may also work with other Python versions, but we do not guarantee compatibility with them. Check out the Usage section for instructions on how to use the repository and the Installation section for further information on how to install the project.

Installation

You can install the latest stable version from PyPI:

pip install simulstream

Or, to install from source:

git clone https://github.com/hlt-mt/simulstream.git
cd simulstream
pip install .

Please note that these commands only install the basic functionality of the repository, i.e. the WebSocket server and client. In addition, you have to install the dependencies required by the speech processor that you want to use. The repository comes with example speech processors that rely on, e.g., Transformers models or the NVIDIA Canary model. You can install their required dependencies by specifying the corresponding selector when installing the repository. For instance, using the Canary speech processors requires the canary selector. You can also create your own custom speech processor and install the corresponding dependencies.

The evaluation utilities also come with additional dependencies, which can be installed with the eval selector. Be careful: the metrics include COMET, which depends on Transformers and other libraries that can conflict with those required by your speech processor.

As an example, if you want to install the simulstream package with Canary speech processors and the evaluation package, run:

pip install -e .[canary,eval]

For development (with docs and testing tools):

pip install -e .[dev]

Usage

The package is based on a WebSocket client-server interaction. The server waits for connections from clients. Clients can send configuration messages in JSON format (e.g. to specify the desired input/output languages) and audio chunks encoded as 16-bit integers. The server processes the input audio, generates the transcript/translation by means of the configured speech processor, and sends it to the client in JSON format. In addition, if configured to do so, the server writes logs to a JSONL file that can be used to compute metrics.
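As an illustration of this wire format, the snippet below builds a JSON configuration message and packs an audio chunk as raw 16-bit integers. The JSON field names are assumptions for illustration only, not the actual simulstream protocol schema, and the actual transmission (omitted here) would go through a WebSocket client library:

```python
import json
import struct

# Hypothetical configuration message: field names are illustrative
# assumptions, not the actual simulstream protocol schema.
config_msg = json.dumps({"src_lang": "en", "tgt_lang": "it"})

# Audio chunks are sent as raw 16-bit integers (packed little-endian here).
samples = [0, 1000, -1000, 32767, -32768]
audio_chunk = struct.pack(f"<{len(samples)}h", *samples)

print(len(audio_chunk))          # 2 bytes per sample -> 10
decoded = struct.unpack(f"<{len(samples)}h", audio_chunk)
print(list(decoded) == samples)  # True
```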

To get everything up and running, first start the server; once it is running, execute one or more clients. The repository contains the code for two types of clients: a web interface client that can be used for demos, and a Python command-line client that can be used for testing or for evaluating speech processor performance.

Below, you can find a simple illustration of the overall architecture with the two scenarios:

Overall simulstream architecture

To streamline the evaluation of speech processors in terms of quality and latency, you can also run inference with a configured speech processor without setting up a client-server interaction. For more information, refer to the Evaluation section.

WebSocket Server

Run the WebSocket server with YAML configuration files:

simulstream_server --server-config config/server.yaml \
    --speech-processor-config config/speech_processor.yaml

The repository contains examples of YAML files both for the server and for some speech processors. They can be edited and adapted.

Customize with Your Speech Processor

If you want to implement your own speech processor, create a subclass of simulstream.server.speech_processors.base.SpeechProcessor, add the class to the PYTHONPATH, and reference the new class in the YAML configuration file of the speech processor. For more details on how to implement a speech processor, refer to the project documentation and to the docstrings of both simulstream.server.speech_processors.base.SpeechProcessor and simulstream.server.speech_processors.base.BaseSpeechProcessor.
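The general shape of such a subclass might look like the sketch below. The stand-in base class and its process() method are assumptions for illustration; the real abstract interface is defined in simulstream.server.speech_processors.base and should be checked in the docstrings mentioned above:

```python
# Illustrative sketch only: the real abstract base class lives in
# simulstream.server.speech_processors.base, and its method names may
# differ -- the stand-in below and its process() method are assumptions.
class SpeechProcessor:
    def process(self, audio_chunk: bytes) -> str:
        raise NotImplementedError

class EchoLengthProcessor(SpeechProcessor):
    """Toy processor that 'transcribes' each chunk as its sample count."""
    def process(self, audio_chunk: bytes) -> str:
        n_samples = len(audio_chunk) // 2  # audio arrives as 16-bit samples
        return f"<{n_samples} samples>"

proc = EchoLengthProcessor()
print(proc.process(b"\x00\x00" * 160))  # <160 samples>
```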

As a reference, you can check the processors implemented in this repository, available in the simulstream.server.speech_processors module. Note that each speech processor can have additional dependencies that are not installed by default with this codebase (see Installation).

HTTP Server Web Client

For a demo, you can create an HTTP web server that serves a web interface interacting with the WebSocket server. This can be done with:

simulstream_demo_http_server --config config/server.yaml -p 8001 --directory webdemo

If you are not running the command from the root folder of this repository, change the --directory parameter accordingly so that it points to the webdemo folder containing the HTML files of the HTTP demo. You can of course replace 8001 with any other port number you prefer. The web interface can then be accessed on the local machine at http://localhost:8001/, or from any other terminal connected to the same network by using the IP address of your workstation instead of localhost. Be careful not to use the same port specified for the WebSocket server if both are running on the same machine. If the HTTP server runs on a different machine from the WebSocket server, ensure that it can reach the WebSocket server through the address specified in the config/server.yaml file.

Python Client

If you want to process a set of audio files (e.g. a test set) with your system, send them with the provided client:

simulstream_wavs_client --uri ws://localhost:8080/ \
    --wav-list-file PATH_TO_TXT_FILE_WITH_A_LIST_OF_WAV_FILES.txt \
    --tgt-lang it --src-lang en

The --uri should contain the address of the WebSocket server, so it should correspond to what is specified in the server YAML config. The file specified in --wav-list-file should be a TXT file containing the path to a WAV file on each line.
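The list file is plain text with one path per line, for instance (hypothetical paths):

```
/data/testset/utt_0001.wav
/data/testset/utt_0002.wav
/data/testset/utt_0003.wav
```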

Before running the command, ensure that the logging of metrics is enabled in the configuration file of the WebSocket server (i.e., metrics.enabled = True) and that metrics are logged to an empty/non-existing file, to ensure that the resulting file will contain only the logs related to the files sent by the client. Then, compute the relevant scores as described below (Evaluation).
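A hypothetical excerpt of the server YAML illustrating this setting is shown below; key names beyond metrics.enabled are guesses, so check the example configs shipped with the repository for the exact schema:

```yaml
metrics:
  enabled: true            # i.e., metrics.enabled = True
  log_file: metrics.jsonl  # should point to an empty/non-existing file
```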

Evaluation

To evaluate a speech processor, you first need to run inference on a test set. This can be done either through the Python client (see the section above) or through a dedicated command that performs inference without setting up a client-server interaction:

simulstream_inference --speech-processor-config config/speech_processor.yaml \
    --wav-list-file PATH_TO_TXT_FILE_WITH_A_LIST_OF_WAV_FILES.txt \
    --tgt-lang it --src-lang en --metrics-log-file metrics.jsonl

Once you have generated the JSONL file containing the inference log (e.g. metrics.jsonl), you can score your speech processor by running:

simulstream_score_latency --scorer stream_laal \
    --eval-config config/speech_processor.yaml \
    --log-file metrics.jsonl \
    --reference REFERENCE_FILE.txt \
    --audio-definition YAML_AUDIO_REFERENCES_DEFINITION.yaml

simulstream_score_quality --scorer comet \
    --eval-config config/speech_processor.yaml \
    --log-file metrics.jsonl \
    --references REFERENCES_FILE.txt \
    --transcripts TRANSCRIPTS_FILE.txt

simulstream_stats --eval-config config/speech_processor.yaml \
    --log-file metrics.jsonl

Each of them outputs different metrics. simulstream_score_latency measures the latency of the system by leveraging a sentence-based TXT reference file and a YAML file that, for each line in the reference TXT, contains the WAV file to which it belongs, its offset (i.e. start time), and its duration. The exact definition of latency depends on the selected metric (--scorer).
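A hypothetical audio-definition file matching this description could look as follows, with one entry per line of the reference TXT. Field names and layout are assumptions for illustration; check the repository examples for the exact schema:

```yaml
- wav: /data/testset/utt_0001.wav
  offset: 0.0     # start time in seconds
  duration: 3.2
- wav: /data/testset/utt_0001.wav
  offset: 3.2
  duration: 2.7
```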

Similarly, simulstream_score_quality evaluates the quality of the generated outputs against one or more reference files (and transcript files, only for metrics that require them).

Lastly, simulstream_stats computes statistics like the computational cost and flickering ratio.
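To make the flickering idea concrete, here is one possible (illustrative) definition, not necessarily the one simulstream_stats implements: the fraction of words that, once displayed in a partial output, are later revised:

```python
def flicker_ratio(partial_outputs: list[str]) -> float:
    """Fraction of displayed words later revised (illustrative metric,
    not necessarily the definition used by simulstream_stats)."""
    flickers = 0
    shown = 0
    prev: list[str] = []
    for out in partial_outputs:
        words = out.split()
        shown += len(prev)
        # a previously shown word "flickers" if the new hypothesis no
        # longer keeps it at the same position
        flickers += sum(
            1 for i, w in enumerate(prev) if i >= len(words) or words[i] != w
        )
        prev = words
    return flickers / shown if shown else 0.0

# "word" is revised to "world" in the third update -> 1 of 3 shown words
updates = ["hello", "hello word", "hello world today"]
print(round(flicker_ratio(updates), 3))  # 0.333
```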

Contributing

Contributions from interested researchers and developers are extremely appreciated.

You can create an issue in case of problems with the code, questions, or feature requests. You are also more than welcome to create a pull request that addresses any issue.

Licence

simulstream is licensed under Apache Version 2.0.

Credits

If you use this library, please cite:

@article{gaido-et-al-2025-simulstream,
    title={{simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems}},
    author={Gaido, Marco and Papi, Sara and Cettolo, Mauro and Negri, Matteo and Bentivogli, Luisa},
    journal={arXiv},
    year={2025}
}
