Evaluate the quality of SRT files using the multilingual multimodal SONAR model.

SONAR Subtitling Evaluator

Code to evaluate the quality of SRT files using the multilingual multimodal SONAR sentence embedding model.

The evaluation measures the semantic similarity (computed as a cosine similarity) between each subtitle block and the audio segment to which the block is assigned through the timestamps in the SRT. The returned scores lie in the range [-1, 1]; higher is better.
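The two ingredients described above, mapping a block's timestamps to an audio span and comparing two embeddings with a cosine similarity, can be illustrated with a minimal sketch. This is not the package's own code: the function names (parse_timing, sample_range, cosine_similarity) are hypothetical, and the embeddings are plain lists of floats standing in for the vectors that the SONAR text and speech encoders would produce.

```python
import math
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:03,250".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_timing(line):
    """Return (start, end) in seconds for one SRT timing line (hypothetical helper)."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end


def sample_range(start, end, sample_rate):
    """Convert a time span in seconds to (first, last) sample indices."""
    return round(start * sample_rate), round(end * sample_rate)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; the result is in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    start, end = parse_timing("00:00:01,500 --> 00:00:03,250")
    # Assumes 16 kHz audio purely for illustration.
    first, last = sample_range(start, end, 16000)
    print(f"block covers samples {first}..{last}")

    # Toy vectors standing in for a text embedding and a speech embedding.
    text_embedding = [0.2, 0.7, -0.1]
    audio_embedding = [0.25, 0.6, 0.0]
    print(cosine_similarity(text_embedding, audio_embedding))
```

In this sketch a score near 1 means the subtitle text and the audio span point in nearly the same direction in embedding space, while a score near -1 means they point in opposite directions.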

Installation

Ensure that you have libsndfile installed in your environment. Then, run:

pip install SubSONAR

or, in the source root of this repository:

pip install -e .

The installation has been tested with Python 3.8 and 3.10.

Usage

Example usage for two files (1 and 2) with Italian SRTs and English audio:

subsonar \
  --srt-files 1.srt 2.srt \
  --audio-files 1.wav 2.wav \
  --text-lang ita_Latn --audio-lang eng \
  -bs 32

Please set the batch size (-bs) according to your GPU capacity.

The available languages for the speech encoder (--audio-lang) can be found in the SONAR repository, while the text encoder (--text-lang) supports the 200 languages of NLLB.

License

SONAR Subtitling Evaluator is licensed under the Apache License, Version 2.0.

However, the SONAR encoders have a dedicated license that can be found in their repository LICENSE. Please check the license for the encoders you are using.

Citation

If you find this project useful, please cite:

@inproceedings{gaido-et-al-2024-sbaam,
title = {{SBAAM! Eliminating Transcript Dependency in Automatic Subtitling}},
author = {Gaido, Marco and Papi, Sara and Negri, Matteo and Cettolo, Mauro and Bentivogli, Luisa},
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
address = "Bangkok, Thailand",
}
