Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
SONAR Subtitling Evaluator
Code to evaluate the quality of SRT files using the multilingual multimodal SONAR sentence embedding model.
The evaluation measures the semantic similarity (computed as a cosine similarity)
between each subtitle block and the audio segment to which the block is assigned
(through the timestamps in the SRT). The returned scores lie in the range [-1, 1],
where higher is better.
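The two ingredients described above, mapping a subtitle block to an audio span through its timestamps and scoring a pair of embeddings with cosine similarity, can be sketched in plain Python. This is only an illustrative sketch and not the code used by the package: the helper names `parse_timing` and `cosine_similarity` are hypothetical, and the toy vectors stand in for the SONAR text and speech embeddings.

```python
import math
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_timing(line):
    """Return (start, end) in seconds for an SRT timing line (hypothetical helper)."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; the result lies in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# The audio span that a subtitle block is assigned to.
start, end = parse_timing("00:00:01,000 --> 00:00:03,500")

# Toy vectors standing in for a text embedding and a speech embedding.
text_embedding = [1.0, 0.0]
audio_embedding = [1.0, 0.0]
score = cosine_similarity(text_embedding, audio_embedding)
```

Identical directions give the maximum score, orthogonal vectors give 0, and opposite directions give the minimum, which is why higher scores indicate a closer match between a block and its audio.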
Installation
Ensure that you have libsndfile installed in your environment.
Then, run:
pip install SubSONAR
or, in the source root of this repository:
pip install -e .
The installation has been tested with Python 3.8 and 3.10.
Usage
Example usage for Italian SRTs and English audio, with two files each (1 and 2):
subsonar \
--srt-files 1.srt 2.srt \
--audio-files 1.wav 2.wav \
--text-lang ita_Latn --audio-lang eng \
-bs 32
Please set the batch size (-bs) according to your GPU capacity.
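When many files have to be scored, it can help to assemble the command programmatically. The sketch below only builds the argument list from the flags shown in the example above; `build_subsonar_command` is a hypothetical helper, not part of the package, and it assumes that SRT and audio files are paired by position.

```python
import shlex


def build_subsonar_command(srt_files, audio_files, text_lang, audio_lang, batch_size=32):
    """Build the subsonar argument list (hypothetical helper, not part of SubSONAR)."""
    # Assumption: the i-th SRT file is evaluated against the i-th audio file.
    if len(srt_files) != len(audio_files):
        raise ValueError("expected one audio file per SRT file")
    return [
        "subsonar",
        "--srt-files", *srt_files,
        "--audio-files", *audio_files,
        "--text-lang", text_lang,
        "--audio-lang", audio_lang,
        "-bs", str(batch_size),
    ]


cmd = build_subsonar_command(
    ["1.srt", "2.srt"], ["1.wav", "2.wav"], "ita_Latn", "eng", batch_size=32
)
# Print the shell-quoted command line for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed; lower `batch_size` if the GPU runs out of memory.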
The available languages for the speech encoder (--audio-lang) can be found in the SONAR repository,
while the text encoder (--text-lang) supports the 200 languages of NLLB.
License
SONAR Subtitling Evaluator is licensed under Apache Version 2.0.
However, the SONAR encoders have a dedicated license that can be found in their repository LICENSE. Please check the license for the encoders you are using.
Citation
If you find this project useful, please cite:
@inproceedings{gaido-et-al-2024-sbaam,
title = {{SBAAM! Eliminating Transcript Dependency in Automatic Subtitling}},
author = {Gaido, Marco and Papi, Sara and Negri, Matteo and Cettolo, Mauro and Bentivogli, Luisa},
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2024",
address = "Bangkok, Thailand",
}