
Speak2Subs

Speak2Subs has three fundamental features: subtitle generation based on a reference template, subtitle quality evaluation, and subtitle generation without a reference template. This README gives a concise introduction to each feature, while the subsequent sections describe installation, usage, and internal mechanics in more detail.

(Package overview diagram: packageresume.png)

Features

Subtitle generation based on a reference template

One of the features is subtitle generation. To properly evaluate the ASR models under consideration, we need to put them to the test and generate captions from datasets. Generating subtitles is not enough, though: for the evaluation to be viable, we need to compare them against the original, presumably correct, subtitles. This implies that the generated subtitles must have the same timestamps as the original ones.

To illustrate, suppose our original subtitle contains the following entry:

[00:00:02.000] --> [00:00:06.000] Buenos días, esto es una prueba.

Our generated subtitle cannot have the following structure:

[00:00:02.000] --> [00:00:03.000] Buenos días
[00:00:04.000] --> [00:00:06.000] Eso es una rueda.

because the entries do not line up, so a text comparison finds no match. By restructuring the timestamps, we obtain matching entries:

[00:00:02.000] --> [00:00:06.000] Buenos días, esto es una prueba.
[00:00:02.000] --> [00:00:06.000] Buenos días, eso es una rueda.

In conclusion, to evaluate subtitles we first need to load the original subtitles as a template. So one use of this package is to generate subtitles using a template.
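
With a template, the regrouping step amounts to assigning each transcribed word to the template entry whose time window contains it. A minimal sketch of this idea, with hypothetical word-level timestamps standing in for ASR output (this is not the package's internal API):

```python
# Template entries carry the original (start, end) windows in seconds;
# each ASR word carries its own timestamp and is routed to the first
# window that contains it.
template = [  # (start_s, end_s) taken from the original VTT
    (2.0, 6.0),
    (7.0, 10.0),
]
words = [  # (time_s, token) as a hypothetical ASR output
    (2.1, "Buenos"), (2.8, "días,"), (4.2, "eso"), (4.9, "es"),
    (5.3, "una"), (5.8, "rueda."), (7.5, "Segunda"), (8.9, "frase."),
]

entries = {span: [] for span in template}
for t, token in words:
    for start, end in template:
        if start <= t <= end:
            entries[(start, end)].append(token)
            break

for (start, end), tokens in entries.items():
    print(f"[{start:.3f}] --> [{end:.3f}] {' '.join(tokens)}")
```

The result is one entry per template window, which is exactly what makes a one-to-one text comparison possible later.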

As a disadvantage, templates tie the package to the original structure, so it may not be able to adjust the compliance parameters as it can in templateless generation.

Subtitle quality evaluation

This package allows you to evaluate the quality of a subtitle file against a reference file. Although this feature is what was used for the evaluation phase, the subtitles to be evaluated need not have been generated by this package. The only requirement is that both files share timestamps and number of entries.
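
That requirement can be checked before evaluating. A minimal, stdlib-only sketch that extracts cue timestamps from two VTT documents and verifies they line up (the regex and helper names are illustrative, not the package's API):

```python
import re

# Matches a WebVTT cue timing line such as "00:00:02.000 --> 00:00:06.000".
CUE_RE = re.compile(r"(\d\d:\d\d:\d\d\.\d\d\d)\s*-->\s*(\d\d:\d\d:\d\d\.\d\d\d)")

def cue_times(vtt_text):
    """Return the (start, end) pair of every cue in a VTT document."""
    return CUE_RE.findall(vtt_text)

def comparable(ref_text, pred_text):
    """True when both files share entry count and timestamps."""
    return cue_times(ref_text) == cue_times(pred_text)
```

If `comparable` returns False, an entry-by-entry text comparison is not meaningful, which is the situation the timestamp-restructuring example above avoids.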

Subtitle generation without a reference template

Of course, you can also generate subtitles without a base template. In fact, this is the functionality that gives the package value beyond being just an evaluator.

With this functionality, the grouping of the transcribed words (tokens) into sentences and, in turn, subtitles, must follow the compliance policies described in the evaluation section.

A limitation of these subtitles is that error metrics cannot be evaluated, but information on generation-time metrics and compliance with the UNE standard is still available.
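
The templateless grouping can be pictured as a greedy accumulation of timestamped tokens into caption entries until a character or duration limit would be exceeded. A minimal sketch under assumed limits (the constants are illustrative, not the package's exact compliance policy):

```python
MAX_CHARS = 37      # per-line character limit (assumed, illustrative)
MAX_SECONDS = 6.0   # maximum caption duration (assumed, illustrative)

def group_tokens(words):
    """words: list of (start_s, end_s, token). Returns (start, end, text) captions."""
    captions, current, start = [], [], None
    for w_start, w_end, token in words:
        if start is None:
            start = w_start
        text = " ".join(current + [token])
        # Flush the current caption if adding this token would break a limit.
        if current and (len(text) > MAX_CHARS or w_end - start > MAX_SECONDS):
            captions.append((start, prev_end, " ".join(current)))
            current, start = [token], w_start
        else:
            current.append(token)
        prev_end = w_end
    if current:
        captions.append((start, prev_end, " ".join(current)))
    return captions
```

The greedy strategy guarantees each flushed caption respects the character limit, at the cost of sometimes breaking sentences at awkward points; a real implementation would also consider punctuation and pauses.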

Installation

Before installing the package, a few prerequisites are needed.

Install PyTorch

PyTorch is an open-source machine learning library used for tasks such as natural language processing and computer vision. It provides tools for building and training neural networks, offering flexibility and efficiency in experimentation thanks to its dynamic computational graph. Speak2Subs requires PyTorch to be installed.

Follow the official PyTorch installation instructions and adjust the install command to your platform and requirements.

Install Docker

Docker is a platform designed to make it easier to create, deploy, and run applications using containers. Containers allow developers to package an application with all its necessary parts (such as libraries and other dependencies) and ship it as a single unit. Docker automates the deployment of applications inside containers, ensuring consistency across environments, from development to testing and production. Since Speak2Subs uses containerized ASR models, Docker needs to be installed (see the Docker documentation). The Docker SDK for Python is also required:

pip install docker

Install Speak2Subs

Finally, Speak2Subs can be installed. The package has a few more dependencies, but they are automatically installed at the same time.

From Pypi

pip install speak2subs

From source code

Alternatively, this command pulls and installs the latest commit from this repository, along with its Python dependencies:

git clone https://github.com/JulioFresneda/Speak2Subs.git
cd Speak2Subs
pip install -e .

Usage

This package can be used both via the command line and within a Python script, for both generating subtitles and evaluating them. The package design is dataset-oriented, meaning it defaults to working with folders of files; however, a specific file can also be targeted for generation or evaluation. The following sections give concrete examples.

How to generate subtitles

Speak2Subs supports MP4 and WAV files. When using MP4, it automatically converts them to WAV format. Optionally, if original VTT files are available, similar timestamped subtitles can be generated based on the original VTT as a template.

To generate subtitles, all MP4 or WAV files (and VTT files if desired) should be in the same folder. If VTT files are present, they must share the same name as the media file, disregarding the extension. If no dataset name is specified, the name of the folder containing the files is used. To export the results, a folder needs to be specified where the generated VTT files will be exported.

Individual files are supported too; the only difference in usage is that a path to the media file is passed instead of a path to the dataset folder.
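
The folder convention above can be sketched as a pairing step: each media file is matched with a VTT template that shares its basename, when one exists. A minimal stdlib-only illustration (the folder name and helper are hypothetical, not the package's API):

```python
from pathlib import Path

def pair_media_with_templates(folder):
    """Pair each MP4/WAV file with a same-named .vtt template, or None."""
    pairs = []
    for media in sorted(Path(folder).glob("*")):
        if media.suffix.lower() not in (".mp4", ".wav"):
            continue
        template = media.with_suffix(".vtt")
        pairs.append((media.name, template.name if template.exists() else None))
    return pairs
```

A media file without a matching VTT simply gets no template, which is why template-based generation requires the names to match exactly apart from the extension.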

Command Line Interface usage

If you want to generate subtitles

speak2subs --media-path="./mydataset" --export_path="./results"

You can choose the ASR models to use. Default is whisperx.

speak2subs -mp="./mydataset" -ep="./results" --asr="nemo, whisper"

If you want to generate subtitles and use original VTT as reference

speak2subs -mp="./mydataset" -ep="./results" --use_vtt_templates

If you want to generate subtitles for a particular file

speak2subs -mp="./mydataset/media_1.wav" -ep="./results" --use_vtt_templates

If you want to get the full list of arguments

speak2subs --help

Python usage

Using the package from a Python script is as easy as from the CLI.

from Speak2Subs import speak2subs

speak2subs.transcript('./mydataset',
                      export_path='./results',
                      asr='all',
                      use_vad=True,
                      segment=True,
                      group_segments=False,
                      max_speech_duration=30,
                      use_vtt_template=True,
                      reduce_noise=False)

How to evaluate subtitles

If subtitles have been previously generated using the original subtitles as a reference, evaluating the outcome is effortless. As with generation, the original media folder is required to read the original subtitles, alongside the results folder to read the generated ones. The only difference is the additional 'evaluate' parameter. The result is an Excel file with all the evaluated metrics.
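
Among the error metrics, word error rate (WER) is the standard choice for comparing a hypothesis against a reference transcription. A minimal sketch of the classic word-level Levenshtein computation (the package's exact metric set may differ):

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance over words divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(r)
```

Applied to the earlier example, "esto es una prueba" versus "eso es una rueda" yields two substitutions, so the per-entry WER is 2 divided by the reference word count.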

Command Line Interface usage

First we generate subtitles

speak2subs --media-path="./mydataset" --export_path="./results" --asr="seamless, vosk"

Then we evaluate them

speak2subs -mp="./mydataset" -ep="./results" --evaluate

We can evaluate a pair of VTT too

speak2subs --evaluate --ref_vtt_path="./reference.vtt" --pred_vtt_path="./predicted.vtt"

If we choose to evaluate a pair of VTT files instead of the results of a generation run, the metrics are printed to the terminal instead of written to an Excel file.

Other arguments are accepted but have no effect here, since they apply only to the generation task.

Python usage

You can use it in Python too.

from Speak2Subs import speak2subs

# If we want to evaluate the generated subtitles for our dataset
speak2subs.evaluateFolder("./mydataset", "./results")
# If we want to evaluate a specific pair of subtitles
result = speak2subs.evaluatePair("./media_1.vtt", "./media_1_PRED_.vtt")
