
Project description

SRT Equalizer

A Python module to transform subtitle line lengths, splitting subtitles into multiple fragments if necessary. Useful for adjusting automatic speech recognition output from tools such as Whisper to a more convenient size.

This library works for all languages where spaces separate words.

Installing

pip install srt_equalizer

Example

If the SRT file contains lines over a certain length like this:

1
00:00:00,000 --> 00:00:04,000
Good evening. I appreciate you giving me a few minutes of your time tonight

2
00:00:04,000 --> 00:00:11,000
so I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.

Using this code to shorten the subtitles to a maximum length of 42 chars:

from srt_equalizer import srt_equalizer

srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42)

...they are split into multiple fragments, and the time codes are adjusted in approximate proportion to the length of each fragment, while staying inside the time slot of the original subtitle.

1
00:00:00,000 --> 00:00:02,132
Good evening. I appreciate you giving me

2
00:00:02,132 --> 00:00:04,000
a few minutes of your time tonight

3
00:00:04,000 --> 00:00:06,458
so I can discuss with you a complex and

4
00:00:06,458 --> 00:00:08,979
difficult issue, an issue that is one of

5
00:00:08,979 --> 00:00:11,000
the most profound of our time.
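The idea behind the re-timing can be sketched in a few lines of plain Python. This is a simplified illustration of the approach, not the library's actual implementation: the text is wrapped at word boundaries to the target width, and each fragment receives a share of the original time slot proportional to its character count.

```python
import textwrap

def split_proportionally(text, start_ms, end_ms, target_chars=42):
    """Wrap text at word boundaries and give each fragment a share of
    [start_ms, end_ms] proportional to its length (illustrative only)."""
    fragments = textwrap.wrap(text, width=target_chars)
    total_chars = sum(len(f) for f in fragments)
    duration = end_ms - start_ms
    result, cursor = [], start_ms
    for i, frag in enumerate(fragments):
        if i == len(fragments) - 1:
            frag_end = end_ms  # last fragment ends exactly on the slot boundary
        else:
            frag_end = cursor + round(duration * len(frag) / total_chars)
        result.append((cursor, frag_end, frag))
        cursor = frag_end
    return result

for start, end, line in split_proportionally(
        "Good evening. I appreciate you giving me a few minutes of your time tonight",
        0, 4000):
    print(start, end, line)
```

Note that the exact split points may differ slightly from the library's output, since textwrap's wrapping rules are only an approximation of its algorithm.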

Adjust Whisper subtitle lengths

It is also possible to work with the subtitle items directly using the following utility methods:

split_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -> list[srt.Subtitle]:

whisper_result_to_srt(segments: list[dict]) -> list[srt.Subtitle]:
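For context, each Whisper segment is a dict containing (among other fields) start and end times in seconds and a text key; whisper_result_to_srt maps these onto srt.Subtitle items. A minimal stand-in conversion, using plain tuples instead of srt.Subtitle so it runs without the srt package installed, could look like this:

```python
from datetime import timedelta

def segments_to_items(segments):
    """Convert Whisper-style segment dicts (start/end in seconds, text)
    into (index, start, end, content) tuples -- an illustrative stand-in
    for whisper_result_to_srt, which returns srt.Subtitle objects."""
    items = []
    for i, seg in enumerate(segments, start=1):
        items.append((i,
                      timedelta(seconds=seg["start"]),
                      timedelta(seconds=seg["end"]),
                      seg["text"].strip()))
    return items

# Hypothetical segments of the shape Whisper produces
demo = [{"start": 0.0, "end": 4.0, "text": " Good evening."},
        {"start": 4.0, "end": 11.0, "text": " An issue of our time."}]
print(segments_to_items(demo))
```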

Here is an example of how to reduce the length of subtitles created by Whisper. It assumes you have an audio file to transcribe called gwb.wav.

import whisper
from srt_equalizer import srt_equalizer

# Transcribe the audio with Whisper
options_dict = {"task": "transcribe", "language": "en"}
model = whisper.load_model("small")
result = model.transcribe("gwb.wav", **options_dict)
segments = result["segments"]

# Convert the Whisper segments to srt.Subtitle items
subs = srt_equalizer.whisper_result_to_srt(segments)

# Reduce line length in the whisper result to <= 42 chars
equalized = []
for sub in subs:
    equalized.extend(srt_equalizer.split_subtitle(sub, 42))

for i in equalized:
    print(i.content)
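A quick sanity check on split output can be written in plain Python: every fragment should respect the character limit, and joining the fragments should reconstruct the original text. The fragments below are taken from the worked example earlier; with the library you would run the same checks over the equalized list.

```python
original = ("so I can discuss with you a complex and difficult issue, "
            "an issue that is one of the most profound of our time.")
fragments = ["so I can discuss with you a complex and",
             "difficult issue, an issue that is one of",
             "the most profound of our time."]

# Every fragment fits within the 42-character target...
assert all(len(f) <= 42 for f in fragments)
# ...and joining the fragments reconstructs the original text.
assert " ".join(fragments) == original
print("ok")
```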

Contributing

This library is built with Poetry. Check out this repo and run poetry install in the source folder. To run the tests, use poetry run pytest tests.

If you want to explore the library interactively, start a poetry shell.

Download files

Download the file for your platform.

Source Distribution

srt_equalizer-0.1.4.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

srt_equalizer-0.1.4-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file srt_equalizer-0.1.4.tar.gz.

File metadata

  • Download URL: srt_equalizer-0.1.4.tar.gz
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.1 CPython/3.10.5 Darwin/22.5.0

File hashes

Hashes for srt_equalizer-0.1.4.tar.gz
Algorithm Hash digest
SHA256 1bd8a8041e01fe8094c3fe5d9f9ca1458321710e29e5943a53e7ea39d5a273a6
MD5 cf847a52d92c698865ffd831d8624d8f
BLAKE2b-256 384c95714be3d9b776fe16af83956dcd1f470475887f5cbdb5bf6d2af1a7bc93


File details

Details for the file srt_equalizer-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: srt_equalizer-0.1.4-py3-none-any.whl
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.1 CPython/3.10.5 Darwin/22.5.0

File hashes

Hashes for srt_equalizer-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 a8b5bd8871181376e6aaae6cad2a46e7cac9165418057f523b10863a86650f18
MD5 bd72d34af5ec9c5f7853ed76ab1167d7
BLAKE2b-256 3fcf70221f292dbe6bac734eaf6cb3a25d211c4a29c2dcddc3fd34b48d204cb4

