SRT Equalizer

A Python module to transform subtitle line lengths, splitting into multiple subtitle fragments if necessary. Useful for adjusting automatic speech recognition output, e.g. from Whisper, to a more convenient size.

This library works for all languages where spaces separate words.

Installing

pip install srt_equalizer

Example

An SRT file containing lines over a certain length can be adjusted to a maximum line length for better readability on screen.

1
00:00:00,000 --> 00:00:04,000
Good evening. I appreciate you giving me a few minutes of your time tonight

2
00:00:04,000 --> 00:00:11,000
so I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.

To limit lines to a maximum length of 42 characters, you can use SRT Equalizer like this:

from srt_equalizer import srt_equalizer

srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42)

Lines that exceed the limit are split into multiple fragments, and the time codes are adjusted in approximate proportion to the length of each segment while staying inside the time slot of the original fragment.

1
00:00:00,000 --> 00:00:02,132
Good evening. I appreciate you giving me

2
00:00:02,132 --> 00:00:04,000
a few minutes of your time tonight

3
00:00:04,000 --> 00:00:06,458
so I can discuss with you a complex and

4
00:00:06,458 --> 00:00:08,979
difficult issue, an issue that is one of

5
00:00:08,979 --> 00:00:11,000
the most profound of our time.
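
The proportional timing described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the function name proportional_times is hypothetical, and weighting each fragment by its character count is an assumption, so the boundaries it computes may differ slightly from the time codes shown above.

```python
from datetime import timedelta


def proportional_times(start, end, fragments):
    """Divide the slot [start, end] among text fragments by character count.

    Hypothetical helper for illustration only; not part of srt_equalizer.
    Returns a list of (fragment_start, fragment_end) timedelta pairs.
    """
    total_chars = sum(len(fragment) for fragment in fragments)
    if total_chars == 0:
        raise ValueError("fragments must contain some text")

    duration = end - start
    times = []
    cursor = start
    consumed = 0
    for index, fragment in enumerate(fragments):
        consumed += len(fragment)
        if index == len(fragments) - 1:
            # The last fragment always ends exactly where the original ended.
            fragment_end = end
        else:
            fragment_end = start + duration * consumed / total_chars
        times.append((cursor, fragment_end))
        cursor = fragment_end
    return times


slots = proportional_times(
    timedelta(seconds=0),
    timedelta(seconds=4),
    ["Good evening. I appreciate you giving me",
     "a few minutes of your time tonight"],
)
for fragment_start, fragment_end in slots:
    print(fragment_start, "-->", fragment_end)
```

Each fragment starts where the previous one ends, so the fragments never overlap and never leave the original time slot.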

Algorithms

By default, the library uses the greedy algorithm, which splits the text at the rightmost space that still keeps the line within the maximum length.
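
As a rough sketch of the greedy idea (not the library's own code), the function below repeatedly cuts at the rightmost space that keeps a line within the limit. The name greedy_split is hypothetical, and details such as whether a line of exactly the maximum length is allowed are assumptions that may differ from srt_equalizer.

```python
def greedy_split(text, max_chars=42):
    """Split text into lines of at most max_chars, filling each line greedily.

    Hypothetical helper for illustration only; not part of srt_equalizer.
    """
    lines = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Rightmost space that leaves a line of at most max_chars characters.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            # A single word is longer than the limit: keep it whole on its own line.
            cut = remaining.find(" ")
            if cut == -1:
                break
        lines.append(remaining[:cut])
        remaining = remaining[cut + 1:].lstrip()
    lines.append(remaining)
    return lines


print(greedy_split("the quick brown fox jumps over the lazy dog", 15))
# ['the quick brown', 'fox jumps over', 'the lazy dog']
```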

An alternative is the halving algorithm, which splits longer lines more evenly instead of always filling the maximum line length. This avoids lines that consist of an isolated leftover word.
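
The halving idea can be illustrated with a short recursive sketch. It is an approximation under stated assumptions, not the library's implementation: the name halving_split is hypothetical, and choosing the space closest to the midpoint is one plausible way to split evenly.

```python
def halving_split(text, max_chars=42):
    """Split text near its midpoint until every piece fits within max_chars.

    Hypothetical helper for illustration only; not part of srt_equalizer.
    """
    text = text.strip()
    if len(text) <= max_chars or " " not in text:
        return [text]

    mid = len(text) // 2
    left = text.rfind(" ", 0, mid + 1)   # nearest space at or before the midpoint
    right = text.find(" ", mid)          # nearest space at or after the midpoint
    candidates = [i for i in (left, right) if i != -1]
    cut = min(candidates, key=lambda i: abs(i - mid))

    # Recurse on both halves in case one of them is still too long.
    return halving_split(text[:cut], max_chars) + halving_split(text[cut + 1:], max_chars)


print(halving_split("the quick brown fox jumps over the lazy dog", 42))
# ['the quick brown fox', 'jumps over the lazy dog']
```

With a limit of 42, splitting this 43-character sentence at the rightmost possible space would leave "dog" alone on the second line; splitting near the middle gives two lines of similar length.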

Another alternative is the punctuation algorithm that takes punctuation (commas, periods, etc.) into account.
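
One way to picture a punctuation-aware split is the sketch below. It is only a guess at the general approach, not the library's implementation: the name punctuation_split, the set of punctuation marks, and the rule "prefer the last break after punctuation, otherwise fall back to the rightmost space" are all assumptions made for illustration.

```python
import re


def punctuation_split(text, max_chars=42):
    """Split text into lines, preferring breaks that follow punctuation.

    Hypothetical helper for illustration only; not part of srt_equalizer.
    """
    lines = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        window = remaining[:max_chars + 1]
        # Spaces that directly follow a punctuation mark (assumed set of marks).
        punct_breaks = [m.start() for m in re.finditer(r"(?<=[,.;:!?]) ", window)]
        if punct_breaks:
            cut = punct_breaks[-1]
        else:
            # No punctuation in range: fall back to the rightmost space.
            cut = window.rfind(" ")
        if cut <= 0:
            break
        lines.append(remaining[:cut])
        remaining = remaining[cut + 1:].lstrip()
    lines.append(remaining)
    return lines


sentence = ("so I can discuss with you a complex and difficult issue, "
            "an issue that is one of the most profound of our time.")
for line in punctuation_split(sentence, 60):
    print(line)
```

A simple rule like this can produce short lines when punctuation appears early in the text, which is the trade-off against the greedy approach.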

Select the algorithm with the method parameter:

from srt_equalizer import srt_equalizer

# use "greedy", "halving" or "punctuation" for the method parameter
srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42, method='halving')

Adjust Whisper subtitle lengths

It is also possible to work with subtitle items produced by Whisper, using the following utility methods:

split_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -> list[srt.Subtitle]:

whisper_result_to_srt(segments: list[dict]) -> list[srt.Subtitle]:

Here is an example of how to reduce the length of subtitles created by Whisper. It assumes you have an audio file to transcribe called gwb.wav.

import whisper
from srt_equalizer import srt_equalizer

# Transcribe the audio file with Whisper
options_dict = {"task": "transcribe", "language": "en"}
model = whisper.load_model("small")
result = model.transcribe("gwb.wav", **options_dict)

# Convert the Whisper segments to a list of srt.Subtitle items
segments = result["segments"]
subs = srt_equalizer.whisper_result_to_srt(segments)

# Reduce line length in the Whisper result to <= 42 chars
equalized = []
for sub in subs:
    equalized.extend(srt_equalizer.split_subtitle(sub, 42))

# Print the text of each shortened subtitle
for sub in equalized:
    print(sub.content)

Quotes consideration

The method "split_by_punctuation" will try to take into account punctuation (commas, periods, etc.) when splitting the text.

This is to prevent trailing quotation characters when splitting text.

Character  Unicode  Description
'          U+0027   ASCII apostrophe
"          U+0022   ASCII double quote
“          U+201C   Left double curly quote
”          U+201D   Right double curly quote
‘          U+2018   Left single curly quote
’          U+2019   Right single curly quote
„          U+201E   Double low-9 quote (German)
‚          U+201A   Single low-9 quote (German)
«          U+00AB   Left guillemet
»          U+00BB   Right guillemet
‹          U+2039   Single left guillemet
›          U+203A   Single right guillemet

Contributing

This library is built with Poetry. Check out this repo and run poetry install in the source folder. To run the tests, use poetry run pytest tests.

To make a new release, run the tests, create a new tag, build the package and publish it to PyPI:

poetry run pytest tests
git tag v0.1.2
poetry build
poetry publish

If you want to explore the library, start a Poetry shell.
