SRT Equalizer
A Python module to transform subtitle line lengths, splitting into multiple subtitle fragments if necessary. Useful to adjust automatic speech recognition outputs from e.g. Whisper to a more convenient size.
This library works for all languages where spaces separate words.
Installing
pip install srt_equalizer
Example
An SRT file containing lines over a certain length can be adjusted to a maximum line length for better readability on screen.
1
00:00:00,000 --> 00:00:04,000
Good evening. I appreciate you giving me a few minutes of your time tonight
2
00:00:04,000 --> 00:00:11,000
so I can discuss with you a complex and difficult issue, an issue that is one of the most profound of our time.
To limit lines to a maximum length of 42 characters, you can use SRT Equalizer like this:
from srt_equalizer import srt_equalizer
srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42)
Lines longer than the limit are split into multiple fragments, and the time codes are adjusted in approximate proportion to the length of each fragment while staying inside the time slot of the original subtitle.
1
00:00:00,000 --> 00:00:02,132
Good evening. I appreciate you giving me
2
00:00:02,132 --> 00:00:04,000
a few minutes of your time tonight
3
00:00:04,000 --> 00:00:06,458
so I can discuss with you a complex and
4
00:00:06,458 --> 00:00:08,979
difficult issue, an issue that is one of
5
00:00:08,979 --> 00:00:11,000
the most profound of our time.
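The proportional timing described above can be sketched in a few lines. This is only an illustration, not the library's implementation: the function name proportional_time_slots is hypothetical, and it weights each fragment purely by its character count, so the boundaries it computes may differ slightly from the values shown in the output above.

```python
from datetime import timedelta


def proportional_time_slots(start, end, fragments):
    """Divide the slot [start, end] among fragments by character count.

    Hypothetical helper for illustration; assumes at least one
    non-empty fragment.
    """
    total_chars = sum(len(fragment) for fragment in fragments)
    duration = end - start
    slots = []
    cursor = start
    consumed = 0
    for position, fragment in enumerate(fragments):
        consumed += len(fragment)
        if position == len(fragments) - 1:
            # Pin the last fragment to the original end time so the
            # fragments never leave the original time slot.
            fragment_end = end
        else:
            fragment_end = start + duration * consumed / total_chars
        slots.append((cursor, fragment_end))
        cursor = fragment_end
    return slots


slots = proportional_time_slots(
    timedelta(seconds=0),
    timedelta(seconds=4),
    [
        "Good evening. I appreciate you giving me",
        "a few minutes of your time tonight",
    ],
)
for slot_start, slot_end in slots:
    print(slot_start, "-->", slot_end)
```

Each fragment starts exactly where the previous one ends, so the fragments cover the original slot without gaps or overlaps.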
Algorithms
By default, this script uses the greedy algorithm, which splits the text at the rightmost possible space.
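As a rough illustration of the greedy strategy, the sketch below packs as many words as fit into each line before starting a new one. The function name greedy_split is hypothetical and this is not the library's code; in particular, the library's handling of lines that land exactly on the limit may differ.

```python
def greedy_split(text, max_chars):
    """Fill each line with as many whole words as fit in max_chars.

    Hypothetical sketch. A single word longer than max_chars is kept
    whole on its own line rather than being broken.
    """
    lines = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The next word does not fit: close the line at the last
            # space that still respected the limit.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


sentence = (
    "so I can discuss with you a complex and difficult issue, "
    "an issue that is one of the most profound of our time."
)
for line in greedy_split(sentence, 42):
    print(line)
```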
An alternative splitting algorithm is halving, which splits longer lines more evenly instead of always trying to use the maximum line length. This avoids producing lines with isolated word remainders.
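One way to picture the halving idea is a recursive split at the space nearest the middle, repeated until every piece fits. The sketch below works under that assumption; halving_split is a hypothetical name, and the library's actual halving rule may choose its split points differently.

```python
def halving_split(text, max_chars):
    """Recursively split text at the space closest to its midpoint.

    Hypothetical sketch. Text without any space is returned as is,
    even if it is longer than max_chars.
    """
    text = text.strip()
    if len(text) <= max_chars or " " not in text:
        return [text]
    middle = len(text) // 2
    space_positions = [i for i, ch in enumerate(text) if ch == " "]
    # Choose the space nearest the midpoint so both halves are similar in size.
    cut = min(space_positions, key=lambda i: abs(i - middle))
    return halving_split(text[:cut], max_chars) + halving_split(
        text[cut + 1:], max_chars
    )


sentence = (
    "so I can discuss with you a complex and difficult issue, "
    "an issue that is one of the most profound of our time."
)
for line in halving_split(sentence, 42):
    print(line)
```

Compared with a greedy split, this tends to produce more lines of similar length rather than several full lines followed by a short remainder.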
Another alternative is the punctuation algorithm, which takes punctuation (commas, periods, etc.) into account.
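A punctuation-aware split could, for instance, prefer to break right after a punctuation mark when one falls within the limit, and fall back to the rightmost space otherwise. The sketch below assumes exactly that rule; punctuation_split and its marks parameter are hypothetical, and the library may weigh punctuation differently.

```python
def punctuation_split(text, max_chars, marks=".,;:!?"):
    """Prefer breaking after punctuation, else at the rightmost space.

    Hypothetical sketch, not the library's implementation.
    """
    lines = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # A space at index max_chars still yields a line of max_chars characters.
        window = remaining[: max_chars + 1]
        after_punctuation = [
            i
            for i, ch in enumerate(window)
            if ch == " " and i > 0 and window[i - 1] in marks
        ]
        if after_punctuation:
            cut = after_punctuation[-1]
        else:
            cut = window.rfind(" ")
            if cut == -1:
                # Overlong word: break at the next space, if there is one.
                cut = remaining.find(" ")
                if cut == -1:
                    break
        lines.append(remaining[:cut])
        remaining = remaining[cut + 1:].lstrip()
    if remaining:
        lines.append(remaining)
    return lines


sentence = (
    "Good evening. I appreciate you giving me "
    "a few minutes of your time tonight"
)
for line in punctuation_split(sentence, 42):
    print(line)
```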
from srt_equalizer import srt_equalizer
# use "greedy", "halving" or "punctuation" for the method parameter
srt_equalizer.equalize_srt_file("test.srt", "shortened.srt", 42, method='halving')
Adjust Whisper subtitle lengths
It is also possible to work with subtitle items produced from Whisper using the following utility methods:
split_subtitle(sub: srt.Subtitle, target_chars: int=42, start_from_index: int=1) -> list[srt.Subtitle]:
whisper_result_to_srt(segments: list[dict]) -> list[srt.Subtitle]:
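To make the conversion step more concrete, the sketch below renders Whisper-style segments (dicts with "start" and "end" in seconds plus "text") directly as SRT text using only the standard library. It does not use or reproduce whisper_result_to_srt, which returns srt.Subtitle objects; the names format_srt_timestamp and segments_to_srt_text are hypothetical.

```python
def format_srt_timestamp(seconds):
    """Format a time in seconds as an SRT time code (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt_text(segments):
    """Render Whisper-style segment dicts as SRT-formatted text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


example_segments = [
    {"start": 0.0, "end": 4.0, "text": " Good evening."},
    {"start": 4.0, "end": 11.0, "text": " so I can discuss with you"},
]
print(segments_to_srt_text(example_segments))
```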
Here is an example of how to reduce the length of subtitles created by Whisper. It assumes you have an audio file to transcribe called gwb.wav.
import whisper
from srt_equalizer import srt_equalizer

# Transcribe the audio file with Whisper
options_dict = {"task": "transcribe", "language": "en"}
model = whisper.load_model("small")
result = model.transcribe("gwb.wav", **options_dict)
segments = result["segments"]

# Convert the Whisper segments to srt.Subtitle items
subs = srt_equalizer.whisper_result_to_srt(segments)

# Reduce line length in the Whisper result to <= 42 chars
equalized = []
for sub in subs:
    equalized.extend(srt_equalizer.split_subtitle(sub, 42))

# Print the text of each resulting subtitle fragment
for i in equalized:
    print(i.content)
Contributing
This library is built with Poetry. Check out this repo and run poetry install in the source folder. To run tests, use poetry run pytest tests.
To build a new release, run the tests, create a new tag, build the package and publish it to PyPI:
poetry run pytest tests
git tag v0.1.2
poetry build
poetry publish
If you want to explore the library, start a poetry shell.