Remove duplicate/ghost entries from SRT subtitle files generated by auto-captioning tools
subtitle-deduplicator 🎬
SRT Subtitle Duplicate Remover
Remove duplicate/ghost entries from auto-generated SRT subtitle files.
Why?
Auto-generated SRT subtitles (from YouTube, Whisper, etc.) often contain duplicated entries in a "scrolling karaoke" pattern:
1
00:00:00,440 --> 00:00:02,909
so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they
Each real entry shows two lines (the previous line plus the new one), and between real entries sit 10 ms "ghost" entries that only repeat the previous text. Every line of speech therefore appears about three times, which roughly triples the file size and makes the subtitles hard to read.
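As a rough illustration of the pattern above, the following sketch parses SRT text and flags entries whose duration falls at or below a threshold. The names `Cue`, `parse_srt`, and `is_ghost` are hypothetical and are not taken from the package. The 20 ms default mirrors the documented CLI default, but comparing with `<=` is an assumption about how the threshold is applied.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches an SRT timing row such as "00:00:02,909 --> 00:00:02,919".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    lines: List[str]


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Cue]:
    """Parse SRT text into cues; entries are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [r.strip() for r in block.splitlines() if r.strip()]
        for i, row in enumerate(rows):
            match = TIME_RE.match(row)
            if match:
                g = match.groups()
                # Everything after the timing row is subtitle text.
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), rows[i + 1:]))
                break
    return cues


def is_ghost(cue: Cue, threshold_ms: int = 20) -> bool:
    """Assumption: a ghost is any cue lasting at most threshold_ms."""
    return cue.end_ms - cue.start_ms <= threshold_ms


SAMPLE = """\
1
00:00:00,440 --> 00:00:02,909
so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they
"""

print([is_ghost(c) for c in parse_srt(SAMPLE)])  # [False, True, False]
```

Only the second entry, which lasts 10 ms, is flagged. A duration check alone does not confirm that the text is a repeat, so a real implementation would presumably compare the text as well.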
Install
pip install subtitle-deduplicator
Usage
# Basic usage (outputs to video_deduped.srt)
subtitle-dedup video.srt
# Specify output file
subtitle-dedup video.srt -o video_clean.srt
# Overwrite in place
subtitle-dedup video.srt --in-place
# Custom ghost threshold (default: 20ms)
subtitle-dedup video.srt -t 50
# Specify encoding
subtitle-dedup video.srt -e latin-1
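For a folder of subtitle files, the commands above could be driven from a small script. This is a minimal sketch that uses only the flags shown in this section (`-o`, `-t`, `-e`); the helper names `build_command` and `dedupe_folder` and the directory layout are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(
    src: Path,
    out_dir: Path,
    threshold_ms: Optional[int] = None,
    encoding: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one subtitle-dedup invocation."""
    cmd = ["subtitle-dedup", str(src), "-o", str(out_dir / src.name)]
    if threshold_ms is not None:
        cmd += ["-t", str(threshold_ms)]
    if encoding is not None:
        cmd += ["-e", encoding]
    return cmd


def dedupe_folder(src_dir: Path, out_dir: Path, **options) -> None:
    """Run subtitle-dedup on every .srt file in src_dir."""
    if shutil.which("subtitle-dedup") is None:
        raise RuntimeError("subtitle-dedup was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.srt")):
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_command(src, out_dir, **options), check=True)


if __name__ == "__main__":
    print(build_command(Path("video.srt"), Path("clean"), threshold_ms=50))
```

Writing to a separate output directory keeps the original files untouched, which is a more cautious default than `--in-place` when processing many files at once.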
Example Output:
✔ Deduplication complete!
ℹ Input: video.srt
ℹ Output: video_clean.srt
ℹ Original entries: 1559
ℹ Deduplicated: 760
ℹ Removed: 799 (51.3%)
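The figures in this summary can be checked by hand: the removed count is the difference between the original and remaining entries, and the percentage is taken relative to the original count. A short sketch of that arithmetic follows, with `removal_stats` as a hypothetical helper name.

```python
from typing import Tuple


def removal_stats(original: int, remaining: int) -> Tuple[int, float]:
    """Return (removed entries, percentage of the original that was removed)."""
    removed = original - remaining
    percent = 100 * removed / original if original else 0.0
    return removed, percent


removed, percent = removal_stats(1559, 760)
print(f"Removed: {removed} ({percent:.1f}%)")  # Removed: 799 (51.3%)
```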
What It Removes
| Duplicate Type | Description |
|---|---|
| Ghost entries | Very short entries (about 10 ms) that only repeat the previous text |
| Carry-over lines | An entry's first line when it duplicates the previous entry's last line |
| Identical entries | Back-to-back entries with the same text |
| Empty entries | Entries with no actual text content |
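The four rules in the table can be sketched as a single pass over parsed entries. This is an illustration under stated assumptions, not the package's actual implementation: the function name `dedupe`, the tuple layout, the `<=` threshold comparison, and the choice to leave timings unchanged are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (start_ms, end_ms, text lines)
Cue = Tuple[int, int, List[str]]


def dedupe(cues: List[Cue], threshold_ms: int = 20) -> List[Cue]:
    kept: List[Cue] = []
    for start, end, lines in cues:
        lines = [ln.strip() for ln in lines if ln.strip()]
        # Empty entries: nothing left once whitespace is stripped.
        if not lines:
            continue
        if kept:
            prev_lines = kept[-1][2]
            # Ghost entries: very short and only repeating the end of
            # the previously kept text.
            is_short = end - start <= threshold_ms
            if is_short and lines == prev_lines[-len(lines):]:
                continue
            # Identical entries: same text as the previously kept entry.
            if lines == prev_lines:
                continue
            # Carry-over lines: drop a first line that repeats the
            # previous entry's last line.
            if len(lines) > 1 and lines[0] == prev_lines[-1]:
                lines = lines[1:]
        kept.append((start, end, lines))
    return kept


example = [
    (440, 2909, ["so welcome to the last Talk of the day"]),
    (2909, 2919, ["so welcome to the last Talk of the day"]),
    (2919, 6630, ["so welcome to the last Talk of the day",
                  "and OSS what do ABI and why should they"]),
]
for cue in dedupe(example):
    print(cue)
# (440, 2909, ['so welcome to the last Talk of the day'])
# (2919, 6630, ['and OSS what do ABI and why should they'])
```

The sketch compares each entry against the last kept entry rather than the last seen entry, so a ghost that follows a trimmed carry-over entry is still recognized. It does not renumber entries or adjust timings.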
Zero Dependencies
subtitle-deduplicator uses only the Python standard library; it has no install requirements beyond Python 3.8+.
License
MIT