Remove duplicate/ghost entries from SRT subtitle files generated by auto-captioning tools

subtitle-deduplicator 🎬

SRT Subtitle Duplicate Remover

Remove duplicate/ghost entries from auto-generated SRT subtitle files.

Why?

Auto-generated SRT subtitles (from YouTube, Whisper, etc.) often contain duplicated entries in a "scrolling karaoke" pattern:

1
00:00:00,440 --> 00:00:02,909
so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they

Each real entry shows two lines (the previous line plus the new one), and between real entries sit 10 ms "ghost" entries, like entry 2 above, that only repeat the previous text. This roughly triples the file size and makes the subtitles unreadable.
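The ghost pattern is easy to see once the timing lines are converted to durations. The following is a minimal sketch, not the package's own code: the names `duration_ms` and `looks_like_ghost` are hypothetical, and treating "ghost" as "lasts no longer than the threshold" is an assumption (the 20 ms default mirrors the documented `-t` default).

```python
import re

# One SRT timing line, e.g. "00:00:02,909 --> 00:00:02,919".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def duration_ms(timing_line: str) -> int:
    """Return how long an SRT timing line lasts, in milliseconds."""
    match = TIMING.search(timing_line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return end - start


def looks_like_ghost(timing_line: str, threshold_ms: int = 20) -> bool:
    # Assumption: a ghost is any entry lasting no longer than the threshold.
    return duration_ms(timing_line) <= threshold_ms


# The three timing lines from the sample above.
for timing in (
    "00:00:00,440 --> 00:00:02,909",
    "00:00:02,909 --> 00:00:02,919",
    "00:00:02,919 --> 00:00:06,630",
):
    print(timing, duration_ms(timing), looks_like_ghost(timing))
```

Only the middle entry, which lasts 10 ms, falls under the threshold; the other two last 2469 ms and 3711 ms.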

Install

PyPI

pip install subtitle-deduplicator

Arch Linux (AUR)

yay -S subtitle-deduplicator

Usage

# Basic usage (outputs to video_deduped.srt)
subtitle-dedup video.srt

# Specify output file
subtitle-dedup video.srt -o video_clean.srt

# Overwrite in place
subtitle-dedup video.srt --in-place

# Custom ghost threshold (default: 20ms)
subtitle-dedup video.srt -t 50

# Specify encoding
subtitle-dedup video.srt -e latin-1

Example Output:

✔ Deduplication complete!
ℹ Input:               video.srt
ℹ Output:              video_clean.srt
ℹ Original entries:    1559
ℹ Deduplicated:        760
ℹ Removed:             799 (51.3%)
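The summary figures are consistent with each other, reading "Deduplicated" as the number of entries that remain. A quick check of the arithmetic (variable names are illustrative only):

```python
original = 1559   # "Original entries"
remaining = 760   # "Deduplicated", read here as entries left after cleanup

removed = original - remaining
percent = removed / original * 100

print(f"Removed: {removed} ({percent:.1f}%)")  # prints: Removed: 799 (51.3%)
```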

What It Removes

| Duplicate Type | Description |
|---|---|
| Ghost entries | 10ms entries that repeat previous text |
| Carry-over lines | First line duplicating previous entry's last line |
| Identical entries | Back-to-back entries with same text |
| Empty entries | Entries with no actual text content |
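The four rules above can be sketched in a few dozen lines of standard-library Python. This is an illustration of the idea under stated assumptions, not the package's actual implementation: `Entry`, `parse_srt`, `dedupe`, and `to_srt` are hypothetical names, a ghost is assumed to be an entry no longer than the threshold whose lines all appear in the previous kept entry, and entry timings are left untouched.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Entry:
    start_ms: int
    end_ms: int
    lines: List[str]


def parse_srt(text: str) -> List[Entry]:
    """Split SRT text into entries; blocks without a timing line are skipped."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        for i, row in enumerate(rows):
            match = TIMING.search(row)
            if match:
                g = [int(x) for x in match.groups()]
                start = ((g[0] * 60 + g[1]) * 60 + g[2]) * 1000 + g[3]
                end = ((g[4] * 60 + g[5]) * 60 + g[6]) * 1000 + g[7]
                text_lines = [r.strip() for r in rows[i + 1:] if r.strip()]
                entries.append(Entry(start, end, text_lines))
                break
    return entries


def dedupe(entries: List[Entry], threshold_ms: int = 20) -> List[Entry]:
    kept: List[Entry] = []
    prev_raw: Optional[List[str]] = None  # untrimmed text of the last kept entry
    for entry in entries:
        lines = list(entry.lines)
        if not lines:
            continue  # empty entry
        if prev_raw is not None:
            short = entry.end_ms - entry.start_ms <= threshold_ms
            if short and all(line in prev_raw for line in lines):
                continue  # ghost entry
            if lines == prev_raw:
                continue  # identical back-to-back entry
            if lines[0] == prev_raw[-1]:
                lines = lines[1:]  # carry-over line
                if not lines:
                    continue
        kept.append(Entry(entry.start_ms, entry.end_ms, lines))
        prev_raw = list(entry.lines)
    return kept


def format_ms(total_ms: int) -> str:
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(entries: List[Entry]) -> str:
    """Serialize entries back to SRT, renumbering from 1."""
    blocks = [
        f"{n}\n{format_ms(e.start_ms)} --> {format_ms(e.end_ms)}\n" + "\n".join(e.lines)
        for n, e in enumerate(entries, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


SAMPLE = """1
00:00:00,440 --> 00:00:02,909
so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they
"""

print(to_srt(dedupe(parse_srt(SAMPLE))))
```

On the sample from the "Why?" section, this drops entry 2 as a ghost and trims the carried-over first line from entry 3, leaving two single-line entries. Comparing against the untrimmed text of the previous kept entry is a design choice of this sketch: it lets the identical-entry check still work after a carry-over line has been trimmed.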

Zero Dependencies

subtitle-deduplicator uses only the Python standard library, so it needs nothing beyond Python 3.8+.

License

MIT

Download files

Source Distribution

subtitle_deduplicator-1.0.2.tar.gz (10.7 kB view details)

Uploaded Source

Built Distribution

subtitle_deduplicator-1.0.2-py3-none-any.whl (7.7 kB view details)

Uploaded Python 3

File details

Details for the file subtitle_deduplicator-1.0.2.tar.gz.

File metadata

  • Download URL: subtitle_deduplicator-1.0.2.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtitle_deduplicator-1.0.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 765e2aa947895d585fb0a5b6d11e671bd75d4350e5b31cda9a459f62a01ebbe1 |
| MD5 | dcbd5b0831919dabf23b45fc7063fc65 |
| BLAKE2b-256 | 67f3ebbabc6ca73be3fbf6c95f0585147752716384f3fd5c12e8fe8c710e1deb |

Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.2.tar.gz:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtitle_deduplicator-1.0.2-py3-none-any.whl.

File hashes

Hashes for subtitle_deduplicator-1.0.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 30d6b5639a6f1bdf9773ec65cd102161cb7933df18ed6f5cb3078514b816e92e |
| MD5 | 36457c8dd9a05488a8ea4f3c3209ec82 |
| BLAKE2b-256 | d6021e7b3d776788589172607db9c6d9aafe916588d9b6ea95dc622971e48545 |

Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.2-py3-none-any.whl:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator
