Remove duplicate/ghost entries from SRT subtitle files generated by auto-captioning tools

subtitle-deduplicator 🎬

SRT Subtitle Duplicate Remover

Remove duplicate/ghost entries from auto-generated SRT subtitle files.

Why?

Auto-generated SRT subtitles (from YouTube, Whisper, etc.) often contain duplicated entries in a "scrolling karaoke" pattern:

1
00:00:00,440 --> 00:00:02,909

so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day
 

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they

Each real entry shows two lines (the previous line plus the new one), and between consecutive real entries sits a 10ms "ghost" entry that only repeats the previous text. This roughly triples the file size and makes the subtitles unreadable.
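To make the ghost pattern concrete, here is a minimal sketch of how an entry's display duration could be read from its timing line and compared against a threshold. The names `duration_ms` and `is_ghost` are hypothetical, and the "less than or equal" comparison is an assumption; this is not necessarily how subtitle-deduplicator implements the check.

```python
import re

# Matches an SRT timing line such as "00:00:02,909 --> 00:00:02,919".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def duration_ms(timing_line):
    """Return how long an entry is displayed, in milliseconds."""
    match = TIMING.match(timing_line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return end - start


def is_ghost(timing_line, threshold_ms=20):
    """Assumed rule: an entry at or below the threshold is a ghost candidate."""
    return duration_ms(timing_line) <= threshold_ms


print(duration_ms("00:00:02,909 --> 00:00:02,919"))  # entry 2 in the sample
print(is_ghost("00:00:00,440 --> 00:00:02,909"))     # entry 1 in the sample
```

Applied to the sample above, entry 2 lasts 10ms and falls under the threshold, while entries 1 and 3 last several seconds and do not.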

Install

PyPI

pip install subtitle-deduplicator

Arch Linux (AUR)

yay -S subtitle-deduplicator

Usage

# Basic usage (outputs to video_deduped.srt)
subtitle-dedup video.srt

# Specify output file
subtitle-dedup video.srt -o video_clean.srt

# Overwrite in place
subtitle-dedup video.srt --in-place

# Custom ghost threshold (default: 20ms)
subtitle-dedup video.srt -t 50

# Specify encoding
subtitle-dedup video.srt -e latin-1

Example Output:

✔ Deduplication complete!
ℹ Input:               video.srt
ℹ Output:              video_clean.srt
ℹ Original entries:    1559
ℹ Deduplicated:        760
ℹ Removed:             799 (51.3%)
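
The figures in the summary are related by simple arithmetic: removed entries are the original count minus the deduplicated count, and the percentage is taken relative to the original count. A small sketch, where `summarize` is a hypothetical helper and not part of the tool:

```python
def summarize(original, kept):
    """Return (removed_count, percentage_string) for a dedup run."""
    removed = original - kept
    # Guard against an empty input file to avoid dividing by zero.
    pct = removed / original * 100 if original else 0.0
    return removed, f"{pct:.1f}%"


print(summarize(1559, 760))  # (799, '51.3%')
```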

What It Removes

| Duplicate Type | Description |
| --- | --- |
| Ghost entries | 10ms entries that repeat previous text |
| Carry-over lines | First line duplicating previous entry's last line |
| Identical entries | Back-to-back entries with same text |
| Empty entries | Entries with no actual text content |
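
The four rules above can be sketched as a single pass over parsed entries. This is an illustrative approximation under stated assumptions, not the package's actual code: `dedupe` is a hypothetical function, entries are assumed to be `(start_ms, end_ms, lines)` tuples, and the exact matching rules (for example, treating a short entry as a ghost when all of its lines appear in the previous entry) are guesses.

```python
def dedupe(entries, threshold_ms=20):
    """Apply the four removal rules to (start_ms, end_ms, lines) tuples."""
    kept = []
    prev = None  # cleaned text of the last entry that was kept
    for start, end, lines in entries:
        text = [ln.strip() for ln in lines if ln.strip()]
        if not text:
            continue  # empty entry: no actual text content
        if prev is not None:
            short = (end - start) <= threshold_ms
            if short and all(ln in prev for ln in text):
                continue  # ghost entry: brief repeat of previous text
            if text == prev:
                continue  # identical back-to-back entry
        out = text
        if prev is not None and len(text) > 1 and text[0] == prev[-1]:
            out = text[1:]  # carry-over: drop the repeated first line
        kept.append((start, end, out))
        prev = text
    return kept


first = "so welcome to the last Talk of the day"
second = "and OSS what do ABI and why should they"
sample = [
    (440, 2909, ["", first]),
    (2909, 2919, [first, " "]),
    (2919, 6630, [first, second]),
]
for entry in dedupe(sample):
    print(entry)
```

On the sample from the "Why?" section this keeps two entries, one line each: the ghost is dropped and the carried-over first line of entry 3 is stripped.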

Zero Dependencies

subtitle-deduplicator uses only the Python standard library, so it needs nothing beyond Python 3.8+ itself.

License

MIT

Download files

Source Distribution

subtitle_deduplicator-1.0.1.tar.gz (10.0 kB view details)

Uploaded Source

Built Distribution

subtitle_deduplicator-1.0.1-py3-none-any.whl (7.4 kB view details)

Uploaded Python 3

File details

Details for the file subtitle_deduplicator-1.0.1.tar.gz.

File metadata

  • Download URL: subtitle_deduplicator-1.0.1.tar.gz
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtitle_deduplicator-1.0.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5bdfa5ea443e9b2c36bb4017b327a831df3ac9788d0d2b7d757b5bf0cf2a940a |
| MD5 | 5ff56a61983ccdaaf0a2059825cf1429 |
| BLAKE2b-256 | db3e8153ecb5bd82753d5f5c11a12e63be8bef9e1839665de227bf1d5416d5a9 |

Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.1.tar.gz:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtitle_deduplicator-1.0.1-py3-none-any.whl.

File hashes

Hashes for subtitle_deduplicator-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f264bd394552a8f97cea2b5ec25f7e1cd6801cd52743fbb826f9e0bfd20e7cc4 |
| MD5 | bc50649da6c435a9670dd9911f2d0fbe |
| BLAKE2b-256 | 3f5eedcba3b7ec3fc6d2f5aa4728524c05ef738435900ce34cfd5c0e18cb11b2 |

Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.1-py3-none-any.whl:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator
