
Remove duplicate/ghost entries from SRT subtitle files generated by auto-captioning tools

Project description

subtitle-deduplicator 🎬


SRT Subtitle Duplicate Remover

Remove duplicate/ghost entries from auto-generated SRT subtitle files.

Why?

Auto-generated SRT subtitles (from YouTube, Whisper, etc.) often contain duplicated entries in a "scrolling karaoke" pattern:

1
00:00:00,440 --> 00:00:02,909

so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day
 

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they

Each real entry shows two lines (the previous line plus the new one), and between consecutive real entries sits a 10 ms "ghost" entry that only repeats the previous text. This roughly triples the file size and makes the subtitles unreadable.
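A minimal sketch of how this pattern can be detected, assuming ghosts are identified purely by duration (shorter than the 20 ms default threshold listed under Usage). `parse_srt`, `Entry` and `is_ghost` are hypothetical names for illustration and are not taken from subtitle-deduplicator's source. The parser ignores blank lines inside an entry, because the first cue in the sample has an empty line before its text, which would break a naive split on blank lines.

```python
import re
from dataclasses import dataclass, field
from typing import List

# SRT timing line, e.g. "00:00:02,909 --> 00:00:02,919" (trailing extras allowed).
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hh, mm, ss, mmm components to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


@dataclass
class Entry:
    start: int  # start time in ms
    end: int    # end time in ms
    lines: List[str] = field(default_factory=list)

    @property
    def duration(self) -> int:
        return self.end - self.start


def parse_srt(text: str) -> List[Entry]:
    """Parse SRT text, tolerating blank lines inside an entry's text."""
    raw = text.splitlines()
    entries: List[Entry] = []
    for i, line in enumerate(raw):
        stripped = line.strip()
        match = TIMING.match(stripped)
        if match:
            g = match.groups()
            entries.append(Entry(to_ms(*g[:4]), to_ms(*g[4:])))
            continue
        # A bare number right before a timing line is a cue index, not caption text.
        next_is_timing = i + 1 < len(raw) and TIMING.match(raw[i + 1].strip())
        if stripped.isdigit() and next_is_timing:
            continue
        # Blank and whitespace-only lines are ignored instead of ending the entry.
        if entries and stripped:
            entries[-1].lines.append(stripped)
    return entries


def is_ghost(entry: Entry, threshold_ms: int = 20) -> bool:
    """Assumed rule: anything shorter than the threshold is a ghost."""
    return entry.duration < threshold_ms


sample = """1
00:00:00,440 --> 00:00:02,909

so welcome to the last Talk of the day

2
00:00:02,909 --> 00:00:02,919
so welcome to the last Talk of the day

3
00:00:02,919 --> 00:00:06,630
so welcome to the last Talk of the day
and OSS what do ABI and why should they
"""

for e in parse_srt(sample):
    print(e.duration, is_ghost(e), e.lines)
# Durations are 2469, 10 and 3711 ms; only the 10 ms entry is flagged as a ghost.
```

Detecting ghosts by duration alone keeps the check independent of the text, so it also catches ghost cues whose whitespace differs slightly from the entry they repeat.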

Install

pip install subtitle-deduplicator

Usage

# Basic usage (outputs to video_deduped.srt)
subtitle-dedup video.srt

# Specify output file
subtitle-dedup video.srt -o video_clean.srt

# Overwrite in place
subtitle-dedup video.srt --in-place

# Custom ghost threshold (default: 20ms)
subtitle-dedup video.srt -t 50

# Specify encoding
subtitle-dedup video.srt -e latin-1
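To clean a whole folder, a small wrapper can call the CLI once per file. This sketch assumes `subtitle-dedup` is on `PATH` and uses only the `-o`, `-t` and `-e` options documented above (with `-t` taken to be in milliseconds); `build_dedup_command` and `dedup_directory` are hypothetical helper names, and the `_clean` output suffix is an arbitrary choice.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_dedup_command(srt_path: Path, output: Optional[Path] = None,
                        threshold_ms: Optional[int] = None,
                        encoding: Optional[str] = None) -> List[str]:
    """Assemble a subtitle-dedup command line from the documented options."""
    cmd = ["subtitle-dedup", str(srt_path)]
    if output is not None:
        cmd += ["-o", str(output)]
    if threshold_ms is not None:
        cmd += ["-t", str(threshold_ms)]
    if encoding is not None:
        cmd += ["-e", encoding]
    return cmd


def dedup_directory(folder: Path, threshold_ms: Optional[int] = None) -> None:
    """Run subtitle-dedup on every .srt file directly inside `folder`."""
    for srt in sorted(folder.glob("*.srt")):
        # Skip outputs from this loop or from the CLI's default naming.
        if srt.stem.endswith(("_clean", "_deduped")):
            continue
        out = srt.with_name(f"{srt.stem}_clean.srt")
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_dedup_command(srt, out, threshold_ms), check=True)


# Example (requires the package to be installed):
# dedup_directory(Path("subtitles"), threshold_ms=50)
```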

Example Output:

✔ Deduplication complete!
ℹ Input:               video.srt
ℹ Output:              video_clean.srt
ℹ Original entries:    1559
ℹ Deduplicated:        760
ℹ Removed:             799 (51.3%)
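The percentage is removed entries divided by original entries: (1559 − 760) / 1559 ≈ 0.5125, shown rounded to one decimal place. A sketch of that arithmetic, assuming the tool rounds the same way; `removal_stats` is a hypothetical helper.

```python
from typing import Tuple


def removal_stats(original: int, deduplicated: int) -> Tuple[int, float]:
    """Return (entries removed, percentage of original entries removed)."""
    if original <= 0:
        # Avoid dividing by zero on an empty input file.
        return 0, 0.0
    removed = original - deduplicated
    return removed, 100.0 * removed / original


removed, pct = removal_stats(1559, 760)
print(f"Removed: {removed} ({pct:.1f}%)")  # Removed: 799 (51.3%)
```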

What It Removes

Duplicate type       Description
Ghost entries        ~10 ms entries that repeat the previous entry's text
Carry-over lines     First line of an entry that duplicates the previous entry's last line
Identical entries    Back-to-back entries with the same text
Empty entries        Entries with no actual text content
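The four rules can be sketched as a single pass over parsed entries. This illustrates the idea under stated assumptions and is not the package's actual implementation: ghosts are dropped by duration, a carry-over line is compared with the previous non-ghost entry's last line, and identical neighbours are merged by extending the earlier entry's end time (the tool may simply drop them instead). `Entry` and `dedupe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    start: int        # start time in milliseconds
    end: int          # end time in milliseconds
    lines: List[str]  # caption text, one item per displayed line


def dedupe(entries: List[Entry], ghost_threshold_ms: int = 20) -> List[Entry]:
    """Apply the four removal rules in one pass (illustrative sketch)."""
    result: List[Entry] = []
    prev_last: Optional[str] = None  # last line of the previous non-ghost entry
    for entry in entries:
        # Ghost entries: shorter than the threshold, they only repeat old text.
        if entry.end - entry.start < ghost_threshold_ms:
            continue
        lines = [ln.strip() for ln in entry.lines if ln.strip()]
        original_last = lines[-1] if lines else prev_last
        # Carry-over lines: drop a first line that repeats the previous last line.
        if lines and lines[0] == prev_last:
            lines = lines[1:]
        prev_last = original_last
        # Empty entries: nothing left to display.
        if not lines:
            continue
        # Identical entries: same text back-to-back, so extend the earlier cue.
        if result and result[-1].lines == lines:
            result[-1].end = max(result[-1].end, entry.end)
            continue
        result.append(Entry(entry.start, entry.end, lines))
    return result


raw = [
    Entry(440, 2909, ["", "so welcome to the last Talk of the day"]),
    Entry(2909, 2919, ["so welcome to the last Talk of the day", " "]),
    Entry(2919, 6630, ["so welcome to the last Talk of the day",
                       "and OSS what do ABI and why should they"]),
]
for e in dedupe(raw):
    print(e.start, e.end, e.lines)
```

Applied to the three entries from the "Why?" example, the ghost is dropped and the third entry keeps only its new line, leaving one line of text per cue.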

Zero Dependencies

subtitle-deduplicator uses only the Python standard library, so there is nothing to install beyond the package itself. It requires Python 3.8 or newer.

License

MIT



Download files


Source Distribution

subtitle_deduplicator-1.0.0.tar.gz (9.9 kB)

Uploaded Source

Built Distribution


subtitle_deduplicator-1.0.0-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file subtitle_deduplicator-1.0.0.tar.gz.

File metadata

  • Download URL: subtitle_deduplicator-1.0.0.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtitle_deduplicator-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       bb400a5d6f98bdaf1adf85e3eb6f4268053ba3bb2615d12c4c59f859a67d11ac
MD5          afe4cb55af5fc5dd30ea64b0f4f1d68f
BLAKE2b-256  4a55e694f6b52e61b3b0bf7570962818871cd5e357217ab91e294efcb032dc3e
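To check a downloaded source archive against the SHA256 digest listed above, the standard-library `hashlib` module is enough. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the file is read in chunks so memory use stays bounded regardless of file size.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for subtitle_deduplicator-1.0.0.tar.gz.
EXPECTED = "bb400a5d6f98bdaf1adf85e3eb6f4268053ba3bb2615d12c4c59f859a67d11ac"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED) -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Example: verify(Path("subtitle_deduplicator-1.0.0.tar.gz"))
```

pip can enforce the same check at install time through `--hash=sha256:<digest>` entries in a requirements file, which switches it into hash-checking mode.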


Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.0.tar.gz:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtitle_deduplicator-1.0.0-py3-none-any.whl.


File hashes

Hashes for subtitle_deduplicator-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       80561722e25e3fd4eccb26dd184de92baeb71cde68e93bd27ae0828e4da434dd
MD5          6c84d3150b9f951a050881afe1c0f230
BLAKE2b-256  60c7c5c8e66dd8f45f34154fc396abd993715ef8908f83e4aa44104e69b53575


Provenance

The following attestation bundles were made for subtitle_deduplicator-1.0.0-py3-none-any.whl:

Publisher: publish.yml on fr0stb1rd/subtitle-deduplicator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
