A professional, robust library for converting WebVTT to SRT with advanced cleaning capabilities for AI-generated subtitles.

whisper-vtt2srt

A robust, production-grade library designed to convert WebVTT to SRT, turning messy AI transcripts into clean, usable subtitles.

A post-processing tool designed to clean the output from OpenAI Whisper, YouTube Auto-Captions, and other AI transcription services.
Perfect for TTS pipelines, video dubbing, and dataset preparation.

License: MIT | Python 3.10+ | Code Style: Black | PRs Welcome


🧠 The Problem with Raw AI Subtitles

AI tools like Whisper are incredible at speech recognition, but their raw VTT output is often chaotic. They frequently produce:

  • The "Karaoke Effect": Words accumulating screen-by-screen (e.g., "Hello", "Hello world", "Hello world!").
  • Micro-Glitches: Subtitle frames lasting milliseconds that are invisible to humans but break TTS/dubbing scripts.
  • Metadata Clutter: Tags like align:start, <c>, <b> or <i> that mess up text processing.

whisper-vtt2srt is the bridge between raw AI output and your production pipeline. It stabilizes and normalizes the text, making it safe for Text-to-Speech (TTS) generation, video players, and NLP tasks.


👀 See the Difference (Before vs After)

🚧 Raw Input — (Typical output from YouTube/Whisper - with "Karaoke Effect")

Notice the accumulated text, repetitive lines, and internal tagging.

WEBVTT
Kind: captions
Language: en

00:00:00.640 --> 00:00:03.110 align:start position:0%
 
APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c><00:00:02.399><c> They</c><00:00:02.639><c> power</c><00:00:02.960><c> your</c>

00:00:03.110 --> 00:00:03.120 align:start position:0%
APIs are everywhere. They power your
 

00:00:03.120 --> 00:00:05.430 align:start position:0%
APIs are everywhere. They power your
apps,<00:00:03.600><c> your</c><00:00:03.840><c> payment</c><00:00:04.160><c> systems,</c><00:00:04.880><c> your</c><00:00:05.120><c> cloud</c>

00:00:05.430 --> 00:00:05.440 align:start position:0%
apps, your payment systems, your cloud
 

00:00:05.440 --> 00:00:07.829 align:start position:0%
apps, your payment systems, your cloud
services,<00:00:06.560><c> pretty</c><00:00:06.879><c> much</c><00:00:07.120><c> every</c><00:00:07.440><c> piece</c><00:00:07.680><c> of</c>

00:00:07.829 --> 00:00:07.839 align:start position:0%
services, pretty much every piece of
 

00:00:07.839 --> 00:00:10.470 align:start position:0%
services, pretty much every piece of

Cleaned Output (Processed by whisper-vtt2srt)

Clean, stable, and ready for TTS input, YouTube, Netflix or standard players.

1
00:00:00,640 --> 00:00:05,430
APIs are everywhere. They power your

2
00:00:03,120 --> 00:00:07,829
apps, your payment systems, your cloud

3
00:00:05,440 --> 00:00:10,470
services, pretty much every piece of
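
The timing lines in the two samples above differ in two ways: SRT uses a comma before the milliseconds, and it carries no cue settings such as align:start. A minimal sketch of that conversion follows. The function names are illustrative and are not part of the whisper-vtt2srt API; the sketch also assumes three-digit milliseconds and allows the hours field to be omitted, as WebVTT permits.

```python
import re

# Hypothetical helper names; this is not the library's internal code.
# WebVTT allows the hours field to be omitted ("01:02.500").
_VTT_TIME = re.compile(r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$")


def vtt_time_to_srt(stamp: str) -> str:
    """Convert one WebVTT timestamp to the SRT form HH:MM:SS,mmm."""
    match = _VTT_TIME.match(stamp.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_timing_to_srt(line: str) -> str:
    """Convert a full timing line and drop any trailing cue settings."""
    start, rest = line.split("-->", 1)
    end = rest.split()[0]  # everything after the end time is cue settings
    return f"{vtt_time_to_srt(start)} --> {vtt_time_to_srt(end)}"


print(vtt_timing_to_srt("00:00:00.640 --> 00:00:03.110 align:start position:0%"))
```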

🚀 Key Features

  • 🛡️ Stabilization Strategy: Intelligently detects and merges accumulating text blocks ("Karaoke Effect"), preventing the rapid flashing of partial sentences. Essential for generating smooth audio in TTS pipelines, video dubbing, and subtitles.

  • 🎵 Sound Description Removal: Automatically filters out non-speech elements like [Music], [Applause], or [Laughter], ensuring your TTS voice doesn't try to read stage directions.

  • 🧹 Glitch Filtering: Automatically removes subtitle blocks with insignificant duration (< 50ms) that can cause audio generation errors or player flickering.

  • ✨ Smart Normalization: Strips VTT-specific metadata (align:start, position:0%), removes internal tags (<c>, <00:00:00>), and cleans up inconsistent whitespace, ensuring pure text output.

  • ⚡ Zero Dependencies: Built with pure Python standard library. Lightweight and easy to install in any environment (Linux, Windows, Docker).

  • 🔧 Configurable Strictness: Every cleaning step is optional. You enable exactly what your pipeline needs.
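
To make the normalization and sound-description features above concrete, here is a rough sketch of the kind of regex cleaning they describe. It is an approximation under stated assumptions, not the library's implementation: the patterns and function names are hypothetical, and the tag pattern would also strip any literal text that happens to sit between angle brackets.

```python
import re

# Hypothetical patterns; the library's actual rules may differ.
_TAG = re.compile(r"<[^>]+>")          # <c>, </c>, <b>, <i>, <00:00:01.280>
_SOUND = re.compile(r"\[[^\]]*\]")     # [Music], [Applause], [Laughter]
# The six cue settings defined by WebVTT, as they appear after the timestamps.
_CUE_SETTING = re.compile(r"\s+(?:align|position|line|size|vertical|region):\S+")


def clean_text(text: str) -> str:
    """Strip inline tags and bracketed sound descriptions, then tidy whitespace."""
    text = _TAG.sub("", text)
    text = _SOUND.sub("", text)
    return " ".join(text.split())


def clean_timing(line: str) -> str:
    """Remove cue settings such as align:start from a timing line."""
    return _CUE_SETTING.sub("", line).strip()


print(clean_text("APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c>"))
print(clean_timing("00:00:00.640 --> 00:00:03.110 align:start position:0%"))
```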

📦 Installation

pip install whisper-vtt2srt

📘 How to Use

💻 CLI Usage

Process files directly from the command line:

# Convert a Single File
whisper-vtt2srt input.vtt

# Batch Convert a Folder
whisper-vtt2srt ./my_dataset

# Recursive Conversion (subfolders included)
whisper-vtt2srt ./my_dataset --recursive

# Handle Legacy Encodings (e.g., Latin-1)
whisper-vtt2srt input_latin.vtt --encoding ISO-8859-1

# Keep the "karaoke" effect (disable deduplication)
whisper-vtt2srt input.vtt --no-karaoke
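
If the conversion is one step in a larger Python job, the same commands can be launched with subprocess. The sketch below only assembles flags shown in this README. The helper names are hypothetical, and it assumes the CLI is on PATH and exits with a non-zero status on failure, which this README does not state.

```python
import subprocess


def build_command(source, *, recursive=False, encoding=None, keep_karaoke=False):
    """Assemble a whisper-vtt2srt command line from the flags documented above."""
    cmd = ["whisper-vtt2srt", str(source)]
    if recursive:
        cmd.append("--recursive")
    if encoding is not None:
        cmd += ["--encoding", encoding]
    if keep_karaoke:
        cmd.append("--no-karaoke")  # disables the anti-karaoke filter
    return cmd


def run_conversion(source, **flags):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(source, **flags), check=True)


print(build_command("./my_dataset", recursive=True))
```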

Command Help

usage: whisper-vtt2srt [-h] [-r] [-e ENCODING] [--no-karaoke] [--keep-sound-descriptions]
               [--keep-glitches] [--keep-formatting] [--keep-metadata] [--merge-short-lines]
               [--max-line-length MAX_LINE_LENGTH]
               input [output]

Convert WebVTT to SRT with professional cleaning.

positional arguments:
  input                 Input VTT file or directory
  output                Output SRT file or directory (optional)

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively process directories
  -e ENCODING, --encoding ENCODING
                        Input file encoding (default: utf-8)
  --no-karaoke          Disable anti-karaoke filter (keep accumulating text)
  --keep-sound-descriptions
                        Keep sound descriptions like [Music] or [Applause]
  --keep-glitches       Keep short <50ms blocks
  --keep-formatting     Keep VTT tags (bold, italic, colors)
  --keep-metadata       Keep metadata tags (align:start, position:0%)
  --merge-short-lines   Aggressively merge short lines into single lines
  --max-line-length MAX_LINE_LENGTH
                        Maximum line length allowed when merging short lines (default: 42, like YouTube/Netflix)
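
The help text does not spell out how --merge-short-lines and --max-line-length interact. One plausible reading is a greedy join: a line is appended to the previous one as long as the result stays within the limit. The sketch below illustrates that reading only. The function is hypothetical, and the library's actual merging rule may differ.

```python
def merge_short_lines(lines: list[str], max_line_length: int = 42) -> list[str]:
    """Greedily join consecutive lines while the result fits the limit.

    This is an assumed interpretation of the option, not the library's code.
    """
    merged: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # +1 accounts for the space inserted between the joined lines
        if merged and len(merged[-1]) + 1 + len(line) <= max_line_length:
            merged[-1] = f"{merged[-1]} {line}"
        else:
            merged.append(line)
    return merged


print(merge_short_lines(["Hello", "world"]))
```

Under this reading, the two 36- and 38-character lines from the sample above would stay separate, because joining them would exceed 42 characters.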

🐍 Python API Usage

Easily integrate whisper-vtt2srt into your own Python pipelines. The library exports a high-level Pipeline class for full control.

Basic Conversion

from whisper_vtt2srt import Pipeline

# 1. Initialize
pipeline = Pipeline()

# 2. Read input
with open("subs.vtt", "r", encoding="utf-8") as f:
    raw_vtt_content = f.read()

# 3. Convert raw VTT content
srt_content = pipeline.convert(raw_vtt_content)

# 4. Use the clean SRT (e.g., send to TTS engine, save to file, render in player, etc.)
print(srt_content)
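
The same call extends naturally to a folder of files. The helper below is a sketch, not part of the library: convert_folder is a hypothetical name, and it accepts any string-to-string callable so it can be exercised without the package installed. In practice you would pass Pipeline().convert.

```python
from pathlib import Path
from typing import Callable


def convert_folder(
    folder: Path,
    convert: Callable[[str], str],
    encoding: str = "utf-8",
) -> list[Path]:
    """Write an .srt file next to every .vtt file in folder (non-recursive)."""
    written: list[Path] = []
    for vtt_path in sorted(folder.glob("*.vtt")):
        srt_path = vtt_path.with_suffix(".srt")
        raw_vtt = vtt_path.read_text(encoding=encoding)
        srt_path.write_text(convert(raw_vtt), encoding="utf-8")
        written.append(srt_path)
    return written
```

Usage, assuming the package is installed: convert_folder(Path("./my_dataset"), Pipeline().convert).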

Advanced Control

To toggle specific cleaning rules, pass a CleaningOptions object to the Pipeline constructor:

from whisper_vtt2srt import CleaningOptions, Pipeline

# Configure strictness
options = CleaningOptions(
    remove_pixelation=True,    # Fix Karaoke effect
    remove_sound_descriptions=True, # Remove [Music], [Applause]
    remove_glitches=True,      # Remove <50ms blocks
    simplify_formatting=True,  # Strip tags like <c> or <b> and fix whitespace
    remove_metadata=True,      # Clean VTT positioning
    merge_short_lines=False,   # Aggressively merge short lines
    max_line_length=42         # Max length constraint for merging
)

pipeline = Pipeline(options)

🧠 How It Works

  1. Parser (State Machine): Robustly reads messy VTT files, handling multi-line strings and irregular spacing.
  2. Deduplication Engine: Uses a sliding window to compare neighboring blocks. If a block is just a prefix of the next one (common in AI streams), it is merged or removed to stabilize the text.
  3. Filter Layer: Applies duration checks and regex cleaning to ensure the final output is compliant with the SubRip (SRT) standard.
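
The prefix rule and the duration check described above can be sketched in a few lines. This is an illustration, not the library's code: the Cue class and both function names are hypothetical, the order of the two steps is an assumption, and the sketch covers only the word-by-word accumulation case ("Hello", "Hello world", "Hello world!"), not the rolling two-line pattern in the YouTube sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start_ms: int
    end_ms: int
    text: str


def drop_glitches(cues: list[Cue], min_duration_ms: int = 50) -> list[Cue]:
    """Discard cues shorter than the threshold (50 ms, per the feature list)."""
    return [c for c in cues if c.end_ms - c.start_ms >= min_duration_ms]


def merge_prefixes(cues: list[Cue]) -> list[Cue]:
    """Fold a cue into its successor when its text is a prefix of the successor's."""
    merged: list[Cue] = []
    for cue in cues:
        if merged and cue.text.startswith(merged[-1].text):
            previous = merged.pop()
            # Keep the earlier start time and the longer, later text.
            cue = Cue(previous.start_ms, cue.end_ms, cue.text)
        merged.append(cue)
    return merged


cues = [
    Cue(0, 400, "Hello"),
    Cue(400, 410, "Hello"),          # 10 ms glitch
    Cue(410, 900, "Hello world"),
    Cue(900, 1500, "Hello world!"),
    Cue(1500, 2500, "Next line"),
]
for cue in merge_prefixes(drop_glitches(cues)):
    print(cue)
```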

📆 Changelog

Project history and updates are tracked in CHANGELOG.md.

🤝 Contributing

Contributions are welcome! We follow a strict SOLID architecture. See CONTRIBUTING.md for details.

📜 License

MIT License - see LICENSE.

📚 Reference

If you use this library in your research or project, please cite it as:

@software{whisper_vtt2srt,
  author = {Jorcelino Junior},
  title = {whisper-vtt2srt: A robust WebVTT to SRT converter for AI subtitles},
  year = {2026},
  url = {https://github.com/jorcelinojunior/whisper-vtt2srt}
}

Saved you time? Helped your project?

Support independent open-source development!
