A professional, robust library for converting WebVTT to SRT with advanced cleaning capabilities for AI-generated subtitles.

whisper-vtt2srt

A robust, production-grade library designed to convert WebVTT to SRT, turning messy AI transcripts into clean, usable subtitles.

A post-processing tool designed to clean the output from OpenAI Whisper, YouTube Auto-Captions, and other AI transcription services.
Perfect for TTS pipelines, video dubbing, and dataset preparation.

License: MIT | Python 3.10+ | Code style: Black | PRs welcome


🧠 The Problem with Raw AI Subtitles

AI tools like Whisper are incredible at speech recognition, but their raw VTT output is often chaotic. They frequently produce:

  • The "Karaoke Effect": Words accumulating screen-by-screen (e.g., "Hello", "Hello world", "Hello world!").
  • Micro-Glitches: Subtitle frames lasting milliseconds that are invisible to humans but break TTS/dubbing scripts.
  • Metadata Clutter: Tags like align:start, <c>, <b> or <i> that mess up text processing.

whisper-vtt2srt is the bridge between raw AI output and your production pipeline. It stabilizes and normalizes the text, making it safe for Text-to-Speech (TTS) generation, video players, and NLP tasks.


👀 See the Difference (Before vs After)

🚧 Raw Input (typical YouTube/Whisper output, showing the "Karaoke Effect")

Notice the accumulated text, repetitive lines, and internal tagging.

WEBVTT
Kind: captions
Language: en

00:00:00.640 --> 00:00:03.110 align:start position:0%
 
APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c><00:00:02.399><c> They</c><00:00:02.639><c> power</c><00:00:02.960><c> your</c>

00:00:03.110 --> 00:00:03.120 align:start position:0%
APIs are everywhere. They power your
 

00:00:03.120 --> 00:00:05.430 align:start position:0%
APIs are everywhere. They power your
apps,<00:00:03.600><c> your</c><00:00:03.840><c> payment</c><00:00:04.160><c> systems,</c><00:00:04.880><c> your</c><00:00:05.120><c> cloud</c>

00:00:05.430 --> 00:00:05.440 align:start position:0%
apps, your payment systems, your cloud
 

00:00:05.440 --> 00:00:07.829 align:start position:0%
apps, your payment systems, your cloud
services,<00:00:06.560><c> pretty</c><00:00:06.879><c> much</c><00:00:07.120><c> every</c><00:00:07.440><c> piece</c><00:00:07.680><c> of</c>

00:00:07.829 --> 00:00:07.839 align:start position:0%
services, pretty much every piece of
 

00:00:07.839 --> 00:00:10.470 align:start position:0%
services, pretty much every piece of

Cleaned Output (processed by whisper-vtt2srt)

Clean, stable, and ready for TTS input, YouTube, Netflix or standard players.

1
00:00:00,640 --> 00:00:05,430
APIs are everywhere. They power your

2
00:00:03,120 --> 00:00:07,829
apps, your payment systems, your cloud

3
00:00:05,440 --> 00:00:10,470
services, pretty much every piece of
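
One mechanical difference between the two listings above is the timing line: WebVTT writes milliseconds after a dot and may append cue settings such as align:start, while SRT uses a comma and carries no settings. The following is a minimal sketch of just that step. The function names are illustrative assumptions and are not part of the whisper-vtt2srt API.

```python
import re

# WebVTT allows the hour field to be omitted (MM:SS.mmm); SRT always writes HH:MM:SS,mmm.
_VTT_TIME = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_time_to_srt(stamp: str) -> str:
    """Convert a single WebVTT timestamp to SRT notation (hypothetical helper)."""
    match = _VTT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_timing_line_to_srt(line: str) -> str:
    """Convert 'start --> end [cue settings]' into an SRT timing line."""
    start, arrow, rest = line.partition("-->")
    fields = rest.split()
    if not arrow or not fields:
        raise ValueError(f"not a WebVTT timing line: {line!r}")
    end = fields[0]  # anything after the end time is cue settings and is dropped
    return f"{vtt_time_to_srt(start)} --> {vtt_time_to_srt(end)}"


print(vtt_timing_line_to_srt("00:00:00.640 --> 00:00:03.110 align:start position:0%"))
```

This only rewrites one timing line. Which cues survive and how their time ranges are combined is decided by the cleaning steps described in the rest of this document.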

🚀 Key Features

  • 🛡️ Stabilization Strategy: Intelligently detects and merges accumulating text blocks ("Karaoke Effect"), preventing the rapid flashing of partial sentences. Essential for generating smooth audio in TTS pipelines, video dubbing, and subtitles.

  • 🎵 Sound Description Removal: Automatically filters out non-speech elements like [Music], [Applause], or [Laughter], ensuring your TTS voice doesn't try to read stage directions.

  • 🧹 Glitch Filtering: Automatically removes subtitle blocks with insignificant duration (< 50 ms) that can cause audio generation errors or player flickering.

  • ✨ Smart Normalization: Strips VTT-specific metadata (align:start, position:0%), removes internal tags (<c>, <00:00:00>), and cleans up inconsistent whitespace, leaving plain text.

  • ⚡ Zero Dependencies: Built with the pure Python standard library. Lightweight and easy to install in any environment (Linux, Windows, Docker).

  • 🔧 Configurable Strictness: Every cleaning step is optional. You enable exactly what your pipeline needs.
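
To make the cleaning features above concrete, here is a rough sketch of what tag stripping, sound-description removal, and glitch filtering could look like with the standard library alone. The regular expressions and function names are assumptions chosen for illustration. They are not the patterns whisper-vtt2srt uses internally.

```python
import re

# Assumed patterns, for illustration only.
INLINE_TAG = re.compile(r"<[^>]+>")            # <c>, </c>, <b>, <00:00:01.280>, ...
SOUND_DESCRIPTION = re.compile(r"\[[^\]]*\]")  # [Music], [Applause], [Laughter], ...
MIN_DURATION_MS = 50                           # threshold quoted in the feature list


def clean_text(text: str) -> str:
    """Strip inline tags and bracketed sound descriptions, then collapse whitespace."""
    text = INLINE_TAG.sub("", text)
    text = SOUND_DESCRIPTION.sub("", text)
    return " ".join(text.split())


def is_glitch(start_ms: int, end_ms: int) -> bool:
    """True when a cue is shorter than the minimum visible duration."""
    return end_ms - start_ms < MIN_DURATION_MS


raw = "APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c>"
print(clean_text(raw))
print(is_glitch(3110, 3120))
```

A pattern as broad as INLINE_TAG would also remove literal angle-bracketed text in a transcript, so a production implementation may need to be more selective.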

📦 Installation

pip install whisper-vtt2srt

📘 How to Use

💻 CLI Usage

Process files directly from the command line:

# Convert a Single File
whisper-vtt2srt input.vtt

# Batch Convert a Folder
whisper-vtt2srt ./my_dataset

# Recursive Conversion (subfolders included)
whisper-vtt2srt ./my_dataset --recursive

# Handle Legacy Encodings (e.g., Latin-1)
whisper-vtt2srt input_latin.vtt --encoding ISO-8859-1

# Keep the "karaoke" effect (disable deduplication)
whisper-vtt2srt input.vtt --no-karaoke
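
If you prefer to drive batch conversion from Python rather than the shell, the folder and --recursive modes can be approximated with pathlib. This is a sketch under assumptions: convert_tree is a hypothetical helper, the converter is passed in as a plain function from VTT text to SRT text, and writing each .srt next to its .vtt is a choice made here, not a documented behavior of the CLI.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(
    root: Path,
    convert: Callable[[str], str],
    recursive: bool = False,
    encoding: str = "utf-8",
) -> List[Path]:
    """Write an .srt next to every .vtt under root and return the files written."""
    pattern = "**/*.vtt" if recursive else "*.vtt"
    written: List[Path] = []
    for vtt_path in sorted(root.glob(pattern)):
        srt_text = convert(vtt_path.read_text(encoding=encoding))
        srt_path = vtt_path.with_suffix(".srt")
        srt_path.write_text(srt_text, encoding="utf-8")
        written.append(srt_path)
    return written
```

With the library installed, passing Pipeline().convert as the convert argument (see the Python API section) should wire this helper to the real converter, for example convert_tree(Path("./my_dataset"), Pipeline().convert, recursive=True).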

Command Help

usage: whisper-vtt2srt [-h] [-r] [-e ENCODING] [--no-karaoke] [--keep-sound-descriptions]
               [--keep-glitches] [--keep-formatting] [--keep-metadata] [--merge-short-lines]
               [--max-line-length MAX_LINE_LENGTH]
               input [output]

Convert WebVTT to SRT with professional cleaning.

positional arguments:
  input                 Input VTT file or directory
  output                Output SRT file or directory (optional)

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively process directories
  -e ENCODING, --encoding ENCODING
                        Input file encoding (default: utf-8)
  --no-karaoke          Disable anti-karaoke filter (keep accumulating text)
  --keep-sound-descriptions
                        Keep sound descriptions like [Music] or [Applause]
  --keep-glitches       Keep short <50ms blocks
  --keep-formatting     Keep VTT tags (bold, italic, colors)
  --keep-metadata       Keep metadata tags (align:start, position:0%)
  --merge-short-lines   Aggressively merge short lines into single lines
  --max-line-length MAX_LINE_LENGTH
                        Maximum line length allowed when merging short lines (default: 42, like YouTube/Netflix)

🐍 Python API Usage

Easily integrate whisper-vtt2srt into your own Python pipelines. The library exports a high-level Pipeline class for full control.

Basic Conversion

from whisper_vtt2srt import Pipeline

# 1. Initialize
pipeline = Pipeline()

# 2. Read input
with open("subs.vtt", "r", encoding="utf-8") as f:
    raw_vtt_content = f.read()

# 3. Convert raw VTT content
srt_content = pipeline.convert(raw_vtt_content)

# 4. Use the clean SRT (e.g., send to TTS engine, save to file, render in player, etc.)
print(srt_content)
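
When the SRT string feeds a TTS engine or another script rather than a player, it usually has to be split back into timed segments. The helper below is a small sketch of that downstream step using only the standard library. iter_srt_cues is a hypothetical name and is not exported by whisper-vtt2srt.

```python
import re
from typing import Iterator, Tuple

_SRT_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def iter_srt_cues(srt: str) -> Iterator[Tuple[float, float, str]]:
    """Yield (start_seconds, end_seconds, text) for each block of an SRT string."""
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a complete block (index line + timing line)
        timing = _SRT_TIMING.match(lines[1].strip())
        if timing is None:
            continue  # skip anything that does not look like an SRT block
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, timing.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        yield start, end, " ".join(line.strip() for line in lines[2:])
```

Each yielded tuple can then be handed to whatever consumes the subtitles, for instance one synthesis request per segment.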

Advanced Control

To customize the cleaning rules, pass a CleaningOptions object to the Pipeline constructor. Each flag toggles one cleaning step.

from whisper_vtt2srt import CleaningOptions, Pipeline

# Configure strictness
options = CleaningOptions(
    remove_pixelation=True,    # Fix Karaoke effect
    remove_sound_descriptions=True, # Remove [Music], [Applause]
    remove_glitches=True,      # Remove <50ms blocks
    simplify_formatting=True,  # Strip tags like <c> or <b> and fix whitespace
    remove_metadata=True,      # Clean VTT positioning
    merge_short_lines=False,   # Aggressively merge short lines
    max_line_length=42         # Max length constraint for merging
)

pipeline = Pipeline(options)
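
This document does not spell out how merge_short_lines and max_line_length interact. One plausible reading is a greedy join: consecutive lines are concatenated as long as the result stays within the limit. The sketch below illustrates that interpretation only. The function is hypothetical and may differ from what the library actually does.

```python
from typing import List


def merge_short_lines(lines: List[str], max_line_length: int = 42) -> List[str]:
    """Greedily join consecutive lines while the joined line fits the limit (assumed behavior)."""
    merged: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # +1 accounts for the space inserted between the two parts
        if merged and len(merged[-1]) + 1 + len(line) <= max_line_length:
            merged[-1] = f"{merged[-1]} {line}"
        else:
            merged.append(line)
    return merged


print(merge_short_lines(["Hello", "world", "again"], max_line_length=11))
```

With the default limit of 42, the two lines from the example output earlier ("APIs are everywhere. They power your" and "apps, your payment systems, your cloud") would stay separate under this reading, because joined they run to 75 characters.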

🧠 How It Works

  1. Parser (State Machine): Robustly reads messy VTT files, handling multi-line strings and irregular spacing.
  2. Deduplication Engine: Uses a sliding window to compare neighboring blocks. If a block's text is just a prefix of the next block's text (common in AI streams), the block is merged into its successor or removed, which stabilizes the text.
  3. Filter Layer: Applies duration checks and regex cleaning so that the final output follows the SubRip (SRT) format.
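
The first two stages above can be sketched in a few dozen lines. This is an illustration under assumptions, not the library's implementation: Cue, parse_cues, and collapse_prefixes are hypothetical names, the parser handles only HH:MM:SS.mmm timestamps, and the deduplication covers only the plain prefix rule.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_cues(vtt: str) -> List[Cue]:
    """Two-state machine: outside a cue (current is None) or collecting its text."""
    cues: List[Cue] = []
    current: Optional[Cue] = None
    for raw in vtt.splitlines():
        timing = TIMING.match(raw.strip())
        if timing:
            parts = timing.groups()
            current = Cue(_to_ms(*parts[:4]), _to_ms(*parts[4:]), "")
            cues.append(current)
        elif raw == "":
            current = None  # a truly empty line closes the cue
        elif current is not None and raw.strip():
            # whitespace-only lines (seen in YouTube output) fall through and are ignored
            current.text = f"{current.text} {raw.strip()}".strip()
    return cues


def collapse_prefixes(cues: List[Cue]) -> List[Cue]:
    """Fold a cue into its successor when its text is a prefix of the successor's text."""
    result: List[Cue] = []
    for cue in cues:
        if result and result[-1].text and cue.text.startswith(result[-1].text):
            previous = result.pop()
            cue = Cue(previous.start_ms, cue.end_ms, cue.text)
        result.append(cue)
    return result
```

Fed the accumulating sequence "Hello", "Hello world", "Hello world!", collapse_prefixes keeps a single cue that spans from the first start time to the last end time. The rolling two-line captions in the example near the top of this document need per-line tracking on top of this rule, which the sketch leaves out.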

📆 Changelog

Project history and updates are tracked in CHANGELOG.md.

🤝 Contributing

Contributions are welcome! We follow a strict SOLID architecture. See CONTRIBUTING.md for details.

📜 License

MIT License - see LICENSE.

📚 Reference

If you use this library in your research or project, please cite it as:

@software{whisper_vtt2srt,
  author = {Jorcelino Junior},
  title = {whisper-vtt2srt: A robust WebVTT to SRT converter for AI subtitles},
  year = {2026},
  url = {https://github.com/jorcelinojunior/whisper-vtt2srt}
}

