A professional, robust library for converting WebVTT to SRT with advanced cleaning capabilities for AI-generated subtitles.

These details have been verified by PyPI

Maintainers

jorcelinojunior

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Multimedia :: Video
- Text Processing :: Filters

Project description

whisper-vtt2srt

A robust, production-grade library designed to convert WebVTT to SRT, turning messy AI transcripts into clean, usable subtitles.

A post-processing tool designed to clean the output from OpenAI Whisper, YouTube Auto-Captions, and other AI transcription services.
Perfect for TTS pipelines, video dubbing, and dataset preparation.

🧠 The Problem with Raw AI Subtitles

AI tools like Whisper are incredible at speech recognition, but their raw VTT output is often chaotic. They frequently produce:

- The "Karaoke Effect": Words accumulating screen-by-screen (e.g., "Hello", "Hello world", "Hello world!").
- Micro-Glitches: Subtitle frames lasting milliseconds that are invisible to humans but break TTS/dubbing scripts.
- Metadata Clutter: Tags like align:start, <c>, <b> or <i> that mess up text processing.

whisper-vtt2srt is the bridge between raw AI output and your production pipeline. It stabilizes and normalizes the text, making it safe for Text-to-Speech (TTS) generation, video players, and NLP tasks.
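
To make the "Metadata Clutter" problem concrete, here is a minimal sketch of how cue settings and inline tags could be stripped with regular expressions. The patterns and the function names `strip_cue_settings` and `strip_tags` are assumptions made for this illustration; they are not taken from the library's source.

```python
import re

# Illustrative patterns only; the library's own rules may differ.
INLINE_TIMESTAMP = re.compile(r"<\d{2}:\d{2}:\d{2}\.\d{3}>")
MARKUP_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})"
)


def strip_cue_settings(timing_line: str) -> str:
    """Drop trailing cue settings such as 'align:start position:0%'."""
    match = TIMING_LINE.match(timing_line)
    if match is None:
        return timing_line  # not a timing line, leave it untouched
    return f"{match.group(1)} --> {match.group(2)}"


def strip_tags(text: str) -> str:
    """Remove inline timestamps and markup tags, then collapse whitespace."""
    without_timestamps = INLINE_TIMESTAMP.sub("", text)
    without_tags = MARKUP_TAG.sub("", without_timestamps)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    print(strip_cue_settings("00:00:00.640 --> 00:00:03.110 align:start position:0%"))
    print(strip_tags("APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c>"))
```

Regex stripping like this is only safe after the cue structure has been parsed, since a literal "<" in spoken text would otherwise be at risk.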

📖 Table of Contents

- 👀 See the Difference
- 🚀 Key Features
- 📦 Installation
- 📘 How to Use
  - 💻 CLI Usage
    - Command Help
  - 🐍 Python API Usage
    - Basic Conversion
    - Advanced Control
- 🧠 How It Works
- 📆 Changelog
- 🤝 Contributing
- 📜 License
- 📚 Reference

👀 See the Difference (Before vs After)

🚧 Raw Input — (Typical output from YouTube/Whisper - with "Karaoke Effect")

Notice the accumulated text, repetitive lines, and internal tagging.

WEBVTT
Kind: captions
Language: en

00:00:00.640 --> 00:00:03.110 align:start position:0%
 
APIs<00:00:01.280><c> are</c><00:00:01.520><c> everywhere.</c><00:00:02.399><c> They</c><00:00:02.639><c> power</c><00:00:02.960><c> your</c>

00:00:03.110 --> 00:00:03.120 align:start position:0%
APIs are everywhere. They power your
 

00:00:03.120 --> 00:00:05.430 align:start position:0%
APIs are everywhere. They power your
apps,<00:00:03.600><c> your</c><00:00:03.840><c> payment</c><00:00:04.160><c> systems,</c><00:00:04.880><c> your</c><00:00:05.120><c> cloud</c>

00:00:05.430 --> 00:00:05.440 align:start position:0%
apps, your payment systems, your cloud
 

00:00:05.440 --> 00:00:07.829 align:start position:0%
apps, your payment systems, your cloud
services,<00:00:06.560><c> pretty</c><00:00:06.879><c> much</c><00:00:07.120><c> every</c><00:00:07.440><c> piece</c><00:00:07.680><c> of</c>

00:00:07.829 --> 00:00:07.839 align:start position:0%
services, pretty much every piece of
 

00:00:07.839 --> 00:00:10.470 align:start position:0%
services, pretty much every piece of

✨ Cleaned Output — (Processed by whisper-vtt2srt)

Clean, stable, and ready for TTS input, YouTube, Netflix or standard players.

1
00:00:00,640 --> 00:00:05,430
APIs are everywhere. They power your

2
00:00:03,120 --> 00:00:07,829
apps, your payment systems, your cloud

3
00:00:05,440 --> 00:00:10,470
services, pretty much every piece of
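
Note that the timestamps also change format: SRT separates milliseconds with a comma and always writes the hours field, while WebVTT uses a period and may omit the hours. A minimal sketch of that conversion follows; the function name `vtt_to_srt_timestamp` is hypothetical and not part of the library's documented API.

```python
import re

# WebVTT timestamps are [hh:]mm:ss.ttt; the hours field is optional.
VTT_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_to_srt_timestamp(stamp: str) -> str:
    """Convert one WebVTT timestamp to the SRT form hh:mm:ss,ttt."""
    match = VTT_TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    # SRT always carries hours and uses a comma before the milliseconds.
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


if __name__ == "__main__":
    print(vtt_to_srt_timestamp("00:00:00.640"))
    print(vtt_to_srt_timestamp("01:05.250"))
```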

🚀 Key Features

- 🛡️ Stabilization Strategy: Detects and merges accumulating text blocks (the "Karaoke Effect") so partial sentences do not flash on screen. Essential for smooth audio in TTS pipelines, video dubbing, and subtitles.
- 🎵 Sound Description Removal: Filters out non-speech elements like [Music], [Applause], or [Laughter], so your TTS voice doesn't try to read stage directions.
- 🧹 Glitch Filtering: Removes subtitle blocks shorter than 50 ms, which can cause audio generation errors or player flickering.
- ✨ Smart Normalization: Strips VTT-specific metadata (align:start, position:0%), removes internal tags (<c>, <00:00:00>), and cleans up inconsistent whitespace, leaving plain text.
- ⚡ Zero Dependencies: Built on the Python standard library alone. Lightweight and easy to install in any environment (Linux, Windows, Docker).
- 🔧 Configurable Strictness: Every cleaning step is optional. You enable exactly what your pipeline needs.
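
As a rough illustration of the sound description and glitch filters described above, the sketch below drops cues shorter than 50 ms and removes bracketed descriptions. The `Cue` dataclass, the `clean_cues` function, and the bracket pattern are assumptions made for this example. In particular, the pattern removes any bracketed text, which may be broader than what the library actually does.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """Hypothetical cue model: times in milliseconds plus the cue text."""
    start_ms: int
    end_ms: int
    text: str


# Assumption: a sound description is any text wrapped in square brackets.
SOUND_DESCRIPTION = re.compile(r"\[[^\[\]]*\]")
MIN_DURATION_MS = 50  # threshold quoted in the feature list


def clean_cues(cues: List[Cue]) -> List[Cue]:
    """Drop glitch cues and strip bracketed sound descriptions."""
    kept = []
    for cue in cues:
        if cue.end_ms - cue.start_ms < MIN_DURATION_MS:
            continue  # too short to be visible or useful for TTS
        text = " ".join(SOUND_DESCRIPTION.sub(" ", cue.text).split())
        if text:  # skip cues that contained only a sound description
            kept.append(Cue(cue.start_ms, cue.end_ms, text))
    return kept
```

The 10 ms blocks in the raw input above (for example 00:00:03.110 to 00:00:03.120) are the kind of cue the duration check would discard.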

📦 Installation

pip install whisper-vtt2srt

📘 How to Use

💻 CLI Usage

Process files directly from the command line:

# Convert a Single File
whisper-vtt2srt input.vtt

# Batch Convert a Folder
whisper-vtt2srt ./my_dataset

# Recursive Conversion (subfolders included)
whisper-vtt2srt ./my_dataset --recursive

# Handle Legacy Encodings (e.g., Latin-1)
whisper-vtt2srt input_latin.vtt --encoding ISO-8859-1

# Keep the "karaoke" effect (disable deduplication)
whisper-vtt2srt input.vtt --no-karaoke

Command Help

usage: whisper-vtt2srt [-h] [-r] [-e ENCODING] [--no-karaoke] [--keep-glitches] [--keep-formatting]
               [--keep-metadata] [--merge-short-lines]
               input [output]

Convert WebVTT to SRT with professional cleaning.

positional arguments:
  input                 Input VTT file or directory
  output                Output SRT file or directory (optional)

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively process directories
  -e ENCODING, --encoding ENCODING
                        Input file encoding (default: utf-8)
  --no-karaoke          Disable anti-karaoke filter (keep accumulating text)
  --keep-sound-descriptions
                        Keep sound descriptions like [Music] or [Applause]
  --keep-glitches       Keep short <50ms blocks
  --keep-formatting     Keep VTT tags (bold, italic, colors)
  --keep-metadata       Keep metadata tags (align:start, position:0%)
  --merge-short-lines   Aggressively merge short lines into single lines
  --max-line-length MAX_LINE_LENGTH
                        Maximum line length allowed when merging short lines (default: 42, like YouTube/Netflix)
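
The help text does not spell out how --merge-short-lines and --max-line-length interact. One plausible reading, shown here purely as an assumption, is a greedy join: consecutive lines are combined while the result stays within the maximum length. The function `merge_short_lines` below is a hypothetical sketch of that reading, not the library's implementation.

```python
from typing import List


def merge_short_lines(lines: List[str], max_line_length: int = 42) -> List[str]:
    """Greedily join consecutive lines while the result fits the limit."""
    merged: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # +1 accounts for the space inserted between the joined lines.
        if merged and len(merged[-1]) + 1 + len(line) <= max_line_length:
            merged[-1] = f"{merged[-1]} {line}"
        else:
            merged.append(line)
    return merged


if __name__ == "__main__":
    print(merge_short_lines(["Hello", "world", "this is fine"]))
```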

🐍 Python API Usage

Easily integrate whisper-vtt2srt into your own Python pipelines. The library exports a high-level Pipeline class for full control.

Basic Conversion

from whisper_vtt2srt import Pipeline

# 1. Initialize
pipeline = Pipeline()

# 2. Read input
with open("subs.vtt", "r", encoding="utf-8") as f:
    raw_vtt_content = f.read()

# 3. Convert raw VTT content
srt_content = pipeline.convert(raw_vtt_content)

# 4. Use the clean SRT (e.g., send to TTS engine, save to file, render in player, etc.)
print(srt_content)
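
To write the result next to the input file instead of printing it, a small helper along these lines may be convenient. `convert_file` is a hypothetical name, and the converter is passed in as a plain callable so the sketch does not depend on anything beyond the `convert` method shown above.

```python
from pathlib import Path
from typing import Callable, Union


def convert_file(
    src: Union[str, Path],
    convert: Callable[[str], str],
    encoding: str = "utf-8",
) -> Path:
    """Read a .vtt file, convert its text, and write a sibling .srt file."""
    src_path = Path(src)
    dst_path = src_path.with_suffix(".srt")
    raw_vtt_content = src_path.read_text(encoding=encoding)
    # Output is always written as UTF-8, whatever the input encoding was.
    dst_path.write_text(convert(raw_vtt_content), encoding="utf-8")
    return dst_path
```

With the library installed, a call would look like `convert_file("subs.vtt", Pipeline().convert)`.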

Advanced Control

To toggle specific cleaning rules, pass a CleaningOptions object to the Pipeline constructor:

from whisper_vtt2srt import CleaningOptions, Pipeline

# Configure strictness
options = CleaningOptions(
    remove_pixelation=True,    # Fix Karaoke effect
    remove_sound_descriptions=True, # Remove [Music], [Applause]
    remove_glitches=True,      # Remove <50ms blocks
    simplify_formatting=True,  # Strip tags like <c> or <b> and fix whitespace
    remove_metadata=True,      # Clean VTT positioning
    merge_short_lines=False,   # Aggressively merge short lines
    max_line_length=42         # Max length constraint for merging
)

pipeline = Pipeline(options)

🧠 How It Works

- Parser (State Machine): Robustly reads messy VTT files, handling multi-line cues and irregular spacing.
- Deduplication Engine: Uses a sliding window to compare neighboring blocks. If a block is just a prefix of the next one (common in AI streams), it is merged or removed to stabilize the text.
- Filter Layer: Applies duration checks and regex cleaning so the final output complies with the SubRip (SRT) format.
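
The prefix rule in the deduplication step can be sketched as follows. This is a simplified illustration that compares only adjacent cues, not the library's sliding-window implementation; the `Cue` dataclass and `collapse_prefix_cues` are hypothetical names. When a cue's text is a prefix of the next cue's text, the two are merged into one cue spanning both time ranges.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """Hypothetical cue model: times in milliseconds plus the cue text."""
    start_ms: int
    end_ms: int
    text: str


def collapse_prefix_cues(cues: List[Cue]) -> List[Cue]:
    """Merge each cue into its successor when it is a prefix of it."""
    result: List[Cue] = []
    for cue in cues:
        previous = result[-1] if result else None
        if previous and previous.text and cue.text.startswith(previous.text):
            # The earlier cue is a partial version of this one: keep the
            # fuller text and extend it back to the earlier start time.
            result.pop()
            cue = Cue(previous.start_ms, cue.end_ms, cue.text)
        result.append(cue)
    return result


if __name__ == "__main__":
    karaoke = [
        Cue(0, 500, "Hello"),
        Cue(500, 1000, "Hello world"),
        Cue(1000, 2000, "Hello world!"),
    ]
    print(collapse_prefix_cues(karaoke))
```

A plain `startswith` check ignores word boundaries, so a production version would need a stricter comparison, for example on whole words.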

📆 Changelog

Project history and updates are tracked in CHANGELOG.md.

🤝 Contributing

Contributions are welcome! We follow a strict SOLID architecture. See CONTRIBUTING.md for details.

📜 License

MIT License - see LICENSE.

📚 Reference

If you use this library in your research or project, please cite it as:

@software{whisper_vtt2srt,
  author = {Jorcelino Junior},
  title = {whisper-vtt2srt: A robust WebVTT to SRT converter for AI subtitles},
  year = {2026},
  url = {https://github.com/jorcelinojunior/whisper-vtt2srt}
}

Saved you time? Helped your project?

Support independent open-source development!

Release history

- 0.1.2 (Jan 13, 2026)
- 0.1.1 (Jan 12, 2026), this version
- 0.1.0 (Jan 12, 2026)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whisper_vtt2srt-0.1.1.tar.gz (22.6 kB)

Uploaded Jan 12, 2026 Source

Built Distribution

whisper_vtt2srt-0.1.1-py3-none-any.whl (18.4 kB)

Uploaded Jan 12, 2026 Python 3

File details

Details for the file whisper_vtt2srt-0.1.1.tar.gz.

File metadata

Download URL: whisper_vtt2srt-0.1.1.tar.gz
Upload date: Jan 12, 2026
Size: 22.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whisper_vtt2srt-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`62fd46093a63a61265e54d55790a6ddc3f137085c463b0ac571fd095b5fe5ce7`
MD5	`bac2be0eeb7fd936eb3876b355d26751`
BLAKE2b-256	`38a6abdbc958b1803020fbd13292cfb0c59a3ef56dee172c1c58bffc569232c0`

See more details on using hashes here.
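
A downloaded file can be checked against the SHA256 digest listed above with the standard library's hashlib module. The helper name `sha256_matches` is hypothetical, and the path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_matches(
    path: Union[str, Path], expected_hex: str, chunk_size: int = 65536
) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches("whisper_vtt2srt-0.1.1.tar.gz", "62fd46093a63a61265e54d55790a6ddc3f137085c463b0ac571fd095b5fe5ce7")` should return True for an intact download of the source distribution.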

Provenance

The following attestation bundles were made for whisper_vtt2srt-0.1.1.tar.gz:

Publisher: release.yml on jorcelinojunior/whisper-vtt2srt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_vtt2srt-0.1.1.tar.gz
- Subject digest: 62fd46093a63a61265e54d55790a6ddc3f137085c463b0ac571fd095b5fe5ce7
- Sigstore transparency entry: 814717363
- Sigstore integration time: Jan 12, 2026
Source repository:
- Permalink: jorcelinojunior/whisper-vtt2srt@8c77106f40a89f75731700d7b4fb6bfd4ed2d0b3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jorcelinojunior
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c77106f40a89f75731700d7b4fb6bfd4ed2d0b3
- Trigger Event: push

File details

Details for the file whisper_vtt2srt-0.1.1-py3-none-any.whl.

File metadata

Download URL: whisper_vtt2srt-0.1.1-py3-none-any.whl
Upload date: Jan 12, 2026
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whisper_vtt2srt-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f008984a9956bf1ed7fd666da5815ebe5de9c528c1cfd823088d613fae6c4142`
MD5	`a0dcf4badf07f2a78cd7f55671e5b6df`
BLAKE2b-256	`1b66d4d99950acef9bd95b60eeffe77c5fc6cfbbab189e6abf305477474dad35`

See more details on using hashes here.

Provenance

The following attestation bundles were made for whisper_vtt2srt-0.1.1-py3-none-any.whl:

Publisher: release.yml on jorcelinojunior/whisper-vtt2srt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_vtt2srt-0.1.1-py3-none-any.whl
- Subject digest: f008984a9956bf1ed7fd666da5815ebe5de9c528c1cfd823088d613fae6c4142
- Sigstore transparency entry: 814717366
- Sigstore integration time: Jan 12, 2026
Source repository:
- Permalink: jorcelinojunior/whisper-vtt2srt@8c77106f40a89f75731700d7b4fb6bfd4ed2d0b3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jorcelinojunior
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c77106f40a89f75731700d7b4fb6bfd4ed2d0b3
- Trigger Event: push
