vtt-synced-voice

Generate VTT subtitles with timestamps precisely snapped to voice onset using WhisperX forced alignment.

A Japanese version of the README is also available.

Features

Word-level timestamp alignment via WhisperX (Whisper + wav2vec2 forced alignment)
FCP-style peak normalization for recording-level-independent silence detection
Bidirectional onset detection: backward scan when CTC start is inside voice, forward scan when in silence
Guaranteed silence gap between cues (100ms minimum)
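
The bidirectional scan in the feature list can be illustrated with a small sketch. This is not the package's find_onset() implementation: the function name find_onset_frame, the per-frame RMS input, and the frame-index return value are assumptions made purely for illustration. The idea is that when the aligner's start time already falls inside voice, the scan walks backward to where the voice began, and when it falls in silence, the scan walks forward to the first voiced frame.

```python
def find_onset_frame(rms, start, threshold=0.001):
    """Return the index of the voiced frame where speech begins near `start`.

    Hypothetical helper for illustration, not the package's find_onset().
    `rms` is a list of per-frame RMS values, `start` a frame index.
    """
    if not rms:
        raise ValueError("rms must contain at least one frame")
    i = min(max(start, 0), len(rms) - 1)  # clamp the start index into range

    if rms[i] > threshold:
        # Start is inside voice: walk backward while the previous frame is voiced.
        while i > 0 and rms[i - 1] > threshold:
            i -= 1
        return i

    # Start is in silence: walk forward to the first voiced frame.
    j = i
    while j < len(rms) and rms[j] <= threshold:
        j += 1
    # If no voice follows, fall back to the (clamped) start index.
    return j if j < len(rms) else i


frames = [0.0, 0.0, 0.5, 0.6, 0.7, 0.0, 0.0]
print(find_onset_frame(frames, 4))  # backward scan -> 2
print(find_onset_frame(frames, 0))  # forward scan -> 2
print(find_onset_frame(frames, 5))  # no voice ahead -> 5
```

Falling back to the original start when no voice is found is one possible choice; the real package may handle that case differently.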

Installation

1. Install ffmpeg

macOS

brew install ffmpeg

Windows

winget install ffmpeg

Linux (Debian / Ubuntu)

sudo apt install ffmpeg
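
To confirm that ffmpeg is reachable from Python before running a transcription, a quick check along these lines may help. The helper name ffmpeg_version is hypothetical and not part of vtt-synced-voice; it relies only on the standard library and on ffmpeg's -version flag.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = ffmpeg_version()
if version is None:
    print("ffmpeg was not found on PATH; install it before continuing")
else:
    print(version)
```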

2. Install PyTorch

WhisperX runs on PyTorch. GPU (CUDA) is significantly faster than CPU for transcription (roughly 10–20x). Install the build that matches your environment.

macOS — CPU only (no CUDA support on macOS)

pip install torch torchaudio

Windows — CUDA 12.8 (recommended for RTX 30xx / 40xx and newer)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

Windows — CUDA 11.8 (for older GPUs such as GTX 10xx / 20xx)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

Windows — CPU only (no NVIDIA GPU)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

Linux — CUDA 12.8 (recommended for RTX 30xx / 40xx and newer)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

Linux — CUDA 11.8 (for older GPUs such as GTX 10xx / 20xx)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

Linux — CPU only (no NVIDIA GPU)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

To check which CUDA version your driver supports, run nvidia-smi (Windows/Linux). If you don't have an NVIDIA GPU, use the CPU build. For the full list of builds, see the PyTorch installation guide.
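
Once PyTorch is installed, the value to pass as device can be chosen at runtime. The sketch below uses torch.cuda.is_available(), which is part of PyTorch; the helper name pick_device is an assumption for illustration and is not provided by vtt-synced-voice.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build is most likely the CPU-only one.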

3. Install vtt-synced-voice

pip install vtt-synced-voice

Usage

from vtt_synced_voice import transcribe

transcribe(
    audio_file="sample.m4a",
    output_file="output.vtt",
    language="ja",           # "ja" / "en" / etc.
    model="large-v2",        # "small" / "medium" / "large-v2"
    device="cpu",            # "cpu" / "cuda"
    margin_before=0.066,     # seconds to shift start earlier after onset detection
    margin_after=0.0,        # seconds to extend end
    silence_threshold=0.001, # RMS threshold after peak normalization
    verbose=True,
)
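
For more than one file, the call above can be wrapped in a small loop. The helper below is a sketch, not part of the package: batch_transcribe and its suffix parameter are hypothetical names, and it assumes only the audio_file and output_file keyword arguments shown above. The transcription function is passed in as an argument so the helper itself has no dependency on WhisperX.

```python
from pathlib import Path


def batch_transcribe(input_dir, output_dir, transcribe_fn, suffix=".m4a", **options):
    """Call transcribe_fn once per audio file and return the output paths.

    Hypothetical helper: extra keyword options are forwarded unchanged.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(Path(input_dir).glob(f"*{suffix}")):
        target = out_dir / (audio.stem + ".vtt")
        transcribe_fn(audio_file=str(audio), output_file=str(target), **options)
        written.append(target)
    return written
```

With the package installed, it could be called as batch_transcribe("audio_input", "vtt_output", transcribe, language="ja", device="cpu"), where transcribe is imported from vtt_synced_voice.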

`silence_threshold`

After peak normalization, complete silence has an RMS of approximately 0.0 and voiced speech falls roughly between 0.05 and 1.0. The default of 0.001 works well for clean recordings with no background noise. Set verbose=True to inspect the onset detection results, and adjust the threshold if needed.
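
The reason a single threshold can work regardless of recording level can be shown with a short sketch. It is written in plain Python for portability, whereas the package itself depends on numpy. The function names peak_normalize and frame_rms and the frame length of 160 samples are assumptions for illustration, not the package's internals.

```python
import math


def peak_normalize(samples):
    """Scale samples so the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-zero input: nothing to scale
    return [s / peak for s in samples]


def frame_rms(samples, frame_len):
    """Return the RMS of each consecutive frame of `frame_len` samples."""
    values = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return values


# 160 samples of silence followed by 160 samples of a quiet sine tone.
quiet = [0.0] * 160 + [0.02 * math.sin(2 * math.pi * i / 16) for i in range(160)]
loud = [s * 10 for s in quiet]  # the same signal recorded ten times louder

threshold = 0.001
quiet_rms = frame_rms(peak_normalize(quiet), 160)
loud_rms = frame_rms(peak_normalize(loud), 160)
quiet_voiced = [r > threshold for r in quiet_rms]
loud_voiced = [r > threshold for r in loud_rms]
print(quiet_voiced, loud_voiced)  # both are [False, True]
```

Because both recordings are scaled to the same peak before the RMS is measured, the quiet and loud versions yield the same voiced/silent decision per frame.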

Requirements

Python 3.10+
ffmpeg (system)
numpy
whisperx

Development

Setup

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Windows

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Running tests

python -m pytest tests/ -v

The suite contains 33 tests covering the vtt_io, onset, and cue_builder modules.

Manual test with a local audio file

Place an audio file in audio_input/, then run:

python test_run.py

Output is written to vtt_output/test_package.vtt.

Project structure

src/vtt_synced_voice/
├── __init__.py       # exports transcribe()
├── transcriber.py    # transcribe() entry point, ffmpeg conversion, WhisperX calls
├── onset.py          # find_onset() — bidirectional voice onset detection
├── cue_builder.py    # build_cues_from_segments() — WhisperX result → VttCue
└── vtt_io.py         # VttCue dataclass, read_vtt(), write_vtt(), format_timestamp()
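
To make the roles of the cue and I/O modules more concrete, here is a minimal sketch of writing cues as WebVTT while keeping a 100 ms gap between them. None of these names come from the package: SketchCue, to_vtt_timestamp, enforce_min_gap, and render_vtt are hypothetical stand-ins, and trimming the end of the earlier cue is only one possible way to guarantee the gap.

```python
from dataclasses import dataclass

MIN_GAP = 0.100  # seconds of silence to keep between consecutive cues


@dataclass
class SketchCue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def enforce_min_gap(cues, min_gap=MIN_GAP):
    """Return copies of the cues, trimming ends so each gap is at least min_gap."""
    fixed = []
    for cue in cues:
        if fixed and cue.start - fixed[-1].end < min_gap:
            prev = fixed[-1]  # a copy, so the caller's cue is left untouched
            prev.end = max(prev.start, cue.start - min_gap)
        fixed.append(SketchCue(cue.start, cue.end, cue.text))
    return fixed


def render_vtt(cues):
    """Render cues as the text of a WebVTT file."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{to_vtt_timestamp(cue.start)} --> {to_vtt_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")
    return "\n".join(lines)


cues = [SketchCue(0.0, 1.0, "Hello"), SketchCue(1.05, 2.0, "world")]
print(render_vtt(enforce_min_gap(cues)))
```

In this example the first cue originally ends 50 ms before the second one starts, so its end is pulled back to 0.95 s.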

Project details

This version: 0.1.1 (released Apr 15, 2026)

Download files

Source Distribution

vtt_synced_voice-0.1.1.tar.gz (606.3 kB)

Uploaded Apr 15, 2026 Source

Built Distribution

vtt_synced_voice-0.1.1-py3-none-any.whl (10.5 kB)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file vtt_synced_voice-0.1.1.tar.gz.

File metadata

Download URL: vtt_synced_voice-0.1.1.tar.gz
Upload date: Apr 15, 2026
Size: 606.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for vtt_synced_voice-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`fded1b76b9044af1b38f2bcd385ca1fb10d919a93a34976bef1d5f15ff5cdd31`
MD5	`6be2e01cc7b79466556b961567241543`
BLAKE2b-256	`a915956787d71f67780925054c9e13460256b97a4a1bd2afdeec0412037cedf0`

File details

Details for the file vtt_synced_voice-0.1.1-py3-none-any.whl.

File metadata

Download URL: vtt_synced_voice-0.1.1-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 10.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for vtt_synced_voice-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`42b196037ef26745f9b1cbfe212e289e5a092cf3afdc11ba6fc57a4d7bb3369b`
MD5	`3eb29de25e53e2329e853bbf72e3f7a6`
BLAKE2b-256	`261582aaff19e58c61a6d1b61e984f8a2503c7a354365b64fc4b11a48f57ddc4`
