Burn precisely timed captions into video using forced alignment.

subcap

A one-command captioning pipeline built around WhisperX's forced alignment. Give it a video and a transcript — it handles audio extraction, alignment, subtitle segmentation, styling, and burn-in encoding.

Why this exists

WhisperX solves the hard problem: using wav2vec2 to map each word of a known transcript to its exact position in audio. But WhisperX outputs raw word timestamps — turning those into readable, styled, burned-in captions is still a non-trivial amount of glue code per video.

subcap is that glue code, packaged as a CLI:

  • Subtitle segmentation — groups aligned words into readable chunks with sentence-boundary breaks, line wrapping, duration caps, and proper gaps between cues
  • Styled ASS generation — four presets (modern, outline, minimal, bold), auto-adapted for landscape vs portrait video
  • ffmpeg burn-in — re-encodes to H.264, H.265, or ProRes with a single --quality flag
  • SRT bypass — if you already have timed subtitles, it skips alignment and goes straight to styling + burn-in

Without subcap, getting from video + transcript to video with burned-in captions requires chaining WhisperX, writing your own segmentation logic, hand-crafting ASS files, and orchestrating ffmpeg. With subcap, it is one command: subcap video.mov transcript.txt.

Install

pip install subcap

Requires Python 3.10–3.12 and ffmpeg with libass support.
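
A quick way to confirm the libass requirement before installing is to inspect ffmpeg's build configuration. The sketch below is illustrative only and is not part of subcap; the helper names has_libass and ffmpeg_build_info are made up for this example, and it assumes that a libass-enabled build lists --enable-libass in the output of ffmpeg -version.

```python
import shutil
import subprocess


def has_libass(build_info: str) -> bool:
    """Return True if an ffmpeg build-configuration string mentions libass."""
    return "--enable-libass" in build_info


def ffmpeg_build_info() -> str:
    """Return the text printed by `ffmpeg -version`, or '' if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        return ""
    result = subprocess.run(
        ["ffmpeg", "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling has_libass(ffmpeg_build_info()) returns False both when ffmpeg is missing and when it was built without libass, so either case can be reported before the first run.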

On first run, subcap downloads the wav2vec2 alignment model (~360 MB).

Usage

# Align a transcript and burn captions in
subcap video.mov transcript.txt -o output.mp4

# Use an existing SRT file (skips alignment)
subcap video.mov subtitles.srt -o output.mp4

# Choose a style
subcap video.mov transcript.txt --style outline

# ProRes output for editing
subcap video.mov transcript.txt --quality studio -o output.mov

# Portrait/vertical video (auto-detected)
subcap shorts.mp4 transcript.txt -o shorts_captioned.mp4

Options

subcap <video> <transcript> [options]

  -o, --output          Output path (default: <input>_captioned.mp4)
  --style               modern | outline | minimal | bold (default: modern)
  --quality             standard | high | studio (default: standard)
  --max-lines           Max lines per subtitle (default: 2)
  --max-chars           Max characters per line (default: auto)
  --line-spacing        Gap between lines in px (default: auto)
  --position            bottom | center | top (default: bottom)

Styles

  Preset    Look
  modern    White bold text, semi-transparent dark box
  outline   White text with black outline
  minimal   Lighter weight, subtle shadow
  bold      Large text, opaque dark box
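
To make the presets concrete, here is a rough sketch of how a preset might translate into a Style line in an ASS file's [V4+ Styles] section. Every number below (font size fractions, outline widths, margins, colours) is an assumption chosen for illustration, not a value taken from subcap; the names Preset, PRESETS, is_portrait, and style_line are hypothetical. The field order and the &HAABBGGRR colour notation follow the ASS format itself.

```python
from dataclasses import dataclass

# Field order of a "Style:" line in the [V4+ Styles] section of an ASS file.
ASS_STYLE_FORMAT = (
    "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
    "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
    "ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
    "Alignment, MarginL, MarginR, MarginV, Encoding"
)


@dataclass(frozen=True)
class Preset:
    bold: bool
    border_style: int  # 1 = outline + shadow, 3 = opaque box
    outline: float     # outline width (box padding when border_style is 3)
    shadow: float      # shadow depth
    back_colour: str   # &HAABBGGRR, where AA=00 is opaque and AA=FF is transparent
    scale: float       # font size multiplier


# Hypothetical values that mimic the four looks described above.
PRESETS = {
    "modern": Preset(True, 3, 1.0, 0.0, "&H80000000", 1.0),
    "outline": Preset(False, 1, 2.0, 0.0, "&H00000000", 1.0),
    "minimal": Preset(False, 1, 0.0, 1.0, "&H80000000", 1.0),
    "bold": Preset(True, 3, 1.0, 0.0, "&H00000000", 1.3),
}


def is_portrait(width: int, height: int) -> bool:
    """Treat any video taller than it is wide as portrait."""
    return height > width


def style_line(preset_name: str, width: int, height: int, font: str = "Arial") -> str:
    """Build one ASS Style line; sizes assume PlayResY equals the video height."""
    p = PRESETS[preset_name]
    # Assumed heuristic: use a smaller fraction of the height for tall videos.
    fraction = 0.030 if is_portrait(width, height) else 0.045
    size = round(height * fraction * p.scale)
    fields = [
        "Default", font, size,
        "&H00FFFFFF",      # PrimaryColour: opaque white text
        "&H000000FF",      # SecondaryColour: unused outside karaoke effects
        p.back_colour,     # OutlineColour
        p.back_colour,     # BackColour
        -1 if p.bold else 0,
        0, 0, 0,           # Italic, Underline, StrikeOut
        100, 100, 0, 0,    # ScaleX, ScaleY, Spacing, Angle
        p.border_style, f"{p.outline:g}", f"{p.shadow:g}",
        2,                 # Alignment: bottom centre (numpad layout)
        40, 40, 60,        # MarginL, MarginR, MarginV
        1,                 # Encoding
    ]
    return "Style: " + ",".join(str(f) for f in fields)
```

For example, style_line("outline", 1080, 1920) produces a line for a portrait video. Setting OutlineColour and BackColour to the same value sidesteps differences in which of the two a renderer uses for the box.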

Quality

  Preset    Codec       Use case
  standard  H.264       Sharing, uploading
  high      H.265       Smaller files
  studio    ProRes 422  Editing, broadcast
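
For readers curious what a quality preset could mean at the ffmpeg level, the sketch below builds a burn-in command for each preset. The encoder settings (CRF values, pixel format, audio codec) are assumptions for illustration; the flags subcap actually passes are not documented here, and VIDEO_ARGS and burn_in_command are hypothetical names. The encoders libx264, libx265, and prores_ks and the ass video filter are standard ffmpeg components.

```python
# Hypothetical mapping from quality preset to ffmpeg video encoder arguments.
VIDEO_ARGS = {
    "standard": ["-c:v", "libx264", "-crf", "20", "-pix_fmt", "yuv420p"],
    "high": ["-c:v", "libx265", "-crf", "24", "-tag:v", "hvc1"],
    # prores_ks profile 2 is standard ProRes 422.
    "studio": ["-c:v", "prores_ks", "-profile:v", "2"],
}


def burn_in_command(video: str, ass_file: str, output: str,
                    quality: str = "standard") -> list[str]:
    """Return an ffmpeg argument list that renders ass_file onto video.

    The ass filter needs an ffmpeg build with libass. Paths containing
    characters such as ':' or ',' would need filter-level escaping, which
    this sketch does not attempt.
    """
    if quality not in VIDEO_ARGS:
        raise ValueError(f"unknown quality preset: {quality}")
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-vf", f"ass={ass_file}",
        *VIDEO_ARGS[quality],
        "-c:a", "aac",
        output,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a shell string avoids quoting problems with spaces in file names.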

Pipeline

  1. Extract audio — mono 16 kHz WAV via ffmpeg
  2. Force-align (WhisperX / wav2vec2) — map each word of the transcript to its exact position in the audio
  3. Segment (subcap) — group words into readable subtitle cues, break at sentence boundaries, wrap long lines, enforce min/max display duration, insert gaps
  4. Style (subcap) — generate ASS with the selected preset, adapted to aspect ratio
  5. Burn in (ffmpeg) — re-encode with hardcoded subtitles
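
Step 1 in the list above can be sketched as a single ffmpeg call. This is a minimal illustration of extracting mono 16 kHz WAV audio, not subcap's own code; extract_audio_command is a hypothetical helper name, and the 16-bit PCM sample format is an assumption.

```python
def extract_audio_command(video: str, wav_out: str) -> list[str]:
    """Return an ffmpeg argument list that writes mono 16 kHz PCM WAV audio."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to one channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM (assumed sample format)
        wav_out,
    ]
```

Running it with subprocess.run(extract_audio_command("video.mov", "audio.wav"), check=True) would leave audio.wav ready for the alignment step.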

Steps 1, 2, and 5 wrap existing tools; steps 3 and 4 are what subcap adds. Because the transcript text is fixed and only the timing is being solved, alignment tends to stay precise for fast speech, accents, or noisy audio, conditions that often degrade speech-to-text approaches.
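
The segmentation in step 3 can be illustrated with a small greedy grouping pass over word timestamps. This is a simplified sketch under stated assumptions and not subcap's actual algorithm: the Word and Cue types, the segment function, and all default thresholds (42 characters, 6 s maximum, 1 s minimum, 80 ms gap) are hypothetical.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Cue:
    text: str     # wrapped lines joined with "\n"
    start: float
    end: float


SENTENCE_END = (".", "!", "?")


def segment(words, max_chars=42, max_lines=2, max_duration=6.0,
            min_duration=1.0, gap=0.08):
    """Greedily group aligned words into subtitle cues."""
    groups, current = [], []
    for w in words:
        candidate = " ".join(x.text for x in current + [w])
        too_long = len(textwrap.wrap(candidate, max_chars)) > max_lines
        too_slow = bool(current) and w.end - current[0].start > max_duration
        if current and (too_long or too_slow):
            groups.append(current)  # close the cue before adding this word
            current = []
        current.append(w)
        if w.text.endswith(SENTENCE_END):
            groups.append(current)  # always break at a sentence boundary
            current = []
    if current:
        groups.append(current)

    cues = []
    for i, group in enumerate(groups):
        start = group[0].start
        # Hold short cues on screen for at least min_duration.
        end = max(group[-1].end, start + min_duration)
        if i + 1 < len(groups):
            # Never run into the next cue; leave a small gap before it.
            end = max(start, min(end, groups[i + 1][0].start - gap))
        text = "\n".join(textwrap.wrap(" ".join(w.text for w in group), max_chars))
        cues.append(Cue(text, start, end))
    return cues
```

Calling segment on a list of Word objects returns cues whose text fits within max_lines lines of max_chars characters. A production version would likely also prefer breaks at commas and pauses, which this sketch ignores.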

Transcript notes

Your transcript must match what's actually said in the audio. Small edits are tolerated, but missing or extra sentences will cause alignment failures. If the speaker ad-libs or skips text, update the transcript to match the final delivery.

Acknowledgments

Built on:

  • WhisperX — Phoneme-level forced alignment using wav2vec2
  • wav2vec2 — Self-supervised speech model used as the acoustic backbone for alignment
  • ffmpeg — Video encoding and subtitle rendering via libass

License

MIT

Source Distribution

subcap-0.2.2.tar.gz (14.6 kB)

Uploaded Source

Built Distribution

subcap-0.2.2-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file subcap-0.2.2.tar.gz.

File metadata

  • Download URL: subcap-0.2.2.tar.gz
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for subcap-0.2.2.tar.gz
Algorithm Hash digest
SHA256 6faceb9a02fa71742bc95433aa262648d14207de73dcf42d2e8b38de21725fdf
MD5 f44103afec4f5ea9b6ab231bc5e70e4f
BLAKE2b-256 afaec30a7026edca173b33a0d97d1a8bda9da43f9104a03a8833e0b05ccd3c1b

Provenance

The following attestation bundles were made for subcap-0.2.2.tar.gz:

Publisher: publish.yml on bighippoman/subcap

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subcap-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: subcap-0.2.2-py3-none-any.whl
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for subcap-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a2bb97d22152891562a06f6e01f7af013634612b112453d38688d781f9a8e84e
MD5 48078ee7a5803f4365879f43db18dd42
BLAKE2b-256 e8e896b453e37425cfa0c0035035a22b851ebeefc6944d2e43ec3ca92503b873

Provenance

The following attestation bundles were made for subcap-0.2.2-py3-none-any.whl:

Publisher: publish.yml on bighippoman/subcap
