Burn precisely-timed captions into video using forced alignment.
subcap
A one-command captioning pipeline built around WhisperX's forced alignment. Give it a video and a transcript — it handles audio extraction, alignment, subtitle segmentation, styling, and burn-in encoding.
Why this exists
WhisperX solves the hard problem: using wav2vec2 to map each word of a known transcript to its exact position in audio. But WhisperX outputs raw word timestamps — turning those into readable, styled, burned-in captions is still a non-trivial amount of glue code per video.
subcap is that glue code, packaged as a CLI:
- Subtitle segmentation: groups aligned words into readable chunks with sentence-boundary breaks, line wrapping, duration caps, and proper gaps between cues
- Styled ASS generation: four presets (modern, outline, minimal, bold), auto-adapted for landscape vs portrait video
- ffmpeg burn-in: re-encodes to H.264, H.265, or ProRes with a single --quality flag
- SRT bypass: if you already have timed subtitles, it skips alignment and goes straight to styling + burn-in
Without subcap, getting from a video and a transcript to a video with burned-in captions means chaining WhisperX, writing your own segmentation logic, hand-crafting ASS files, and orchestrating ffmpeg. With subcap, it is a single command: subcap video.mov transcript.txt
Install
pip install subcap
Requires Python 3.10–3.12 and ffmpeg with libass support.
On first run, subcap downloads the wav2vec2 alignment model (~360 MB).
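The requirements above can be checked before installing. The following is a minimal sketch, not part of subcap: `has_libass` and `check_environment` are hypothetical helper names, and the check assumes that an ffmpeg build with libass support lists `--enable-libass` in the configuration line printed by `ffmpeg -version`.

```python
import shutil
import subprocess
import sys


def has_libass(version_output: str) -> bool:
    """Return True if ffmpeg's build configuration mentions libass."""
    return "--enable-libass" in version_output


def check_environment() -> list[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    # subcap states that it requires Python 3.10 through 3.12.
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the supported 3.10-3.12 range"
        )
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        problems.append("ffmpeg not found on PATH")
    else:
        result = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
        if not has_libass(result.stdout):
            problems.append("ffmpeg appears to be built without --enable-libass")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

If the script prints nothing, both requirements look satisfied. A warning about libass usually means installing a different ffmpeg build.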
Usage
# Align a transcript and burn captions in
subcap video.mov transcript.txt -o output.mp4
# Use an existing SRT file (skips alignment)
subcap video.mov subtitles.srt -o output.mp4
# Choose a style
subcap video.mov transcript.txt --style outline
# ProRes output for editing
subcap video.mov transcript.txt --quality studio -o output.mov
# Portrait/vertical video (auto-detected)
subcap shorts.mp4 transcript.txt -o shorts_captioned.mp4
Options
subcap <video> <transcript> [options]
-o, --output Output path (default: <input>_captioned.mp4)
--style modern | outline | minimal | bold (default: modern)
--quality standard | high | studio (default: standard)
--max-lines Max lines per subtitle (default: 2)
--max-chars Max characters per line (default: auto)
--line-spacing Gap between lines in px (default: auto)
--position bottom | center | top (default: bottom)
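For batch jobs, the options above can be driven from a small script. This is a sketch under stated assumptions: `build_command` and `caption_folder` are hypothetical helpers, the `<name>.mp4` plus `<name>.txt` naming convention is invented for the example, and only flags listed in the option table are used.

```python
import subprocess
from pathlib import Path


def build_command(video: Path, transcript: Path, style: str = "modern",
                  quality: str = "standard") -> list[str]:
    """Assemble a subcap invocation using only the documented flags."""
    # ProRes ("studio") output goes in a .mov container, as in the usage example.
    suffix = ".mov" if quality == "studio" else ".mp4"
    output = video.with_name(f"{video.stem}_captioned{suffix}")
    return ["subcap", str(video), str(transcript),
            "--style", style, "--quality", quality, "-o", str(output)]


def caption_folder(folder: Path, **options: str) -> None:
    """Run subcap on every .mp4 that has a same-named .txt transcript."""
    for video in sorted(folder.glob("*.mp4")):
        transcript = video.with_suffix(".txt")
        if transcript.exists():
            # check=True stops the batch on the first failing video.
            subprocess.run(build_command(video, transcript, **options), check=True)
```

Calling `caption_folder(Path("clips"), style="outline")` would then caption each matching pair in `clips/` in turn.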
Styles
| Preset | Look |
|---|---|
| modern | White bold text, semi-transparent dark box |
| outline | White text with black outline |
| minimal | Lighter weight, subtle shadow |
| bold | Large text, opaque dark box |
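To make the presets more concrete, here is a rough sketch of how a preset could be turned into an ASS `Style:` line. It is not subcap's implementation: the font, colours, sizes, margins, and the portrait scaling factors are all assumptions chosen to resemble the "Look" column above. The field order and the alignment codes follow the ASS V4+ style format. The sketch also assumes that with `BorderStyle` 3 the renderer draws the box in the outline colour.

```python
from dataclasses import dataclass

# Field order of a "Style:" line in the [V4+ Styles] section of an ASS file.
ASS_STYLE_FORMAT = (
    "Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, "
    "BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, "
    "Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding"
)


@dataclass
class Preset:
    bold: bool
    border_style: int    # 1 = outline + drop shadow, 3 = opaque box
    outline: float       # outline width (box padding when border_style is 3)
    shadow: float        # shadow offset
    outline_colour: str  # &HAABBGGRR; alpha 00 is opaque, 80 is about half transparent


# Hypothetical values, picked only to resemble the descriptions in the table.
PRESETS = {
    "modern":  Preset(True,  3, 1.0, 0.0, "&H80000000"),
    "outline": Preset(False, 1, 2.0, 0.0, "&H00000000"),
    "minimal": Preset(False, 1, 0.0, 1.0, "&H00000000"),
    "bold":    Preset(True,  3, 1.0, 0.0, "&H00000000"),
}

# ASS alignment uses the numeric keypad layout; these are the centred columns.
ALIGNMENT = {"bottom": 2, "center": 5, "top": 8}


def style_line(preset_name: str, width: int, height: int,
               position: str = "bottom") -> str:
    """Build one ASS Style line, scaling text to the video's shorter side."""
    p = PRESETS[preset_name]
    portrait = height > width
    # Assumed scaling: portrait video gets relatively larger text and margin.
    font_size = round(min(width, height) * (0.065 if portrait else 0.05))
    margin_v = round(height * (0.12 if portrait else 0.06))
    fields = [
        "Default", "Arial", font_size,
        "&H00FFFFFF",          # PrimaryColour: opaque white text
        "&H000000FF",          # SecondaryColour: only used for karaoke effects
        p.outline_colour,      # OutlineColour
        "&H80000000",          # BackColour: shadow colour
        -1 if p.bold else 0, 0, 0, 0,   # Bold (-1 means on), Italic, Underline, StrikeOut
        100, 100, 0, 0,                 # ScaleX, ScaleY, Spacing, Angle
        p.border_style, p.outline, p.shadow,
        ALIGNMENT[position], 40, 40, margin_v, 1,
    ]
    return "Style: " + ",".join(str(f) for f in fields)
```

For example, `style_line("outline", 1080, 1920, position="top")` would produce a style for vertical video with top-centred text.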
Quality
| Preset | Codec | Use case |
|---|---|---|
| standard | H.264 | Sharing, uploading |
| high | H.265 | Smaller files |
| studio | ProRes 422 | Editing, broadcast |
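As an illustration of what each quality preset might translate to, the sketch below builds an ffmpeg burn-in command. The encoder names (`libx264`, `libx265`, `prores_ks`) and the `ass` video filter are real ffmpeg components, but the CRF values, pixel format, and tag choices are assumptions rather than subcap's actual settings. `burn_in_command` is a hypothetical helper.

```python
QUALITY_ARGS = {
    # Assumed encoder settings; subcap's real values may differ.
    "standard": ["-c:v", "libx264", "-crf", "20", "-pix_fmt", "yuv420p"],
    "high":     ["-c:v", "libx265", "-crf", "24", "-tag:v", "hvc1"],
    "studio":   ["-c:v", "prores_ks", "-profile:v", "2"],  # profile 2 is ProRes 422
}


def burn_in_command(video: str, ass_file: str, output: str,
                    quality: str = "standard") -> list[str]:
    """Return an ffmpeg argument list that hardcodes an ASS file into the video."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        # The ass filter renders subtitles through libass. Paths containing
        # characters such as ':' or ',' would need filter-level escaping.
        "-vf", f"ass={ass_file}",
        *QUALITY_ARGS[quality],
        "-c:a", "copy",   # leave the audio stream untouched
        output,
    ]
```

Passing the result to `subprocess.run(..., check=True)` would perform the encode. Returning a list rather than a shell string avoids quoting problems with file names that contain spaces.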
Pipeline
1. Extract audio: mono 16 kHz WAV via ffmpeg
2. Force-align (WhisperX / wav2vec2): map each word of the transcript to its exact position in the audio
3. Segment (subcap): group words into readable subtitle cues, break at sentence boundaries, wrap long lines, enforce min/max display duration, insert gaps
4. Style (subcap): generate ASS with the selected preset, adapted to aspect ratio
5. Burn in (ffmpeg): re-encode with hardcoded subtitles
Steps 1, 2, and 5 are wrappers around existing tools. Steps 3 and 4 are what subcap adds. Because the transcript text is fixed and only the timing is being solved, alignment tends to stay precise even for fast speech, accents, or noisy audio, which are conditions that often degrade speech-to-text approaches.
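The segmentation step can be sketched as a greedy pass over the aligned words. This is an illustration of the idea, not subcap's code: the `Word` and `Cue` types, the `segment` function, and every threshold (42 characters, 6 s maximum, 1 s minimum, 80 ms gap) are assumptions made for the example.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Cue:
    start: float
    end: float
    text: str


def segment(words, max_chars=42, max_lines=2, max_duration=6.0,
            min_duration=1.0, gap=0.08):
    """Greedily group aligned words into subtitle cues."""
    groups, current = [], []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        too_long = len(textwrap.wrap(candidate, max_chars)) > max_lines
        too_slow = bool(current) and word.end - current[0].start > max_duration
        if current and (too_long or too_slow):
            # Adding this word would overflow the cue, so start a new one.
            groups.append(current)
            current = []
        current.append(word)
        if word.text.endswith((".", "?", "!")):
            # Sentence boundary: close the cue after this word.
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    cues = []
    for i, group in enumerate(groups):
        start = group[0].start
        # Hold short cues on screen for at least min_duration.
        end = max(group[-1].end, start + min_duration)
        if i + 1 < len(groups):
            # Never run into the next cue; leave a small gap instead.
            end = max(start, min(end, groups[i + 1][0].start - gap))
        text = "\n".join(textwrap.wrap(" ".join(w.text for w in group), max_chars))
        cues.append(Cue(start, end, text))
    return cues
```

A greedy pass is simple and keeps the cue order identical to the word order. A production segmenter would probably also weigh clause boundaries and reading speed when choosing break points.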
Transcript notes
Your transcript must match what's actually said in the audio. Small edits are tolerated, but missing or extra sentences will cause alignment failures. If the speaker ad-libs or skips text, update the transcript to match the final delivery.
Acknowledgments
Built on:
- WhisperX — Phoneme-level forced alignment using wav2vec2
- wav2vec2 — Self-supervised speech model used as the acoustic backbone for alignment
- ffmpeg — Video encoding and subtitle rendering via libass
License
MIT