Burn precisely timed captions into video using forced alignment.
subcap
Burn precisely timed captions into video. Give it a video and a transcript, and it handles alignment, styling, and encoding.
Unlike speech-to-text tools that guess both what is said and when, subcap uses forced alignment: you provide the transcript, and wav2vec2 maps each word to its position in the audio waveform. The result is phoneme-level timing accuracy with no drift and no misheard words, so recognition errors cannot cascade into the captions.
Install
```bash
pip install subcap
```
Requires Python 3.10–3.12 and ffmpeg with libass support.
On first run, subcap downloads the wav2vec2 alignment model (~360 MB).
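You can check both requirements before a first run. The helper below is a hypothetical sketch and is not part of subcap. `python_supported` compares the interpreter against the documented 3.10–3.12 range. `has_ass_filter` scans the output of `ffmpeg -hide_banner -filters` for the `ass` filter, which ffmpeg only provides when it was built with libass.

```python
import shutil
import subprocess
import sys


def python_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter falls in the documented 3.10-3.12 range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def has_ass_filter(filters_output: str) -> bool:
    """Look for a filter named exactly 'ass' in `ffmpeg -filters` output.

    Filter lines look like ' ... ass   V->V   Render ASS subtitles ...':
    a flags column first, then the filter name.
    """
    for line in filters_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "ass":
            return True
    return False


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if not python_supported():
        problems.append(f"Python {sys.version.split()[0]} is outside 3.10-3.12")
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        problems.append("ffmpeg not found on PATH")
    else:
        out = subprocess.run(
            [ffmpeg, "-hide_banner", "-filters"],
            capture_output=True, text=True, check=False,
        ).stdout
        if not has_ass_filter(out):
            problems.append("ffmpeg was built without libass (no 'ass' filter)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```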
Usage
```bash
# Align a transcript and burn captions in
subcap video.mov transcript.txt -o output.mp4

# Use an existing SRT file (skips alignment)
subcap video.mov subtitles.srt -o output.mp4

# Choose a style
subcap video.mov transcript.txt --style outline

# ProRes output for editing
subcap video.mov transcript.txt --quality studio -o output.mov

# Portrait/vertical video (auto-detected)
subcap shorts.mp4 transcript.txt -o shorts_captioned.mp4
```
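When you pass an `.srt` file, subcap skips alignment, so the cue timings in the file are what end up on screen. The sketch below is an illustration of what such a file contains and is not subcap's own parser. It reads SRT cues, whose timing lines look like `00:01:02,500 --> 00:01:04,000`, into start and end times in seconds. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Timing line of an SRT cue, e.g. "00:00:01,250 --> 00:00:03,900".
# A "." before the milliseconds is accepted too, since some tools write it.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # may span several lines


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues. Blocks are separated by blank lines and hold
    an index line, a timing line, then one or more lines of caption text."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                cues.append(Cue(start, end, "\n".join(lines[i + 1:])))
                break
    return cues
```

Reading the file with `encoding="utf-8-sig"` drops the byte-order mark that some editors prepend. Windows line endings are handled because `\s` matches the `\r` between blank lines.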
Options
```
subcap <video> <transcript> [options]

  -o, --output      Output path (default: <input>_captioned.mp4)
  --style           modern | outline | minimal | bold (default: modern)
  --quality         standard | high | studio (default: standard)
  --max-lines       Max lines per subtitle (default: 2)
  --max-chars       Max characters per line (default: auto)
  --line-spacing    Gap between lines in px (default: auto)
  --position        bottom | center | top (default: bottom)
```
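To caption many files from a script, you can build the command line from the options above. This is a hypothetical wrapper, and `build_subcap_args` and `caption_folder` are not part of subcap. It uses only the documented flags and rejects preset names that are not listed. Any option left as `None` is omitted, so subcap applies its own default.

```python
import subprocess
from pathlib import Path

STYLES = {"modern", "outline", "minimal", "bold"}
QUALITIES = {"standard", "high", "studio"}
POSITIONS = {"bottom", "center", "top"}


def build_subcap_args(video, transcript, output=None, style=None, quality=None,
                      max_lines=None, max_chars=None, line_spacing=None,
                      position=None) -> list[str]:
    """Build a subcap argv; options left as None fall back to subcap's defaults."""
    for value, allowed, flag in ((style, STYLES, "--style"),
                                 (quality, QUALITIES, "--quality"),
                                 (position, POSITIONS, "--position")):
        if value is not None and value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")
    args = ["subcap", str(video), str(transcript)]
    optional = [("-o", output), ("--style", style), ("--quality", quality),
                ("--max-lines", max_lines), ("--max-chars", max_chars),
                ("--line-spacing", line_spacing), ("--position", position)]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


def caption_folder(folder: str, **options) -> None:
    """Caption every .mov/.mp4 that has a same-named .txt transcript beside it."""
    for video in sorted(Path(folder).iterdir()):
        transcript = video.with_suffix(".txt")
        if video.suffix.lower() in {".mov", ".mp4"} and transcript.exists():
            subprocess.run(build_subcap_args(video, transcript, **options), check=True)
```

For example, `caption_folder("clips", style="outline")` writes `<input>_captioned.mp4` next to each clip, because no `-o` is passed.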
Styles
| Preset | Look |
|---|---|
| modern | White bold text, semi-transparent dark box |
| outline | White text with black outline |
| minimal | Lighter weight, subtle shadow |
| bold | Large text, opaque dark box |
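Because subcap renders captions as ASS subtitles, each preset ultimately has to become an ASS style. The sketch below shows how these looks could map onto `Style:` fields. The preset names come from the table, but the font, sizes, margins, and colours are assumptions and are not subcap's actual settings. ASS colours are written `&HAABBGGRR`, where alpha `00` is opaque and `FF` is fully transparent. `BorderStyle=3` draws a box behind the text, and libass fills that box with `OutlineColour`. Alignment uses numpad positions, which also illustrates how `--position` could be expressed.

```python
# Field order of the [V4+ Styles] "Format:" line in an ASS script.
FORMAT = ("Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, "
          "BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, "
          "Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding")

WHITE = "&H00FFFFFF"       # opaque white (&HAABBGGRR)
BLACK = "&H00000000"       # opaque black
HALF_BLACK = "&H80000000"  # black at roughly 50% transparency

# Hypothetical values: (font size, bold, outline/box colour, border style, outline, shadow)
PRESETS = {
    "modern":  (48, -1, HALF_BLACK, 3, 8, 0),  # bold text, semi-transparent box
    "outline": (48, 0, BLACK, 1, 3, 0),        # regular text, black outline
    "minimal": (44, 0, BLACK, 1, 0, 2),        # no outline, soft shadow
    "bold":    (60, -1, BLACK, 3, 10, 0),      # larger text, opaque box
}

# ASS alignment uses numpad positions: 2 = bottom centre, 5 = middle, 8 = top centre.
ALIGNMENT = {"bottom": 2, "center": 5, "top": 8}


def style_line(preset: str, position: str = "bottom", font: str = "Arial") -> str:
    """Render one 'Style:' line whose fields follow FORMAT (Bold: -1 = on, 0 = off)."""
    size, bold, outline_colour, border, outline, shadow = PRESETS[preset]
    fields = [preset, font, size, WHITE, WHITE, outline_colour, HALF_BLACK,
              bold, 0, 0, 0, 100, 100, 0, 0, border, outline, shadow,
              ALIGNMENT[position], 40, 40, 60, 1]
    return "Style: " + ",".join(str(f) for f in fields)


print("[V4+ Styles]")
print("Format: " + FORMAT)
for name in PRESETS:
    print(style_line(name))
```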
Quality
| Preset | Codec | Use case |
|---|---|---|
| standard | H.264 | Sharing, uploading |
| high | H.265 | Smaller files |
| studio | ProRes 422 | Editing, broadcast |
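The sketch below shows one plausible way these presets could map to an ffmpeg burn-in command. The encoder names are real ffmpeg encoders: `libx264`, `libx265`, and `prores_ks`, where `-profile:v 2` selects ProRes 422. The CRF values, pixel formats, and audio codecs are assumptions, and `burn_command` is a hypothetical helper rather than subcap's internals. ffmpeg's `ass` video filter is what renders an ASS file into the frames.

```python
# Hypothetical encoder settings per quality preset (values are assumptions).
QUALITY_ARGS = {
    "standard": ["-c:v", "libx264", "-crf", "20", "-preset", "medium",
                 "-pix_fmt", "yuv420p", "-c:a", "aac"],
    "high": ["-c:v", "libx265", "-crf", "22", "-preset", "medium",
             "-tag:v", "hvc1", "-pix_fmt", "yuv420p", "-c:a", "aac"],
    "studio": ["-c:v", "prores_ks", "-profile:v", "2",  # profile 2 = ProRes 422
               "-pix_fmt", "yuv422p10le", "-c:a", "pcm_s16le"],
}


def burn_command(video: str, ass_file: str, output: str,
                 quality: str = "standard") -> list[str]:
    """Build an ffmpeg argv that renders `ass_file` onto every frame of `video`."""
    if quality not in QUALITY_ARGS:
        raise ValueError(f"unknown quality preset: {quality!r}")
    return ["ffmpeg", "-y", "-i", video,
            "-vf", f"ass={ass_file}",  # libass draws the subtitles into the video
            *QUALITY_ARGS[quality], output]
```

You would run it with `subprocess.run(burn_command("video.mov", "captions.ass", "output.mov", "studio"), check=True)`. The sketch assumes simple filenames, because a path containing `:`, `,` or `'` would need filtergraph escaping inside `-vf`.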
How it works
- Extracts audio from the video
- Runs phoneme-level forced alignment via WhisperX (wav2vec2) to map each word of your transcript to its exact position in the audio
- Segments words into readable subtitle chunks, breaking at sentence boundaries
- Generates styled ASS subtitles adapted to the video's aspect ratio
- Burns captions into the video via ffmpeg
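The segmentation and ASS-generation steps can be pictured as a greedy packer over the aligned words. This sketch assumes each word arrives as `(text, start, end)` in seconds. That shape is hypothetical and is not WhisperX's exact output. The packer fills a line up to `max_chars` and emits a subtitle once `max_lines` lines are full. It also closes a subtitle early after any word ending in `.`, `?` or `!`. Each subtitle then becomes an ASS `Dialogue:` line, with `\N` as the hard line break. The parameters mirror `--max-chars` and `--max-lines`, but subcap's actual rules may differ.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


class Subtitle(NamedTuple):
    lines: list[str]
    start: float
    end: float


def segment(words: list[Word], max_chars: int = 42, max_lines: int = 2) -> list[Subtitle]:
    """Greedily pack timed words into subtitles of at most max_lines lines,
    each at most max_chars long, closing a subtitle at sentence ends."""
    subtitles: list[Subtitle] = []
    lines: list[list[Word]] = []  # completed lines of the subtitle being built
    current: list[Word] = []      # words on the line being filled

    def flush() -> None:
        block = lines + ([current] if current else [])
        if block:
            subtitles.append(Subtitle(
                [" ".join(w.text for w in line) for line in block],
                block[0][0].start,
                block[-1][-1].end,
            ))
        lines.clear()
        current.clear()

    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            lines.append(current.copy())  # line is full: move it to the finished lines
            current.clear()
            if len(lines) == max_lines:   # subtitle is full: emit it
                flush()
        current.append(word)
        if word.text.endswith((".", "?", "!")):  # sentence boundary
            flush()
    flush()
    return subtitles


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, cs = divmod(cs, 360_000)
    m, cs = divmod(cs, 6_000)
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def dialogue(sub: Subtitle, style: str = "Default") -> str:
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    text = r"\N".join(sub.lines)  # \N is a hard line break in ASS
    return f"Dialogue: 0,{ass_time(sub.start)},{ass_time(sub.end)},{style},,0,0,0,,{text}"
```

A greedy packer is simple and predictable. Its weakness is visible in short line limits, where a lone word can be left on the second line. A balanced line breaker would avoid that at the cost of more logic.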
Because the text is fixed and only the timing has to be solved, alignment holds up well even with fast speech, accents, or overlapping audio, which are conditions that often break speech-to-text.
Transcript notes
Your transcript must match what is actually said in the audio. Minor wording differences are tolerated, but missing or extra sentences will cause alignment failures. If the speaker ad-libs or skips text, update the transcript to match the final delivery.
Acknowledgments
Built on:
- WhisperX — Phoneme-level forced alignment using wav2vec2
- wav2vec2 — Self-supervised speech model used as the acoustic backbone for alignment
- ffmpeg — Video encoding and subtitle rendering via libass
License
MIT
File details
Details for the file subcap-0.2.0.tar.gz.
File metadata
- Download URL: subcap-0.2.0.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb2253484bcd5439eb7c42b8eef89469f0b1065009d9890a4307ac79a66c157d |
| MD5 | 9e4fb0b2f098ada2e3ed3f2eb1a8fce6 |
| BLAKE2b-256 | e302cf20ea4395a0d8eb2720b762496e5a470ab4c05285ebcb01ef183edd11ee |
Provenance
The following attestation bundles were made for subcap-0.2.0.tar.gz:
- Publisher: publish.yml on bighippoman/subcap
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: subcap-0.2.0.tar.gz
- Subject digest: cb2253484bcd5439eb7c42b8eef89469f0b1065009d9890a4307ac79a66c157d
- Sigstore transparency entry: 1281749688
- Sigstore integration time:
- Permalink: bighippoman/subcap@cc1727fd8038b46f61ecf86806fd10505da45925
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/bighippoman
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cc1727fd8038b46f61ecf86806fd10505da45925
- Trigger Event: release
File details
Details for the file subcap-0.2.0-py3-none-any.whl.
File metadata
- Download URL: subcap-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25305b25549589e2db38ddebbc17dcd77aeb3f99ac5eba2ced0967cc270b8f06 |
| MD5 | 22c48829efd4cb52ba9e4b13296510a2 |
| BLAKE2b-256 | 2753533a2fb5f3c5112997129ddb0f6b0247ae565265132f1cab66b8bd60207d |
Provenance
The following attestation bundles were made for subcap-0.2.0-py3-none-any.whl:
- Publisher: publish.yml on bighippoman/subcap
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: subcap-0.2.0-py3-none-any.whl
- Subject digest: 25305b25549589e2db38ddebbc17dcd77aeb3f99ac5eba2ced0967cc270b8f06
- Sigstore transparency entry: 1281749767
- Sigstore integration time:
- Permalink: bighippoman/subcap@cc1727fd8038b46f61ecf86806fd10505da45925
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/bighippoman
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cc1727fd8038b46f61ecf86806fd10505da45925
- Trigger Event: release