20 audio-driven caption styles for video. Composes text-fx + lyric-sync + audio-arrange.
caption-cast
caption-cast renders styled, timed captions onto video. It provides 20 named caption styles, each a composition of text-fx effects, timing behaviour, and layout rules. It does not do speech transcription, audio separation, or beat detection itself. Lyric ingestion and beat detection are handled by the sibling packages lyric-sync and audio-arrange respectively, which caption-cast can consume when installed as optional extras.
The library was built to serve the caption layer of Trollfabriken's automated video pipeline, where clip metadata already carries subtitle or lyric timecodes and the remaining work is rendering them consistently across styles.
What it solves
| Pain point | Resolution |
|---|---|
| Karaoke-style word-by-word highlighting requires tight timing control and per-word colour switching | burn_lyrics() with any karaoke_* style handles word-level timing from a pysubs2 SSAFile or from lyric-sync output |
| Subtitle burn-in varies wildly in font, position, and animation across tools | 20 named styles give a reproducible, versioned rendering contract; pick a slug, get the same output every time |
| Beat-synchronised caption pulses need audio analysis glued to render | beat_synced_captions() accepts an audio-arrange beat grid and schedules caption emphasis automatically |
Installation
Core package (requires ffmpeg on PATH):
pip install caption-cast
With lyric-sync support (LRC / MusicXML ingestion, word-level timing):
pip install "caption-cast[lyrics]"
With audio-arrange support (beat detection, tempo analysis):
pip install "caption-cast[beats]"
Full extras:
pip install "caption-cast[all]"
Development:
pip install "caption-cast[dev]"
ffmpeg must be available on your system PATH. Install via:
- Ubuntu/Debian: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
- Windows: choco install ffmpeg, or download from https://ffmpeg.org/download.html
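Because rendering depends on an external binary, it can help to check for ffmpeg before starting a long job. The sketch below uses only the Python standard library; `find_ffmpeg` and `ffmpeg_version` are illustrative helper names, not part of caption-cast.

```python
import shutil
import subprocess


def find_ffmpeg(executable: str = "ffmpeg") -> str:
    """Return the path to the executable, or raise if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"{executable!r} was not found on PATH; install it before rendering."
        )
    return path


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `<path> -version`."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]
```

Calling `find_ffmpeg()` at program start turns a missing binary into an immediate, readable error instead of a failure partway through a render.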
Quick start
1. Burn subtitles from an SRT file
from caption_cast import burn_subtitles
burn_subtitles(
    video="input.mp4",
    subtitles="dialogue.srt",
    style="clean_lower_third",
    output="output.mp4",
)
The subtitles parameter accepts a path to any pysubs2-readable format (SRT, ASS, VTT, SSA).
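Since styles are selected by slug, comparing several of them on the same clip comes down to one call per slug. The following is a minimal sketch of that loop. The helper name `render_in_styles` and the output naming scheme are assumptions made for illustration, and the render function is passed in as a parameter so the sketch runs without caption-cast installed.

```python
from pathlib import Path
from typing import Callable, Iterable


def render_in_styles(
    burn: Callable[..., object],
    video: str,
    subtitles: str,
    styles: Iterable[str],
    out_dir: str = "renders",
) -> list[Path]:
    """Call `burn` once per style slug and return the output paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    stem = Path(video).stem
    outputs = []
    for slug in styles:
        # Hypothetical naming scheme: <video stem>.<style slug>.mp4
        output = target / f"{stem}.{slug}.mp4"
        # Keyword names mirror the burn_subtitles call in the quick start.
        burn(video=video, subtitles=subtitles, style=slug, output=str(output))
        outputs.append(output)
    return outputs
```

In real use, `burn` would be `caption_cast.burn_subtitles`, for example `render_in_styles(burn_subtitles, "input.mp4", "dialogue.srt", ["clean_center", "bold_center"])`.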
2. Karaoke word-by-word highlighting from an LRC file
from caption_cast import burn_lyrics
burn_lyrics(
    video="music_video.mp4",
    lyrics="song.lrc",  # LRC with word-level timestamps
    style="karaoke_neon",
    output="music_video_captioned.mp4",
    highlight_color="#FF3399",
    base_color="#FFFFFF",
)
If lyric-sync is installed, you can pass a lyric_sync.LyricTrack object directly instead of a
file path. See lyric-sync's README for how to produce one from MusicXML or Spotify API data.
3. Beat-synchronised caption pulses
from caption_cast import beat_synced_captions
from audio_arrange import analyse_beats # requires caption-cast[beats]
beat_grid = analyse_beats("track.wav")
beat_synced_captions(
    video="clip.mp4",
    subtitles="lyrics.srt",
    beat_grid=beat_grid,
    style="pulse_bold",
    output="clip_captioned.mp4",
)
On each beat onset the active caption receives a scale/opacity pulse governed by the style's
beat_emphasis parameters. The default pulse lasts 80 ms and can be overridden per call.
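To make the scheduling idea concrete, here is a rough sketch of how beat onsets could be matched to active captions. It assumes the beat grid is a sorted list of onset times in seconds and captions are (start, end) pairs; the actual audio-arrange beat grid type and caption-cast's internal scheduler may differ. `Pulse` and `schedule_pulses` are hypothetical names.

```python
from bisect import bisect_left
from typing import NamedTuple


class Pulse(NamedTuple):
    caption_index: int
    start: float  # seconds
    end: float    # seconds


def schedule_pulses(captions, beats, pulse_ms=80.0):
    """Return one Pulse for every beat that falls inside a caption interval.

    captions: list of (start, end) pairs in seconds
    beats: sorted list of beat onset times in seconds
    """
    duration = pulse_ms / 1000.0
    pulses = []
    for index, (start, end) in enumerate(captions):
        # Skip directly to the first beat at or after the caption start.
        first = bisect_left(beats, start)
        for beat in beats[first:]:
            if beat >= end:
                break
            # Clamp the pulse so it never outlives the caption.
            pulses.append(Pulse(index, beat, min(beat + duration, end)))
    return pulses
```

Beats that fall between captions produce no pulse, and a beat close to a caption's end yields a shortened pulse rather than one that spills past it.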
The 20 styles
| Slug | Display name | Primary use case |
|---|---|---|
| clean_lower_third | Clean Lower Third | Documentary, interview subtitles |
| clean_center | Clean Center | Narrative subtitles, no music |
| minimal_top | Minimal Top | B-roll with top-aligned text |
| bold_lower_third | Bold Lower Third | Social media, short-form video |
| bold_center | Bold Center | Impact titles doubling as captions |
| outline_lower_third | Outline Lower Third | Subtitles over bright or variable backgrounds |
| outline_center | Outline Center | Lyrics on music videos with complex backgrounds |
| drop_shadow_lower | Drop Shadow Lower | Standard broadcast-style lower third |
| karaoke_classic | Karaoke Classic | Word-by-word left-to-right wipe highlight |
| karaoke_neon | Karaoke Neon | Word highlight with neon glow effect |
| karaoke_fill | Karaoke Fill | Word highlight with solid fill colour swap |
| karaoke_bounce | Karaoke Bounce | Word highlight with vertical bounce on activation |
| karaoke_wave | Karaoke Wave | Sequential per-character wave on active word |
| pulse_bold | Pulse Bold | Caption scales on beat onset, bold weight |
| pulse_glow | Pulse Glow | Caption glow radius pulses on beat onset |
| pulse_color | Pulse Color | Caption colour shifts on beat onset |
| fade_word | Fade Word | Each word fades in on its start timecode |
| slide_up | Slide Up | Line slides up into position on entry |
| typewriter | Typewriter | Characters reveal left-to-right at constant rate |
| pop_center | Pop Center | Caption pops to full size from zero scale |
All 20 styles are parametric. Every parameter has a default; pass keyword arguments to apply_caption,
burn_lyrics, or burn_subtitles to override individual parameters without changing the base style.
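The override behaviour can be pictured as copying a frozen dataclass with a few fields replaced. The sketch below is illustrative only: `CaptionStyle`, its fields, its default values, and `with_overrides` are assumptions, not the definitions in caption-cast's catalog.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class CaptionStyle:
    # Hypothetical fields and defaults, chosen for illustration.
    slug: str
    font_size: int = 48
    base_color: str = "#FFFFFF"
    highlight_color: str = "#FFFF00"


def with_overrides(style: CaptionStyle, **overrides) -> CaptionStyle:
    """Return a copy of `style` with the given parameters replaced."""
    known = {f.name for f in fields(style)}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise TypeError(f"unknown style parameter(s): {', '.join(unknown)}")
    # replace() builds a new instance, so the base style is left untouched.
    return replace(style, **overrides)
```

Keeping the base style immutable means a per-call override such as `highlight_color="#FF3399"` cannot leak into later renders that use the same slug.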
Lyrics input formats
caption-cast accepts timed lyrics in three ways:
- pysubs2-readable files: SRT, ASS, SSA, VTT. Word-level timing in ASS/SSA format is supported for karaoke styles. Pass the file path as the lyrics or subtitles argument.
- LRC files: Standard LRC with line timestamps. Enhanced LRC with word-level <mm:ss.xx> tags is required for karaoke styles. Pass the file path; caption-cast parses LRC internally via pysubs2.
- lyric-sync LyricTrack: When lyric-sync >= 0.1 is installed, pass a LyricTrack object directly. This is the preferred path when you need to ingest MusicXML, parse Spotify's lyrics API, or align phoneme boundaries. lyric-sync produces word-level timecodes that map cleanly onto the karaoke styles.
# With lyric-sync installed
from lyric_sync import parse_lrc, align_words
from caption_cast import burn_lyrics
track = parse_lrc("song.lrc")
aligned = align_words(track, audio="song.wav") # optional phoneme alignment
burn_lyrics(
    video="clip.mp4",
    lyrics=aligned,  # LyricTrack object accepted directly
    style="karaoke_wave",
    output="out.mp4",
)
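For readers unfamiliar with enhanced LRC, the sketch below shows what the word-level `<mm:ss.xx>` tags look like once parsed. It is a standalone illustration using only the standard library, under the assumption that each lyric line has the common `[mm:ss.xx] <mm:ss.xx> word <mm:ss.xx> word` shape. It is not caption-cast's or lyric-sync's parser, and `parse_enhanced_lrc_line` is a hypothetical name.

```python
import re

# Line-level tag such as [00:12.00] at the start of a lyric line.
LINE_TAG = re.compile(r"^\[(\d+):(\d{2}(?:\.\d+)?)\]")
# Word-level tag such as <00:12.50> followed by the text up to the next tag.
WORD_TAG = re.compile(r"<(\d+):(\d{2}(?:\.\d+)?)>\s*([^<]*)")


def _seconds(minutes: str, seconds: str) -> float:
    return int(minutes) * 60 + float(seconds)


def parse_enhanced_lrc_line(line: str):
    """Return (line_start, [(word_start, word), ...]), or None for other lines."""
    stripped = line.strip()
    match = LINE_TAG.match(stripped)
    if match is None:
        return None  # metadata such as [ar:Artist], or a blank line
    line_start = _seconds(*match.groups())
    rest = stripped[match.end():]
    words = [
        (_seconds(m, s), text.strip())
        for m, s, text in WORD_TAG.findall(rest)
        if text.strip()  # a trailing tag with no text only marks the line end
    ]
    return line_start, words
```

A line that carries only the leading `[mm:ss.xx]` tag yields an empty word list, which is why plain LRC is enough for line-level styles but not for the karaoke ones.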
CLI
caption-cast installs a command-line entry point named caption-cast.
List all styles:
caption-cast styles
Burn subtitles from the command line:
caption-cast burn \
    --video input.mp4 \
    --subtitles dialogue.srt \
    --style clean_lower_third \
    --output output.mp4
Burn karaoke lyrics:
caption-cast lyrics \
    --video music_video.mp4 \
    --lyrics song.lrc \
    --style karaoke_neon \
    --highlight-color "#FF3399" \
    --output out.mp4
Inspect a style's parameters:
caption-cast info karaoke_neon
Get version:
caption-cast --version
All burn and lyrics subcommand options map 1-to-1 to the Python API keyword arguments.
Run caption-cast <subcommand> --help for a full parameter list.
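The 1-to-1 mapping follows a convention that argparse applies by default: a flag such as `--highlight-color` is stored under the name `highlight_color`. The sketch below illustrates that convention for the lyrics subcommand only. It is not the actual cli.py, the `--base-color` flag is assumed from the Python keyword of the same name, and `build_parser` and `cli_to_kwargs` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="caption-cast")
    sub = parser.add_subparsers(dest="command", required=True)
    lyrics = sub.add_parser("lyrics")
    lyrics.add_argument("--video", required=True)
    lyrics.add_argument("--lyrics", required=True)
    lyrics.add_argument("--style", required=True)
    lyrics.add_argument("--output", required=True)
    # argparse stores these as highlight_color and base_color.
    lyrics.add_argument("--highlight-color")
    lyrics.add_argument("--base-color")  # assumed flag, see lead-in
    return parser


def cli_to_kwargs(argv):
    """Parse argv and return (subcommand, keyword arguments for the API call)."""
    args = vars(build_parser().parse_args(argv))
    command = args.pop("command")
    # Drop options that were not given so the style's own defaults apply.
    kwargs = {key: value for key, value in args.items() if value is not None}
    return command, kwargs
```

With this shape, the karaoke command shown earlier produces the same keyword arguments as the `burn_lyrics` call in the quick start, apart from the output file name.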
Composition with the Trollfabriken stack
caption-cast sits between the ingestion packages (lyric-sync, audio-arrange) and the render engine (text-fx). The dependency chain is:
lyric-sync ──┐
├──► caption-cast ──► text-fx ──► ffmpeg
audio-arrange ┘
caption-cast does not need lyric-sync or audio-arrange at runtime if you supply pre-timed subtitle files. Install the extras only when you need programmatic lyrics ingestion or beat analysis.
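Code that should work with or without the extras can probe for them before importing. The following is a minimal sketch using the standard library. The module names `lyric_sync` and `audio_arrange` are taken from the import statements in this README, while `available_extras` and `require_extra` are hypothetical helpers, not caption-cast functions.

```python
from importlib.util import find_spec


def available_extras() -> dict[str, bool]:
    """Report which optional sibling packages can be imported."""
    modules = {"lyrics": "lyric_sync", "beats": "audio_arrange"}
    return {
        extra: find_spec(module) is not None
        for extra, module in modules.items()
    }


def require_extra(extra: str, module: str) -> None:
    """Raise an ImportError with an install hint if `module` is missing."""
    if find_spec(module) is None:
        raise ImportError(
            f'{module} is not installed; run: pip install "caption-cast[{extra}]"'
        )
```

A pipeline could call `require_extra("beats", "audio_arrange")` just before beat analysis, so that clips with pre-timed subtitle files never touch the optional dependency.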
For title cards and motion-graphic intro/outro sequences, see title-fx, which builds on text-fx with a separate catalog of 44 cinematic title effects.
Package structure
caption-cast/
    src/
        caption_cast/
            __init__.py # public API: apply_caption, burn_lyrics, burn_subtitles,
            api.py # beat_synced_captions, list_styles, get_style_info
            cli.py # caption-cast CLI entry point
            renderer.py # delegates to text-fx render pipeline
            timing.py # timecode parsing, word-level offset resolution
            beat.py # beat grid → caption emphasis schedule
            styles/
                __init__.py # style registry
                catalog.py # style definitions (parametric dataclasses)
            data/
                caption_styles.json # serialised style catalog (shipped in wheel)
    tests/
License
MIT. Copyright 2026 Trollfabriken AITrix AB.
Part of the Trollfabriken stack.
- PyPI: https://pypi.org/project/caption-cast/
- Issues: https://github.com/tomastimelock/caption-cast/issues
- text-fx (render engine): https://github.com/tomastimelock/text-fx
- lyric-sync (lyrics ingestion): https://github.com/tomastimelock/lyric-sync
- audio-arrange (beat detection): https://github.com/tomastimelock/audio-arrange