Deck-spec to narrated MP4: TTS via ElevenLabs, frame capture via web-overlay, audio mix via audio-arrange, video assembly via video-arrange.
talk-cast
Built at Trollfabriken AITrix AB to close the loop: AIMOS Insight audit reports, Granskning case briefs,
and civic-education explainers all begin as Deck objects authored by an LLM and end as narrated videos
posted to the news site, without leaving Python or paying a video-API vendor. talk-cast uses ElevenLabs for
narration, audio-arrange for the mix, web-overlay for frame capture, and video-arrange for the final
assembly. Because narration audio is cached per slide, re-rendering after editing one slide takes seconds, not minutes.
What it solves
| Problem | How talk-cast fixes it |
|---|---|
| TTS calls cost money on every re-render | Per-slide audio cache; unchanged slides reuse the cached MP3 |
| Frame capture needs a real browser | web-overlay drives Playwright Chromium headlessly |
| Audio timing drifts from slide duration | audio-arrange trims/pads each clip to match slide duration exactly |
| Assembling MP4 from frames + audio requires ffmpeg knowledge | video-arrange wraps ffmpeg; one call produces the final file |
| Different voice per section is messy to wire up | NarrateConfig.voice_map maps 0-based slide indices to ElevenLabs voice IDs |
| Subtitle track is optional but painful to add | talk-cast[subtitles] writes a WebVTT file alongside the MP4 |
Installation
pip install talk-cast
With subtitle support:
pip install "talk-cast[subtitles]"
Development extras:
pip install "talk-cast[dev]"
Runtime requirements
ffmpeg must be on PATH:
# Ubuntu / Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Windows
choco install ffmpeg
Playwright Chromium for frame capture:
python -m playwright install chromium
ElevenLabs API key — set the environment variable:
export ELEVENLABS_API_KEY="your-key-here"
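A small preflight check can catch a missing requirement before any TTS call is made. The sketch below is not part of talk-cast; `missing_requirements` is a hypothetical helper that uses only the standard library to look for ffmpeg on PATH and for the ELEVENLABS_API_KEY variable. It does not check the Playwright Chromium install.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready to render."""
    env = os.environ if env is None else env
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(f"missing: {problem}")
```

The `env` and `which` parameters exist only so the check can be exercised in tests without touching the real environment.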
Quick start
from pathlib import Path

from deck_spec import Deck
from talk_cast import NarrateConfig, cast

# Load a deck authored by an LLM or built by hand
deck = Deck.model_validate_json(Path("my_deck.json").read_text(encoding="utf-8"))

config = NarrateConfig(
    voice_id="21m00Tcm4TlvDq8ikWAM",   # ElevenLabs voice ID
    slide_duration=8.0,                # seconds per slide
    output_path="output/my_video.mp4",
    cache_dir=".talk-cast-cache",      # skip TTS if audio already cached
)

# Render the full narrated video
cast(deck, config)
Re-run after editing one slide — only that slide's TTS call is repeated. All other audio is served from cache.
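One way such a per-slide cache can work is to key each MP3 on a hash of everything that affects the audio. The sketch below is an illustration under that assumption, not talk-cast's actual cache layout: `cache_path` and `slides_needing_tts` are hypothetical names, and the real key may include further settings such as stability or similarity boost.

```python
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir, text, voice_id, model_id="eleven_multilingual_v2"):
    """Map (narration text, voice, model) to a stable MP3 path inside cache_dir."""
    payload = json.dumps(
        {"text": text, "voice_id": voice_id, "model_id": model_id},
        sort_keys=True,
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.mp3"


def slides_needing_tts(cache_dir, notes, voice_id):
    """Return the indices of slides whose audio is not in the cache yet."""
    return [
        index
        for index, text in enumerate(notes)
        if not cache_path(cache_dir, text, voice_id).exists()
    ]
```

With content-derived keys, editing one slide's speaker notes changes only that slide's hash, so every other slide still resolves to its existing file.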
The pipeline
Deck object
│
① Read slides + speaker notes
│
② Check cache (.talk-cast-cache/)
│ │
│ hit ─┘ miss ─► ③ ElevenLabs TTS → MP3 → cache
│
④ audio-arrange: trim / pad each MP3 to slide_duration
│
⑤ slide-render: render each slide to HTML
│
⑥ web-overlay: Playwright Chromium captures PNG frame per slide
│
⑦ Assemble per-slide: frame PNG + padded MP3
│
⑧ video-arrange: encode each slide segment to MP4 clip
│
⑨ video-arrange: concatenate all clips → final MP4
│
⑩ (optional) write WebVTT subtitle file alongside MP4
Each step is independently testable. Steps ③–④ are skipped when TALK_CAST_SKIP_LIVE_TTS=1.
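For readers curious what steps ④, ⑧ and ⑨ amount to at the ffmpeg level, the sketch below builds plausible command lines. It is an assumption about the kind of invocation involved, not the commands audio-arrange or video-arrange actually run; the function names are hypothetical, and only standard ffmpeg options are used (`apad`, `-loop 1`, `-tune stillimage`, the concat demuxer).

```python
def pad_audio_args(src, dst, duration):
    """Pad with silence, then cut, so the clip lasts exactly `duration` seconds."""
    return ["ffmpeg", "-y", "-i", str(src), "-af", "apad",
            "-t", f"{duration:.3f}", str(dst)]


def slide_clip_args(frame, audio, dst, duration, fps=30):
    """Loop one still frame under the audio track for `duration` seconds."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-framerate", str(fps), "-i", str(frame),  # still image input
        "-i", str(audio),                                         # narration input
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-t", f"{duration:.3f}",
        str(dst),
    ]


def concat_list(clips):
    """Build the text of a concat-demuxer list file, one clip per line."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_args(list_file, dst):
    """Join the clips named in list_file without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(dst)]
```

Each function returns an argument list suitable for `subprocess.run(args, check=True)`. Stream copy in the concat step only works when every clip shares the same codec, resolution and frame rate, which holds here if all clips come from `slide_clip_args` with the same settings.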
Configuration
NarrateConfig is a Pydantic model. All fields have defaults except voice_id.
| Field | Type | Default | Description |
|---|---|---|---|
| voice_id | str | required | ElevenLabs voice ID for narration |
| voice_map | dict[int, str] | {} | Override voice per slide index (0-based) |
| slide_duration | float | 8.0 | Seconds each slide is held on screen |
| output_path | str \| Path | "output.mp4" | Destination MP4 file |
| cache_dir | str \| Path | ".talk-cast-cache" | Directory for cached TTS audio |
| resolution | tuple[int, int] | (1920, 1080) | Frame resolution in pixels |
| fps | int | 30 | Frames per second in the output video |
| model_id | str | "eleven_multilingual_v2" | ElevenLabs model |
| stability | float | 0.5 | ElevenLabs stability (0.0–1.0) |
| similarity_boost | float | 0.75 | ElevenLabs similarity boost (0.0–1.0) |
| subtitles | bool | False | Write a .vtt file alongside the MP4 |
| theme | str | "default" | slide-render theme name |
Voice map example
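Assign a different voice to slide indices 5–9 (0-based); every other slide uses voice_id:
config = NarrateConfig(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    # voice_map is keyed per slide index, so list every index in the range
    voice_map={i: "AZnzlk1XvdvUeBnXmlld" for i in range(5, 10)},
    slide_duration=10.0,
    output_path="output/report.mp4",
)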
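The lookup this implies is a dictionary `get` with the default voice as fallback. The sketch below shows that rule in isolation; `voice_for` is a hypothetical helper written for illustration, not a talk-cast function, and it assumes that a slide index absent from voice_map falls back to voice_id.

```python
def voice_for(index, voice_id, voice_map=None):
    """Return the override for this 0-based slide index, else the default voice."""
    return (voice_map or {}).get(index, voice_id)


default = "21m00Tcm4TlvDq8ikWAM"
# Cover a contiguous range of slides by listing every index explicitly.
overrides = {i: "AZnzlk1XvdvUeBnXmlld" for i in range(5, 10)}
voices = [voice_for(i, default, overrides) for i in range(12)]
```

Here `voices[5]` through `voices[9]` hold the override voice and every other entry holds the default.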
CLI
Render a deck file to MP4:
talk-cast render my_deck.json --voice 21m00Tcm4TlvDq8ikWAM --output output/video.mp4
Set slide duration to 12 seconds and enable subtitles:
talk-cast render my_deck.json \
    --voice 21m00Tcm4TlvDq8ikWAM \
    --duration 12 \
    --subtitles \
    --output output/video.mp4
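With a fixed slide duration, subtitle timing is simple arithmetic: cue i starts at i × duration and ends at (i + 1) × duration. The sketch below generates a WebVTT document on that basis. It assumes one cue per slide taken from the speaker notes, which may not match how talk-cast[subtitles] splits its cues; `vtt_timestamp` and `build_vtt` are hypothetical names.

```python
def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(notes, slide_duration):
    """Build a WebVTT document with one cue per slide."""
    blocks = ["WEBVTT"]
    for index, text in enumerate(notes):
        start = vtt_timestamp(index * slide_duration)
        end = vtt_timestamp((index + 1) * slide_duration)
        blocks.append(f"{start} --> {end}\n{text}")
    # Cues are separated by a blank line, as the format requires.
    return "\n\n".join(blocks) + "\n"
```

Notes that contain blank lines or the string `-->` would need cleaning first, since both have structural meaning in a WebVTT file.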
Purge the TTS cache (forces re-synthesis on next render):
talk-cast cache clear
Inspect the cache — see which slides have audio:
talk-cast cache list
Validate a deck before rendering (runs deck-spec validation):
talk-cast validate my_deck.json
Package structure
talk-cast/
├── src/
│   └── talk_cast/
│       ├── __init__.py     ← public API: cast(), NarrateConfig
│       ├── cli.py          ← talk-cast entry point
│       ├── narrate.py      ← TTS orchestration and cache logic
│       ├── capture.py      ← web-overlay frame capture per slide
│       ├── assemble.py     ← video-arrange + audio-arrange wiring
│       ├── config.py       ← NarrateConfig Pydantic model
│       ├── cache.py        ← cache read/write helpers
│       └── py.typed        ← PEP 561 marker
├── tests/
│   ├── fixtures/           ← small JSON decks + reference WAVs
│   ├── test_narrate.py
│   ├── test_capture.py
│   ├── test_assemble.py
│   └── test_cli.py
├── pyproject.toml
├── README.md
└── LICENSE
© Trollfabriken AITrix AB — MIT licensed