Narration-first DSL + audio pipeline for Remotion videos

Babulus (Voiceover → Video Timing)

Babulus compiles a narration-first DSL into a timed script JSON. Remotion uses that JSON as the source of truth for scene/cue timing by converting seconds → frames at runtime.

The One-Sentence Mental Model

Your .babulus.yml defines IDs + times, Babulus outputs JSON with startSec/endSec, and your Remotion code does two explicit mappings:

  • scene.id → which React scene component to render
  • cue.id → which element/animation to start/show at that time

That’s “the connection”.

Data Shape (JSON)

script.json contains:

  • scenes[]: { id, title, startSec, endSec, cues[] }
  • cues[]: { id, label, startSec, endSec, text, bullets? }
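As a sketch of how that shape might be typed on the Remotion side (these interface names are illustrative, not exported by Babulus; the field names follow the lists above):

```typescript
// Type sketch of the documented script.json shape. `bullets` is optional,
// matching `bullets?` above.
interface Cue {
  id: string;
  label: string;
  startSec: number;
  endSec: number;
  text: string;
  bullets?: string[];
}

interface Scene {
  id: string;
  title: string;
  startSec: number;
  endSec: number;
  cues: Cue[];
}

interface Script {
  scenes: Scene[];
}

// Example: derive a scene's duration in seconds from its bounds.
const sceneDurationSec = (scene: Scene): number => scene.endSec - scene.startSec;
```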

The DSL (YAML)

A .babulus.yml file is a YAML document with a top-level scenes: list.

audio:
  # Optional: default provider for `kind: sfx` clips.
  sfx_provider: elevenlabs

scenes:
  - id: intro
    title: "Intro"
    time: "0s-8s"
    cues:
      - id: hook
        label: "Hook"
        time: "0s-3s"
        voice: "In this video, we'll build an agent."
      - id: bullets
        label: "Bullets"
        time: "3s-8s"
        voice: "We'll cover three things."
        bullets:
          - "Tools"
          - "Memory"
          - "Errors"
      - id: whoosh-demo
        label: "Transition"
        voice: "Now let's transition."
        audio:
          - kind: sfx
            id: whoosh
            at: "+0.0s"    # relative to this cue's start time
            volume: 25%    # accepts 0..1, 0..100, or "80%"
            prompt: "Fast cinematic whoosh transition, clean, no voice"
            duration_seconds: 3
            variants: 8
            pick: 2

Time formats

time may be either:

  • A range string: "12.5s-18.3s"
  • A relative range string inside a timed scene: "+0.5s-+1.2s" (adds the scene’s startSec)

If you omit id for a scene/cue, Babulus derives one from title/label (slugified). It’s optional, but for real projects you usually want explicit IDs so you can rename titles/labels without breaking the Remotion mapping.
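To make the two formats concrete, here is an illustrative parser for the documented range strings (`parseTimeRange` is not Babulus's actual API, just a sketch of the semantics described above):

```typescript
// Illustrative parser for the documented `time` formats:
//   "12.5s-18.3s"   → absolute seconds
//   "+0.5s-+1.2s"   → relative to the enclosing scene's startSec
const parseSec = (tok: string): number =>
  parseFloat(tok.replace(/^\+/, "").replace(/s$/, ""));

function parseTimeRange(
  time: string,
  sceneStartSec = 0,
): { startSec: number; endSec: number } {
  const [a, b] = time.split("-");
  const relative = a.startsWith("+"); // relative ranges add the scene's startSec
  const base = relative ? sceneStartSec : 0;
  return { startSec: base + parseSec(a), endSec: base + parseSec(b) };
}
```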

Compile to JSON (CLI)

Install from PyPI:

pip install babulus

Or, for local development from a clone of this repo:

python -m pip install -e . -U

You can then run either babulus ... (recommended) or python -m babulus ....

Manual timing compile

babulus compile \
  --dsl path/to/video.babulus.yml \
  --out path/to/script.json \
  --pretty

Transcript-driven alignment is supported if you pass --transcript path/to/words.json, where the JSON contains:

{ "words": [{ "word": "Hello", "start": 0.0, "end": 0.2 }] }

Audio-driven generation (the “real” pipeline)

This mode is for when you want cue timing to come from the actual generated audio (plus explicit pauses), rather than hard-coded time: ranges.

babulus generate --dsl path/to/video.babulus.yml

Defaults (derived from the DSL filename <video>.babulus.yml):

  • script-out: src/videos/<video>/<video>.script.json
  • timeline-out: src/videos/<video>/<video>.timeline.json
  • audio-out: public/babulus/<video>.wav
  • out-dir: .babulus/out/<video>

If you have exactly one DSL under ./content/, you can omit --dsl entirely:

babulus generate

Idempotence / caching:

  • By default, generate reuses cached audio segments when the inputs are unchanged (so changing one word only regenerates the affected clip).
  • Use --fresh to force regeneration of everything.

Watch mode

Regenerate automatically when you edit the DSL (and ./.babulus/config.yml if present):

babulus generate --watch --dsl path/to/video.babulus.yml

Clean

Remove generated artifacts (script/timeline/audio, .babulus/out/, staged public/babulus/ files).

Dry-run (prints what would be deleted):

babulus clean

Actually delete:

babulus clean --yes

Config & Credentials

Babulus loads API credentials from config files in this order (unless BABULUS_PATH is set):

  1. ./.babulus/config.yml
  2. ~/.babulus/config.yml

If BABULUS_PATH is set, it will use:

  • $BABULUS_PATH if it points to a file
  • $BABULUS_PATH/config.yml if it points to a directory
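The lookup order above can be sketched as a small function (illustrative only, not Babulus's internal implementation; `exists` and `isDir` stand in for filesystem checks):

```typescript
// Sketch of the documented config lookup order. $BABULUS_PATH wins when set:
// a file is used directly, a directory gets "/config.yml" appended. Otherwise
// the project-local config is preferred over the home-directory one.
function resolveConfigPath(
  babulusPath: string | undefined,
  exists: (p: string) => boolean,
  isDir: (p: string) => boolean,
  home = "~",
): string | undefined {
  if (babulusPath) {
    return isDir(babulusPath) ? `${babulusPath}/config.yml` : babulusPath;
  }
  const candidates = ["./.babulus/config.yml", `${home}/.babulus/config.yml`];
  return candidates.find(exists); // first existing candidate wins
}
```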

Example config.yml shape:

providers:
  elevenlabs:
    api_key: "..."
    voice_id: "..."
  openai:
    api_key: "..."
  azure_speech:
    api_key: "..."
    region: "eastus"
  aws_polly:
    region: "us-east-1"
    voice_id: "Joanna"

Providers (TTS)

Set voiceover.provider in your .babulus.yml to one of:

  • dry-run (silent WAVs with estimated durations)
  • elevenlabs (TTS via ElevenLabs; segments are stored as MP3 and concatenated to your requested --audio-out)
  • openai (TTS via OpenAI, writes WAV)
  • aws-polly (TTS via AWS Polly, writes WAV by wrapping PCM)
  • azure-speech (TTS via Azure Cognitive Services Speech, writes WAV)

Credentials/config live in ./.babulus/config.yml or ~/.babulus/config.yml:

  • ElevenLabs: providers.elevenlabs.api_key, plus providers.elevenlabs.voice_id for TTS
  • OpenAI: providers.openai.api_key
  • Azure: providers.azure_speech.api_key + providers.azure_speech.region
  • AWS Polly: uses the standard AWS credential chain (env vars like AWS_ACCESS_KEY_ID, ~/.aws/credentials, SSO, etc.). Region/voice go in providers.aws_polly.

ElevenLabs pronunciation dictionaries

To fix pronunciation of project-specific words (like “Tactus”), you have two options:

Option A: Define lexemes in the DSL (recommended)

Put lexemes directly in the DSL, and Babulus will transparently create/update an ElevenLabs pronunciation dictionary in your workspace and attach it to every TTS request:

voiceover:
  provider: elevenlabs
  pronunciation_dictionary:
    name: tactus
  pronunciations:
    - lexeme:
        grapheme: "Tactus"
        alias: "tack-tus"

Notes:

  • The cloud dictionary is cached/tracked in .babulus/out/<video>/manifest.json so it only updates when lexemes change.
  • Babulus prepends the auto-managed dictionary to any explicitly listed dictionaries (max 3 total).

Option B: Reference an existing dictionary ID

Add a pronunciation dictionary in ElevenLabs yourself and reference it from the DSL:

voiceover:
  provider: elevenlabs
  pronunciation_dictionaries:
    - id: "pd_your_dictionary_id"
      version_id: null

This maps to ElevenLabs pronunciation_dictionary_locators on each TTS request (max 3 per request).

Pauses & Segments (Voiceover Authoring)

In generate mode, cue timing is computed from audio segment durations. You can also insert explicit pauses.

Delaying the start of a cue’s narration

If you want the voice to start later (while the scene is already on screen), put pause_seconds on the voice: mapping (or make the first segments[] item a pause).

scenes:
  - id: problem
    title: "Problem"
    cues:
      - id: problem
        label: "Problem"
        voice:
          pause_seconds: 2
          segments:
            - voice: "This line will start 2 seconds after the cue begins."

Important: voice.segments runs in order. A pause_seconds segment after a voice segment is a pause after speaking, not a delay before it.

You can also delay an individual voice segment by putting pause_seconds on that segment:

voice:
  segments:
    - voice: "First sentence."
    - voice: "Second sentence after a beat."
      pause_seconds: 0.5

Per-cue segments

Instead of a single voice: field, a cue can use segments: to split narration into smaller chunks and insert pauses:

scenes:
  - title: "Example"
    cues:
      - id: hook
        label: "Hook"
        voice:
          segments:
            - voice: "Tool-using agents are useful."
            - pause_seconds: 0.25
            - voice: "And dangerous."
              trim_end_sec: 0.12

Trimming breaths / tails

Some TTS voices add a little breath or tail at the end of a segment. You can trim that off:

voiceover:
  trim_end_seconds: 0.12

Or override per segment with trim_end_sec (legacy key) or trim_end_seconds (preferred).

Default pause between cues (with optional jitter)

You can set a default pause between cue items, optionally randomized (deterministically via seed):

voiceover:
  seed: 1337
  pause_between_items_seconds: 0.1
  pause_between_items_gaussian:
    mean_seconds: 0.12
    std_seconds: 0.05
    min_seconds: 0.02
    max_seconds: 0.35
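One way such a deterministic, clamped jitter could be computed is a seeded PRNG feeding a normal sample that is then clamped to the configured bounds. This is purely illustrative; Babulus's actual RNG and sampling may differ:

```typescript
// Tiny seeded PRNG (mulberry32) + Box-Muller transform, clamped to
// [minSec, maxSec]. Same seed → same pause sequence (deterministic jitter).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function gaussianPause(
  rand: () => number,
  meanSec: number,
  stdSec: number,
  minSec: number,
  maxSec: number,
): number {
  // Box-Muller: two uniform samples → one normally distributed sample
  const u1 = Math.max(rand(), Number.MIN_VALUE);
  const u2 = rand();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return Math.min(maxSec, Math.max(minSec, meanSec + stdSec * z));
}
```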

Multi-Track Audio (SFX / Music / Files)

Declare audio clips next to the cue or scene where they should play.

audio:
  sfx_provider: elevenlabs
  music_provider: elevenlabs
  library:
    whoosh:
      kind: sfx
      prompt: "Quick whoosh transition"
      duration_seconds: 3
      variants: 5

scenes:
  - id: problem
    title: "Problem"
    cues:
      - id: problem
        label: "Problem"
        voice: "..."
        audio:
          - use: whoosh
            at: "+0.0s"     # relative to this cue's start
            volume: 35%
            pick: 2         # per-use: choose variant

  - id: intro
    title: "Intro"
    # Optional: scene-level audio (relative to scene start)
    audio:
      # Generated background music (default duration: this scene’s duration)
      - kind: music
        id: bed
        prompt: "Warm ambient background music, minimal percussion, no vocals"
        volume: 20%
        # play_through: true     # extend to end of video
        # duration_seconds: 30   # override default duration
      # Or, reference an existing file under `public/`:
      - kind: file
        id: bed-file
        src: "music/bed.mp3"
        volume: 20%
    cues:
      - id: hook
        label: "Hook"
        voice: "..."

Key ideas:

  • audio: under a cue defaults to playing at the cue start; use at: "+0.2s" to offset.
  • Use audio.library + use: to reuse the same generated clip in multiple places (with independent pick, volume, pause_seconds).
  • Use explicit anchors if needed: at: "cue:<cueId>+0.2s" or at: "scene:<sceneId>+0.2s".
  • SFX supports variants + pick for auditioning options.
  • Music clips default to the current scene duration; set play_through: true to extend to the end of the video.
  • Any clip can fade its volume over time using fade_to / fade_out (default fade_duration_seconds: 2).
  • src for kind: file should be a path under Remotion’s public/ directory (so staticFile(src) works).
  • volume accepts either 0..1 (Remotion gain) or 0..100 / "80%" (percent).
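A normalizer for the accepted volume forms might look like this (illustrative sketch, not Babulus's internal code; edge-case handling may differ):

```typescript
// Normalize the documented volume forms to a Remotion gain in 0..1:
//   "80%" → 0.8 (percent string), 35 → 0.35 (number > 1 treated as percent),
//   0.2 → 0.2 (already a gain).
function normalizeVolume(v: number | string): number {
  if (typeof v === "string") {
    return parseFloat(v.replace(/%\s*$/, "")) / 100;
  }
  return v > 1 ? v / 100 : v;
}
```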

Volume fades example (clip-local seconds):

audio:
  music_provider: elevenlabs

scenes:
  - id: title
    title: "Title"
    audio:
      - kind: music
        id: bed
        prompt: "Ambient background music, no vocals"
        volume: 92%
        fade_to:
          volume: 50%
          after_seconds: 4
          # fade_duration_seconds: 4   # optional (default 2)
        fade_out:
          volume: 92%
          before_end_seconds: 4
          # fade_duration_seconds: 4   # optional (default 2)

What Babulus generates:

  • --timeline-out JSON includes audio.tracks[].clips[] with computed startSec.
  • For SFX variants, Babulus caches all candidates under --out-dir and (when --audio-out points into public/) stages the chosen SFX into public/babulus/sfx/<clipId>.wav and writes src: "babulus/sfx/<clipId>.wav" into the timeline so Remotion can play it.
  • For narration, when --audio-out points into public/, Babulus also stages each generated TTS segment under public/babulus/<video>/segments/ and emits them as separate kind: file clips (so you can see each utterance as its own audio item in Remotion).

ElevenLabs SFX integration:

  • Set audio.sfx_provider: elevenlabs in your .babulus.yml (or set audio.default_sfx_provider in ./.babulus/config.yml), and use kind: sfx clips with variants + pick.
  • Babulus caches variants under --out-dir and stages the chosen file under public/babulus/sfx/ so Remotion can play it.

Auditioning SFX variants (workflow)

SFX clips can generate multiple variants. Babulus keeps all variants cached under .babulus/out/<video>/sfx/.

To audition different variants without editing the DSL, use the selection file under .babulus/out/<video>/selections.json via the CLI:

babulus sfx next --clip whoosh --variants 8
babulus sfx prev --clip whoosh --variants 8
babulus sfx set --clip whoosh --pick 3

With babulus generate --watch, changing the pick will trigger a re-generate so Remotion updates the staged public/babulus/sfx/<clipId>.* file.

If you’re not using --watch, you can also apply the change immediately:

babulus sfx next --clip whoosh --variants 8 --apply

Archiving options you don’t want to see right now:

babulus sfx archive --clip whoosh --keep-pick
babulus sfx restore --clip whoosh
babulus sfx clear --clip whoosh

Remotion: The Two Mappings

1) scene.id → React scene component

You render a Sequence per scene using scene.startSec/endSec, then route by scene.id:

import React from "react";
import { Sequence, useVideoConfig } from "remotion";
import scriptJson from "./script.json";

const secondsToFrames = (sec: number, fps: number) => Math.round(sec * fps);

const SceneRouter: React.FC<{ scene: any }> = ({ scene }) => {
  switch (scene.id) {
    case "intro":
      return <IntroScene scene={scene} />;
    default:
      return null;
  }
};

export const MyVideo: React.FC = () => {
  const { fps } = useVideoConfig();
  return (
    <>
      {scriptJson.scenes.map((scene) => {
        const from = secondsToFrames(scene.startSec, fps);
        const to = secondsToFrames(scene.endSec, fps);
        return (
          <Sequence key={scene.id} from={from} durationInFrames={to - from}>
            <SceneRouter scene={scene} />
          </Sequence>
        );
      })}
    </>
  );
};

2) cue.id → element/animation timing

Inside a scene component, find the cue you care about and convert cue.startSec to a frame.

const cue = scene.cues.find((c) => c.id === "hook"); // <- from the DSL
if (!cue) return null;
const cueStartFrame = secondsToFrames(cue.startSec, fps);
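One wrinkle worth noting: since each scene renders inside its own <Sequence>, the frame counter inside the scene component restarts at 0, so cue frames need to be computed relative to the scene's startSec. A helper like this (illustrative, not part of Babulus) captures that:

```typescript
// Convert a cue's absolute seconds into frames relative to its scene,
// because inside a parent <Sequence> the current frame restarts at 0.
const secondsToFrames = (sec: number, fps: number) => Math.round(sec * fps);

function cueFrames(
  cue: { startSec: number; endSec: number },
  sceneStartSec: number,
  fps: number,
): { from: number; durationInFrames: number } {
  return {
    from: secondsToFrames(cue.startSec - sceneStartSec, fps),
    durationInFrames: secondsToFrames(cue.endSec - cue.startSec, fps),
  };
}
```

You can then pass the result straight to a nested <Sequence from={from} durationInFrames={durationInFrames}> around the element the cue controls.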

Audio (Typical)

If you generate a voiceover audio file, play it at the top-level:

import { Audio, staticFile } from "remotion";

<Audio src={staticFile("voiceover.mp3")} />;

Your script’s startSec/endSec should reference absolute seconds from the start of that audio track.

Audio cueing in Remotion (layered tracks)

Babulus generate writes an additional timeline.json which includes audio.tracks[] events (SFX/music/file clips).

In this repo, you can render them using src/babulus/AudioTimeline.tsx (it creates <Sequence><Audio/></Sequence> per clip).

Concrete Example (This Repo)

  • DSL (project-owned): content/intro.babulus.yml
  • Compiled JSON (generated): src/videos/intro/intro.script.json
  • Scene mapping (scene.id → React): src/videos/intro/IntroVideo.tsx
  • Cue timing usage (Solution cards): src/videos/intro/IntroVideo.tsx

Note: the YAML snippet above uses intro/hook as simple examples. In the actual Intro video DSL, the scene IDs are title, problem, solution, code, cta.
