Tools to make music videos

muvid

Tools to make music videos. Orchestrates the local ecosystem (falaw, lookbook, lacing, an, mixing) into a song-to-video pipeline. The user is the director; an agent (Claude in the terminal, or the local web UI) drives the stages.

Status: v0+. The pipeline (init → transcribe → align → cast → environments → script → render → compose) works end to end. Render strategies: lipsync, image_to_video, text_to_video, animation, still. The CLI, the Claude skill (.claude/skills/muvid/), and a single-page local UI all dispatch to the same Python functions.

The v0 audit follow-up (improvement_ideas.md) shipped:

  • pluggable aligners
  • cost rollups and a --budget flag
  • structured falaw progress events, streamed to .muvid/fal_events.jsonl
  • an end-to-end smoke fixture
  • lacing as the single source of truth (SSOT) for word timings, so there are no redundant whisper passes inside an
  • a muvid.contracts adapter layer that maps to sibling-package shapes

See misc/docs/design.md for the design rationale and misc/docs/alignment_references.md for the lyric-alignment literature muvid builds on.

Install

pip install -e ./muvid
pip install -e "./muvid[ui]"   # adds FastAPI + uvicorn for the web UI (quoted so shells like zsh don't glob the brackets)

This package depends on local sibling packages (falaw, lookbook, lacing, mixing); install them editable first.

System: ffmpeg and ffprobe on PATH. Env: ELEVENLABS_API_KEY (for transcription), FAL_KEY (for fal.ai generation).
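The prerequisites above can be checked before the first run. The following is a minimal sketch using only the Python standard library; `missing_prerequisites` is a hypothetical helper written for this example and is not part of muvid.

```python
import os
import shutil

# Names taken from the requirements listed above.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
REQUIRED_ENV = ("ELEVENLABS_API_KEY", "FAL_KEY")


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = [
        f"{tool} not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    problems += [
        f"{name} is not set"
        for name in REQUIRED_ENV
        if not env.get(name)
    ]
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Running the file directly prints one line per missing tool or variable and nothing when everything is in place.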

30-second tour

# Bootstrap a project around a song.
muvid init ~/muvid/park-bench --song ~/Downloads/park_bench.mp3 --title "Park Bench"

# Transcribe to a draft lyrics.md (you'll edit it).
muvid transcribe ~/muvid/park-bench

# … you edit lyrics/lyrics.md to fix mishears and add [section] tags …

# Align lyrics.md against the transcript and write lyrics/alignment.annot.
muvid align ~/muvid/park-bench

# Cast a character: card, then images, then lookbook curation.
muvid character ~/muvid/park-bench maya --description "mid-30s, dark curly hair, wary eyes"
muvid character-generate ~/muvid/park-bench maya --n 6
muvid character-curate    ~/muvid/park-bench maya --k 8

# Establish an environment.
muvid environment ~/muvid/park-bench park_bench --description "wooden park bench at dusk"
muvid environment-render ~/muvid/park-bench park_bench

# Write/edit script/script.md (let an agent draft it from the lyrics + cast),
# then sync it back into project.json:
muvid script-apply ~/muvid/park-bench

# Estimate cost before committing fal calls.
muvid estimate-cost ~/muvid/park-bench

# Render every shot (optionally gated on a USD budget), then composite.
muvid render  ~/muvid/park-bench --budget=2.50
muvid compose ~/muvid/park-bench
# → ~/muvid/park-bench/output/final.mp4

# Inspect progress.
muvid status        ~/muvid/park-bench           # human-readable
muvid status --json ~/muvid/park-bench           # structured shape

# Or open the local UI (FastAPI + single HTML page).
muvid serve ~/muvid/park-bench
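The estimate-cost and --budget steps in the tour amount to a cost rollup over pending shots followed by a gate. The sketch below illustrates that idea only: the `PendingShot` class, the function names, and above all the per-second prices are assumptions made up for this example, not muvid's or fal.ai's actual figures.

```python
from dataclasses import dataclass

# ASSUMPTION: invented per-second prices in USD, for illustration only.
ASSUMED_USD_PER_SECOND = {
    "lipsync": 0.10,
    "image_to_video": 0.08,
    "text_to_video": 0.08,
    "animation": 0.0,
    "still": 0.0,
}


@dataclass
class PendingShot:
    """Hypothetical stand-in for a shot that still needs rendering."""
    shot_id: str
    strategy: str
    duration_s: float


def estimate_cost(shots):
    """Sum the assumed cost of every pending shot, rounded to cents."""
    total = sum(ASSUMED_USD_PER_SECOND[s.strategy] * s.duration_s for s in shots)
    return round(total, 2)


def check_budget(shots, budget_usd=None):
    """Return the estimate, or raise before any paid call if it exceeds the budget."""
    total = estimate_cost(shots)
    if budget_usd is not None and total > budget_usd:
        raise RuntimeError(
            f"estimated ${total:.2f} exceeds budget ${budget_usd:.2f}"
        )
    return total
```

Checking the budget before dispatching any shot means an over-budget project fails fast instead of stopping halfway through a paid render.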

Pluggable aligners

muvid align --aligner=... accepts:

  • scribe-greedy (default) — Scribe transcript + greedy token-match.
  • user — caller-supplied line_index → (start, end) timings.
  • whisperx-lite — local faster-whisper, falls back to scribe-greedy if no audio_path= is given.
  • stars — singing-grade joint inference (stub; NotImplementedError).

Plug your own with muvid.align.register_aligner(name, fn, ...).
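To make "greedy token-match" concrete, here is a small self-contained sketch of the general technique: walk the lyric lines in order and consume timestamped transcript words as they match. It is not muvid's scribe-greedy implementation. The function names, the `lookahead` parameter, and the input shapes are assumptions for this example, and the callable signature that register_aligner expects is not specified here, so the function would need adapting before it could be registered.

```python
import re


def tokens(text):
    """Lowercase word tokens with surrounding punctuation stripped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def greedy_align(lines, words, lookahead=5):
    """Map line_index -> (start, end) by consuming transcript words in order.

    lines: lyric lines as strings.
    words: [(word, start_s, end_s), ...] in time order (assumed shape).
    """
    timings = {}
    cursor = 0
    for index, line in enumerate(lines):
        first = last = None
        for token in tokens(line):
            # Search a few words ahead so one mishear does not derail the rest.
            window = range(cursor, min(cursor + lookahead, len(words)))
            hit = next(
                (j for j in window if tokens(words[j][0]) == [token]), None
            )
            if hit is None:
                continue  # no nearby match: skip this lyric token
            if first is None:
                first = hit
            last = hit
            cursor = hit + 1
        if first is not None:
            timings[index] = (words[first][1], words[last][2])
    return timings
```

Because the match is greedy, a lyric token missing from the transcript can occasionally latch onto a word that belongs to the next line; a small `lookahead` keeps that risk bounded at the cost of tolerating fewer inserted words.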

Interactive character curation

When a recipe's automatic top-k isn't quite right, replay a JSON file of decisions:

# decisions.json:
# [{"keep": ["<image_id>"], "reject": [...], "stop": false}, ...]
muvid character-curate-interactive ~/muvid/park-bench maya \
    --decisions decisions.json --k 8 --present 6
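A decisions file in the shape shown above can be produced from Python. This is a sketch under assumptions: `build_decisions` and `write_decisions` are hypothetical helpers, the image ids in the usage note are placeholders, and the overlap check is a sanity check added for this example rather than a documented muvid rule.

```python
import json
from pathlib import Path


def build_decisions(rounds):
    """Normalise curation rounds to the keep / reject / stop shape."""
    cleaned = []
    for entry in rounds:
        keep = list(entry.get("keep", []))
        reject = list(entry.get("reject", []))
        # Sanity check for this sketch: an id should not be kept and rejected.
        overlap = set(keep) & set(reject)
        if overlap:
            raise ValueError(f"ids both kept and rejected: {sorted(overlap)}")
        cleaned.append(
            {"keep": keep, "reject": reject, "stop": bool(entry.get("stop", False))}
        )
    return cleaned


def write_decisions(path, rounds):
    """Write the normalised rounds as JSON and return them."""
    cleaned = build_decisions(rounds)
    Path(path).write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    return cleaned
```

For example, `write_decisions("decisions.json", [{"keep": ["img_a"], "reject": ["img_b"]}, {"keep": ["img_c"], "stop": True}])` writes two rounds, with the second one ending the session.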

How it fits the ecosystem

| Concern | Owner |
| --- | --- |
| AI media (TTS, image, video, lipsync, voice clone) | falaw |
| Reference image curation (LoRA-style sets) | lookbook |
| Timeline / interval annotations (lyrics, sections) | lacing |
| Structured 2D animation (cutout characters) | an |
| Audio/video editing + ElevenLabs Scribe | mixing |
| Project, pipeline, dispatcher | muvid |

muvid is the orchestrator: a folder layout (project.json + song/, lyrics/, characters/, environments/, script/, shots/, output/), a content-addressed cache (re-render only what changed), and a uniform dispatch layer with three surfaces (CLI, skill, UI) all calling the same Python functions in muvid.facade.
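The "re-render only what changed" behaviour of a content-addressed cache can be sketched as follows: derive a key from the shot's spec plus digests of its input files, and skip the render when that key has been seen. The function names, the spec fields, and the choice of SHA-256 over canonical JSON are assumptions for illustration; muvid's actual keying scheme is not described here.

```python
import hashlib
import json


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(spec, input_digests):
    """Key that depends only on content, not on dict ordering or file paths.

    spec: JSON-serialisable shot description (hypothetical fields).
    input_digests: mapping of input role -> content digest.
    """
    payload = json.dumps(
        {"spec": spec, "inputs": input_digests},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_render(spec, input_digests, cached_keys):
    """True when no cached output exists for exactly these inputs."""
    return cache_key(spec, input_digests) not in cached_keys
```

Editing a prompt or swapping an anchor image changes the key, so only the affected shot is rendered again, while untouched shots keep hitting the cache.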

Render strategies

Each shot picks one. The dispatcher resolves shared inputs (audio slice, lyric lines that fall in the shot interval, character / env anchor images) once and hands them to the strategy:

| strategy | use it for | calls |
| --- | --- | --- |
| lipsync | character singing on screen | falaw.animate_face |
| image_to_video | cinematic shot, env anchor as i2v seed | falaw.image_to_video |
| text_to_video | no anchor, pure prompt | falaw.text_to_video |
| animation | stylized 2D cutout | an.orchestrate |
| still | single image held for the duration | ffmpeg |
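The dispatch pattern described above (resolve shared inputs once, then hand them to the chosen strategy) can be sketched with a small registry. Everything here is illustrative: `Shot`, `Context`, and the stub renderers are hypothetical stand-ins rather than muvid's ShotSpec or RenderContext, the stubs return strings instead of calling falaw or ffmpeg, and treating "falls in the shot interval" as interval overlap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: str
    strategy: str
    start: float
    end: float


@dataclass
class Context:
    """Shared inputs resolved once per shot."""
    shot: Shot
    lyric_lines: list


def lines_in_interval(timed_lines, start, end):
    """Lyric lines whose (start, end) overlaps the shot interval."""
    return [text for text, (s, e) in timed_lines if s < end and e > start]


STRATEGIES = {}


def strategy(name):
    """Decorator that registers a renderer under a strategy name."""
    def register(fn):
        STRATEGIES[name] = fn
        return fn
    return register


@strategy("still")
def render_still(ctx):
    return f"still:{ctx.shot.shot_id}"


@strategy("lipsync")
def render_lipsync(ctx):
    return f"lipsync:{ctx.shot.shot_id}:{len(ctx.lyric_lines)} lines"


def dispatch(shot, timed_lines):
    """Build the context once, then call the renderer the shot picked."""
    if shot.strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {shot.strategy}")
    lines = lines_in_interval(timed_lines, shot.start, shot.end)
    return STRATEGIES[shot.strategy](Context(shot, lines))
```

Keeping input resolution in `dispatch` means each strategy only has to describe how it renders, and adding a new strategy is one decorated function.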

The Claude skill

.claude/skills/muvid/SKILL.md walks Claude (or any agent that follows Claude Code skills) through the eight stages. It will:

  • run muvid status first to see where you are
  • pick the next stage and offer to run it
  • never re-transcribe after you've edited lyrics.md
  • never --force a render without asking
  • offer to draft script/script.md from your lyrics + cast

Layout

muvid/
  __init__.py         public surface (the facade)
  __main__.py         CLI (argh)
  schema.py           ProjectSpec, ShotSpec, SectionSpec, …
  project.py          MusicVideoProject (folder facade)
  lyrics.py           transcribe + parse/render lyrics.md
  align.py            pluggable aligners + lacing SqliteStore writer
  characters.py       cards + ref images + lookbook curation (incl. interactive)
  environments.py     cards + establishing-image generation
  script.py           script.md ↔ ShotSpec list
  cost.py             render-cost rollup over pending shots
  events.py           pipe falaw progress events → .muvid/fal_events.jsonl
  contracts.py        adapters: muvid SSOT ↔ falaw / an / lacing shapes
  renderers/
    __init__.py       dispatcher + RenderContext + caching
    lipsync.py        falaw.animate_face
    image_to_video.py falaw.image_to_video
    text_to_video.py  falaw.text_to_video
    still.py          ffmpeg single-image loop
    animation.py      handoff to `an.orchestrate` with lacing-driven lipsync
  compose.py          ffmpeg concat + overlay song audio
  facade.py           top-level verbs the CLI/skill/UI call
  ui/
    app.py            FastAPI app
    static/index.html single-page UI
.claude/skills/muvid/SKILL.md
misc/docs/design.md             full design rationale
misc/docs/improvement_ideas.md  v0 audit + post-audit follow-through
