Recursive HTML→MP4 rendering with vision-based self-correction

recursive-animation-engine

An HTML → MP4 rendering pipeline with built-in self-correction loops and multi-provider LLM support. Features structured video planning (plan → build → combine), Gemini TTS 3.1 Flash voiceover generation, and vision verification with multiple backends.

┌─────────────────────────────────────────────────────────────────────┐
│                         PLAN PHASE                                  │
│  User answers → LLM reasons over acts → Structured VideoPlan       │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        BUILD PHASE                                  │
│  For each act: Render → Verify keyframes → Vision check → Patch     │
│  + Generate voiceover with Gemini TTS 3.1 Flash                     │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        COMBINE PHASE                                │
│  Stitch acts → Mix audio → Deliver final MP4                        │
└─────────────────────────────────────────────────────────────────────┘

Always-on progress viewer:

$ reng watch
reng watch — tailing ~/.recursive-animation-engine/events.jsonl

14:02:01 a1b2c3 ▶ run start   ./build_output/act01  max=3
14:02:01 a1b2c3 ■ iteration 1
14:02:01 a1b2c3   render…
14:02:08 a1b2c3   ✓ rendered out.mp4 (7.2s)
14:02:08 a1b2c3   extracting 3 keyframe(s)…
14:02:10 a1b2c3     ✓ out_frame01.png OK
14:02:12 a1b2c3     ◦ out_frame02.png Bar exceeds container at 50% — text "100%" clips off right edge.
14:02:14 a1b2c3     ✓ out_frame03.png OK
14:02:14 a1b2c3 ◦ iteration 1 — 1 issue(s)
14:02:14 a1b2c3 ■ iteration 2
14:02:14 a1b2c3   render…
14:02:22 a1b2c3   ✓ rendered out.mp4 (7.8s)
14:02:22 a1b2c3     ✓ out_frame01.png OK
14:02:24 a1b2c3     ✓ out_frame02.png OK
14:02:26 a1b2c3     ✓ out_frame03.png OK
14:02:26 a1b2c3 ✓ iteration 2 passed
14:02:26 a1b2c3 ■ run passed (2 iter) → out.mp4

Features

Multi-provider LLM support: OpenRouter, Google Gemini API, Fireworks AI
Default vision model: Gemma 3 (latest) for cost-effective verification
Structured planning: Interactive plan phase with user questions → act-based reasoning
Act-by-act building: Build complex videos scene by scene with vision verification per act
Gemini TTS 3.1 Flash: Generate high-quality voiceovers with SSML support
Native Claude Code integration: Text generation uses Claude Code context by default
Vision verification loops: Render → extract keyframes → vision check → iterate

Install

From PyPI (recommended):

pip install recursive-animation-engine

Latest from GitHub (if you need unreleased changes):

pip install git+https://github.com/Science-Prof-Robot/recursive-animation-engine

Or from source:

git clone https://github.com/Science-Prof-Robot/recursive-animation-engine
cd recursive-animation-engine
pip install -e .

System dependencies

Tool	Install
`bun`	`curl -fsSL https://bun.sh/install | bash`
`ffmpeg`	`apt-get install ffmpeg` / `brew install ffmpeg`
`chromium`	`apt-get install chromium` (set `PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium`)
`node`	`apt-get install nodejs` / use nvm
Hyperframes	`git clone` → `bun install` → `bun run build`
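
A preflight check can catch a missing tool before a long render starts. The sketch below is not part of the package: `missing_tools` is a hypothetical helper, and the executable names are assumed to match the table above. On some systems the Chromium binary has a different name, in which case `PUPPETEER_EXECUTABLE_PATH` is the thing to verify instead.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executable names assumed from the table above; adjust for your platform.
REQUIRED_TOOLS = ("bun", "ffmpeg", "chromium", "node")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.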

Environment variables

Variable	Required	Default	Description
`OPENROUTER_API_KEY`	yes*	—	OpenRouter API key
`GEMINI_API_KEY`	yes*	—	Google Gemini API key (for TTS and native Gemini)
`FIREWORKS_API_KEY`	yes*	—	Fireworks AI API key
`RENG_LLM_PROVIDER`	no	`openrouter`	Default provider: `openrouter`, `gemini`, `fireworks`, `native`
`RENG_VISION_PROVIDER`	no	`openrouter`	Provider for vision tasks
`RENG_TEXT_PROVIDER`	no	`native`	Text generation provider (`native` = Claude Code context)
`RENG_VISION_MODEL`	no	`google/gemma-3-27b-it`	Vision model ID
`HYPERFRAMES_CLI`	no	`~/hyperframes/...`	Path to Hyperframes CLI
`RENG_EVENT_LOG`	no	`~/.recursive-animation-engine/events.jsonl`	Event log path

*At least one provider key is required; which one depends on the providers you select.
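
To make the defaults in the table concrete, here is a minimal sketch of overlaying the environment on those defaults and reporting which API keys are still missing. `resolve_settings` and `missing_keys` are hypothetical names. How the engine itself resolves precedence between `RENG_LLM_PROVIDER` and the task-specific variables is not documented here, so this sketch treats every selected provider as needing its key and assumes `native` needs none.

```python
import os
from typing import Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "RENG_LLM_PROVIDER": "openrouter",
    "RENG_VISION_PROVIDER": "openrouter",
    "RENG_TEXT_PROVIDER": "native",
    "RENG_VISION_MODEL": "google/gemma-3-27b-it",
}

# Which API key each provider needs; "native" is assumed to need none.
PROVIDER_KEYS = {
    "openrouter": "OPENROUTER_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Overlay the environment on the documented defaults."""
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """List API key variables that the selected providers need but env lacks."""
    settings = resolve_settings(env)
    selected = {
        settings["RENG_LLM_PROVIDER"],
        settings["RENG_VISION_PROVIDER"],
        settings["RENG_TEXT_PROVIDER"],
    }
    needed = {PROVIDER_KEYS[p] for p in selected if p in PROVIDER_KEYS}
    return sorted(key for key in needed if not env.get(key))
```

For an authoritative check of your actual setup, `reng provider test` and `reng provider env` remain the tools to use.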

Usage

Full workflow: Plan → Build → Combine

1. Plan Phase

Interactively create a structured video plan:

reng plan -o video_plan.json

This asks about:

Purpose and topic
Target duration (short/medium/long)
Visual style preferences
Voiceover requirements
Target audience

Then reasons over your answers to produce acts with timing and scripts.
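
The schema of `video_plan.json` is not shown in this README. Purely as an illustration of "acts with timing and scripts", a plan could be modeled as below. Every class and field name here is hypothetical and may not match the file the tool actually writes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Act:
    # All field names here are hypothetical, not the package's schema.
    title: str
    duration_s: float
    script: str = ""  # empty string means no voiceover for this act


@dataclass
class VideoPlan:
    topic: str
    acts: list[Act] = field(default_factory=list)

    def total_duration(self) -> float:
        """Sum of the per-act durations in seconds."""
        return sum(act.duration_s for act in self.acts)

    def to_json(self) -> str:
        """Serialize the plan, including nested acts, to JSON text."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "VideoPlan":
        """Rebuild a plan from the JSON produced by to_json()."""
        data = json.loads(text)
        return cls(topic=data["topic"], acts=[Act(**a) for a in data["acts"]])
```

Keeping the plan as plain data makes it easy to inspect or hand-edit between the plan and build phases.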

2. Build Phase

Build each act with vision loop verification:

reng build video_plan.json ./build_output

This:

Creates act subdirectories (act01/, act02/, etc.)
Renders each act with recursive verification
Generates voiceover per act (if scripted)
Combines all acts into final.mp4
Mixes voiceover with video

Options:

--max-iterations N - Max render-verify cycles per act (default: 3)
--no-voiceover - Skip TTS generation
--no-combine - Don't concatenate acts
--no-mix-audio - Don't mix voiceover with final video
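
The README does not say how the combine step stitches acts together. One common approach, shown here as an assumption and not as the engine's implementation, is ffmpeg's concat demuxer: write a list file naming each act's video, then stream-copy them into one output. The per-act file layout (`act01/out.mp4`) and both function names are illustrative.

```python
from pathlib import Path


def concat_list_text(act_videos: list[Path]) -> str:
    """Build the list file body for ffmpeg's concat demuxer."""
    lines = []
    for video in act_videos:
        # Inside single quotes, ffmpeg expects a literal quote written as '\''
        escaped = video.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: Path, output: Path) -> list[str]:
    """Return an ffmpeg argv that stitches the listed files without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]
```

Stream copy (`-c copy`) only works when every act shares the same codec parameters, which is plausible when all acts come from the same renderer settings. Otherwise a re-encode is needed.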

3. Voiceover Only

Generate voiceover with Gemini TTS 3.1 Flash:

# Simple text
reng voiceover "Welcome to our product demo" -o intro.mp3

# From script file
reng voiceover --file script.txt -o narration.mp3

# Custom voice and rate
reng voiceover "Hello world" -o hello.mp3 --voice en-GB-Neural2-B --rate 0.9

# SSML for fine-grained control
reng voiceover '<speak>Hello <break time="1s"/> World</speak>' -o ssml.mp3

Legacy: One-shot render

For simple single-scene renders:

reng render ./renders/progress-bar --intent "progress bar fills 0% → 100% smoothly"

Exit code 0 means the run passed or reached the iteration cap (check the status in the output). Exit code 1 means the render or vision step raised an error.
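
The render-verify cycle and its exit codes can be sketched as a small loop. This is a simplified model under stated assumptions, not the package's code: `run_loop`, `LoopResult`, and the status strings are hypothetical, and the render, check, and patch steps are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str       # "passed", "max_iterations", or "error" (names assumed)
    iterations: int


def run_loop(
    render: Callable[[], None],
    check: Callable[[], list[str]],
    patch: Callable[[list[str]], None],
    max_iterations: int = 3,
) -> LoopResult:
    """Render, check keyframes, and patch until clean or the cap is reached."""
    for iteration in range(1, max_iterations + 1):
        try:
            render()
            issues = check()
        except Exception:
            return LoopResult("error", iteration)
        if not issues:
            return LoopResult("passed", iteration)
        if iteration < max_iterations:
            patch(issues)  # no point patching after the final attempt
    return LoopResult("max_iterations", max_iterations)


def exit_code(result: LoopResult) -> int:
    """0 for passed or capped runs, 1 when rendering or vision errored."""
    return 1 if result.status == "error" else 0
```

Because a capped run also exits with 0, scripts that need a clean result should inspect the reported status and not rely on the exit code alone.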

Live progress viewer

# Watch all activity
reng watch

# Follow specific run
reng watch --follow-run a1b2c3d4

# Replay last hour
reng watch --since 1h

Vision and verification

# Vision check with default Gemma
reng vision screenshot.png "What error is shown here?"

# Use specific provider
reng vision screenshot.png "What do you see?" --provider gemini

# Specific model
reng vision screenshot.png "Describe the layout" --model google/gemma-3-27b-it

# Extract keyframes
reng verify input.mp4 --frames 5
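
The design principles in this README state that keyframes are sampled between 10% and 90% of the duration to avoid fade frames. How the frames are distributed inside that window is not specified. The sketch below assumes even spacing with both endpoints included, and `keyframe_times` is a hypothetical name.

```python
def keyframe_times(duration_s: float, frames: int) -> list[float]:
    """Evenly spaced timestamps between 10% and 90% of the duration."""
    if duration_s <= 0 or frames < 1:
        raise ValueError("duration must be positive and frames at least 1")
    start, end = 0.1 * duration_s, 0.9 * duration_s
    if frames == 1:
        return [round((start + end) / 2, 3)]  # a single frame: take the midpoint
    step = (end - start) / (frames - 1)
    return [round(start + i * step, 3) for i in range(frames)]
```

Under these assumptions, a 10-second clip with `--frames 5` would be sampled at 1, 3, 5, 7, and 9 seconds. Since the timestamps depend only on duration and frame count, two iterations of the same act compare the same moments.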

Provider management

# Test all configured providers
reng provider test

# List recommended models
reng provider list-models

# Show environment setup help
reng provider env

Python API

Full workflow

from pathlib import Path

from reng.lib.plan import reason_over_acts, get_planning_questions
from reng.lib.build import build_all_acts

# Create plan from user answers
answers = {
    "purpose": "product demo",
    "topic": "New feature walkthrough",
    "duration": "medium",
    "voiceover": "yes, formal",
}
plan = reason_over_acts(answers)

# Build all acts with vision verification
result = build_all_acts(
    plan,
    Path("./build_output"),
    max_iterations=3,
    generate_voiceovers=True,
    combine_acts=True,
)

print(f"Status: {result.status}")
print(f"Final video: {result.final_video}")
print(f"Combined voiceover: {result.combined_voiceover}")

Voiceover generation

from pathlib import Path

from reng.lib.providers import GeminiTTSProvider

tts = GeminiTTSProvider()

# Simple generation
audio = tts.generate_voiceover(
    text="Welcome to our presentation!",
    voice_name="en-US-Neural2-D",
    output_path=Path("welcome.mp3")
)

# SSML for control
ssml = '''<speak>
    <emphasis level="strong">Welcome!</emphasis>
    <break time="500ms"/>
    <prosody rate="slow" pitch="-1st">Let me show you around.</prosody>
</speak>'''

audio = tts.generate_voiceover_ssml(
    ssml=ssml,
    voice_name="en-US-Neural2-D",
    output_path=Path("ssml_demo.mp3")
)
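
When SSML is assembled from plain script text, characters such as `&` and `<` must be escaped or the markup becomes invalid. A minimal sketch using only the standard library follows. `ssml_sentence_pauses` is a hypothetical helper, not part of `reng`, and it uses only the `<speak>` and `<break>` elements shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_sentence_pauses(sentences: list[str], pause: str = "500ms") -> str:
    """Wrap sentences in <speak>, escaping text and inserting a break between them."""
    separator = f"<break time={quoteattr(pause)}/>"
    body = separator.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"
```

The result is a plain string, so it can be passed wherever this README passes hand-written SSML.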

Custom provider usage

from pathlib import Path

from reng.lib.providers import get_provider, get_vision_model_spec

# Use Gemini for vision
provider = get_provider("gemini")
model_spec = get_vision_model_spec()

result = provider.analyze(
    question="What do you see in this image?",
    image_path=Path("frame.png"),
    model_spec=model_spec
)

Multi-Provider Configuration

OpenRouter (default)

Unified API for 100+ models.

export OPENROUTER_API_KEY='your-key'
export RENG_LLM_PROVIDER=openrouter
export RENG_VISION_MODEL=google/gemma-3-27b-it

Google Gemini API

Native Gemini with competitive pricing.

export GEMINI_API_KEY='your-key'
export RENG_LLM_PROVIDER=gemini
export RENG_VISION_MODEL=gemini-2.0-flash

Fireworks AI

Fast inference for production.

export FIREWORKS_API_KEY='your-key'
export RENG_LLM_PROVIDER=fireworks

Mix and match

# Use Gemini for vision, native Claude Code for text
export RENG_VISION_PROVIDER=gemini
export RENG_TEXT_PROVIDER=native

# OpenRouter for both
export RENG_LLM_PROVIDER=openrouter
export RENG_VISION_MODEL=google/gemma-3-27b-it

Recommended Models

Vision

Model	Provider	When
`google/gemma-3-27b-it`	OpenRouter	Default, excellent vision + text
`google/gemma-3-12b-it`	OpenRouter	Faster, still capable
`gemini-2.0-flash`	Gemini API	Native Google, good pricing

Text

Provider	Model	Use case
`native`	Claude Code	Default, uses existing context
OpenRouter	`anthropic/claude-sonnet-4`	High-quality reasoning
OpenRouter	`google/gemma-3-27b-it`	Unified vision+text

Design Principles

Flexible providers: Use OpenRouter for unified access, Gemini for TTS and native Gemini models, Fireworks for speed
Cost-effective verification: Gemma provides excellent vision at lower cost than alternatives
Native integration: Text generation defaults to Claude Code context (no extra API calls)
Structured workflow: Plan phase structures complexity, build phase executes with verification
Per-act verification: Each scene verified independently before combination
Deterministic keyframe sampling: Always samples 10%–90% of duration to avoid fade frames
Hard iteration caps: A cap on render-verify loops per act (default 3) prevents runaway cost
File-based event bus: Append-only NDJSON log enables any number of watchers
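
The file-based event bus can be illustrated with a short sketch: writers append one JSON object per line, and any number of readers scan the same file independently. The record fields (`ts`, `run`, `kind`) and both function names are hypothetical. The real schema of `events.jsonl` is not documented in this README.

```python
import json
import time
from pathlib import Path
from typing import Iterator, Optional


def append_event(log_path: Path, run_id: str, kind: str, **fields) -> None:
    """Append one event as a single JSON line (field names are hypothetical)."""
    record = {"ts": time.time(), "run": run_id, "kind": kind, **fields}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_events(log_path: Path, run_id: Optional[str] = None) -> Iterator[dict]:
    """Yield events in file order, optionally filtered to one run."""
    if not log_path.exists():
        return
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            if run_id is None or event.get("run") == run_id:
                yield event
```

Because readers never modify the file, a watcher can be started, stopped, or duplicated without coordinating with the process that is rendering.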

Swapping the Renderer

The engine is renderer-agnostic. reng/lib/render.py is a ~60-line shim around the Hyperframes CLI. To use Remotion, Manim, Playwright recording, etc., replace that module's render() function — the rest of the pipeline (verify, vision, loop, events) stays the same.
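
The actual signature of `render()` in `reng/lib/render.py` is not shown in this README, so the following is only a sketch of what a replacement shim might look like. The parameters, the return value, and the `my-renderer` command are all placeholders. Check the existing module for the real contract before swapping anything in.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def render(
    project_dir: Path,
    output: Path,
    command: Sequence[str] = ("my-renderer", "--out"),  # placeholder CLI, not a real tool
    run: Callable[..., object] = subprocess.run,
) -> Path:
    """Invoke an external renderer and return the path of the produced video."""
    argv = [*command, str(output), str(project_dir)]
    run(argv, check=True)  # check=True raises CalledProcessError on failure
    return output
```

Keeping the shim to a single subprocess call is what lets the verify, vision, and loop stages stay unaware of which renderer produced the file.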

License

MIT — see LICENSE.

Release history

0.2.1: Apr 21, 2026
0.2.0: Apr 21, 2026 (this version)

Download files

Source Distribution

recursive_animation_engine-0.2.0.tar.gz (35.0 kB view details)

Uploaded Apr 21, 2026 Source

Built Distribution

recursive_animation_engine-0.2.0-py3-none-any.whl (38.1 kB view details)

Uploaded Apr 21, 2026 Python 3

File details

Details for the file recursive_animation_engine-0.2.0.tar.gz.

File metadata

Download URL: recursive_animation_engine-0.2.0.tar.gz
Upload date: Apr 21, 2026
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for recursive_animation_engine-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`6fe14218981b76b28f2151201e31e6508377309b591c9b9265b43b94e434748b`
MD5	`dd6c450a10bbf557a6f43224f8563b54`
BLAKE2b-256	`e5cc499c74925b65f59b73550435e9af33fdefe04942b2268b382bcc458041a8`

File details

Details for the file recursive_animation_engine-0.2.0-py3-none-any.whl.

File metadata

Download URL: recursive_animation_engine-0.2.0-py3-none-any.whl
Upload date: Apr 21, 2026
Size: 38.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for recursive_animation_engine-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d070dd74f5e788481da9f86173b9620c2265a3a8fa1eb31bfc0f3ce676ff6a73`
MD5	`d820d9ef249e26ea1783bb615f04ad9b`
BLAKE2b-256	`0bdd1e9b974bf13df674763b5d3cbeb44a1fb5f2651de63f3558e7d244488ebb`
