
ViDSPy 🎬

DSPy-style framework for optimizing text-to-video prompts via VBench metric feedback.

PyPI version License: MIT Python 3.9+

ViDSPy brings the power of DSPy's declarative programming paradigm to text-to-video generation. Optimize your prompts for any text-to-video API (Runway, Pika, Replicate, etc.) using VBench quality metrics and VLM-based feedback.

Note: ViDSPy optimizes how you prompt video generation models. It does not generate videos itself - you bring your own text-to-video API.

🎯 Key Features

  • DSPy-style Optimization: Tune instructions (prompt templates) and demonstrations (few-shot examples) using canonical DSPy optimizers
  • VBench Integration: Full support for all 10 CORE_METRICS from VBench evaluation
  • Multiple Optimizers: BootstrapFewShot, LabeledFewShot, MIPROv2, COPRO, and GEPA
  • Flexible VLM Backends: OpenRouter (cloud) and HuggingFace (local) support
  • Composite Metrics: Weighted combination of video quality (60%) and text-video alignment (40%)

📦 Installation

pip install vidspy

For full functionality with VBench evaluation:

pip install vidspy[vbench]

For development:

pip install vidspy[all]

โš™๏ธ Configuration

ViDSPy can be configured in two ways:

Option 1: Pass Arguments Directly in Code

from vidspy import ViDSPy

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.5-flash",
    api_key="your-api-key",
    device="auto"
)

Option 2: Use Configuration File

Create a vidspy_config.yaml file from the template:

cp vidspy_config.yaml.example vidspy_config.yaml

Edit the configuration file:

# vidspy_config.yaml
# VLM Provider Settings
vlm:
  backend: openrouter  # or "huggingface"
  model: google/gemini-2.5-flash
  # api_key: your-api-key-here  # Or set OPENROUTER_API_KEY env var

# Optimizer LLM Settings
# This LLM is used by DSPy optimizers (MIPROv2, COPRO, GEPA) to generate instruction variations
optimizer:
  lm: openai/gpt-4o-mini  # Any model from OpenRouter
  # api_key: your-api-key-here  # Or set OPENROUTER_OPTIMIZER_API_KEY env var

# Optimization Settings
optimization:
  default_optimizer: mipro_v2
  max_bootstrapped_demos: 4
  max_labeled_demos: 4

# Metric Settings
metrics:
  quality_weight: 0.6
  alignment_weight: 0.4
  quality_metrics:
    - subject_consistency
    - motion_smoothness
    - temporal_flickering
    - human_anatomy
    - aesthetic_quality
    - imaging_quality
  alignment_metrics:
    - object_class
    - human_action
    - spatial_relationship
    - overall_consistency

# Target Thresholds
targets:
  human_anatomy: 0.85
  alignment: 0.80

# Cache Settings
cache:
  dir: ~/.cache/vidspy
  vbench_models: ~/.cache/vbench

# Hardware Settings
hardware:
  device: auto  # "cuda", "cpu", or "auto"
  dtype: float16

Then use ViDSPy without any arguments - it automatically loads the config:

from vidspy import ViDSPy, VideoChainOfThought, Example

# ViDSPy automatically finds and loads vidspy_config.yaml
vidspy = ViDSPy()

# All settings from the config file are now applied!
trainset = [Example(prompt="a cat jumping", video_path="cat.mp4")]
optimized = vidspy.optimize(VideoChainOfThought("prompt -> video"), trainset)

Config file search order:

ViDSPy automatically searches for vidspy_config.yaml in:

  1. Current working directory: ./vidspy_config.yaml
  2. User config directory: ~/.vidspy/config.yaml
  3. User home directory: ~/vidspy_config.yaml

Custom config path:

You can also specify a custom config file location:

vidspy = ViDSPy(config_path="/path/to/custom_config.yaml")

Important:

  • Arguments passed directly to ViDSPy() always override config file values
  • API keys should be in environment variables or .env file (not in the config file):
# .env
OPENROUTER_API_KEY=your-api-key-here
OPENROUTER_OPTIMIZER_API_KEY=your-api-key-here  # For optimizer LLM

🤖 Optimizer LLM Configuration

ViDSPy uses two different LLMs:

  1. VLM (Vision Language Model): Analyzes videos and enhances prompts during inference
  2. Optimizer LLM: Used by MIPROv2, COPRO, and GEPA optimizers to generate instruction variations during optimization

Note: BootstrapFewShot and LabeledFewShot optimizers do NOT require an optimizer LLM - they work without this configuration.

Configure the Optimizer LLM:

from vidspy import ViDSPy

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.5-flash",          # For video analysis
    optimizer_lm="openai/gpt-4o-mini",            # For optimization
    optimizer_api_key="your-openrouter-api-key"   # Or set OPENROUTER_OPTIMIZER_API_KEY env var
)

Or in vidspy_config.yaml:

vlm:
  backend: openrouter
  model: google/gemini-2.5-flash

optimizer:
  lm: openai/gpt-4o-mini  # Any OpenRouter model

Choosing an Optimizer LLM:

  • For most users: openai/gpt-4o-mini (fast and cost-effective)
  • For better quality: openai/gpt-4o or anthropic/claude-3-5-sonnet
  • Any model from OpenRouter is supported

Note: If optimizer_api_key is not specified, it will use the VLM API key as fallback.
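The fallback can be pictured as follows. This is a sketch, not ViDSPy's code, and the exact precedence of the environment variable relative to the explicit argument is an assumption beyond the documented behavior:

```python
import os
from typing import Optional

def resolve_optimizer_key(optimizer_api_key: Optional[str] = None,
                          vlm_api_key: Optional[str] = None) -> Optional[str]:
    """Prefer the explicit optimizer key, then its env var, then the VLM key."""
    return (optimizer_api_key
            or os.environ.get("OPENROUTER_OPTIMIZER_API_KEY")
            or vlm_api_key)
```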

🎥 Connecting Video Generation Models

Important: ViDSPy is a prompt optimization framework that sits on top of existing text-to-video models. It does NOT generate videos itself. Instead, it optimizes how you prompt external video generation services.

How ViDSPy Works

User Prompt → ViDSPy (optimize prompt) → Text-to-Video API → Generated Video
                ↓
            VBench + VLM (evaluate quality)
                ↓
        Learn better prompting strategies
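In code, that loop might look like the following sketch; every function name here is a placeholder standing in for a ViDSPy or API component, not a real ViDSPy symbol:

```python
def feedback_loop(prompt, enhance, generate, evaluate, rounds=3):
    """Sketch of the diagram: enhance the prompt, generate, score, keep the best."""
    best_prompt, best_score = prompt, float("-inf")
    for _ in range(rounds):
        candidate = enhance(best_prompt)    # ViDSPy's role: improve the prompt
        video = generate(candidate)         # your external text-to-video API
        score = evaluate(candidate, video)  # VBench metrics + VLM feedback
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

ViDSPy's optimizers are considerably more sophisticated than this greedy loop, but the data flow is the same: prompts are refined against metric feedback, while video generation itself stays external.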

Setting Up Your Video Generator

You need to provide a video_generator function that connects to your preferred text-to-video service:

Example 1: Runway Gen-3

from vidspy import VideoChainOfThought

def runway_generator(prompt: str, **kwargs) -> str:
    """Generate video using Runway Gen-3 API."""
    import requests

    response = requests.post(
        "https://api.runwayml.com/v1/generate",
        headers={"Authorization": f"Bearer {RUNWAY_API_KEY}"},
        json={
            "prompt": prompt,
            "model": "gen3",
            "duration": kwargs.get("duration", 5)
        }
    )

    video_url = response.json()["output"]["url"]
    # Download and save video locally
    video_path = f"outputs/{response.json()['id']}.mp4"
    # ... download logic ...
    return video_path

# Create module with your generator
module = VideoChainOfThought(
    "prompt -> video",
    video_generator=runway_generator
)

Example 2: Replicate (Stable Video Diffusion, CogVideo, etc.)

import replicate
import uuid

def replicate_generator(prompt: str, **kwargs) -> str:
    """Generate video using Replicate API."""
    output = replicate.run(
        "stability-ai/stable-video-diffusion",
        input={"prompt": prompt}
    )

    # Save video from output URL
    video_path = f"outputs/{uuid.uuid4()}.mp4"
    # ... download logic ...
    return video_path

module = VideoChainOfThought(
    "prompt -> video",
    video_generator=replicate_generator
)

Example 3: Pika Labs

from pika import PikaClient

def pika_generator(prompt: str, **kwargs) -> str:
    """Generate video using Pika Labs API."""
    client = PikaClient(api_key=PIKA_API_KEY)

    video = client.generate_video(
        prompt=prompt,
        aspect_ratio="16:9",
        duration=3
    )

    return video.download_path

module = VideoChainOfThought(
    "prompt -> video",
    video_generator=pika_generator
)

Supported Text-to-Video Services

ViDSPy works with any text-to-video API. Popular options include:

| Service | API Available | Notes |
|---------|---------------|-------|
| Runway Gen-3 | ✅ Yes | High quality, good motion |
| Pika Labs | ✅ Yes | Creative effects, good for social media |
| Stability AI Video | ✅ Yes | Open weights available |
| Replicate | ✅ Yes | Multiple models (CogVideo, SVD, etc.) |
| LumaAI Dream Machine | ✅ Yes | Cinematic quality |
| HaiperAI | ✅ Yes | Fast generation |
| Morph Studio | ✅ Yes | Style control |

Custom Video Generator Template

def my_video_generator(prompt: str, **kwargs) -> str:
    """
    Your custom video generation function.

    Args:
        prompt: Enhanced prompt from ViDSPy
        **kwargs: Additional parameters (duration, aspect_ratio, etc.)

    Returns:
        Local path to the generated video file
    """
    # 1. Call your text-to-video API
    # 2. Download the generated video
    # 3. Save it locally
    # 4. Return the file path

    video_path = "path/to/generated/video.mp4"
    return video_path

🚀 Quick Start

from vidspy import ViDSPy, VideoChainOfThought, Example

# Step 1: Define your video generator (connect to your text-to-video API)
def my_video_generator(prompt: str, **kwargs) -> str:
    """Your text-to-video API integration."""
    # Example: Runway, Pika, Replicate, etc.
    import my_video_api
    video = my_video_api.generate(prompt=prompt)
    return video.save_path()

# Step 2: Initialize ViDSPy with OpenRouter VLM backend
# Note: Optimizer LLM defaults to "openai/gpt-4o-mini" via OpenRouter
# You can customize: ViDSPy(vlm_backend="openrouter", optimizer_lm="openai/gpt-4o")
vidspy = ViDSPy(vlm_backend="openrouter")

# Step 3: Create training examples (use videos you've already generated)
trainset = [
    Example(prompt="a cat jumping over a fence", video_path="cat_jump.mp4"),
    Example(prompt="a dog running in a park", video_path="dog_run.mp4"),
    Example(prompt="a bird flying through clouds", video_path="bird_fly.mp4"),
]

# Step 4: Create module with your video generator
module = VideoChainOfThought(
    "prompt -> video",
    video_generator=my_video_generator  # Connect your generator!
)

# Step 5: Optimize prompting strategy
optimized = vidspy.optimize(
    module,
    trainset,
    optimizer="mipro_v2"  # Multi-stage instruction + demo optimization
)

# Step 6: Generate videos with optimized prompts
result = optimized("a dolphin swimming in the ocean")
print(f"Generated video: {result.video_path}")
print(f"Optimized prompt used: {result.enhanced_prompt}")

📊 VBench Metrics

ViDSPy uses VBench's 10 CORE_METRICS split into two categories:

Video Quality Metrics (60% weight, video-only)

| Metric | Description |
|--------|-------------|
| subject_consistency | Temporal stability of subjects |
| motion_smoothness | Natural motion quality |
| temporal_flickering | Absence of temporal jitter |
| human_anatomy | Correct hands/faces/torso rendering |
| aesthetic_quality | Artistic/visual beauty |
| imaging_quality | Technical clarity and sharpness |

Text-Video Alignment Metrics (40% weight, prompt-conditioned)

| Metric | Description |
|--------|-------------|
| object_class | Prompt objects appear correctly |
| human_action | Prompt actions performed correctly |
| spatial_relationship | Correct spatial layout |
| overall_consistency | Holistic text-video alignment |
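Numerically, the composite reduces to a weighted mean over the two metric groups. A minimal sketch, assuming each per-metric score is already normalized to [0, 1] (ViDSPy's actual `composite_reward` operates on Example/Prediction objects rather than a plain dict):

```python
from statistics import fmean

QUALITY = ["subject_consistency", "motion_smoothness", "temporal_flickering",
           "human_anatomy", "aesthetic_quality", "imaging_quality"]
ALIGNMENT = ["object_class", "human_action", "spatial_relationship",
             "overall_consistency"]

def composite_score(scores: dict, quality_weight: float = 0.6,
                    alignment_weight: float = 0.4) -> float:
    """Weighted mean of the quality group (60%) and alignment group (40%)."""
    q = fmean(scores[m] for m in QUALITY if m in scores)
    a = fmean(scores[m] for m in ALIGNMENT if m in scores)
    return quality_weight * q + alignment_weight * a
```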

Using Metrics

from vidspy.metrics import composite_reward, quality_score, alignment_score

# Default composite metric (60% quality + 40% alignment)
score = composite_reward(example, prediction)

# Quality-only score
q_score = quality_score(example, prediction)

# Alignment-only score
a_score = alignment_score(example, prediction)

# Custom metric configuration
from vidspy.metrics import VBenchMetric

custom_metric = VBenchMetric(
    quality_weight=0.5,
    alignment_weight=0.5,
    quality_metrics=["motion_smoothness", "aesthetic_quality"],
    alignment_metrics=["object_class", "overall_consistency"]
)

🔧 Optimizers

ViDSPy provides 5 DSPy-compatible optimizers:

| Optimizer | Description | Key Parameters |
|-----------|-------------|----------------|
| VidBootstrapFewShot | Auto-generate/select few-shot demos | max_bootstrapped_demos=4 |
| VidLabeledFewShot | Static few-shot assignment | k=3 |
| VidMIPROv2 | Multi-stage instruction + demo optimization | num_candidates=10, auto="light" |
| VidCOPRO | Cooperative multi-LM instruction optimization | breadth=5, depth=3 |
| VidGEPA | Generate + Evaluate + Propose + Accept | auto="light" |

Example: Using Different Optimizers

# Bootstrap few-shot
optimized = vidspy.optimize(
    module, trainset,
    optimizer="bootstrap",
    max_bootstrapped_demos=4
)

# MIPROv2 with more candidates
optimized = vidspy.optimize(
    module, trainset,
    optimizer="mipro_v2",
    num_candidates=15,
    auto="medium"
)

# COPRO with custom search
optimized = vidspy.optimize(
    module, trainset,
    optimizer="copro",
    breadth=10,
    depth=5
)

🤖 VLM Providers

Vision Language Models (VLMs) in ViDSPy are used for:

  • ๐Ÿ“ Prompt enhancement - Improving user prompts before generation
  • ๐Ÿ” Video analysis - Understanding generated video content
  • ๐ŸŽฏ Quality assessment - Analyzing text-video alignment
  • ๐Ÿง  Chain-of-thought reasoning - Planning video generation strategies

Note: VLMs do NOT generate videos. They help optimize the prompts you send to your text-to-video API.

OpenRouter (Default)

Cloud-based multimodal VLMs via unified API:

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.5-flash",
    api_key="your-api-key"  # Or set OPENROUTER_API_KEY env var
)

Example models:

  • google/gemini-2.5-flash
  • google/gemini-1.5-pro
  • anthropic/claude-opus-4.5
  • openai/gpt-4o

HuggingFace (Local)

Local video VLMs for offline usage:

vidspy = ViDSPy(
    vlm_backend="huggingface",
    vlm_model="llava-hf/llava-v1.6-mistral-7b-hf",
    device="cuda"
)

๐Ÿ“ Video Modules

ViDSPy provides several module types for different prompting strategies.

Module Signatures

When creating video modules, you'll typically use simple signature strings like "prompt -> video" or "prompt -> video_path". The library handles the internal reasoning steps (scene analysis, motion planning, etc.) based on the module type you choose, so you can stick to these standard signature formats and let ViDSPy take care of the rest.

from vidspy import VideoPredict, VideoChainOfThought, VideoReAct, VideoEnsemble
from vidspy.metrics import composite_reward

# Define your video generator once
def my_video_gen(prompt, **kwargs):
    # Your text-to-video API call
    return video_path

# Simple prediction with prompt enhancement
# Just use "prompt -> video_path" - ViDSPy handles the internal complexity
predictor = VideoPredict(
    "prompt -> video_path",
    video_generator=my_video_gen
)

# Chain-of-thought reasoning (analyzes scene, motion, style before generating)
# The simple "prompt -> video" signature is all you need
cot = VideoChainOfThought(
    "prompt -> video",
    video_generator=my_video_gen
)

# ReAct-style iterative refinement (generates, evaluates, refines)
# Same simple signature - internal reasoning is handled automatically
react = VideoReAct(
    "prompt -> video",
    video_generator=my_video_gen,
    max_iterations=3
)

# Ensemble multiple approaches (tries different strategies, picks best)
ensemble = VideoEnsemble([
    VideoPredict(video_generator=my_video_gen),
    VideoChainOfThought(video_generator=my_video_gen),
], selection_metric=composite_reward)

๐Ÿ› ๏ธ Setup VBench Models

# Via CLI
vidspy setup

# Via Python
from vidspy import setup_vbench_models
setup_vbench_models()  # Downloads to ~/.cache/vbench

๐Ÿ“ Full Example

import os
import replicate
from vidspy import (
    ViDSPy,
    VideoChainOfThought,
    Example,
    composite_reward,
    VBenchMetric,
)

# Set API keys (or pass directly to ViDSPy)
os.environ["OPENROUTER_API_KEY"] = "your-vlm-api-key"
os.environ["OPENROUTER_OPTIMIZER_API_KEY"] = "your-optimizer-api-key"

# Define video generator (using Replicate as example)
def replicate_video_generator(prompt: str, **kwargs) -> str:
    """Generate video using Replicate API."""
    output = replicate.run(
        "stability-ai/stable-video-diffusion",
        input={"prompt": prompt}
    )

    # Download and save video
    video_path = f"outputs/{hash(prompt)}.mp4"
    # ... download logic ...
    return video_path

# Initialize ViDSPy with all available parameters
vidspy = ViDSPy(
    # VLM Provider Settings (corresponds to vlm: section in config)
    vlm_backend="openrouter",  # or "huggingface"
    vlm_model="google/gemini-2.5-flash",  # VLM for video analysis
    api_key="your-vlm-api-key",  # Or set OPENROUTER_API_KEY env var

    # Optimizer LLM Settings (corresponds to optimizer: section in config)
    optimizer_lm="openai/gpt-4o-mini",  # LLM for optimizers (MIPROv2, COPRO, GEPA)
    optimizer_api_key="your-optimizer-api-key",  # Or set OPENROUTER_OPTIMIZER_API_KEY env var

    # Cache Settings (corresponds to cache.dir in config)
    cache_dir="~/.cache/vidspy",  # Cache directory for models
    # Note: VBench models cache (cache.vbench_models) is set via setup_vbench_models()

    # Hardware Settings (corresponds to hardware: section in config)
    device="auto",  # "cuda", "cpu", or "auto"
    # Note: hardware.dtype from config is used internally by HuggingFace VLM

    # Config File (optional)
    # config_path="vidspy_config.yaml",  # Load all settings from config file
)

# Prepare training data (videos you've already generated)
trainset = [
    Example(
        prompt="a person walking through a forest",
        video_path="data/walk_forest.mp4"
    ),
    Example(
        prompt="a car driving on a highway",
        video_path="data/car_highway.mp4"
    ),
    # ... more examples
]

# Split for validation
valset = trainset[-2:]
trainset = trainset[:-2]

# Create module with your video generator
module = VideoChainOfThought(
    "prompt -> video",
    video_generator=replicate_video_generator
)

# Optional: Create custom metric with specific settings
# (Corresponds to metrics: section in config file)
custom_metric = VBenchMetric(
    quality_weight=0.6,  # Corresponds to metrics.quality_weight in config
    alignment_weight=0.4,  # Corresponds to metrics.alignment_weight in config
    quality_metrics=[  # Corresponds to metrics.quality_metrics in config
        "subject_consistency",
        "motion_smoothness",
        "temporal_flickering",
        "human_anatomy",
        "aesthetic_quality",
        "imaging_quality"
    ],
    alignment_metrics=[  # Corresponds to metrics.alignment_metrics in config
        "object_class",
        "human_action",
        "spatial_relationship",
        "overall_consistency"
    ]
)
# Note: Target thresholds (targets: section in config) are for reference/documentation

# Optimize prompting strategy with optimization settings
# (These correspond to the optimization: section in config file)
optimized = vidspy.optimize(
    module,
    trainset,
    valset=valset,
    metric=custom_metric,  # Or use composite_reward for defaults (60% quality, 40% alignment)
    optimizer="mipro_v2",  # Corresponds to optimization.default_optimizer in config
    num_candidates=10,
    max_bootstrapped_demos=4,  # Corresponds to optimization.max_bootstrapped_demos in config
    max_labeled_demos=4,  # Corresponds to optimization.max_labeled_demos in config
)

# Evaluate on test set
testset = [Example(prompt="a boat on a lake", video_path="data/boat.mp4")]
results = vidspy.evaluate(optimized, testset)

print(f"Mean Score: {results['mean_score']:.4f}")
print(f"Quality: {results['details'][0].get('quality_score', 'N/A')}")
print(f"Alignment: {results['details'][0].get('alignment_score', 'N/A')}")

# Generate new videos with optimized prompts
result = optimized("a butterfly landing on a flower")
print(f"Generated: {result.video_path}")
print(f"Enhanced prompt: {result.enhanced_prompt}")

📖 CLI Reference

# Show help
vidspy --help

# Setup VBench models
vidspy setup
vidspy setup --cache-dir /path/to/cache

# Check dependencies
vidspy setup --check-only

# Optimize a module
vidspy optimize trainset.json --optimizer mipro_v2 --output optimized_model

# Evaluate a module
vidspy evaluate testset.json --module optimized_model --output results.json

# Show information
vidspy info

๐Ÿ—๏ธ Project Structure

vidspy/
├── pyproject.toml              # Package configuration
├── README.md                   # This file
├── .env.example                # Environment variables template
├── vidspy_config.yaml.example  # Configuration file template
├── vidspy/
│   ├── __init__.py             # Main exports
│   ├── core.py                 # ViDSPy main class, Example
│   ├── signatures.py           # VideoSignature, etc.
│   ├── modules.py              # VideoPredict, VideoChainOfThought
│   ├── optimizers.py           # VidBootstrapFewShot, VidMIPROv2, etc.
│   ├── metrics.py              # VBench wrappers, composite_reward
│   ├── providers.py            # OpenRouterVLM, HuggingFaceVLM
│   ├── setup.py                # Setup utilities
│   └── cli.py                  # Command-line interface
├── examples/
│   └── basic_usage.py
└── tests/
    └── test_basic.py

🔬 Target Thresholds

For production-quality videos, aim for:

  • Human Anatomy: ≥ 0.85
  • Text-Video Alignment: ≥ 0.80
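A quick way to gate a pipeline on these targets is a small helper like the one below. It is illustrative code, not a ViDSPy API; the threshold names mirror the `targets:` section of the config file:

```python
TARGETS = {"human_anatomy": 0.85, "alignment": 0.80}

def meets_targets(scores: dict, targets: dict = TARGETS) -> dict:
    """Map each target metric to True/False depending on whether it passed."""
    return {name: scores.get(name, 0.0) >= threshold
            for name, threshold in targets.items()}
```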

📚 References

  • DSPy - Declarative Self-improving Python
  • VBench - Video generation benchmark
  • OpenRouter - Unified AI API

📄 License

MIT License - see LICENSE for details.

๐Ÿค Contributing

Contributions welcome! Please read our contributing guide first.

โญ Star History

If you find ViDSPy useful, please consider giving it a star!
