
ViDSPy 🎬

DSPy-style framework for optimizing text-to-video generation via VBench metric feedback.

PyPI version License: MIT Python 3.9+

ViDSPy brings the power of DSPy's declarative programming paradigm to text-to-video generation. Optimize your video generation prompts and few-shot demonstrations using VBench quality metrics as feedback signals.

🎯 Key Features

  • DSPy-style Optimization: Tune instructions (prompt templates) and demonstrations (few-shot examples) using canonical DSPy optimizers
  • VBench Integration: Full support for all 10 CORE_METRICS from VBench evaluation
  • Multiple Optimizers: BootstrapFewShot, LabeledFewShot, MIPROv2, COPRO, and GEPA
  • Flexible VLM Backends: OpenRouter (cloud) and HuggingFace (local) support
  • Composite Metrics: Weighted combination of video quality (60%) and text-video alignment (40%)

📦 Installation

pip install vidspy

For full functionality with VBench evaluation:

pip install vidspy[vbench]

For development:

pip install vidspy[all]

⚙️ Configuration

ViDSPy can be configured in two ways:

Option 1: Pass Arguments Directly in Code

from vidspy import ViDSPy

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.0-flash-001",
    api_key="your-api-key",
    device="auto"
)

Option 2: Use Configuration File

Create a vidspy_config.yaml file from the template:

cp vidspy_config.yaml.example vidspy_config.yaml

Edit the configuration file:

# vidspy_config.yaml
vlm:
  backend: openrouter
  model: google/gemini-2.0-flash-001

optimization:
  default_optimizer: mipro_v2
  max_bootstrapped_demos: 4

metrics:
  quality_weight: 0.6
  alignment_weight: 0.4

cache:
  dir: ~/.cache/vidspy

hardware:
  device: auto
  dtype: float16

Then use ViDSPy without any arguments; it automatically loads the config:

from vidspy import ViDSPy, VideoChainOfThought, Example

# ViDSPy automatically finds and loads vidspy_config.yaml
vidspy = ViDSPy()

# All settings from the config file are now applied!
trainset = [Example(prompt="a cat jumping", video_path="cat.mp4")]
optimized = vidspy.optimize(VideoChainOfThought("prompt -> video"), trainset)

Config file search order:

ViDSPy automatically searches for vidspy_config.yaml in:

  1. Current working directory: ./vidspy_config.yaml
  2. User config directory: ~/.vidspy/config.yaml
  3. User home directory: ~/vidspy_config.yaml
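The search order above can be sketched as follows. This is illustrative only; the actual loader inside vidspy may differ, and the `find_config` helper here is a hypothetical name, not part of the public API.

```python
from pathlib import Path
from typing import Optional

def find_config(cwd: Path = Path.cwd(), home: Path = Path.home()) -> Optional[Path]:
    """Return the first config file found in the documented search order."""
    candidates = [
        cwd / "vidspy_config.yaml",        # 1. current working directory
        home / ".vidspy" / "config.yaml",  # 2. user config directory
        home / "vidspy_config.yaml",       # 3. user home directory
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # fall back to built-in defaults
```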

Custom config path:

You can also specify a custom config file location:

vidspy = ViDSPy(config_path="/path/to/custom_config.yaml")

Important:

  • Arguments passed directly to ViDSPy() always override config file values
  • API keys should be set in environment variables or a .env file (not in the config file):
# .env
OPENROUTER_API_KEY=your-api-key-here
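A minimal, stdlib-only sketch of loading such a .env file into the environment (in practice you would typically use the third-party python-dotenv package instead; `load_env_file` is a hypothetical helper, not part of vidspy):

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines into os.environ, skipping comments and blanks.
    Existing environment variables are not overwritten."""
    env = Path(path)
    if not env.is_file():
        return
    for line in env.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```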

🚀 Quick Start

from vidspy import ViDSPy, VideoChainOfThought, Example

# Initialize ViDSPy with OpenRouter VLM backend
vidspy = ViDSPy(vlm_backend="openrouter")

# Create training examples
trainset = [
    Example(prompt="a cat jumping over a fence", video_path="cat_jump.mp4"),
    Example(prompt="a dog running in a park", video_path="dog_run.mp4"),
    Example(prompt="a bird flying through clouds", video_path="bird_fly.mp4"),
]

# Optimize a video generation module
optimized = vidspy.optimize(
    VideoChainOfThought("prompt -> video"),
    trainset,
    optimizer="mipro_v2"  # Multi-stage instruction + demo optimization
)

# Generate videos with optimized prompts and demonstrations
result = optimized("a dolphin swimming in the ocean")
print(f"Generated video: {result.video_path}")

📊 VBench Metrics

ViDSPy uses VBench's 10 CORE_METRICS split into two categories:

Video Quality Metrics (60% weight, video-only)

| Metric | Description |
|--------|-------------|
| `subject_consistency` | Temporal stability of subjects |
| `motion_smoothness` | Natural motion quality |
| `temporal_flickering` | Absence of temporal jitter |
| `human_anatomy` | Correct hands/faces/torso rendering |
| `aesthetic_quality` | Artistic/visual beauty |
| `imaging_quality` | Technical clarity and sharpness |

Text-Video Alignment Metrics (40% weight, prompt-conditioned)

| Metric | Description |
|--------|-------------|
| `object_class` | Prompt objects appear correctly |
| `human_action` | Prompt actions performed correctly |
| `spatial_relationship` | Correct spatial layout |
| `overall_consistency` | Holistic text-video alignment |
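As a rough sketch of the weighting described above (illustrative only; the real computation lives in `vidspy.metrics.composite_reward`, and the `composite` function here is a hypothetical name):

```python
def composite(quality_scores: dict, alignment_scores: dict,
              quality_weight: float = 0.6, alignment_weight: float = 0.4) -> float:
    """Weighted combination: 60% mean quality + 40% mean alignment by default."""
    q = sum(quality_scores.values()) / len(quality_scores)
    a = sum(alignment_scores.values()) / len(alignment_scores)
    return quality_weight * q + alignment_weight * a
```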

Using Metrics

from vidspy.metrics import composite_reward, quality_score, alignment_score

# Default composite metric (60% quality + 40% alignment)
score = composite_reward(example, prediction)

# Quality-only score
q_score = quality_score(example, prediction)

# Alignment-only score
a_score = alignment_score(example, prediction)

# Custom metric configuration
from vidspy.metrics import VBenchMetric

custom_metric = VBenchMetric(
    quality_weight=0.5,
    alignment_weight=0.5,
    quality_metrics=["motion_smoothness", "aesthetic_quality"],
    alignment_metrics=["object_class", "overall_consistency"]
)

🔧 Optimizers

ViDSPy provides 5 DSPy-compatible optimizers:

| Optimizer | Description | Key Parameters |
|-----------|-------------|----------------|
| `VidBootstrapFewShot` | Auto-generate/select few-shots | `max_bootstrapped_demos=4` |
| `VidLabeledFewShot` | Static few-shot assignment | `k=3` |
| `VidMIPROv2` | Multi-stage instruction + demo optimization | `num_candidates=10`, `auto="light"` |
| `VidCOPRO` | Cooperative multi-LM instruction optimization | `breadth=5`, `depth=3` |
| `VidGEPA` | Generate + Evaluate + Propose + Accept | `auto="light"` |

Example: Using Different Optimizers

# Bootstrap few-shot
optimized = vidspy.optimize(
    module, trainset,
    optimizer="bootstrap",
    max_bootstrapped_demos=4
)

# MIPROv2 with more candidates
optimized = vidspy.optimize(
    module, trainset,
    optimizer="mipro_v2",
    num_candidates=15,
    auto="medium"
)

# COPRO with custom search
optimized = vidspy.optimize(
    module, trainset,
    optimizer="copro",
    breadth=10,
    depth=5
)

🤖 VLM Providers

OpenRouter (Default)

Cloud-based video VLMs via unified API:

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.0-flash-001",
    api_key="your-api-key"  # Or set OPENROUTER_API_KEY env var
)

Supported models:

  • google/gemini-2.0-flash-001 (default)
  • google/gemini-1.5-pro
  • anthropic/claude-3-opus
  • openai/gpt-4o

HuggingFace (Local)

Local video VLMs for offline usage:

vidspy = ViDSPy(
    vlm_backend="huggingface",
    vlm_model="llava-hf/llava-v1.6-mistral-7b-hf",
    device="cuda"
)

📝 Video Modules

ViDSPy provides several module types for different use cases:

from vidspy import VideoPredict, VideoChainOfThought, VideoReAct, VideoEnsemble

# Simple prediction
predictor = VideoPredict("prompt -> video_path")

# Chain-of-thought reasoning
cot = VideoChainOfThought("prompt -> video")

# ReAct-style iterative refinement
react = VideoReAct("prompt -> video", max_iterations=3)

# Ensemble multiple approaches
ensemble = VideoEnsemble([
    VideoPredict(),
    VideoChainOfThought(),
], selection_metric=composite_reward)

🛠️ Setup VBench Models

# Via CLI
vidspy setup

# Via Python
from vidspy import setup_vbench_models
setup_vbench_models()  # Downloads to ~/.cache/vbench

📝 Full Example

import os
from vidspy import (
    ViDSPy,
    VideoChainOfThought,
    Example,
    composite_reward,
)

# Set API key
os.environ["OPENROUTER_API_KEY"] = "your-api-key"

# Initialize
vidspy = ViDSPy(vlm_backend="openrouter")

# Prepare training data
trainset = [
    Example(
        prompt="a person walking through a forest",
        video_path="data/walk_forest.mp4"
    ),
    Example(
        prompt="a car driving on a highway",
        video_path="data/car_highway.mp4"
    ),
    # ... more examples
]

# Split for validation
valset = trainset[-2:]
trainset = trainset[:-2]

# Create and optimize module
module = VideoChainOfThought("prompt -> video")

optimized = vidspy.optimize(
    module,
    trainset,
    valset=valset,
    metric=composite_reward,
    optimizer="mipro_v2",
    num_candidates=10,
)

# Evaluate on test set
testset = [Example(prompt="a boat on a lake", video_path="data/boat.mp4")]
results = vidspy.evaluate(optimized, testset)

print(f"Mean Score: {results['mean_score']:.4f}")
print(f"Quality: {results['details'][0].get('quality_score', 'N/A')}")
print(f"Alignment: {results['details'][0].get('alignment_score', 'N/A')}")

# Generate new videos
result = optimized("a butterfly landing on a flower")
print(f"Generated: {result.video_path}")
print(f"Enhanced prompt: {result.enhanced_prompt}")

📖 CLI Reference

# Show help
vidspy --help

# Setup VBench models
vidspy setup
vidspy setup --cache-dir /path/to/cache

# Check dependencies
vidspy setup --check-only

# Optimize a module
vidspy optimize trainset.json --optimizer mipro_v2 --output optimized_model

# Evaluate a module
vidspy evaluate testset.json --module optimized_model --output results.json

# Show information
vidspy info

๐Ÿ—๏ธ Project Structure

vidspy/
├── pyproject.toml              # Package configuration
├── README.md                   # This file
├── .env.example                # Environment variables template
├── vidspy_config.yaml.example  # Configuration file template
├── vidspy/
│   ├── __init__.py             # Main exports
│   ├── core.py                 # ViDSPy main class, Example
│   ├── signatures.py           # VideoSignature, etc.
│   ├── modules.py              # VideoPredict, VideoChainOfThought
│   ├── optimizers.py           # VidBootstrapFewShot, VidMIPROv2, etc.
│   ├── metrics.py              # VBench wrappers, composite_reward
│   ├── providers.py            # OpenRouterVLM, HuggingFaceVLM
│   ├── setup.py                # Setup utilities
│   └── cli.py                  # Command-line interface
├── examples/
│   └── basic_usage.py
└── tests/
    └── test_basic.py

🔬 Target Thresholds

For production-quality videos, aim for:

  • Human Anatomy: ≥ 0.85
  • Text-Video Alignment: ≥ 0.80
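These thresholds could be applied as a simple gate. This is an illustrative sketch only; the score-dictionary keys (`human_anatomy`, `alignment_score`) and the `passes_production_bar` helper are assumptions, not vidspy's documented schema.

```python
# Minimum scores for production-quality videos (from the thresholds above).
THRESHOLDS = {"human_anatomy": 0.85, "alignment_score": 0.80}

def passes_production_bar(scores: dict) -> bool:
    """Return True only if every thresholded metric meets its minimum."""
    return all(scores.get(metric, 0.0) >= minimum
               for metric, minimum in THRESHOLDS.items())
```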

📚 References

  • DSPy - Declarative Self-improving Python
  • VBench - Video generation benchmark
  • OpenRouter - Unified AI API

📄 License

MIT License - see LICENSE for details.

🤝 Contributing

Contributions welcome! Please read our contributing guide first.

⭐ Star History

If you find ViDSPy useful, please consider giving it a star!

Download files

Source Distribution

vidspy-0.1.1.tar.gz (40.5 kB)

Built Distribution

vidspy-0.1.1-py3-none-any.whl (37.3 kB)

File details

Details for the file vidspy-0.1.1.tar.gz.

File metadata

  • Download URL: vidspy-0.1.1.tar.gz
  • Upload date:
  • Size: 40.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes for vidspy-0.1.1.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 54a6e0cf0935e50010633ebe4733ba99a983c632833c5615bcf753f618a0ee3e |
| MD5 | 8d1617fe43c90cc01c8c8eccd3d0a403 |
| BLAKE2b-256 | d7b38ecb7aca9d565d4874d297f1fe0f42b42bf91d306bbde197cdbae0d1eefd |

Provenance

The following attestation bundles were made for vidspy-0.1.1.tar.gz:

Publisher: publish.yml on leockl/vidspy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vidspy-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vidspy-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 37.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes for vidspy-0.1.1-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 538e390023380d492c3e77cd6696769d6830686f5c1c9ff3814fe327714d5f3d |
| MD5 | 7bca2b76b2f878fcb1131dc547462a86 |
| BLAKE2b-256 | d6358ca66c8b09d3caddb361e50d93c974e25d2fcb73adffff1d10cf9923d696 |

Provenance

The following attestation bundles were made for vidspy-0.1.1-py3-none-any.whl:

Publisher: publish.yml on leockl/vidspy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
