# ViDSPy

DSPy-style framework for optimizing text-to-video generation via VBench metric feedback.

ViDSPy brings the power of DSPy's declarative programming paradigm to text-to-video generation. Optimize your video-generation prompts and few-shot demonstrations using VBench quality metrics as feedback signals.
## Key Features
- DSPy-style Optimization: Tune instructions (prompt templates) and demonstrations (few-shot examples) using canonical DSPy optimizers
- VBench Integration: Full support for all 10 CORE_METRICS from VBench evaluation
- Multiple Optimizers: BootstrapFewShot, LabeledFewShot, MIPROv2, COPRO, and GEPA
- Flexible VLM Backends: OpenRouter (cloud) and HuggingFace (local) support
- Composite Metrics: Weighted combination of video quality (60%) and text-video alignment (40%)
## Installation

```bash
pip install vidspy
```

For full functionality with VBench evaluation:

```bash
pip install vidspy[vbench]
```

For development:

```bash
pip install vidspy[all]
```
## Configuration

ViDSPy can be configured in two ways.

### Option 1: Pass Arguments Directly in Code

```python
from vidspy import ViDSPy

vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.0-flash-001",
    api_key="your-api-key",
    device="auto",
)
```

### Option 2: Use a Configuration File

Create a `vidspy_config.yaml` file from the template:

```bash
cp vidspy_config.yaml.example vidspy_config.yaml
```
Edit the configuration file:
# vidspy_config.yaml
vlm:
backend: openrouter
model: google/gemini-2.0-flash-001
optimization:
default_optimizer: mipro_v2
max_bootstrapped_demos: 4
metrics:
quality_weight: 0.6
alignment_weight: 0.4
cache:
dir: ~/.cache/vidspy
hardware:
device: auto
dtype: float16
Then use ViDSPy without any arguments; it automatically loads the config:

```python
from vidspy import ViDSPy, VideoChainOfThought, Example

# ViDSPy automatically finds and loads vidspy_config.yaml
vidspy = ViDSPy()

# All settings from the config file are now applied.
trainset = [Example(prompt="a cat jumping", video_path="cat.mp4")]
optimized = vidspy.optimize(VideoChainOfThought("prompt -> video"), trainset)
```
**Config file search order**

ViDSPy automatically searches for `vidspy_config.yaml` in:

- Current working directory: `./vidspy_config.yaml`
- User config directory: `~/.vidspy/config.yaml`
- User home directory: `~/vidspy_config.yaml`
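The search order can be sketched in a few lines; note that `config_candidates` and `find_config` below are hypothetical helpers for illustration (they are not part of the vidspy API), and only the three candidate paths come from the list above:

```python
from pathlib import Path


def config_candidates():
    """Candidate config locations, in the order ViDSPy is documented to try them."""
    return [
        Path.cwd() / "vidspy_config.yaml",       # current working directory
        Path.home() / ".vidspy" / "config.yaml",  # user config directory
        Path.home() / "vidspy_config.yaml",       # user home directory
    ]


def find_config():
    """Return the first candidate that exists on disk, or None."""
    return next((p for p in config_candidates() if p.exists()), None)
```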
**Custom config path**

You can also specify a custom config file location:

```python
vidspy = ViDSPy(config_path="/path/to/custom_config.yaml")
```

**Important:**

- Arguments passed directly to `ViDSPy()` always override config file values.
- API keys should live in environment variables or a `.env` file (not in the config file):

```bash
# .env
OPENROUTER_API_KEY=your-api-key-here
```
## Quick Start

```python
from vidspy import ViDSPy, VideoChainOfThought, Example

# Initialize ViDSPy with the OpenRouter VLM backend
vidspy = ViDSPy(vlm_backend="openrouter")

# Create training examples
trainset = [
    Example(prompt="a cat jumping over a fence", video_path="cat_jump.mp4"),
    Example(prompt="a dog running in a park", video_path="dog_run.mp4"),
    Example(prompt="a bird flying through clouds", video_path="bird_fly.mp4"),
]

# Optimize a video generation module
optimized = vidspy.optimize(
    VideoChainOfThought("prompt -> video"),
    trainset,
    optimizer="mipro_v2",  # Multi-stage instruction + demo optimization
)

# Generate videos with optimized prompts and demonstrations
result = optimized("a dolphin swimming in the ocean")
print(f"Generated video: {result.video_path}")
```
## VBench Metrics

ViDSPy uses VBench's 10 CORE_METRICS, split into two categories:

### Video Quality Metrics (60% weight, video-only)

| Metric | Description |
|---|---|
| `subject_consistency` | Temporal stability of subjects |
| `motion_smoothness` | Natural motion quality |
| `temporal_flickering` | Absence of temporal jitter |
| `human_anatomy` | Correct hands/faces/torso rendering |
| `aesthetic_quality` | Artistic/visual beauty |
| `imaging_quality` | Technical clarity and sharpness |

### Text-Video Alignment Metrics (40% weight, prompt-conditioned)

| Metric | Description |
|---|---|
| `object_class` | Prompt objects appear correctly |
| `human_action` | Prompt actions performed correctly |
| `spatial_relationship` | Correct spatial layout |
| `overall_consistency` | Holistic text-video alignment |
### Using Metrics

```python
from vidspy.metrics import composite_reward, quality_score, alignment_score

# Default composite metric (60% quality + 40% alignment)
score = composite_reward(example, prediction)

# Quality-only score
q_score = quality_score(example, prediction)

# Alignment-only score
a_score = alignment_score(example, prediction)

# Custom metric configuration
from vidspy.metrics import VBenchMetric

custom_metric = VBenchMetric(
    quality_weight=0.5,
    alignment_weight=0.5,
    quality_metrics=["motion_smoothness", "aesthetic_quality"],
    alignment_metrics=["object_class", "overall_consistency"],
)
```
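To make the 60/40 weighting concrete, here is a minimal numeric sketch of how a composite score could combine the two categories. Only the 0.6/0.4 weights come from the documentation above; the `composite` helper and the per-category averaging are assumptions for illustration, not vidspy's actual implementation:

```python
# Assumed default weights, per the docs above
QUALITY_WEIGHT = 0.6
ALIGNMENT_WEIGHT = 0.4


def composite(quality_scores, alignment_scores):
    """Average each category of per-metric scores, then take the weighted sum."""
    q = sum(quality_scores.values()) / len(quality_scores)
    a = sum(alignment_scores.values()) / len(alignment_scores)
    return QUALITY_WEIGHT * q + ALIGNMENT_WEIGHT * a


score = composite(
    {"motion_smoothness": 0.9, "aesthetic_quality": 0.7},  # quality avg = 0.8
    {"object_class": 0.8, "overall_consistency": 0.6},     # alignment avg = 0.7
)
print(round(score, 3))  # 0.6 * 0.8 + 0.4 * 0.7 = 0.76
```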
## Optimizers

ViDSPy provides five DSPy-compatible optimizers:

| Optimizer | Description | Key Parameters |
|---|---|---|
| `VidBootstrapFewShot` | Auto-generate/select few-shot demos | `max_bootstrapped_demos=4` |
| `VidLabeledFewShot` | Static few-shot assignment | `k=3` |
| `VidMIPROv2` | Multi-stage instruction + demo optimization | `num_candidates=10`, `auto="light"` |
| `VidCOPRO` | Cooperative multi-LM instruction optimization | `breadth=5`, `depth=3` |
| `VidGEPA` | Generate + Evaluate + Propose + Accept | `auto="light"` |
### Example: Using Different Optimizers

```python
# Bootstrap few-shot
optimized = vidspy.optimize(
    module, trainset,
    optimizer="bootstrap",
    max_bootstrapped_demos=4,
)

# MIPROv2 with more candidates
optimized = vidspy.optimize(
    module, trainset,
    optimizer="mipro_v2",
    num_candidates=15,
    auto="medium",
)

# COPRO with a wider, deeper search
optimized = vidspy.optimize(
    module, trainset,
    optimizer="copro",
    breadth=10,
    depth=5,
)
```
## VLM Providers

### OpenRouter (Default)

Cloud-based video VLMs via a unified API:

```python
vidspy = ViDSPy(
    vlm_backend="openrouter",
    vlm_model="google/gemini-2.0-flash-001",
    api_key="your-api-key",  # Or set the OPENROUTER_API_KEY env var
)
```

Supported models:

- `google/gemini-2.0-flash-001` (default)
- `google/gemini-1.5-pro`
- `anthropic/claude-3-opus`
- `openai/gpt-4o`

### HuggingFace (Local)

Local video VLMs for offline usage:

```python
vidspy = ViDSPy(
    vlm_backend="huggingface",
    vlm_model="llava-hf/llava-v1.6-mistral-7b-hf",
    device="cuda",
)
```
## Video Modules

ViDSPy provides several module types for different use cases:

```python
from vidspy import VideoPredict, VideoChainOfThought, VideoReAct, VideoEnsemble

# Simple prediction
predictor = VideoPredict("prompt -> video_path")

# Chain-of-thought reasoning
cot = VideoChainOfThought("prompt -> video")

# ReAct-style iterative refinement
react = VideoReAct("prompt -> video", max_iterations=3)

# Ensemble multiple approaches
ensemble = VideoEnsemble([
    VideoPredict(),
    VideoChainOfThought(),
], selection_metric=composite_reward)
```
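Conceptually, a selection-based ensemble runs every candidate module and keeps the output that scores best under the selection metric. The following sketch illustrates that idea with toy callables; `ensemble_select` is a hypothetical function for illustration, not how `VideoEnsemble` is actually implemented:

```python
def ensemble_select(modules, prompt, metric):
    """Run each module on the prompt and return the best-scoring output."""
    outputs = [m(prompt) for m in modules]
    return max(outputs, key=lambda out: metric(prompt, out))


# Toy demo: two "modules" that decorate the prompt, and a toy metric
# that simply prefers the longer output.
best = ensemble_select(
    [lambda p: p + "!", lambda p: p + " (cinematic, 4k)"],
    "a cat jumping",
    lambda prompt, out: len(out),
)
```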
## Setup VBench Models

```bash
# Via CLI
vidspy setup
```

```python
# Via Python
from vidspy import setup_vbench_models

setup_vbench_models()  # Downloads to ~/.cache/vbench
```
## Full Example

```python
import os

from vidspy import (
    ViDSPy,
    VideoChainOfThought,
    Example,
    composite_reward,
)

# Set API key
os.environ["OPENROUTER_API_KEY"] = "your-api-key"

# Initialize
vidspy = ViDSPy(vlm_backend="openrouter")

# Prepare training data
trainset = [
    Example(
        prompt="a person walking through a forest",
        video_path="data/walk_forest.mp4",
    ),
    Example(
        prompt="a car driving on a highway",
        video_path="data/car_highway.mp4",
    ),
    # ... more examples
]

# Hold out the last two examples for validation
valset = trainset[-2:]
trainset = trainset[:-2]

# Create and optimize a module
module = VideoChainOfThought("prompt -> video")
optimized = vidspy.optimize(
    module,
    trainset,
    valset=valset,
    metric=composite_reward,
    optimizer="mipro_v2",
    num_candidates=10,
)

# Evaluate on a test set
testset = [Example(prompt="a boat on a lake", video_path="data/boat.mp4")]
results = vidspy.evaluate(optimized, testset)
print(f"Mean Score: {results['mean_score']:.4f}")
print(f"Quality: {results['details'][0].get('quality_score', 'N/A')}")
print(f"Alignment: {results['details'][0].get('alignment_score', 'N/A')}")

# Generate new videos
result = optimized("a butterfly landing on a flower")
print(f"Generated: {result.video_path}")
print(f"Enhanced prompt: {result.enhanced_prompt}")
```
## CLI Reference

```bash
# Show help
vidspy --help

# Setup VBench models
vidspy setup
vidspy setup --cache-dir /path/to/cache

# Check dependencies
vidspy setup --check-only

# Optimize a module
vidspy optimize trainset.json --optimizer mipro_v2 --output optimized_model

# Evaluate a module
vidspy evaluate testset.json --module optimized_model --output results.json

# Show information
vidspy info
```
## Project Structure

```text
vidspy/
├── pyproject.toml              # Package configuration
├── README.md                   # This file
├── .env.example                # Environment variables template
├── vidspy_config.yaml.example  # Configuration file template
├── vidspy/
│   ├── __init__.py    # Main exports
│   ├── core.py        # ViDSPy main class, Example
│   ├── signatures.py  # VideoSignature, etc.
│   ├── modules.py     # VideoPredict, VideoChainOfThought
│   ├── optimizers.py  # VidBootstrapFewShot, VidMIPROv2, etc.
│   ├── metrics.py     # VBench wrappers, composite_reward
│   ├── providers.py   # OpenRouterVLM, HuggingFaceVLM
│   ├── setup.py       # Setup utilities
│   └── cli.py         # Command-line interface
├── examples/
│   └── basic_usage.py
└── tests/
    └── test_basic.py
```
## Target Thresholds

For production-quality videos, aim for:

- Human Anatomy: ≥ 0.85
- Text-Video Alignment: ≥ 0.80
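As a rough illustration, you could gate generated videos on these targets with a small helper. `meets_targets` is hypothetical (not part of the vidspy API), and mapping the "Text-Video Alignment" target onto the `alignment_score` key from the evaluation details is an assumption; only the 0.85 and 0.80 thresholds come from the list above:

```python
# Assumed mapping of the documented targets onto per-video score keys
THRESHOLDS = {"human_anatomy": 0.85, "alignment_score": 0.80}


def meets_targets(detail):
    """Return True if every thresholded metric clears its bar.

    A missing metric counts as 0.0 and therefore fails the gate.
    """
    return all(detail.get(name, 0.0) >= bar for name, bar in THRESHOLDS.items())


print(meets_targets({"human_anatomy": 0.90, "alignment_score": 0.82}))  # True
print(meets_targets({"human_anatomy": 0.70, "alignment_score": 0.82}))  # False
```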
## References

- DSPy - Declarative Self-improving Python
- VBench - Video generation benchmark
- OpenRouter - Unified AI API

## License

MIT License - see LICENSE for details.

## Contributing

Contributions welcome! Please read our contributing guide first.

## Star History

If you find ViDSPy useful, please consider giving it a star!