# video-analyzer-tune

DSPy-based prompt optimizer for video-analyzer.
Automatically improves the two prompts that video-analyzer uses — the per-frame analysis prompt and the final video reconstruction prompt — based on examples of what good output looks like for your specific content and use case.
## Overview
video-analyzer works in two stages: it analyzes each video frame individually (building up a running log of observations), then synthesizes all the frame notes into a final video description. Both stages are driven by prompt files that you can customize.
video-analyzer-tune uses DSPy MIPROv2 to optimize both prompts end-to-end. You provide a few examples of what ideal output looks like — both at the frame level and the final description level — and the tuner finds better prompt instructions automatically.
The main video-analyzer package is not affected in any way. Tuned prompts are written as new files that you point to via your config.
## Requirements

- Python 3.8+
- video-analyzer >= 0.1.1
- An Ollama instance with a vision model, or an OpenAI-compatible API
## Installation

```bash
pip install video-analyzer-tune
```
## Quick Start
### Step 1: Generate output with frames kept

Run video-analyzer on a representative video and keep the extracted frames:

```bash
video-analyzer my_video.mp4 --keep-frames
```
This produces an output/ directory containing:

- `analysis.json`: frame-by-frame notes and the final description
- `frames/`: the extracted frame images
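At a glance:

```text
output/
├── analysis.json    # frame-by-frame notes + the final description
└── frames/          # extracted frame images
```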
### Step 2: Edit analysis.json with your ideal output
Open output/analysis.json and edit two things:
- **Required:** edit `video_description.response` to show what the ideal final description looks like for your use case.
- **Recommended:** edit each `frame_analyses[i].response` to show what ideal frame notes look like. This gives the optimizer a signal at both stages of the pipeline and produces better results.
```json
{
  "frame_analyses": [
    {
      "frame": 0,
      "timestamp": 0.0,
      "response": "Your ideal frame note here — what details matter for your use case"
    }
  ],
  "video_description": {
    "response": "Your ideal final description here — the style, length, and focus you want"
  }
}
```
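If a video has many frames, scripting the edits can be easier than hand-editing. A minimal sketch, assuming the JSON layout above; the replacement strings are placeholders for your own ideal text:

```python
import json
from pathlib import Path

path = Path("output/analysis.json")
data = json.loads(path.read_text())

# Required: your ideal final description.
data["video_description"]["response"] = (
    "Your ideal final description here."
)

# Recommended: an ideal note for each frame.
for fa in data["frame_analyses"]:
    fa["response"] = f"Your ideal note for frame {fa['frame']}."

path.write_text(json.dumps(data, indent=2))
```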
The more videos you edit and include as training examples, the better the results.
### Step 3: Create training_data.json

```json
{
  "examples": [
    { "output_dir": "output" }
  ]
}
```

Add one entry per video you edited:

```json
{
  "examples": [
    { "output_dir": "output/video1" },
    { "output_dir": "output/video2" },
    { "output_dir": "output/video3" }
  ]
}
```
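Before running the tuner, it can help to sanity-check that every referenced output directory is complete. A small standalone script (a hypothetical helper, not part of the package):

```python
import json
from pathlib import Path

td = Path("training_data.json")
base = td.resolve().parent  # relative output_dir paths resolve against this
data = json.loads(td.read_text())

for ex in data["examples"]:
    out = (base / ex["output_dir"]).resolve()
    analysis = out / "analysis.json"
    assert analysis.is_file(), f"missing {analysis}"
    notes = json.loads(analysis.read_text())
    # every training example needs an edited final description
    assert notes["video_description"]["response"].strip(), f"empty description in {analysis}"
print("all examples look complete")
```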
### Step 4: Run the tuner

```bash
video-analyzer-tune --training-data training_data.json --output-dir tuned_prompts/
```

This runs MIPROv2 optimization, which will take some time depending on `--num-candidates` and `--num-trials`.
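When it finishes, the output directory holds the two tuned prompt files that the next step wires into your config:

```text
tuned_prompts/
├── frame_analysis_tuned.txt
└── describe_tuned.txt
```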
### Step 5: Update your config

When tuning completes, the tool prints a config snippet to paste into your config/config.json:

```json
"prompt_dir": "tuned_prompts",
"prompts": [
  {"name": "Frame Analysis", "path": "frame_analysis_tuned.txt"},
  {"name": "Video Reconstruction", "path": "describe_tuned.txt"}
]
```
Run video-analyzer as normal — it will use your tuned prompts automatically.
## Training Data Format
### training_data.json

```json
{
  "examples": [
    { "output_dir": "path/to/output" }
  ]
}
```
Paths can be absolute or relative to the location of training_data.json.
### What to edit in analysis.json

| Field | Required | Description |
|---|---|---|
| `video_description.response` | Yes | Your ideal final video description |
| `frame_analyses[i].response` | Recommended | Your ideal frame note for each frame |
| `prompt` | No | Leave as-is |
| `transcript` | No | Leave as-is |
## CLI Reference

| Flag | Default | Description |
|---|---|---|
| `--training-data` | required | Path to training_data.json |
| `--output-dir` | `tuned_prompts` | Directory to write tuned prompt files |
| `--client` | `ollama` | LLM client: `ollama` or `openai_api` |
| `--model` | `llama3.2-vision` | Vision model to use for optimization runs |
| `--ollama-url` | `http://localhost:11434` | Ollama server URL |
| `--api-key` | — | API key (required when `--client openai_api`) |
| `--api-url` | — | API endpoint URL (required when `--client openai_api`) |
| `--num-candidates` | `10` | Number of prompt variations generated per module. Higher = more thorough but slower. Range: 5–20 |
| `--num-trials` | `20` | Number of optimization trials. Higher = better results but slower. Range: 10–50 |
| `--max-bootstrapped-demos` | `3` | Max few-shot examples generated by bootstrapping |
| `--max-labeled-demos` | `4` | Max few-shot examples taken from your training data |
| `--description-weight` | `0.7` | How much the final description quality influences the score (0.0–1.0). The remainder weights frame analysis quality. Use 0.5 if you care equally about both; use 1.0 to optimize only for the final description |
| `--log-level` | `INFO` | Logging level: DEBUG / INFO / WARNING / ERROR |
## LLM Configuration

### Using Ollama (default)

```bash
video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --model llama3.2-vision
```
### Using an OpenAI-compatible API (e.g. OpenRouter)

```bash
video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --client openai_api \
  --model meta-llama/llama-3.2-11b-vision-instruct \
  --api-url https://openrouter.ai/api/v1 \
  --api-key YOUR_API_KEY
```
## How It Works

video-analyzer uses two prompt files:

- `frame_analysis.txt`: called once per frame with the image and all previous frame notes. Produces the per-frame observation log.
- `describe.txt`: called once at the end with all frame notes and the audio transcript. Produces the final video description.
video-analyzer-tune wraps both prompts in a DSPy pipeline that mirrors the exact processing logic of video-analyzer. It then runs MIPROv2 — a Bayesian optimizer that generates candidate instruction variations and scores them against your training examples.
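Conceptually, the wrapper looks something like the sketch below. This is illustrative only: the signature, field, and class names are assumptions for exposition, not the package's actual internals, and it assumes a DSPy version with multimodal `dspy.Image` support.

```python
import dspy

class AnalyzeFrame(dspy.Signature):
    """Describe this frame, given the running notes on earlier frames."""
    previous_notes: str = dspy.InputField()
    frame: dspy.Image = dspy.InputField()
    note: str = dspy.OutputField()

class DescribeVideo(dspy.Signature):
    """Synthesize all frame notes and the transcript into one description."""
    frame_notes: str = dspy.InputField()
    transcript: str = dspy.InputField()
    description: str = dspy.OutputField()

class VideoPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.analyze = dspy.Predict(AnalyzeFrame)    # mirrors frame_analysis.txt
        self.describe = dspy.Predict(DescribeVideo)  # mirrors describe.txt

    def forward(self, frames, transcript):
        notes = []
        for frame in frames:  # sequential pass, carrying prior notes forward
            notes.append(self.analyze(previous_notes="\n".join(notes),
                                      frame=frame).note)
        final = self.describe(frame_notes="\n".join(notes),
                              transcript=transcript)
        # Return both stages so a metric can score each one.
        return dspy.Prediction(frame_notes="\n".join(notes),
                               description=final.description)
```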
Scoring uses an LLM-as-judge approach: the same model evaluates how well the generated output matches your ideal examples on a 1–5 scale. Frame note quality and final description quality are combined using the configurable --description-weight.
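A hedged sketch of how such a weighted judge metric plugs into MIPROv2. The string signature, example field names, and `train_examples` are illustrative assumptions; `VideoPipeline` is from the sketch above:

```python
import dspy
from dspy.teleprompt import MIPROv2

# Hypothetical judge: rates how closely output matches the ideal, 1-5.
judge = dspy.Predict("ideal_output, actual_output -> score: int")

def combined_metric(example, pred, trace=None):
    desc = judge(ideal_output=example.ideal_description,
                 actual_output=pred.description).score / 5
    frames = judge(ideal_output=example.ideal_frame_notes,
                   actual_output=pred.frame_notes).score / 5
    w = 0.7  # --description-weight; the remainder weights frame quality
    return w * desc + (1 - w) * frames

# auto=None so the explicit budgets below apply (recent DSPy versions).
optimizer = MIPROv2(metric=combined_metric, auto=None, num_candidates=10)
tuned = optimizer.compile(VideoPipeline(), trainset=train_examples,
                          num_trials=20, max_bootstrapped_demos=3,
                          max_labeled_demos=4)
```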
After optimization, the improved instruction text is written into new .txt files that preserve all the {TOKEN} placeholders ({PREVIOUS_FRAMES}, {FRAME_NOTES}, etc.) that video-analyzer uses for its string replacement — making the output files drop-in compatible.
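For instance, a tuned describe prompt still ends with the same substitution tokens; the instruction sentence here is invented for illustration:

```text
Reconstruct the video from the notes below, focusing on ... (optimized instruction text)

{FRAME_NOTES}
```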
## Tips for Better Results
- Use multiple videos. Even 3–5 diverse examples significantly improve optimization quality.
- Edit frame notes too. If you only edit the final description, the optimizer has less signal about what good intermediate analysis looks like.
- Be specific in your edits. The more clearly your ideal examples demonstrate the style and focus you want, the better the optimizer can learn from them.
- Use the same model for tuning as for inference. The optimized prompts are tuned to the specific model's behavior.
- Increase `--num-candidates` and `--num-trials` for better results if you have the time. Start with the defaults and increase from there.
- Use `--description-weight 0.5` if you read the frame notes directly and care as much about their quality as the final description.
## License
Apache License 2.0 — same as video-analyzer.