
video-analyzer-tune

DSPy-based prompt optimizer for video-analyzer.

Automatically improves the two prompts that video-analyzer uses — the per-frame analysis prompt and the final video reconstruction prompt — based on examples of what good output looks like for your specific content and use case.

Overview

video-analyzer works in two stages: it analyzes each video frame individually (building up a running log of observations), then synthesizes all the frame notes into a final video description. Both stages are driven by prompt files that you can customize.

video-analyzer-tune uses DSPy MIPROv2 to optimize both prompts end-to-end. You provide a few examples of what ideal output looks like — both at the frame level and the final description level — and the tuner finds better prompt instructions automatically.

The main video-analyzer package is not affected in any way. Tuned prompts are written as new files that you point to via your config.

Requirements

  • Python 3.8+
  • video-analyzer >= 0.1.1
  • An Ollama instance with a vision model, or an OpenAI-compatible API

Installation

pip install video-analyzer-tune

Quick Start

Step 1 — Generate output with frames kept

Run video-analyzer on a representative video and keep the extracted frames:

video-analyzer my_video.mp4 --keep-frames

This produces an output/ directory containing:

  • analysis.json — frame-by-frame notes and the final description
  • frames/ — the extracted frame images
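
For a single video the layout looks roughly like this (frame filenames are illustrative and may vary):

output/
├── analysis.json
└── frames/
    ├── frame_0.jpg
    ├── frame_1.jpg
    └── ...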

Step 2 — Edit analysis.json with your ideal output

Open output/analysis.json and edit two things:

Required: Edit video_description.response to show what the ideal final description looks like for your use case.

Recommended: Edit each frame_analyses[i].response to show what ideal frame notes look like. This gives the optimizer a signal at both stages of the pipeline and produces better results.

{
  "frame_analyses": [
    {
      "frame": 0,
      "timestamp": 0.0,
      "response": "Your ideal frame note here — what details matter for your use case"
    }
  ],
  "video_description": {
    "response": "Your ideal final description here — the style, length, and focus you want"
  }
}

The more videos you edit and include as training examples, the better the results.

Step 3 — Create training_data.json

{
  "examples": [
    { "output_dir": "output" }
  ]
}

Add one entry per video you edited:

{
  "examples": [
    { "output_dir": "output/video1" },
    { "output_dir": "output/video2" },
    { "output_dir": "output/video3" }
  ]
}

Step 4 — Run the tuner

video-analyzer-tune --training-data training_data.json --output-dir tuned_prompts/

This runs MIPROv2 optimization, which will take some time depending on --num-candidates and --num-trials.
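
If you have the time, a more thorough (and slower) run just raises those two knobs, for example:

video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --num-candidates 15 \
  --num-trials 30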

Step 5 — Update your config

When tuning completes, the tool prints a config snippet to paste into your config/config.json:

"prompt_dir": "tuned_prompts",
"prompts": [
  {"name": "Frame Analysis", "path": "frame_analysis_tuned.txt"},
  {"name": "Video Reconstruction", "path": "describe_tuned.txt"}
]
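
In context, the relevant part of config/config.json would look like this (any other keys already in your config stay as they are):

{
  "prompt_dir": "tuned_prompts",
  "prompts": [
    {"name": "Frame Analysis", "path": "frame_analysis_tuned.txt"},
    {"name": "Video Reconstruction", "path": "describe_tuned.txt"}
  ]
}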

Run video-analyzer as normal — it will use your tuned prompts automatically.

Training Data Format

training_data.json

{
  "examples": [
    { "output_dir": "path/to/output" }
  ]
}

Paths can be absolute or relative to the location of training_data.json.
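
For example, the two styles can be mixed in one file (the paths here are hypothetical):

{
  "examples": [
    { "output_dir": "runs/video1_output" },
    { "output_dir": "/data/video-analyzer/video2_output" }
  ]
}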

What to edit in analysis.json

Field                         Required      Description
video_description.response    Yes           Your ideal final video description
frame_analyses[i].response    Recommended   Your ideal frame note for each frame
prompt                        No            Leave as-is
transcript                    No            Leave as-is

CLI Reference

Flag                       Default                  Description
--training-data            (required)               Path to training_data.json
--output-dir               tuned_prompts            Directory to write tuned prompt files
--client                   ollama                   LLM client: ollama or openai_api
--model                    llama3.2-vision          Vision model to use for optimization runs
--ollama-url               http://localhost:11434   Ollama server URL
--api-key                  (none)                   API key (required when --client openai_api)
--api-url                  (none)                   API endpoint URL (required when --client openai_api)
--num-candidates           10                       Prompt variations generated per module; higher is more thorough but slower (range 5–20)
--num-trials               20                       Optimization trials; higher yields better results but is slower (range 10–50)
--max-bootstrapped-demos   3                        Max few-shot examples generated by bootstrapping
--max-labeled-demos        4                        Max few-shot examples taken from your training data
--description-weight       0.7                      Weight (0.0–1.0) of final-description quality in the score; the remainder weights frame-analysis quality. Use 0.5 to weight both equally, 1.0 to optimize only the final description (see the worked example below)
--log-level                INFO                     Logging level: DEBUG / INFO / WARNING / ERROR
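
As a worked example of --description-weight, assuming a simple linear combination of the two judge scores: with the default 0.7, a final description rated 4/5 and frame notes rated 3/5 combine to 0.7 × 4 + 0.3 × 3 = 3.7 out of 5.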

LLM Configuration

Using Ollama (default)

video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --model llama3.2-vision

Using an OpenAI-compatible API (e.g. OpenRouter)

video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --client openai_api \
  --model meta-llama/llama-3.2-11b-vision-instruct \
  --api-url https://openrouter.ai/api/v1 \
  --api-key YOUR_API_KEY

How It Works

video-analyzer uses two prompt files:

  1. frame_analysis.txt — called once per frame with the image and all previous frame notes. Produces the per-frame observation log.
  2. describe.txt — called once at the end with all frame notes and the audio transcript. Produces the final video description.

video-analyzer-tune wraps both prompts in a DSPy pipeline that mirrors the exact processing logic of video-analyzer. It then runs MIPROv2 — a Bayesian optimizer that generates candidate instruction variations and scores them against your training examples.
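
The shape of that pipeline, in rough DSPy terms (everything here is an illustrative sketch: the signatures, field names, example construction, and image handling are assumptions, not the package's actual code; judge_metric is sketched in the next example):

import dspy
from dspy.teleprompt import MIPROv2

class AnalyzeFrame(dspy.Signature):
    """Describe one video frame, building on notes from earlier frames."""
    previous_notes = dspy.InputField(desc="running log of earlier frame notes")
    frame = dspy.InputField(desc="the current frame")
    note = dspy.OutputField(desc="observations for this frame")

class DescribeVideo(dspy.Signature):
    """Synthesize all frame notes and the transcript into a final description."""
    frame_notes = dspy.InputField()
    transcript = dspy.InputField()
    description = dspy.OutputField()

class VideoPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.analyze = dspy.Predict(AnalyzeFrame)    # mirrors frame_analysis.txt
        self.describe = dspy.Predict(DescribeVideo)  # mirrors describe.txt

    def forward(self, frames, transcript):
        notes = []
        for frame in frames:  # one call per frame, threading prior notes through
            notes.append(self.analyze(previous_notes="\n".join(notes), frame=frame).note)
        final = self.describe(frame_notes="\n".join(notes), transcript=transcript)
        return dspy.Prediction(description=final.description, frame_notes="\n".join(notes))

# Training examples come from your edited analysis.json files (placeholder values here).
train_examples = [
    dspy.Example(
        frames=["frame_0.jpg"], transcript="(audio transcript)",
        frame_notes="ideal frame notes", description="ideal final description",
    ).with_inputs("frames", "transcript")
]

# MIPROv2 proposes instruction candidates for both modules and keeps the best
# combination according to the metric (judge_metric, sketched below).
optimizer = MIPROv2(metric=judge_metric, num_candidates=10)
tuned = optimizer.compile(
    VideoPipeline(),
    trainset=train_examples,
    num_trials=20,
    max_bootstrapped_demos=3,
    max_labeled_demos=4,
)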

Scoring uses an LLM-as-judge approach: the same model evaluates how well the generated output matches your ideal examples on a 1–5 scale. Frame note quality and final description quality are combined using the configurable --description-weight.
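
A metric along these lines could implement that scoring (again an illustrative sketch; the judge signature and rating parsing are assumptions):

import dspy

class JudgeOutput(dspy.Signature):
    """Rate how well a candidate text matches an ideal reference, from 1 to 5."""
    reference = dspy.InputField(desc="ideal output from the training data")
    candidate = dspy.InputField(desc="generated output to score")
    rating = dspy.OutputField(desc="integer from 1 (poor) to 5 (perfect)")

judge = dspy.Predict(JudgeOutput)

DESCRIPTION_WEIGHT = 0.7  # mirrors the --description-weight flag

def rate(reference: str, candidate: str) -> float:
    # Parse the first digit out of the judge's reply, clamped to 1-5.
    for ch in str(judge(reference=reference, candidate=candidate).rating):
        if ch.isdigit():
            return float(min(max(int(ch), 1), 5))
    return 1.0

def judge_metric(example, pred, trace=None) -> float:
    desc = rate(example.description, pred.description)
    frames = rate(example.frame_notes, pred.frame_notes)
    combined = DESCRIPTION_WEIGHT * desc + (1 - DESCRIPTION_WEIGHT) * frames
    return combined / 5.0  # normalize to 0-1 for the optimizer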

After optimization, the improved instruction text is written into new .txt files that preserve all the {TOKEN} placeholders ({PREVIOUS_FRAMES}, {FRAME_NOTES}, etc.) that video-analyzer uses for its string replacement — making the output files drop-in compatible.
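
That placeholder-preserving write could be as simple as keeping any line of the original prompt that contains a {TOKEN} (a hypothetical helper for illustration, not the package's code):

import re

PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")

def merge_tuned_instruction(original_prompt: str, tuned_instruction: str) -> str:
    # Keep every line carrying a {TOKEN} placeholder (e.g. {PREVIOUS_FRAMES},
    # {FRAME_NOTES}) so the tuned file stays drop-in compatible with
    # video-analyzer's string replacement, with the new instruction on top.
    placeholder_lines = [ln for ln in original_prompt.splitlines() if PLACEHOLDER.search(ln)]
    return "\n\n".join([tuned_instruction, *placeholder_lines])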

Tips for Better Results

  • Use multiple videos. Even 3–5 diverse examples significantly improve optimization quality.
  • Edit frame notes too. If you only edit the final description, the optimizer has less signal about what good intermediate analysis looks like.
  • Be specific in your edits. The more clearly your ideal examples demonstrate the style and focus you want, the better the optimizer can learn from them.
  • Use the same model for tuning as for inference. The optimized prompts are tuned to the specific model's behavior.
  • Start with the default --num-candidates and --num-trials, then increase them if you have the time; higher values generally produce better prompts.
  • Use --description-weight 0.5 if you read the frame notes directly and care as much about their quality as the final description.

License

Apache License 2.0 — same as video-analyzer.
