Skip to main content

MCP server that turns any video (file, URL, or Jira attachment) into frames + transcript for Claude.

Project description

video-vision-mcp

CI Python License: MIT

An MCP server that gives Claude Code the ability to analyze any video — a local file, a URL, or a Jira ticket attachment — through one set of tools.

Claude can't watch video natively (only text + the first frame of an image). This server converts a video into sampled frame images + an audio transcript, or — when a Gemini key is present — a native Gemini analysis of the whole video. It works alongside mcp-atlassian and shares its .env.

Scenario: open a Jira ticket with a video bug report → one command (analyze_video jira_issue_key=DEV-123) → you see the frames and the transcript (or Gemini's analysis if a key is configured), without juggling two MCP servers.

Three backend tiers (auto-selected)

Tier Needs What it does
1 — local (default) nothing ffmpeg frames + whisper.cpp transcript. Free, fully local, always works.
2 — cloud ASR OPENAI_API_KEY or GROQ_API_KEY Local frames, but transcription via OpenAI Whisper / Groq for higher quality.
3 — native Gemini GEMINI_API_KEY Gemini ingests the whole video (visual + audio) in one call, with MM:SS timestamps. Default when the key is set.

Precedence: Gemini > OpenAI > Groq > local. Set VIDEO_MCP_DISABLE_GEMINI=true to force tiers 1/2 even with a Gemini key. The backend used is named in every result.

Privacy: tier 1 never uploads anything. Tiers 2/3 print a one-time notice in the session the first time video content is sent to a third party.

Tools

  • analyze_video — frames + transcript + metadata (the main tool).
  • get_video_transcript_only — transcript text only.
  • extract_frames_at — frames at specific timestamps ("00:42", "1:05", 12.5).
  • list_recent_analyses — cached analyses + backend used.
  • compare_backends — same video via tier 1 and tier 3 side by side.

Install

Requires Python ≥ 3.10. A single install pulls everything — backends, plus the ffmpeg and whisper.cpp dependencies. Nothing is ever installed globally on your machine (no brew/apt/winget, no sudo).

Use it (recommended)

With uv you don't install it explicitly — uvx runs the published package on demand (see Register in Claude Code). To install into an environment instead:

uv pip install video-vision-mcp     # or: pip install video-vision-mcp

From source (development)

git clone https://github.com/KitDevUA/video-vision-mcp.git
cd video-vision-mcp
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"          # all backends bundled

Dependencies — fully self-contained

  • ffmpeg / ffprobe: if they are already on your PATH, those system binaries are used. Otherwise the bundled static-ffmpeg package supplies them (fetched once into its own local cache — never a system-wide install).
  • whisper.cpp (tier 1 transcription): shipped as the bundled pywhispercpp binding (prebuilt wheels; builds from source only if no wheel exists for your platform/Python). A whisper-cli already on PATH is used if present.
  • whisper model: the ggml model (base by default) downloads from Hugging Face into the cache on first transcription. Override with VIDEO_MCP_WHISPER_MODEL (tiny/base/small/medium/large-v3) or VIDEO_MCP_WHISPER_MODEL_PATH.
  • cloud-only: set OPENAI_API_KEY / GROQ_API_KEY (tier 2) or GEMINI_API_KEY (tier 3); whisper.cpp is then never invoked.

Configure

cp env.example .env
# edit .env — nothing is required for tier 1

See env.example for every variable. The .env format matches mcp-atlassian, so Jira creds (JIRA_URL / JIRA_USERNAME / JIRA_API_TOKEN) can be shared.

Register in Claude Code

Add to your project .mcp.json (or global config), next to mcp-atlassian — see .mcp.json.example:

{
  "mcpServers": {
    "video-vision": {
      "command": "uvx",
      "args": ["video-vision-mcp"],
      "env": { "VIDEO_MCP_ENV": "/abs/path/to/.env" }
    }
  }
}

uvx downloads and runs the published package automatically — no manual install step. VIDEO_MCP_ENV is optional (tier 1 needs no keys); point it at your .env if you use Jira or cloud backends. For local development against a checkout, use "args": ["--from", "/abs/path/to/video-vision-mcp", "video-vision-mcp"] instead. Restart Claude Code; the video-vision tools then appear.

Cache

Results are cached at ~/.cache/video-vision-mcp/ keyed by (file hash, backend) — re-analyzing the same video is instant, and switching backends keeps each result separately. Downloaded URLs/Jira files and whisper models live under the same dir. Override with VIDEO_MCP_CACHE_DIR.

How it fits with mcp-atlassian

mcp-atlassian can download a Jira attachment but can't analyze it. This server takes over from there: pass jira_issue_key and it fetches the attachment over Jira REST itself (same creds), so you stay in one tool call. If the Jira token is missing/invalid you get a clear error pointing at .env, not a silent failure.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

video_vision_mcp-0.1.0.tar.gz (157.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

video_vision_mcp-0.1.0-py3-none-any.whl (28.3 kB view details)

Uploaded Python 3

File details

Details for the file video_vision_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: video_vision_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 157.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for video_vision_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c93daaba1bce00c45ceedda3033ed7c34f9680c8aba9c8fb1845f4bd54f42afd
MD5 996378a3c9fa7fbcff6411392d88a828
BLAKE2b-256 4af1ecb49b8dff086c5e103add07a90cf938da386b0df430c5b2626d6bba6c96

See more details on using hashes here.

Provenance

The following attestation bundles were made for video_vision_mcp-0.1.0.tar.gz:

Publisher: publish.yml on KitDevUA/video-vision-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file video_vision_mcp-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for video_vision_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5c86aa82deb9d22dc4369429eceeae36a78703b8cf680d80a01ea1c73a65c266
MD5 0d548542439a80ba25d5723df738e957
BLAKE2b-256 bea2165be603c0d5b2edefe1c000a829378fe855b502f62dfb3a01a5eac91ffa

See more details on using hashes here.

Provenance

The following attestation bundles were made for video_vision_mcp-0.1.0-py3-none-any.whl:

Publisher: publish.yml on KitDevUA/video-vision-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page