MCP server that turns any video (file, URL, or Jira attachment) into frames + transcript for Claude.
video-vision-mcp
An MCP server that gives Claude Code the ability to analyze any video — a local file, a URL, or a Jira ticket attachment — through one set of tools.
Claude can't watch video natively (it takes only text and still images).
This server converts a video into sampled frame images + an audio transcript,
or — when a Gemini key is present — a native Gemini analysis of the whole
video. It works alongside mcp-atlassian
and shares its .env.
Scenario: open a Jira ticket with a video bug report → one command (analyze_video jira_issue_key=DEV-123) → you see the frames and the transcript (or Gemini's analysis if a key is configured), without juggling two MCP servers.
Three backend tiers (auto-selected)
| Tier | Needs | What it does |
|---|---|---|
| 1: local (default) | nothing | ffmpeg frames + whisper.cpp transcript. Free, fully local, always works. |
| 2: cloud ASR | OPENAI_API_KEY or GROQ_API_KEY | Local frames, but transcription via OpenAI Whisper / Groq for higher quality. |
| 3: native Gemini | GEMINI_API_KEY | Gemini ingests the whole video (visual + audio) in one call, with MM:SS timestamps. Default when the key is set. |
Precedence: Gemini > OpenAI > Groq > local. Set VIDEO_MCP_DISABLE_GEMINI=true
to force tiers 1/2 even with a Gemini key. The backend used is named in every result.
Privacy: tier 1 never uploads anything. Tiers 2/3 print a one-time notice in the session the first time video content is sent to a third party.
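The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the documented order (Gemini > OpenAI > Groq > local, with VIDEO_MCP_DISABLE_GEMINI as the opt-out), not the server's actual code; the function name `select_backend` and the returned backend labels are assumptions made for the example.

```python
from typing import Mapping


def select_backend(env: Mapping[str, str]) -> str:
    """Pick a backend label from environment-style settings.

    Hypothetical helper: mirrors the documented precedence only.
    """
    # Assumption: the flag is compared case-insensitively against "true".
    gemini_disabled = env.get("VIDEO_MCP_DISABLE_GEMINI", "").strip().lower() == "true"

    if env.get("GEMINI_API_KEY") and not gemini_disabled:
        return "gemini"   # tier 3
    if env.get("OPENAI_API_KEY"):
        return "openai"   # tier 2
    if env.get("GROQ_API_KEY"):
        return "groq"     # tier 2
    return "local"        # tier 1, needs no keys
```

Passing `os.environ` as `env` would apply the same rule to the current process environment.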
Tools
- analyze_video: frames + transcript + metadata (the main tool).
- get_video_transcript_only: transcript text only.
- extract_frames_at: frames at specific timestamps ("00:42", "1:05", 12.5).
- list_recent_analyses: cached analyses + backend used.
- compare_backends: same video via tier 1 and tier 3 side by side.
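The timestamp forms accepted by extract_frames_at ("00:42", "1:05", 12.5) all reduce to a number of seconds. A minimal sketch of that normalization follows; `to_seconds` is a hypothetical name, and accepting an HH:MM:SS form is an assumption that the tool list does not confirm.

```python
from typing import Union


def to_seconds(ts: Union[str, int, float]) -> float:
    """Convert "MM:SS", "HH:MM:SS", or a plain number into seconds.

    Hypothetical helper, not the server's own parser.
    """
    if isinstance(ts, (int, float)):
        seconds = float(ts)
    else:
        parts = ts.strip().split(":")
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"unrecognized timestamp: {ts!r}")
        seconds = 0.0
        for part in parts:
            # Each colon shifts the running total up by a factor of 60.
            seconds = seconds * 60 + float(part)
    if seconds < 0:
        raise ValueError(f"timestamp must be non-negative: {ts!r}")
    return seconds
```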
Install
Requires Python ≥ 3.10. A single install pulls everything — backends, plus the ffmpeg and whisper.cpp dependencies. Nothing is ever installed globally on your machine (no brew/apt/winget, no sudo).
Use it (recommended)
With uv you don't install it explicitly — uvx runs
the published package on demand (see Register in Claude Code).
To install into an environment instead:
uv pip install video-vision-mcp # or: pip install video-vision-mcp
From source (development)
git clone https://github.com/KitDevUA/video-vision-mcp.git
cd video-vision-mcp
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]" # all backends bundled
Dependencies — fully self-contained
- ffmpeg / ffprobe: if they are already on your PATH, those system binaries are used. Otherwise the bundled static-ffmpeg package supplies them (fetched once into its own local cache, never a system-wide install).
- whisper.cpp (tier 1 transcription): shipped as the bundled pywhispercpp binding (prebuilt wheels; builds from source only if no wheel exists for your platform/Python). A whisper-cli already on PATH is used if present.
- whisper model: the ggml model (base by default) downloads from Hugging Face into the cache on first transcription. Override with VIDEO_MCP_WHISPER_MODEL (tiny/base/small/medium/large-v3) or VIDEO_MCP_WHISPER_MODEL_PATH.
- cloud-only: set OPENAI_API_KEY / GROQ_API_KEY (tier 2) or GEMINI_API_KEY (tier 3); whisper.cpp is then never invoked.
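The "system binary first, bundled copy otherwise" lookup described for ffmpeg and whisper-cli might look roughly like the sketch below. `resolve_binary` and its `fallback` callable are hypothetical; the fallback stands in for whatever the bundled package does to provide a path, which is not documented here.

```python
import shutil
from typing import Callable, Optional


def resolve_binary(name: str, fallback: Optional[Callable[[], str]] = None) -> str:
    """Return the path to `name`, preferring a copy already on PATH.

    `fallback` is a placeholder for a bundled provider (an assumption);
    it is called only when nothing is found on PATH.
    """
    found = shutil.which(name)
    if found:
        return found
    if fallback is not None:
        return fallback()
    raise FileNotFoundError(f"{name} not found on PATH and no fallback given")
```

The fallback is a callable rather than a fixed path, so any one-time fetch into the local cache would happen only when it is actually needed.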
Configure
cp env.example .env
# edit .env — nothing is required for tier 1
See env.example for every variable. The .env format matches mcp-atlassian,
so Jira creds (JIRA_URL / JIRA_USERNAME / JIRA_API_TOKEN) can be shared.
Register in Claude Code
Add to your project .mcp.json (or global config), next to mcp-atlassian — see
.mcp.json.example:
{
"mcpServers": {
"video-vision": {
"command": "uvx",
"args": ["video-vision-mcp"],
"env": { "VIDEO_MCP_ENV": "/abs/path/to/.env" }
}
}
}
uvx downloads and runs the published package automatically — no manual install
step. VIDEO_MCP_ENV is optional (tier 1 needs no keys); point it at your .env
if you use Jira or cloud backends. For local development against a checkout, use
"args": ["--from", "/abs/path/to/video-vision-mcp", "video-vision-mcp"] instead.
Restart Claude Code; the video-vision tools then appear.
Cache
Results are cached at ~/.cache/video-vision-mcp/ keyed by (file hash,
backend) — re-analyzing the same video is instant, and switching backends
keeps each result separately. Downloaded URLs/Jira files and whisper models live
under the same dir. Override with VIDEO_MCP_CACHE_DIR.
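A cache keyed by (file hash, backend) can be sketched as follows. The choice of SHA-256, the `<hash>-<backend>.json` file name, and the helper names are all assumptions made for illustration; only the default directory and the VIDEO_MCP_CACHE_DIR override come from the text above.

```python
import hashlib
import os
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_dir() -> Path:
    """Default cache location, overridable via VIDEO_MCP_CACHE_DIR."""
    override = os.environ.get("VIDEO_MCP_CACHE_DIR")
    return Path(override) if override else Path.home() / ".cache" / "video-vision-mcp"


def cache_path(video: str, backend: str) -> Path:
    """One entry per (file hash, backend) pair; the naming is hypothetical."""
    return cache_dir() / f"{file_sha256(video)}-{backend}.json"
```

Because the key depends on file content rather than file name, a renamed copy of the same video would map to the same entry, while a different backend maps to a separate one.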
How it fits with mcp-atlassian
mcp-atlassian can download a Jira attachment but can't analyze it. This server
takes over from there: pass jira_issue_key and it fetches the attachment over
Jira REST itself (same creds), so you stay in one tool call. If the Jira token is
missing/invalid you get a clear error pointing at .env, not a silent failure.
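To make the "fetches the attachment over Jira REST" step concrete, here is a hedged sketch of the part that needs no network: building the issue URL and picking video attachments out of the response. It assumes the standard Jira REST v2 issue resource, whose attachment entries carry `filename`, `mimeType`, and `content` (the download URL); the helper names are hypothetical and the server's real requests may differ.

```python
from typing import Any, Dict, List


def issue_url(base_url: str, issue_key: str) -> str:
    """URL for an issue, limited to its attachment field (Jira REST v2)."""
    return f"{base_url.rstrip('/')}/rest/api/2/issue/{issue_key}?fields=attachment"


def video_attachments(issue: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return filename and download URL for each video attachment."""
    attachments = issue.get("fields", {}).get("attachment") or []
    return [
        {"filename": item["filename"], "url": item["content"]}
        for item in attachments
        # Assumption: videos are recognized by a "video/" MIME type prefix.
        if str(item.get("mimeType", "")).startswith("video/")
    ]
```

The actual download would then be an authenticated GET against each `url`, using the same JIRA_USERNAME / JIRA_API_TOKEN pair that mcp-atlassian reads.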