Local video search and media analysis: ffprobe metadata, frame sampling, contact sheets, subtitle search, clip export, and pluggable person/object detection.
reelgrep
Local video search and media analysis. Find people, outfits, objects, scenes, spoken phrases, and useful clips inside your own video library.
Status: v0.4.0. Align command shipped: when a clean official transcript exists (PDF / TXT / MD) you can map it onto the Whisper timestamps to get accurate timing PLUS the institution's exact text. Web UI, transcription, and the TypeScript MCP wrapper (separate repo) all still work as before.
reelgrep indexes video files on disk: it runs ffprobe for metadata, samples frames at a configurable interval, builds contact sheets, extracts embedded and sidecar subtitles into a SQLite FTS5 table you can grep, and exports clips, screenshots, and animated WebP loops with a JSON manifest sidecar for each output. Person and object detection is pluggable, with a face-embedding backend and an Ollama vision-LLM backend; both accept confirmed positive AND negative reference images so lookalikes are less likely to slip through. Everything runs on your machine - frames, clips, manifests, and the index all stay on disk under your home directory by default, and nothing leaves the box without you configuring it to.
Why
- Lectures and conference talks: jump to the moment the speaker discussed X.
- Personal and family videos: find clips of a specific person across years of footage.
- TV episodes and movies: build contact sheets, cut highlight reels, export GIFs.
- Training and onboarding videos: extract slides, search transcripts, build searchable archives.
- All processing is local. Frames, clips, manifests, and the index stay on disk under your home directory.
Install
# CLI only (no model deps; subtitle search + ingest + export work):
pipx install reelgrep
# With local face recognition (insightface + onnxruntime, ~300MB on first model load):
pipx install "reelgrep[face]"
# With Ollama vision-LLM backend (httpx; assumes ollama serve is reachable):
pipx install "reelgrep[vision]"
# With local Whisper transcription (faster-whisper + ctranslate2, ~75MB-2.9GB depending on model):
pipx install "reelgrep[whisper]"
# With the local browser UI (starlette + uvicorn):
pipx install "reelgrep[web]"
# With prose-transcript alignment (pypdf + rapidfuzz, ~1MB):
pipx install "reelgrep[align]"
# Everything:
pipx install "reelgrep[face,vision,whisper,web,align]"
System dependency: ffmpeg and ffprobe must be on PATH. On Ubuntu:
sudo apt install ffmpeg
On macOS:
brew install ffmpeg
Quickstart
Ingest a video
reelgrep ingest ~/Videos/some-talk.mp4
Output:
ingested: /home/you/Videos/some-talk.mp4
hash: blake2b:8f3a91c4e6b7d2a05f1c4e6b7d2a05f1c4e6b7d2a05f1c4e6b7d2a05f1c4e6b7
duration: 00:42:18.500
subtitle tracks: 1 (cues: 482)
frames sampled: 508
db: /home/you/.local/share/reelgrep/index.sqlite
Ingest probes the file, samples one frame every five seconds by default (--every 5), pulls any embedded subtitle streams plus matching .srt/.vtt sidecars, and writes everything into the local index. Re-running on the same file is a no-op unless you pass --force.
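The idempotent, fixed-interval behavior described above can be sketched as follows. This is a minimal sketch, not reelgrep's implementation: `content_hash`, `sample_timestamps`, and `needs_ingest` are hypothetical names, and hashing the full file with a 32-byte BLAKE2b digest is an assumption chosen only because it produces the same 64-hex-digit shape as the `hash:` line in the output.

```python
import hashlib
import math


def content_hash(path, chunk_size=1 << 20):
    """BLAKE2b over the file contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.blake2b(digest_size=32)  # 32 bytes -> 64 hex digits, like the output above
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "blake2b:" + h.hexdigest()


def sample_timestamps(duration, every=5):
    """One timestamp every `every` seconds, starting at 0 and staying before the end."""
    return [i * every for i in range(math.ceil(duration / every))]


def needs_ingest(file_hash, indexed_hashes, force=False):
    """Re-ingest only when the content hash is new or --force was passed."""
    return force or file_hash not in indexed_hashes
```

For the 00:42:18.500 talk above, `sample_timestamps(2538.5)` yields 508 timestamps, which agrees with the `frames sampled: 508` line. Keying on content rather than on path means a renamed or moved file would still count as already indexed.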
Search what was said
reelgrep search-subtitles ~/Videos/some-talk.mp4 "kubernetes"
Output:
00:04:12.300 so this is where kubernetes comes in
00:11:45.880 kubernetes scheduling is fundamentally a bin-packing problem
00:27:03.120 the kubernetes control plane has five core components
3 matches
Requires either embedded subtitles, a sidecar .srt/.vtt next to the video, or Whisper-transcribed cues (see next section). The [whisper] extra adds local transcription so screen recordings, lectures, and other un-captioned videos become searchable.
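To make the FTS5 search concrete, here is a small sketch using Python's built-in `sqlite3` module. It assumes the local SQLite build has FTS5 compiled in, which most builds do. The `cues` table layout and the `open_index` / `add_cues` / `search` helpers are hypothetical and are not reelgrep's actual schema.

```python
import sqlite3


def fmt_ts(seconds):
    """Seconds -> HH:MM:SS.mmm, the format used in the search output above."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def open_index(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical schema: only `text` is tokenized; the other columns ride along.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS cues USING fts5("
        "video_hash UNINDEXED, start UNINDEXED, source UNINDEXED, text)"
    )
    return db


def add_cues(db, video_hash, cues, source="embedded"):
    db.executemany(
        "INSERT INTO cues (video_hash, start, source, text) VALUES (?, ?, ?, ?)",
        [(video_hash, start, source, text) for start, text in cues],
    )


def search(db, video_hash, query):
    # Quote the query as an FTS5 phrase so characters like ':' or '-' are not parsed as syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    rows = db.execute(
        "SELECT start, text FROM cues WHERE cues MATCH ? AND video_hash = ? ORDER BY start",
        (phrase, video_hash),
    ).fetchall()
    return [f"{fmt_ts(start)} {text}" for start, text in rows]
```

Because the `source` column sits next to the text, cues from embedded tracks, sidecars, and Whisper can share one table and still be told apart.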
Transcribe a video without subtitles
reelgrep transcribe ~/Videos/lecture.mp4 --model tiny
Output:
transcribing lecture.mp4 with whisper:tiny...
transcribed: /home/you/Videos/lecture.mp4
language: en
model: whisper:tiny
cues: 165
span: 00:00:01.460 -> 00:18:44.070
Real numbers from an 18-minute 720p lecture screen recording: the tiny model finishes in about 26 seconds on CPU and produces searchable cues. Larger models (small, medium, large-v3, large-v3-turbo) trade speed for accuracy. After transcribing, reelgrep search-subtitles works against the new cues immediately.
The cues are stored alongside any embedded or sidecar subtitles with source='whisper', so the index treats them uniformly. Re-running transcribe on the same video is a no-op unless you pass --force. Pass --no-db to print the cues as JSON to stdout instead of writing to the index.
You can also fold transcription into ingest itself: reelgrep ingest ~/lecture.mp4 --transcribe runs Whisper only when the normal embedded/sidecar pass finds nothing.
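The "transcribe only when nothing else was found" rule, and the conversion of Whisper output into cues tagged `source='whisper'`, can be sketched as below. This sketch assumes the faster-whisper package from the [whisper] extra: `WhisperModel(...).transcribe(path)` returns a lazy iterator of segments with `.start`, `.end`, and `.text`, plus an info object. The helper names and the cue dict shape are hypothetical, and the CPU/int8 settings are illustrative choices, not reelgrep's defaults.

```python
import json


def segments_to_cues(segments):
    """Turn Whisper segments (objects with .start, .end, .text) into cue dicts."""
    cues = []
    for seg in segments:
        text = seg.text.strip()
        if text:  # skip segments that are only whitespace
            cues.append({"start": round(seg.start, 3), "end": round(seg.end, 3),
                         "text": text, "source": "whisper"})
    return cues


def transcribe_if_needed(path, existing_cues, model_size="tiny"):
    """Like `ingest --transcribe`: only run Whisper when no cues were found."""
    if existing_cues:
        return existing_cues
    from faster_whisper import WhisperModel  # requires the [whisper] extra
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)  # _info.language holds the detected language
    return segments_to_cues(segments)  # iterating the generator runs the transcription


if __name__ == "__main__":
    import sys
    # Roughly what --no-db does: print cues as JSON instead of writing them to the index.
    print(json.dumps(transcribe_if_needed(sys.argv[1], []), indent=2))
```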
Align an official transcript onto Whisper timestamps
If the institution ships a clean prose transcript next to the video (Canvas / Kaltura courses, conference talk hosts that post the speaker's text afterwards), Whisper's transcription is the wrong source of truth - the official transcript is cleaner and uses correct terminology. reelgrep align maps the official text onto the Whisper-derived timestamps so you keep accurate timing AND the canonical wording.
reelgrep align ~/Videos/lecture.mp4 --transcript ~/Videos/lecture_transcript.pdf --out lecture.srt
Output:
aligned: /home/you/Videos/lecture.mp4
transcript: /home/you/Videos/lecture_transcript.pdf
language: en
cues: 221 (matched 2631/2650 transcript words, coverage 99.3%)
avg similarity: 0.98
srt: /home/you/lecture.srt
Real numbers from an 18-minute USF lecture aligned against the course's official PDF transcript: 221 cues, 99.3% coverage of transcript words, 0.98 average similarity. The aligned cues preserve official terminology ("module one" rather than Whisper's "module 1") along with the punctuation and capitalization that Whisper drops or gets wrong.
Accepts .txt, .md, .pdf transcripts. Auto-runs whisper:tiny if no cues exist for the video yet, so the typical flow is one-shot. Cues land in the subtitles table with source='aligned' so they coexist with whisper, embedded, and sidecar sources. The optional --out file.srt writes a standard SRT file you can hand to a video player.
Cues whose similarity to the transcript falls below --min-similarity (default 0.55) keep their original Whisper text rather than being fabricated - if the transcript doesn't actually match the audio for a stretch (Q&A inserted, slide change, etc.), the engine refuses to invent alignment.
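The idea of "map transcript words onto cue timing, but fall back below a similarity floor" can be illustrated with the standard library. This is a simplified stand-in: it uses `difflib.SequenceMatcher` where the [align] extra ships rapidfuzz, and `align_cues` is a hypothetical function, not reelgrep's alignment engine. Whisper words are matched against transcript words. Each cue then claims the transcript words from its first anchored word up to the next cue's first anchor. A cue whose candidate text scores below `min_similarity` keeps its Whisper text.

```python
import re
from difflib import SequenceMatcher


def _norm(word):
    """Lowercase and strip punctuation so 'One.' and 'one' compare equal."""
    return re.sub(r"[^\w]", "", word.lower())


def align_cues(cues, transcript, min_similarity=0.55):
    """cues: (start, end, text) tuples from Whisper; transcript: official prose."""
    t_words = transcript.split()
    w_words, owner = [], []  # owner[k] = index of the cue that Whisper word k came from
    for i, (_, _, text) in enumerate(cues):
        for word in text.split():
            w_words.append(_norm(word))
            owner.append(i)

    sm = SequenceMatcher(None, w_words, [_norm(w) for w in t_words], autojunk=False)
    first = {}  # cue index -> first transcript word matched inside that cue
    for a, b, size in sm.get_matching_blocks():
        for k in range(size):
            first.setdefault(owner[a + k], b + k)

    # Each anchored cue owns transcript words up to where the next anchored cue begins.
    matched = sorted(first)
    stops = {i: first[j] for i, j in zip(matched, matched[1:])}
    out = []
    for i, (start, end, text) in enumerate(cues):
        if i in first:
            candidate = " ".join(t_words[first[i]:stops.get(i, len(t_words))])
            score = SequenceMatcher(None, text.lower(), candidate.lower()).ratio()
            if score >= min_similarity:
                out.append((start, end, candidate, "aligned"))
                continue
        # Too dissimilar, or no anchor at all (e.g. unscripted Q&A): keep Whisper's words.
        out.append((start, end, text, "whisper"))
    return out
```

With Whisper cues "welcome to module 1" and "today we cover data bases" against the transcript "Welcome to module one. Today we cover databases.", the first two cues take the official wording. A third cue about audience questions, which has no counterpart in the transcript, stays as Whisper text.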
Browse the whole library in a local web UI
reelgrep serve
Opens http://127.0.0.1:8765/ in your default browser. The UI surfaces every ingested video in a sidebar, every cue (embedded, sidecar, or Whisper) in a searchable Subtitles tab per video, every sampled frame in a paginated grid with a lightbox, every person-search result with thumbnails grouped by confidence, and every export artifact with its manifest sidecar link.
The headline feature is the search bar in the header: type a phrase once and the UI fans out FTS5 queries across every video in the index, then groups the hits by video. Clicking a hit jumps you straight into that video's Subtitles tab with the term highlighted. With 36 lectures transcribed via whisper:small, a single query against "database" returns the full hit list across the semester in under a second.
The server binds to loopback only by default (--host 127.0.0.1), reads exclusively from the local SQLite index, and serves frame and export files via an allow-list (paths must already be referenced in the index - it is not a general filesystem proxy). Pass --no-open-browser to skip the auto-launch, --port to change the port, and --reload for frontend development.
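A sketch of the two safety checks described above, a loopback test for the bind address and an allow-list for file serving, using only the standard library. Both function names are hypothetical. The point of resolving paths before comparing them is that `..` segments or symlinks cannot smuggle a request outside the set of files the index already knows about.

```python
import ipaddress
import os


def is_loopback(host):
    """True for 127.0.0.0/8, ::1 and the name 'localhost'."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # some other hostname; treat as not loopback
        return False


def is_servable(requested, indexed_paths):
    """Serve a file only if its resolved path is already referenced by the index."""
    target = os.path.realpath(requested)  # collapses '..' and follows symlinks
    return target in {os.path.realpath(p) for p in indexed_paths}
```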
Find a specific person
reelgrep find-person ~/Videos/some-talk.mp4 \
--label speaker_a \
--positive ~/refs/speaker_a/headshot1.jpg \
--positive ~/refs/speaker_a/headshot2.jpg \
--negative ~/refs/false_positives/looks_similar_but_isnt.jpg \
--out ./speaker_a_matches
Output:
label: speaker_a
backend: face_embed
threshold: 0.3
matches: 12 / 25 (showing top 12)
00:00:14.500 conf 0.71 face cosine 0.71 vs centroid; margin 0.18 over nearest negative
00:00:42.000 conf 0.68 face cosine 0.68 vs centroid; margin 0.15 over nearest negative
...
manifest: /home/you/speaker_a_matches/find-person.manifest.json
exports: /home/you/speaker_a_matches (12 files)
Why negatives matter
Face matching from cast or speaker headshots alone is unreliable - lookalikes, twins, family members, and similar-looking people in similar settings will all score high. reelgrep treats matching as a precision-over-recall job: positive examples anchor the search, and negative examples (a known false positive caught in a prior run, a sibling's photo, a stock image of the same demographic) push lookalikes below the acceptance threshold. The default backend scores a face by its cosine similarity to the positive centroid minus its cosine similarity to the nearest negative; the more relevant negatives you provide, the fewer false positives you generally get back.
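The positive-centroid-minus-nearest-negative score can be written out in a few lines. This sketch uses plain Python lists instead of real 512-dim ArcFace embeddings, and `score_face` is a hypothetical name. It gates acceptance on the margin, following the description of the 0.30 default under "Person and visual search models". That choice is an assumption, and treating a missing negative set as a nearest-negative similarity of 0 is likewise an assumption.

```python
import math


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cos(a, b):
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))


def score_face(face, positives, negatives, threshold=0.30):
    # Centroid of the L2-normalized positive embeddings.
    units = [_unit(p) for p in positives]
    centroid = [sum(col) / len(units) for col in zip(*units)]
    sim = _cos(face, centroid)
    # Similarity to the closest negative; 0.0 when no negatives were supplied.
    nearest_neg = max((_cos(face, n) for n in negatives), default=0.0)
    margin = sim - nearest_neg
    return sim, margin, margin >= threshold
```

In a toy 2-D example with positives `[1, 0]` and `[0.9, 0.1]`, a lookalike at `[0.6, 0.8]` is accepted when no negatives are given (margin of roughly 0.64). Adding a single negative at `[0, 1]` pushes its margin below zero and rejects it. A true match at `[1, 0.05]` stays well above the threshold in both cases.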
Cut a sub-clip
reelgrep export-clip ~/Videos/some-talk.mp4 --start 0:10:00 --end 0:10:30 --out highlight.mp4
Stream-copies by default (fast, no re-encode). Pass --reencode if the source codec or container is awkward for downstream tools. Writes highlight.mp4 plus highlight.mp4.manifest.json next to it.
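A sketch of how such a cut and its manifest might be assembled, assuming ffmpeg's standard `-ss` / `-t` / `-c copy` options. The helper names are hypothetical. The re-encode codecs (libx264 and AAC) are illustrative, and reading the binary name from REELGREP_FFMPEG mirrors the configuration table rather than reelgrep's code.

```python
import json
import os


def parse_ts(ts):
    """'0:10:00', '10:00' or '600' -> seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def clip_command(src, start, end, out, reencode=False, ffmpeg=None):
    ffmpeg = ffmpeg or os.environ.get("REELGREP_FFMPEG", "ffmpeg")
    t0, t1 = parse_ts(start), parse_ts(end)
    if t1 <= t0:
        raise ValueError("--end must be after --start")
    # Input-side -ss seeks quickly; -t is the clip length measured from that point.
    cmd = [ffmpeg, "-hide_banner", "-y", "-ss", f"{t0:.3f}", "-i", src, "-t", f"{t1 - t0:.3f}"]
    # Stream copy can only cut on keyframes, so the clip may start slightly early.
    cmd += ["-c:v", "libx264", "-c:a", "aac"] if reencode else ["-c", "copy"]
    return cmd + [out]


def manifest_for(out, params):
    """Sidecar path and JSON body recording the parameters used for `out`."""
    return out + ".manifest.json", json.dumps(params, indent=2, sort_keys=True)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the exact invocation easy to log into the manifest.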
Build a contact sheet
reelgrep contact-sheet ~/Videos/some-talk.mp4 --out sheet.jpg --cols 6 --every 30
Samples a frame every 30 seconds, lays them out in a 6-column grid, writes sheet.jpg plus a manifest. Pass --use-cached to reuse frames already sampled during ingest instead of re-sampling.
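One way to produce such a grid in a single ffmpeg pass is to chain the `fps`, `scale`, and `tile` filters. The sketch below assumes the video duration is already known from ffprobe. The 320px tile width and the helper names are hypothetical. The row count is an estimate, because the `fps` filter's frame selection can differ slightly from exact multiples of the interval.

```python
import math


def contact_sheet_filter(duration, every=30, cols=6, tile_width=320):
    frames = math.ceil(duration / every)
    rows = math.ceil(frames / cols)
    # fps=1/N keeps roughly one frame per N seconds; tile packs them into a cols x rows grid.
    return f"fps=1/{every},scale={tile_width}:-2,tile={cols}x{rows}"


def contact_sheet_command(src, out, duration, every=30, cols=6):
    return ["ffmpeg", "-hide_banner", "-y", "-i", src,
            "-vf", contact_sheet_filter(duration, every, cols),
            "-frames:v", "1", out]  # one output image holding the whole grid
```

For the 42-minute talk from the ingest example, sampling every 30 seconds gives about 85 frames, so a 6-column sheet needs 15 rows.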
Render a webp loop
reelgrep make-gif ~/Videos/some-talk.mp4 --start 0:10:00 --duration 5 --out highlight.webp
The output is animated WebP, not GIF - smaller files, better quality, supported in modern browsers and chat clients. Defaults: 12 fps, 480px wide. Tune with --fps and --width.
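A sketch of an ffmpeg invocation that yields a looping animated WebP with the stated defaults. It assumes an ffmpeg build with libwebp. The webp muxer's `-loop 0` requests infinite looping, and `-an` drops audio, which WebP cannot carry. `webp_command` is a hypothetical helper, not reelgrep's code.

```python
def webp_command(src, start, duration, out, fps=12, width=480):
    return ["ffmpeg", "-hide_banner", "-y",
            "-ss", str(start), "-t", str(duration), "-i", src,  # read only the window we need
            "-vf", f"fps={fps},scale={width}:-2",  # -2 keeps aspect ratio with an even height
            "-c:v", "libwebp", "-loop", "0", "-an", out]
```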
Backends
reelgrep separates "where is the video file?" from "what do I want to do with it?" via a small backend layer:
- local (default): pass a file path, it is used directly.
- jellyfin: resolve a Jellyfin item name or 32-hex ItemId to its local file path via the Jellyfin HTTP API, then pipe into other commands. Configured via JELLYFIN_URL and JELLYFIN_API_KEY (same names as the jellyfin-mcp project).
Example:
export JELLYFIN_URL=http://jellyfin.local:8096
export JELLYFIN_API_KEY=<key>
reelgrep jellyfin resolve "Talk: Container Networking" | xargs -I {} reelgrep ingest {}
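For readers wiring this up themselves, a name-or-ID lookup against Jellyfin could look like the sketch below. It assumes Jellyfin's `/Items` endpoint accepts `SearchTerm` or `Ids` together with `Recursive=true` and `Fields=Path`, and that an `X-Emby-Token` header authenticates the API key. Check these against your server's API documentation. The helper names are hypothetical and this is not reelgrep's adapter. The returned `Path` is the path as the Jellyfin server sees it, so it is only "local" when reelgrep runs on the same machine or mount.

```python
import json
import os
import re
import urllib.parse
import urllib.request

ITEM_ID = re.compile(r"^[0-9a-fA-F]{32}$")


def items_url(base_url, query):
    params = {"Recursive": "true", "Fields": "Path"}
    if ITEM_ID.match(query):
        params["Ids"] = query  # looks like a 32-hex ItemId
    else:
        params["SearchTerm"] = query  # otherwise treat it as a title search
    return f"{base_url.rstrip('/')}/Items?{urllib.parse.urlencode(params)}"


def resolve(query):
    url = items_url(os.environ["JELLYFIN_URL"], query)
    req = urllib.request.Request(url, headers={"X-Emby-Token": os.environ["JELLYFIN_API_KEY"]})
    with urllib.request.urlopen(req, timeout=10) as resp:
        items = json.load(resp).get("Items", [])
    return [item["Path"] for item in items if item.get("Path")]
```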
Person and visual search models
Two backends ship in v0.1.0, both pluggable, both opt-in via extras:
- face_embed (default, [face] extra): insightface ArcFace 512-dim embeddings. Fast on CPU, deterministic, well-suited to face matching with the positive/negative anchor pattern. Default acceptance threshold 0.30 (cosine margin over nearest negative).
- ollama_vision ([vision] extra): per-frame chat against a local Ollama vision model (default qwen2-vl:7b, configurable via OLLAMA_VISION_MODEL). Slower per frame, but handles queries such as "find frames where the speaker is wearing a red jacket" or "find shots of the building exterior" - kinds of queries that pure face embeddings cannot answer. Default acceptance threshold 0.65.
Switch engines with --backend ollama_vision on the find-person command. Both engines accept the same --positive / --negative / --threshold / --top-k flags.
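The sketch below shows how a per-frame vision question might be posed to Ollama. It assumes Ollama's `POST /api/generate` endpoint, which accepts base64 `images`, `format: "json"`, and `stream: false`, and returns the model's text in a `response` field. The prompt wording, the `{"match", "confidence"}` answer shape, and all helper names are hypothetical, as is mapping that confidence onto the 0.65 threshold.

```python
import base64
import json
import os
import urllib.request


def vision_payload(frame_jpeg, question, model="qwen2-vl:7b"):
    return {
        "model": model,
        "prompt": question + '\nAnswer only with JSON like {"match": true, "confidence": 0.0}.',
        "images": [base64.b64encode(frame_jpeg).decode("ascii")],
        "format": "json",  # ask Ollama to constrain the reply to JSON
        "stream": False,
    }


def parse_confidence(body):
    """Confidence from the model's JSON answer; anything unparseable counts as no match."""
    try:
        answer = json.loads(body.get("response", ""))
        return float(answer.get("confidence", 0.0)) if answer.get("match") else 0.0
    except (ValueError, TypeError, AttributeError):
        return 0.0


def ask(frame_jpeg, question):
    url = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434") + "/api/generate"
    model = os.environ.get("OLLAMA_VISION_MODEL", "qwen2-vl:7b")
    data = json.dumps(vision_payload(frame_jpeg, question, model)).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_confidence(json.load(resp))
```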
Storage and privacy
- The index database lives at ~/.local/share/reelgrep/index.sqlite by default. Override with REELGREP_HOME or REELGREP_DB.
- Sampled frames cache to ~/.local/share/reelgrep/cache/frames/<hash>/ and subtitles to ~/.local/share/reelgrep/cache/subtitles/<hash>/.
- Every export (clip, animated WebP, screenshot, contact sheet) writes a JSON manifest sidecar next to it with the parameters and source hash so outputs are reproducible.
- No telemetry. No background network calls. The Ollama backend talks to the Ollama URL you configure (default http://127.0.0.1:11434). The Jellyfin adapter talks only to the URL you set. Everything else stays local.
- You are responsible for confirming you have the rights to analyze and store frames, clips, and derived data from the videos you process.
Configuration reference
| Variable | Default | Description |
|---|---|---|
| REELGREP_HOME | ~/.local/share/reelgrep | Root directory for the index and cache. |
| REELGREP_DB | <home>/index.sqlite | Override the SQLite index path independently of REELGREP_HOME. |
| REELGREP_CACHE | <home>/cache | Override the frame / subtitle cache directory. |
| REELGREP_FFMPEG | ffmpeg | Path or name of the ffmpeg binary to invoke. |
| REELGREP_FFPROBE | ffprobe | Path or name of the ffprobe binary to invoke. |
| JELLYFIN_URL | (unset) | Base URL of a Jellyfin server for the jellyfin backend. |
| JELLYFIN_API_KEY | (unset) | API key for the Jellyfin server. |
| OLLAMA_URL | http://127.0.0.1:11434 | Base URL of the Ollama server for the ollama_vision backend. |
| OLLAMA_VISION_MODEL | qwen2-vl:7b | Model id Ollama should serve for vision requests. |
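The precedence in this table can be expressed as a small resolver. In this sketch, REELGREP_DB and REELGREP_CACHE are derived from REELGREP_HOME unless set explicitly. It is a sketch under those stated rules, not reelgrep's config loader. The function names are hypothetical, and the per-invocation `--db` flag, which would override everything here, is not modeled.

```python
import os


def resolve_paths(env=None):
    env = os.environ if env is None else env
    home = os.path.expanduser(env.get("REELGREP_HOME", "~/.local/share/reelgrep"))
    return {
        "home": home,
        # Each override is independent; otherwise fall back to a path under <home>.
        "db": env.get("REELGREP_DB", os.path.join(home, "index.sqlite")),
        "cache": env.get("REELGREP_CACHE", os.path.join(home, "cache")),
        "ollama_url": env.get("OLLAMA_URL", "http://127.0.0.1:11434"),
    }


def frames_dir(paths, video_hash):
    """Per-video frame cache, matching the <cache>/frames/<hash>/ layout described above."""
    return os.path.join(paths["cache"], "frames", video_hash)
```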
Commands
| Command | What it does |
|---|---|
| reelgrep ingest <video> | Probe, extract subtitles, sample frames, persist to the index. See Ingest a video. |
| reelgrep info <hash-or-name> | Print metadata + counts for an indexed video. |
| reelgrep ls | List indexed videos (most-recent first). |
| reelgrep export-clip <video> --start --end --out | Cut a sub-clip, stream-copy by default. See Cut a sub-clip. |
| reelgrep make-gif <video> --start --duration --out | Render an animated WebP loop. See Render a webp loop. |
| reelgrep contact-sheet <video> --out | Build a grid of thumbnails. See Build a contact sheet. |
| reelgrep search-subtitles <video> <query> | FTS5 search over indexed subtitle cues. See Search what was said. |
| reelgrep transcribe <video> --model | Whisper-transcribe and index cues for an un-captioned video. See Transcribe a video without subtitles. |
| reelgrep align <video> --transcript <file> | Map a clean prose transcript onto Whisper timestamps. See Align an official transcript onto Whisper timestamps. |
| reelgrep find-person <video> --label --positive --out | Locate frames containing a person. See Find a specific person. |
| reelgrep serve [--port 8765] | Open the local browser UI for the whole index. See Browse the whole library in a local web UI. |
| reelgrep jellyfin resolve <query> | Resolve a Jellyfin item to its local file path for piping. |
| reelgrep --db PATH <subcommand> | One-shot override for the index database path. |
Development
git clone https://github.com/solomonneas/reelgrep
cd reelgrep
python3 -m venv .venv
.venv/bin/pip install -e ".[dev,face,vision,whisper,web,align]"
.venv/bin/pytest
.venv/bin/ruff check .
Tests marked integration shell out to the real ffmpeg, ffprobe, and insightface stacks. To skip them in a quick local loop:
.venv/bin/pytest -m "not integration"
Roadmap
Queued for later releases:
- Writing thumbnails and chapters back to Jellyfin.
- Cross-video person clustering ("find all distinct faces in this whole library").
- True wav2vec2 word-level alignment for cases where cue-level timing isn't tight enough.
Shipped in v0.4.0: prose-transcript alignment (reelgrep align) via the [align] extra, plus a public Python library API (reelgrep.index.ingest_video) for embedders. See Align an official transcript onto Whisper timestamps. The MCP wrapper for agentic use also shipped as its own repo at solomonneas/reelgrep-mcp (npm install -g reelgrep-mcp).
Shipped in v0.3.0: local browser UI (reelgrep serve) backed by a Starlette JSON API + vanilla HTML/CSS/JS frontend, with cross-library subtitle search as the headline feature. See Browse the whole library in a local web UI.
Shipped in v0.2.0: local Whisper transcription via the [whisper] extra and the new reelgrep transcribe command. See Transcribe a video without subtitles.
License
MIT. See LICENSE.
Download files
File details
Details for the file reelgrep-0.4.0.tar.gz.
File metadata
- Download URL: reelgrep-0.4.0.tar.gz
- Upload date:
- Size: 117.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9370343a5ae27b613460b50c0e812f5fa660cc6d5421462f67705984c2cd8e2e |
| MD5 | d4ac5cc66398efd6ac2c3f1c28a9a119 |
| BLAKE2b-256 | 2cc4a7270be5ffe594d8f9d9d23b50806a11209f31e625188f319e0fc300d881 |
File details
Details for the file reelgrep-0.4.0-py3-none-any.whl.
File metadata
- Download URL: reelgrep-0.4.0-py3-none-any.whl
- Upload date:
- Size: 86.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ecca8bc896ace2e72118a60f97a3f3e0a83180048ad4b3b1822ed69371c0315 |
| MD5 | 8715b1e60a194b976def826cb68f44d3 |
| BLAKE2b-256 | 4d2fa454b21555c276be476836b9c649e4657c756e0a5b7a651b58b146c4685c |