Evidence-grounded presentation-to-video pipeline. Every claim is traceable — no hallucinations.
Project description
SlideSherlock
SlideSherlock is an evidence-grounded pipeline that converts PowerPoint presentations into narrated explainer videos. Every narrated claim is traceable to specific slide content — no hallucinations, no invented facts.
Try it in 60 seconds
git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock
cp .env.example .env # Add OPENAI_API_KEY for AI narration (optional)
docker compose up -d # Start all 6 services
curl http://localhost:8000/health # → {"status": "ok"}
Then open http://localhost:8000/docs for the interactive API, or use the CLI:
slidesherlock run your_deck.pptx --preset draft -o output/
open output/final.mp4
No Docker? Use SQLite mode: DATABASE_URL=sqlite:///./slidesherlock.db slidesherlock run deck.pptx
Why SlideSherlock?
Existing slide-to-video tools either read bullet points verbatim or hallucinate content that doesn't exist in the source material. SlideSherlock solves this with three novel mechanisms:
-
Evidence Index — Every piece of PPTX content (text, shapes, images, connectors) receives a stable, content-addressable evidence ID (
SHA-256(job|slide|kind|offset)). All downstream narration must cite these IDs. -
Verifier Loop — A closed-loop control system: generate script → verify against evidence (PASS / REWRITE / REMOVE) → regenerate → re-verify until convergence. Not post-hoc filtering — inline verification with iterative correction.
-
Dual-Provenance Knowledge Graph — Two independent graphs are built per slide: G_native from PPT XML (shapes, connectors, groups) and G_vision from rendered PNGs + OCR. These merge into G_unified where each node carries provenance (NATIVE / VISION / BOTH), confidence scores, and
needs_reviewflags.
Features
| Feature | Description |
|---|---|
| 10-Stage Pipeline | Ingest → Evidence → Render → Graph → Script → Verify → Translate → Narrate → Audio → Video |
| AI Narration | Optional GPT-4o(-mini) rewrite: evidence-grounded template → natural presenter delivery (two-pass, hallucination-free) |
| Vision Understanding | Optional GPT-4o vision extractor for photo captions, diagram entities, and OCR — cached by image hash |
| Quality Presets | Draft (fast), Standard (subtitles + crossfade), Pro (vision AI + BGM + loudness normalization) |
| Multi-Language | Generate variants from one PPTX. Shared evidence and graphs; only language-dependent stages re-run |
| Web UI | React Mission Control dashboard: pipeline track, focus panel, activity feed, dark/light theme, color-blind safe |
| CLI | slidesherlock run deck.pptx --preset pro --ai-narration with structured JSON logging for experiments |
| Docker | docker compose up — one command for the full 6-service stack |
| 167 Tests | Automated test suite covering evidence grounding, verification, graph fusion, and pipeline stages |
Quick Start
Docker (recommended)
git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock
cp .env.example .env # Add your OPENAI_API_KEY (optional; stub used otherwise)
docker compose up # Starts API, worker, Postgres, Redis, MinIO, pgAdmin
- API: http://localhost:8000
- API Docs (interactive): http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc)
- Web UI: http://localhost:3000 (if running
pnpm devinapps/web/)
Local Development
make setup # Create venv + install deps
make up # Start Postgres, Redis, MinIO (Docker)
make migrate # Run database migrations
make api # Start FastAPI server (port 8000)
make worker # Start pipeline worker (separate terminal)
CLI (no Redis/RQ needed)
slidesherlock run deck.pptx # Draft preset, output to ./output/
slidesherlock run deck.pptx --preset pro -o results/ # Pro preset, custom output
slidesherlock run deck.pptx --preset pro --ai-narration # Enable GPT-4o narration rewrite
slidesherlock run deck.pptx --preset standard --lang hi-IN # Add Hindi second-language variant
slidesherlock run deck.pptx --preset pro --dry-run # Metrics only, no audio/video
slidesherlock doctor # Check system dependencies
slidesherlock doctor --json # Machine-readable JSON output
Each CLI run produces metrics.json and run_log.json (structured log for experiment aggregation). Full runs also produce final.mp4. Use --dry-run for quick validation without video encode.
Local LLM (no API key needed)
SlideSherlock supports 10 OpenAI-compatible LLM providers. To use Ollama for fully local operation:
# Install and start Ollama
ollama pull llama3.1:8b # text model for narration
ollama pull llava:7b # vision model for image understanding
# Run SlideSherlock with Ollama
LLM_PROVIDER=ollama slidesherlock run deck.pptx --preset pro --ai-narration
Other supported providers: OpenAI, Groq, Together, OpenRouter, DeepInfra, Anyscale, LM Studio, vLLM, LocalAI. See packages/core/llm_config.py for the full registry.
Architecture
The pipeline follows the Open/Closed Principle: each stage implements a Stage protocol. Adding a new stage requires no changes to existing code — just add a class and register it.
class Stage(Protocol):
name: str
def run(self, ctx: PipelineContext) -> StageResult: ...
Pipeline Stages
| Stage | Key Modules | Output |
|---|---|---|
| Ingest | ppt_parser, image_extract, image_classifier |
ppt/slide_*.json, images/ |
| Evidence | evidence_index, photo_understand, diagram_understand |
evidence/index.json |
| Render | LibreOffice + pdf2image | render/deck.pdf, render/slides/*.png |
| Graph | native_graph, vision_graph, merge_engine |
graphs/unified/slide_*.json |
| Script | explain_plan, script_generator, script_context |
script/{variant}/script.json |
| Verify | verifier (closed-loop rewrite) |
verify_report.json, coverage.json |
| Translate | translator_provider (l2 variants only) |
Translated script + notes |
| Narrate | narrate (GPT-4o, optional) |
ai_narration.json |
| Audio | audio_prepare, tts_provider |
audio/{variant}/slide_*.wav |
| Video | timeline_builder, overlay_renderer, composer |
output/{variant}/final.mp4 |
No-Hallucination Design
The verifier checks every narration claim against the evidence index. Claims grounded in evidence PASS. Claims about visual content with low confidence get REWRITE (hedging language added, e.g., "appears to show"). Claims with no supporting evidence are REMOVEd — fabricated content is never narrated. The loop iterates until all claims converge.
AI Narration
SlideSherlock includes a dedicated NarrateStage that produces natural presenter-style narration without sacrificing the no-hallucination guarantee. It uses a two-pass design:
- Pass 1 — Evidence-grounded template. The deterministic script generator produces narration where every sentence cites evidence IDs. The verifier loop validates and rewrites until every claim is grounded.
- Pass 2 — Natural rewrite. GPT-4o(-mini) rewrites each verified segment for natural delivery, but is constrained to the validated factual content. No new claims can be introduced — the rewriter only changes phrasing, rhythm, and pronunciation.
NarrateStage uses requests directly (not the openai SDK) because the SDK's httpx transport deadlocks inside RQ-forked workers.
Configuration:
| Variable | Default | Purpose |
|---|---|---|
OPENAI_API_KEY |
(unset) | Required to enable AI narration |
LLM_PROVIDER |
stub |
Set to openai to activate, or use the UI/CLI flag |
NARRATE_MODEL |
gpt-4o-mini |
Override to gpt-4o for highest quality |
NARRATE_PARALLEL |
5 |
Concurrent rewrite calls (per slide) |
Cost estimates:
| Model | Cost per slide | 16-slide deck |
|---|---|---|
gpt-4o-mini (default) |
~$0.001 | ~$0.02 |
gpt-4o (full) |
~$0.01 | ~$0.16 |
Three ways to enable:
- Web UI — AI Narration toggle on the upload page
- CLI —
slidesherlock run deck.pptx --ai-narration - Env — export
LLM_PROVIDER=openaibefore starting the worker (withOPENAI_API_KEY)
Vision Understanding
The OpenAIVisionExtractor optionally enriches each slide with computer vision:
- Default: stub provider — no API calls, deterministic output, free
- Real vision: set
VISION_PROVIDER=openai+OPENAI_API_KEY - Default vision model:
gpt-4o-mini(override withOPENAI_VISION_MODEL=gpt-4o) - Vision cache: enabled by default; results cached by image SHA-256 in MinIO to avoid duplicate API calls across re-runs
For each slide image the vision extractor produces:
- Photo captions — natural-language description of photographic content
- Diagram entities — boxes, arrows, labels, and their spatial relationships
- OCR text — text rendered as images (chart labels, callouts, decorative type)
All vision-derived facts are written to the evidence index with IMAGE_* / DIAGRAM_* kinds, so the verifier can ground image-related claims to them.
Configuration
All configuration is environment-variable driven. Copy .env.example to .env for local development; for Docker, infra vars are pre-configured in docker-compose.yml.
LLM
| Variable | Default | Purpose |
|---|---|---|
OPENAI_API_KEY |
(unset) | API key for LLM, vision, and narration |
LLM_PROVIDER |
stub |
stub (deterministic) or openai (enables AI narration) |
NARRATE_MODEL |
gpt-4o-mini |
Narration rewrite model |
NARRATE_PARALLEL |
5 |
Parallel narration calls per slide |
Vision
| Variable | Default | Purpose |
|---|---|---|
VISION_PROVIDER |
stub |
stub or openai |
OPENAI_VISION_MODEL |
gpt-4o-mini |
Vision extractor model |
VISION_CACHE_ENABLED |
true |
Cache vision results by image hash |
VISION_PHOTO_CONFIDENCE |
0.6 |
Min confidence for photo captions |
VISION_DIAGRAM_CONFIDENCE |
0.6 |
Min confidence for diagram entities |
VISION_OCR_CONFIDENCE |
0.5 |
Min confidence for OCR text |
TTS
| Variable | Default | Purpose |
|---|---|---|
USE_SYSTEM_TTS |
false |
macOS system TTS via say (avoids pyttsx3 fork hang) |
AUDIO_VOICE_PROVIDER |
stub |
stub, system, or future cloud providers |
Video
| Variable | Default | Purpose |
|---|---|---|
VIDEO_TRANSITION |
cut |
cut or crossfade between slides |
VIDEO_INTRO_ENABLED |
false |
Show intro card |
VIDEO_OUTRO_ENABLED |
false |
Show outro card |
AUDIO_BGM_ENABLED |
false |
Background music bed |
AUDIO_BGM_DUCK_DB |
-18 |
dB to duck BGM under narration |
AUDIO_LOUDNESS_NORM |
false |
EBU R128 loudness normalization |
Subtitles
| Variable | Default | Purpose |
|---|---|---|
SUBTITLES_ENABLED |
false |
Generate .srt sidecar |
SUBTITLES_BURN_IN |
false |
Burn subtitles into the video frame |
Presets
| Variable | Default | Purpose |
|---|---|---|
SLIDESHERLOCK_PRESET |
draft |
draft, standard, or pro (sets all video/audio/subtitle/vision flags) |
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
/jobs/quick |
POST | Upload PPTX + create project + start pipeline (one step) |
/jobs |
POST | Create a job (advanced) |
/jobs/{id} |
GET | Job status |
/jobs/{id}/upload_pptx |
POST | Upload PPTX to existing job |
/jobs/{id}/progress |
GET | Per-stage progress for UI polling |
/jobs/{id}/metrics |
GET | Pipeline metrics (durations, counts, coverage) |
/jobs/{id}/evidence-trail |
GET | Live verifier decisions (PASS/REWRITE/REMOVE) |
/jobs/{id}/output/{variant}/final.mp4 |
GET | Stream video (HTTP Range support for seeking) |
/jobs/{id}/output/{path} |
GET | Stream any artifact under jobs/{id}/ from MinIO |
/projects |
POST | Create a project |
/projects/{id} |
GET | Get project |
/health |
GET | Health check |
Web UI
React 18 + TypeScript + Vite + Tailwind CSS + Framer Motion. Mission Control design with dark/light theme support.
cd apps/web && pnpm install && pnpm dev # http://localhost:3000
Note: the web app uses pnpm (not npm). It also auto-enters demo mode if the backend is unreachable, or visit
/?demo=true.
Three screens:
- Upload — Drag-drop PPTX, preset selector (draft/standard/pro), AI Narration toggle
- Progress — Mission Control: horizontal pipeline track + focus panel + activity feed + confetti on completion
- Result — Pipeline report, video player with seek-drag and volume slider, download buttons
Mission Control layout:
- PipelineTrack — horizontal flow of stage dots connected by animated lines; the active stage pulses
- FocusPanel — large card for the current stage with rotating icon, live status text, and per-stage metrics
- ActivityFeed — timestamped event log: stage start/finish events plus verifier verdicts (PASS/REWRITE/REMOVE)
- ThemeToggle — sun/moon button in the header; preference persisted to
localStorage - VideoPlayer — custom component with seek-drag, volume slider, and playback controls
Accessibility:
- Color-blind safe palette — blue/orange (not red/green), shape and icon indicators primary, color secondary (WCAG 1.4.1 compliant)
- Dark and light themes — full CSS custom-property theming
- Keyboard navigation — all interactive elements reachable via tab
The stage registry in apps/web/src/config/stages.ts follows the Open/Closed Principle: adding a new pipeline stage to the UI requires only one new entry — PipelineTrack, FocusPanel, and ActivityFeed all derive from this registry.
Quality Presets
| Feature | Draft | Standard | Pro |
|---|---|---|---|
| Vision (OpenAI) | Off | Off | On |
| AI Narration | Off (orthogonal) | Off (orthogonal) | Off (orthogonal) |
| Notes Overlay | Off | Off | Off |
| Transitions | Cut | Crossfade | Crossfade |
| Subtitles | On (sidecar) | On (sidecar) | On (sidecar) |
| Intro / Outro | Off | On | On |
| Background Music | Off | Off | On (ducked under narration) |
| Loudness Normalization | Off | EBU R128 | EBU R128 |
| Typical runtime (16 slides) | ~30s | ~60s | ~3 min |
AI Narration is orthogonal to presets — toggle it independently to enable the GPT-4o natural delivery rewrite on top of any preset.
SLIDESHERLOCK_PRESET=pro make worker # Pro preset
SLIDESHERLOCK_PRESET=draft make worker # Draft preset
Testing
make test # 167 tests across core pipeline and API
make lint # black --check + flake8 (max-line-length=100)
make doctor # Check LibreOffice, FFmpeg, Poppler, Tesseract
Batch Experiments
Run the pipeline on a corpus of PPTXs for paper data collection:
python scripts/batch_run.py /path/to/pptx_dir --preset draft --workers 3 --output results/
Produces batch_summary.json and batch_summary.csv (one row per file, stage timings as columns) for direct use in paper tables.
Deployment
Docker Compose (local or VM)
docker compose up
This starts 6 services:
| Service | Port | Purpose |
|---|---|---|
postgres |
5433 | Job + artifact metadata |
redis |
6379 | RQ job queue |
minio |
9000/9001 | S3-compatible artifact storage |
pgadmin |
5050 | Postgres web UI |
api |
8000 | FastAPI service |
worker |
— | RQ worker running the 10-stage pipeline |
The Dockerfile bundles all system dependencies (LibreOffice, FFmpeg, Poppler, Tesseract).
GCP Compute Engine (production demo)
Recommended VM: e2-medium (2 vCPU, 4 GB RAM) — covered by the GCP $200 free credit for several months. Zero code changes:
gcloud compute ssh slidesherlock-vm
git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock && cp .env.example .env && docker compose up -d
Open ports 8000 (API) and 3000 (Web UI). Pre-loaded demo mode (read-only) is recommended for public reviewer access.
System Dependencies
Checked via make doctor. All bundled in the Docker image.
| Dependency | Purpose | Install |
|---|---|---|
| LibreOffice | PPTX → PDF | brew install --cask libreoffice |
| FFmpeg | Video composition | brew install ffmpeg |
| Poppler | PDF → PNG | brew install poppler |
| Tesseract | OCR (vision graph) | brew install tesseract |
Citation
If you use SlideSherlock in your research, please cite:
@software{gupta_slidesherlock_2026,
author = {Gupta, Sachin},
title = {SlideSherlock: Evidence-Grounded Presentation-to-Video Pipeline},
year = {2026},
doi = {10.5281/zenodo.19413324},
url = {https://github.com/sachinkg12/SlideSherlock},
license = {Apache-2.0}
}
License
Copyright 2026 Sachin Gupta
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file slidesherlock-1.1.0.tar.gz.
File metadata
- Download URL: slidesherlock-1.1.0.tar.gz
- Upload date:
- Size: 383.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
406df7aac9e279a79ad31ee0dd0e19d72e32bd205596492aca1458bf38bf662e
|
|
| MD5 |
125a2415ee9349b6a2e38d1d1832f0be
|
|
| BLAKE2b-256 |
2f05bf78816e2b4d57e66675d174a07401f516489ed80f51b67b00d2c8791b8a
|
File details
Details for the file slidesherlock-1.1.0-py3-none-any.whl.
File metadata
- Download URL: slidesherlock-1.1.0-py3-none-any.whl
- Upload date:
- Size: 296.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c1b079ea6806b3c4c00987dff640939d8187b4117259a3e1782b63dd6a2f003a
|
|
| MD5 |
a70194bee759fb55cd4d75abca584815
|
|
| BLAKE2b-256 |
f392eeb4a67a37b0e9a28dd1a153cbd9c98bbb9ba7c16bb4ade22cc1605ee783
|