Evidence-grounded presentation-to-video pipeline. Every claim is traceable — no hallucinations.

SlideSherlock

SlideSherlock is an evidence-grounded pipeline that converts PowerPoint presentations into narrated explainer videos. Every narrated claim is traceable to specific slide content — no hallucinations, no invented facts.

Try it in 60 seconds

git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock
cp .env.example .env          # Add OPENAI_API_KEY for AI narration (optional)
docker compose up -d           # Start all 6 services
curl http://localhost:8000/health   # → {"status": "ok"}

Then open http://localhost:8000/docs for the interactive API, or use the CLI:

slidesherlock run your_deck.pptx --preset draft -o output/
open output/final.mp4

No Docker? Use SQLite mode: DATABASE_URL=sqlite:///./slidesherlock.db slidesherlock run deck.pptx

Why SlideSherlock?

Existing slide-to-video tools either read bullet points verbatim or hallucinate content that doesn't exist in the source material. SlideSherlock solves this with three novel mechanisms:

Evidence Index — Every piece of PPTX content (text, shapes, images, connectors) receives a stable, content-addressable evidence ID (SHA-256(job|slide|kind|offset)). All downstream narration must cite these IDs.
Verifier Loop — A closed-loop control system: generate script → verify against evidence (PASS / REWRITE / REMOVE) → regenerate → re-verify until convergence. Not post-hoc filtering — inline verification with iterative correction.
Dual-Provenance Knowledge Graph — Two independent graphs are built per slide: G_native from PPT XML (shapes, connectors, groups) and G_vision from rendered PNGs + OCR. These merge into G_unified where each node carries provenance (NATIVE / VISION / BOTH), confidence scores, and needs_review flags.
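
As a rough illustration of the evidence ID scheme in the first item above, the sketch below hashes the four fields named in the formula. The function name `evidence_id`, the `|`-joined string layout, the UTF-8 encoding, and the example field values are assumptions made for illustration; the project's actual serialization may differ.

```python
import hashlib


def evidence_id(job: str, slide: int, kind: str, offset: int) -> str:
    """Return a hex SHA-256 digest over job|slide|kind|offset (assumed layout)."""
    key = f"{job}|{slide}|{kind}|{offset}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Identical inputs always yield the same ID, so citations stay stable across re-runs.
first = evidence_id("job-123", 4, "TEXT", 0)
second = evidence_id("job-123", 4, "TEXT", 0)
other = evidence_id("job-123", 4, "TEXT", 1)
print(first == second, first != other, len(first))
```

Because the ID is derived only from these fields, a narration sentence that cites it can be traced back to one location in the deck without storing any extra lookup state.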

Features

Feature	Description
10-Stage Pipeline	Ingest → Evidence → Render → Graph → Script → Verify → Translate → Narrate → Audio → Video
AI Narration	Optional GPT-4o(-mini) rewrite: evidence-grounded template → natural presenter delivery (two-pass, hallucination-free)
Vision Understanding	Optional GPT-4o vision extractor for photo captions, diagram entities, and OCR — cached by image hash
Quality Presets	Draft (fast, hard cuts), Standard (crossfade, intro/outro, loudness normalization), Pro (adds vision AI and background music)
Multi-Language	Generate variants from one PPTX. Shared evidence and graphs; only language-dependent stages re-run
Web UI	React Mission Control dashboard: pipeline track, focus panel, activity feed, dark/light theme, color-blind safe
CLI	`slidesherlock run deck.pptx --preset pro --ai-narration` with structured JSON logging for experiments
Docker	`docker compose up` — one command for the full 6-service stack
167 Tests	Automated test suite covering evidence grounding, verification, graph fusion, and pipeline stages

Quick Start

Docker (recommended)

git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock
cp .env.example .env       # Add your OPENAI_API_KEY (optional; stub used otherwise)
docker compose up           # Starts API, worker, Postgres, Redis, MinIO, pgAdmin

API: http://localhost:8000
API Docs (interactive): http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc)
Web UI: http://localhost:3000 (if running pnpm dev in apps/web/)

Local Development

make setup                  # Create venv + install deps
make up                     # Start Postgres, Redis, MinIO (Docker)
make migrate                # Run database migrations
make api                    # Start FastAPI server (port 8000)
make worker                 # Start pipeline worker (separate terminal)

CLI (no Redis/RQ needed)

slidesherlock run deck.pptx                                  # Draft preset, output to ./output/
slidesherlock run deck.pptx --preset pro -o results/         # Pro preset, custom output
slidesherlock run deck.pptx --preset pro --ai-narration      # Enable GPT-4o narration rewrite
slidesherlock run deck.pptx --preset standard --lang hi-IN   # Add Hindi second-language variant
slidesherlock run deck.pptx --preset pro --dry-run             # Metrics only, no audio/video
slidesherlock doctor                                          # Check system dependencies
slidesherlock doctor --json                                   # Machine-readable JSON output

Each CLI run produces metrics.json and run_log.json (structured log for experiment aggregation). Full runs also produce final.mp4. Use --dry-run for quick validation without video encode.

Local LLM (no API key needed)

SlideSherlock supports 10 OpenAI-compatible LLM providers. To use Ollama for fully local operation:

# Install and start Ollama
ollama pull llama3.1:8b        # text model for narration
ollama pull llava:7b            # vision model for image understanding

# Run SlideSherlock with Ollama
LLM_PROVIDER=ollama slidesherlock run deck.pptx --preset pro --ai-narration

Other supported providers: OpenAI, Groq, Together, OpenRouter, DeepInfra, Anyscale, LM Studio, vLLM, LocalAI. See packages/core/llm_config.py for the full registry.

Architecture

[Figure: SlideSherlock pipeline architecture]

The pipeline follows the Open/Closed Principle: each stage implements a Stage protocol. Adding a new stage requires no changes to existing code — just add a class and register it.

from typing import Protocol

# PipelineContext and StageResult are the project's own types.
class Stage(Protocol):
    name: str
    def run(self, ctx: PipelineContext) -> StageResult: ...
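
To make the "add a class and register it" step concrete, here is a minimal, self-contained sketch. `PipelineContext` and `StageResult` are simplified stand-ins, and `WordCountStage`, `REGISTRY`, `register`, and `run_pipeline` are hypothetical names; none of them are taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class PipelineContext:
    """Hypothetical stand-in: shared state passed from stage to stage."""
    job_id: str
    artifacts: dict = field(default_factory=dict)


@dataclass
class StageResult:
    """Hypothetical stand-in: outcome of a single stage."""
    ok: bool
    detail: str = ""


class Stage(Protocol):
    name: str

    def run(self, ctx: PipelineContext) -> StageResult: ...


class WordCountStage:
    """Example stage; it satisfies the protocol structurally, no base class needed."""
    name = "word_count"

    def run(self, ctx: PipelineContext) -> StageResult:
        text = ctx.artifacts.get("script", "")
        ctx.artifacts["word_count"] = len(text.split())
        return StageResult(ok=True, detail=f"{ctx.artifacts['word_count']} words")


REGISTRY: list[Stage] = []  # hypothetical registry; list order is execution order


def register(stage: Stage) -> None:
    REGISTRY.append(stage)


def run_pipeline(ctx: PipelineContext) -> list[StageResult]:
    """Run registered stages in order and stop at the first failure."""
    results = []
    for stage in REGISTRY:
        result = stage.run(ctx)
        results.append(result)
        if not result.ok:
            break
    return results


register(WordCountStage())
ctx = PipelineContext(job_id="demo", artifacts={"script": "three short words"})
results = run_pipeline(ctx)
print(results[0].detail)
```

The runner only depends on the protocol, so a new stage is added by writing one class and calling `register`, without editing `run_pipeline` or any existing stage.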

Pipeline Stages

Stage	Key Modules	Output
Ingest	`ppt_parser`, `image_extract`, `image_classifier`	`ppt/slide_*.json`, `images/`
Evidence	`evidence_index`, `photo_understand`, `diagram_understand`	`evidence/index.json`
Render	LibreOffice + pdf2image	`render/deck.pdf`, `render/slides/*.png`
Graph	`native_graph`, `vision_graph`, `merge_engine`	`graphs/unified/slide_*.json`
Script	`explain_plan`, `script_generator`, `script_context`	`script/{variant}/script.json`
Verify	`verifier` (closed-loop rewrite)	`verify_report.json`, `coverage.json`
Translate	`translator_provider` (second-language variants only)	Translated script + notes
Narrate	`narrate` (GPT-4o, optional)	`ai_narration.json`
Audio	`audio_prepare`, `tts_provider`	`audio/{variant}/slide_*.wav`
Video	`timeline_builder`, `overlay_renderer`, `composer`	`output/{variant}/final.mp4`
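
The Graph stage merges the native and vision graphs into a unified graph whose nodes carry provenance, confidence, and a review flag. The sketch below shows one way such a merge could work. Matching nodes by normalized label, taking the higher confidence when both sources agree, and the 0.6 review threshold are all assumptions for illustration, not the behavior of `merge_engine`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    label: str
    confidence: float


@dataclass
class UnifiedNode:
    label: str
    provenance: str  # "NATIVE", "VISION", or "BOTH"
    confidence: float
    needs_review: bool


def merge_graph_nodes(native, vision, review_below=0.6):
    """Merge two node lists by normalized label (a simplifying assumption)."""
    native_by_key = {n.label.strip().lower(): n for n in native}
    vision_by_key = {v.label.strip().lower(): v for v in vision}
    unified = []
    for key in sorted(native_by_key.keys() | vision_by_key.keys()):
        n, v = native_by_key.get(key), vision_by_key.get(key)
        if n is not None and v is not None:
            provenance, conf, label = "BOTH", max(n.confidence, v.confidence), n.label
        elif n is not None:
            provenance, conf, label = "NATIVE", n.confidence, n.label
        else:
            provenance, conf, label = "VISION", v.confidence, v.label
        # Low-confidence nodes are flagged rather than silently trusted.
        unified.append(UnifiedNode(label, provenance, conf, conf < review_below))
    return unified


native = [Node("API Gateway", 1.0), Node("Database", 1.0)]
vision = [Node("api gateway", 0.8), Node("Cache", 0.4)]
unified = merge_graph_nodes(native, vision)
for node in unified:
    print(node)
```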

No-Hallucination Design

The verifier checks every narration claim against the evidence index. Claims grounded in evidence get PASS. Claims about visual content with low confidence get REWRITE, which adds hedging language such as "appears to show". Claims with no supporting evidence get REMOVE, so fabricated content is never narrated. The loop repeats until a round produces no further rewrites or removals.
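
The control flow described above can be sketched in a few lines. This is a simplified model, not the project's `verifier` module: the `Claim` fields, the 0.6 confidence cutoff, the hedging prefix, and the `max_rounds` cap are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    evidence_ids: list
    confidence: float = 1.0
    hedged: bool = False


def verdict(claim, evidence_index, min_confidence=0.6):
    """Return PASS, REWRITE, or REMOVE for one claim (simplified rules)."""
    if not claim.evidence_ids:
        return "REMOVE"
    if any(e not in evidence_index for e in claim.evidence_ids):
        return "REMOVE"
    if claim.confidence < min_confidence and not claim.hedged:
        return "REWRITE"
    return "PASS"


def verify_loop(claims, evidence_index, max_rounds=5):
    """Re-verify until a full round yields only PASS verdicts."""
    for _ in range(max_rounds):
        changed, kept = False, []
        for claim in claims:
            v = verdict(claim, evidence_index)
            if v == "REMOVE":
                changed = True  # unsupported claims are dropped entirely
                continue
            if v == "REWRITE":
                claim = Claim("This appears to show " + claim.text,
                              claim.evidence_ids, claim.confidence, hedged=True)
                changed = True
            kept.append(claim)
        claims = kept
        if not changed:
            break
    return claims


evidence_index = {"ev1", "ev2"}
claims = [
    Claim("revenue grew 12%.", ["ev1"]),
    Claim("a server rack.", ["ev2"], confidence=0.4),
    Claim("the CEO resigned.", []),
]
final = verify_loop(claims, evidence_index)
for claim in final:
    print(claim.text)
```

In this example the first claim passes unchanged, the second is hedged in round one and passes in round two, and the third is removed because it cites no evidence.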

AI Narration

SlideSherlock includes a dedicated NarrateStage that produces natural presenter-style narration without sacrificing the no-hallucination guarantee. It uses a two-pass design:

Pass 1 — Evidence-grounded template. The deterministic script generator produces narration where every sentence cites evidence IDs. The verifier loop validates and rewrites until every claim is grounded.
Pass 2 — Natural rewrite. GPT-4o(-mini) rewrites each verified segment for natural delivery, but is constrained to the validated factual content. No new claims can be introduced — the rewriter only changes phrasing, rhythm, and pronunciation.

NarrateStage uses requests directly (not the openai SDK) because the SDK's httpx transport deadlocks inside RQ-forked workers.

Configuration:

Variable	Default	Purpose
`OPENAI_API_KEY`	(unset)	Required for AI narration with the `openai` provider
`LLM_PROVIDER`	`stub`	Set to `openai` to activate, or use the UI/CLI flag
`NARRATE_MODEL`	`gpt-4o-mini`	Override to `gpt-4o` for highest quality
`NARRATE_PARALLEL`	`5`	Concurrent rewrite calls (per slide)
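
One plausible way to honor `NARRATE_PARALLEL` is a bounded thread pool, sketched below. The helper name `rewrite_segments` and the use of `ThreadPoolExecutor` are assumptions; the source only states that the setting controls the number of concurrent rewrite calls.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def rewrite_segments(segments, rewrite, parallel=None):
    """Apply `rewrite` to each segment with at most `parallel` calls in flight."""
    if parallel is None:
        parallel = int(os.environ.get("NARRATE_PARALLEL", "5"))
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # Executor.map returns results in input order, so segments stay aligned.
        return list(pool.map(rewrite, segments))


# str.upper stands in for a real rewrite call to an LLM endpoint.
rewritten = rewrite_segments(["first segment", "second segment"], str.upper, parallel=2)
print(rewritten)
```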

Cost estimates:

Model	Cost per slide	16-slide deck
`gpt-4o-mini` (default)	~$0.001	~$0.02
`gpt-4o` (full)	~$0.01	~$0.16
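
The deck totals follow from multiplying the per-slide figure by the slide count. The short sketch below reproduces that arithmetic with the approximate per-slide costs from the table; the function name `estimate_cost` is hypothetical.

```python
# Approximate per-slide costs in USD, copied from the table above.
COST_PER_SLIDE_USD = {"gpt-4o-mini": 0.001, "gpt-4o": 0.01}


def estimate_cost(model: str, slides: int) -> float:
    """Estimate narration cost for a deck as per-slide cost times slide count."""
    return COST_PER_SLIDE_USD[model] * slides


mini = estimate_cost("gpt-4o-mini", 16)
full = estimate_cost("gpt-4o", 16)
print(f"gpt-4o-mini: ${mini:.3f}, gpt-4o: ${full:.2f}")
```

For 16 slides this gives about $0.016 for `gpt-4o-mini`, which the table rounds to ~$0.02, and about $0.16 for `gpt-4o`.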

Three ways to enable:

Web UI — AI Narration toggle on the upload page
CLI — slidesherlock run deck.pptx --ai-narration
Env — export LLM_PROVIDER=openai before starting the worker (with OPENAI_API_KEY)

Vision Understanding

The OpenAIVisionExtractor optionally enriches each slide with computer vision:

Default: stub provider — no API calls, deterministic output, free
Real vision: set VISION_PROVIDER=openai + OPENAI_API_KEY
Default vision model: gpt-4o-mini (override with OPENAI_VISION_MODEL=gpt-4o)
Vision cache: enabled by default; results cached by image SHA-256 in MinIO to avoid duplicate API calls across re-runs
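
The caching idea in the last item can be sketched with an in-memory dictionary standing in for MinIO. `VisionCache` and `fake_extractor` are hypothetical names, and the result shape is invented for the demo; only the "key by image SHA-256" idea comes from the text above.

```python
import hashlib


class VisionCache:
    """In-memory stand-in for a hash-keyed vision cache (hypothetical class)."""

    def __init__(self, extractor):
        self._extractor = extractor
        self._store = {}
        self.misses = 0

    def describe(self, image_bytes: bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            # Only unseen images reach the (potentially paid) extractor.
            self.misses += 1
            self._store[key] = self._extractor(image_bytes)
        return self._store[key]


def fake_extractor(image_bytes: bytes) -> dict:
    """Placeholder for a real vision call."""
    return {"caption": f"image of {len(image_bytes)} bytes"}


cache = VisionCache(fake_extractor)
a = cache.describe(b"\x89PNG-demo")
b = cache.describe(b"\x89PNG-demo")  # same bytes, served from the cache
c = cache.describe(b"other")
print(cache.misses)
```

Keying on the image bytes rather than the file name means a re-run of the same deck, or a second deck that reuses the same picture, triggers no new extractor call.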

For each slide image the vision extractor produces:

Photo captions — natural-language description of photographic content
Diagram entities — boxes, arrows, labels, and their spatial relationships
OCR text — text rendered as images (chart labels, callouts, decorative type)

All vision-derived facts are written to the evidence index with IMAGE_* / DIAGRAM_* kinds, so the verifier can ground image-related claims to them.

Configuration

All configuration is environment-variable driven. Copy .env.example to .env for local development; for Docker, infra vars are pre-configured in docker-compose.yml.

LLM

Variable	Default	Purpose
`OPENAI_API_KEY`	(unset)	API key for LLM, vision, and narration
`LLM_PROVIDER`	`stub`	`stub` (deterministic), `openai` (enables AI narration), or another supported provider such as `ollama`
`NARRATE_MODEL`	`gpt-4o-mini`	Narration rewrite model
`NARRATE_PARALLEL`	`5`	Parallel narration calls per slide

Vision

Variable	Default	Purpose
`VISION_PROVIDER`	`stub`	`stub` or `openai`
`OPENAI_VISION_MODEL`	`gpt-4o-mini`	Vision extractor model
`VISION_CACHE_ENABLED`	`true`	Cache vision results by image hash
`VISION_PHOTO_CONFIDENCE`	`0.6`	Min confidence for photo captions
`VISION_DIAGRAM_CONFIDENCE`	`0.6`	Min confidence for diagram entities
`VISION_OCR_CONFIDENCE`	`0.5`	Min confidence for OCR text
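
As a sketch of how the three confidence settings might be applied, the code below reads them from the environment with the defaults from the table and keeps only facts at or above the threshold. The helper names and the "drop anything below the minimum" rule are assumptions; the table only describes each value as a minimum confidence.

```python
import os

# Defaults copied from the table above.
DEFAULT_THRESHOLDS = {
    "VISION_PHOTO_CONFIDENCE": 0.6,
    "VISION_DIAGRAM_CONFIDENCE": 0.6,
    "VISION_OCR_CONFIDENCE": 0.5,
}


def load_thresholds(env=None):
    """Read thresholds from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if env is None else env
    return {name: float(env.get(name, default))
            for name, default in DEFAULT_THRESHOLDS.items()}


def keep_fact(setting: str, confidence: float, thresholds: dict) -> bool:
    """Keep a vision-derived fact only if it meets the minimum for its kind."""
    return confidence >= thresholds[setting]


# A plain dict is passed here so the example does not depend on the real environment.
thresholds = load_thresholds({"VISION_OCR_CONFIDENCE": "0.7"})
print(thresholds)
```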

TTS

Variable	Default	Purpose
`USE_SYSTEM_TTS`	`false`	macOS system TTS via `say` (avoids pyttsx3 fork hang)
`AUDIO_VOICE_PROVIDER`	`stub`	`stub`, `system`, or future cloud providers

Video

Variable	Default	Purpose
`VIDEO_TRANSITION`	`cut`	`cut` or `crossfade` between slides
`VIDEO_INTRO_ENABLED`	`false`	Show intro card
`VIDEO_OUTRO_ENABLED`	`false`	Show outro card
`AUDIO_BGM_ENABLED`	`false`	Background music bed
`AUDIO_BGM_DUCK_DB`	`-18`	dB to duck BGM under narration
`AUDIO_LOUDNESS_NORM`	`false`	EBU R128 loudness normalization
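
For intuition about `AUDIO_BGM_DUCK_DB`, a decibel change maps to a linear amplitude multiplier of 10^(dB/20). The sketch below assumes the setting is an attenuation applied to the music's amplitude; how the composer actually applies it is not stated here.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


gain = db_to_gain(-18)
print(round(gain, 3))  # 0.126
```

Under that assumption, the default of -18 dB leaves the music at roughly 13% of its original amplitude while narration plays.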

Subtitles

Variable	Default	Purpose
`SUBTITLES_ENABLED`	`false`	Generate `.srt` sidecar
`SUBTITLES_BURN_IN`	`false`	Burn subtitles into the video frame

Presets

Variable	Default	Purpose
`SLIDESHERLOCK_PRESET`	`draft`	`draft`, `standard`, or `pro` (sets all video/audio/subtitle/vision flags)

API Endpoints

Endpoint	Method	Purpose
`/jobs/quick`	POST	Upload PPTX + create project + start pipeline (one step)
`/jobs`	POST	Create a job (advanced)
`/jobs/{id}`	GET	Job status
`/jobs/{id}/upload_pptx`	POST	Upload PPTX to existing job
`/jobs/{id}/progress`	GET	Per-stage progress for UI polling
`/jobs/{id}/metrics`	GET	Pipeline metrics (durations, counts, coverage)
`/jobs/{id}/evidence-trail`	GET	Live verifier decisions (PASS/REWRITE/REMOVE)
`/jobs/{id}/output/{variant}/final.mp4`	GET	Stream video (HTTP Range support for seeking)
`/jobs/{id}/output/{path}`	GET	Stream any artifact under `jobs/{id}/` from MinIO
`/projects`	POST	Create a project
`/projects/{id}`	GET	Get project
`/health`	GET	Health check
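
A client can poll the progress endpoint until a job finishes. The sketch below uses only the standard library and the `/jobs/{id}/progress` path from the table. The response schema is not documented in this section, so the caller supplies an `is_done` predicate; `get_json` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"


def get_json(path: str, base_url: str = BASE_URL):
    """GET a JSON document from the API."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id, is_done, fetch=get_json, interval=2.0, max_polls=300):
    """Poll /jobs/{id}/progress until is_done(payload) returns True."""
    for _ in range(max_polls):
        payload = fetch(f"/jobs/{job_id}/progress")
        if is_done(payload):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A call might look like `wait_for_job(job_id, lambda p: p.get("status") == "done")`, where the `status` key and its `done` value are guesses that should be checked against the interactive API docs at `/docs`.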

Web UI

React 18 + TypeScript + Vite + Tailwind CSS + Framer Motion. Mission Control design with dark/light theme support.

cd apps/web && pnpm install && pnpm dev    # http://localhost:3000

Note: the web app uses pnpm (not npm). It auto-enters demo mode if the backend is unreachable; to force demo mode, visit /?demo=true.

Three screens:

Upload — Drag-drop PPTX, preset selector (draft/standard/pro), AI Narration toggle
Progress — Mission Control: horizontal pipeline track + focus panel + activity feed + confetti on completion
Result — Pipeline report, video player with seek-drag and volume slider, download buttons

Mission Control layout:

PipelineTrack — horizontal flow of stage dots connected by animated lines; the active stage pulses
FocusPanel — large card for the current stage with rotating icon, live status text, and per-stage metrics
ActivityFeed — timestamped event log: stage start/finish events plus verifier verdicts (PASS/REWRITE/REMOVE)
ThemeToggle — sun/moon button in the header; preference persisted to localStorage
VideoPlayer — custom component with seek-drag, volume slider, and playback controls

Accessibility:

Color-blind safe palette — blue/orange (not red/green), shape and icon indicators primary, color secondary (WCAG 1.4.1 compliant)
Dark and light themes — full CSS custom-property theming
Keyboard navigation — all interactive elements reachable via tab

The stage registry in apps/web/src/config/stages.ts follows the Open/Closed Principle: adding a new pipeline stage to the UI requires only one new entry — PipelineTrack, FocusPanel, and ActivityFeed all derive from this registry.

Quality Presets

Feature	Draft	Standard	Pro
Vision (OpenAI)	Off	Off	On
AI Narration	Off (orthogonal)	Off (orthogonal)	Off (orthogonal)
Notes Overlay	Off	Off	Off
Transitions	Cut	Crossfade	Crossfade
Subtitles	On (sidecar)	On (sidecar)	On (sidecar)
Intro / Outro	Off	On	On
Background Music	Off	Off	On (ducked under narration)
Loudness Normalization	Off	EBU R128	EBU R128
Typical runtime (16 slides)	~30s	~60s	~3 min

AI Narration is orthogonal to presets — toggle it independently to enable the GPT-4o natural delivery rewrite on top of any preset.

SLIDESHERLOCK_PRESET=pro make worker        # Pro preset
SLIDESHERLOCK_PRESET=draft make worker      # Draft preset
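
The presets table can be read as a mapping from a preset name to the individual settings listed under Configuration. The sketch below encodes the table rows that way. The `PRESETS` dictionary and `preset_env` function are hypothetical, and the mapping from table rows to variable values is an assumption about how the preset is expanded.

```python
# Rows of the Quality Presets table, expressed as data.
PRESETS = {
    "draft": dict(vision=False, transition="cut", intro_outro=False,
                  bgm=False, loudness=False),
    "standard": dict(vision=False, transition="crossfade", intro_outro=True,
                     bgm=False, loudness=True),
    "pro": dict(vision=True, transition="crossfade", intro_outro=True,
                bgm=True, loudness=True),
}


def flag(on: bool) -> str:
    return "true" if on else "false"


def preset_env(name: str) -> dict:
    """Expand a preset name into environment-style settings (assumed mapping)."""
    p = PRESETS[name]
    return {
        "VISION_PROVIDER": "openai" if p["vision"] else "stub",
        "VIDEO_TRANSITION": p["transition"],
        "SUBTITLES_ENABLED": "true",  # the table lists sidecar subtitles for every preset
        "VIDEO_INTRO_ENABLED": flag(p["intro_outro"]),
        "VIDEO_OUTRO_ENABLED": flag(p["intro_outro"]),
        "AUDIO_BGM_ENABLED": flag(p["bgm"]),
        "AUDIO_LOUDNESS_NORM": flag(p["loudness"]),
    }


print(preset_env("pro"))
```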

Testing

make test               # 167 tests across core pipeline and API
make lint               # black --check + flake8 (max-line-length=100)
make doctor             # Check LibreOffice, FFmpeg, Poppler, Tesseract

Batch Experiments

Run the pipeline on a corpus of PPTXs for paper data collection:

python scripts/batch_run.py /path/to/pptx_dir --preset draft --workers 3 --output results/

Produces batch_summary.json and batch_summary.csv (one row per file, stage timings as columns) for direct use in paper tables.
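
Since the CSV has one row per file with stage timings as columns, per-stage averages can be computed with the standard library alone. The column names in the sample below (`file`, `ingest_s`, `render_s`) are invented for the demo; the real header names in `batch_summary.csv` may differ.

```python
import csv
import io
from statistics import mean


def mean_numeric_columns(csv_file) -> dict:
    """Average every column whose values all parse as numbers."""
    rows = list(csv.DictReader(csv_file))
    means = {}
    for column in (rows[0].keys() if rows else []):
        try:
            means[column] = mean(float(row[column]) for row in rows)
        except ValueError:
            continue  # skip non-numeric columns such as file names
    return means


# Inline sample with hypothetical columns; replace with open("results/batch_summary.csv").
sample = io.StringIO("file,ingest_s,render_s\na.pptx,1.0,4.0\nb.pptx,3.0,6.0\n")
means = mean_numeric_columns(sample)
print(means)
```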

Deployment

Docker Compose (local or VM)

docker compose up

This starts 6 services:

Service	Port	Purpose
`postgres`	5433	Job + artifact metadata
`redis`	6379	RQ job queue
`minio`	9000/9001	S3-compatible artifact storage
`pgadmin`	5050	Postgres web UI
`api`	8000	FastAPI service
`worker`	—	RQ worker running the 10-stage pipeline

The Dockerfile bundles all system dependencies (LibreOffice, FFmpeg, Poppler, Tesseract).

GCP Compute Engine (production demo)

Recommended VM: e2-medium (2 vCPU, 4 GB RAM), which the GCP free-trial credit for new accounts should cover for the length of the trial. No code changes are required:

gcloud compute ssh slidesherlock-vm
git clone https://github.com/sachinkg12/SlideSherlock.git
cd SlideSherlock && cp .env.example .env && docker compose up -d

Open ports 8000 (API) and 3000 (Web UI). Pre-loaded demo mode (read-only) is recommended for public reviewer access.

System Dependencies

Checked via make doctor. All bundled in the Docker image.

Dependency	Purpose	Install (macOS, Homebrew)
LibreOffice	PPTX → PDF	`brew install --cask libreoffice`
FFmpeg	Video composition	`brew install ffmpeg`
Poppler	PDF → PNG	`brew install poppler`
Tesseract	OCR (vision graph)	`brew install tesseract`

Citation

If you use SlideSherlock in your research, please cite:

@software{gupta_slidesherlock_2026,
  author    = {Gupta, Sachin},
  title     = {SlideSherlock: Evidence-Grounded Presentation-to-Video Pipeline},
  year      = {2026},
  doi       = {10.5281/zenodo.19413324},
  url       = {https://github.com/sachinkg12/SlideSherlock},
  license   = {Apache-2.0}
}

License

Apache License 2.0


Version 1.1.0, released Apr 13, 2026

