Extract slides and transcripts from lecture videos with minimal dependencies
Features
- Scene detection using global pixel difference (research-based method optimized for lecture videos)
- Automatic slide extraction with simple numbered filenames (slide_001, slide_002, ...)
- Audio transcription through a running OpenAI-compatible Whisper service
- Markdown export - single slides.md file (LLM-friendly) or split mode with separate files
- OCR with Tesseract
- AI descriptions through a running llama.cpp service
Requirements
- Python ≥ 3.10
- FFmpeg (must be installed separately and available in PATH)
- Whisper server speaking the OpenAI /v1/audio/transcriptions API on 127.0.0.1:8427 (e.g. whisper.cpp's whisper-server, faster-whisper-server, LocalAI, Vox-Box)
- llama.cpp running a completion API on 127.0.0.1:8081
Installing FFmpeg
macOS:
brew install ffmpeg
Ubuntu/Debian:
sudo apt-get install ffmpeg
Windows: Download from ffmpeg.org or use:
winget install ffmpeg
Installation
pip install slidegeist
Developer Setup
git clone git@github.com:itpplasma/slidegeist.git
cd slidegeist
pip install -e ".[dev]"
Quick Start
Process a lecture video to extract slides and transcript:
slidegeist lecture.mp4 --out output/
This creates:
output/
├── slides.md # Combined file with table of contents and all slides
└── slides/
├── slide_001.jpg # Slide images (1-based numbering)
├── slide_002.jpg
└── slide_003.jpg
For separate slide files (useful for navigation in some tools), use --split:
slidegeist lecture.mp4 --out output/ --split
This creates:
output/
├── index.md # Overview with links to all slides
├── slide_001.md # Slide 1 with transcript and OCR
├── slide_002.md # Slide 2 with transcript and OCR
├── slide_003.md # Slide 3 with transcript and OCR
└── slides/
├── slide_001.jpg # Slide images
├── slide_002.jpg
└── slide_003.jpg
Usage
Full Processing
# Basic usage (uses the configured remote services)
slidegeist video.mp4
# Specify output directory
slidegeist video.mp4 --out my-output/
# Use smaller/faster model
slidegeist video.mp4 --model base
# Adjust scene detection sensitivity (0.0-1.0, default 0.025).
# Acts as the *starting point* for the Opencast optimizer.
# Lower values bias toward more segments; higher values toward fewer.
slidegeist video.mp4 --scene-threshold 0.015
# Explicit process command (same as default)
slidegeist process video.mp4
Individual Operations
# Extract only slides (no transcription)
slidegeist slides video.mp4
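For batch runs, the commands above can be driven from Python. The following is a minimal sketch that only assembles the CLI invocation from flags documented in this README; the helper names build_command and process_directory, and the lectures/ and output/ folders, are illustrative assumptions and not part of Slidegeist.

```python
import subprocess
from pathlib import Path


def build_command(video, out_dir=None, model=None, scene_threshold=None, split=False):
    """Assemble a slidegeist invocation using only flags documented in the README."""
    cmd = ["slidegeist", str(video)]
    if out_dir is not None:
        cmd += ["--out", str(out_dir)]
    if model is not None:
        cmd += ["--model", model]
    if scene_threshold is not None:
        # The README states that the threshold is bounded to 0.0-1.0.
        if not 0.0 <= scene_threshold <= 1.0:
            raise ValueError("scene threshold must be within 0.0-1.0")
        cmd += ["--scene-threshold", str(scene_threshold)]
    if split:
        cmd.append("--split")
    return cmd


def process_directory(video_dir="lectures", out_root="output"):
    """Hypothetical batch loop: one output folder per .mp4 file in video_dir."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        command = build_command(video, out_dir=Path(out_root) / video.stem)
        subprocess.run(command, check=True)  # raises if slidegeist exits non-zero
```

Calling process_directory() requires slidegeist on PATH and the remote services running; build_command on its own has no side effects.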
CLI Options
slidegeist <video> [options]
slidegeist {process,slides} <video> [options]
Options:
--out DIR Output directory (default: video filename)
--split Create separate markdown files (index.md + slide_NNN.md)
instead of single slides.md (default: combined file)
--scene-threshold NUM Initial scene detection sensitivity 0.0-1.0 (default: 0.025)
Used as the optimizer's starting threshold; it will
auto-adjust to reach a stable segment count.
--model NAME Whisper model: tiny, base, small, medium, large, large-v2, large-v3
(default: large-v3)
--format FMT Image format: jpg or png (default: jpg)
-v, --verbose Enable verbose logging
Output Format
Default: Combined slides.md (Recommended)
By default, Slidegeist creates a single slides.md file containing:
- Video metadata (source, duration, model used)
- Table of contents with clickable links to each slide
- All slides with images, transcripts, and OCR content
Benefits:
- Single file is easy to process with LLMs
- No navigation between files needed
- Smaller overall output size
Example structure:
# Lecture Slides
**Video:** lecture.mp4
**Duration:** 45:30
**Transcription Model:** large-v3
## Table of Contents
- [Slide 1](#slide_001) • 00:00-05:15
- [Slide 2](#slide_002) • 05:15-12:30
...
---
## Slide 1
**Time:** 00:00 - 05:15

**Slide Content:**
Introduction to Quantum Mechanics
**Transcript:**
Today we discuss quantum mechanics and its implications...
---
## Slide 2
...
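If you want to post-process the combined file, the table of contents can be read back with a small parser. This is a sketch that assumes the entries look exactly like the example above (bullet, link, "•", then a start-end time range); the names TOC_LINE, to_seconds, and parse_toc are hypothetical, and the pattern may need adjusting if the real output differs.

```python
import re

# Matches lines such as "- [Slide 1](#slide_001) • 00:00-05:15".
TOC_LINE = re.compile(
    r"^- \[Slide (?P<index>\d+)\]\(#(?P<anchor>[\w-]+)\) • "
    r"(?P<start>[\d:]+)-(?P<end>[\d:]+)$"
)


def to_seconds(stamp):
    """Convert MM:SS or HH:MM:SS into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_toc(markdown):
    """Return one dict per table-of-contents entry found in the text."""
    entries = []
    for line in markdown.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            entries.append({
                "index": int(match["index"]),
                "anchor": match["anchor"],
                "start": to_seconds(match["start"]),
                "end": to_seconds(match["end"]),
            })
    return entries
```

Lines that do not match the pattern, such as headings or the "..." placeholder, are skipped.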
Split Mode (--split flag)
With --split, creates separate files for each slide (useful for some viewers/tools):
- Index: index.md - Overview with links to individual slide files
- Slide markdown: slide_001.md, slide_002.md, ... - Per-slide files with YAML front matter
- Slide images: slides/slide_001.jpg, slides/slide_002.jpg, ...
Each split slide file contains:
---
id: slide_001
index: 1
time_start: 0.0
time_end: 315.0
image: slides/slide_001.jpg
---
# Slide 1
[](slides/slide_001.jpg)
## Transcript
Today we discuss quantum mechanics...
## Slide Content
Introduction to Quantum Mechanics
**Visual Elements:** diagram, formula
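The front matter shown above is flat key: value pairs, so it can be read without a YAML library. The sketch below assumes that flat layout; split_front_matter is a hypothetical helper, and a full YAML parser would be the safer choice if the files ever contain nested values.

```python
def split_front_matter(text):
    """Return (metadata dict, body) for text that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i, line in enumerate(lines[1:], 1) if line.strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body
```

All values come back as strings, so a caller would convert them as needed, for example float(meta["time_end"]) - float(meta["time_start"]) for the slide duration in seconds.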
How It Works
- Scene Detection: Uses FFmpeg's scene filter (SAD-based) with an Opencast-style optimizer to identify slide changes
  - Iteratively adjusts the scene threshold to target ~30 segments per hour (typical slide pace)
  - Treats --scene-threshold as the initial threshold; the optimizer raises or lowers it until the slide count converges
  - Merges segments shorter than 2 seconds to suppress rapid flickers
  - Based on Opencast's VideoSegmenterService implementation
- Slide Extraction: Extracts frames at 80% through each segment into the slides/ directory with simple slide_XXX.jpg names
- Transcription: Extracts audio with FFmpeg and submits it, in 2-minute chunks, to the running OpenAI-compatible Whisper HTTP API
- OCR: Uses Tesseract OCR on extracted slide images
- AI descriptions: Sends OCR and transcript context to the running llama.cpp server
- Export: Generates Markdown files with YAML front matter, linking slides to their transcripts and OCR content
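The timing rules in the steps above (merging segments shorter than 2 seconds, sampling a frame at 80% of each segment, and cutting audio into 2-minute chunks) can be pictured with a few lines of arithmetic. This is an illustrative sketch, not Slidegeist's actual code: the function names are hypothetical, and folding a short segment into its predecessor is an assumed merge direction.

```python
def merge_short_segments(cuts, duration, min_length=2.0):
    """Turn cut timestamps into (start, end) segments, absorbing short ones."""
    bounds = [0.0] + sorted(c for c in cuts if 0.0 < c < duration) + [duration]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        if segments and end - start < min_length:
            # Assumption: a short segment is folded into the previous one.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    # A short leading segment has no predecessor, so fold it into the next one.
    if len(segments) > 1 and segments[0][1] - segments[0][0] < min_length:
        segments[1] = (segments[0][0], segments[1][1])
        segments.pop(0)
    return segments


def frame_time(start, end, fraction=0.8):
    """Timestamp at the given fraction of the way through a segment."""
    return start + fraction * (end - start)


def chunk_spans(duration, chunk_length=120.0):
    """Split [0, duration] into consecutive spans of at most chunk_length seconds."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + chunk_length, duration)))
        start += chunk_length
    return spans
```

Sampling late in a segment rather than at its start gives slide animations and transitions time to finish, which is a plausible reason for the 80% position.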
Performance
Model Recommendations:
- large-v3-turbo: Fast remote transcription when your Whisper server exposes it
- large-v3: Best accuracy (default) - recommended for production
- medium: Good balance - 2x faster, slightly lower accuracy
- base: Quick testing - 5x faster, noticeably lower accuracy
- tiny: Very fast - 10x faster, lowest accuracy
Troubleshooting
Remote Services
# Verify llama.cpp
curl http://127.0.0.1:8081/health
# Verify the Whisper server
curl -I http://127.0.0.1:8427/v1/audio/transcriptions
Set SLIDEGEIST_LLAMACPP_URL or SLIDEGEIST_WHISPER_URL if the services listen on different addresses.
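The same checks can be scripted. The sketch below resolves each service address from the environment variables named above and probes it over HTTP. It assumes the variables hold a base URL such as http://host:port, which this README does not specify; DEFAULTS, service_url, and is_reachable are hypothetical names.

```python
import os
import urllib.error
import urllib.request

# Defaults mirror the addresses listed under Requirements.
DEFAULTS = {
    "SLIDEGEIST_LLAMACPP_URL": "http://127.0.0.1:8081",
    "SLIDEGEIST_WHISPER_URL": "http://127.0.0.1:8427",
}


def service_url(variable, environ=None):
    """Return the configured base URL, falling back to the documented default."""
    env = os.environ if environ is None else environ
    return env.get(variable, DEFAULTS[variable]).rstrip("/")


def is_reachable(url, timeout=3.0):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, e.g. with 405 for a GET on a POST-only route.
        return True
    except (urllib.error.URLError, OSError):
        return False
```

For example, is_reachable(service_url("SLIDEGEIST_LLAMACPP_URL") + "/health") corresponds to the first curl command above.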
Limitations
- Scene detection may need threshold tuning for some videos (default 0.025 works well for most lectures; because the optimizer auto-adjusts, use lower values like 0.015 to bias toward more slides or 0.03+ to bias toward fewer major transitions)
- No speaker diarization
- No automatic slide deduplication
Advanced Threshold Tuning
- The Opencast optimizer targets roughly 30 segments per hour. That goal works well for standard lectures, but you can steer it:
  - Lower --scene-threshold to encourage more segments before optimization. Useful when the optimizer consistently undershoots the actual slide count.
  - Raise --scene-threshold to bias toward fewer segments when the optimizer overshoots and splits slides too often.
- The --scene-threshold value is still bounded between 0.0 and 1.0. Values outside this range will be rejected by the CLI validator.
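To make the role of the starting threshold concrete, here is a toy version of a threshold search that aims for roughly 30 segments per hour. It is neither Opencast's algorithm nor Slidegeist's implementation: tune_threshold, its parameters, and the proportional update rule are assumptions, and count_segments stands in for whatever runs scene detection at a given threshold.

```python
def tune_threshold(count_segments, duration_s, start=0.025,
                   per_hour=30.0, tolerance=0.25, max_rounds=8):
    """Adjust the threshold until the segment count is near the hourly target."""
    target = max(1.0, per_hour * duration_s / 3600.0)
    threshold = start
    for _ in range(max_rounds):
        count = count_segments(threshold)
        if abs(count - target) <= tolerance * target:
            break  # close enough to the target pace
        # Too many segments -> raise the threshold; too few -> lower it.
        factor = max(count, 1) / target
        threshold = min(1.0, max(1e-4, threshold * factor))
    return threshold
```

Because the search starts from the value passed in and stops as soon as the count is within tolerance, a lower starting point can settle on a different final threshold than a higher one, which is one way the bias described above could arise.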
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linter
ruff check slidegeist/
# Run type checker
mypy slidegeist/
Legal Notice
Slidegeist is provided for educational and research purposes only. Users must ensure they have the legal right to access, download, or process any video files they use with this tool. The author does not endorse or facilitate copyright infringement or violation of platform terms of service.
License
MIT License - Copyright (c) 2025 Christopher Albert
See LICENSE for details.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.