slidegeist

Extract slides and transcripts from lecture videos with minimal dependencies

Features

Scene detection using global pixel difference (FFmpeg's SAD-based scene filter plus an Opencast-style threshold optimizer, tuned for lecture videos)
Automatic slide extraction with simple numbered filenames (slide_001, slide_002, ...)
Audio transcription through a running OpenAI-compatible Whisper service
Markdown export - single slides.md file (LLM-friendly) or split mode with separate files
OCR with Tesseract
AI descriptions through a running llama.cpp service

Requirements

Python ≥ 3.10
FFmpeg (must be installed separately and available in PATH)
Whisper server speaking the OpenAI /v1/audio/transcriptions API on 127.0.0.1:8427 (e.g. whisper.cpp's whisper-server, faster-whisper-server, LocalAI, Vox-Box)
llama.cpp running a completion API on 127.0.0.1:8081

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from ffmpeg.org or use:

winget install ffmpeg

Installation

pip install slidegeist

Developer Setup

git clone git@github.com:itpplasma/slidegeist.git
cd slidegeist
pip install -e ".[dev]"

Quick Start

Process a lecture video to extract slides and transcript:

slidegeist lecture.mp4 --out output/

This creates:

output/
├── slides.md                        # Combined file with table of contents and all slides
└── slides/
    ├── slide_001.jpg                # Slide images (1-based numbering)
    ├── slide_002.jpg
    └── slide_003.jpg

For separate slide files (useful for navigation in some tools), use --split:

slidegeist lecture.mp4 --out output/ --split

This creates:

output/
├── index.md                         # Overview with links to all slides
├── slide_001.md                     # Slide 1 with transcript and OCR
├── slide_002.md                     # Slide 2 with transcript and OCR
├── slide_003.md                     # Slide 3 with transcript and OCR
└── slides/
    ├── slide_001.jpg                # Slide images
    ├── slide_002.jpg
    └── slide_003.jpg

Usage

Full Processing

# Basic usage (uses the configured remote services)
slidegeist video.mp4

# Specify output directory
slidegeist video.mp4 --out my-output/

# Use a smaller, faster Whisper model
slidegeist video.mp4 --model base

# Adjust scene detection sensitivity (0.0-1.0, default 0.025).
# Acts as the *starting point* for the Opencast optimizer.
# Lower values bias toward more segments; higher values toward fewer.
slidegeist video.mp4 --scene-threshold 0.015

# Explicit process command (same as default)
slidegeist process video.mp4

Individual Operations

# Extract only slides (no transcription)
slidegeist slides video.mp4

CLI Options

slidegeist <video> [options]
slidegeist {process,slides} <video> [options]

Options:
  --out DIR              Output directory (default: named after the video file)
  --split                Create separate markdown files (index.md + slide_NNN.md)
                         instead of a single slides.md (default: combined file)
  --scene-threshold NUM  Initial scene detection sensitivity, 0.0-1.0 (default: 0.025).
                         Used as the optimizer's starting threshold; it is
                         auto-adjusted to reach a stable segment count.
  --model NAME           Whisper model: tiny, base, small, medium, large, large-v2, large-v3
                         (default: large-v3)
  --format FMT           Image format: jpg or png (default: jpg)
  -v, --verbose          Enable verbose logging
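The 0.0-1.0 bound on --scene-threshold could be enforced with an argparse type function. The sketch below is a hypothetical illustration; scene_threshold and this parser setup are assumptions, not Slidegeist's actual CLI code:

```python
import argparse


def scene_threshold(value: str) -> float:
    """Hypothetical argparse type: parse --scene-threshold and enforce the 0.0-1.0 range."""
    try:
        threshold = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}") from None
    # NaN fails this comparison as well, so it is rejected too.
    if not 0.0 <= threshold <= 1.0:
        raise argparse.ArgumentTypeError(
            f"--scene-threshold must be between 0.0 and 1.0, got {threshold}"
        )
    return threshold


# Illustrative parser with only a subset of the documented options.
parser = argparse.ArgumentParser(prog="slidegeist")
parser.add_argument("video")
parser.add_argument("--scene-threshold", type=scene_threshold, default=0.025)
parser.add_argument("--format", choices=["jpg", "png"], default="jpg")

args = parser.parse_args(["lecture.mp4", "--scene-threshold", "0.015"])
```

With a type function like this, argparse prints the error message and exits with status 2 when given an invalid value such as --scene-threshold 1.5.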

Output Format

Default: Combined slides.md (Recommended)

By default, Slidegeist creates a single slides.md file containing:

Video metadata (source, duration, model used)
Table of contents with clickable links to each slide
All slides with images, transcripts, and OCR content
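As an illustration of how transcript text might be attached to slides, the sketch below assigns each transcript segment to the slide whose time range contains the segment's midpoint. The Slide class, attach_transcript, and the midpoint rule are hypothetical; Slidegeist's actual matching logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical slide record; times are in seconds, as in the split-mode front matter."""
    index: int
    time_start: float
    time_end: float
    transcript: list[str] = field(default_factory=list)


def attach_transcript(slides: list[Slide], segments: list[tuple[float, float, str]]) -> None:
    """Append each (start, end, text) segment to the slide containing its midpoint.

    Segments whose midpoint falls outside every slide are dropped.
    """
    for start, end, text in segments:
        midpoint = (start + end) / 2
        for slide in slides:
            if slide.time_start <= midpoint < slide.time_end:
                slide.transcript.append(text)
                break


# Slide 1 covers 00:00-05:15 and slide 2 covers 05:15-12:30, as in the example below.
slides = [Slide(1, 0.0, 315.0), Slide(2, 315.0, 750.0)]
attach_transcript(slides, [
    (0.0, 6.0, "Today we discuss quantum mechanics"),
    (310.0, 330.0, "Next topic"),  # midpoint 320 s, so it lands on slide 2
])
```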

Benefits:

Single file is easy to process with LLMs
No navigation between files needed
Smaller overall output size

Example structure:

# Lecture Slides

**Video:** lecture.mp4
**Duration:** 45:30
**Transcription Model:** large-v3

## Table of Contents

- [Slide 1](#slide_001) • 00:00-05:15
- [Slide 2](#slide_002) • 05:15-12:30
...

---

## Slide 1

**Time:** 00:00 - 05:15

![Slide](slides/slide_001.jpg)

**Slide Content:**
Introduction to Quantum Mechanics

**Transcript:**
Today we discuss quantum mechanics and its implications...

---

## Slide 2
...
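Each table-of-contents entry pairs a slide anchor with its MM:SS time range. A minimal sketch of producing such a line from times in seconds; mmss and toc_line are hypothetical helpers, and since how Slidegeist formats times past one hour is not documented here, this version simply lets minutes exceed 59:

```python
def mmss(seconds: float) -> str:
    """Format seconds as MM:SS; minutes are not rolled over into hours."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def toc_line(index: int, start: float, end: float) -> str:
    """One table-of-contents entry in the style shown above."""
    anchor = f"slide_{index:03d}"
    return f"- [Slide {index}](#{anchor}) • {mmss(start)}-{mmss(end)}"


print(toc_line(1, 0.0, 315.0))  # - [Slide 1](#slide_001) • 00:00-05:15
print(mmss(2730))               # 45:30
```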

Split Mode (--split flag)

With --split, Slidegeist creates a separate file for each slide (useful for some viewers and tools):

Index: index.md - Overview with links to individual slide files
Slide markdown: slide_001.md, slide_002.md, ... - Per-slide files with YAML front matter
Slide images: slides/slide_001.jpg, slides/slide_002.jpg, ...

Each split slide file contains:

---
id: slide_001
index: 1
time_start: 0.0
time_end: 315.0
image: slides/slide_001.jpg
---

# Slide 1

[![Slide Image](slides/slide_001.jpg)](slides/slide_001.jpg)

## Transcript

Today we discuss quantum mechanics...

## Slide Content

Introduction to Quantum Mechanics

**Visual Elements:** diagram, formula
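The front matter shown above consists of flat key: value pairs, so downstream scripts can write and read it back without a YAML library. A hypothetical sketch; front_matter and read_front_matter are illustrative names, and read_front_matter handles only this flat layout, so use a real YAML parser such as PyYAML's yaml.safe_load for anything richer:

```python
def front_matter(index: int, time_start: float, time_end: float, image_format: str = "jpg") -> str:
    """Build the split-mode front matter block for one slide."""
    slide_id = f"slide_{index:03d}"
    lines = [
        "---",
        f"id: {slide_id}",
        f"index: {index}",
        f"time_start: {time_start}",
        f"time_end: {time_end}",
        f"image: slides/{slide_id}.{image_format}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat key: value lines between the leading '---' fences (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line == "---":
            break
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


meta = read_front_matter(front_matter(1, 0.0, 315.0))
```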

How It Works

Scene Detection: Uses FFmpeg's scene filter (SAD-based) with an Opencast-style optimizer to identify slide changes
- Iteratively adjusts the scene threshold to target ~30 segments per hour (typical slide pace)
- Treats --scene-threshold as the initial threshold; the optimizer raises or lowers it until the slide count converges
- Merges segments shorter than 2 seconds to suppress rapid flickers
- Based on Opencast's VideoSegmenterService implementation
Slide Extraction: Extracts the frame at 80% of each segment's duration into the slides/ directory, using simple slide_XXX.jpg names
Transcription: Extracts audio with FFmpeg and submits it, in 2-minute chunks, to the running OpenAI-compatible Whisper HTTP API
OCR: Uses Tesseract OCR on extracted slide images
AI descriptions: Sends OCR and transcript context to the running llama.cpp server
Export: Generates Markdown (a combined slides.md by default, or per-slide files with YAML front matter with --split), linking slides to their transcripts and OCR content
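The scene-detection and extraction steps above can be sketched in Python. This is a simplified illustration, not a port of Opencast's VideoSegmenterService or of Slidegeist's code. It assumes per-frame scene scores (such as the lavfi.scene_score values FFmpeg's scene filter reports) are already available as (timestamp, score) pairs, and it uses plain bisection on the threshold, which is an assumed adjustment rule. The 30-per-hour target, the 2-second merge, and the 80% frame position come from the description above; all function names and the tolerance value are hypothetical.

```python
# Hypothetical sketch: per-frame scene scores -> optimized slide segments -> frame times.
# scores: (timestamp_seconds, scene_score) pairs, assumed sorted or sortable by time.


def cut_times(scores, duration, threshold):
    """Timestamps whose scene score exceeds the threshold, strictly inside the video."""
    return sorted(t for t, s in scores if s > threshold and 0.0 < t < duration)


def merge_short(cuts, duration, min_len=2.0):
    """Turn cut times into (start, end) segments, folding pieces shorter than min_len."""
    bounds = [0.0, *cuts, duration]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        if segments and end - start < min_len:
            segments[-1] = (segments[-1][0], end)  # extend the previous segment
        else:
            segments.append((start, end))
    if len(segments) > 1 and segments[0][1] - segments[0][0] < min_len:
        first = segments.pop(0)
        segments[0] = (first[0], segments[0][1])  # fold a short opening segment forward
    return segments


def optimize(scores, duration, threshold=0.025, per_hour=30.0,
             tolerance=0.25, max_iter=30, min_len=2.0):
    """Bisect the threshold until the merged segment count is close to the target."""
    target = max(1, round(duration / 3600.0 * per_hour))
    lo, hi = 0.0, 1.0
    best = None
    for _ in range(max_iter):
        cuts = cut_times(scores, duration, threshold)
        segments = merge_short(cuts, duration, min_len)
        if best is None or abs(len(segments) - target) < abs(len(best[1]) - target):
            best = (threshold, segments)
        if abs(len(segments) - target) <= tolerance * target:
            break
        # Steer on the raw cut count: merging can hide heavy over-segmentation.
        if len(cuts) + 1 > target:
            lo = threshold  # too many cuts: raise the threshold
        else:
            hi = threshold  # too few cuts: lower the threshold
        threshold = (lo + hi) / 2
    return best


def keyframe_times(segments, fraction=0.8):
    """Pick the time `fraction` of the way (80% by default) through each segment."""
    return [start + fraction * (end - start) for start, end in segments]


def extract_cmd(video, t, out_path):
    """FFmpeg command that seeks to t seconds and writes one frame."""
    return ["ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", video, "-frames:v", "1", out_path]


# Synthetic one-hour lecture: a real slide change every 120 s, low-level noise elsewhere.
scores = [(float(t), 0.3 if t % 120 == 0 else 0.01) for t in range(1, 3600)]
threshold, segments = optimize(scores, 3600.0, threshold=0.005)
times = keyframe_times(segments)
cmd = extract_cmd("lecture.mp4", times[0], "slides/slide_001.jpg")
```

Steering the search on the raw cut count rather than the merged count is a deliberate choice in this sketch: when nearly every frame exceeds the threshold, merging collapses the noise into one long segment, which would otherwise look like under-segmentation and push the threshold the wrong way. To actually write the frame, pass the command to subprocess.run(cmd, check=True).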

Performance

Model Recommendations (speedups are relative to large-v3):

large-v3-turbo: Fast remote transcription when your Whisper server exposes it
large-v3: Best accuracy (default) - recommended for production
medium: Good balance - 2x faster, slightly lower accuracy
base: Quick testing - 5x faster, noticeably lower accuracy
tiny: Very fast - 10x faster, lowest accuracy

Troubleshooting

Remote Services

# Verify llama.cpp
curl http://127.0.0.1:8081/health

# Verify the Whisper server (any HTTP response, even 404 or 405, means it is reachable)
curl -I http://127.0.0.1:8427/v1/audio/transcriptions

Set SLIDEGEIST_LLAMACPP_URL or SLIDEGEIST_WHISPER_URL if the services listen on different addresses.
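To check the services from Python instead of curl, the sketch below resolves the two addresses and sends one audio file to the OpenAI-style transcription endpoint using only the standard library. It assumes the environment variables hold a base URL such as http://127.0.0.1:8427 rather than the full endpoint path; verify the expected format against Slidegeist's source. The helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def service_url(var: str, default: str) -> str:
    """Base URL from an environment variable, else the documented default address."""
    return os.environ.get(var, default).rstrip("/")


# Assumption: the variables hold base URLs, not full endpoint paths.
WHISPER_URL = service_url("SLIDEGEIST_WHISPER_URL", "http://127.0.0.1:8427")
LLAMACPP_URL = service_url("SLIDEGEIST_LLAMACPP_URL", "http://127.0.0.1:8081")


def llama_ok(timeout: float = 5.0) -> bool:
    """True if llama.cpp answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{LLAMACPP_URL}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False


def multipart(fields: dict[str, str], filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode() + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "large-v3") -> str:
    """POST one audio file to /v1/audio/transcriptions and return the 'text' field."""
    with open(path, "rb") as fh:
        body, content_type = multipart({"model": model}, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{WHISPER_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["text"]
```

Calling llama_ok() or transcribe("chunk.wav") requires the corresponding server to be running; a failure in transcribe surfaces as a urllib error carrying the server's HTTP status.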

Limitations

Scene detection may need threshold tuning for some videos (the default 0.025 works well for most lectures; because the optimizer auto-adjusts, use lower values such as 0.015 to bias toward more slides, or 0.03 and above to bias toward fewer, major transitions)
No speaker diarization
No automatic slide deduplication

Advanced Threshold Tuning

The Opencast optimizer targets roughly 30 segments per hour. That works well for standard lectures, but you can steer it:
- Lower --scene-threshold to encourage more segments before optimization. This helps when the optimizer consistently undershoots the actual slide count.
- Raise --scene-threshold to bias toward fewer segments when the optimizer overshoots and splits slides too often.

--scene-threshold is bounded between 0.0 and 1.0; the CLI validator rejects values outside this range.

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linter
ruff check slidegeist/

# Run type checker
mypy slidegeist/

Legal Notice

Slidegeist is provided for educational and research purposes only. Users must ensure they have the legal right to access, download, or process any video files they use with this tool. The author does not endorse or facilitate copyright infringement or violation of platform terms of service.

License

See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Project details

Release history

2026.4.23 (this version)

Apr 23, 2026

2025.11.3

Nov 3, 2025

2025.10.24

Oct 23, 2025

2025.10.23

Oct 23, 2025

Download files

Source Distribution

slidegeist-2026.4.23.tar.gz (62.4 kB)

Uploaded Apr 23, 2026 Source

Built Distribution

slidegeist-2026.4.23-py3-none-any.whl (44.5 kB)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file slidegeist-2026.4.23.tar.gz.

File metadata

Download URL: slidegeist-2026.4.23.tar.gz
Upload date: Apr 23, 2026
Size: 62.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for slidegeist-2026.4.23.tar.gz
Algorithm	Hash digest
SHA256	`2bedca4196111cc5f238c55f24006fc71210538da4245b9f7f8119af38c71b2e`
MD5	`4d8caba5694ddaeb3403e564f800e50b`
BLAKE2b-256	`9cbdcd6988c3957cd6fd324c89260a685a421658b1d7bc3020529816daa61ca4`

Provenance

The following attestation bundles were made for slidegeist-2026.4.23.tar.gz:

Publisher: release.yml on krystophny/slidegeist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slidegeist-2026.4.23.tar.gz
- Subject digest: 2bedca4196111cc5f238c55f24006fc71210538da4245b9f7f8119af38c71b2e
- Sigstore transparency entry: 1361510812
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: krystophny/slidegeist@9a5813acac6f612fe3ca22ea77cefd35f770e503
- Branch / Tag: refs/tags/v2026.04.23
- Owner: https://github.com/krystophny
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9a5813acac6f612fe3ca22ea77cefd35f770e503
- Trigger Event: push

File details

Details for the file slidegeist-2026.4.23-py3-none-any.whl.

File metadata

Download URL: slidegeist-2026.4.23-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 44.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for slidegeist-2026.4.23-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b20482591175b1c72f227435646eb6133d0578c264ecd79f00fcedee665d0373`
MD5	`5c7a481f9e089448888835494d72514e`
BLAKE2b-256	`388dd229f92f052ceb36ad409c49f14e0c8cdc616ed985508d420b50a9d3f8c4`

Provenance

The following attestation bundles were made for slidegeist-2026.4.23-py3-none-any.whl:

Publisher: release.yml on krystophny/slidegeist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: slidegeist-2026.4.23-py3-none-any.whl
- Subject digest: b20482591175b1c72f227435646eb6133d0578c264ecd79f00fcedee665d0373
- Sigstore transparency entry: 1361510814
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: krystophny/slidegeist@9a5813acac6f612fe3ca22ea77cefd35f770e503
- Branch / Tag: refs/tags/v2026.04.23
- Owner: https://github.com/krystophny
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9a5813acac6f612fe3ca22ea77cefd35f770e503
- Trigger Event: push
