Slidegeist

Extract slides and timestamped transcripts from lecture videos with minimal dependencies.

Features

  • Scene detection using global pixel difference (research-based method optimized for lecture videos)
  • Automatic slide extraction with timestamp ranges in filenames
  • Audio transcription with Whisper large-v3 model (highest quality)
  • MLX acceleration on Apple Silicon Macs for 2-3x faster transcription
  • JSON export that pairs each slide with its transcript text

Requirements

  • Python ≥ 3.10
  • FFmpeg (must be installed separately and available in PATH)

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from ffmpeg.org or use:

winget install ffmpeg

Installation

# Clone the repository
git clone https://github.com/itpplasma/slidegeist.git
cd slidegeist

# Install with pip
pip install -e .

# On Apple Silicon Macs, install with MLX for 2-3x faster transcription
pip install -e ".[mlx]"

# Or with development dependencies
pip install -e ".[dev]"

Quick Start

Process a lecture video to extract slides and transcript:

slidegeist lecture.mp4 --out output/

This creates:

output/
├── slide_000_00:00:00-00:02:05.jpg  # Slide from 0:00 to 2:05
├── slide_001_00:02:05-00:04:47.jpg  # Slide from 2:05 to 4:47
├── slide_002_00:04:47-00:07:30.jpg  # Slide from 4:47 to 7:30
└── slides.json                      # Slides with transcripts and metadata

Usage

Full Processing

# Basic usage (auto-detects MLX on Apple Silicon, uses large-v3 model)
slidegeist video.mp4

# Specify output directory
slidegeist video.mp4 --out my-output/

# Use GPU explicitly (NVIDIA)
slidegeist video.mp4 --device cuda

# Use smaller/faster model
slidegeist video.mp4 --model base

# Adjust scene detection sensitivity (0.0-1.0, default 0.10)
# Lower values detect more subtle changes, higher values only major transitions
slidegeist video.mp4 --scene-threshold 0.05

# Explicit process command (same as default)
slidegeist process video.mp4

Individual Operations

# Extract only slides (no transcription)
slidegeist slides video.mp4

CLI Options

slidegeist <video> [options]
slidegeist {process,slides} <video> [options]

Options:
  --out DIR              Output directory (default: video filename)
  --scene-threshold NUM  Scene detection sensitivity 0.0-1.0 (default: 0.10)
  --model NAME           Whisper model: tiny, base, small, medium, large, large-v2, large-v3
                         (default: large-v3)
  --device NAME          Device: cpu, cuda, or auto (default: auto)
                         auto = MLX on Apple Silicon if available, else CPU
  --format FMT           Image format: jpg or png (default: jpg)
  -v, --verbose          Enable verbose logging
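
For batch jobs it may be convenient to drive the CLI from Python. The sketch below assembles an argument list from the options documented above and runs it with `subprocess`. The function names `build_command` and `run_slidegeist` are hypothetical helpers, not a Slidegeist API, and the sketch assumes every option is accepted by both subcommands.

```python
import subprocess

MODES = ("process", "slides")


def build_command(video, *, mode=None, out=None, scene_threshold=None,
                  model=None, device=None, image_format=None, verbose=False):
    """Assemble a slidegeist argument list from the documented options."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if scene_threshold is not None and not 0.0 <= scene_threshold <= 1.0:
        raise ValueError("scene_threshold must lie in 0.0-1.0")

    cmd = ["slidegeist"]
    if mode is not None:
        cmd.append(mode)
    cmd.append(str(video))
    # Map each optional setting to its documented flag; None keeps the default.
    for flag, value in (
        ("--out", out),
        ("--scene-threshold", scene_threshold),
        ("--model", model),
        ("--device", device),
        ("--format", image_format),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_slidegeist(video, **options):
    """Run slidegeist; subprocess raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(video, **options), check=True)
```

For example, `build_command("lecture.mp4", out="output/", model="base")` yields the same arguments as typing `slidegeist lecture.mp4 --out output/ --model base` in a shell.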

Output Format

Slide Filenames

Slides are named with their time range: slide_[index]_[HH:MM:SS]-[HH:MM:SS].jpg (shown here for the default jpg format)

  • Index is zero-padded (at least 3 digits)
  • Timestamps in HH:MM:SS format
  • Example: slide_001_00:02:05-00:04:47.jpg is slide 1 covering 2:05 to 4:47
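
Downstream scripts often need the index and time range back out of a filename. The following is a small sketch of a parser for the naming scheme described above. The names `parse_slide_name` and `SlideName` are illustrative, and accepting a `.png` extension is an assumption based on the `--format` option.

```python
import re
from typing import NamedTuple

# slide_[index]_[HH:MM:SS]-[HH:MM:SS].[ext], index zero-padded to at least 3 digits
SLIDE_NAME = re.compile(
    r"^slide_(?P<index>\d{3,})_"
    r"(?P<start>\d{2,}:\d{2}:\d{2})-(?P<end>\d{2,}:\d{2}:\d{2})"
    r"\.(?P<ext>jpg|png)$"
)


class SlideName(NamedTuple):
    index: int
    start_seconds: int
    end_seconds: int


def hms_to_seconds(stamp: str) -> int:
    """Convert an HH:MM:SS timestamp to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_slide_name(name: str) -> SlideName:
    """Split a slide filename into its index and time range in seconds."""
    match = SLIDE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a slide filename: {name!r}")
    return SlideName(
        index=int(match["index"]),
        start_seconds=hms_to_seconds(match["start"]),
        end_seconds=hms_to_seconds(match["end"]),
    )
```

Applied to the example above, `parse_slide_name("slide_001_00:02:05-00:04:47.jpg")` gives index 1 with a range of 125 to 287 seconds.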

slides.json Format

JSON file that pairs each slide with its transcript text, plus metadata about the run:

{
  "metadata": {
    "video_file": "lecture.mp4",
    "duration_seconds": 3600,
    "processed_at": "2025-01-15T10:30:00Z",
    "model": "large-v3"
  },
  "slides": [
    {
      "slide_number": 0,
      "image_path": "slide_000_00:00:00-00:02:05.jpg",
      "time_start": 0,
      "time_end": 125,
      "transcript": "Welcome to today's lecture on quantum mechanics."
    }
  ]
}
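
As an illustration of consuming this file, the sketch below turns the structure shown above into one readable line per slide. It relies only on the keys that appear in the example. The helper names `seconds_to_hms` and `summarize` are hypothetical, and the sketch assumes `time_start` and `time_end` are numbers of seconds.

```python
import json


def seconds_to_hms(total) -> str:
    """Format a number of seconds as HH:MM:SS, as used in slide filenames."""
    hours, remainder = divmod(int(total), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def summarize(document: dict) -> list[str]:
    """Return one '[start-end] slide N: transcript' line per slide."""
    lines = []
    for slide in document["slides"]:
        start = seconds_to_hms(slide["time_start"])
        end = seconds_to_hms(slide["time_end"])
        lines.append(
            f'[{start}-{end}] slide {slide["slide_number"]}: {slide["transcript"]}'
        )
    return lines


# The example document from above, trimmed to the fields used here.
EXAMPLE = json.loads("""
{
  "metadata": {"video_file": "lecture.mp4", "model": "large-v3"},
  "slides": [
    {
      "slide_number": 0,
      "image_path": "slide_000_00:00:00-00:02:05.jpg",
      "time_start": 0,
      "time_end": 125,
      "transcript": "Welcome to today's lecture on quantum mechanics."
    }
  ]
}
""")

for line in summarize(EXAMPLE):
    print(line)
```

To process real output, the inline example can be replaced with the parsed contents of `output/slides.json`, for instance via `json.loads(Path("output/slides.json").read_text(encoding="utf-8"))`.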

How It Works

  1. Scene Detection: Uses global pixel difference detection (research-based method) to identify slide changes
    • Converts frames to binary (black/white) for robustness to lighting changes
    • Computes normalized pixel differences between consecutive frames
    • Based on "An experimental comparative study on slide change detection in lecture videos" (Eruvaram et al., 2018)
  2. Slide Extraction: Extracts the final frame before each scene change using FFmpeg
  3. Transcription: Uses Whisper large-v3 for state-of-the-art speech-to-text with timestamps
    • Auto-detects and uses MLX on Apple Silicon for 2-3x speedup
    • Falls back to faster-whisper on other platforms
  4. Export: Generates a JSON file that pairs each slide with its transcript text
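
The scene detection step can be illustrated with a toy version of the idea: binarize each frame, measure the fraction of pixels that changed, and compare it with the threshold. This is a pure-Python sketch under stated assumptions, not Slidegeist's actual implementation. Frames are small nested lists of grayscale values, the binarization cutoff of 128 is arbitrary, and a change is flagged when the difference exceeds the threshold. A real implementation would decode video frames and use array operations.

```python
Frame = list[list[int]]  # rows of grayscale pixel values in 0-255


def binarize(frame: Frame, cutoff: int = 128) -> Frame:
    """Map each pixel to 0 (dark) or 1 (bright); the cutoff is an assumption."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in frame]


def normalized_difference(previous: Frame, current: Frame) -> float:
    """Fraction of pixels whose binary value differs between two frames."""
    total = 0
    changed = 0
    for row_a, row_b in zip(previous, current):
        for a, b in zip(row_a, row_b):
            total += 1
            changed += a != b
    return changed / total if total else 0.0


def detect_changes(frames: list[Frame], threshold: float = 0.10,
                   cutoff: int = 128) -> list[int]:
    """Return the indices of frames that differ enough from their predecessor."""
    changes = []
    previous = None
    for index, frame in enumerate(frames):
        binary = binarize(frame, cutoff)
        if previous is not None and normalized_difference(previous, binary) > threshold:
            changes.append(index)
        previous = binary
    return changes


# Toy 2x4 frames: a lighting change (dark -> dim) and a real slide change.
dark = [[10, 10, 10, 10], [10, 10, 10, 10]]
dim = [[40, 40, 40, 40], [40, 40, 40, 40]]
slide = [[200, 200, 10, 10], [200, 200, 10, 10]]

print(detect_changes([dark, dim, slide, slide]))  # [2]
```

The dark-to-dim transition is ignored because both frames binarize to the same image, which is the robustness to lighting changes mentioned above. Only the frame where half the pixels flip is reported.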

Performance

Transcription Speed (1 hour lecture, large-v3 model):

  • Apple Silicon with MLX: ~10-15 minutes
  • Without MLX: ~25-35 minutes

Model Recommendations (speedups are relative to large-v3):

  • large-v3: Best accuracy (default) - recommended for production
  • medium: Good balance - 2x faster, slightly lower accuracy
  • base: Quick testing - 5x faster, noticeably lower accuracy
  • tiny: Very fast - 10x faster, lowest accuracy

Limitations

  • Scene detection may need threshold tuning for some videos (default 0.10 works well for most lectures)
  • No speaker diarization
  • No automatic slide deduplication

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linter
ruff check slidegeist/

# Run type checker
mypy slidegeist/

Legal Notice

Slidegeist is provided for educational and research purposes only. Users must ensure they have the legal right to access, download, or process any video files they use with this tool. The author does not endorse or facilitate copyright infringement or violation of platform terms of service.

License

MIT License - Copyright (c) 2025 Christopher Albert

See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
