Pipeline to transform webinar videos into searchable slide and transcript PDFs.

Video Lectures to Searchable PDFs

Pipeline for turning webinar-style videos into searchable lecture artifacts:

OCR-driven slide PDF
Whisper transcript PDF
Slide-aligned combined PDF

Requirements

Python: 3.10+
System binaries:
- ffmpeg (for audio + frame extraction)
Hardware:
- CPU-only is supported (the default device setting, cuda, falls back to CPU when no GPU is available).
- GPU (CUDA) is recommended for faster Whisper + OCR if available.

On Ubuntu/Debian, install FFmpeg with:

sudo apt-get update
sudo apt-get install -y ffmpeg
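
Before running the pipeline it can help to confirm that FFmpeg is actually reachable. The following is a minimal sketch, not part of the package: find_ffmpeg is a hypothetical helper that honors the FFMPEG_BINARY override described under Configuration and otherwise looks for ffmpeg on PATH.

```python
import os
import shutil


def find_ffmpeg(env=None):
    """Return the resolved path of the ffmpeg executable, or None if it is not found.

    Hypothetical helper for illustration; the package may perform this check differently.
    """
    env = os.environ if env is None else env
    # FFMPEG_BINARY is the documented override; fall back to plain "ffmpeg" on PATH.
    candidate = env.get("FFMPEG_BINARY", "ffmpeg")
    return shutil.which(candidate)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path:
        print(f"ffmpeg found at {path}")
    else:
        print("ffmpeg not found; install it or set FFMPEG_BINARY")
```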

Quickstart

python -m venv .venv && source .venv/bin/activate
pip install -e .
vlsp --help

CLI Usage

vlsp run --type local --source /path/to/webinar.mp4
vlsp run --type youtube --source https://youtu.be/xxxx
vlsp run --type gdrive --source https://drive.google.com/file/d/ID/view

Outputs land in data/processed/<video_id>/.
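
When scripting many runs, the --type value can be derived from the source string. This is a sketch under assumptions: infer_source_type and build_vlsp_command are hypothetical helpers, the host list is illustrative rather than exhaustive, and only the vlsp run flags shown above are used.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtu.be", "youtube.com", "www.youtube.com", "m.youtube.com"}


def infer_source_type(source: str) -> str:
    """Map a source string to one of the documented types: local, youtube, gdrive."""
    host = (urlparse(source).hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host == "drive.google.com":
        return "gdrive"
    # Anything without a recognized host is treated as a local path.
    return "local"


def build_vlsp_command(source: str) -> list[str]:
    """Build the argument list for `vlsp run`, suitable for subprocess.run."""
    return ["vlsp", "run", "--type", infer_source_type(source), "--source", source]
```

The resulting list can be passed to subprocess.run(cmd, check=True) once vlsp is installed in the active environment.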

API Server

uvicorn app.server:app --reload --port 8080

POST payload:

{
  "source_type": "youtube",
  "source": "https://youtu.be/..."
}
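
The payload above can be sent from Python with the standard library alone. The sketch below only builds the request. The route is not stated in this README, so the "/process" path is a placeholder assumption (check the server's routes, for example via FastAPI's generated docs page), and build_job_request is a hypothetical helper. The port matches the uvicorn command above.

```python
import json
from urllib import request


def build_job_request(base_url, source_type, source, path="/process"):
    """Build a JSON POST request carrying the documented payload.

    NOTE: `path` is a placeholder; the real endpoint path is an assumption here.
    """
    payload = {"source_type": source_type, "source": source}
    return request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_job_request("http://localhost:8080", "youtube", "https://youtu.be/...")
# To actually send it once the server is running:
#     with request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```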

Architecture Overview

flowchart LR
    subgraph Ingestion
        SRC[(Video Source)]
        SRC -->|local / youtube / gdrive| DL[Downloader]
    end

    DL --> FF[FFmpeg Extractor]
    FF -->|audio| WHISPER[faster-whisper]
    FF -->|frames| OCR[PaddleOCR and optional VLM captions]

    WHISPER --> ALIGN[Slide/Text Aligner]
    OCR --> ALIGN

    ALIGN --> PDFGEN[ReportLab / PyPDF Builder]
    PDFGEN --> OUT[Searchable PDFs]

    OUT -->|persist| STORE[data/processed/<video_id>]
    ALIGN -->|serve| API[(FastAPI + Typer CLI)]

The CLI (vlsp) and FastAPI server share the same pipeline, so you can drive the workflow via command line, HTTP, or by importing the pipeline directly in Python.

End-to-End Workflow

Ingestion: The video is pulled from the specified source (local, youtube, or gdrive). Metadata such as ID, title, and duration is captured for downstream file naming.
Media Extraction: FFmpeg splits the video into a high-quality WAV track and evenly spaced video frames with timestamps.
Speech + Slide Text Understanding:
- faster-whisper produces bilingual-friendly transcripts and per-segment timestamps.
- PaddleOCR extracts slide text from frames.
- (Optional) A vision-language model (e.g. BLIP / LLaVA) can generate rich slide captions; this is disabled by default to keep VRAM usage modest.
Alignment: Transcript chunks are matched to their corresponding slide frames using temporal overlap and cosine similarity on embeddings.
PDF Generation:
- OCR-driven slide PDF for crisp slide reproduction with searchable overlays.
- Whisper transcript PDF containing time-linked dialogues.
- Combined PDF merges slides and transcripts per page for study-ready notes.
Delivery: Artifacts are written to data/processed/<video_id>/ and optionally surfaced via the FastAPI endpoint.
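
The alignment step can be illustrated with a small, dependency-free sketch. It is not the package's implementation. It assumes that each transcript segment and each slide carries a start time, an end time, and text, that the two signals are combined as a weighted sum (the weight is an assumption), and it swaps real embeddings for simple word-count vectors so the example runs without a model.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    text: str


def temporal_overlap(a: Span, b: Span) -> float:
    """Fraction of span `a` that falls inside span `b` (0.0 to 1.0)."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    duration = a.end - a.start
    return max(0.0, overlap) / duration if duration > 0 else 0.0


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real pipeline would use a sentence encoder."""
    return Counter(text.lower().split())


def cosine_similarity(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def align(segments, slides, weight=0.5):
    """Return, for each transcript segment, the index of its best-scoring slide.

    `slides` must be non-empty. `weight` balances time overlap against text similarity.
    """
    slide_vecs = [embed(s.text) for s in slides]
    result = []
    for seg in segments:
        seg_vec = embed(seg.text)
        scores = [
            weight * temporal_overlap(seg, slide) + (1 - weight) * cosine_similarity(seg_vec, vec)
            for slide, vec in zip(slides, slide_vecs)
        ]
        result.append(max(range(len(slides)), key=scores.__getitem__))
    return result


slides = [Span(0, 60, "introduction agenda"), Span(60, 120, "gradient descent update rule")]
segments = [Span(5, 15, "welcome to the introduction"), Span(70, 90, "the gradient descent rule")]
print(align(segments, slides))  # [0, 1]
```

Combining both signals means a segment spoken while a slide is on screen still matches that slide when the OCR text is sparse, while text similarity helps when the speaker refers back to an earlier slide.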

Component Details

Multi-source ingestion (local path, YouTube URL, Google Drive URL)
Media extraction via FFmpeg (audio WAV + timestamped frames)
GPU-friendly AI models:
- faster-whisper (configurable checkpoint)
- PaddleOCR for slide OCR
- Optional VLM (BLIP / LLaVA via 🤗 Transformers) for dense slide captions
PDF creation using ReportLab + PyPDF
Slide-by-slide synchronization with transcript blocks
FastAPI service & Typer CLI orchestrating the workflow

See docs/models.md for recommended checkpoints and VRAM needs.
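
To make the media extraction component concrete, here is a sketch of FFmpeg command lines that produce a WAV track and evenly spaced frames. The specific values are assumptions, not the package's settings: 16 kHz mono PCM (the input rate Whisper models expect), one frame every 5 seconds, and the frame_%06d.jpg naming. The helper names are hypothetical; the FFmpeg options themselves (-vn, -acodec, -ar, -ac, -vf fps=) are standard.

```python
def audio_command(video, wav, ffmpeg="ffmpeg", sample_rate=16000):
    """Command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        ffmpeg, "-y", "-i", video,
        "-vn",                    # no video in the output
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM
        "-ar", str(sample_rate),  # sample rate in Hz
        "-ac", "1",               # mono
        wav,
    ]


def frames_command(video, out_dir, ffmpeg="ffmpeg", interval_s=5.0):
    """Command that writes one JPEG frame every `interval_s` seconds."""
    return [
        ffmpeg, "-y", "-i", video,
        "-vf", f"fps=1/{interval_s:g}",
        f"{out_dir}/frame_%06d.jpg",
    ]


def frame_timestamp(index, interval_s=5.0):
    """Approximate timestamp in seconds of the 1-based frame `index`."""
    return (index - 1) * interval_s


print(frames_command("webinar.mp4", "frames"))
```

Each list can be run with subprocess.run(cmd, check=True). Timestamps derived from the frame index are approximate, since the fps filter picks the nearest source frame.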

Configuration

All runtime settings are driven by a Pydantic Settings model and can be overridden via environment variables:

Model selection:
- MODELS__whisper_model – e.g. small, medium, large-v3 (default: medium).
- MODELS__vlm_model – set to a HF model id (e.g. Salesforce/blip-image-captioning-base) to enable captions, or "none" (default) to skip VLM entirely.
- MODELS__device – cuda or cpu (default: cuda, will fall back to CPU if GPU is not available).
Storage paths:
- PATHS__root – project root (default: cwd).
- PATHS__raw_dir, PATHS__processed_dir, PATHS__temp_dir – override data directories if needed.
Binaries:
- FFMPEG_BINARY – override the ffmpeg executable name/path if it is not on PATH.

By default the system runs with VLM captions off, uses ffmpeg from your PATH, and writes results under data/processed/<video_id>/.
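
The defaults and overrides described above can be summarized in a short standard-library sketch. It is not the package's Pydantic Settings model: load_settings, resolve_device, and vlm_enabled are hypothetical helpers, the "ffmpeg" default for FFMPEG_BINARY is inferred from the text, and variable names are matched case-sensitively here, which the real settings model may not do.

```python
import os

# Documented defaults; "ffmpeg" for FFMPEG_BINARY is inferred from "uses ffmpeg from your PATH".
DEFAULTS = {
    "MODELS__whisper_model": "medium",
    "MODELS__vlm_model": "none",
    "MODELS__device": "cuda",
    "FFMPEG_BINARY": "ffmpeg",
}


def load_settings(env=None):
    """Return the effective settings: environment values override the defaults."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


def resolve_device(requested, cuda_available):
    """Mirror the documented fallback: use cuda only when requested and available."""
    return "cuda" if requested == "cuda" and cuda_available else "cpu"


def vlm_enabled(settings):
    """VLM captions are skipped when the model id is "none" (the default)."""
    return settings["MODELS__vlm_model"].lower() != "none"


settings = load_settings({"MODELS__whisper_model": "large-v3"})
print(settings["MODELS__whisper_model"], resolve_device(settings["MODELS__device"], cuda_available=False))
```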

Release history

0.1.0 (Nov 26, 2025)

File details

Details for the file video_lectures_to_searchable_pdf-0.1.0.tar.gz.

File metadata

Download URL: video_lectures_to_searchable_pdf-0.1.0.tar.gz
Upload date: Nov 26, 2025
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for video_lectures_to_searchable_pdf-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f97ad9a81ff6c52cf3d2cba34b5dac170fb04e0901c492fb961c94a9f766646b`
MD5	`a97a73843c5bd0c9600950361fec317c`
BLAKE2b-256	`66c2862ae773d6f447a6d25b68d46070e1654eec4e662cbc0f32b1daf77cc756`

File details

Details for the file video_lectures_to_searchable_pdf-0.1.0-py3-none-any.whl.

File metadata

Download URL: video_lectures_to_searchable_pdf-0.1.0-py3-none-any.whl
Upload date: Nov 26, 2025
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for video_lectures_to_searchable_pdf-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29d20a4bb1848482b3dcad29d864e7cf18792981673b4a81ecf7a7e8a8a1941c`
MD5	`f23ce6b4477ef02a61ae41d12fb89c9c`
BLAKE2b-256	`bcfe0e5001cfa17cbb3cdc57f0a3b58f660b00d950b7e6f30f8a0f2b8bf50dd9`
